asynchronous – Asynchronously Iterable Queue Implementation for JavaScript

While studying Deno, which is a kind of Node with everything normalized to modern JS and TS, I quickly met the following pattern:

import { serve } from "https://deno.land/std@0.87.0/http/server.ts";

const server = serve({ hostname: "0.0.0.0", port: 8080 });

for await (const request of server) {
  request.respond({ status: 200, body: "Whatever" });
}

Here the server object is asynchronously iterable. Whenever a request is received, the for await...of loop body runs once. I believe server must be instantiated from an asynchronously iterable queue type.
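For reference, for await...of works with any object that implements the async iteration protocol: its [Symbol.asyncIterator]() method returns an iterator whose next() returns a promise for a {value, done} pair. A minimal sketch of that protocol (not Deno's actual implementation):

const oneTwoThree = {
  [Symbol.asyncIterator]() {
    let i = 0;
    return {
      next() {
        i += 1;
        // Each call hands back a promise for the next iteration result.
        return i <= 3 ? Promise.resolve({value: i, done: false})
                      : Promise.resolve({value: undefined, done: true});
      }
    };
  }
};

// for await (const n of oneTwoThree) console.log(n); // logs 1, 2, 3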

So I tried to implement a similar asynchronous queue on top of the best synchronous queue type in JS that I am aware of.

Obviously, in an AsyncQueue we cannot have .dequeue() or .peek() functions, since dequeueing happens automatically upon the resolution of the first promise, and there is no point in peeking at the first promise in a queue of promises. Instead we have a .next() function for dequeueing, alongside an .enqueue() function to insert new promises.

An important aspect of this AsyncQueue is that it must be able to accept new promises both synchronously and asynchronously, which means it will run forever.

The code is below, and I would like to know if it can be simplified any further, or whether there are ways to enhance its performance or whatnot. Please bear with my indentation style, since I will not change it. Also, I like to use private class fields very much.

class AsyncQueue {

  #HEAD;
  #LAST;

  static #LINK = class {
    #RESOLVER;

    constructor(promise,resolver){
      this.promise = new Promise(resolve => ( this.#RESOLVER = resolve
                                            , promise.then(value => resolver({value, done: false}))
                                            ));
      this.next    = void 0;
    }

    get resolver(){
      return this.#RESOLVER;
    }
  }

  constructor(){
    new Promise(resolve => this.#HEAD = new AsyncQueue.#LINK(Promise.resolve(null),resolve));
  };

  enqueue(promise){
    this.#LAST = this.#LAST ? this.#LAST.next = new AsyncQueue.#LINK(promise,this.#LAST.resolver)
                            : this.#HEAD      = new AsyncQueue.#LINK(promise,this.#HEAD.resolver)
  }

  next(){
    var promise = this.#HEAD.promise.then(value => ({value, done: false}));
    this.#HEAD.next ? this.#HEAD = this.#HEAD.next
                    : this.#LAST = void 0;
    return promise;
  };

  [Symbol.asyncIterator]() {
    return this;
  };
};


var aq = new AsyncQueue();

async function getAsyncValues(){
  for await (let item of aq){
    console.log(`The Promise resolved with a value of ${item.value}`);
  };
};


getAsyncValues();
// synchronous insertion of promises
for (var i=1; i <= 5; i++) aq.enqueue(new Promise(resolve => setTimeout(resolve, 1000*i, `done at ${1000*i}ms`)));
// asynchronous insertion of a promise
setTimeout(aq.enqueue.bind(aq),7500,Promise.resolve("this is an asynchronous insertion"));

object oriented – A Tiny Image Tagger Implementation in C#

I am trying to implement a tiny image tagger with customized tag options in C#. The main window is as follows.

[Figure: the main window of the image tagger]

The left block is a picture box MainPictureBox and there are two buttons, PreviousButton and NextButton. Furthermore, there are some tags in a group box AttributeGroupBox. A save button SaveButton is used for saving tagged information.

The implemented functions:

  • Automatically load images from the folder in which the program is located.

  • Each image can be tagged.

  • The tag information for each image can be saved as a text file, LabelResult.txt (a sample is shown below).
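For illustration, AttributeListToTxt writes one tab-separated line per image, so a LabelResult.txt might look like this (the file names are hypothetical):

./image001.jpg	Mountain
./image002.png	Cat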

The experimental implementation

class Attribute
{
    public enum AttributeEnum
    {
        Mountain,
        Cat
    }

    public AttributeEnum attributeEnum;

    public override string ToString()
    {
        if (this.attributeEnum.Equals(AttributeEnum.Cat))
        {
            return "Cat";
        }
        if (this.attributeEnum.Equals(AttributeEnum.Mountain))
        {
            return "Mountain";
        }
        return "";
    }
}
public partial class Form1 : Form
{
    List<string> ImagePaths;
    List<Attribute> AttributeList;
    int index = 0;
    string targetDirectory = "./";
    public Form1()
    {
        InitializeComponent();

        
        System.IO.FileSystemWatcher watcher = new FileSystemWatcher()
        {
            Path = targetDirectory,
            Filter = "*.jpg | *.jpeg| *.bmp | *.png"
        };
        // Add event handlers for all events you want to handle
        watcher.Created += new FileSystemEventHandler(OnChanged);
        // Activate the watcher
        watcher.EnableRaisingEvents = true;

        ImagePaths = new List<string>();
        
        string[] fileEntries = System.IO.Directory.GetFiles(targetDirectory);
        foreach (string fileName in fileEntries)
        {
            string filenameExt = System.IO.Path.GetExtension(fileName);
            if (filenameExt.Equals(".jpg") ||
                filenameExt.Equals(".jpeg") ||
                filenameExt.Equals(".bmp") ||
                filenameExt.Equals(".png")
                )
            {
                ImagePaths.Add(fileName);
            }
        }
        MainPictureBox.SizeMode = PictureBoxSizeMode.Zoom;
        MainPictureBox.Image = Image.FromFile(ImagePaths[0]);

        AttributeList = new List<Attribute>();
        
    }

    private void OnChanged(object source, FileSystemEventArgs e)
    {
        ImagePaths.Clear();
        string[] fileEntries = System.IO.Directory.GetFiles(targetDirectory);
        foreach (string fileName in fileEntries)
        {
            string filenameExt = System.IO.Path.GetExtension(fileName);
            if (filenameExt.Equals(".jpg") ||
                filenameExt.Equals(".jpeg") ||
                filenameExt.Equals(".bmp") ||
                filenameExt.Equals(".png")
                )
            {
                ImagePaths.Add(fileName);
            }
        }
    }

    private void PreviousButton_Click(object sender, EventArgs e)
    {
        index--;
        if (index <= 0)
        {
            index = 0;
        }
        MainPictureBox.Image = Image.FromFile(ImagePaths[index]);
        GC.Collect();
    }

    private void NextButton_Click(object sender, EventArgs e)
    {
        NextAction();
    }

    private void NextAction()
    {
        index++;
        if (index >= ImagePaths.Count)
        {
            index = ImagePaths.Count - 1;
        }
        MainPictureBox.Image = Image.FromFile(ImagePaths[index]);
        GC.Collect();
    }

    private void SaveButton_Click(object sender, EventArgs e)
    {
        Attribute attribute = new Attribute();
        
        if (radioButton1.Checked)
        {
            attribute.attributeEnum = Attribute.AttributeEnum.Mountain;
        }
        if (radioButton2.Checked)
        {
            attribute.attributeEnum = Attribute.AttributeEnum.Cat;
        }
        MessageBox.Show(attribute.ToString());
        AttributeList.Add(attribute);
        AttributeListToTxt();
        NextAction();
    }

    private void AttributeListToTxt()
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < this.AttributeList.Count; i++)
        {
            sb.Append(this.ImagePaths[i] + "\t" + this.AttributeList[i].ToString() + Environment.NewLine);
        }
        File.WriteAllText("LabelResult.txt", sb.ToString());
    }
}

All suggestions are welcome.

If there is any possible improvement about:

  • Potential drawback or unnecessary overhead
  • The design of implemented methods

please let me know.

assembly – setjmp and longjmp implementation in mmix

I’ve written an implementation of setjmp and longjmp in MMIX (assuming no name mangling).
I also hand-assembled it.
Are there any mistakes anyone can spot?
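For reference, this is a minimal sketch of the C-level contract the two routines implement (standard setjmp/longjmp semantics, nothing MMIX-specific):

    #include <setjmp.h>
    #include <stdio.h>

    static jmp_buf env;

    int main(void)
    {
        int r = setjmp(env);          /* returns 0 on the direct call */
        if (r == 0) {
            puts("direct return");
            longjmp(env, 42);         /* control re-enters setjmp, which now returns 42 */
        }
        printf("via longjmp: %d\n", r);
        return 0;
    }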

    // Memory stack pointer is stored in $254.
    // jmp_buf is rO, next address after setjmp call, memory stack pointer,
    // then possibly other data (for sigsetjmp/siglongjmp).
    // rG is preserved over a longjmp call
    // (not that it should change over one, anyway)
    // You'll have to store the frame pointer to the memory stack
    // yourself.
setjmp IS @
    GET    $1,rO            // FE01000A
    STOU   $1,$0,0          // AF010000
    GET    $1,rJ            // FE010004
    STOU   $1,$0,8          // AF010008
    STOU   $254,$0,16       // AFFE0010
    SETL   $0,0             // E3000000
    POP    1,0              // F8010000
longjmp IS @
    LDOU   $254,$0,0        // 8FFE0000
    SAVE   $255,0           // FAFF0000
    GET    $1,rG            // FE000013
    // why 15? We save 13 special registers, two local registers,
    // and the number 2, as well as any global registers.
    // That's 256-rG + 16, and we add only 15 because $255 is the address
    // of the saved rGA.
    SETL   $0,271           // E300010F
    SUBU   $1,$1,$0         // 26010100
    SLU    $1,$1,3          // 39000003
    // now $255 is topmost saved register, $255+$1 is bottommost such,
    // $254 is rO after.
    SUBU   $0,$254,$1       // 2600FE01
    LDOU   $2,$255,$1       // 8E02FF01
    STOU   $2,$0,$1         // AE020001
    8ADDU  $1,$1,1          // 2D010101
    PBNZ   $1,@-12          // 5B01FFFD
    OR     $255,$0,0        // C1FF0000
    UNSAVE 0,$255           // FB0000FF
    // now we have restored rO, but not other stuff
    LDOU   $254,$0,16       // 8FFE0010
    LDOU   $0,$0,8          // 8F000008
    PUT    rJ,$0            // F6040000
    OR     $0,$1,0          // C1000100
    POP    1,0              // F8010000

The register stack was the hard part here. Everything between the SAVE and the UNSAVE inclusive is essentially just “set register stack pointer properly”; after that it takes no time at all to fix up the other registers and return.

If you have any other questions, I’m happy to explain my reasons for each tetra of that code.

beginner – gpuIncreaseOne Function Implementation in CUDA

I am trying to perform the basic operation + with CUDA for GPU computation. The function vectorIncreaseOne holds the details of the operation, and the gpuIncreaseOne function is the structure for applying the operation to each element of the parameter data_for_calculation.

The experimental implementation

The experimental implementation of the gpuIncreaseOne function is shown below.

#include <stdio.h>
#include <cuda_runtime.h>
#include <cuda.h>
#include <helper_cuda.h>
#include <math.h>

__global__ void CUDACalculation::vectorIncreaseOne(const long double* input, long double* output, int numElements)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;

    if (i < numElements)
    {
        if (input[i] < 255)
        {
            output[i] = input[i] + 1;
        }
        }
    }
}

int CUDACalculation::gpuIncreaseOne(float* data_for_calculation, int size)
{
    // Error code to check return values for CUDA calls
    cudaError_t err = cudaSuccess;

    // Print the vector length to be used, and compute its size
    int numElements = size;
    size_t DataSize = numElements * sizeof(float);

    // Allocate the device input vector A
    float *d_A = NULL;
    err = cudaMalloc((void **)&d_A, DataSize);
    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to allocate device vector A (error code %s)!n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Allocate the device input vector B
    float *d_B = NULL;
    err = cudaMalloc((void **)&d_B, DataSize);
    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to allocate device vector B (error code %s)!n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Allocate the device output vector C
    float *d_C = NULL;
    err = cudaMalloc((void **)&d_C, DataSize);
    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to allocate device vector C (error code %s)!n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Copy the host input vectors A and B in host memory to the device input vectors in
    // device memory
    err = cudaMemcpy(d_A, data_for_calculation, DataSize, cudaMemcpyHostToDevice);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to copy vector A from host to device (error code %s)!n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Launch the Vector Add CUDA Kernel
    int threadsPerBlock = 256;
    int blocksPerGrid =(numElements + threadsPerBlock - 1) / threadsPerBlock;
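    // For the test case below (numElements = 100): (100 + 255) / 256 = 1 block of
    // 256 threads, of which only the first 100 pass the kernel's bounds check.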
    printf("CUDA kernel launch with %d blocks of %d threadsn", blocksPerGrid, threadsPerBlock);
    vectorIncreaseOne <<<blocksPerGrid, threadsPerBlock>>>(d_A, d_C, numElements);

    err = cudaGetLastError();

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to launch vectorAdd kernel (error code %s)!n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Copy the device result vector in device memory to the host result vector
    // in host memory.
    err = cudaMemcpy(data_for_calculation, d_C, DataSize, cudaMemcpyDeviceToHost);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to copy vector C from device to host (error code %s)!n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Free device global memory
    err = cudaFree(d_A);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to free device vector A (error code %s)!n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    err = cudaFree(d_B);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to free device vector B (error code %s)!n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    err = cudaFree(d_C);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to free device vector C (error code %s)!n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
    return 0;
}

Test cases

The test case for gpuIncreaseOne function is as below.

auto data_pointer = (float*)malloc(100 * sizeof(float));
for (int i = 0; i < 100; i++)
{
    data_pointer[i] = static_cast<float>(1);
}
CUDACalculation::gpuIncreaseOne(data_pointer, 100);


free(data_pointer);

All suggestions are welcome.

If there is any possible improvement about:

  • Potential drawback or unnecessary overhead
  • Error handling

please let me know.

c++ – An expirable LRU cache implementation

I implemented an ExpireLRUCache class, which clears data when it times out. There are two ways to achieve this.

  1. Use a timer to clear the expire data
  2. Call the clean function in Add and Get

If I use a struct inside the class, and it uses the template type parameters, how should I deal with this?

I put the declaration of Node at the top of the class, because it is used by the declarations that follow. Is that reasonable?

template <typename K, typename V>
class ExpireLRUCache {
 private:
  using Timestamp = std::chrono::time_point<std::chrono::system_clock>;
  struct Node {
    K key;
    V value;
    Timestamp timestamp;
  };

 public:
  using NodePtr = std::shared_ptr<Node>;
  using NodeIter = typename std::list<NodePtr>::iterator;
  using ExpiredCallBack = std::function<void(K, V)>;
  
  // Default timeout is 3000ms.
  ExpireLRUCache(size_t max_size)
     : max_size_(max_size), time_out_(3000), expired_callback_(nullptr) {}

  ExpireLRUCache(size_t max_size, uint32_t time_out, ExpiredCallBack call_back)
     : max_size_(max_size), time_out_(time_out), expired_callback_(call_back) {}

  void Add(K key, V value);
  V Get(K key);

  size_t Size() const;

 private:
  void Expired();

  mutable std::mutex mutex_;
  std::list<NodePtr> list_;
  std::unordered_map<K, NodeIter> map_;

  size_t max_size_;
  // ms
  uint32_t time_out_;

  ExpiredCallBack expired_callback_;  
};

template <typename K, typename V>
void ExpireLRUCache<K, V>::Add(K key, V value) {
  std::lock_guard<std::mutex> lock(mutex_);
  // if full, delete oldest
  if (list_.size() >= max_size_) {
    auto oldest = list_.back();
    list_.pop_back();
    map_.erase(oldest->key);
  }

  // if exist, delete it in the list, and then add to the front
  // then overwrite in map.
  if (map_.find(key) != map_.end()) {
    NodeIter iter = map_[key];
    list_.erase(iter);
  }

  auto timestamp = std::chrono::system_clock::now();
  NodePtr node = std::make_shared<Node>(Node{key, value, timestamp});
  list_.push_front(node);
  map_[key] = list_.begin();
}

template <typename K, typename V>
V ExpireLRUCache<K, V>::Get(K key) {
  std::lock_guard<std::mutex> lock(mutex_);
  
  // Todo(zero): how to call
  Expired();

  if (map_.find(key) != map_.end()) {
    return (*map_[key])->value;
  } else {
    return V{};
  }
}

template <typename K, typename V>
void ExpireLRUCache<K, V>::Expired() {
  auto time_now = std::chrono::system_clock::now();

  while( !list_.empty() ) {
    auto oldest = list_.back();
    auto diff = std::chrono::duration_cast<std::chrono::milliseconds>(
                    time_now - oldest->timestamp);
    if (diff.count() > time_out_) {
      list_.pop_back();
      map_.erase(oldest->key);
      expired_callback_(oldest->key, oldest->value);
    } else {
      break;
    }
  }
}

template <typename K, typename V>
size_t ExpireLRUCache<K, V>::Size() const {
  std::lock_guard<std::mutex> lock(mutex_);
  return map_.size();
}
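For context, a usage sketch (the key/value types, timeout, and callback here are illustrative, and assume the class above plus <iostream> and <string>):

ExpireLRUCache<int, std::string> cache(
    /*max_size=*/2, /*time_out=*/1000,
    [](int key, std::string value) {
      // Called by Expired() when an entry times out.
      std::cout << "expired: " << key << " -> " << value << std::endl;
    });
cache.Add(1, "one");
auto v = cache.Get(1);  // returns "one" if read within 1000 ms, V{} otherwise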

ArrayDowncasters Implementation for Downcasting from System.Array to Array of Specific Type in C#

There are a few alternative solutions which might work.

LINQ

It is possible to use Enumerable.Cast<T>() to achieve the same results (at least for a 1D array):

[TestMethod]
public void CheckConversion()
{
    //    One Dimensional Array with int element convert to int[]
    var array1 = Array.CreateInstance(typeof(int), 10);
    for (int i = 0; i < array1.Length; i++)
    {
        array1.SetValue(3, i);
    }

    var intArray1 = ArrayDowncasters.ToIntArray1(array1);
    var intArray2 = array1.Cast<int>().ToArray();
    CollectionAssert.AreEqual(intArray1, intArray2);

    //    One Dimensional Array with double element convert to double[]
    var array2 = Array.CreateInstance(typeof(double), 10);
    for (int i = 0; i < array2.Length; i++)
    {
        array2.SetValue(3.1, i);
    }

    var doubleArray1 = ArrayDowncasters.ToDoubleArray1(array2);
    var doubleArray2 = array2.Cast<double>().ToArray();
    CollectionAssert.AreEqual(doubleArray1, doubleArray2);

}

Generics

The naming of the functions, e.g. ToDoubleArray1(), indicates that there might be 2-, 3-, or n-dimensional versions to be added. I don't know whether a LINQ solution can be made to work for these, but the current code can be rewritten as a single generic function:

public static T[] Convert<T>(Array input)
{
    Type elementType = input.GetType().GetElementType();
    if (input.Rank != 1)
    {
        throw new System.InvalidOperationException();
    }

    if (!elementType.Equals(typeof(T)))
    {
        throw new System.InvalidOperationException();
    }

    var output = new T[input.GetLength(0)];
    for (int i = 0; i < input.GetLength(0); i++)
    {
        output[i] = (T)input.GetValue(i);
    }
    return output;
}

which means that a single generic function per dimension may suffice:

[TestMethod]
public void CheckConversionGeneric()
{
    //    One Dimensional Array with int element convert to int[]
    var array1 = Array.CreateInstance(typeof(int), 10);
    for (int i = 0; i < array1.Length; i++)
    {
        array1.SetValue(3, i);
    }

    var intArray1 = ArrayDowncasters.ToIntArray1(array1);
    var intArray2 = ArrayDowncasters.Convert<int>(array1);
    CollectionAssert.AreEqual(intArray1, intArray2);

    //    One Dimensional Array with double element convert to double[]
    var array2 = Array.CreateInstance(typeof(double), 10);
    for (int i = 0; i < array2.Length; i++)
    {
        array2.SetValue(3.1, i);
    }

    var doubleArray1 = ArrayDowncasters.ToDoubleArray1(array2);
    var doubleArray2 = ArrayDowncasters.Convert<double>(array2);
    CollectionAssert.AreEqual(doubleArray1, doubleArray2);

}
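Following the same pattern, a two-dimensional variant might look like this (a hypothetical Convert2 helper, offered as an untested sketch rather than part of the reviewed code):

public static T[,] Convert2<T>(Array input)
{
    if (input.Rank != 2)
    {
        throw new System.InvalidOperationException();
    }

    if (!input.GetType().GetElementType().Equals(typeof(T)))
    {
        throw new System.InvalidOperationException();
    }

    // Copy element by element, exactly as in the 1D Convert<T> above.
    var output = new T[input.GetLength(0), input.GetLength(1)];
    for (int i = 0; i < input.GetLength(0); i++)
    {
        for (int j = 0; j < input.GetLength(1); j++)
        {
            output[i, j] = (T)input.GetValue(i, j);
        }
    }

    return output;
}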

plotting – Von Neumann Equation Density Matrix Implementation

I'm trying to implement the von Neumann equation for a given 4×4 density matrix with a time-dependent Hamiltonian Hp[t_] in Mathematica, but I get stuck.
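For reference, the equation being solved is the von Neumann equation (with ℏ = 1, matching the code below):

$$ i\,\rho'(t) = [H_p(t), \rho(t)] = H_p(t)\,\rho(t) - \rho(t)\,H_p(t) $$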

Format[y[a__]] := Subscript[y, a]
rho[t_] := Array[x[##][t] &, {4, 4}]

sol = NDSolve[{I*rho'[t] == Hp[t].rho[t] - rho[t].Hp[t],
   rho[0] == rhoIni}, {rho}, {t, 0, 10}]

However I only get the output

{{rho -> rho}}

So I guess something is structurally wrong with my code. I try to extract a solution by writing

rho[t_] = rho[t] /. sol

But this doesn't work, as there is no solution anyway.
Maybe you can help me.

Thanks in advance

design – How to avoid repeating same states in a finite state machine implementation?

I have been implementing a finite state machine which basically ensures configuration of the external chip which communicates with my MCU via I2C. The skeleton of the configuration state machine looks like this:

[State diagram: skeleton of the configuration state machine]

The Configure Register N states are state machines in their own right.
It is a pretty simple state diagram, but there is one complication. The chip can asynchronously generate a "fault" event, which is signaled via assertion of the "fault" pin. The state of the fault pin is read with the same period with which the state machine is executed. Servicing of the fault pin activation is based on another state machine. So the basic state diagram incorporating the fault pin activation service looks like this:

[State diagram: configuration state machine with fault-pin handling]

At first glance there are several repeating patterns in the state diagram: first, the state diagram for the register configuration, and second, the state diagram for fault pin activation handling. I would
like to exploit these patterns and avoid repeating the same code in several places. Can anybody suggest a solution?
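For concreteness, a skeleton of the setup as described might look like this in C (all names are hypothetical; this is an illustration of the structure, not the poster's code):

#include <stdbool.h>

typedef enum { ST_CONFIG_REG_1, ST_CONFIG_REG_2, ST_DONE, ST_FAULT } state_t;

static state_t state = ST_CONFIG_REG_1;

static bool fault_pin_active(void) { return false; }  /* stub for the periodic fault-pin read */

void fsm_step(void)  /* executed once per period */
{
    if (fault_pin_active())
        state = ST_FAULT;  /* fault service is its own sub-state-machine */

    switch (state) {
    case ST_CONFIG_REG_1: /* sub-machine: configure register 1 over I2C */ state = ST_CONFIG_REG_2; break;
    case ST_CONFIG_REG_2: /* sub-machine: configure register 2 over I2C */ state = ST_DONE; break;
    case ST_FAULT:        /* sub-machine: service the fault pin */ break;
    case ST_DONE:         break;
    }
}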

architecture – Implementation doubts on ECS and DDO

I'm growing an interest in entity component systems and data-oriented design, but I'm having a hard time clearing up some doubts and filling in some of the details. I get that both ECS and DDO are really broad ideas that don't have clear-cut definitions, but I'd appreciate hints from anyone on these concepts as a starter.

  1. With DDO, should memory always be allocated statically? So, for example, should I forecast the maximum number of entities that will be active at any time and allocate memory accordingly at the beginning of the game?
  2. Does that mean that I also have to allocate memory for each possible type of component that could be associated with an entity, for the maximum number of entities? So if my maximum number of entities is 100 and I've got a Position component, a Health component and a Texture component, should I allocate 100 Position components, 100 Health components and 100 Texture components, even though it might be unlikely that all of these will ever end up being used at the same time?
  3. Is it correct to think of entities and components as big arrays whose elements are associated with each other through their index (see the sketch after this list)? For example, position[i] will belong to entity[i]. So, is i the id of an entity?
  4. How do I mark an entity in the array as active in the game loop? Of course I thought about using bools, but I have heard a lot about how these kinds of internal states should be avoided because they cause a lot of branching, and therefore possible mispredictions, which is not good for performance. On the same note, how do I flag that a component has been assigned to an entity? And what about "instancing" new entities? Should I check each time whether all the slots are occupied? Am I overthinking this branching problem?
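For concreteness, the array-per-component layout asked about in points 1–3 might look like this (illustrative C with hypothetical component types; the alive flags relate to point 4):

#include <stdbool.h>

#define MAX_ENTITIES 100  /* point 1: fixed capacity chosen up front */

typedef struct { float x, y; } Position;
typedef struct { int hp; } Health;

/* Points 2 and 3: one fixed-size array per component type; index i is the
   entity id, so positions[i] and healths[i] both belong to entity i. */
static Position positions[MAX_ENTITIES];
static Health   healths[MAX_ENTITIES];
static bool     alive[MAX_ENTITIES];  /* point 4: one common way to mark active entities */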

Sorry if this is a lot and pretty broad, but I'm kinda lost. If you know any good resources to read/watch/study about the basics of these concepts, I'd really appreciate it if you shared them with me. Thank you for your help!

c++ – Simple read/write lock implementation without starvation

I’ve built a read/write lock and have been testing it without encountering any problems. It was made to avoid writer starvation, but I believe it works against reader starvation as well. I’ve seen alternatives online, but was wondering if this is a solid implementation.

If you use a normal shared mutex, new read actions can still be queued, which will prevent write actions from ever being processed while there is any read action present. This will cause starvation. I used a second mutex, which is locked by the write process and prevents any new read processes from being queued. Thank you!

class unique_priority_mutex
{
public:

    void lock_shared(void)
    {
        // If there is a unique operation running, wait for it to finish.
        if( this->_is_blocked ){
            // Use a shared lock to let all shared actions through as soon as the unique action finishes.
            std::shared_lock<std::shared_mutex> l(this->_unique_mutex);
        }

        // Allow for multiple shared actions, but no unique actions.
        this->_shared_mutex.lock_shared();
    }

    void unlock_shared(void)
    {
        this->_shared_mutex.unlock_shared();
    }

    void lock(void)
    {
        // Avoid other unique actions and avoid new shared actions from being queued.
        this->_unique_mutex.lock();

        // Redirect shared actions to the unique lock.
        this->_is_blocked = true;

        // Perform the unique lock.
        this->_shared_mutex.lock();
    }

    void unlock(void)
    {
        this->_shared_mutex.unlock();
        this->_is_blocked = false;
        this->_unique_mutex.unlock();
    }

    std::shared_mutex _shared_mutex;
    std::shared_mutex _unique_mutex;
    std::atomic<bool> _is_blocked = false;
};
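For context, a usage sketch (assuming the class compiles as posted, plus the includes shown):

#include <shared_mutex>
#include <atomic>

unique_priority_mutex m;
int shared_value = 0;

void reader()
{
    m.lock_shared();
    int v = shared_value;  // multiple readers may hold the lock concurrently
    (void)v;
    m.unlock_shared();
}

void writer()
{
    m.lock();              // also blocks newly arriving readers while waiting
    ++shared_value;
    m.unlock();
}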