performance – Converting an array of `Float32` (`float`) to an array of `UINT8` (`unsigned char`) with AVX2

Given an input array of Float32 (float) with numElements elements, how could one efficiently convert it to an array of UINT8 (unsigned char)?
The tricky part here is applying unsigned saturation in the conversion.

For example, here is vanilla code for this (note that it includes a scaling operation):

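    #include <math.h>

    /* mO: output array (unsigned char), mI: input array (float) */
    void ConvertToUint8(unsigned char *mO, const float *mI, int numElements, float scalingFctr)
    {
        int ii;
        for (ii = 0; ii < numElements; ii++) {
            /* scale, clamp to [0, 255], then truncate to unsigned char */
            mO[ii] = (unsigned char)(fmin(fmax(mI[ii] * scalingFctr, 0.0), 255.0));
        }
    }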

Here mO is the output array.
Note that the code above does not apply unsigned saturation as such (does C have a function for this?), whereas the solution should.
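
To make the saturation requirement concrete: as far as I know, standard C has no built-in saturating float-to-integer conversion, and casting an out-of-range floating-point value to an integer type is undefined behaviour, so the clamp has to happen before the cast. A minimal scalar sketch follows; the helper name `SaturateToUint8` is hypothetical and not part of the original code, and mapping NaN to 0 is an arbitrary choice made here.

```c
/* Hypothetical helper, not part of the question's code: scale, clamp to
   [0, 255], then truncate, so the cast never sees an out-of-range value. */
static unsigned char SaturateToUint8(float value, float scalingFctr)
{
    float scaled = value * scalingFctr;
    if (!(scaled > 0.0f)) {
        return 0;    /* negative values, zero, and NaN (arbitrary choice) */
    }
    if (scaled >= 255.0f) {
        return 255;  /* everything at or above the upper bound saturates */
    }
    return (unsigned char)scaled;  /* in range: truncates toward zero */
}
```

This truncates like the cast in the vanilla loop; if rounding to nearest is wanted instead, that would be a separate change.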

I need code that uses intrinsics up to AVX2.
The goal is to generate faster code than the vanilla example, as in Compiler Explorer – ConvertToUint8.

For simplicity, one may assume that the arrays are aligned.
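
For reference, one possible shape of such a routine is sketched below. It is not a tuned or benchmarked solution. It assumes GCC or Clang on x86-64, both arrays 32-byte aligned, and numElements a multiple of 32; the function name `ConvertToUint8Avx2` is hypothetical. The idea is to let the pack instructions do the saturation: `_mm256_packs_epi32` narrows to 16 bits with signed saturation and `_mm256_packus_epi16` narrows to 8 bits with unsigned saturation. Large positive inputs are capped at 255 before the float-to-int conversion, because values beyond the int32 range would otherwise convert to INT_MIN and end up as 0.

```c
#include <immintrin.h>

/* Sketch under stated assumptions: mI and mO are 32-byte aligned and
   numElements is a multiple of 32. The target attribute (GCC/Clang) lets
   this compile without -mavx2; drop it if you already build with -mavx2.
   NaN inputs get no special treatment here. */
__attribute__((target("avx2")))
void ConvertToUint8Avx2(unsigned char *mO, const float *mI, int numElements, float scalingFctr)
{
    const __m256 scale  = _mm256_set1_ps(scalingFctr);
    const __m256 maxVal = _mm256_set1_ps(255.0f);
    /* The pack instructions work per 128-bit lane, so the packed bytes come
       out in interleaved 4-byte groups; this index vector restores the order. */
    const __m256i order = _mm256_setr_epi32(0, 4, 1, 5, 2, 6, 3, 7);

    for (int ii = 0; ii < numElements; ii += 32) {
        __m256i q[4];
        for (int k = 0; k < 4; ++k) {
            __m256 v = _mm256_mul_ps(_mm256_load_ps(mI + ii + 8 * k), scale);
            v = _mm256_min_ps(v, maxVal);   /* cap large positives at 255 */
            q[k] = _mm256_cvttps_epi32(v);  /* truncate, like the C cast */
        }
        /* int32 -> int16 with signed saturation */
        __m256i lo = _mm256_packs_epi32(q[0], q[1]);
        __m256i hi = _mm256_packs_epi32(q[2], q[3]);
        /* int16 -> uint8 with unsigned saturation: negatives become 0 */
        __m256i bytes = _mm256_packus_epi16(lo, hi);
        bytes = _mm256_permutevar8x32_epi32(bytes, order);
        _mm256_store_si256((__m256i *)(mO + ii), bytes);
    }
}
```

If numElements is not a multiple of 32, the remaining elements could be handled by the scalar loop. Whether this actually beats the compiler's auto-vectorized output would need to be measured.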