mysql – Which configuration to tune to best utilize fast SSDs

I am testing MySQL performance on different storage devices, including SMR-HDD, SAS-HDD, SATA-SSD, NVMe-SSD, and Optane-SSD. I want to find configuration settings that specifically benefit particular types of device with respect to performance.

I am using the TPC-H benchmark as the workload. Could you please suggest some candidate configurations to test?


3d – Fast self collision/intersection detection algorithm/library for tetrahedral meshes?

I want to play with deformation of a tetrahedral mesh (soft-body simulation), but I don’t want to implement self-collision detection manually. Can anyone suggest a library for this problem? I found SOFA collision detection, but I’m not sure that it fits self-intersection of a tet mesh.

If there is no good library for this problem, can anyone suggest a good algorithm for self-collision detection? As far as I understand, something like a BVH of tetrahedra could help, but it would be great if somebody with expertise could point me in the right direction.
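To make the bounding-volume idea concrete, here is a minimal sketch of the broad phase only, not a full BVH and not an exact intersection test. It assumes the mesh is given as a list of vertex coordinates plus a list of 4-index tetrahedra. The names `aabb`, `boxes_overlap` and `candidate_pairs` are hypothetical, and skipping tetrahedra that share a vertex is a simplifying assumption. A BVH would replace the all-pairs loop with a tree traversal, and each candidate pair would still need an exact tetrahedron-tetrahedron test.

```python
from itertools import combinations


def aabb(points):
    """Axis-aligned bounding box of a list of (x, y, z) points."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def boxes_overlap(a, b):
    """True if two boxes overlap on all three axes."""
    (amin, amax), (bmin, bmax) = a, b
    return all(amin[i] <= bmax[i] and bmin[i] <= amax[i] for i in range(3))


def candidate_pairs(vertices, tets):
    """Index pairs of tetrahedra whose boxes overlap and that share no vertex."""
    boxes = [aabb([vertices[i] for i in tet]) for tet in tets]
    pairs = []
    for i, j in combinations(range(len(tets)), 2):
        if set(tets[i]) & set(tets[j]):
            continue  # assumption: mesh neighbours are not reported as collisions
        if boxes_overlap(boxes[i], boxes[j]):
            pairs.append((i, j))
    return pairs


def unit_tet(offset):
    """Corner tetrahedron of the unit cube, shifted by offset along every axis."""
    base = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
    return [(x + offset, y + offset, z + offset) for x, y, z in base]


# Three disjoint tetrahedra: the first two overlap, the third is far away.
vertices = unit_tet(0.0) + unit_tet(0.25) + unit_tet(10.0)
tets = [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11)]
pairs = candidate_pairs(vertices, tets)
```

The all-pairs loop is quadratic in the number of tetrahedra, which is exactly the cost a hierarchy or spatial hash is meant to avoid on large meshes.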

performance – C++ Fast Fourier transform

This is a very simple FFT. I am wondering what I can do to make it faster and more memory efficient from the programming side (better data types, and maybe tricks like unrolling loops or using the preprocessor, if that is useful here), not by using a more efficient mathematical algorithm. I would also appreciate advice on best practices.

#include <stdio.h>
#include <vector>
#include <iostream>
#include <complex>
#include <cmath>
#include <algorithm>

#define N 1048576
#define PI 3.14159265358979323846

// Creating the table of all N'th roots of unity.
// We use the notation omega_N = e^(-2 pi i / N), so entry k is omega_N^k.
template<typename U>
std::vector< std::complex<U> > rootsOfUnityCalculator() {
    std::vector< std::complex<U> > table;

    for (size_t k = 0; k < N; k++) {
        std::complex<U> kthRootOfUnity(std::cos(-2.0 * PI * k / N), std::sin(-2.0 * PI * k / N));
        table.push_back(kthRootOfUnity);
    }

    return table;
}

// Fast Fourier transform, T is the precision level, so float or double.
// table is a look up table of the roots of unity. Overwrites the input.
// For now only works for N a power of 2.
template<typename T>
void FFT(std::complex<T>* input, const std::vector< std::complex<T> >& table, size_t n) {

    if (n % 2 == 0) {
        // Split up the input in even and odd components
        std::complex<T>* evenComponents = new std::complex<T>[n/2];
        std::complex<T>* oddComponents = new std::complex<T>[n/2];

        for (size_t k = 0; k < n/2; k++) {
            evenComponents[k] = input[2 * k];
            oddComponents[k] = input[2 * k + 1];
        }

        // Transform the even and odd input
        FFT(evenComponents, table, n/2);
        FFT(oddComponents, table, n/2);

        // Use the algorithm from Danielson and Lanczos
        for (size_t k = 0; k < n/2; k++) {
            std::complex<T> plusMinus = table[N / n * k] * oddComponents[k]; // omega_n^k = (omega_N^(N/n))^k = omega_N^(Nk/n)
            input[k] = evenComponents[k] + plusMinus;
            input[k + n/2] = evenComponents[k] - plusMinus;
        }

        delete[] evenComponents;
        delete[] oddComponents;

    } else {
        // The Fourier transform on one element does not do anything, so
        // nothing needed here.
    }
}

int main() {
    std::complex<double>* input = new std::complex<double>[N];

    for (size_t k = 0; k < N; k++) {
        input[k] = k;
    }

    const std::vector< std::complex<double> > table = rootsOfUnityCalculator<double>();

    // Overwrites the input with its Fourier transform
    FFT<double>(input, table, N);

    delete[] input;

    return 0;
}

Fast international WP hosting for small site

I have a small WordPress website with fewer than 4k monthly visitors, serving about 2 MB per page across 8 pages in total.

It is a hotel website so it i…

performance tuning – Fast Evaluation of a series of dot product


I have a function f that depends on 3 real variables x, y and z and is defined by a series of matrix products. Evaluating f at a single point (x, y, z) is fast (~0.02 s), but I want to evaluate it on a huge number of points (a regularly spaced grid of x, y and z values), which makes the whole evaluation really slow, if not unmanageable. I have already tried what was proposed in this answer, but my function is not compilable, and ParallelTable is faster than vectorizing on my laptop.


For the sake of simplicity let me illustrate this with

weight = RandomReal[1, 200];
pts = RandomReal[1, 200];
M = RandomReal[1, {200, 200}];
f[x_, y_] = (weight*Exp[-pts*x]).Exp[M].(weight*Exp[-pts*y]) // N

How would one make the evaluation of f on many pairs (x, y) faster than relying on ParallelTable?

ans = ParallelTable[f[x, y], {x, Range[100]}, {y, Range[100]}]

Thanks a lot for your help!

python – I think my program is “too fast”… Not sure how to solve this data issue without reducing speed

So I’ve got a problem. I’m reading JSON data from an API in a loop every 10 seconds, and when a specific attribute is True the program stores a message on my server for every user with that True value. There’s also a delete loop running concurrently every 10 seconds: when the attribute flips to False, the message is removed from the server. Here is an example API response:

{
    "data": [
        {
            "user": "John",
            "posted": True,
        },
        {
            "user": "Greg",
            "posted": False,
        },
        {
            "user": "Mary",
            "posted": True,
        },
    ],
    "pagination": {}
}

So in this case, John and Mary would both be posted. If Greg flipped to True, he would be posted. If any user flips to False, that user's message is deleted.

But here’s the problem:

Every time a post to my server occurs, a notification is then generated to multiple people. A majority of these True values stay up for hours at a time which is great – but sometimes the posted attribute flips to true and false multiple times over a minute. So Mary may be posted, removed, and re-posted. You can see why this may be annoying.

A logical solution would be to remove the message only after a minute, so that if the attribute flipped to False and then back to True within that minute, the message would never be deleted. But this sacrifices speed, and it irks me to think that this is the only solution.
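The one-minute grace period described above can be sketched as a small state tracker. This is only an illustration under assumptions: `PostTracker`, `update`, and the `{user: posted}` dictionary shape are hypothetical names, not part of the API in the question, and the caller is assumed to pass in the current time in seconds on each poll. Posting stays immediate in this sketch; only deletion is delayed.

```python
class PostTracker:
    """Delay deletions by a grace period so that brief False flips are ignored."""

    def __init__(self, grace_seconds=60):
        self.grace_seconds = grace_seconds
        self.active = set()      # users whose message is currently on the server
        self.false_since = {}    # user -> time the attribute was first seen False

    def update(self, now, statuses):
        """Process one poll and return (users_to_post, users_to_delete)."""
        to_post, to_delete = [], []
        for user, posted in statuses.items():
            if posted:
                # Flipping back to True cancels any pending deletion.
                self.false_since.pop(user, None)
                if user not in self.active:
                    self.active.add(user)
                    to_post.append(user)
            elif user in self.active:
                first_false = self.false_since.setdefault(user, now)
                if now - first_false >= self.grace_seconds:
                    self.active.discard(user)
                    del self.false_since[user]
                    to_delete.append(user)
        return to_post, to_delete


tracker = PostTracker(grace_seconds=60)
first_poll = tracker.update(0, {"Mary": True})     # Mary is posted right away
flicker = tracker.update(10, {"Mary": False})      # inside the grace period
recovered = tracker.update(20, {"Mary": True})     # no re-post, no notification
```

A user who flickers within the grace period generates no second notification, because the message was never removed. The cost is that a genuine removal is delayed by up to the grace period.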

Are there any other solutions anyone can think of? I considered a secondary database to cross-reference posts with some sort of identifier, but it seems I’d run into the same issue, where time is sacrificed.

physical – What is the benefit of 2 drive thru lanes at a fast food restaurant?

the payment and receipt of food is still done serially, which would seem to negate any benefit from orders being placed in parallel.

There is no reason to assume this is true.

The answer, from a UX perspective, has to be:

The benefit for the user is getting their food faster because the restaurant can process more meals per hour, leading to shorter queues.

There are two things to consider here to understand why the basic premise of this question is probably wrong.

Simply moving from a concurrent step to a serial step does NOT negate the benefits of the earlier parallelism.

Let’s look at an analogous example from web development. We want to fetch data from two different servers (parallel processes) and then use that data to render text into a web page from top to bottom (serial process).

If it takes 3 seconds to hit each server and 1ms to render the response from a server we can obviously either take 6 seconds or 3 seconds to finish the task depending on our choice of parallel or serial for the first step ONLY.

The second step’s parallelism changes the overall situation by 1ms, making it completely irrelevant.

You see the same situation at the fast food restaurant.

Time to take an order, create and package the food ~= 5 minutes.
Time to take money and hand food through a window ~= 30 seconds.

Provided that the internal team has the facilities to prepare at least 2 orders at once, the result is:

Step 1 in parallel = 5.5 minutes total
Step 1 in serial = 10.5 minutes in total

The parallelism of step 2 doesn’t negate anything here.
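The arithmetic above can be checked with a short sketch for two cars. It assumes each ordering lane takes and prepares one order at a time (5 minutes) and that a single window hands orders over one after another (30 seconds each). `finish_times` and its parameters are hypothetical names used only for this illustration.

```python
def finish_times(num_orders, prep_min, handoff_min, lanes):
    """Return the minute at which each order has been handed over.

    Assumes every lane prepares one order at a time and that a single
    window hands orders over strictly one after another.
    """
    window_free = 0.0
    finished = []
    for i in range(num_orders):
        # Order i is number (i // lanes + 1) in its lane's queue.
        ready = (i // lanes + 1) * prep_min
        start = max(ready, window_free)
        window_free = start + handoff_min
        finished.append(window_free)
    return finished


two_lanes = finish_times(2, prep_min=5.0, handoff_min=0.5, lanes=2)
one_lane = finish_times(2, prep_min=5.0, handoff_min=0.5, lanes=1)
```

Under these assumptions the first car leaves at 5.5 minutes either way. The second car leaves at 6.0 minutes with two lanes and at 10.5 minutes with one, so the serial window adds only 30 seconds to the parallel case.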

A system with a series of sequential steps is constrained ONLY by the slowest step.

This is called the theory of constraints. It has a Wikipedia article too.

The summary is that in a system with a chain (or even multiple converging or diverging chains) of processes that need to be completed, the system moves only as quickly as the slowest step.

Attempts to optimise any step other than the slowest step have no positive impact and can even have a negative impact on overall productivity.
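As a rough illustration of that claim, the steady-state throughput of a serial chain can be modelled as the minimum of the stage rates. The stage names and rates below are hypothetical numbers chosen only for the example, and the model ignores queue build-up between stages.

```python
def system_throughput(stage_rates):
    """Steady-state throughput of a serial chain: the rate of its slowest stage."""
    return min(stage_rates.values())


def bottleneck(stage_rates):
    """Name of the stage with the lowest rate."""
    return min(stage_rates, key=stage_rates.get)


# Hypothetical rates in items per minute, for illustration only.
rates = {"order": 0.5, "prepare": 2.0, "deliver": 2.0}
baseline = system_throughput(rates)

faster_delivery = dict(rates, deliver=4.0)  # speed up a non-bottleneck stage
second_lane = dict(rates, order=1.0)        # speed up the bottleneck itself

unchanged = system_throughput(faster_delivery)
improved = system_throughput(second_lane)
```

Doubling the delivery rate leaves the system at the baseline rate, while doubling the ordering rate doubles the throughput of the whole chain.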

Imagine a traffic jam at peak hour where many lanes (A) converge into fewer lanes (B) and then diverge back to many (C). Something like this:

[Figure: traffic jam illustrating the theory of constraints]

It should be clear that adding additional lanes at:

A – will make traffic worse by increasing congestion ahead of the bottleneck (incidentally, this is why we use words like “bottleneck” to talk about a limiting factor in a process)
B – will increase throughput of the system
C – will have no positive or negative impact on the traffic

Your example of a fast food restaurant (or literally any other system in the world) has just one severely limiting factor at any point in time. Nothing else is worth optimising.

Think of the following stages of throughput in real space/time terms of burgers per minute per square meter, after all, this is how they pay rent (I’m just guessing rough numbers):

  • Collecting orders (~2 min) from a drive-thru lane (~20 m²) = 0.025 b/min/m²
  • Preparing orders (~3 min) from a burger grill (~1 m²) = 0.3 b/min/m²
  • Delivering orders (~0.5 min) from a kiosk (~2 m²) = 1 b/min/m²
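The figures in this list follow from dividing one burger by the stage time and by the floor area. A small sketch reproduces them from the same rough guesses; the helper name `burgers_per_min_per_m2` is hypothetical.

```python
# (stage, minutes per burger, floor area in square meters), as guessed above
stages = [
    ("collecting orders", 2.0, 20.0),
    ("preparing orders", 3.0, 1.0),
    ("delivering orders", 0.5, 2.0),
]


def burgers_per_min_per_m2(minutes_per_burger, area_m2):
    """Throughput density: burgers per minute per square meter of floor space."""
    return 1.0 / (minutes_per_burger * area_m2)


density = {name: burgers_per_min_per_m2(t, a) for name, t, a in stages}
least_dense = min(density, key=density.get)
```

With these inputs, collecting orders comes out as the least space-efficient stage by more than an order of magnitude.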

Now also consider that nothing can be done before you place the order, yet preparing and delivering orders can be done in parallel with taking new orders.

It should be clear why:

  • A very large amount of the physical space shown in the OP's aerial photo is dedicated to taking orders, relative to other tasks
  • Taking orders is handled in parallel (it is the step that needs to be optimised)
  • Every other step (before and after) can be safely handled in serial (trading throughput/latency for a smaller physical footprint), as their throughput does not affect the throughput of the system overall

It also hints as to why perhaps having 2 lanes for orders makes sense but not 3, or 10 or 50 – this would be out of proportion for the size of the kitchen and other available facilities required to process burgers.

One potential avenue for follow-up would be whether this restaurant is really optimised for a consistent user experience (latency) or just for total revenue per square meter (throughput). When sending items concurrently through parallel and sequential processes, the two are not necessarily the same thing. It’s quite possible to optimise overall throughput in such a way that individual items suffer additional latency (like when the OP's order is forgotten at the kiosk).