We need to implement a solution that stores on the order of half a million records per request and can then query them.
A user interface depends on it, which means each request must complete within 2 to 3 seconds.
The data is relational and will be split across three or four tables; each record has about 20 columns and is roughly 500 bytes long.
Requests can arrive in quick succession (say 100 hits per minute, each generating 100,000 to 500,000 records). Once the data is stored, we need to query it with an OR clause involving 1-5 columns, possibly combined with a date range.
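For concreteness, here is a minimal sketch of the query shape described above. The table and column names are hypothetical stand-ins (the real tables have ~20 columns each), and SQLite is used only as an in-memory placeholder for whatever database ends up being chosen:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: a few columns standing in for the ~20 real ones.
conn.execute(
    """CREATE TABLE records (
        id      INTEGER PRIMARY KEY,
        status  TEXT,
        region  TEXT,
        created TEXT   -- ISO date, so BETWEEN works lexicographically
    )"""
)
conn.executemany(
    "INSERT INTO records (status, region, created) VALUES (?, ?, ?)",
    [("open",   "EU", "2023-01-10"),
     ("closed", "US", "2023-02-15"),
     ("open",   "US", "2023-03-20")],
)

# The query shape from the question: an OR clause over a couple of
# columns, combined with a date range.
rows = conn.execute(
    """SELECT * FROM records
       WHERE (status = ? OR region = ?)
         AND created BETWEEN ? AND ?""",
    ("open", "EU", "2023-01-01", "2023-03-31"),
).fetchall()
```

Queries of this shape (OR across several columns plus a range predicate) are worth keeping in mind when evaluating candidates, because they tend to defeat single-column indexes and may need covering or composite indexes per database.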
So far, I have looked at the following options:
- MS SQL, probably with sharding (I have not tried sharding yet). Storing 100,000 records takes about 4 to 5 seconds on my machine, with a single table and no indexes. This is the default option, but it does not seem to scale well under concurrent requests, and I'm concerned about performance as the data grows (can sharding solve that?).
- MySQL. Slightly slower than MS SQL.
- RavenDB. Slower than MS SQL and probably not the best fit for the type of data and queries we have.
- Apache Ignite. Only tried it locally (in embedded mode); it was actually slower than MS SQL, but this is literally the first time I've looked at it, so it's very likely I missed something. In theory it should scale very well.
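Since several of the options above were compared on raw insert speed, it may be worth checking that the benchmark batches inserts inside a single transaction rather than committing row by row; the difference is often an order of magnitude on any of these databases. A minimal sketch of the pattern, using SQLite purely as a stand-in (on MS SQL the equivalent would be SqlBulkCopy or a batched parameterized insert):

```python
import sqlite3
import time

N = 100_000
rows = [(i, f"value-{i}") for i in range(N)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, payload TEXT)")

start = time.perf_counter()
with conn:  # one transaction wrapping the whole batch
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
elapsed = time.perf_counter() - start

count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

The same principle applies to the 4-5 second MS SQL figure above: if that number came from individual INSERT statements, a bulk-load path may change the comparison considerably before any sharding is needed.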
I've also looked at Apache Spark and Cassandra, but I don't think they fit our use case.
I am pretty sure this problem has been solved by many people before, so I am looking for general direction and a suggestion for a technology/database that can handle this use case.