I have a PostgreSQL database that I use to store time-series (finance) data. each table contains the same table schema but has a different name based on the market pair and timeframe.
Ex. I have tables called
These tables sum up to around 9000 tables in total.
Note that I know about
InfluxDB, I already tested both and will post a reason I’m not using them at the end of this post.
So, since this is time-series data, it means that I’m only doing
INSERT write operations and very rarely some
SELECT to retrieve some data.
My issue is that the database memory usage seems to grow infinitely until I get an OOM crash. I configured my
postgresql.conf using solutions as
PGTune to a system with 1GB of RAM, 6 cores, and 120 connections and I limited my docker container to 4GB and still got an OOM after around one day with the system on.
I also tried other configs as 4GB of ram and 8GB in the container but PostgreSQL never respects the limit stipulated by the config and keeps using more and more RAM.
Is this the expected behavior? Maybe PostgreSQL has some other obscure config I can use to limit the memory usage in cases where there is a huge number of tables.. I’m not sure..
The reason I’m guessing this issue has something to do with my high number of tables is because the opened connections from my connection pool keep growing in memory usage faster at the start of my system (the first hours) and then the growth gets slower (but never stops).
That behavior reflects my
INSERT intervals when hitting the tables.
For example, a table with a timeframe
five_minutes means that every five minutes I will insert a new row to it, which means that I’m accessing these tables for the first time faster when the system starts than tables with higher timeframes as
And monitoring the memory growth, it seems that the connection process grows a little bit when it accesses a new table for the first time.
So, assuming this is right, it would mean that after some months, all the connections would have accessed all the tables at least one time and memory growth would stop. The problem with that is that I don’t know how much memory this would use at the end and it’s not ideal since trying to limit the memory via
postgresql.conf becomes meaningless.
Here is the schema for one of the tables (as I said before, all tables has the same columns, index, etc):
data_db_prod=# d+ candles_one_minute_btc_usdt
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
timestamp | timestamp without time zone | | not null | | plain | |
open | numeric | | not null | | main | |
close | numeric | | not null | | main | |
high | numeric | | not null | | main | |
low | numeric | | not null | | main | |
volume | numeric | | not null | | main | |
"candles_one_minute_btc_usdt_timestamp_desc_index" btree ("timestamp" DESC)
"candles_one_minute_btc_usdt_timestamp_index" btree ("timestamp")
Access method: heap
About other solutions
As I said before, I already tried
TimescaleDB I would be able to use a single
candles table and create 2 partitions to store the market pair and the timeframe, fixing the high number of tables and probably the RAM issue I’m having, but I cannot use this because
TimescaleDB uses too much storage, so I would need to use their compression feature, but a compressed hypertable doesn’t allow write operations, meaning that to be able to do a backfill (which I do often) I would need to basically decompress the whole database each time.
InfluxDB the issue is simply because they don’t support any
decimal type, and I cannot lose precision using
Feel free to suggest some other alternative I’m not aware of that would fit nicely into my use case if there is one.