I'm running a large ETL operation over more than 100,000 JSONL files. I use RabbitMQ to push the file paths onto a queue, and 25 workers then consume from it, transform each file, and perform batch inserts into a number of Postgres tables.
I noticed that when I increase the number of workers, I can end up with fewer rows at the end, yet no error is ever raised.
Is it possible to "overload" the database instance and lose data?
- At the end, the largest table should have 2 billion rows
- The average table size is 250 million rows
- I use the npm package node-postgres to handle the inserts (a simplified sketch of the worker loop is below)
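For context, here is a rough sketch of what each worker does. The queue name, table, and columns are placeholders rather than my real schema, and I'm assuming amqplib on the RabbitMQ side; the actual code is more involved, but the shape is the same:

```javascript
// Simplified worker: consume a file path, parse the JSONL, batch-insert the rows.
// Queue name 'file_paths' and table 'events(id, payload)' are illustrative only.
const amqp = require('amqplib');
const fs = require('fs/promises');
const { Pool } = require('pg');

const pool = new Pool(); // connection settings taken from PG* environment variables

async function insertBatch(rows) {
  // Single multi-row INSERT; the real code builds this per target table.
  const values = [];
  const placeholders = rows.map((row, i) => {
    values.push(row.id, row.payload);
    return `($${i * 2 + 1}, $${i * 2 + 2})`;
  });
  await pool.query(
    `INSERT INTO events (id, payload) VALUES ${placeholders.join(', ')}`,
    values
  );
}

async function main() {
  const conn = await amqp.connect('amqp://localhost');
  const channel = await conn.createChannel();
  await channel.assertQueue('file_paths', { durable: true });
  channel.prefetch(1); // one file path per worker at a time

  channel.consume('file_paths', async (msg) => {
    try {
      const filePath = msg.content.toString();
      const text = await fs.readFile(filePath, 'utf8');
      const rows = text.split('\n').filter(Boolean).map((line) => JSON.parse(line));
      await insertBatch(rows);
      channel.ack(msg); // ack only after the insert succeeds
    } catch (err) {
      console.error(err);
      channel.nack(msg, false, true); // requeue the file path on failure
    }
  });
}

main().catch(console.error);
```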