I have a MongoDB replica set with two data-bearing nodes and one arbiter.
I'm using MongoDB 4.0.7 on all my CentOS machines.
A few days ago, one of my servers (let's call it data-2) crashed fatally and required a complete resynchronization of its data. After I restarted MongoDB on data-2, the resynchronization started. However, almost immediately, RAM usage on data-1 (the primary) began to soar.
On data-2, however, memory consumption stayed almost constant.
During normal operation, memory consumption on data-1 stays close to what data-2 shows during the resynchronization.
After a few hours, the worst case happens: the last data-bearing node (data-1) uses up all of its RAM plus swap (~50 GB) and is killed by the kernel's OOM killer. I was able to restore data-1 as primary with little effort, but every time I start the resynchronization again, the same thing happens.
This behavior does not seem to be related to actual database load during the resynchronization. (Taking the production database offline for a resynchronization is definitely a no-go.)
Total data size is ~500 GB; the largest database is 480 GB with ~300 collections.
Now my questions:
- What uses so much memory during the resynchronization?
- How can I effectively analyze memory consumption?
- How can I prevent the primary from crashing due to resynchronization?
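For the second question, one way to start collecting numbers is to compare the process's resident memory with the WiredTiger cache and the tcmalloc heap, all of which `serverStatus` reports. The following is a minimal sketch, assuming Python and a `serverStatus` document that has already been fetched from data-1 as a plain dict. The helper names `summarize_memory` and `_get` are hypothetical, and missing fields simply come back as `None`.

```python
from typing import Any, Dict, Optional

MIB = 1024 * 1024


def _get(doc: Any, *path: str) -> Optional[Any]:
    """Walk nested dict keys; return None if any key along the path is missing."""
    for key in path:
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


def _to_mib(value: Optional[float]) -> Optional[float]:
    """Convert a byte count to MiB, passing None through unchanged."""
    return None if value is None else value / MIB


def summarize_memory(status: Dict[str, Any]) -> Dict[str, Optional[float]]:
    """Extract memory-related fields from a serverStatus document (values in MiB)."""
    resident = _get(status, "mem", "resident")  # already reported in MiB
    cache_used = _get(status, "wiredTiger", "cache", "bytes currently in the cache")
    cache_max = _get(status, "wiredTiger", "cache", "maximum bytes configured")
    heap_alloc = _get(status, "tcmalloc", "generic", "current_allocated_bytes")
    heap_size = _get(status, "tcmalloc", "generic", "heap_size")

    # Rough indicator: resident memory not accounted for by the WiredTiger cache.
    outside_cache = None
    if resident is not None and cache_used is not None:
        outside_cache = resident - cache_used / MIB

    return {
        "resident_mib": resident,
        "virtual_mib": _get(status, "mem", "virtual"),  # already reported in MiB
        "wt_cache_used_mib": _to_mib(cache_used),
        "wt_cache_max_mib": _to_mib(cache_max),
        "heap_allocated_mib": _to_mib(heap_alloc),
        "heap_size_mib": _to_mib(heap_size),
        "resident_outside_cache_mib": outside_cache,
    }
```

With PyMOTOR-free PyMongo, for example, the input could come from `MongoClient(...).admin.command("serverStatus")`. Sampling it periodically while a resync runs would show whether the growth is inside the WiredTiger cache (bounded by `maximum bytes configured`) or outside it.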