Slow Death by Swap despite physical RAM not being used

We have a problem where swap usage increases very slowly over a number of days, despite physical RAM only being around 30% utilised. This is on a CentOS 7.6 host with 12GB RAM allocated and 30 CPUs.
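To put numbers on "RAM used" versus "swap used" on a host like this, both can be read straight from /proc/meminfo. The sketch below is illustrative only: the function names are hypothetical, and it counts RAM as used when it is not included in MemAvailable (falling back to MemFree + Buffers + Cached if that field is absent), which may not match how the monitoring behind the graph calculates it.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[key.strip()] = int(parts[0])  # the trailing "kB" unit is dropped
    return fields


def memory_summary(fields: Dict[str, int]) -> Dict[str, float]:
    """Return RAM-used %, swap-used kB and swap-used % from parsed meminfo fields."""
    total = fields["MemTotal"]
    # Prefer MemAvailable when the kernel reports it; otherwise use a rough
    # approximation (free memory + buffers + page cache).
    available = fields.get(
        "MemAvailable",
        fields["MemFree"] + fields.get("Buffers", 0) + fields.get("Cached", 0),
    )
    swap_total = fields.get("SwapTotal", 0)
    swap_used = swap_total - fields.get("SwapFree", 0)
    return {
        "ram_used_pct": 100.0 * (total - available) / total,
        "swap_used_kb": float(swap_used),
        "swap_used_pct": 100.0 * swap_used / swap_total if swap_total else 0.0,
    }
```

On the host itself, memory_summary(parse_meminfo(open("/proc/meminfo").read())) would give the current figures; sampling that periodically shows whether swap grows while RAM usage stays flat.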

The node runs Splunk, carrying out search head duties only. Each time we have observed the OOM killer being invoked, it has targeted the Splunk service, again despite less than 50% of the host's physical RAM being in use.
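Because the OOM killer keeps picking the Splunk service, it may help to confirm which processes actually hold the swapped-out pages. On Linux, /proc/<pid>/status includes a VmSwap line for user-space processes. The helpers below (hypothetical names, a sketch rather than a finished tool) rank processes by that value; entries that disappear or cannot be read during the scan are skipped.

```python
import os
from typing import List, Optional, Tuple


def vmswap_kb(status_text: str) -> Optional[int]:
    """Return VmSwap (kB) from /proc/<pid>/status text, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmSwap:"):
            return int(line.split()[1])
    return None  # e.g. kernel threads have no VmSwap line


def process_name(status_text: str) -> str:
    """Return the Name: field from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            return line.partition(":")[2].strip()
    return "?"


def top_swap_users(proc_root: str = "/proc", limit: int = 10) -> List[Tuple[int, str, int]]:
    """Return (pid, name, swap_kB) for the processes holding the most swap."""
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                text = f.read()
        except OSError:
            continue  # process exited or is not readable
        swap = vmswap_kb(text)
        if swap:  # ignore processes with nothing swapped out
            results.append((int(entry), process_name(text), swap))
    results.sort(key=lambda r: r[2], reverse=True)
    return results[:limit]
```

If the Splunk processes dominate top_swap_users() over several days, that at least narrows down where the creep is coming from.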

We recently reduced vm.swappiness from 60 to 10. Whilst this has certainly helped (swap usage now grows more slowly), we are still seeing it creep up.
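A quick way to confirm the new value is live, and to produce the line that keeps it across reboots, is sketched below. The locations used are the standard ones (/proc/sys/vm/swappiness at runtime, a .conf file under /etc/sysctl.d/ for persistence). On cgroup v1 hosts each memory cgroup also exposes its own memory.swappiness file in the same single-integer format, so the same reader could be pointed at one of those. The helper names are hypothetical, and the 0-100 range check reflects 3.10-era kernels such as CentOS 7's.

```python
def parse_swappiness(text: str) -> int:
    """Parse the integer held in /proc/sys/vm/swappiness (or a cgroup's memory.swappiness)."""
    return int(text.strip())


def read_swappiness(path: str = "/proc/sys/vm/swappiness") -> int:
    """Read the swappiness value currently in effect at the given path."""
    with open(path) as f:
        return parse_swappiness(f.read())


def sysctl_conf_line(value: int) -> str:
    """Render a line for a file under /etc/sysctl.d/ so the value survives reboots."""
    # 0-100 is the valid range on 3.10-era kernels such as CentOS 7's.
    if not 0 <= value <= 100:
        raise ValueError("vm.swappiness must be between 0 and 100 on this kernel")
    return "vm.swappiness = {}\n".format(value)
```

For example, read_swappiness() should return 10 after the change, and writing sysctl_conf_line(10) to a file such as /etc/sysctl.d/99-swappiness.conf (the file name is only an example) keeps the setting after a reboot.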

The following graph shows, at the top, the amount of free swap on three nodes suffering the same problem (all operating as Splunk search heads) and, at the bottom, the actual physical RAM usage on the same nodes (between 25% and 45% on each).

[Graph: Swap Free (top) vs Used RAM (bottom)]
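The top panel is effectively a time series of free swap, so a least-squares fit can turn the slow creep into a rate and a rough projection of when swap would run out. This is a sketch under simple assumptions: samples are (time in days, swap free in kB) pairs exported from whatever collects the graph data, the trend is treated as linear, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple


def swap_trend(samples: List[Tuple[float, float]]) -> Tuple[float, Optional[float]]:
    """Fit a least-squares line to (day, swap_free_kB) samples.

    Returns (slope in kB/day, estimated days from the last sample until the
    fitted swap-free line reaches zero), or (slope, None) if it is not falling.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    slope = sum((t - mean_t) * (y - mean_y) for t, y in samples) / var_t
    if slope >= 0:
        return slope, None  # swap free is flat or recovering
    intercept = mean_y - slope * mean_t
    fitted_last = intercept + slope * samples[-1][0]
    # Time for the fitted line to fall from its value at the last sample to 0.
    return slope, max(0.0, -fitted_last / slope)
```

With made-up data, swap_trend([(0, 4000000), (1, 3900000), (2, 3800000), (3, 3700000)]) gives a slope of -100000 kB/day and 37 days remaining. A linear fit is deliberately crude; it is only meant to compare nodes or check whether a change such as the swappiness reduction altered the rate.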

If I monitor one of the systems with vmstat 1, I see hardly any swap activity. Over the last five minutes I have seen just three samples with any swap-out activity, and even then only small amounts, again despite physical RAM utilisation appearing fairly static.

[vmstat 1 output]
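When reading vmstat 1, note that si and so are swap-in and swap-out rates per second, while swpd is the amount of swap currently in use, and the first data line's rates are averages since boot rather than the current interval. The sketch below (hypothetical function names) parses captured vmstat output and summarises how many intervals had swap activity alongside the change in swpd over the capture.

```python
from typing import Dict, List


def parse_vmstat(output: str) -> List[Dict[str, int]]:
    """Parse vmstat output into one dict per sample row, keyed by column name."""
    rows = []
    header = None
    for line in output.splitlines():
        parts = line.split()
        if not parts:
            continue
        if "si" in parts and "so" in parts:
            header = parts  # column-name line: r b swpd free ... si so ...
        elif header and len(parts) == len(header) and all(p.isdigit() for p in parts):
            rows.append(dict(zip(header, map(int, parts))))
    return rows


def swap_activity(rows: List[Dict[str, int]], skip_first: bool = True) -> Dict[str, int]:
    """Count intervals with swap-in/out and the change in swpd (kB) across the capture."""
    data = rows[1:] if skip_first else rows  # first row's rates are averages since boot
    if not data:
        return {"samples": 0, "samples_with_swap_in": 0,
                "samples_with_swap_out": 0, "swpd_change_kb": 0}
    return {
        "samples": len(data),
        "samples_with_swap_in": sum(1 for r in data if r["si"] > 0),
        "samples_with_swap_out": sum(1 for r in data if r["so"] > 0),
        "swpd_change_kb": data[-1]["swpd"] - data[0]["swpd"],
    }
```

Comparing swpd_change_kb with the handful of non-zero so intervals over a longer capture may show whether the occasional small swap-outs add up to the growth seen in the graph.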

Can anyone offer their thoughts on the cause of this please? It’s not something I’ve seen in over 10 years of working on multiple UNIX platforms!