This morning we saw a gradual increase in CPU usage and load on our web server, call it host1. Overall CPU utilization rose from about 100% to 500% (on 16 cores, 32 hyperthreads), and the load average rose from under 1 to about 6. It then stayed at this elevated level. top shows that the additional CPU usage comes entirely from Apache (httpd processes). (We are on Apache 2.4.39, on CentOS 7.6.)
PHP processes and MariaDB connections are unchanged from their baseline levels. The data transferred over the server's network connection is unchanged. Requests served per minute, measured by the number of lines in the server's domlogs, are unchanged. There was no increase in disk usage, wait time, or service time. Memory usage is also flat.
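A per-minute request count of this kind can be sketched as follows. This is only a rough illustration, assuming the domlogs use Apache's common or combined log format with the usual bracketed timestamp; the function name `requests_per_minute` is made up for the example.

```python
import re
from collections import Counter

# Matches the bracketed timestamp of the common/combined log format,
# e.g. [10/Oct/2019:13:55:36 -0700], and captures it up to the minute.
# Assumption: the domlogs actually use this timestamp layout.
TIMESTAMP_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} [+-]\d{4}\]")

def requests_per_minute(lines):
    """Count log lines per minute, keyed by 'dd/Mon/yyyy:HH:MM'.

    Lines without a recognizable timestamp are skipped.
    """
    counts = Counter()
    for line in lines:
        match = TIMESTAMP_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open log file to `requests_per_minute` and printing the resulting items gives one count per minute in log order, which makes it easy to compare the minutes before and after the load change.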
I looked through the Apache error log around the time the load jumped. I did not see anything new, and Apache did not restart at that time.
As a test, I moved all our web traffic from host1 to an identical backup server, host2. Once the traffic had migrated, CPU utilization on host1 dropped to near zero (a drop of about 400% at that time), while usage on host2 rose to only a little over 80%, about the level host1 was at before the sudden increase. So whatever changed, it has something to do with this specific server, not with the nature of our web traffic.
I've checked the Apache settings on both servers and, as expected, they are identical. Looking at the Apache status, the number of simultaneous connections appears to be about twice as high on host1 as on host2. That is what you would expect given the higher load and CPU usage: requests take longer to complete, so there are more simultaneous connections for the same throughput. Subjectively, some requests on host1 come back in normal time, while others take much longer than usual (a few seconds rather than the expected milliseconds). However, this could be either a cause or a consequence of the higher load.
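The relationship between simultaneous connections and response time at a fixed throughput is Little's Law (L = λW). A minimal sketch, using made-up numbers rather than measurements from these servers, and assuming the connection count reflects requests actually being processed (idle keep-alive connections would inflate it):

```python
def mean_response_time(concurrent_requests, requests_per_second):
    """Little's Law: L = lambda * W, so W = L / lambda.

    concurrent_requests: average number of requests in progress (L)
    requests_per_second: average throughput (lambda)
    Returns the implied mean time per request in seconds (W).
    """
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    return concurrent_requests / requests_per_second

# Hypothetical figures for illustration only: same throughput on both
# hosts, twice the concurrency on the slow one.
baseline = mean_response_time(20, 100)
slow_host = mean_response_time(40, 100)
print(slow_host / baseline)
```

With throughput held constant, twice the concurrency implies twice the mean time per request. A few requests taking seconds instead of milliseconds would be enough to produce that average.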
I tried disabling mod_security on host1, which ruled it out as the cause. I then tried a full restart of host1, but once traffic was moved back, the load remained high. Does anyone have an idea what could cause Apache on one particular server to suddenly and persistently start using more CPU and taking longer to process the same volume of requests?