Hi all, unfortunately this will be a long post, because what's happening is a bit hard to describe...
Let's start from the beginning:
- Some time ago (2 years) we had a physical server with i-MSCP (1.4, if I remember correctly), Debian 7 and Apache + PHP-FPM. Everything worked great and everyone was happy.
- We have a Nagios check in place: every minute it checks whether the HTTP/HTTPS ports respond, and it also collects Apache server status data (via http://localhost/server_status).
- One bad day Apache stopped responding: no reply on port 80, and when querying server_status the browser stayed at "making an http connection to 127.0.0.1" indefinitely. There was nothing in the Apache, PHP or system logs. I restarted the Apache service and everything was fine again!
- Then it started happening more frequently: sometimes I had to restart Apache, sometimes PHP-FPM, and so on. Digging deeper, I discovered that before each stop Apache had lots of children in the "Logging" or "Sending Reply" state, and they quickly used up all available children.
- I started reading about Slowloris attacks and tuned the system to mitigate them (mod_evasive, KeepAlive settings, etc.).
- In any case the problem kept coming back. In the end we thought it was caused by a particular site we had added; we moved that site elsewhere and the problem disappeared for a long time!
- Some months later it started happening again :( and within a few weeks we moved to a new server...
- It is a completely different server: different hardware (virtualized), Debian 8 (at that time) and the latest version of i-MSCP.
- For the first months everything was OK, then it started happening again...
I'm completely at a loss. The current setup is Debian 9 + Apache + PHP-FPM + i-MSCP 1.5.3, but it still stops suddenly.
- I tried tuning everything in Apache, testing the RAM, testing the hard disk, and tuning sysctl parameters for TCP connections... nothing worked.
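For anyone who wants to track the child saturation described above from the monitoring side, here is a minimal Python sketch that counts worker states from mod_status's machine-readable output (the `?auto` variant of the status page, whose `Scoreboard:` line has one character per worker slot). The function names and the "saturated" rule (no waiting workers and no open slots left) are my own assumptions, not something provided by Apache or Nagios.

```python
from collections import Counter

# Meaning of each mod_status scoreboard character (Apache httpd 2.4 legend).
SCOREBOARD_KEYS = {
    "_": "waiting",
    "S": "starting",
    "R": "reading",
    "W": "sending_reply",
    "K": "keepalive",
    "D": "dns_lookup",
    "C": "closing",
    "L": "logging",
    "G": "graceful_finish",
    "I": "idle_cleanup",
    ".": "open_slot",
}


def parse_scoreboard(status_text: str) -> Counter:
    """Count worker states from the ?auto server-status output."""
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            board = line.split(":", 1)[1].strip()
            return Counter(SCOREBOARD_KEYS.get(ch, "unknown") for ch in board)
    raise ValueError("no Scoreboard line found in status output")


def busy_fraction(counts: Counter) -> float:
    """Share of all slots that are neither waiting nor open (i.e. busy)."""
    total = sum(counts.values())
    busy = total - counts["waiting"] - counts["open_slot"]
    return busy / total if total else 0.0


def is_saturated(counts: Counter) -> bool:
    """Assumed rule: no worker waiting for a connection and no free slot left."""
    return counts["waiting"] == 0 and counts["open_slot"] == 0
```

For example, a scoreboard of `_WWL_...` gives 2 waiting, 2 sending_reply, 1 logging and 3 open slots, so `busy_fraction` returns 3/8. Logging these counts every minute would show whether "Logging"/"Sending Reply" slots pile up before each outage.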
The problem simply happens at random:
- If I attach strace to the Apache PIDs they output nothing; the processes simply seem to be waiting for... what?
- server_status shows nothing unusual and no growth in the number of children; from one minute to the next it simply stops answering queries
- If I restart Apache everything starts working again
- Sometimes I also have to restart PHP-FPM
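Since the browser hung indefinitely instead of failing, a probe with an explicit timeout turns the hang into a clear failure. A minimal sketch using only Python's standard library; the default URL `/server-status?auto` is an assumption (our check actually queries http://localhost/server_status, so adjust it):

```python
import http.client
import urllib.request

# Assumed default location of the mod_status page; change to match your config.
STATUS_URL = "http://localhost/server-status?auto"


def probe(url: str = STATUS_URL, timeout: float = 5.0) -> tuple[bool, str]:
    """Fetch the status page without hanging forever.

    Returns (True, body) on an HTTP 200 answer, otherwise (False, reason).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200, body
    except (OSError, http.client.HTTPException) as exc:
        # Covers refused connections, HTTP error codes and the case where
        # the TCP connection opens but no reply ever arrives (timeout).
        return False, f"{type(exc).__name__}: {exc}"
```

Note that `timeout` applies to each blocking socket operation, not to the request as a whole, so a server that trickles bytes slowly can still keep the probe busy longer than `timeout` seconds.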
In Apache's error.log there is nothing at the time of the event; sometimes I see this error:
[core:notice] [pid 27873:tid 140018606080064] AH00052: child pid 5756 exit signal Segmentation fault (11)
Could this be connected in some way?
I thought about migrating to Apache's ITK MPM, but unfortunately we run more than one PHP version, so I can't use that MPM.
I'm really frustrated because no error log gives any hint of what's happening and I've run out of ideas, so I'm kindly asking if someone can help me or point me in some direction.
Thank you very much!