Segfaults due to defective RAM
If you get various segfaults on your Linux machine, like these:
spamd child[2656]: segfault at 200251c208 ip 00007fa039223684 sp 00007fff77953680 error 4 in libperl.so.5.14.2[7fa03916a000+177000]
or:
clamd[3311]: segfault at 1000000008 ip 00007f00200b3751 sp 00007fff3e2cef60 error 4 in libclamav.so.6.1.17[7f001fff1000+988000]
or
php5[14914]: segfault at 7fff7d2939c8 ip 00000000006bf04d sp 00007fff6d293860 error 6 in php5[400000+6f3000]
or
PassengerHelper[11644]: segfault at ffffffffca4ef420 ip 0000000000492fea sp 00007f5b81e991d0 error 7 in PassengerHelperAgent[400000+203000]
Then no, your system has not suddenly gone crazy, and neither have you. It could simply be that a RAM module is defective! To diagnose this, we should run the RAM test from the boot manager.
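Before testing the RAM, it can help to see how widely the segfaults are spread. The following is a minimal sketch, assuming the kernel messages have the format shown above; count_segfaults is a hypothetical helper name, not an existing tool. It tallies the segfault lines by the binary or library named after "in". Crashes scattered across many unrelated programs are a hint (not proof) that the hardware is at fault rather than one buggy application.

```shell
#!/bin/sh
# count_segfaults (hypothetical helper): reads kernel log lines on stdin
# and prints "<count> <object>" for each binary or library in which a
# segfault was reported, most frequent first.
count_segfaults() {
    # Keep only segfault lines and extract the name between " in " and
    # the "[base+size]" suffix, e.g. "libperl.so.5.14.2".
    sed -n -E 's/.*segfault at .* in ([^[]+)\[[0-9a-f]+\+[0-9a-f]+\].*/\1/p' |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'
}
```

It could be fed from the kernel ring buffer with dmesg | count_segfaults, or from a log file such as /var/log/kern.log, if your system writes kernel messages there.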
If we are operating a server that we can’t reboot, there is an excellent tool called memtester. It is a memory test for a running system. It is part of the Debian distribution; install it with apt-get install memtester.
Check top to see how much free RAM is available. Say you have 10GB of free RAM; then tell memtester to test 8GB of it (so that 2GB remain free for the running system to operate).
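The sizing rule above (test the free RAM minus some headroom) can be sketched in shell. This is only an illustration under stated assumptions: test_size_mb and available_mb are hypothetical helper names, the 2048 MB reserve mirrors the 2GB example, and available_mb assumes a kernel that reports MemAvailable in /proc/meminfo (older kernels do not, in which case read the value from top or free instead).

```shell
#!/bin/sh
# test_size_mb (hypothetical helper): given the available memory and the
# amount to keep free, both in MB, print how many MB to test.
# Prints 0 if the reserve is larger than what is available.
test_size_mb() {
    available=$1
    reserve=$2
    size=$((available - reserve))
    if [ "$size" -lt 0 ]; then
        size=0
    fi
    echo "$size"
}

# available_mb (hypothetical helper): read MemAvailable (reported in kB)
# from /proc/meminfo and convert it to MB.
available_mb() {
    awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo
}
```

With these defined, a run could look like memtester "$(test_size_mb "$(available_mb)" 2048)" 3, started as root so that memtester can lock the memory it tests. memtester reads a plain number as megabytes.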
In my case, memtester indeed detected faulty RAM:
memtester 8000 3
Loop 1/3:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : testing 30FAILURE: 0xffffffffffffffff != 0xfffffffbffffffff at offset 0x36e77910.
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok
After I replaced the RAM, the segfaults stopped.
You could run memtester regularly to make sure the RAM is okay. Needless to say, healthy RAM is crucial to every hosting operation!
In my case, however, the segfaults had corrupted MySQL tables, which I had to clean up manually.
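For the regular runs suggested above, a scheduled job needs to tell a passing test from a failing one without anyone reading the output. The sketch below assumes the exit status described in the memtester man page: 0 on success, otherwise the logical OR of 1 (could not allocate or lock memory, or invocation error), 2 (stuck address test failed) and 4 (one of the other tests failed). describe_memtester_status is a hypothetical helper name, and the wording of its messages is my own.

```shell
#!/bin/sh
# describe_memtester_status (hypothetical helper): translate a memtester
# exit status into a one-line message. Returns 0 if the test passed and
# 1 otherwise, so it can be used directly in a cron script.
describe_memtester_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "RAM test passed"
        return 0
    fi
    reasons=""
    # The status is a bit mask, so several reasons can apply at once.
    if [ $((status & 1)) -ne 0 ]; then
        reasons="$reasons allocation-or-invocation-error"
    fi
    if [ $((status & 2)) -ne 0 ]; then
        reasons="$reasons stuck-address-failure"
    fi
    if [ $((status & 4)) -ne 0 ]; then
        reasons="$reasons other-test-failure"
    fi
    echo "RAM test FAILED:$reasons"
    return 1
}
```

A nightly job could then run something like memtester 1024 1 followed by describe_memtester_status $?, and send a mail whenever the helper returns non-zero. The size and loop count here are placeholders to adapt to the machine.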