This one was a bit of a rush job, because our servers were having load problems this morning, which ate into my slide-writing time. I was already behind because I had worked too hard on Monday bringing up a second new ppsw machine (which entailed lots of install-script bug fixing), so I had trouble concentrating yesterday afternoon.
The excitement this morning was a result of us moving the IMAP and POP proxy from the 5 old machines to the 2 new ones. They should have been able to cope with the load, but crapped out at 3000 concurrent connections, complaining “kernel: ldt allocation failed” and randomly failing to fork. 1600 processes per machine is rather less than the 2000 processes on our webmail server, which is running fine. The cause is probably the rapid forking of the proxy server: the webmail server pre-forks, so it doesn’t chew through the pid space so rapidly. Anyway, it appears that there’s a fixed-size 8192-entry table of LDT entries in the Linux 2.4 kernel, which we were exhausting. We found this with help from my colleague Anton Altaparmakov, kernel hacker extraordinaire.
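To give a rough feel for the difference in pid churn between the two server styles, here is a minimal back-of-the-envelope sketch. It only counts forks; it does not model the LDT table or anything else inside the kernel. The function name, the figure of 100,000 connections, and the pid limit of 32768 are my own illustrative assumptions, not measurements from our machines.

```python
# Rough model of pid consumption: fork-per-connection vs pre-forking.
# All numbers here are illustrative assumptions, not measured values.

PID_MAX = 32768  # assumed pid limit (the usual Linux default)


def pid_churn(total_connections, prefork_workers=None):
    """Return (forks, wraps) needed to serve total_connections.

    forks: how many fork() calls (and so how many pids) are used.
    wraps: how many times the pid counter would run past PID_MAX.

    With prefork_workers=None the server forks once per connection.
    Otherwise it forks a fixed pool at startup and reuses the workers
    (worker respawning is ignored to keep the model simple).
    """
    if prefork_workers is None:
        forks = total_connections
    else:
        forks = prefork_workers
    wraps = forks // PID_MAX
    return forks, wraps


if __name__ == "__main__":
    connections = 100_000  # hypothetical number of connections served
    print("fork per connection:", pid_churn(connections))
    print("pre-forked pool:    ", pid_churn(connections, prefork_workers=2000))
```

In this toy model the fork-per-connection server burns one pid for every connection it ever serves, whereas the pre-forking server's pid usage stays fixed at its pool size however many connections pass through it.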
After the excitement the talk went reasonably well. The most amusing moment was when I got to the Stephen Hawking slide and temporarily lost my voice (I’ve had a bit of a cough recently) without a voice synthesizer to hand!