Every so often, my Linode goes into a state of apparent frantic I/O. Page loads slow down a bit, and I get regular email alerts indicating a potential problem:
Subject: Linode Alert - disk io rate
Your Linode, linode90147, has exceeded the notification threshold (800) for disk io rate by averaging 2146.05 for the last 2 hours. The dashboard for this Linode is located at: ...
This is the first time it has happened since I switched entirely to nginx. My first step was to install sysstat, which provides iostat and sar, to see what was going on.
apt-get install sysstat
The initial output of iostat looks like this:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.75    0.28    0.19    1.21    0.01   97.56

Device:     tps  Blk_read/s  Blk_wrtn/s   Blk_read   Blk_wrtn
xvda      12.53      168.70       73.94  250258986  109688664
xvdb      23.49      127.81       86.97  189603528  129015512
Run without arguments, iostat reports averages since boot rather than the current rate, which is why these numbers don’t look nearly as high as what Linode is reporting. For continuous reporting at two-second intervals, run the following:
iostat -d 2
This showed the read/write rates running anywhere from 0 to 5500 blocks/second; the peak is about 2.8 MB/s (at 512 bytes/block). Some points to note: xvda is a Xen virtual disk. Watching the usage for a while, both disks are active at about the same time, but most of the writes go to xvdb, which may indicate heavy use of swap space.
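The blocks-to-megabytes conversion is simple arithmetic. As a rough sketch, assuming 512-byte blocks and decimal megabytes (the function name blocks_to_mbps is made up for this example):

```shell
# Convert an iostat blocks/second figure to MB/s.
# Assumes 512-byte blocks and decimal megabytes (1 MB = 1,000,000 bytes).
blocks_to_mbps() {
    awk -v blocks="$1" 'BEGIN { printf "%.1f\n", blocks * 512 / 1000000 }'
}

blocks_to_mbps 5500   # prints 2.8
```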
To find out which process(es) are doing the disk use, I ran the following:
pidstat -d 2 300
This takes 300 I/O samples at two second intervals (i.e. for ten minutes). It prints out each sample and an average summary. Running this, I got the following output:
Average:    PID   kB_rd/s   kB_wr/s  kB_ccwr/s  Command
Average:    996      0.01      2.31       0.00  kjournald
Average:   1930      2.43      0.07       0.00  rsyslogd
Average:   1958      0.10      0.00       0.00  atd
Average:   1959    109.13     14.20       0.00  cron
Average:   1971      0.26      0.00       0.00  memcached
Average:   1978     47.54      0.48       0.00  mysqld
Average:   2045     26.32      0.11       0.03  munin-node
Average:   2131     10.21      0.01       0.00  sendmail-mta
Average:   2234      2.62      0.01       0.00  ntpd
Average:   2397     10.71      0.00       0.00  fail2ban-server
Average:  13689      0.29      0.00       0.00  pidstat
Average:  14427      0.23      0.00       0.00  cron
Average:  14428      0.18      0.00       0.00  sh
Average:  14431      0.01      0.00       0.00  munin-cron
Average:  14432      9.95      0.01       0.00  munin-update
Average:  14433      6.95      5.61       0.00  munin-update
Average:  14434     10.14      0.04       0.01  munin-node
Average:  14813      0.04      0.00       0.00  vmstat
Average:  14814      0.15      0.00       0.00  vmstat
Average:  15685     12.96      0.00       0.00  php-cgi
Average:  15686     13.79      0.00       0.00  php-cgi
Average:  15687     20.47      0.59       0.15  php-cgi
Average:  15688     15.01      0.00       0.00  php-cgi
Average:  15689     33.72      0.00       0.00  php-cgi
Average:  15690     15.53      0.00       0.00  php-cgi
Average:  15691      9.60      0.00       0.00  php-cgi
Average:  15692     19.13      0.00       0.00  php-cgi
Average:  15693     12.64      0.00       0.00  php-cgi
Average:  15694     15.76      0.00       0.00  php-cgi
Average:  15695     16.30      0.01       0.00  php-cgi
Average:  15696     18.60      0.00       0.00  php-cgi
Average:  15697     12.65      0.00       0.00  php-cgi
Average:  16334     30.81      0.72       0.15  php-cgi
Average:  16338     14.99      0.00       0.00  php-cgi
Average:  21209      1.21      0.00       0.00  sshd
Average:  31579      1.03      0.86       0.79  nginx
Average:  31580      0.67      0.03       0.00  nginx
Average:  31581      1.11      0.07       0.00  nginx
Average:  31582      1.82      0.17       0.00  nginx
There are three areas to research: PHP, MySQL, and cron. I know I added some cron jobs, so I tested that first. To list every user's crontab:
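With many worker processes, the per-PID rows can hide which program is reading the most overall. A small sketch that sums kB_rd/s per command name, assuming the column layout in the output above (newer sysstat versions add columns, so the field numbers may need adjusting); reads_by_command is a name invented for this example:

```shell
# Sum the kB_rd/s column of pidstat "Average:" lines per command name,
# largest first. Assumes this layout:
#   Average: PID kB_rd/s kB_wr/s kB_ccwr/s Command
# The header line is skipped because its second field is not numeric.
reads_by_command() {
    awk '$1 == "Average:" && $2 ~ /^[0-9]+$/ { sum[$NF] += $3 }
         END { for (cmd in sum) printf "%.2f %s\n", sum[cmd], cmd }' |
        LC_ALL=C sort -rn
}
```

It would be used as `pidstat -d 2 300 | reads_by_command`. Summed this way, the fifteen php-cgi workers in the output above come to roughly 262 kB/s of reads, more than the 109 kB/s for the main cron process.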
for user in $(cut -f1 -d: /etc/passwd); do crontab -u "$user" -l; done
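That loop only covers per-user crontabs. Packaged jobs usually live in /etc/crontab and the system cron directories instead. A minimal sketch for listing those as well, assuming the usual Debian/Ubuntu directory names (list_system_cron is a hypothetical helper, not a standard command):

```shell
# Print the files in the system cron directories under a root (default /etc).
# The directory names are the usual Debian/Ubuntu ones; adjust for other systems.
list_system_cron() {
    root="${1:-/etc}"
    for dir in cron.d cron.hourly cron.daily cron.weekly cron.monthly; do
        for f in "$root/$dir"/*; do
            # A missing or empty directory leaves the glob unexpanded; -f skips it.
            if [ -f "$f" ]; then
                echo "$dir/${f##*/}"
            fi
        done
    done
}
```

Running `list_system_cron` with no argument lists the jobs under /etc.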
From this output, I removed two obsolete hourly tasks I had created. For good measure, I also decreased the frequency of man-db lookups from daily to monthly, and removed the apache2 cleanup (no longer used) and popularity-contest. Everything remaining appears to be important to system maintenance. The following is a second performance log, taken after these changes. Very little has changed.
Average:      1      1.22      0.02       0.01  init
Average:    996      0.03      3.51       0.00  kjournald
Average:   1930      2.74      0.26       0.00  rsyslogd
Average:   1959    131.20     14.24       0.00  cron
Average:   1971      0.01      0.00       0.00  memcached
Average:   1978     78.06      0.63       0.11  mysqld
Average:   2045     27.10      0.09       0.03  munin-node
Average:   2131     23.92      0.03       0.00  sendmail-mta
Average:   2234      2.77      0.00       0.00  ntpd
Average:   2397     13.69      0.00       0.00  fail2ban-server
Average:  15685     24.56      0.01       0.00  php-cgi
Average:  15686     17.49      0.00       0.00  php-cgi
Average:  15687     40.37      0.00       0.00  php-cgi
Average:  15688     21.09      0.01       0.00  php-cgi
Average:  15689     29.78      0.00       0.00  php-cgi
Average:  15690     10.94      0.00       0.00  php-cgi
Average:  15691     23.94      0.00       0.00  php-cgi
Average:  15692     16.95      0.01       0.00  php-cgi
Average:  15693      8.02      0.01       0.00  php-cgi
Average:  15694      7.47      0.01       0.00  php-cgi
Average:  15695     23.33      0.00       0.00  php-cgi
Average:  15696      8.47      0.01       0.00  php-cgi
Average:  15697     13.05      0.01       0.00  php-cgi
Average:  16334     13.13      0.01       0.00  php-cgi
Average:  16338     11.27      0.00       0.00  php-cgi
Average:  21209      0.60      0.00       0.00  sshd
Average:  22998      0.41      0.00       0.00  pidstat
Average:  31579      0.51      0.03       0.00  nginx
Average:  31580      1.37      3.08       1.43  nginx
Average:  31581      1.32      1.55       1.44  nginx
Average:  31582      2.55      1.77       0.00  nginx
The PHP work is essentially unchanged: the php-cgi read rates sum to about 270 kB/s here, versus about 262 kB/s in the first run. Next up: PHP. I had APC working when I was running Apache, but perhaps it’s not working now, with nginx as the primary server.
I rebuilt APC from scratch, in case there was a newer version. The lynchpin of this was discovering that there are multiple php.ini files on the VPS, so the extension has to be enabled in the one that php-cgi reads. The instructions for building APC are as follows:
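# Download the release tarball (the /package/APC URL is only the project page)
wget http://pecl.php.net/get/APC-3.1.9.tgz
tar -xzf APC-3.1.9.tgz
cd APC-3.1.9
phpize
# --with-php-config takes the php-config script, not php.ini
# (adjust if yours has a different name, e.g. php-config5)
./configure --enable-apc --enable-apc-mmap --with-apxs --with-php-config="$(command -v php-config)"
make
make test
make install
vi /etc/php5/cgi/php.ini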
Add this line at the end:
extension=apc.so
Then restart php-cgi. For example, if you run PHP FastCGI for nginx as an init.d service, do something like this:
/etc/init.d/php-fastcgi restart
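Since there can be more than one php.ini, it may be worth checking which ones exist and whether php-cgi actually picked up the extension. A rough sketch, assuming the /etc/php5 layout used above; find_php_inis and has_apc are names made up for this example:

```shell
# List every php.ini under a directory (default /etc/php5, the path used above).
find_php_inis() {
    find "${1:-/etc/php5}" -name php.ini 2>/dev/null | LC_ALL=C sort
}

# Succeed if a line reading exactly "apc" (any case) appears in a
# PHP module list supplied on stdin.
has_apc() {
    grep -qix 'apc'
}
```

For example, `find_php_inis` shows the candidate files, and `php-cgi -m | has_apc && echo "APC loaded"` checks the module list of the CGI binary.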
I re-ran the performance test. PHP activity is pretty much gone. It looks like traffic is lower at the moment as well, but apc.php shows about 80% cache hits. For memory's sake, it would be nice to share WordPress installations, but this has some significant challenges (e.g. handling upgrades). For now, disk use has slowed, so I will leave MySQL tuning for another day.
Average:      8    0.00    0.01    0.00    0.01  -  kworker/1:0
Average:    271    0.00    0.00    0.00    0.00  -  kswapd0
Average:    996    0.00    0.00    0.00    0.00  -  kjournald
Average:   1730    0.00    0.01    0.00    0.01  -  kworker/3:1
Average:   1864    0.00    0.00    0.00    0.00  -  kworker/2:1
Average:   1930    0.00    0.00    0.00    0.00  -  rsyslogd
Average:   1959    0.00    0.00    0.00    0.00  -  cron
Average:   1971    0.00    0.00    0.00    0.00  -  memcached
Average:   1978    0.19    0.09    0.00    0.28  -  mysqld
Average:   2045    0.00    0.00    0.00    0.01  -  munin-node
Average:   2131    0.00    0.00    0.00    0.00  -  sendmail-mta
Average:   2234    0.00    0.00    0.00    0.01  -  ntpd
Average:   2280    0.00    0.00    0.00    0.00  -  flush-202:0
Average:   2397    0.02    0.00    0.00    0.02  -  fail2ban-server
Average:   8895    0.25    0.02    0.00    0.27  -  php-cgi
Average:   8896    0.23    0.03    0.00    0.26  -  php-cgi
Average:   8897    0.20    0.02    0.00    0.23  -  php-cgi
Average:   8898    3.89    0.01    0.00    3.90  -  php-cgi
Average:  10837    0.16    0.41    0.00    0.57  -  pidstat
Average:  23460    0.00    0.01    0.00    0.02  -  sshd
Average:  23599    0.00    0.01    0.00    0.01  -  kworker/0:1
Average:  25025    0.01    0.02    0.00    0.03  -  nginx
Average:  25026    0.00    0.00    0.00    0.01  -  nginx
Average:  25027    0.00    0.00    0.00    0.01  -  nginx
Average:  25028    0.00    0.00    0.00    0.01  -  nginx