Fixing RRD Database Spikes

I’ve run into this a few times, but so intermittently that I forget the quick fix each time it pops up again. Here’s the solution, written down so I don’t forget it again.

The issue is that once in a while some performance statistic spikes so far beyond the average that the graphs become unreadable. Since a single bad measurement is enough to cause this, I find it’s worth going into the database and manually changing the spiked entry so the graphs are readable again. Here’s the procedure:

Back up your RRD database.
# cp nginx.rrd nginx.rrd-20120605
Dump the database to XML format.
# rrdtool dump nginx.rrd > dump.xml
Edit the XML file with a text editor.
# vi dump.xml
Then search the file for the irregular spike. A slow but effective way is to search for e+0[1-9], which matches any value written with an exponent from e+01 to e+09. This works if most of the stats collected have negative exponents, so the spike is the only thing that matches. Once you’ve found the culprits, change them to a typical value, judged from the lines above and below. Save the file, then continue:

Delete the old RRD database file, since rrdtool restore will not overwrite an existing file by default.
# rm nginx.rrd
Restore the modified XML back to RRD format.
# rrdtool restore dump.xml nginx.rrd
Back up your old graphs in case something goes wrong.
# cp *.png /var/tmp
Delete the old graphs so new ones will be generated when you view your stats page.
# rm *.png

Then reload the web page with your graphs.
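
If you would rather list the candidate spikes before opening the editor, the search from the editing step can also be run from the shell. This is only a minimal sketch: find_spikes is a made-up helper name, it assumes the dump is called dump.xml as in the steps above, and it uses the same e+0[1-9] pattern, so it will miss anything with an exponent of e+10 or higher.

```shell
# find_spikes FILE
# Print "line-number:line" for every line of an rrdtool XML dump that
# contains a value with an exponent from e+01 to e+09.
# In grep's basic regular expressions the "+" is a literal plus sign.
find_spikes() {
    grep -n 'e+0[1-9]' "$1"
}

# Usage (file name taken from the steps above):
#   find_spikes dump.xml
```

The line numbers it prints can then be used to jump straight to the bad rows in vi, for example with :1234 for line 1234.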