Using PyRRD to gather system statistics

python observability

Last week, I spent some time benchmarking state-of-the-art WSGI application servers for SurveyMonkey Contribute. Configuring various WSGI app servers for an apples-to-apples comparison is quite challenging because of their diverse concurrency paradigms. System statistics, such as CPU consumption and memory usage, provide another perspective on performance. RRDtool is the go-to solution for collecting, retrieving, and visualizing system statistics. The pyrrd and subprocess modules let us use Python rather than Perl to glue rrdtool and other system utilities together:

from pyrrd.rrd import DataSource, RRA, RRD
dss = [
    DataSource(dsName='cpu', dsType='GAUGE', heartbeat=4),
    DataSource(dsName='mem', dsType='GAUGE', heartbeat=4)
]
rras = [RRA(cf='AVERAGE', xff=0.5, steps=1, rows=100)]
rrd = RRD('/tmp/heartbeat.rrd', ds=dss, rra=rras, step=1)
rrd.create()

The snippet above creates the RRD file /tmp/heartbeat.rrd with two data sources, cpu and mem, both of type GAUGE. It then defines a round-robin archive (RRA) that keeps up to 100 data points, one per step. Finally, it creates the RRD file with that configuration and a 1-second sampling interval. The pyrrd module uses the same terminology as rrdtool, so you can reuse what you already know and enjoy the convenience of Python.
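To make the relationship between step, steps, and rows concrete, here is a small sketch of the arithmetic. The helper name rra_retention_seconds is made up for illustration and is not part of pyrrd; it only assumes that each archive row covers step * steps seconds.

```python
def rra_retention_seconds(step, steps, rows):
    """Return (resolution, retention) in seconds for a single RRA.

    Hypothetical helper, not a pyrrd API: each archive row covers
    step * steps seconds, and the archive keeps `rows` such rows.
    """
    resolution = step * steps
    return resolution, resolution * rows


# The configuration used above: 1-second step, 1 step per row, 100 rows.
resolution, retention = rra_retention_seconds(step=1, steps=1, rows=100)
print(resolution, retention)
```

With the values from the snippet, the archive holds one row per second and wraps around after 100 seconds, which is enough only for a very short benchmarking run.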

With the subprocess module, we can manipulate pipes just as easily as in bash or Perl:

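import re
import subprocess
import time

# `pids` is a list of PID strings for the processes under test;
# `rrd` is the RRD object created in the previous snippet.
pattern = re.compile(r'\s+')
command = '/bin/ps --no-headers -o pcpu,pmem -p %s' % ' '.join(pids)
while True:
    # universal_newlines=True makes check_output return text rather than bytes
    ps = subprocess.check_output(command, shell=True, universal_newlines=True)
    pcpu = 0.0
    pmem = 0.0
    # Sum %CPU and %MEM over every process that ps reports
    for line in ps.split('\n'):
        if line.strip():
            cpu, mem = map(float, pattern.split(line.strip()))
            pcpu += cpu
            pmem += mem
    rrd.bufferValue(time.time(), pcpu, pmem)
    rrd.update()
    time.sleep(1)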

ps does all the heavy lifting in the sampling phase: it prints the %CPU and %MEM for all the pids we are interested in; the output is then parsed, aggregated, and written to the RRD file.
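The parsing and aggregation step can be tried without running ps at all. The following is a minimal sketch; the function name sum_ps_output and the sample text are invented for illustration, and the input is assumed to look like the output of ps --no-headers -o pcpu,pmem (two whitespace-separated numbers per line).

```python
import re

_WHITESPACE = re.compile(r'\s+')


def sum_ps_output(ps_output):
    """Sum the %CPU and %MEM columns of `ps --no-headers -o pcpu,pmem` text.

    Hypothetical helper: expects two whitespace-separated numbers per
    non-empty line and returns (total_cpu, total_mem).
    """
    total_cpu = 0.0
    total_mem = 0.0
    for line in ps_output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines such as the trailing newline
        cpu, mem = map(float, _WHITESPACE.split(line))
        total_cpu += cpu
        total_mem += mem
    return total_cpu, total_mem


# Made-up sample resembling the ps output for two processes.
sample = " 1.5  0.5\n12.0  2.0\n"
print(sum_ps_output(sample))
```

Keeping the parsing in its own function makes it easy to check against canned text before pointing it at a live benchmark.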

Please bear in mind that this is not a typical rrdtool use case: the system statistics are sampled in real time because the benchmarking session is relatively short. In the real world, data are usually sampled at a much coarser granularity and consolidated statistically. You can download the sampling script here.
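To illustrate what consolidation means, here is a deliberately simplified sketch in plain Python. It is not how rrdtool is implemented: the consolidate function is hypothetical, it ignores xff and unknown values, and it simply drops a trailing incomplete bucket.

```python
def consolidate(points, steps, cf='AVERAGE'):
    """Collapse every `steps` primary data points into one archived value.

    Simplified illustration of an RRA consolidation function; the real
    rrdtool also handles unknown values and the xff threshold.
    """
    functions = {
        'AVERAGE': lambda xs: sum(xs) / float(len(xs)),
        'MIN': min,
        'MAX': max,
        'LAST': lambda xs: xs[-1],
    }
    fn = functions[cf]
    # Walk the samples in non-overlapping buckets of `steps` points.
    return [fn(points[i:i + steps])
            for i in range(0, len(points) - steps + 1, steps)]


# Six made-up CPU samples consolidated three at a time.
samples = [10, 20, 30, 40, 50, 60]
print(consolidate(samples, 3))
print(consolidate(samples, 3, cf='MAX'))
```

A coarser archive of this kind trades detail for a longer history in the same amount of storage.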

I had a hard time grasping the pyrrd.graph module, though. The extra abstraction does not make things any less complicated, and I ended up using rrdtool directly, for example:

rrdtool graph /tmp/heartbeat.png --start 1401919870 --end 1401919879 \
    DEF:cpu=/tmp/heartbeat.rrd:cpu:AVERAGE LINE2:cpu#FF0000  \
    DEF:mem=/tmp/heartbeat.rrd:mem:AVERAGE LINE:mem#ccff00
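If you would rather stay in Python, the same command line can be assembled and handed to subprocess. This is only a sketch: build_graph_command and render_graph are hypothetical helper names, the series tuples are my own convention, and it assumes the rrdtool binary is on the PATH when render_graph is actually called.

```python
import subprocess


def build_graph_command(png_path, rrd_path, start, end, series):
    """Build an `rrdtool graph` argument list.

    `series` is a list of (ds_name, hex_color, line_width) tuples, where
    line_width is a string such as '2', or '' for rrdtool's default.
    """
    cmd = ['rrdtool', 'graph', png_path,
           '--start', str(start), '--end', str(end)]
    for name, color, width in series:
        cmd.append('DEF:{0}={1}:{0}:AVERAGE'.format(name, rrd_path))
        cmd.append('LINE{0}:{1}#{2}'.format(width, name, color))
    return cmd


def render_graph(cmd):
    """Run the command; raises CalledProcessError if rrdtool fails."""
    subprocess.check_call(cmd)


cmd = build_graph_command(
    '/tmp/heartbeat.png', '/tmp/heartbeat.rrd', 1401919870, 1401919879,
    [('cpu', 'FF0000', '2'), ('mem', 'ccff00', '')])
print(' '.join(cmd))
# render_graph(cmd)  # uncomment on a machine with rrdtool installed
```

Passing the arguments as a list avoids shell quoting issues, which matters once file names or colors come from user input.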