Three Techniques for Catching Memory Leaks in Python and How to Write Smart Code

  • 2017-02-05

[Chart: memory utilization before and after the fix.]

Some time ago I wrote a worker that periodically polls a third-party service for data. We started noticing that the worker process was getting killed by the kernel for exceeding its memory limit. The worker's container was given 512MB, which should have been more than enough for the job it was doing. The amount of data it fetches ranges from 25MB to 100MB, and it uses this data to sync some internal state of our systems with the data provided by the third party. After finding some weird memory consumption patterns, I was able to refactor the code to take memory usage from ~50% to ~13% of the limit and stop the worker process from getting OOM killed. This post is about the tools I used to find memory problems in a Python application.

resource module

One of the first things to do is to start tracking memory usage from within the process itself. Python ships with the resource module, which lets you ask how much memory the running process is consuming.

import resource

# Peak resident set size so far; kilobytes on Linux, bytes on macOS
print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)

The underlying getrusage system call returns various information about resource usage, not only memory; you can read about the available fields in the Linux man pages. This snippet is useful to place around a critical section to identify when, and how much, memory gets allocated. ru_maxrss is also a useful metric to report to your metrics aggregator so you can track memory usage over time.
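A minimal sketch of that pattern (the helper name and the list allocation are mine, standing in for a real critical section):

```python
import resource

def max_rss_kb():
    # Peak resident set size so far; kilobytes on Linux, bytes on macOS.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = max_rss_kb()
data = ["x" * 1000 for _ in range(10000)]  # stand-in for a critical section
after = max_rss_kb()

# ru_maxrss is a high-water mark, so the delta only ever shows growth.
print("peak RSS grew by roughly %d KB" % (after - before))
```

Because ru_maxrss is a peak value, a zero delta means the section stayed under the previous high-water mark, not that it allocated nothing.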

objgraph package

The next tool in the belt is objgraph, which you can install with pip install objgraph. Objgraph lets you explore Python object graphs. It is very useful for finding objects that should be dead and working out who still references them.

import objgraph

objgraph.show_most_common_types()   # List the most common object types
objgraph.show_growth()              # Show which object counts grew since the last call

show_growth can be called before and after a critical section to see which objects were allocated in between.
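If objgraph isn't available, the same idea can be approximated with the standard library. The sketch below counts live objects by type before and after a deliberately leaky section (all names here are mine):

```python
import gc
from collections import Counter

def type_counts():
    # Count live objects tracked by the garbage collector, grouped by type --
    # similar in spirit to objgraph.show_most_common_types().
    return Counter(type(obj).__name__ for obj in gc.get_objects())

before = type_counts()
leak = [[] for _ in range(5000)]  # stand-in for a section that leaks lists
after = type_counts()

growth = after - before           # Counter subtraction keeps positive deltas
print(growth.most_common(3))      # 'list' should dominate the growth
```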

Heapy from Guppy

Guppy is a toolchain for memory analysis and profiling, and Heapy seems to be its most commonly referenced submodule when it comes to digging into memory issues. You can install it with pip install guppy. Heapy is a fairly complicated tool; there's a great tutorial on heapy that you should check out. To take a diff of your heap you can do this:

import guppy
h = guppy.hpy()

heap_snap1 = h.heap()                 # snapshot before
# Critical section
heap_snap2 = h.heap()                 # snapshot after
heap_diff = heap_snap2 - heap_snap1   # what was allocated in between

Fixing Memory Usage

After digging around with the tools above, I noticed that the process allocates about 12MB of data on every fetch from the third-party service. Each iteration allocates tons of unicode and string objects while parsing the JSON response. This all makes sense: the strings are fairly long and each fetch contains mostly unique data, so Python's string interning won't help much. The worker is a long-running process that periodically receives a job, fetches a big batch of data, and syncs internal systems. It seems that Python's garbage collector should kick in and clean up the obsolete data, but invoking the garbage collector manually was of little help, and it's unclear why a new chunk of heap was getting allocated instead of reused.

I'm not a fan of refactoring code for the sake of refactoring, however this was a good case to do so.

Here's the pseudocode for the original data sync:

for job in get_job():
    for internal_item in items_to_update():
        third_party_data = get_big_batch()
        filter_and_process(third_party_data, internal_item)

The implementation above is really inefficient. get_big_batch fetches a large chunk of data for a given day, and the response includes plenty of information unrelated to internal_item. As mentioned before, third_party_data can be quite large, 12MB to 100MB. The optimization was to change the code to retrieve only the data related to the internal_item.

for job in get_job():
    for internal_item in items_to_update():
        third_party_data = get_thirdparty_data_for_item(internal_item)
        process(third_party_data, internal_item)

This change is more efficient in both resource consumption and processing time. Instead of fetching large batches of data and filtering them in the process, we ask the third party for the specific data we need. The code does make more calls to the external service, but on the other hand each call returns only about 10KB of third-party data per item. The best part: no more OOM issues.
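A toy model of the refactor's payoff; the in-memory "service" below is a stand-in for the real third-party API (all names and sizes are mine), but the access patterns mirror the before/after code:

```python
# Fake daily batch: 1000 items of ~10KB each (~10MB total).
BATCH = {item_id: "x" * 10000 for item_id in range(1000)}

def get_big_batch():
    # Old approach: returns everything for the day.
    return dict(BATCH)

def get_thirdparty_data_for_item(item_id):
    # New approach: returns only the ~10KB relevant to one item.
    return BATCH[item_id]

items_to_update = [3, 7, 42]

# Before: pull the full batch once per item, then filter.
old_bytes = sum(len(v) for _ in items_to_update
                for v in get_big_batch().values())

# After: pull only what each item needs.
new_bytes = sum(len(get_thirdparty_data_for_item(i)) for i in items_to_update)

print(old_bytes // new_bytes)  # the per-item approach moves far less data
```

With these made-up sizes the per-item version transfers about 1000x less data, at the cost of one extra request per item.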

The problem was not exactly a memory leak, at least to my knowledge, but this example serves as a great reminder that you should always think about the data you're fetching and fetch only what you need. Don't make Gilfoyle hate you.