Python Memory Management and Weak References¶
Chris Barker
PythonCHB@gmail.com
https://github.com/PythonCHB
Memory Management¶
You don’t want Python objects that are no longer in use taking up memory.
You don’t want to keep track of all that yourself.
Most “scripting languages” or “virtual machines” have some sort of automated memory management
Many ways to do “Garbage Collection”
Reference Counting¶
- How memory is managed is not part of the Python language spec:
Jython uses the JVM
IronPython uses the .NET CLR (the JVM and the CLR are both garbage collected)
PyPy uses Minimark
- The CPython interpreter uses a reference counting scheme:
Every time there is a new reference to a Python object, its reference count is increased
Every time a reference is removed – the count is decreased
When the reference count goes to zero: the object is deleted (memory freed)
What makes a reference?¶
Binding to a name:
x = an_object
Putting it in a container:
l.append(an_object)
Passing it to a function:
func(an_object)
Most of the time, you don’t need to think about this at all.
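A minimal sketch of the three kinds of reference listed above, assuming CPython (it relies on sys.getrefcount()). Absolute counts vary between Python versions, so it compares differences; the names an_object, container and func are just for illustration.

```python
import sys

an_object = object()                  # a fresh object bound to one name
base = sys.getrefcount(an_object)

x = an_object                         # 1. binding to another name
after_bind = sys.getrefcount(an_object)

container = []
container.append(an_object)           # 2. putting it in a container
after_append = sys.getrefcount(an_object)

def func(arg):
    # 3. while the call runs, the parameter 'arg' also refers to the object
    return sys.getrefcount(arg)

during_call = func(an_object)

print("binding adds:", after_bind - base)            # 1
print("appending adds:", after_append - after_bind)  # 1
print("count seen inside the call:", during_call)
```

The function-call count is printed rather than predicted, because how many temporary references the interpreter holds during a call differs between CPython versions.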
How do I see what’s going on?¶
import sys
sys.getrefcount(object)
NOTE: This will always return one more than you’d expect, as passing the object to the function increases its refcount by one:
In [5]: a = []
In [6]: sys.getrefcount(a)
Out[6]: 2
- The Heisenberg Uncertainty Principle:
you can’t observe it without altering it
Playing with References¶
(live demo)
In [7]: a = []
In [8]: sys.getrefcount(a)
Out[8]: 2
In [9]: b = a
In [10]: sys.getrefcount(a)
Out[10]: 3
In [11]: l = [1,2,3,a]
In [12]: sys.getrefcount(a)
Out[12]: 4
In [13]: del b
In [14]: sys.getrefcount(a)
Out[14]: 3
In [15]: del l
In [16]: sys.getrefcount(a)
Out[16]: 2
# function local variables
In [17]: def test(x):
....: print("x has a refcount of:", sys.getrefcount(x))
....:
In [18]: sys.getrefcount(a)
Out[18]: 2
In [19]: test(a)
x has a refcount of: 4
In [20]: sys.getrefcount(a)
Out[20]: 2
In [21]: x = 3
In [22]: sys.getrefcount(x)
Out[22]: 428
WHOA!!
(hint: interning…)
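The hint refers to CPython sharing a single object for common immutable values: small integers (-5 to 256) are pre-allocated and reused, and strings can be interned. A sketch assuming CPython and Python 3 (in Python 2, sys.intern is the built-in intern()):

```python
import sys

# int() called at runtime still returns the cached small-int object.
small_a = int("3")
small_b = int("3")
print(small_a is small_b)          # True: both names point at the cached 3

# Larger integers built at runtime are separate objects.
big_a = int("1000")
big_b = int("1000")
print(big_a is big_b)              # False

# Interned strings: equal values share one object.
s1 = sys.intern("hello world")
s2 = sys.intern("".join(["hello", " ", "world"]))
print(s1 is s2)                    # True

# Code all over the interpreter shares the cached 3, so its count is large.
# On CPython 3.12+ small ints are "immortal" and report a huge fixed value.
print(sys.getrefcount(3))
```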
The Power of Reference Counting¶
You don’t need to think about it most of the time.
Code that creates objects doesn’t need to delete them
Objects get deleted right away
They can “clean up” on deletion (files, for instance), and it happens right away.
Performance is predictable
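A small sketch of “cleanup happens right away”, assuming CPython. The Resource class and the events list are hypothetical; __del__ runs as soon as the last reference goes away.

```python
events = []

class Resource:
    """Hypothetical resource that records when it is cleaned up."""

    def __init__(self, name):
        self.name = name
        events.append("opened " + name)

    def __del__(self):
        # Called when the reference count reaches zero.
        events.append("closed " + self.name)

r = Resource("demo-file")
alias = r
del r           # 'alias' still refers to it, so nothing happens yet
print(events)   # ['opened demo-file']
del alias       # last reference gone: __del__ runs immediately on CPython
print(events)   # ['opened demo-file', 'closed demo-file']
```

On implementations that use a tracing garbage collector (Jython, IronPython, PyPy), __del__ may run much later, which is one reason files are usually closed explicitly or with a with block.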
The Limits of Reference Counting¶
There is a performance overhead on every operation. But the big one is:
Circular references
If a Python object references itself, directly or indirectly (e.g. it references another object that references the first object),
you have a circular reference: the reference counts in the cycle never drop to zero …
Circular References¶
In [8]: l1 = [1,] ; l2 = [2,]
In [9]: l1.append(l2); l2.append(l1)
In [10]: l1
Out[10]: [1, [2, [...]]]
In [11]: l2
Out[11]: [2, [1, [...]]]
In [12]: l1[1]
Out[12]: [2, [1, [...]]]
In [13]: l2[1][1][1]
Out[13]: [1, [2, [...]]]
(demo) – simple_circular.py
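The demo file isn’t reproduced here; this is an independent sketch assuming Python 3. Plain lists can’t be weakly referenced, so it uses a hypothetical Node class, and a weakref.ref to observe whether the object is still alive without keeping it alive.

```python
import gc
import weakref

class Node:
    """Hypothetical object that can point at another Node."""

    def __init__(self, name):
        self.name = name
        self.other = None

gc.disable()                      # rely on reference counting alone for now
try:
    a, b = Node("a"), Node("b")
    a.other = b                   # a -> b
    b.other = a                   # b -> a: a reference cycle
    watcher = weakref.ref(a)      # does not keep 'a' alive

    del a, b                      # no names left, but each Node refers to the other
    alive_after_del = watcher() is not None
    print("alive after del:", alive_after_del)                # True

    gc.collect()                  # runs even while automatic collection is disabled
    alive_after_collect = watcher() is not None
    print("alive after gc.collect():", alive_after_collect)   # False
finally:
    gc.enable()
```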
The Garbage Collector¶
Python 2.0 added a garbage collector.
It can find and clean up “unreachable” objects: reference cycles that nothing outside the cycle refers to.
It is turned on by default:
In [1]: import gc
In [2]: gc.isenabled()
Out[2]: True
or you can force a collection (it returns the number of unreachable objects found):
In [4]: gc.collect()
Out[4]: 64
But it can be slow, and doesn’t always work!
How does the garbage collector work?
Not a full “mark and sweep” type.
It searches for reference cycles – then cleans those up.
It doesn’t have to bother checking non-container types (ints, strings, etc.)
Faster, and not as dependent on having a clear “root” namespace.
Details here:
http://arctrix.com/nas/python/gc/ (or in the source!)
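You can ask the collector which objects it tracks with gc.is_tracked(). A sketch assuming CPython; Plain is a hypothetical empty class:

```python
import gc

class Plain:
    pass

# Atomic objects can't hold references to other objects, so they can't be
# part of a cycle and the collector doesn't track them.
print(gc.is_tracked(42))          # False
print(gc.is_tracked("a string"))  # False

# Containers and ordinary class instances can, so they are tracked.
print(gc.is_tracked([]))          # True
print(gc.is_tracked(Plain()))     # True
```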
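Big issue (in Python 2): cycles containing instances of classes that define a __del__ method are not cleaned up; the collector puts them in gc.garbage instead.
__del__ methods often act on references that may not be there if the objects are cleaned up in the wrong order, so the collector can’t pick a safe order.
(Since Python 3.4, PEP 442 lets the collector clean up such cycles.)
NOTE: you can work with gc.garbage (a list, not a function), but it is tricky and messy.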
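A sketch of inspecting gc.garbage, assuming Python 3: the gc.DEBUG_SAVEALL debug flag makes the collector append every unreachable object it finds to gc.garbage instead of freeing it, so you can look at what a cycle contained. Node is a hypothetical demo class.

```python
import gc

class Node:
    """Hypothetical demo class used to build a cycle."""

    def __init__(self):
        self.other = None

gc.collect()                        # start from a clean slate
gc.set_debug(gc.DEBUG_SAVEALL)      # keep unreachable objects instead of freeing them
try:
    a, b = Node(), Node()
    a.other, b.other = b, a         # build a cycle
    del a, b
    gc.collect()
    saved = [obj for obj in gc.garbage if isinstance(obj, Node)]
    print(len(saved), "Node objects saved in gc.garbage")   # 2
finally:
    gc.set_debug(0)                 # restore normal behaviour
    gc.garbage.clear()              # let the saved objects be collected later
```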
Tools¶
If these objects are no longer “reachable” – how do you find out what’s going on?
We saw sys.getrefcount(), but you need a reference to the object to use it.
You can see what the refcount is before you delete the last reference, but that isn’t always easy.
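If you do still hold a reference, gc.get_referrers() can tell you which tracked objects refer to it. A sketch; target and holder are hypothetical names, and this call is slow and meant for debugging only.

```python
import gc

target = {"payload": "something big"}   # the object we want to track down
holder = [target]                        # some container keeping it alive

# Scans every object the collector tracks and returns those that refer to
# 'target' (here: 'holder', plus the module namespace dict).
referrers = gc.get_referrers(target)
print(any(r is holder for r in referrers))   # True
```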
Process Memory Use¶
A really coarse way to find a memory leak is to see if the process memory is growing.
It can be subtle: Python (and the OS) play tricks to reuse memory, etc.
But if you have a “real” leak – you’ll see it. (Example to follow)
mem_check.py provides functions that report the memory use of the currently running process
(*nix and Windows code).
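mem_check.py itself isn’t shown here. As a swapped-in, cross-platform alternative, this sketch uses the standard tracemalloc module (Python 3.4+), which measures memory allocated by Python rather than total process memory. Node and make_cycles are hypothetical; the collector is disabled to simulate cycles that are never cleaned up.

```python
import gc
import tracemalloc

class Node:
    """Hypothetical object used to build reference cycles."""

    def __init__(self):
        self.other = None
        self.payload = bytearray(1000)   # make each object noticeably large

def make_cycles(n):
    """Create n two-node cycles and immediately drop the names."""
    for _ in range(n):
        a, b = Node(), Node()
        a.other, b.other = b, a

gc.disable()                 # simulate cycles that are never collected
tracemalloc.start()
try:
    before, _ = tracemalloc.get_traced_memory()   # (current, peak)
    make_cycles(1000)
    leaked, _ = tracemalloc.get_traced_memory()
    gc.collect()
    after_collect, _ = tracemalloc.get_traced_memory()
finally:
    tracemalloc.stop()
    gc.enable()

print("growth with gc disabled: %d bytes" % (leaked - before))
print("growth after gc.collect(): %d bytes" % (after_collect - before))
```

Steadily growing numbers across repeated runs of the same workload are the signature of a “real” leak.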
id checks¶
As it happens, in CPython the id() function returns the object’s memory address.
It’s really dangerous, but that means we can examine an object if we know its id, even if we don’t hold a reference to it.
Bill Bumgarner wrote a nifty extension module, “di”, that returns the Python object pointed to by an id (memory address):
http://www.friday.com/bbum/2007/08/24/python-di/
I added a function that returns the reference count of an object from its id.
https://github.com/PythonCHB/di_refcount
NOTE: it would be a really bad idea to use these in production code!
Examples¶
One of the examples uses the ref_by_id() function to see what’s going on with a circular reference and garbage collection.
More real examples in an IPython notebook:
CircularReferenceExample.ipynb
Or: circular.py
memcount.py is a test file that shows memory growth if circular references are not cleaned up.
(mem_check.py is code that reports process memory use.)
You can find this code in the main repo here:
https://github.com/PythonCHB/PythonTopics/tree/master/Sources/code/weak_references
Weak References¶
For times when you don’t want to keep objects alive, Python provides “weak references” – we saw this in the examples.
(https://docs.python.org/2/library/weakref.html)
The weakref module provides weak containers:
WeakKeyDictionary
WeakValueDictionary
WeakSet
Proxy objects (weakref.proxy)
act much like regular references – client code doesn’t know the difference
Reference objects (weakref.ref)
for when you want to control what happens when the referenced object is gone (check whether it is still alive, or register a callback).
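A short sketch of weakref.ref (with a callback) and weakref.proxy, assuming CPython and Python 3. Big is a hypothetical class: lists and ints can’t be weakly referenced, but instances of ordinary classes can.

```python
import weakref

class Big:
    """Hypothetical stand-in for an expensive object."""

    def __init__(self, name):
        self.name = name

notices = []

obj = Big("data")
# The callback is called with the weakref object once 'obj' is gone.
ref = weakref.ref(obj, lambda r: notices.append("object is gone"))
proxy = weakref.proxy(obj)

print(ref() is obj)    # True: calling the ref returns the object while it lives
print(proxy.name)      # data: the proxy forwards attribute access

del obj                # last strong reference gone (freed at once on CPython)
print(ref())           # None
print(notices)         # ['object is gone']
try:
    proxy.name
except ReferenceError:
    print("proxy is dead")
```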
Exercise¶
Build a “weak cache”:
For large objects that are expensive to create:
Use a WeakValueDictionary to hold references to (probably large) objects.
When the client requests an object that doesn’t exist – one is created, returned, and cached (weakly).
If the object is in the cache, it is returned.
When no other references exist to the object, it is NOT retained by the cache.
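One possible solution sketch, assuming CPython and Python 3. WeakCache and ExpensiveObject are hypothetical names; the factory is whatever builds your expensive objects.

```python
import weakref

class ExpensiveObject:
    """Hypothetical large object that is costly to build."""

    def __init__(self, key):
        self.key = key
        self.data = bytearray(10**6)   # stand-in for the expensive part

class WeakCache:
    """Hands out shared objects but does not keep them alive itself."""

    def __init__(self, factory):
        self._factory = factory
        self._cache = weakref.WeakValueDictionary()

    def get(self, key):
        obj = self._cache.get(key)     # None if never cached or already collected
        if obj is None:
            obj = self._factory(key)   # create it...
            self._cache[key] = obj     # ...and remember it weakly
        return obj

    def __len__(self):
        return len(self._cache)

cache = WeakCache(ExpensiveObject)
first = cache.get("a")
second = cache.get("a")
print(first is second)   # True: served from the cache
print(len(cache))        # 1
del first, second        # no strong references left
print(len(cache))        # 0 on CPython: the entry vanished with the object
```

get() binds the looked-up value to a local before testing it, so the object can’t disappear between the check and the return, which could happen with a separate “key in cache” test followed by cache[key].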