Recently, fell into this trap as I wanted to speed up a slow instance method by caching it.
When you decorate an instance method with
functools.lru_cache
decorator, the instances of the class encapsulating that method never get garbage collected within the lifetime of the process holding them.
Let’s consider this example:
# src.py
import functools
import time
from typing import TypeVar
Number = TypeVar("Number", int, float, complex)
class SlowAdder:
def __init__(self, delay: int = 1) -> None:
self.delay = delay
@functools.lru_cache
def calculate(self, *args: Number) -> Number:
time.sleep(self.delay)
return sum(args)
def __del__(self) -> None:
print("Deleting instance ...")
# Create a SlowAdder instance.
slow_adder = SlowAdder(2)
# Measure performance.
start_time = time.perf_counter()
# ----------------------------------------------
result = slow_adder.calculate(1, 2)
# ----------------------------------------------
end_time = time.perf_counter()
print(f"Calculation took {end_time-start_time} seconds, result: {result}.")
start_time = time.perf_counter()
# ----------------------------------------------
result = slow_adder.calculate(1, 2)
# ----------------------------------------------
end_time = time.perf_counter()
print(f"Calculation took {end_time-start_time} seconds, result: {result}.")
Here, I’ve created a simple SlowAdder
class that accepts a delay
value; then it sleeps
for delay
seconds and calculates the sum of the inputs in the calculate
method. To avoid
this slow recalculation for the same arguments, the calculate
method was wrapped in the
lru_cache
decorator. The __del__
method notifies us when the garbage collection has
successfully cleaned up instances of the class.
If you run this program, it’ll print this:
Calculation took 2.0021052900010545 seconds, result: 3.
Calculation took 5.632002284983173e-06 seconds, result: 3.
Deleting instance ...
You can see that the lru_cache
decorator is doing its job. The second call to the
calculate
method with the same argument took noticeably less time compared to the first
one. In the second case, the lru_cache
decorator is just doing a simple dictionary lookup.
This is all good but the instances of the ShowAdder
class never get garbage collected in
the lifetime of the program. Let’s prove that in the next section.
Garbage collector can’t clear up the affected instances
If you execute the above snippet with an -i
flag, we can interactively prove that no
garbage collection takes place. Let’s do it:
$ python -i src.py
Calculation took 2.002104839997628 seconds, result: 3.
Calculation took 5.566998879658058e-06 seconds, result: 3.
>>> import gc
>>>
>>> slow_adder.calculate(1,2)
3
>>> slow_adder = None
>>>
>>> gc.collect()
0
>>>
Here on the REPL, you can see that I’ve reassigned slow_adder
to None and then explicitly
triggered the garbage collector. However, we don’t see the message in the __del__
method
printed here and the output of gc.collect()
is 0. This implies that something is holding a
reference to the slow_adder
instance and the garbage collector can’t clear up the object.
Let’s inspect who has that reference:
$ python -i src.py
Calculation took 2.00233274600032 seconds, result: 3.
Calculation took 5.453999619930983e-06 seconds, result: 3.
>>> slow_adder.calculate.cache_info()
CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)
>>> slow_adder.calculate(1,2)
3
>>> slow_adder.calculate.cache_info()
CacheInfo(hits=2, misses=1, maxsize=128, currsize=1)
>>> slow_adder.calculate.cache_clear()
>>> slow_adder = None
Deleting instance ...
>>>
The cache_info()
is showing that the cache container keeps a reference to the instance
until it gets cleared. When I manually cleared the cache and reassigned the variable
slow_adder
to None
, only then did the garbage collector remove the instance. By default,
the size of the lru_cache
is 128 but if I had applied lru_cache(maxsize=None)
, that
would’ve kept the cache forever and the garbage collector would wait for the reference count
to drop to zero but that’d never happen within the lifetime of the process.
This can be dangerous if you create millions of instances and they don’t get garbage collected naturally. It can overflow your working memory and cause the process to crash! I accidentally did it where the infected class was being instantiated millions of times via HTTP API requests.
The solution
To solve this, we’ll have to make the cache containers local to the instances so that the reference from cache to the instance gets scraped off with the instance. Here’s how you can do that:
# src_2.py
import functools
import time
from typing import TypeVar
Number = TypeVar("Number", int, float, complex)
class SlowAdder:
def __init__(self, delay: int = 1) -> None:
self.delay = delay
self.calculate = functools.lru_cache()(self._calculate)
def _calculate(self, *args: Number) -> Number:
time.sleep(self.delay)
return sum(args)
def __del__(self) -> None:
print("Deleting instance ...")
The only difference here is—instead of decorating the method directly, I called the
decorator function on the _calculate
method just as a regular function and saved the
result as an instance variable named calculate
. The instances of this class get garbage
collected as usual.
$ python -i src.py
>>> slow_adder = SlowAdder(2)
>>> slow_adder.calculate(1,2)
3
>>> slow_adder.calculate.cache_info()
CacheInfo(hits=0, misses=1, maxsize=128, currsize=1)
>>> import gc
>>> slow_adder = None
>>> gc.collect()
Deleting instance ...
11
Notice that this time, clearing out the cache wasn’t necessary. I had to call gc.collect()
to invoke explicit garbage collection. That’s because this shenanigan creates cyclical
references and the GC needs to do some special magic to clear the memory. In real code,
Python interpreter will clean this up for you in the background without you having to call
the GC.
The self dilemma
Even after applying the solution above, a weird thing happens in the case of instance
methods. Let’s run the src_2.py
script interactively to demonstrate that:
$ python -i src_2.py
>>> slow_adder = SlowAdder(2)
>>> slow_adder.calculate(1,2)
>>> slow_adder
<__main__.SlowAdder object at 0x7f92595f9b40>
>>> slow_adder_2 = SlowAdder(2)
>>> slow_adder_2.calculate(1,2)
3
>>> slow_adder.calculate.cache_info()
CacheInfo(hits=1, misses=2, maxsize=128, currsize=2)
>>> slow_adder_2.calculate.cache_info()
CacheInfo(hits=1, misses=2, maxsize=128, currsize=2)
>>>
Here, I’ve created another instance of the SlowAdder
class and called calculate
with the
same arguments. But whenever I called the calculate
method on the slow_adder_2
instance
with the same parameters as before, the first time, it recalculated it instead of returning
the result from the cache. How come!
Underneath, the lru_cache
decorator uses a dictionary to cache the calculated values. A
hash function is applied1 to all the parameters of the target function to build the key
of the dictionary, and the value is the return value of the function when those parameters
are the inputs. This means, the first argument self
also gets included while building the
cache key. However, for different instances, this self
object is going to be different and
that makes the hashed key of the cache different for every instance even if the other
parameters are the same.
But what about class methods & static methods
Class methods and static methods don’t suffer from the above issues as they don’t have any
ties to their respective instances. In their case, the cache container is local to the
class, not the instances. Here, you can stack the lru_cache
decorator as usual. Let’s
demonstrate that for classmethod
first:
# src_3.py
import functools
import time
class Foo:
@classmethod
@functools.lru_cache
def bar(cls, delay: int) -> int:
# Do something with the cls.
cls.delay = delay
time.sleep(delay)
return 42
def __del__(self) -> None:
print("Deleting instance ...")
foo_1 = Foo()
foo_2 = Foo()
start_time = time.perf_counter()
# ----------------------------------------------
result = foo_1.bar(2)
# ----------------------------------------------
end_time = time.perf_counter()
print(f"Calculation took {end_time - start_time} seconds, result: {result}.")
start_time = time.perf_counter()
# ----------------------------------------------
result = foo_2.bar(2)
# ----------------------------------------------
end_time = time.perf_counter()
print(f"Calculation took {end_time - start_time} seconds, result: {result}.")
You can inspect the garbage collection behavior here:
$ python src_3.py
Calculation took 2.0022965140015003 seconds, result: 42.
Calculation took 4.4819971662946045e-06 seconds, result: 42.
>>> foo_1 = None
Deleting instance ...
>>>
Static methods behave exactly the same. You can use the lru_cache
decorator in similar
fashion as below:
import functools
import time
class Foo:
@staticmethod
@functools.lru_cache
def bar(delay: int) -> int:
return 42
def __del__(self) -> None:
print("Deleting instance ...")
Recent posts
- Function types and single-method interfaces in Go
- SSH saga
- Injecting Pytest fixtures without cluttering test signatures
- Explicit method overriding with @typing.override
- Quicker startup with module-level __getattr__
- Docker mount revisited
- Topological sort
- Writing a circuit breaker in Go
- Discovering direnv
- Notes on building event-driven systems