Patch where the object is used

· 3 min

I was reading Ned Batchelder’s blog post Why your mock doesn’t work, and it triggered an epiphany about a testing pattern that I’d been using for a while without knowing there’s an aphorism for the practice.

Patch where the object is used, not where it’s defined.

To understand it, consider the example below. Here, you have a module containing a function that fetches data from some fictitious database.
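The listing cuts off before the example, so here’s a self-contained sketch of the idea. The module and function names (db, service, fetch_user, get_name) are made up, and the two “files” are built in memory with types.ModuleType purely so the snippet runs on its own:

```python
import sys
import types
from unittest import mock

# Two tiny modules built in memory for illustration; in a real project
# these would be separate files (all names here are made up).
db = types.ModuleType("db")
db.fetch_user = lambda user_id: {"name": "from-the-db"}
sys.modules["db"] = db

service = types.ModuleType("service")
sys.modules["service"] = service
# service.py does `from db import fetch_user`, copying the reference.
exec(
    "from db import fetch_user\n"
    "def get_name(user_id):\n"
    "    return fetch_user(user_id)['name']\n",
    service.__dict__,
)

# Patching where the function is *defined* doesn't affect the copy
# that service already holds...
with mock.patch("db.fetch_user", return_value={"name": "stub"}):
    print(service.get_name(1))  # from-the-db

# ...but patching where it's *used* does.
with mock.patch("service.fetch_user", return_value={"name": "stub"}):
    print(service.get_name(1))  # stub
```

The `from db import fetch_user` statement binds a second name in service’s namespace, which is why only patching that name changes what get_name sees.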

Partially assert callable arguments with 'unittest.mock.ANY'

· 2 min

I just found out that you can use Python’s unittest.mock.ANY to make assertions about certain arguments in a mock call without caring about the others. This can be handy when you want to test how a callable is called but only want to pin down some of its arguments. Consider the following example:

# src.py

import random
import time


def fetch() -> list[float]:
    # Simulate fetching data from a database.
    time.sleep(2)
    return [random.random() for _ in range(4)]


def add(w: float, x: float, y: float, z: float) -> float:
    return w + x + y + z


def process() -> float:
    return add(*fetch())

Let’s say we only want to test the process function. But process ultimately depends on fetch, which has two side effects: it returns pseudo-random values and sleeps for 2 seconds to simulate a network call. Since we only care about process, we’ll mock the other two functions. Here’s how unittest.mock.ANY can make life easier:
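Here’s the gist of the trick in a self-contained sketch. The full test would patch fetch and add on the module as above; here a Mock wrapping add stands in so the snippet runs on its own:

```python
import random
from unittest import mock


def add(w: float, x: float, y: float, z: float) -> float:
    return w + x + y + z


# Wrap the real function so calls are recorded but still delegated.
spy = mock.Mock(wraps=add)

# Pretend these values came from the mocked-out fetch().
data = [1.0, random.random(), random.random(), random.random()]
result = spy(*data)

# Pin down only the first argument; mock.ANY matches the rest,
# which is exactly what we want for pseudo-random values.
spy.assert_called_once_with(1.0, mock.ANY, mock.ANY, mock.ANY)
print(result == sum(data))  # True
```

mock.ANY compares equal to anything, so the assertion passes no matter what the random values were, while still failing if the first argument or the call count is wrong.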

Apply constraints with 'assert' in Python

· 4 min

Whenever I need to apply runtime constraints on a value while building an API, I usually compare the value to an expected range and raise a ValueError if it falls outside that range. For example, let’s define a function that throttles some fictitious operation. The throttle function limits the number of times the operation can be performed via the throttle_after parameter, which defines the number of iterations after which the operation is halted. The current_iter parameter tracks how many times the operation has been performed so far. Here’s the implementation:
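A sketch of that ValueError-based version follows; the exact signature is an assumption, so the post’s code may differ in detail:

```python
def throttle(current_iter: int, throttle_after: int) -> None:
    # Compare against the expected range and bail out if violated.
    if current_iter >= throttle_after:
        raise ValueError(
            f"Operation exceeded the limit of {throttle_after} iterations."
        )


for i in range(3):
    throttle(current_iter=i, throttle_after=5)  # OK, within the limit

try:
    throttle(current_iter=6, throttle_after=5)
except ValueError as exc:
    print(exc)
```

The post’s titular alternative is to express the same constraint as an assertion, e.g. `assert current_iter < throttle_after, "..."`, trading an explicit exception for a one-liner check.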

Stream process a CSV file in Python

· 6 min

A common bottleneck for processing large data files is memory. Downloading the file and loading the entire content at once is surely the easiest way to go. However, you’ll likely hit OOM errors quickly. Whenever I have to deal with large data files that need to be downloaded and processed, I prefer to stream the content line by line and use multiple processes to consume it concurrently.
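The streaming half of that workflow can be sketched like this. A StringIO stands in for a file streamed over the network, and the chunk size is arbitrary; in the real workflow each batch would be handed to a worker in a process pool:

```python
import csv
import io
from itertools import islice


def stream_rows(lines):
    # Parse rows lazily; only one row lives in memory at a time.
    yield from csv.DictReader(lines)


def batched(rows, size):
    # Group rows into fixed-size chunks to hand off to worker processes.
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


# Stand-in for a file streamed over the network.
lines = io.StringIO("id,price\n1,10.5\n2,20.0\n3,7.25\n")

for batch in batched(stream_rows(lines), size=2):
    print([row["id"] for row in batch])
# ['1', '2']
# ['3']
```

Because both stream_rows and batched are generators, memory usage stays bounded by the batch size regardless of how large the file is.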

Bulk operations in Django with process pool

· 4 min

I’ve rarely been able to take advantage of Django’s bulk_create / bulk_update APIs in production applications, especially when I need to create or update multiple complex objects with a script. Often, these complex objects trigger a chain of signals or need non-trivial setup before any operation can be performed on them.

The issue is that bulk_create / bulk_update don’t trigger these signals or expose any hooks to run setup code. The Django docs cover the bulk_create caveats in detail. Here are a few of them:
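The shape of the workaround is to process each object individually, so save() and its signals fire, and parallelize with a pool. A thread pool is used in this sketch so it runs anywhere; the post’s technique uses concurrent.futures.ProcessPoolExecutor the same way, and the save_one stand-in is made up:

```python
from concurrent.futures import ThreadPoolExecutor


def save_one(obj):
    # Stand-in for the per-object work: in the real script this would
    # call obj.save(), which fires signals and runs custom setup.
    return f"saved:{obj}"


objs = [f"obj{i}" for i in range(5)]

# map() preserves input order, so results line up with objs.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(save_one, objs))

print(results)
```

Swapping ThreadPoolExecutor for ProcessPoolExecutor keeps the same API but sidesteps the GIL for CPU-bound per-object work; the worker function then has to be picklable (defined at module top level).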

Read a CSV file from s3 without saving it to the disk

· 4 min

I frequently have to write ad-hoc scripts that download a CSV file from AWS S3, do some processing on it, and then create or update objects in the production database using the parsed information from the file. In Python, it’s trivial to download any file from S3 via boto3, and then the file can be read with the csv module from the standard library. However, these scripts are usually run from a separate script server, and I prefer not to clutter the server’s disk with random CSV files. Loading the S3 file directly into memory and reading its contents isn’t difficult, but the process has some subtleties. I do this often enough to justify documenting the workflow here.
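A minimal sketch of the in-memory approach, with io.BytesIO standing in for the boto3 response body (the bucket and key names are made up, and depending on the boto3 version the streaming body may need extra wrapping):

```python
import csv
import io

# In the real script the byte stream comes from boto3, roughly:
#   body = boto3.client("s3").get_object(
#       Bucket="my-bucket", Key="data.csv")["Body"]
# Here a BytesIO stands in for that streaming body.
body = io.BytesIO(b"name,qty\npen,3\nbook,2\n")

# Decode bytes to text on the fly instead of saving to disk first.
wrapper = io.TextIOWrapper(body, encoding="utf-8", newline="")
rows = list(csv.DictReader(wrapper))
print(rows)
```

The TextIOWrapper is the key piece: csv expects text, the S3 body yields bytes, and the wrapper bridges the two without ever materializing a file on disk.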

Safer 'operator.itemgetter' in Python

· 6 min

Python’s operator.itemgetter is quite versatile. It works on pretty much any iterable or map-like object and lets you fetch elements from them. The following snippet shows how you can use it to sort a list of tuples by the first element of each tuple:

In [2]: from operator import itemgetter
   ...:
   ...: l = [(10, 9), (1, 3), (4, 8), (0, 55), (6, 7)]
   ...: l_sorted = sorted(l, key=itemgetter(0))

In [3]: l_sorted
Out[3]: [(0, 55), (1, 3), (4, 8), (6, 7), (10, 9)]

Here, the itemgetter callable selects the first element of every tuple in the list, and the sorted function uses those values to order the elements. This is also faster than passing a lambda function to the key parameter to do the sorting:
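For comparison, the lambda version of the same sort looks like this:

```python
l = [(10, 9), (1, 3), (4, 8), (0, 55), (6, 7)]

# Equivalent sort with a lambda; itemgetter(0) does the same thing in C,
# avoiding a Python-level function call per element, which is why it
# tends to be faster.
l_sorted = sorted(l, key=lambda pair: pair[0])
print(l_sorted)  # [(0, 55), (1, 3), (4, 8), (6, 7), (10, 9)]
```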

Return JSON error payload instead of HTML text in DRF

· 3 min

At my workplace, we have a large Django monolith that powers the main website and works as the primary REST API server at the same time. We use Django Rest Framework (DRF) to build and serve the API endpoints. This means that whenever there’s an error, we have to return different error formats to website and API users based on the incoming request header.

The default DRF configuration returns a JSON response when the system experiences an HTTP 400 (bad request) error. However, the server returns an HTML error page to API users whenever an HTTP 403 (forbidden), HTTP 404 (not found), or HTTP 500 (internal server error) occurs. This is suboptimal; JSON APIs should never return HTML text when something goes wrong. The website, on the other hand, needs those errors rendered as HTML pages.
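One way to cover the 404/500 cases is Django’s custom error handler hooks in the root URLconf. This is a sketch of the general mechanism, not necessarily the post’s exact approach, and it doesn’t yet branch on the request header as the post requires:

```python
# urls.py (root URLconf) -- a sketch; Django picks up module-level
# names handler404 / handler500 as the error views.
from django.http import JsonResponse


def handler404(request, exception=None):
    return JsonResponse({"error": "Not found."}, status=404)


def handler500(request):
    return JsonResponse({"error": "Internal server error."}, status=500)
```

A fuller solution would inspect the Accept header (or URL prefix) inside these views and fall back to Django’s default HTML error pages for website traffic.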

Decoupling producers and consumers of iterables with generators in Python

· 5 min

Generators can help you decouple the production and consumption of iterables - making your code more readable and maintainable. I learned this trick a few years back from David Beazley’s Generator tricks for systems programmers slides. Consider this example:

# src.py
from __future__ import annotations

import time
from typing import NoReturn


def infinite_counter(start: int, step: int) -> NoReturn:
    i = start
    while True:
        time.sleep(1)  # Not to flood stdout
        print(i)
        i += step


infinite_counter(1, 2)
# Prints
# 1
# 3
# 5
# ...

Now, how would you decouple the print statement from infinite_counter? Since the function never returns, you can’t collect the outputs in a container, return it, and print its elements in another function. You might be wondering why you’d even need to do this. I can think of two reasons:
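One way out is the generator refactor the post builds toward: the producer yields values and knows nothing about printing, while a separate consumer decides what to do with them. A sketch:

```python
import time
from itertools import islice
from typing import Iterator


def infinite_counter(start: int, step: int) -> Iterator[int]:
    # Produce values; printing is no longer this function's job.
    i = start
    while True:
        yield i
        i += step


def consume(it: Iterator[int]) -> None:
    for value in it:
        time.sleep(1)  # Not to flood stdout
        print(value)


# consume(infinite_counter(1, 2))  # prints 1, 3, 5, ... forever
print(list(islice(infinite_counter(1, 2), 3)))  # [1, 3, 5]
```

Because the generator is lazy, a consumer can also just take a finite slice of the infinite stream, as the islice call shows.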

Pre-allocated lists in Python

· 4 min

In CPython, a list stores pointers to its elements rather than the element values themselves. This is evident from the C struct that represents a list in CPython’s source:

// Fetched from CPython main branch. Removed comments for brevity.
typedef struct {
    PyObject_VAR_HEAD
    PyObject **ob_item; /* Pointer reference to the element. */
    Py_ssize_t allocated;
} PyListObject;

An empty list builds a PyObject and occupies some memory:

from sys import getsizeof

l = []

print(getsizeof(l))

This returns: