Avoiding collisions in Go context keys

· 8 min

Along with propagating deadlines and cancellation signals, Go’s context package can also carry request-scoped values across API boundaries and processes.

There are only two public API constructs associated with context values:

func WithValue(parent Context, key, val any) Context
Value(key any) any // a method on the Context interface

WithValue accepts any comparable value as the key and any value at all as the value. The key determines how the stored value is looked up later, and the value can be any data you want to pass through the call chain.

Why does Go's io.Reader have such a weird signature?

· 3 min

I’ve always found the signature of io.Reader a bit odd:

type Reader interface {
    Read(p []byte) (n int, err error)
}

Why take a byte slice and write data into it? Wouldn’t it be simpler to create the slice inside Read, load the data, and return it instead?

// Hypothetical; what I *thought* it should be
Read() (p []byte, err error)

This felt more intuitive to me: you call Read, and it gives you a slice filled with data, no need to pass anything in.

Quicker startup with module-level __getattr__

· 4 min

This morning, someone on Twitter pointed me to PEP 562, which introduces __getattr__ and __dir__ at the module level. While __dir__ controls which attributes are returned when calling dir(module), __getattr__ is the more interesting addition.

The __getattr__ method in a module works similarly to how it does in a Python class. For example:

class Cat:
    def __getattr__(self, name: str) -> str:
        if name == "voice":
            return "meow!!"
        raise AttributeError(f"Attribute {name} does not exist")


# Try to access 'voice' on Cat
cat = Cat()
cat.voice  # Returns "meow!!"

# Raises AttributeError: Attribute something_else does not exist
cat.something_else

In this class, __getattr__ defines what happens when specific attributes are accessed, allowing you to manage how missing attributes behave. Since Python 3.7, you can also define __getattr__ at the module level to handle attribute access on the module itself.
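For illustration, here's a hedged sketch of the module-level variant: a synthetic module whose __getattr__ defers an import until the attribute is first touched. The module name "lazylib" and attribute "heavy" are made up, and json stands in for a slow-to-import dependency.

```python
import types

# Build a throwaway module object; in real code, you'd simply define
# __getattr__ at the top level of the module's .py file.
lazylib = types.ModuleType("lazylib")


def __getattr__(name: str):
    if name == "heavy":
        import json  # stand-in for a slow-to-import dependency

        return json
    raise AttributeError(f"module 'lazylib' has no attribute {name!r}")


lazylib.__getattr__ = __getattr__

# The expensive import only happens here, on first access.
print(lazylib.heavy.dumps({"a": 1}))
```

This is how libraries can keep `import lazylib` fast while still exposing heavyweight submodules as plain attributes.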

Dysfunctional options pattern in Go

· 7 min

Ever since Rob Pike published his post on the functional options pattern, there’s been no shortage of blogs, talks, and comments on whether it improves or obfuscates configuration ergonomics.

While the necessity of such a pattern is quite evident in a language that lacks default arguments in functions, more often than not, it needlessly complicates things. The situation gets worse if you have to maintain a public API where multiple configurations are controlled in this manner.
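For readers who haven't seen it, here is a minimal sketch of the pattern under discussion (the Server, WithHost, and WithPort names are illustrative, not from any real API):

```go
package main

import "fmt"

// Server holds the configuration being built up.
type Server struct {
	host string
	port int
}

// Option is a function that mutates the configuration.
type Option func(*Server)

func WithHost(h string) Option { return func(s *Server) { s.host = h } }
func WithPort(p int) Option    { return func(s *Server) { s.port = p } }

// NewServer applies defaults, then each supplied option in order.
func NewServer(opts ...Option) *Server {
	s := &Server{host: "localhost", port: 8080} // defaults
	for _, opt := range opts {
		opt(s)
	}
	return s
}

func main() {
	s := NewServer(WithPort(9000))
	fmt.Println(s.host, s.port) // localhost 9000
}
```

Each option is a closure that mutates the config struct; that layer of indirection, multiplied across many settings, is exactly the kind of ceremony the critique is aimed at.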

Memory leakage in Python descriptors

· 5 min

Unless I’m hand-rolling my own ORM-like feature or validation logic, I rarely need to write custom descriptors in Python. The built-in descriptor magics like @classmethod, @property, @staticmethod, and vanilla instance methods usually get the job done. However, every time I need to dig my teeth into descriptors, I reach for Raymond Hettinger’s fantastic Descriptor HowTo Guide. You should definitely set aside the time to read it if you haven’t already. It has immensely deepened my understanding of how many of the fundamental language constructs are wired together underneath.
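As a quick refresher, here is a hedged sketch of the validation-style descriptor the guide walks through in more depth (the Positive and Order names are my own, not from the guide):

```python
class Positive:
    """A data descriptor that only accepts positive values."""

    def __set_name__(self, owner, name):
        # Remember the attribute name we were assigned to on the class.
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class, not an instance
        return obj.__dict__[self.name]

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError(f"{self.name} must be positive")
        obj.__dict__[self.name] = value


class Order:
    quantity = Positive()  # validation runs on every assignment

    def __init__(self, quantity: int) -> None:
        self.quantity = quantity


order = Order(3)
print(order.quantity)  # 3
```

Assigning a non-positive value, as in `Order(-1)`, raises ValueError, which is the kind of ORM-like field validation the intro alludes to.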

Save models with update_fields for better performance in Django

· 3 min

TIL that you can specify update_fields while saving a Django model to generate a leaner underlying SQL query. This yields better performance when updating multiple objects in a tight loop. To test that, I’m opening an IPython shell with the python manage.py shell -i ipython command and creating a few user objects with the following lines:

In [1]: from django.contrib.auth.models import User

In [2]: for i in range(1000):
   ...:     fname, lname = f'foo_{i}', f'bar_{i}'
   ...:     User.objects.create(
   ...:         first_name=fname, last_name=lname, username=fname)
   ...:

Here’s the underlying query Django generates when you’re trying to save a single object:

Dissecting an outage caused by eager-loading file content

· 5 min

Python makes it freakishly easy to load the whole content of any file into memory and process it afterward. This is one of the first things that’s taught to people who are new to the language. While the following snippet might be frowned upon by many, it’s definitely not uncommon:

# src.py

with open("foo.csv", "r") as f:
    # Load whole content of the file as a string in memory.
    f_content = f.read()

    # ...do your processing here.
    ...

Adopting this pattern as the default way of handling files isn’t the most terrible thing in the world for sure. It’s also often the preferred way of dealing with image files or blobs. However, eagerly loading file content is only okay as long as the file is smaller than the available RAM of the host system.
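When the file might be large, the lazier alternative is to iterate over it, so only one line is resident in memory at a time. A self-contained sketch, using a throwaway temp file standing in for foo.csv:

```python
import os
import tempfile

# Create a throwaway file standing in for "foo.csv".
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w") as f:
    f.write("a,b\n1,2\n3,4\n")

# Lazy alternative: a file object is an iterator of lines, so memory
# usage stays flat no matter how big the file grows.
line_count = 0
with open(path) as f:
    for line in f:
        line_count += 1  # ...do your per-line processing here.

print(line_count)
os.remove(path)
```

The same idea extends to csv.reader or any consumer that accepts an iterable of lines instead of one big string.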

Prefer urlsplit over urlparse to destructure URLs

· 2 min

TIL from this video by Anthony Sottile that Python’s urlparse is quite slow at parsing URLs. I’ve always used urlparse to destructure URLs and didn’t know that there’s a faster alternative to this in the standard library. The official documentation also recommends the alternative function.

The urlparse function splits a supplied URL into its separate components and returns a ParseResult object. Consider this example:

In [1]: from urllib.parse import urlparse

In [2]: url = "https://httpbin.org/get?q=hello&r=22"

In [3]: urlparse(url)
Out[3]: ParseResult(
        scheme='https', netloc='httpbin.org',
        path='/get', params='', query='q=hello&r=22',
        fragment=''
    )

You can see how the function disassembles the URL and builds a ParseResult object from the URL components. Along with this, the urlparse function can also parse an obscure type of URL that you’ll most likely never need. If you look closely at the previous example, you’ll see that there’s a params field in the ParseResult object. This field gets parsed whether you need it or not, and that adds some overhead. The params field will be populated if you have a URL like this:
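Path segments can carry semicolon-delimited parameters, which urlparse splits out but urlsplit leaves untouched in the path. Comparing the two on such a URL (made up for illustration):

```python
from urllib.parse import urlparse, urlsplit

url = "https://httpbin.org/get;tag=v1?q=hello"

# urlparse does the extra work of extracting the path parameters.
print(urlparse(url).params)  # 'tag=v1'

# urlsplit skips that step and leaves them in the path.
print(urlsplit(url).path)  # '/get;tag=v1'
```

Skipping the params extraction is exactly where urlsplit saves time, and SplitResult otherwise exposes the same scheme, netloc, path, query, and fragment fields.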

Pre-allocated lists in Python

· 4 min

In CPython, a list stores pointers to its elements rather than the element values themselves. This is evident from the struct that represents a list in CPython’s C source:

// Fetched from the CPython main branch. Comments trimmed for brevity.
typedef struct {
    PyObject_VAR_HEAD
    PyObject **ob_item;   /* Pointer to the array of element pointers. */
    Py_ssize_t allocated; /* Number of slots currently allocated. */
} PyListObject;

Even an empty list is a full-fledged PyObject and occupies some memory:

from sys import getsizeof

l = []

print(getsizeof(l))

This returns:
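The exact byte count varies by CPython version, so rather than hard-coding it, a hedged comparison shows the pre-allocation the title refers to:

```python
from sys import getsizeof

empty = []
prealloc = [None] * 1_000  # reserves 1,000 pointer slots up front

# The pre-allocated list is larger by roughly 1,000 pointer widths
# (8 bytes each on a 64-bit build); exact totals vary by version.
print(getsizeof(prealloc) - getsizeof(empty))
```

Because all 1,000 slots exist from the start, filling them by index assignment avoids the incremental resizes that repeated append calls trigger.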

Don't wrap instance methods with 'functools.lru_cache' decorator in Python

· 6 min

Recently, I fell into this trap while trying to speed up a slow instance method by caching its result.

When you decorate an instance method with the functools.lru_cache decorator, instances of the class encapsulating that method never get garbage collected within the lifetime of the process holding them.

Let’s consider this example:

# src.py
import functools
import time
from typing import TypeVar

Number = TypeVar("Number", int, float, complex)


class SlowAdder:
    def __init__(self, delay: int = 1) -> None:
        self.delay = delay

    @functools.lru_cache
    def calculate(self, *args: Number) -> Number:
        time.sleep(self.delay)
        return sum(args)

    def __del__(self) -> None:
        print("Deleting instance ...")


# Create a SlowAdder instance.
slow_adder = SlowAdder(2)

# Measure performance.
start_time = time.perf_counter()
# ----------------------------------------------
result = slow_adder.calculate(1, 2)
# ----------------------------------------------
end_time = time.perf_counter()
print(f"Calculation took {end_time-start_time} seconds, result: {result}.")


start_time = time.perf_counter()
# ----------------------------------------------
result = slow_adder.calculate(1, 2)
# ----------------------------------------------
end_time = time.perf_counter()
print(f"Calculation took {end_time-start_time} seconds, result: {result}.")

Here, I’ve created a simple SlowAdder class that accepts a delay value; the calculate method sleeps for delay seconds and then computes the sum of its inputs. To avoid slow recalculation for the same arguments, I wrapped the calculate method in the lru_cache decorator. The __del__ method notifies us when garbage collection has successfully cleaned up an instance of the class.
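One commonly suggested workaround (not necessarily the fix this post ultimately lands on) is to build the cache per instance in __init__, so the cache's lifetime is tied to the instance rather than to the class:

```python
import functools
import gc
import time
import weakref


class SlowAdder:
    def __init__(self, delay: float = 1.0) -> None:
        self.delay = delay
        # Per-instance cache: the instance and its cache form a
        # reference cycle, which the garbage collector can break,
        # instead of an entry in a class-level cache that never dies.
        self.calculate = functools.lru_cache(maxsize=None)(self._calculate)

    def _calculate(self, *args):
        time.sleep(self.delay)
        return sum(args)


adder = SlowAdder(0.01)
adder.calculate(1, 2)  # slow: computed and cached on this instance
adder.calculate(1, 2)  # fast: served from this instance's cache

ref = weakref.ref(adder)
del adder
gc.collect()  # the instance<->cache cycle is collectible now
print(ref() is None)  # True: the instance was reclaimed
```

The arguments still need to be hashable, but the cache now disappears together with the instance instead of pinning it for the life of the process.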