Structural subtyping in Python

· 8 min

I love using Go’s interface feature to declaratively define my public API structure. Consider this example:

package main

import (
    "fmt"
)

// Declare the interface.
type Geometry interface {
    area() float64
    perim() float64
}

// Struct that represents a rectangle.
type rect struct {
    width, height float64
}

// Method to calculate the area of a rectangle instance.
func (r *rect) area() float64 {
    return r.width * r.height
}

// Method to calculate the perimeter of a rectangle instance.
func (r *rect) perim() float64 {
    return 2 * (r.width + r.height)
}

// Notice that we're calling the methods on the interface,
// not on the instance of the rect struct directly.
func measure(g Geometry) {
    fmt.Println(g)
    fmt.Println(g.area())
    fmt.Println(g.perim())
}

func main() {
    r := &rect{width: 3, height: 4}

    measure(r)
}

You can play around with the example on Go Playground. Running it prints the rect value followed by its area (12) and perimeter (14).
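On the Python side, the closest counterpart to a Go interface is probably `typing.Protocol`. The following is a minimal sketch of the same geometry example, not necessarily the approach the full post takes. The `Rect` class and `measure` function are hypothetical names that mirror the Go code.

```python
from __future__ import annotations

from typing import Protocol


class Geometry(Protocol):
    """Anything with matching area() and perim() methods satisfies this."""

    def area(self) -> float: ...

    def perim(self) -> float: ...


class Rect:
    # Rect does not inherit from Geometry; only its method shapes matter.
    def __init__(self, width: float, height: float) -> None:
        self.width = width
        self.height = height

    def area(self) -> float:
        return self.width * self.height

    def perim(self) -> float:
        return 2 * (self.width + self.height)


def measure(g: Geometry) -> tuple[float, float]:
    # A static type checker accepts Rect here because its methods match.
    return g.area(), g.perim()
```

As in Go, the check is structural: a type checker such as mypy would flag a call to `measure` with an object that lacks either method, while nothing is enforced at runtime.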

Automatic attribute delegation in Python composition

· 3 min

While trying to avoid inheritance in an API that I was working on, I came across this neat trick to perform attribute delegation on composed classes. Let’s say there’s a class called Engine and you want to put an engine instance in a Car. Here, the car has a classic ‘has a’ relationship with the engine, whereas inheritance usually models ‘is a’ relationships. So, composition makes more sense than inheritance here. Consider this example:
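One common way to get this kind of delegation is the `__getattr__` hook. The sketch below assumes that approach, which may differ in detail from the trick in the full post. The attributes on `Engine` and `Car` are hypothetical.

```python
class Engine:
    def __init__(self, horsepower: int) -> None:
        self.horsepower = horsepower

    def start(self) -> str:
        return "engine started"


class Car:
    def __init__(self, engine: Engine) -> None:
        self.engine = engine

    def __getattr__(self, name: str):
        # Python calls this only when normal lookup on Car fails,
        # so attributes defined on Car always take precedence.
        return getattr(self.engine, name)


car = Car(Engine(horsepower=150))
print(car.start())
print(car.horsepower)
```

Because `__getattr__` is a fallback, the car exposes the engine's attributes without subclassing it, and a missing attribute still raises `AttributeError` from the engine lookup.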

Access 'classmethod's like 'property' methods in Python

· 2 min

I wanted to add a helper method to an Enum class. However, I didn’t want to make it a classmethod, as a property made more sense in this particular case. The problem is, you aren’t supposed to instantiate an enum class, and properties can only be accessed from instances of a class, not from the class itself.

While sifting through Django 3.2’s codebase, I found this neat trick to make a classmethod that acts like a property and can be accessed directly from the class without instantiating it.
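A rough sketch of the idea is a small descriptor whose `__get__` passes the owning class to the wrapped function. The `classproperty` helper below is a hypothetical, simplified version written for illustration and is not copied from Django. The `Color` enum and its `labels` attribute are made up as well.

```python
from __future__ import annotations

from enum import Enum


class classproperty:
    """Hypothetical helper: a read-only property resolved on the class."""

    def __init__(self, fget) -> None:
        self.fget = fget

    def __get__(self, instance, owner=None):
        # 'owner' is the class, whether accessed via the class or a member.
        if owner is None:
            owner = type(instance)
        return self.fget(owner)


class Color(Enum):
    RED = "r"
    GREEN = "g"

    @classproperty
    def labels(cls) -> list[str]:
        # Descriptors in an Enum body are not turned into members.
        return [member.name for member in cls]


print(Color.labels)
```

The attribute is read as `Color.labels`, with no call parentheses and no enum instance involved.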

Use __init_subclass__ hook to validate subclasses in Python

· 3 min

At my workplace, we have a fairly large Celery config file where you’re expected to subclass from a base class and extend that if there’s a new domain. However, the subclass expects the configuration in a specific schema. So, having a way to enforce that schema in the subclasses and raising appropriate runtime exceptions is nice.

I wrote a Python 3.6+ __init_subclass__ hook to validate the subclasses as below. This is neater than writing a metaclass.
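As a minimal sketch of the technique, a base class can check each subclass at definition time. The schema here (a `required_attrs` tuple with `broker_url` and `task_routes`) is a hypothetical stand-in for illustration, not the actual Celery config from the post.

```python
class BaseConfig:
    # Hypothetical schema: every subclass must define these attributes.
    required_attrs = ("broker_url", "task_routes")

    def __init_subclass__(cls, **kwargs) -> None:
        super().__init_subclass__(**kwargs)
        missing = [name for name in cls.required_attrs if not hasattr(cls, name)]
        if missing:
            # Raised when the subclass is defined, not when it is used.
            raise TypeError(f"{cls.__name__} is missing: {', '.join(missing)}")


class PaymentsConfig(BaseConfig):
    broker_url = "redis://localhost:6379/0"
    task_routes = {"payments.*": {"queue": "payments"}}
```

A subclass that omits `task_routes` fails with a `TypeError` as soon as its `class` statement executes, so a malformed config surfaces at import time.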

Running tqdm with Python multiprocessing

· 2 min

Making tqdm play nice with multiprocessing requires some additional work. It’s not always obvious and I don’t want to add another third-party dependency just for this purpose.

The following example attempts to make tqdm work with multiprocessing.Pool.imap_unordered. However, this should also work with similar mapping methods such as Pool.map, Pool.imap, and Pool.starmap.

"""
Run `pip install tqdm` before running the script.

The function `foo` is going to be executed 100 times across
`MAX_WORKERS=5` processes. In a single pass, each process will
get an iterable of size `CHUNK_SIZE=5`. So 5 processes each consuming
5 elements of an iterable will require (100 / (5*5)) 4 passes to finish
consuming the entire iterable of 100 elements.

Tqdm progress bar will update every `MAX_WORKERS*CHUNK_SIZE` iterations.
"""

# src.py


from __future__ import annotations

import multiprocessing as mp
import random
import time
from dataclasses import dataclass

from tqdm import tqdm

MAX_WORKERS = 5
CHUNK_SIZE = 5


@dataclass
class StartEnd:
    start: int
    end: int


def foo(start_end: StartEnd) -> int:
    time.sleep(0.2)
    return random.randint(start_end.start, start_end.end)


def main() -> None:
    inputs = [
        StartEnd(start, end)
        for start, end in zip(
            range(0, 100),
            range(100, 200),
        )
    ]

    with mp.Pool(processes=MAX_WORKERS) as pool:
        results = tqdm(
            pool.imap_unordered(foo, inputs, chunksize=CHUNK_SIZE),
            total=len(inputs),
        )  # 'total' is needed here: the iterator returned by
        # imap_unordered has no length for tqdm to infer

        for result in results:
            print(result)


if __name__ == "__main__":
    main()

This will print:

Use daemon threads to test infinite while loops in Python

· 1 min

Python’s daemon threads are cool. A Python script will stop when the main thread is done and only daemon threads are running. To test a simple hello function that runs indefinitely, you can do the following:

# test_hello.py
from __future__ import annotations

import asyncio
import threading
from functools import partial
from unittest.mock import patch


async def hello() -> None:
    while True:
        await asyncio.sleep(1)
        print("hello")


@patch("asyncio.sleep", autospec=True)
async def test_hello(mock_asyncio_sleep, capsys):
    run = partial(asyncio.run, hello())
    t = threading.Thread(target=run, daemon=True)
    t.start()
    t.join(timeout=0.1)

    out, err = capsys.readouterr()
    assert err == ""
    assert "hello" in out
    mock_asyncio_sleep.assert_awaited()

To execute the script, make sure your virtual env is activated. You’ll also need to install pytest and pytest-asyncio. Then run:

Python's 'functools.partial' flattens nestings automatically

· 1 min

The constructor for functools.partial() detects nesting and automatically flattens itself to a more efficient form. For example:

from functools import partial


def f(*, a: int, b: int, c: int) -> None:
    print(f"Args are {a}-{b}-{c}")


g = partial(partial(partial(f, a=1), b=2), c=3)

# Three function calls are flattened into one; free efficiency.
print(g)

# g can be called with no arguments since all 3 were bound earlier.
g()

This returns:

functools.partial(<function f at 0x7f4fd16c11f0>, a=1, b=2, c=3)
Args are 1-2-3
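The flattening can also be observed through the public `func`, `args`, and `keywords` attributes of a partial object. This short sketch repeats the example with a hypothetical variant of `f` that returns its string instead of printing it.

```python
from functools import partial


def f(*, a: int, b: int, c: int) -> str:
    return f"{a}-{b}-{c}"


g = partial(partial(partial(f, a=1), b=2), c=3)

# The outermost partial wraps f directly, not the inner partial objects.
assert g.func is f
assert g.args == ()
assert g.keywords == {"a": 1, "b": 2, "c": 3}
```

If no flattening happened, `g.func` would be another partial object rather than `f` itself.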


Pedantic configuration management with Pydantic

· 10 min

Managing configurations in your Python applications isn’t something you think about often, until complexity starts to seep in and forces you to re-architect your initial approach. Ideally, your config management flow shouldn’t change across different applications or as your application begins to grow in size and complexity.

Even if you’re writing a library, there should be a consistent config management process that scales up properly. Since I primarily spend my time writing data-analytics and data-science applications and exposing them using Flask or FastAPI, I’ll be tackling config management from an application development perspective.

Interfaces, mixins and building powerful custom data structures in Python

· 28 min

Imagine a custom set-like data structure that doesn’t perform hashing and trades performance for a tighter memory footprint. Or imagine a dict-like data structure that automatically stores data in a PostgreSQL or Redis database the moment you initialize it and lets you get, set, and delete key-value pairs using the usual retrieval, assignment, and deletion syntax of built-in dictionaries. Custom data structures give you the power of choice, and writing them will help you understand how Python’s built-in data structures are constructed.
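To make the first idea concrete, here is a minimal sketch of a set-like container built on `collections.abc.MutableSet` and backed by a plain list. `ListSet` is a hypothetical name, and this is an illustration under those assumptions rather than the structure developed in the full post.

```python
from collections.abc import Iterable, Iterator, MutableSet


class ListSet(MutableSet):
    """Set-like container backed by a list: no hashing, O(n) membership."""

    def __init__(self, items: Iterable = ()) -> None:
        self._items: list = []
        for item in items:
            self.add(item)

    def __contains__(self, item: object) -> bool:
        return item in self._items  # linear scan using ==

    def __iter__(self) -> Iterator:
        return iter(self._items)

    def __len__(self) -> int:
        return len(self._items)

    def add(self, item) -> None:
        if item not in self._items:
            self._items.append(item)

    def discard(self, item) -> None:
        if item in self._items:
            self._items.remove(item)
```

Only five methods are written by hand. The abstract base class supplies the rest as mixins, so operators such as `|`, `&`, and `-` work out of the box, and because nothing is hashed, unhashable items like lists can be stored.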

Deciphering Python's metaclasses

· 20 min

Updated on 2023-09-11: Fix broken URLs.

In Python, metaclasses are one of the few tools that enable you to inject metaprogramming capabilities into your code. The term metaprogramming refers to the potential for a program to manipulate itself in a self-referential manner. However, messing with metaclasses is often considered an arcane art that’s beyond the grasp of the plebeians. Heck, even Tim Peters advises you to tread carefully while dealing with these.