Use __init_subclass__ hook to validate subclasses in Python

· 3 min

At my workplace, we have a fairly large Celery config file where you’re expected to subclass from a base class and extend that if there’s a new domain. However, the subclass expects the configuration in a specific schema. So, having a way to enforce that schema in the subclasses and raising appropriate runtime exceptions is nice.

I wrote a Python 3.6+ __init_subclass__ hook to validate the subclasses. This is neater than writing a metaclass.
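A minimal sketch of the idea, not the actual config from work: the names BaseConfig, REQUIRED_KEYS, EmailConfig, and BrokenConfig, as well as the two required keys, are hypothetical and only there to illustrate the hook. The check runs when the subclass is defined, so a malformed config fails at import time rather than when a task runs.

```python
from typing import Any


class BaseConfig:
    """Hypothetical base class; REQUIRED_KEYS stands in for the real schema."""

    REQUIRED_KEYS = frozenset({"queue", "routing_key"})  # assumed schema
    config: dict = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        # Called once for every subclass, right after its body is executed.
        super().__init_subclass__(**kwargs)
        missing = cls.REQUIRED_KEYS - set(cls.config)
        if missing:
            raise TypeError(
                f"{cls.__name__}.config is missing keys: {sorted(missing)}"
            )


class EmailConfig(BaseConfig):
    # Satisfies the assumed schema, so the class is created normally.
    config = {"queue": "email", "routing_key": "email.send"}


try:
    # Lacks 'routing_key', so defining the class itself raises.
    class BrokenConfig(BaseConfig):
        config = {"queue": "broken"}
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Because the hook lives on the base class, subclasses get validated without having to opt in, which is the part a metaclass would otherwise handle.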

Running tqdm with Python multiprocessing

· 2 min

Making tqdm play nice with multiprocessing requires some additional work. It’s not always obvious and I don’t want to add another third-party dependency just for this purpose.

The following example makes tqdm work with multiprocessing.Pool.imap_unordered. The same pattern should also work with Pool.imap, which likewise yields results lazily. Pool.map and Pool.starmap block until every task is done and return a list, so wrapping them in tqdm only shows the bar jumping to completion at the end.

"""
Run `pip install tqdm` before running the script.

The function `foo` is going to be executed 100 times across
`MAX_WORKERS=5` processes. In a single pass, each process will
get an iterable of size `CHUNK_SIZE=5`. So 5 processes each consuming
5 elements of an iterable will require 100 / (5 * 5) = 4 passes to
finish consuming the entire iterable of 100 elements.

The tqdm progress bar will update roughly every
`MAX_WORKERS * CHUNK_SIZE` iterations.
"""

# src.py


from __future__ import annotations

import multiprocessing as mp
import random
import time
from dataclasses import dataclass

from tqdm import tqdm

MAX_WORKERS = 5
CHUNK_SIZE = 5


@dataclass
class StartEnd:
    start: int
    end: int


def foo(start_end: StartEnd) -> int:
    time.sleep(0.2)
    return random.randint(start_end.start, start_end.end)


def main() -> None:
    inputs = [
        StartEnd(start, end)
        for start, end in zip(
            range(0, 100),
            range(100, 200),
        )
    ]

    with mp.Pool(processes=MAX_WORKERS) as pool:
        # imap_unordered returns an iterator without a length, so pass
        # 'total' explicitly; otherwise tqdm can't show a percentage.
        results = tqdm(
            pool.imap_unordered(foo, inputs, chunksize=CHUNK_SIZE),
            total=len(inputs),
        )

        for result in results:
            print(result)


if __name__ == "__main__":
    main()

This will print:

Use 'command -v' over 'which' to find a program's executable

· 1 min

One thing that came as news to me is that which, the de facto tool for finding the path of an executable, is not part of POSIX. The recent Debian which hunt brought it to my attention. The POSIX-compliant way of finding an executable program is command -v, which is built into most shells.

So, instead of doing this:

which python3.12

Do this:

command -v python3.12

Use curly braces while pasting shell commands

· 1 min

Pasting shell commands can be a pain when they include hidden newline (\n) characters. In that case, your shell will try to execute the command immediately. To prevent that, wrap the command in curly braces, { <cmd> }, when pasting. Your command should look like the following:

{ dig +short google.com }

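Here, the spaces just inside the braces are significant. Note that bash also requires a semicolon or a newline before the closing brace, while zsh accepts the form above as is.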

Use strict mode while running bash scripts

· 1 min

Use the unofficial bash strict mode when writing scripts. Bash has a few gotchas, and strict mode helps you avoid them. For example:

#!/bin/bash

set -euo pipefail

echo "Hello"

Where,

-e              Exit immediately if a command exits with a non-zero status.
-u              Treat unset variables as an error when substituting.
-o pipefail     The return value of a pipeline is the status of
                the last command to exit with a non-zero status,
                or zero if no command exited with a non-zero status.
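The pipefail behaviour is the least obvious of the three, so here is a small sketch that compares a pipeline's exit status with and without it. It assumes bash is available on PATH; the helper name pipeline_status is hypothetical and only used for this illustration.

```python
import subprocess


def pipeline_status(script: str) -> int:
    """Run a snippet with bash and return its exit status."""
    return subprocess.run(["bash", "-c", script], check=False).returncode


# Without pipefail, only the last command (`true`) decides the status.
default_status = pipeline_status("false | true")

# With pipefail, the failing `false` makes the whole pipeline fail.
pipefail_status = pipeline_status("set -o pipefail; false | true")

print(default_status, pipefail_status)  # prints "0 1"
```

Combined with -e, this means a failure anywhere in a pipeline stops the script instead of being silently masked by the last command.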

Further reading