TIL | Redowan's Reflections

Use __init_subclass__ hook to validate subclasses in Python

At my workplace, we have a fairly large Celery config file where you’re expected to subclass from a base class and extend that if there’s a new domain. However, the subclass expects the configuration in a specific schema. So, having a way to enforce that schema in the subclasses and raising appropriate runtime exceptions is nice. Wrote a fancy Python 3.6+ __init_subclasshook__ to validate the subclasses as below. This is neater than writing a metaclass. ...

Running tqdm with Python multiprocessing

Making tqdm play nice with multiprocessing requires some additional work. It’s not always obvious and I don’t want to add another third-party dependency just for this purpose. The following example attempts to make tqdm work with multiprocessing.imap_unordered. However, this should also work with similar mapping methods like—multiprocessing.map, multiprocessing.imap, multiprocessing.starmap, etc. """ Run `pip install tqdm` before running the script. The function `foo` is going to be executed 100 times across `MAX_WORKERS=5` processes. In a single pass, each process will get an iterable of size `CHUNK_SIZE=5`. So 5 processes each consuming 5 elements of an iterable will require (100 / (5*5)) 4 passes to finish consuming the entire iterable of 100 elements. Tqdm progress bar will be updated after every `MAX_WORKERS*CHUNK_SIZE` iterations. """ # src.py from __future__ import annotations import multiprocessing as mp from tqdm import tqdm import time import random from dataclasses import dataclass MAX_WORKERS = 5 CHUNK_SIZE = 5 @dataclass class StartEnd: start: int end: int def foo(start_end: StartEnd) -> int: time.sleep(0.2) return random.randint(start_end.start, start_end.end) def main() -> None: inputs = [ StartEnd(start, end) for start, end in zip( range(0, 100), range(100, 200), ) ] with mp.Pool(processes=MAX_WORKERS) as pool: results = tqdm( pool.imap_unordered(foo, inputs, chunksize=CHUNK_SIZE), total=len(inputs), ) # 'total' is redundant here but can be useful # when the size of the iterable is unobvious for result in results: print(result) if __name__ == "__main__": main() This will print: ...

Use 'command -v' over 'which' to find a program's executable

One thing that came to me as news is that the command which—which is the de-facto tool to find the path of an executable—is not POSIX compliant. The recent Debian debacle1 around which brought it to my attention. The POSIX-compliant way of finding an executable program is command -v, which is usually built into most of the shells. So, instead of doing this: which python3.12 Do this: command -v which python3.12 Debian’s which hunt ↩︎ ...

Use curly braces while pasting shell commands

Pasting shell commands can be a pain when they include hidden return \n characters. In such a case, your shell will try to execute the command immediately. To prevent that, use curly braces { <cmd> } while pasting the command. Your command should look like the following: { dig +short google.com } Here, the spaces after the braces are significant.

Use strict mode while running bash scripts

Use unofficial bash strict mode while writing scripts. Bash has a few gotchas and this helps you to avoid that. For example: #!/bin/bash set -euo pipefail echo "Hello" Where, -e Exit immediately if a command exits with a non-zero status. -u Treat unset variables as an error when substituting. -o pipefail The return value of a pipeline is the status of the last command to exit with a non-zero status, or zero if no command exited with a non-zero status. References The idea is a less radical version of this post1. I don’t recommend messing with the IFS without a valid reason. ...