Running tqdm with Python multiprocessing

· 2 min

Making tqdm play nice with multiprocessing requires some additional work. It’s not always obvious and I don’t want to add another third-party dependency just for this purpose.

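The following example makes tqdm work with multiprocessing.Pool.imap_unordered. The same approach should also work with Pool.imap, which likewise yields results as they become available. Blocking methods such as Pool.map and Pool.starmap return a complete list, so a bar wrapped around them would only fill once all the work has finished.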

"""
Run `pip install tqdm` before running the script.

The function `foo` is going to be executed 100 times across
`MAX_WORKERS=5` processes. In a single pass, each process will
get a chunk of `CHUNK_SIZE=5` elements. So 5 processes each consuming
5 elements will require 100 / (5*5) = 4 passes to finish
consuming the entire iterable of 100 elements.

The tqdm progress bar will advance in bursts of roughly
`MAX_WORKERS*CHUNK_SIZE` iterations, since the workers finish
their chunks at about the same time.
"""

# src.py


from __future__ import annotations

import multiprocessing as mp
import random
import time
from dataclasses import dataclass

from tqdm import tqdm

MAX_WORKERS = 5
CHUNK_SIZE = 5


@dataclass
class StartEnd:
    start: int
    end: int


def foo(start_end: StartEnd) -> int:
    time.sleep(0.2)
    return random.randint(start_end.start, start_end.end)


def main() -> None:
    inputs = [
        StartEnd(start, end)
        for start, end in zip(
            range(0, 100),
            range(100, 200),
        )
    ]

    with mp.Pool(processes=MAX_WORKERS) as pool:
        results = tqdm(
            pool.imap_unordered(foo, inputs, chunksize=CHUNK_SIZE),
            total=len(inputs),
        )  # 'total' is needed here: the iterator returned by
        # imap_unordered has no len(), so tqdm cannot infer it

        for result in results:
            print(result)


if __name__ == "__main__":
    main()

Running the script prints the 100 results while the tqdm progress bar advances as each chunk completes.
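The arithmetic in the script's docstring can be checked directly. The following is a minimal sketch that assumes the work splits evenly and every call takes the same time; N_INPUTS, SLEEP_PER_CALL and the derived names are introduced here for illustration and are not part of the original script.

```python
import math

MAX_WORKERS = 5
CHUNK_SIZE = 5
N_INPUTS = 100  # hypothetical name for len(inputs)
SLEEP_PER_CALL = 0.2  # seconds, matching time.sleep(0.2) in foo

# Number of chunks the pool hands out to workers: 100 / 5 = 20.
n_chunks = math.ceil(N_INPUTS / CHUNK_SIZE)

# One pass keeps every worker busy with one chunk: 20 / 5 = 4.
n_passes = math.ceil(n_chunks / MAX_WORKERS)

# Rough wall-clock estimate, ignoring process startup and IPC overhead.
est_seconds = n_passes * CHUNK_SIZE * SLEEP_PER_CALL

print(n_chunks, n_passes, est_seconds)
```

A larger CHUNK_SIZE means fewer, coarser progress updates; setting it to 1 would make the bar advance one result at a time at the cost of more inter-process communication.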

Use daemon threads to test infinite while loops in Python

· 1 min

Python’s daemon threads are cool. A Python script will stop when the main thread is done and only daemon threads are running. To test a simple hello function that runs indefinitely, you can do the following:

# test_hello.py
from __future__ import annotations

import asyncio
import threading
from functools import partial
from unittest.mock import patch


async def hello() -> None:
    while True:
        await asyncio.sleep(1)
        print("hello")


@patch("asyncio.sleep", autospec=True)
async def test_hello(mock_asyncio_sleep, capsys):
    run = partial(asyncio.run, hello())
    t = threading.Thread(target=run, daemon=True)
    t.start()
    t.join(timeout=0.1)

    out, err = capsys.readouterr()
    assert err == ""
    assert "hello" in out
    mock_asyncio_sleep.assert_awaited()

To execute the script, make sure your virtual env is activated. You’ll also need to install pytest and pytest-asyncio. Then run test_hello.py with pytest.
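The daemon-thread behaviour that the test relies on can also be seen without asyncio or pytest. Below is a minimal sketch using only the standard library; tick_forever and ticks are hypothetical names standing in for any function with an infinite loop.

```python
from __future__ import annotations

import threading
import time

ticks: list[float] = []


def tick_forever() -> None:
    # Hypothetical stand-in for a function that never returns.
    while True:
        ticks.append(time.monotonic())
        time.sleep(0.01)


worker = threading.Thread(target=tick_forever, daemon=True)
worker.start()

# join() with a timeout gives control back to the main thread
# after at most 0.2 s, even though the loop never ends.
worker.join(timeout=0.2)

print(worker.is_alive())  # the loop is still running in the background
print(len(ticks) > 0)  # but it has already done some work

# When the main thread finishes, the interpreter exits and the
# daemon thread is stopped along with it.
```

Without daemon=True, the interpreter would wait for the thread at exit and the script would hang forever.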

Python's 'functools.partial' flattens nestings automatically

· 1 min

The constructor for functools.partial() detects nesting and automatically flattens itself to a more efficient form. For example:

from functools import partial


def f(*, a: int, b: int, c: int) -> None:
    print(f"Args are {a}-{b}-{c}")


g = partial(partial(partial(f, a=1), b=2), c=3)

# The three nested partials are flattened into one; free efficiency.
print(g)

# g can be called with no arguments because all three were bound already.
g()

This prints:

functools.partial(<function f at 0x7f4fd16c11f0>, a=1, b=2, c=3)
Args are 1-2-3
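The flattening can also be observed through the func and keywords attributes that partial objects expose. This is a small sketch, not part of the original example; inner, outer and rebound are names made up for illustration, and f returns a string here instead of printing.

```python
from __future__ import annotations

from functools import partial


def f(*, a: int, b: int, c: int) -> str:
    return f"Args are {a}-{b}-{c}"


inner = partial(f, a=1)
outer = partial(inner, b=2)

# The outer partial wraps f directly instead of wrapping inner.
print(outer.func is f)  # True
print(outer.keywords)  # {'a': 1, 'b': 2}

# Rebinding a keyword in an outer layer overrides the inner value.
rebound = partial(outer, a=10, c=3)
print(rebound.keywords)  # {'a': 10, 'b': 2, 'c': 3}
print(rebound())  # Args are 10-2-3
```

Because the outer object points straight at f, calling it costs one partial dispatch no matter how many layers were nested.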

Further reading

Pedantic configuration management with Pydantic

· 10 min

Managing configurations in your Python applications isn’t something you think about often, until complexity starts to seep in and forces you to re-architect your initial approach. Ideally, your config management flow shouldn’t change across different applications or as your application begins to grow in size and complexity.

Even if you’re writing a library, there should be a consistent config management process that scales up properly. Since I primarily spend my time writing data-analytics and data-science applications and exposing them using Flask or FastAPI, I’ll be tackling config management from an application development perspective.

Interfaces, mixins and building powerful custom data structures in Python

· 28 min

Imagine a custom set-like data structure that doesn’t perform hashing and trades performance for a tighter memory footprint. Or imagine a dict-like data structure that automatically stores data in a PostgreSQL or Redis database the moment you initialize it, and lets you get, set, and delete key-value pairs using the usual retrieval, assignment, and deletion syntax of built-in dictionaries. Custom data structures can give you the power of choice, and writing them will help you understand how the built-in data structures in Python are constructed.

Deciphering Python's metaclasses

· 20 min

Updated on 2023-09-11: Fix broken URLs.

In Python, metaclasses are among the few tools that enable you to inject metaprogramming capabilities into your code. The term metaprogramming refers to the potential for a program to manipulate itself in a self-referential manner. However, messing with metaclasses is often considered an arcane art that’s beyond the grasp of the plebeians. Heck, even Tim Peters advises you to tread carefully while dealing with these.

Implementing proxy pattern in Python

· 14 min

In Python, there’s a saying that “design patterns are anti-patterns”. Also, in the realm of dynamic languages, design patterns have the notoriety of injecting additional abstraction layers to the core logic and making the flow gratuitously obscure. Python’s dynamic nature and the treatment of functions as first-class objects often make Java-ish design patterns redundant.

Instead of littering your code with seemingly over-engineered patterns, you can almost always take advantage of Python’s first-class objects, duck-typing, monkey-patching, etc. to accomplish the task at hand. However, there is one design pattern that I have recently found myself using over and over again to write more maintainable code, and that is the Proxy pattern. So I thought I’d document it here for future reference.

Effortless API response caching with Python & Redis

· 9 min

Updated on 2023-09-11: Fix broken URLs.

Recently, I was working with Mapbox’s Route optimization API. It tries to solve the traveling salesman problem: you provide the API with coordinates of multiple places and it returns a duration-optimized route between those locations. This is a perfect use case where Redis caching can come in handy. Redis is a fast and lightweight in-memory database with additional persistence options, making it a perfect candidate for the task at hand. Here, caching can save you from making redundant API requests and can also dramatically improve the response time.

Untangling Python decorators

· 21 min

Updated on 2022-02-13: Change functools import style.

When I first learned about Python decorators, using them felt like doing voodoo magic. Decorators give you the ability to add new functionality to any callable without actually touching or changing the code inside it. This can typically yield better encapsulation and help you write cleaner and more understandable code. However, decorators are considered a fairly advanced topic in Python, since understanding and writing them requires command over multiple additional concepts like first-class objects, higher-order functions, closures, etc. First, I’ll try to introduce these concepts as necessary and then unravel the core concept of decorators layer by layer. So let’s dive in.

Effortless concurrency with Python's concurrent.futures

· 17 min

Writing concurrent code in Python can be tricky. Before you even start, you have to worry about all this icky stuff like whether the task at hand is I/O or CPU bound, or whether putting in the extra effort to achieve concurrency is even going to give you the boost you need. Also, the presence of the Global Interpreter Lock (GIL) imposes further limitations on writing truly concurrent code. But for the sake of sanity, you can oversimplify it like this without being blatantly incorrect: