Difference between constrained 'TypeVar' and 'Union' in Python

If you want to define a variable that can accept values of multiple possible types, using typing.Union is one way of doing that: from typing import Union U = Union[int, str] However, there’s another way you can express a similar concept via constrained TypeVar. You’d do so as follows: from typing import TypeVar T = TypeVar("T", int, str) So, what’s the difference between these two and when to use which? The primary difference is: T’s type needs to be consistent across multiple uses within a given scope, while U’s doesn’t. ...

January 19, 2022

Don't wrap instance methods with 'functools.lru_cache' decorator in Python

Recently, fell into this trap as I wanted to speed up a slow instance method by caching it. When you decorate an instance method with functools.lru_cache decorator, the instances of the class encapsulating that method never get garbage collected within the lifetime of the process holding them. Let’s consider this example: # src.py import functools import time from typing import TypeVar Number = TypeVar("Number", int, float, complex) class SlowAdder: def __init__(self, delay: int = 1) -> None: self.delay = delay @functools.lru_cache def calculate(self, *args: Number) -> Number: time.sleep(self.delay) return sum(args) def __del__(self) -> None: print("Deleting instance ...") # Create a SlowAdder instance. slow_adder = SlowAdder(2) # Measure performance. start_time = time.perf_counter() # ---------------------------------------------- result = slow_adder.calculate(1, 2) # ---------------------------------------------- end_time = time.perf_counter() print(f"Calculation took {end_time-start_time} seconds, result: {result}.") start_time = time.perf_counter() # ---------------------------------------------- result = slow_adder.calculate(1, 2) # ---------------------------------------------- end_time = time.perf_counter() print(f"Calculation took {end_time-start_time} seconds, result: {result}.") Here, I’ve created a simple SlowAdder class that accepts a delay value; then it sleeps for delay seconds and calculates the sum of the inputs in the calculate method. To avoid this slow recalculation for the same arguments, the calculate method was wrapped in the lru_cache decorator. The __del__ method notifies us when the garbage collection has successfully cleaned up instances of the class. ...

January 15, 2022

Cropping texts in Python with 'textwrap.shorten'

Problem A common interview question that I’ve seen goes as follows: Write a function to crop a text corpus without breaking any word. Take the length of the text up to which character you should trim. Make sure that the cropped text doesn’t have any trailing space. Try to maximize the number of words you can pack in your trimmed text. Your function should look something like this: def crop(text: str, limit: int) -> str: """Crops 'text' upto 'limit' characters.""" # Crop the text. cropped_text = perform_crop() return cropped_text For example, if text looks like this— ...

January 6, 2022

String interning in Python

I was reading the source code1 of the reference implementation of “PEP-661: Sentinel Values”2 and discovered an optimization technique known as String interning. Modern programming languages like Java, Python, PHP, Ruby, Julia, etc, performs string interning to make their string operations more performant. String interning String interning makes common string processing operations time and space-efficient by caching them. Instead of creating a new copy of string every time, this optimization method dictates to keep just one copy of string for every appropriate immutable distinct value and use the pointer reference wherever referred. ...

January 5, 2022

Structural subtyping in Python

I love using Go’s interface feature to declaratively define my public API structure. Consider this example: package main import ( "fmt" ) // Declare the interface. type Geometry interface { area() float64 perim() float64 } // Struct that represents a rectangle. type rect struct { width, height float64 } // Method to calculate the area of a rectangle instance. func (r *rect) area() float64 { return r.width * r.height } // Method to calculate the perimeter of a rectange instance. func (r *rect) perim() float64 { return 2 * (r.width + r.height) } // Notice that we're calling the methods on the interface, // not on the instance of the Rectangle struct directly. func measure(g Geometry) { fmt.Println(g) fmt.Println(g.area()) fmt.Println(g.perim()) } func main() { r := &rect{width: 3, height: 4} measure(r) } You can play around with the example here1. Running the example will print: ...

December 4, 2021

Automatic attribute delegation in Python composition

While trying to avoid inheritance in an API that I was working on, I came across this neat trick to perform attribute delegation on composed classes. Let’s say there’s a class called Engine and you want to put an engine instance in a Car. In this case, the car has a classic ‘has a’ (inheritance usually refers to ‘is a’ relationships) relationship with the engine. So, composition makes more sense than inheritance here. Consider this example: ...

November 28, 2021

Access 'classmethod's like 'property' methods in Python

I wanted to add a helper method to an Enum class. However, I didn’t want to make it a classmethod as property method made more sense in this particular case. Problem is, you aren’t supposed to initialize an enum class, and property methods can only be accessed from the instances of a class; not from the class itself. While sifting through Django 3.2’s codebase, I found this neat trick to make a classmethod that acts like a property method and can be accessed directly from the class without initializing it. ...

November 26, 2021

Don't add extensions to shell executables

I was browsing through the source code of Tom Christie’s typesystem1 library and discovered that the shell scripts2 of the project don’t have any extensions attached to them. At first, I found it odd, and then it all started to make sense. Executable scripts can be written in any language and the users don’t need to care about that. Also, not gonna lie, it looks cleaner this way. GitHub uses this [pattern]3 successfully to normalize their scripts. According to the pattern, every project should have a folder named scripts with a subset or superset of the following files: ...

November 23, 2021

Use __init_subclass__ hook to validate subclasses in Python

At my workplace, we have a fairly large Celery config file where you’re expected to subclass from a base class and extend that if there’s a new domain. However, the subclass expects the configuration in a specific schema. So, having a way to enforce that schema in the subclasses and raising appropriate runtime exceptions is nice. Wrote a fancy Python 3.6+ __init_subclasshook__ to validate the subclasses as below. This is neater than writing a metaclass. ...

November 20, 2021

Running tqdm with Python multiprocessing

Making tqdm play nice with multiprocessing requires some additional work. It’s not always obvious and I don’t want to add another third-party dependency just for this purpose. The following example attempts to make tqdm work with multiprocessing.imap_unordered. However, this should also work with similar mapping methods like—multiprocessing.map, multiprocessing.imap, multiprocessing.starmap, etc. """ Run `pip install tqdm` before running the script. The function `foo` is going to be executed 100 times across `MAX_WORKERS=5` processes. In a single pass, each process will get an iterable of size `CHUNK_SIZE=5`. So 5 processes each consuming 5 elements of an iterable will require (100 / (5*5)) 4 passes to finish consuming the entire iterable of 100 elements. Tqdm progress bar will be updated after every `MAX_WORKERS*CHUNK_SIZE` iterations. """ # src.py from __future__ import annotations import multiprocessing as mp from tqdm import tqdm import time import random from dataclasses import dataclass MAX_WORKERS = 5 CHUNK_SIZE = 5 @dataclass class StartEnd: start: int end: int def foo(start_end: StartEnd) -> int: time.sleep(0.2) return random.randint(start_end.start, start_end.end) def main() -> None: inputs = [ StartEnd(start, end) for start, end in zip( range(0, 100), range(100, 200), ) ] with mp.Pool(processes=MAX_WORKERS) as pool: results = tqdm( pool.imap_unordered(foo, inputs, chunksize=CHUNK_SIZE), total=len(inputs), ) # 'total' is redundant here but can be useful # when the size of the iterable is unobvious for result in results: print(result) if __name__ == "__main__": main() This will print: ...

November 18, 2021

Use daemon threads to test infinite while loops in Python

Python’s daemon threads are cool. A Python script will stop when the main thread is done and only daemon threads are running. To test a simple hello function that runs indefinitely, you can do the following: # test_hello.py from __future__ import annotations import asyncio import threading from functools import partial from unittest.mock import patch async def hello() -> None: while True: await asyncio.sleep(1) print("hello") @patch("asyncio.sleep", autospec=True) async def test_hello(mock_asyncio_sleep, capsys): run = partial(asyncio.run, hello()) t = threading.Thread(target=run, daemon=True) t.start() t.join(timeout=0.1) out, err = capsys.readouterr() assert err == "" assert "hello" in out mock_asyncio_sleep.assert_awaited() To execute the script, make sure you’ve your virtual env actiavated. Also you’ll need to install pytest and pytest-asyncio. Then run: ...

November 18, 2021

Use 'command -v' over 'which' to find a program's executable

One thing that came to me as news is that the command which—which is the de-facto tool to find the path of an executable—is not POSIX compliant. The recent Debian debacle1 around which brought it to my attention. The POSIX-compliant way of finding an executable program is command -v, which is usually built into most of the shells. So, instead of doing this: which python3.12 Do this: command -v which python3.12 Debian’s which hunt ↩︎ ...

November 16, 2021

Write git commit messages properly

Writing consistent commit messages helps you to weave a coherent story with your git history. Recently, I’ve started paying attention to my commit messages. Before this, my commit messages in this repository used to look like this: git log --oneline -5 d058a23 (HEAD -> master) bash strict mode a62e59b Updating functool partials til. 532b21a Added functool partials til ec9191c added unfinished indexing script 18e41c8 Bash tils With all the misuse of letter casings and punctuations, clearly, the message formatting is all over the place. To tame this mayhem, I’ve adopted these 7 rules of writing great commit messages: ...

November 11, 2021

Python's 'functools.partial' flattens nestings Automatically

The constructor for functools.partial() detects nesting and automatically flattens itself to a more efficient form. For example: from functools import partial def f(*, a: int, b: int, c: int) -> None: print(f"Args are {a}-{b}-{c}") g = partial(partial(partial(f, a=1), b=2), c=3) # Three function calls are flattened into one; free efficiency. print(g) # Bare function can be called as 3 arguments were bound previously. g() This returns: functools.partial(<function f at 0x7f4fd16c11f0>, a=1, b=2, c=3) Args are 1-2-3 Tweet by Raymond Hettinger 1 ↩︎ ...

November 8, 2021

Use curly braces while pasting shell commands

Pasting shell commands can be a pain when they include hidden return \n characters. In such a case, your shell will try to execute the command immediately. To prevent that, use curly braces { <cmd> } while pasting the command. Your command should look like the following: { dig +short google.com } Here, the spaces after the braces are significant.

November 8, 2021

Use strict mode while running bash scripts

Use unofficial bash strict mode while writing scripts. Bash has a few gotchas and this helps you to avoid that. For example: #!/bin/bash set -euo pipefail echo "Hello" Where, -e Exit immediately if a command exits with a non-zero status. -u Treat unset variables as an error when substituting. -o pipefail The return value of a pipeline is the status of the last command to exit with a non-zero status, or zero if no command exited with a non-zero status. References The idea is a less radical version of this post1. I don’t recommend messing with the IFS without a valid reason. ...

November 8, 2021

Pedantic configuration management with Pydantic

Managing configurations in your Python applications isn’t something you think about much often, until complexity starts to seep in and forces you to re-architect your initial approach. Ideally, your config management flow shouldn’t change across different applications or as your application begins to grow in size and complexity. Even if you’re writing a library, there should be a consistent config management process that scales up properly. Since I primarily spend my time writing data-analytics, data-science applications and expose them using Flask1 or FastAPI2 framework, I’ll be tacking config management from an application development perspective. ...

July 13, 2020

Interfaces, mixins and building powerful custom data structures in Python

Imagine a custom set-like data structure that doesn’t perform hashing and trades performance for tighter memory footprint. Or imagine a dict-like data structure that automatically stores data in a PostgreSQL or Redis database the moment you initialize it; also it lets you to get-set-delete key-value pairs using the usual retrieval-assignment-deletion syntax associated with built-in dictionaries. Custom data structures can give you the power of choice and writing them will make you understand how the built-in data structures in Python are constructed. ...

July 3, 2020

Deciphering Python's metaclasses

Updated on 2023-09-11: Fix broken URLs. In Python, metaclass is one of the few tools that enables you to inject metaprogramming capabilities into your code. The term metaprogramming refers to the potential for a program to manipulate itself in a self referential manner. However, messing with metaclasses is often considered an arcane art that’s beyond the grasp of the plebeians. Heck, even Tim Peters1 advices you to tread carefully while dealing with these. ...

June 26, 2020

Implementing proxy pattern in Python

In Python, there’s a saying that “design patterns are anti-patterns”. Also, in the realm of dynamic languages, design patterns have the notoriety of injecting additional abstraction layers to the core logic and making the flow gratuitously obscure. Python’s dynamic nature and the treatment of functions as first-class objects often make Java-ish design patterns redundant. Instead of littering your code with seemingly over-engineered patterns, you can almost always take the advantage of Python’s first-class objects, duck-typing, monkey-patching etc to accomplish the task at hand. However, recently there is one design pattern that I find myself using over and over again to write more maintainable code and that is the Proxy pattern. So I thought I’d document it here for future reference. ...

June 16, 2020