Amphibian decorators in Python

Whether you like it or not, the split world of sync and async functions in the Python ecosystem is something we’ll have to live with, at least for now. So, having to write things that work with both sync and async code is an inevitable part of the journey. Projects like Starlette and HTTPX can give you some clever pointers on how to craft APIs that are compatible with both sync and async code. ...
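
One way to support both kinds of callables is to branch on inspect.iscoroutinefunction when the decorator is applied. The sketch below only illustrates that idea; the names log_calls, calls, add, and aadd are made up for this example and are not taken from Starlette or HTTPX.

```python
import asyncio
import functools
import inspect
from typing import Any, Callable

# Records the names of decorated callables as they are invoked.
calls = []


def log_calls(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical decorator that works on both sync and async callables."""

    if inspect.iscoroutinefunction(func):
        # Async path: the wrapper must itself be a coroutine function.
        @functools.wraps(func)
        async def async_wrapper(*args: Any, **kwargs: Any) -> Any:
            calls.append(func.__name__)
            return await func(*args, **kwargs)

        return async_wrapper

    # Sync path: a plain wrapper is enough.
    @functools.wraps(func)
    def sync_wrapper(*args: Any, **kwargs: Any) -> Any:
        calls.append(func.__name__)
        return func(*args, **kwargs)

    return sync_wrapper


@log_calls
def add(a: int, b: int) -> int:
    return a + b


@log_calls
async def aadd(a: int, b: int) -> int:
    await asyncio.sleep(0)
    return a + b
```

Calling add(1, 2) returns the result directly, while aadd(1, 2) has to be awaited or driven with asyncio.run.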

February 6, 2022

Go Rusty with exception handling in Python

While grokking Black formatter’s codebase, I came across this interesting way of handling exceptions in Python. Exception handling in Python usually follows the EAFP paradigm where it’s easier to ask for forgiveness than permission. However, Rust has this recoverable error handling workflow that leverages generic Enums. I wanted to explore how Black emulates that in Python. This is how it works:

    # src.py
    from __future__ import annotations

    from typing import Generic, TypeVar, Union

    T = TypeVar("T")
    E = TypeVar("E", bound=Exception)


    class Ok(Generic[T]):
        def __init__(self, value: T) -> None:
            self._value = value

        def ok(self) -> T:
            return self._value


    class Err(Generic[E]):
        def __init__(self, e: E) -> None:
            self._e = e

        def err(self) -> E:
            return self._e


    Result = Union[Ok[T], Err[E]]

In the above snippet, two generic types Ok and Err represent the return type and the error types of a callable respectively. These two generics were then combined into one Result generic type. You’d use the Result generic to handle exceptions as follows: ...
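
As a rough sketch of how such a Result type can be consumed (this is not the post's or Black's own code; parse_int and describe are hypothetical helpers invented for this example):

```python
from __future__ import annotations

from typing import Generic, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E", bound=Exception)


class Ok(Generic[T]):
    def __init__(self, value: T) -> None:
        self._value = value

    def ok(self) -> T:
        return self._value


class Err(Generic[E]):
    def __init__(self, e: E) -> None:
        self._e = e

    def err(self) -> E:
        return self._e


Result = Union[Ok[T], Err[E]]


def parse_int(raw: str) -> Result[int, ValueError]:
    """Hypothetical helper: return the error instead of raising it."""
    try:
        return Ok(int(raw))
    except ValueError as exc:
        return Err(exc)


def describe(raw: str) -> str:
    """Branch on the variant, much like matching on a Rust enum."""
    result = parse_int(raw)
    if isinstance(result, Ok):
        return f"parsed {result.ok()}"
    return f"failed: {type(result.err()).__name__}"
```

The caller decides what to do with the error value, so the failure path is visible in the return type rather than hidden in a raise.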

February 2, 2022

Variance of generic types in Python

I’ve always had a hard time explaining variance of generic types while working with type annotations in Python. This is an attempt to distill the things I’ve picked up on type variance while going through PEP 483.

A pinch of type theory

A generic type is a class or interface that is parameterized over types. Variance refers to how subtyping between the generic types relates to subtyping between their parameters' types. ...

January 31, 2022

Create a sub dictionary with O(K) complexity in Python

How would you create a sub dictionary from a dictionary where the keys of the sub-dict are provided as a list? I was reading a tweet by Ned Bachelder on this today and that made me realize that I usually solve it with O(DK) complexity, where K is the length of the sub-dict keys and D is the length of the primary dict. Here’s how I usually do that without giving it any thought: ...
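
To make the two complexities concrete, here is a minimal sketch of both approaches. The function names are made up for this example, and the slow variant is only my guess at the kind of code the excerpt alludes to.

```python
from typing import Any, Dict, Hashable, List


def sub_dict_slow(d: Dict[Hashable, Any], keys: List[Hashable]) -> Dict[Hashable, Any]:
    # Walks all D items and scans the K-element list for each one: O(D*K).
    return {k: v for k, v in d.items() if k in keys}


def sub_dict_fast(d: Dict[Hashable, Any], keys: List[Hashable]) -> Dict[Hashable, Any]:
    # Walks only the K requested keys; each dict lookup is O(1) on average: O(K).
    return {k: d[k] for k in keys if k in d}
```

Both variants silently skip keys that are missing from the primary dict. Note that the fast variant orders the result by the key list, while the slow one follows the order of the primary dict.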

January 30, 2022

Gotchas of early-bound function argument defaults in Python

I was reading a tweet about it yesterday and that didn’t stop me from pushing a code change in production with the same rookie mistake today. Consider this function:

    # src.py
    from __future__ import annotations

    import logging
    import time
    from datetime import datetime


    def log(
        message: str,
        /,
        *,
        level: str,
        timestamp: str = datetime.utcnow().isoformat(),
    ) -> None:
        logger = getattr(logging, level)

        # Avoid f-strings in logging; pass the arguments separately
        # so that the formatting stays lazy.
        logger("Timestamp: %s \nMessage: %s\n", timestamp, message)


    if __name__ == "__main__":
        for _ in range(3):
            time.sleep(1)
            log("Reality can often be disappointing.", level="warning")

Here, the function log has a parameter timestamp that computes its default value by calling datetime.utcnow().isoformat(). I was under the impression that the timestamp parameter would be computed each time the log function was called. However, that’s not what happens when you try to run it. If you run the above snippet, you’ll get this instead: ...
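
The usual way around early binding is a None sentinel that is resolved inside the function body. A minimal sketch, where stamp_early and stamp_late are hypothetical stand-ins for the log function above:

```python
from datetime import datetime, timezone
from typing import Optional


def stamp_early(
    message: str,
    *,
    timestamp: str = datetime.now(timezone.utc).isoformat(),
) -> str:
    # The default was evaluated once, when the 'def' statement ran.
    return f"{timestamp} {message}"


def stamp_late(message: str, *, timestamp: Optional[str] = None) -> str:
    # The default is resolved on every call instead.
    if timestamp is None:
        timestamp = datetime.now(timezone.utc).isoformat()
    return f"{timestamp} {message}"
```

Every call to stamp_early that relies on the default reuses the same frozen timestamp, while stamp_late computes a fresh one each time.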

January 27, 2022

Use 'assertIs' to check literal booleans in Python unittest

I used to use unittest’s self.assertTrue / self.assertFalse to check both literal booleans and truthy/falsy values. I committed the same sin while writing tests in Django. I feel like assertTrue and assertFalse are misnomers. They don’t specifically check literal booleans, only truthy and falsy states respectively. Consider this example:

    # src.py
    import unittest


    class TestFoo(unittest.TestCase):
        def setUp(self):
            self.true_literal = True
            self.false_literal = False
            self.truthy = [True]
            self.falsy = []

        # Test methods need the 'test' prefix to be discovered.
        def test_is_true(self):
            self.assertTrue(self.true_literal)

        def test_is_false(self):
            self.assertFalse(self.false_literal)

        def test_is_truthy(self):
            self.assertTrue(self.truthy)

        def test_is_falsy(self):
            self.assertFalse(self.falsy)


    if __name__ == "__main__":
        unittest.main()

In the above snippet, I’ve used assertTrue and assertFalse to check both literal booleans and truthy/falsy values. However, to test the literal boolean values, assertIs works better and is more explicit. Here’s how to do the above test properly: ...
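
A small sketch of the difference, assuming nothing beyond the standard library; the class and method names are made up for this example:

```python
import unittest


class TestLiterals(unittest.TestCase):
    def test_literal_booleans(self):
        # assertIs compares identity, so only the real singletons pass.
        self.assertIs(True, True)
        self.assertIs(False, False)

    def test_truthy_value_is_not_true(self):
        # A non-empty list is truthy, so assertTrue accepts it...
        self.assertTrue([True])
        # ...but it is not the literal True, so assertIs rejects it.
        with self.assertRaises(AssertionError):
            self.assertIs([True], True)

    def test_falsy_value_is_not_false(self):
        self.assertFalse([])
        with self.assertRaises(AssertionError):
            self.assertIs([], False)
```

Running the module with python -m unittest picks up and runs all three tests.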

January 24, 2022

Static typing Python decorators

Accurately static typing decorators in Python is an icky business. The wrapper function obfuscates type information required to statically determine the types of the parameters and the return values of the wrapped function. Let’s write a decorator that registers the decorated functions in a global dictionary during function definition time. Here’s how I used to annotate it:

    # src.py
    # Import 'Callable' from 'typing' module in < Py3.9.
    from collections.abc import Callable
    from functools import wraps
    from typing import Any, TypeVar

    R = TypeVar("R")

    funcs = {}


    def register(func: Callable[..., R]) -> Callable[..., R]:
        """Register any function at definition time in the 'funcs' dict."""

        # Registers the function during function definition time.
        funcs[func.__name__] = func

        @wraps(func)
        def inner(*args: Any, **kwargs: Any) -> Any:
            return func(*args, **kwargs)

        return inner


    @register
    def hello(name: str) -> str:
        return f"Hello {name}!"

The functools.wraps decorator makes sure that the identity and the docstring of the wrapped function don’t get gobbled up by the decorator. This is syntactically correct and if you run Mypy against the code snippet, it’ll happily tell you that everything’s alright. However, this doesn’t exactly do anything. If you call the hello function with the wrong type of parameter, Mypy won’t be able to detect the mistake statically. Notice this: ...

January 23, 2022

Inspect docstrings with Pydoc

How come I didn’t know about the python -m pydoc command before today! It lets you inspect the docstrings of any modules, classes, functions, or methods in Python. I’m running the commands from a Python 3.10 virtual environment but it’ll work on any Python version. Let’s print out the docstring of the functools.lru_cache function. Run:

    python -m pydoc functools.lru_cache

This will print the following on the console:

    Help on function lru_cache in functools:

    functools.lru_cache = lru_cache(maxsize=128, typed=False)
        Least-recently-used cache decorator.

        If *maxsize* is set to None, the LRU features are disabled and the cache
        can grow without bound.

        If *typed* is True, arguments of different types will be cached separately.
        For example, f(3.0) and f(3) will be treated as distinct calls with
        distinct results.

        Arguments to the cached function must be hashable.

        View the cache statistics named tuple (hits, misses, maxsize, currsize)
        with f.cache_info().  Clear the cache and statistics with f.cache_clear().
        Access the underlying function with f.__wrapped__.

Works for third party tools as well: ...

January 22, 2022

Check whether an integer is a power of two in Python

To check whether an integer is a power of two, I’ve deployed hacks like this:

    def is_power_of_two(x: int) -> bool:
        return x > 0 and hex(x)[-1] in ("0", "2", "4", "8")

This only appears to work: it rejects 1 and wrongly accepts numbers such as 18 (0x12), and I’ve never liked explaining the pattern matching hack that’s going on here. Today, I came across this tweet by Raymond Hettinger where he proposed an elegant solution to the problem. Here’s how it goes:

    def is_power_of_two(x: int) -> bool:
        return x > 0 and x.bit_count() == 1

This is neat as there’s no hack and it uses a mathematical invariant (a power of two has exactly one set bit) to check whether an integer is a power of 2 or not. Note that int.bit_count() requires Python 3.10 or later. Also, it’s a tad bit faster. ...
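
For interpreters without int.bit_count(), the same single-set-bit invariant can be checked with the classic x & (x - 1) trick. This is a different technique from the one in the tweet, shown here next to the hex-based hack (renamed hex_hack for this example) so the two can be compared:

```python
def hex_hack(x: int) -> bool:
    # The hex-digit hack from above, kept only for comparison.
    return x > 0 and hex(x)[-1] in ("0", "2", "4", "8")


def is_power_of_two(x: int) -> bool:
    # Clearing the lowest set bit leaves zero only if exactly one bit was set.
    return x > 0 and (x & (x - 1)) == 0
```

Here hex_hack(18) is True even though 18 is not a power of two, and hex_hack(1) is False even though 1 is, while is_power_of_two gets both right.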

January 21, 2022

Uniform error response in Django Rest Framework

Django Rest Framework exposes a neat hook to customize the response payload of your API when errors occur. I was going through Microsoft’s REST API guidelines and wanted to make the error response of my APIs more uniform and somewhat similar to the format described there. I’ll use a modified version of the quickstart example in the DRF docs to show how to achieve that. Also, we’ll need a POST API to demonstrate the changes better. Here’s the same example with the added POST API. Place this code in the project’s urls.py file. ...

January 20, 2022

Difference between constrained 'TypeVar' and 'Union' in Python

If you want to define a variable that can accept values of multiple possible types, using typing.Union is one way of doing that:

    from typing import Union

    U = Union[int, str]

However, there’s another way you can express a similar concept via constrained TypeVar. You’d do so as follows:

    from typing import TypeVar

    T = TypeVar("T", int, str)

So, what’s the difference between these two and when to use which? The primary difference is: T’s type needs to be consistent across multiple uses within a given scope, while U’s doesn’t. ...
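
A small sketch of what that consistency means in practice; the functions describe and combine are made up for this example:

```python
from typing import TypeVar, Union

U = Union[int, str]
T = TypeVar("T", int, str)


def describe(a: U, b: U) -> str:
    # Each parameter may independently be an int or a str.
    return f"{a} and {b}"


def combine(a: T, b: T) -> T:
    # Both parameters and the return value must resolve to the same type:
    # either all int or all str.
    return a + b
```

A type checker accepts describe(1, "b"), combine(1, 2), and combine("a", "b"), but it should reject combine(1, "b") because T cannot be int and str at the same time.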

January 19, 2022

Don't wrap instance methods with 'functools.lru_cache' decorator in Python

I recently fell into this trap as I wanted to speed up a slow instance method by caching it. When you decorate an instance method with the functools.lru_cache decorator, the instances of the class encapsulating that method don’t get garbage collected for as long as the cache holds on to them, which in practice can mean the lifetime of the process. Let’s consider this example:

    # src.py
    import functools
    import time
    from typing import TypeVar

    Number = TypeVar("Number", int, float, complex)


    class SlowAdder:
        def __init__(self, delay: int = 1) -> None:
            self.delay = delay

        @functools.lru_cache
        def calculate(self, *args: Number) -> Number:
            time.sleep(self.delay)
            return sum(args)

        def __del__(self) -> None:
            print("Deleting instance ...")


    # Create a SlowAdder instance.
    slow_adder = SlowAdder(2)

    # Measure performance.
    start_time = time.perf_counter()
    # ----------------------------------------------
    result = slow_adder.calculate(1, 2)
    # ----------------------------------------------
    end_time = time.perf_counter()
    print(f"Calculation took {end_time-start_time} seconds, result: {result}.")

    start_time = time.perf_counter()
    # ----------------------------------------------
    result = slow_adder.calculate(1, 2)
    # ----------------------------------------------
    end_time = time.perf_counter()
    print(f"Calculation took {end_time-start_time} seconds, result: {result}.")

Here, I’ve created a simple SlowAdder class that accepts a delay value; then it sleeps for delay seconds and calculates the sum of the inputs in the calculate method. To avoid this slow recalculation for the same arguments, the calculate method was wrapped in the lru_cache decorator. The __del__ method notifies us when the garbage collection has successfully cleaned up instances of the class. ...
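
The retention can be observed directly with a weak reference. The sketch below assumes CPython's reference counting; Adder and probe are hypothetical names used only for this demonstration:

```python
import functools
import gc
import weakref
from typing import Tuple


class Adder:
    @functools.lru_cache(maxsize=None)
    def calculate(self, *args: int) -> int:
        return sum(args)


def probe() -> Tuple[bool, bool]:
    """Return (alive after 'del', alive after clearing the cache)."""
    adder = Adder()
    adder.calculate(1, 2)  # 'self' becomes part of the cache key.
    ref = weakref.ref(adder)

    del adder
    gc.collect()
    alive_after_del = ref() is not None  # The cache still references it.

    Adder.calculate.cache_clear()
    gc.collect()
    alive_after_clear = ref() is not None  # Nothing references it anymore.

    return alive_after_del, alive_after_clear
```

The instance outlives its last ordinary reference because the class-level cache stores self inside each key, and it only goes away once that entry is evicted or the cache is cleared.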

January 15, 2022

Cropping texts in Python with 'textwrap.shorten'

Problem

A common interview question that I’ve seen goes as follows:

Write a function to crop a text corpus without breaking any word. Take the character limit up to which you should trim as an input. Make sure that the cropped text doesn’t have any trailing space. Try to maximize the number of words you can pack in your trimmed text.

Your function should look something like this:

    def crop(text: str, limit: int) -> str:
        """Crops 'text' up to 'limit' characters."""

        # Crop the text.
        cropped_text = perform_crop()
        return cropped_text

For example, if text looks like this: ...
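
As the title suggests, textwrap.shorten from the standard library covers most of this. A minimal sketch, assuming that an empty placeholder is wanted instead of the default " [...]" suffix (the sample sentence is made up):

```python
import textwrap


def crop(text: str, limit: int) -> str:
    """Crops 'text' up to 'limit' characters without breaking any word."""
    # shorten() collapses runs of whitespace, keeps as many whole words as
    # fit in 'width', and appends 'placeholder' when something was dropped.
    return textwrap.shorten(text, width=limit, placeholder="")
```

For instance, crop("A quick brown fox jumps", 13) gives "A quick brown", while a limit of 12 gives "A quick" because "brown" no longer fits as a whole word.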

January 6, 2022

String interning in Python

I was reading the source code of the reference implementation of “PEP 661: Sentinel Values” and discovered an optimization technique known as string interning. Modern programming languages like Java, Python, PHP, Ruby, Julia, etc., perform string interning to make their string operations more performant.

String interning

String interning makes common string processing operations time- and space-efficient by caching strings. Instead of creating a new copy of a string every time, this optimization keeps just one copy of each distinct immutable string value and reuses a reference to it wherever that value is needed. ...
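
In Python, interning can also be requested explicitly with sys.intern. A minimal sketch; build is a made-up helper that creates strings at runtime so that they are not interned automatically as literals:

```python
import sys
from typing import List


def build(parts: List[str]) -> str:
    # Joining at runtime produces a fresh string object on each call.
    return "".join(parts)


a = build(["type", " ", "hints"])
b = build(["type", " ", "hints"])

# sys.intern returns the single canonical object for a given string value.
interned_a = sys.intern(a)
interned_b = sys.intern(b)
```

After interning, interned_a is interned_b holds, so equal strings can be compared by identity instead of character by character.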

January 5, 2022

Structural subtyping in Python

I love using Go’s interface feature to declaratively define my public API structure. Consider this example:

    package main

    import (
        "fmt"
    )

    // Declare the interface.
    type Geometry interface {
        area() float64
        perim() float64
    }

    // Struct that represents a rectangle.
    type rect struct {
        width, height float64
    }

    // Method to calculate the area of a rectangle instance.
    func (r *rect) area() float64 {
        return r.width * r.height
    }

    // Method to calculate the perimeter of a rectangle instance.
    func (r *rect) perim() float64 {
        return 2 * (r.width + r.height)
    }

    // Notice that we're calling the methods on the interface,
    // not on the instance of the rect struct directly.
    func measure(g Geometry) {
        fmt.Println(g)
        fmt.Println(g.area())
        fmt.Println(g.perim())
    }

    func main() {
        r := &rect{width: 3, height: 4}
        measure(r)
    }

You can play around with the example here. Running the example will print: ...
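
A rough Python counterpart of the Go example can be written with typing.Protocol (Python 3.8+). This is my own sketch of the idea, not necessarily the code the post arrives at:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


class Geometry(Protocol):
    """Anything with matching 'area' and 'perim' methods is a Geometry."""

    def area(self) -> float: ...

    def perim(self) -> float: ...


@dataclass
class Rect:
    width: float
    height: float

    # Rect never inherits from Geometry; matching method signatures suffice.
    def area(self) -> float:
        return self.width * self.height

    def perim(self) -> float:
        return 2 * (self.width + self.height)


def measure(g: Geometry) -> tuple[float, float]:
    return g.area(), g.perim()
```

A static type checker accepts measure(Rect(3, 4)) purely on the shape of Rect, which mirrors how the Go compiler accepts *rect as a Geometry.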

December 4, 2021

Automatic attribute delegation in Python composition

While trying to avoid inheritance in an API that I was working on, I came across this neat trick to perform attribute delegation on composed classes. Let’s say there’s a class called Engine and you want to put an engine instance in a Car. In this case, the car has a classic ‘has a’ relationship with the engine, whereas inheritance usually models ‘is a’ relationships. So, composition makes more sense than inheritance here. Consider this example: ...
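
One common way to get this kind of delegation is the __getattr__ hook, which Python calls only when normal attribute lookup fails. A minimal sketch, assuming that is the trick in question; the attributes name and start are invented for this example:

```python
from typing import Any


class Engine:
    def __init__(self, name: str) -> None:
        self.name = name

    def start(self) -> str:
        return f"{self.name} started"


class Car:
    def __init__(self, engine: Engine) -> None:
        self._engine = engine

    def __getattr__(self, attr: str) -> Any:
        # Guard against infinite recursion if '_engine' itself is missing,
        # for example while the object is being copied or unpickled.
        if attr == "_engine":
            raise AttributeError(attr)
        # Anything Car does not define is forwarded to the engine.
        return getattr(self._engine, attr)
```

With this in place, Car(Engine("V8")).start() is answered by the engine, and attributes that neither object has still raise AttributeError.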

November 28, 2021

Access 'classmethod's like 'property' methods in Python

I wanted to add a helper method to an Enum class. However, I didn’t want to make it a classmethod as a property made more sense in this particular case. The problem is that you aren’t supposed to instantiate an enum class, and properties can only be accessed from the instances of a class, not from the class itself. While sifting through Django 3.2’s codebase, I found this neat trick to make a classmethod that acts like a property and can be accessed directly from the class without instantiating it. ...
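
The general shape of such a trick is a small non-data descriptor that passes the owning class to the wrapped function. The sketch below is a simplified illustration written for this listing, not a copy of Django's implementation; Color and values are made-up names:

```python
from enum import Enum
from typing import Any, Callable, Optional


class classproperty:
    """Hypothetical descriptor: a read-only property bound to the class."""

    def __init__(self, fget: Callable[[Any], Any]) -> None:
        self.fget = fget

    def __get__(self, instance: Any, owner: Optional[type] = None) -> Any:
        # 'owner' is the class the attribute was looked up on.
        if owner is None:
            owner = type(instance)
        return self.fget(owner)


class Color(Enum):
    RED = "red"
    GREEN = "green"

    # Enum does not turn descriptors into members, so this stays a helper.
    @classproperty
    def values(cls) -> list:
        return [member.value for member in cls]
```

Color.values can then be read straight off the class, without parentheses and without creating an instance.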

November 26, 2021

Don't add extensions to shell executables

I was browsing through the source code of Tom Christie’s typesystem library and discovered that the shell scripts of the project don’t have any extensions attached to them. At first, I found it odd, and then it all started to make sense. Executable scripts can be written in any language and the users don’t need to care about that. Also, not gonna lie, it looks cleaner this way. GitHub uses this pattern successfully to normalize their scripts. According to the pattern, every project should have a folder named scripts with a subset or superset of the following files: ...

November 23, 2021

Use __init_subclass__ hook to validate subclasses in Python

At my workplace, we have a fairly large Celery config file where you’re expected to subclass from a base class and extend that if there’s a new domain. However, the subclass expects the configuration in a specific schema. So, having a way to enforce that schema in the subclasses and raising appropriate runtime exceptions is nice. I wrote a fancy Python 3.6+ __init_subclass__ hook to validate the subclasses as below. This is neater than writing a metaclass. ...
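
A minimal sketch of the mechanism, with an invented schema: BaseConfig, required_keys, broker_url, and task_queue are placeholders for illustration and not the actual config from the post.

```python
from typing import Any, Dict


class BaseConfig:
    # Hypothetical schema: every subclass must define these attributes.
    required_keys = ("broker_url", "task_queue")

    def __init_subclass__(cls, **kwargs: Any) -> None:
        # Runs once for each subclass, at class definition time.
        super().__init_subclass__(**kwargs)
        missing = [key for key in cls.required_keys if not hasattr(cls, key)]
        if missing:
            raise TypeError(f"{cls.__name__} is missing: {', '.join(missing)}")


def try_define(attrs: Dict[str, Any]) -> str:
    """Define a subclass dynamically and report how validation went."""
    try:
        type("Candidate", (BaseConfig,), attrs)
    except TypeError as exc:
        return str(exc)
    return "ok"
```

The check fires when the subclass is defined, so a misconfigured class fails at import time rather than when a task first runs.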

November 20, 2021

Running tqdm with Python multiprocessing

Making tqdm play nice with multiprocessing requires some additional work. It’s not always obvious and I don’t want to add another third-party dependency just for this purpose. The following example attempts to make tqdm work with multiprocessing.Pool.imap_unordered. However, this should also work with similar mapping methods like Pool.map, Pool.imap, Pool.starmap, etc.

    """
    Run `pip install tqdm` before running the script.

    The function `foo` is going to be executed 100 times across
    `MAX_WORKERS=5` processes. In a single pass, each process will
    get an iterable of size `CHUNK_SIZE=5`. So 5 processes each consuming
    5 elements of an iterable will require 100 / (5 * 5) = 4 passes to
    finish consuming the entire iterable of 100 elements.

    Tqdm progress bar will be updated after every `MAX_WORKERS*CHUNK_SIZE`
    iterations.
    """

    # src.py
    from __future__ import annotations

    import multiprocessing as mp
    from tqdm import tqdm
    import time
    import random
    from dataclasses import dataclass

    MAX_WORKERS = 5
    CHUNK_SIZE = 5


    @dataclass
    class StartEnd:
        start: int
        end: int


    def foo(start_end: StartEnd) -> int:
        time.sleep(0.2)
        return random.randint(start_end.start, start_end.end)


    def main() -> None:
        inputs = [
            StartEnd(start, end)
            for start, end in zip(
                range(0, 100),
                range(100, 200),
            )
        ]

        with mp.Pool(processes=MAX_WORKERS) as pool:
            # 'total' is needed here because the iterator returned by
            # imap_unordered has no length for tqdm to pick up.
            results = tqdm(
                pool.imap_unordered(foo, inputs, chunksize=CHUNK_SIZE),
                total=len(inputs),
            )

            for result in results:
                print(result)


    if __name__ == "__main__":
        main()

This will print: ...

November 18, 2021