Verifying webhook origin via payload hash signing

· 5 min

While working with GitHub webhooks, I discovered a common security pattern a receiver can adopt to verify that incoming webhooks are indeed arriving from GitHub, not from some miscreant trying to carry out a man-in-the-middle attack. After some digging, I found that many other webhook services employ the same practice. Also, check out how Sentry handles webhook verification.

Moreover, GitHub’s documentation demonstrates the pattern in Ruby. So I thought it’d be a good idea to translate that into Python in a more platform-agnostic manner. The core idea of the pattern goes as follows:
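In essence, the receiver recomputes an HMAC-SHA256 digest of the raw request body with a shared secret and compares it against the signature GitHub sends in the `X-Hub-Signature-256` header (a hex digest prefixed with `sha256=`). A minimal Python sketch, assuming you have the raw payload bytes and the header value in hand:

```python
import hashlib
import hmac


def verify_signature(payload: bytes, secret: bytes, signature_header: str) -> bool:
    """Return True if the signature header matches the payload's HMAC."""
    # GitHub sends the hex HMAC-SHA256 digest of the raw request body,
    # prefixed with 'sha256=', in the X-Hub-Signature-256 header.
    expected = "sha256=" + hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison guards against timing attacks.
    return hmac.compare_digest(expected, signature_header)
```

Note the use of `hmac.compare_digest` rather than `==`: a naive string comparison short-circuits on the first mismatched byte, which can leak timing information.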

Recipes from Python SQLite docs

· 19 min

While going through the documentation of Python’s sqlite3 module, I noticed that it’s quite API-driven, where different parts of the module are explained in a prescriptive manner. I, however, learn better from examples, recipes, and narratives. Although a few good recipes already exist in the docs, I thought I’d also list some of the examples I tried out while grokking them.

Executing individual statements

To execute individual statements, you’ll need to use the cursor_obj.execute(statement) primitive.
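For instance, here is a minimal sketch against an in-memory database (the table and rows are illustrative):

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Each execute() call runs exactly one SQL statement.
cursor.execute("CREATE TABLE langs (name TEXT, year INTEGER)")
cursor.execute("INSERT INTO langs VALUES (?, ?)", ("Python", 1991))
conn.commit()

cursor.execute("SELECT name, year FROM langs")
row = cursor.fetchone()  # ('Python', 1991)
```

The `?` placeholders let sqlite3 handle quoting and escaping, which is preferable to interpolating values into the SQL string yourself.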

Prefer urlsplit over urlparse to destructure URLs

· 2 min

TIL from this video by Anthony Sottile that Python’s urlparse is quite slow at parsing URLs. I’ve always used urlparse to destructure URLs and didn’t know that there’s a faster alternative to this in the standard library. The official documentation also recommends the alternative function.

The urlparse function splits a supplied URL into separate components and returns a ParseResult object. Consider this example:

In [1]: from urllib.parse import urlparse

In [2]: url = "https://httpbin.org/get?q=hello&r=22"

In [3]: urlparse(url)
Out[3]: ParseResult(
        scheme='https', netloc='httpbin.org',
        path='/get', params='', query='q=hello&r=22',
        fragment=''
    )

You can see how the function disassembles the URL and builds a ParseResult object with the URL components. Along with this, the urlparse function can also parse an obscure type of URL that you’ll most likely never need. If you look closely at the previous example, you’ll see that there’s a params field in the ParseResult object. This field gets parsed whether you need it or not, and that adds some overhead. The params field will be populated if you have a URL like this:
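For example (the URL here is illustrative), a path segment can carry parameters after a semicolon, and only urlparse bothers to split them out:

```python
from urllib.parse import urlparse, urlsplit

# Path parameters (the part after ';') are an obscure, rarely used feature.
url = "https://example.com/path;lang=en?q=hello"

parsed = urlparse(url)
# urlparse peels ';lang=en' off into the params field:
# parsed.path == '/path', parsed.params == 'lang=en'

split = urlsplit(url)
# urlsplit skips that extra work and leaves the params in the path:
# split.path == '/path;lang=en'
```

Since SplitResult simply has no params field, urlsplit avoids the extra parsing step on every call, which is where the speedup comes from.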

ExitStack in Python

· 6 min

Over the years, I’ve used Python’s contextlib.ExitStack in a few interesting ways. The official ExitStack documentation advertises it as a way to manage multiple context managers and has a couple of examples of how to leverage it. However, neither in the docs nor in GitHub code search could I find examples of some of the more unusual ways I’ve used it in the past. So, I thought I’d document them here.
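Its headline use, per the docs, is entering a variable number of context managers at once; a minimal sketch (the file names are illustrative):

```python
import os
import tempfile
from contextlib import ExitStack

with tempfile.TemporaryDirectory() as tmpdir:
    paths = [os.path.join(tmpdir, f"f{i}.txt") for i in range(3)]

    # ExitStack lets you enter a dynamic number of context managers
    # and unwinds them all, in reverse order, when the block exits,
    # even if opening one of them fails midway.
    with ExitStack() as stack:
        files = [stack.enter_context(open(p, "w")) for p in paths]
        for f in files:
            f.write("hello")

    all_closed = all(f.closed for f in files)  # True
```

This beats deeply nested `with` statements whenever the number of resources is only known at runtime.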

Compose multiple levels of fixtures in pytest

· 4 min

While reading the second version of Brian Okken’s pytest book, I came across this neat trick to compose multiple levels of fixtures. Suppose you want to create a fixture that returns some canned data from a database. Now, let’s say that invoking the fixture multiple times is expensive, and to avoid that you want to run it only once per test session. However, you still want to clear all the database state after each test function runs. Otherwise, a test might inadvertently get coupled with another test that runs before it via the fixture’s shared state. Let’s demonstrate this:
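A sketch of the layering (the fixture names and data here are made up): an expensive session-scoped fixture seeds a fake "database" once, while a cheaper function-scoped fixture wraps it and clears the shared state after each test.

```python
import pytest


@pytest.fixture(scope="session")
def db():
    # Stands in for an expensive one-time setup, e.g. seeding a database.
    yield {"users": []}


@pytest.fixture
def clean_db(db):
    yield db
    # Teardown runs after every test that uses this fixture,
    # so no state leaks from one test into the next.
    db["users"].clear()


def test_add_user(clean_db):
    clean_db["users"].append("alice")
    assert clean_db["users"] == ["alice"]
```

Tests depend on `clean_db`, so they get the session-scoped object without paying the setup cost more than once, yet each one starts from a blank slate.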

Patch where the object is used

· 3 min

I was reading Ned Batchelder’s blog Why your mock doesn’t work and it triggered an epiphany in me about a testing pattern that I’ve been using for a while without being aware that there might be an aphorism on the practice.

Patch where the object is used; not where it’s defined.

To understand it, consider the example below. Here, you have a module containing a function that fetches data from some fictitious database.
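The following self-contained sketch simulates that two-module setup in memory (the module names `db` and `app` are illustrative): `db` defines `get_data`, and `app` imports it via `from db import get_data`.

```python
import sys
import types
from unittest import mock

# Build the 'db' module: defines get_data.
db = types.ModuleType("db")
exec("def get_data():\n    return 'real'", db.__dict__)
sys.modules["db"] = db

# Build the 'app' module: imports get_data and uses it.
app = types.ModuleType("app")
exec(
    "from db import get_data\n"
    "def process():\n"
    "    return get_data().upper()",
    app.__dict__,
)
sys.modules["app"] = app

# Patching where get_data is *defined* doesn't affect app.process,
# because app holds its own reference to the original function.
with mock.patch("db.get_data", return_value="fake"):
    where_defined = app.process()  # still 'REAL'

# Patching where it's *used* swaps the reference app actually calls.
with mock.patch("app.get_data", return_value="fake"):
    where_used = app.process()  # 'FAKE'
```

The `from db import get_data` statement binds the name into `app`'s own namespace at import time, which is why only the second patch takes effect.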

Partially assert callable arguments with 'unittest.mock.ANY'

· 2 min

I just found out that you can use Python’s unittest.mock.ANY to make assertions about certain arguments in a mock call, without caring about the other arguments. This can be handy if you want to test how a callable is called but only want to make assertions about some arguments. Consider the following example:

# test_src.py

import random
import time


def fetch() -> list[float]:
    # Simulate fetching data from a database.
    time.sleep(2)
    return [random.random() for _ in range(4)]


def add(w: float, x: float, y: float, z: float) -> float:
    return w + x + y + z


def process() -> float:
    return add(*fetch())

Let’s say we only want to test the process function. But process ultimately depends on the fetch function, which has multiple side effects: it returns pseudo-random values and waits for 2 seconds on a fictitious network call. Since we only care about process, we’ll mock the other two functions. Here’s how unittest.mock.ANY can make life easier:
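A self-contained sketch of the idea (it inlines the module above as a namespace so the snippet runs on its own; in a real test you’d patch the imported module instead):

```python
import random
from types import SimpleNamespace
from unittest import mock

# Stand-in for the test_src module above (sleep omitted for brevity).
src = SimpleNamespace()
src.fetch = lambda: [random.random() for _ in range(4)]
src.add = lambda w, x, y, z: w + x + y + z
src.process = lambda: src.add(*src.fetch())

with mock.patch.object(src, "add") as mocked_add:
    src.process()

# The four arguments are pseudo-random, but mock.ANY compares equal to
# anything, so we can still assert the *shape* of the call.
mocked_add.assert_called_once_with(mock.ANY, mock.ANY, mock.ANY, mock.ANY)
```

Without `ANY`, asserting this call would require pinning `fetch` to canned values; here the assertion survives whatever the real `fetch` returns.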

Apply constraints with 'assert' in Python

· 4 min

Whenever I need to apply some runtime constraints on a value while building an API, I usually compare the value to an expected range and raise a ValueError if it’s not within the range. For example, let’s define a function that throttles some fictitious operation. The throttle function limits the number of times an operation can be performed by specifying the throttle_after parameter. This parameter defines the number of iterations after which the operation will be halted. The current_iter parameter tracks the current number of times the operation has been performed. Here’s the implementation:
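A sketch of that ValueError-based version (the signature and message are assumptions based on the description above):

```python
def throttle(throttle_after: int, current_iter: int) -> None:
    """Halt the fictitious operation once it has run too many times."""
    # Compare the tracked iteration count against the allowed range
    # and raise if the constraint is violated.
    if current_iter > throttle_after:
        raise ValueError(
            f"cannot perform the operation more than {throttle_after} times"
        )
```

Calling `throttle(3, 2)` passes silently, while `throttle(3, 4)` raises `ValueError`; the post’s point is that `assert current_iter <= throttle_after, "..."` can express the same constraint more tersely, with the caveat that asserts vanish under `python -O`.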

Stream process a CSV file in Python

· 6 min

A common bottleneck when processing large data files is memory. Downloading the file and loading the entire content is surely the easiest way to go. However, it’s likely that you’ll quickly hit OOM errors. Often, whenever I have to deal with large data files that need to be downloaded and processed, I prefer to stream the content line by line and use multiple processes to consume them concurrently.
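The streaming half can be sketched with a generator over a file object (the CSV content is illustrative, and the fan-out to multiple processes is omitted here; in practice the file object could wrap a chunked HTTP response):

```python
import csv
import io

# Stand-in for a large remote file; only the interface matters.
data = io.StringIO("name,price\napple,1.5\nbanana,0.5\n")


def stream_rows(fileobj):
    # csv.DictReader pulls one line at a time from the file object,
    # so the full file is never materialized in memory.
    reader = csv.DictReader(fileobj)
    yield from reader


total = sum(float(row["price"]) for row in stream_rows(data))  # 2.0
```

Because `stream_rows` is lazy, you can hand its output straight to something like `multiprocessing.Pool.imap` to have worker processes consume rows as they arrive.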

Bulk operations in Django with process pool

· 4 min

I’ve rarely been able to take advantage of Django’s bulk_create / bulk_update APIs in production applications, especially in cases where I need to create or update multiple complex objects with a script. Often, these complex objects trigger a chain of signals or need non-trivial setups before any operations can be performed on each of them.

The issue is that bulk_create / bulk_update don’t trigger these signals or expose any hooks to run setup code. The Django docs mention the bulk_create caveats in detail. Here are a few of them: