Compose multiple levels of fixtures in pytest

While reading the second version of Brian Okken’s pytest book1, I came across this neat trick to compose multiple levels of fixtures. Suppose, you want to create a fixture that returns some canned data from a database. Now, let’s say that invoking the fixture multiple times is expensive, and to avoid that you want to run it only once per test session. However, you still want to clear all the database states after each test function runs. Otherwise, a test might inadvertently get coupled with another test that runs before it via the fixture’s shared state. Let’s demonstrate this: ...

July 21, 2022

When to use 'git pull --rebase'

Whenever your local branch diverges from the remote branch, you can’t directly pull from the remote branch and merge it into the local branch. This can happen when, for example: You checkout from the main branch to work on a feature in a branch named alice. When you’re done, you merge alice into main. After that, if you try to pull the main branch from remote again and the content of the main branch changes by this time, you’ll encounter a merge error. Reproduce the issue Create a new branch named alice from main. Run: ...

July 14, 2022

Distil git logs attached to a single file

I run git log --oneline to list out the commit logs all the time. It prints out a compact view of the git history. Running the command in this repo gives me this: d9fad76 Publish blog on safer operator.itemgetter, closes #130 0570997 Merge pull request #129 from rednafi/dependabot/... 6967f73 Bump actions/setup-python from 3 to 4 48c8634 Merge pull request #128 from rednafi/dependabot/pip/mypy-0.961 5b7a7b0 Bump mypy from 0.960 to 0.961 However, there are times when I need to list out the commit logs that only represent the changes made to a particular file. Here’s the command that does exactly that. ...

June 21, 2022

Health check a server with 'nohup $(cmd) &'

While working on a project with EdgeDB1 and FastAPI2, I wanted to perform health checks against the FastAPI server in the GitHub CI. This would notify me about the working state of the application. The idea is to: Run the server in the background. Run the commands against the server that’ll denote that the app is in a working state. Perform cleanup. Exit with code 0 if the check is successful, else exit with code 1. The following shell script demonstrates a similar workflow with a Python HTTP server. This script: ...

April 18, 2022

Caching connection objects in Python

To avoid instantiating multiple DB connections in Python apps, a common approach is to initialize the connection objects in a module once and then import them everywhere. So, you’d do this: # src.py import boto3 # Pip install boto3 import redis # Pip install redis dynamo_client = boto3.client("dynamodb") redis_client = redis.Redis() However, this adds import time side effects to your module and can turn out to be expensive. In search of a better solution, my first instinct was to go for functools.lru_cache(None) to immortalize the connection objects in memory. It works like this: ...

March 16, 2022

Create a sub dictionary with O(K) complexity in Python

How’d you create a sub dictionary from a dictionary where the keys of the sub-dict are provided as a list? I was reading a tweet1 by Ned Bachelder on this today and that made me realize that I usually solve it with O(DK) complexity, where K is the length of the sub-dict keys and D is the length of the primary dict. Here’s how I usually do that without giving it any thoughts or whatsoever: ...

January 30, 2022

Use 'assertIs' to check literal booleans in Python unittest

I used to use Unittest’s self.assertTrue / self.assertFalse to check both literal booleans and truthy/falsy values in Unittest. Committed the same sin while writing tests in Django. I feel like assertTrue and assertFalse are misnomers. They don’t specifically check literal booleans, only truthy and falsy states respectively. Consider this example: # src.py import unittest class TestFoo(unittest.TestCase): def setUp(self): self.true_literal = True self.false_literal = False self.truthy = [True] self.falsy = [] def is_true(self): self.assertTrue(self.true_literal, True) def is_false(self): self.assertFalse(self.false_literal, True) def is_truthy(self): self.assertTrue(self.truthy, True) def is_falsy(self): self.assertFalse(self.falsy, True) if __name__ == "__main__": unittest.main() In the above snippet, I’ve used assertTrue and assertFalse to check both literal booleans and truthy/falsy values. However, to test the literal boolean values, assertIs works better and is more explicit. Here’s how to do the above test properly: ...

January 24, 2022

Check whether an integer is a power of two in Python

To check whether an integer is a power of two, I’ve deployed hacks like this: def is_power_of_two(x: int) -> bool: return x > 0 and hex(x)[-1] in ("0", "2", "4", "8") While this works1, I’ve never liked explaining the pattern matching hack that’s going on here. Today, I came across this tweet2 by Raymond Hettinger where he proposed an elegant solution to the problem. Here’s how it goes: def is_power_of_two(x: int) -> bool: return x > 0 and x.bit_count() == 1 This is neat as there’s no hack and it uses a mathematical invariant to check whether an integer is a power of 2 or not. Also, it’s a tad bit faster. ...

January 21, 2022

Uniform error response in Django Rest Framework

Django Rest Framework exposes a neat hook to customize the response payload of your API when errors occur. I was going through Microsoft’s REST API guideline1 and wanted to make the error response of my APIs more uniform and somewhat similar to this2. I’ll use a modified version of the quickstart example3 in the DRF docs to show how to achieve that. Also, we’ll need a POST API to demonstrate the changes better. Here’s the same example with the added POST API. Place this code in the project’s urls.py file. ...

January 20, 2022

Difference between constrained 'TypeVar' and 'Union' in Python

If you want to define a variable that can accept values of multiple possible types, using typing.Union is one way of doing that: from typing import Union U = Union[int, str] However, there’s another way you can express a similar concept via constrained TypeVar. You’d do so as follows: from typing import TypeVar T = TypeVar("T", int, str) So, what’s the difference between these two and when to use which? The primary difference is: T’s type needs to be consistent across multiple uses within a given scope, while U’s doesn’t. ...

January 19, 2022

Don't wrap instance methods with 'functools.lru_cache' decorator in Python

Recently, fell into this trap as I wanted to speed up a slow instance method by caching it. When you decorate an instance method with functools.lru_cache decorator, the instances of the class encapsulating that method never get garbage collected within the lifetime of the process holding them. Let’s consider this example: # src.py import functools import time from typing import TypeVar Number = TypeVar("Number", int, float, complex) class SlowAdder: def __init__(self, delay: int = 1) -> None: self.delay = delay @functools.lru_cache def calculate(self, *args: Number) -> Number: time.sleep(self.delay) return sum(args) def __del__(self) -> None: print("Deleting instance ...") # Create a SlowAdder instance. slow_adder = SlowAdder(2) # Measure performance. start_time = time.perf_counter() # ---------------------------------------------- result = slow_adder.calculate(1, 2) # ---------------------------------------------- end_time = time.perf_counter() print(f"Calculation took {end_time-start_time} seconds, result: {result}.") start_time = time.perf_counter() # ---------------------------------------------- result = slow_adder.calculate(1, 2) # ---------------------------------------------- end_time = time.perf_counter() print(f"Calculation took {end_time-start_time} seconds, result: {result}.") Here, I’ve created a simple SlowAdder class that accepts a delay value; then it sleeps for delay seconds and calculates the sum of the inputs in the calculate method. To avoid this slow recalculation for the same arguments, the calculate method was wrapped in the lru_cache decorator. The __del__ method notifies us when the garbage collection has successfully cleaned up instances of the class. ...

January 15, 2022

Cropping texts in Python with 'textwrap.shorten'

Problem A common interview question that I’ve seen goes as follows: Write a function to crop a text corpus without breaking any word. Take the length of the text up to which character you should trim. Make sure that the cropped text doesn’t have any trailing space. Try to maximize the number of words you can pack in your trimmed text. Your function should look something like this: def crop(text: str, limit: int) -> str: """Crops 'text' upto 'limit' characters.""" # Crop the text. cropped_text = perform_crop() return cropped_text For example, if text looks like this— ...

January 6, 2022

Automatic attribute delegation in Python composition

While trying to avoid inheritance in an API that I was working on, I came across this neat trick to perform attribute delegation on composed classes. Let’s say there’s a class called Engine and you want to put an engine instance in a Car. In this case, the car has a classic ‘has a’ (inheritance usually refers to ‘is a’ relationships) relationship with the engine. So, composition makes more sense than inheritance here. Consider this example: ...

November 28, 2021

Don't add extensions to shell executables

I was browsing through the source code of Tom Christie’s typesystem1 library and discovered that the shell scripts2 of the project don’t have any extensions attached to them. At first, I found it odd, and then it all started to make sense. Executable scripts can be written in any language and the users don’t need to care about that. Also, not gonna lie, it looks cleaner this way. GitHub uses this [pattern]3 successfully to normalize their scripts. According to the pattern, every project should have a folder named scripts with a subset or superset of the following files: ...

November 23, 2021

Use __init_subclass__ hook to validate subclasses in Python

At my workplace, we have a fairly large Celery config file where you’re expected to subclass from a base class and extend that if there’s a new domain. However, the subclass expects the configuration in a specific schema. So, having a way to enforce that schema in the subclasses and raising appropriate runtime exceptions is nice. Wrote a fancy Python 3.6+ __init_subclasshook__ to validate the subclasses as below. This is neater than writing a metaclass. ...

November 20, 2021

Running tqdm with Python multiprocessing

Making tqdm play nice with multiprocessing requires some additional work. It’s not always obvious and I don’t want to add another third-party dependency just for this purpose. The following example attempts to make tqdm work with multiprocessing.imap_unordered. However, this should also work with similar mapping methods like—multiprocessing.map, multiprocessing.imap, multiprocessing.starmap, etc. """ Run `pip install tqdm` before running the script. The function `foo` is going to be executed 100 times across `MAX_WORKERS=5` processes. In a single pass, each process will get an iterable of size `CHUNK_SIZE=5`. So 5 processes each consuming 5 elements of an iterable will require (100 / (5*5)) 4 passes to finish consuming the entire iterable of 100 elements. Tqdm progress bar will be updated after every `MAX_WORKERS*CHUNK_SIZE` iterations. """ # src.py from __future__ import annotations import multiprocessing as mp from tqdm import tqdm import time import random from dataclasses import dataclass MAX_WORKERS = 5 CHUNK_SIZE = 5 @dataclass class StartEnd: start: int end: int def foo(start_end: StartEnd) -> int: time.sleep(0.2) return random.randint(start_end.start, start_end.end) def main() -> None: inputs = [ StartEnd(start, end) for start, end in zip( range(0, 100), range(100, 200), ) ] with mp.Pool(processes=MAX_WORKERS) as pool: results = tqdm( pool.imap_unordered(foo, inputs, chunksize=CHUNK_SIZE), total=len(inputs), ) # 'total' is redundant here but can be useful # when the size of the iterable is unobvious for result in results: print(result) if __name__ == "__main__": main() This will print: ...

November 18, 2021

Use 'command -v' over 'which' to find a program's executable

One thing that came to me as news is that the command which—which is the de-facto tool to find the path of an executable—is not POSIX compliant. The recent Debian debacle1 around which brought it to my attention. The POSIX-compliant way of finding an executable program is command -v, which is usually built into most of the shells. So, instead of doing this: which python3.12 Do this: command -v which python3.12 Debian’s which hunt ↩︎ ...

November 16, 2021

Use curly braces while pasting shell commands

Pasting shell commands can be a pain when they include hidden return \n characters. In such a case, your shell will try to execute the command immediately. To prevent that, use curly braces { <cmd> } while pasting the command. Your command should look like the following: { dig +short google.com } Here, the spaces after the braces are significant.

November 8, 2021

Use strict mode while running bash scripts

Use unofficial bash strict mode while writing scripts. Bash has a few gotchas and this helps you to avoid that. For example: #!/bin/bash set -euo pipefail echo "Hello" Where, -e Exit immediately if a command exits with a non-zero status. -u Treat unset variables as an error when substituting. -o pipefail The return value of a pipeline is the status of the last command to exit with a non-zero status, or zero if no command exited with a non-zero status. References The idea is a less radical version of this post1. I don’t recommend messing with the IFS without a valid reason. ...

November 8, 2021