Escaping the template pattern hellscape in Python

Over the years, I’ve used the template pattern1 across multiple OO languages with varying degrees of success. It was one of the first patterns I learned in the primordial hours of my software engineering career, and for some reason, it just feels like the natural way to tackle many real-world code-sharing problems. Yet, even before I jumped on board with the composition over inheritance2 camp, I couldn’t help but notice how using this particular inheritance technique spawns all sorts of design and maintenance headaches as the codebase starts to grow....

July 1, 2023

Taming conditionals with bitmasks

The 100k context window of Claude 21 has been a huge boon for me since now I can paste a moderately complex problem to the chat window and ask questions about it. In that spirit, it recently refactored some pretty gnarly conditional logic for me in such an elegant manner that it absolutely blew me away. Now, I know how bitmasks2 work and am aware of the existence of enum.Flag3 in Python....

July 29, 2023

Memory leakage in Python descriptors

Unless I’m hand rolling my own ORM-like feature or validation logic, I rarely need to write custom descriptors in Python. The built-in descriptor magics like @classmethod, @property, @staticmethod, and vanilla instance methods usually get the job done. However, every time I need to dig my teeth into descriptors, I reach for this fantastic how to1 guide by Raymond Hettinger. You should definitely set aside the time to read it if you haven’t already....

July 16, 2023

Unix-style pipelining with Python's subprocess module

Python offers a ton of ways like os.system or os.spawn* to create new processes and run arbitrary commands in your system. However, the documentation usually encourages you to use the subprocess1 module for creating and managing child processes. The subprocess module exposes a high-level run() function that provides a simple interface for running a subprocess and waiting for it to complete. It accepts the command to run as a list of strings, starts the subprocess, waits for it to finish, and then returns a CompletedProcess object with information about the result....

July 14, 2023

Enabling repeatable lazy iterations in Python

The current title of this post is probably incorrect and may even be misleading. I had a hard time coming up with a suitable name for it. But the idea goes like this: sometimes you might find yourself in a situation where you need to iterate through a generator more than once. Sure, you can use an iterable like a tuple or list to allow multiple iterations, but if the number of elements is large, that’ll cause an OOM error....

July 13, 2023

Python dependency management redux

One major drawback of Python’s huge ecosystem is the significant variances in workflows among people trying to accomplish different things. This holds true for dependency management as well. Depending on what you’re doing with Python—whether it’s building reusable libraries, writing web apps, or diving into data science and machine learning—your workflow can look completely different from someone else’s. That being said, my usual approach to any development process is to pick a method and give it a shot to see if it works for my specific needs....

June 27, 2023

Implementing a simple traceroute clone in Python

I was watching this amazing lightning talk1 by Karla Burnett and wanted to understand how traceroute works in Unix. Traceroute is a tool that shows the route of a network packet from your computer to another computer on the internet. It also tells you how long it takes for the packet to reach each stop along the way. It’s useful when you want to know more about how your computer connects to other computers on the internet....

June 1, 2023

Sorting a Django queryset by a custom sequence of an attribute

I needed a way to sort a Django queryset based on a custom sequence of an attribute. Typically, Django allows sorting a queryset by any attribute on the model or related to it in either ascending or descending order. However, what if you need to sort the queryset following a custom sequence of attribute values? Suppose, you’re working with a model called Product where you want to sort the rows of the table based on a list of product ids that are already sorted in a particular order....

May 9, 2023

Deduplicating iterables while preserving order in Python

Whenever I need to deduplicate the items of an iterable in Python, my usual approach is to create a set from the iterable and then convert it back into a list or tuple. However, this approach doesn’t preserve the original order of the items, which can be a problem if you need to keep the order unscathed. Here’s a naive approach that works: from __future__ import annotations from collections.abc import Iterable # Python >3....

May 1, 2023

Pushing real-time updates to clients with Server-Sent Events (SSEs)

In multi-page web applications, a common workflow is where a user: Loads a specific page or clicks on some button that triggers a long-running task. On the server side, a background worker picks up the task and starts processing it asynchronously. The page shouldn’t reload while the task is running. The backend then communicates the status of the long-running task in real-time. Once the task is finished, the client needs to display a success or an error message depending on the final status of the finished task....

April 8, 2023

Signal handling in a multithreaded socket server

While working on a multithreaded socket server in an embedded environment, I realized that the default behavior of Python’s socketserver.ThreadingTCPServer requires some extra work if you want to shut down the server gracefully in the presence of an interruption signal. The intended behavior here is that whenever any of SIGHUP, SIGINT, SIGTERM, or SIGQUIT signals are sent to the server, it should: Acknowledge the signal and log a message to the output console of the server....

February 26, 2023

Switching between multiple data streams in a single thread

I was working on a project where I needed to poll multiple data sources and consume the incoming data points in a single thread. In this particular case, the two data streams were coming from two different Redis lists. The correct way to consume them would be to write two separate consumers and spin them up as different processes. However, in this scenario, I needed a simple way to poll and consume data from one data source, wait for a bit, then poll and consume from another data source, and keep doing this indefinitely....

February 19, 2023

Skipping the first part of an iterable in Python

Consider this iterable: it = (1, 2, 3, 0, 4, 5, 6, 7) Let’s say you want to build another iterable that includes only the numbers that appear starting from the element 0. Usually, I’d do this: # This returns (0, 4, 5, 6, 7). from_zero = tuple( elem for idx, elem in enumerate(it) if idx >= it.index(0) ) While this is quite terse and does the job, it won’t work with a generator....

February 12, 2023

Pausing and resuming a socket server in Python

I needed to write a socket server in Python that would allow me to intermittently pause the server loop for a while, run something else, then get back to the previous request-handling phase; repeating this iteration until the heat death of the universe. Initially, I opted for the low-level socket module to write something quick and dirty. However, the implementation got hairy pretty quickly. While the socket module gives you plenty of control over how you can tune the server’s behavior, writing a server with robust signal and error handling can be quite a bit of boilerplate work....

February 5, 2023

Debugging a containerized Django application in Jupyter Notebook

Back in the days when I was working as a data analyst, I used to spend hours inside Jupyter notebooks exploring, wrangling, and plotting data to gain insights. However, as I shifted my career gear towards backend software development, my usage of interactive exploratory tools dwindled. Nowadays, I spend the majority of my time working on a fairly large Django monolith accompanied by a fleet of microservices. Although I love my text editor and terminal emulators, I miss the ability to just start a Jupyter Notebook server and run code snippets interactively....

January 14, 2023

Manipulating text with query expressions in Django

I was working with a table that had a similar (simplified) structure like this: | uuid | file_path | |----------------------------------|---------------------------| | b8658dfc3e80446c92f7303edf31dcbd | media/private/file_1.pdf | | 3d750874a9df47388569a23c559a4561 | media/private/file_2.csv | | d177b7f7d8b046768ab65857451a0354 | media/private/file_3.txt | | df45742175d7451dad59761f15653d9d | media/private/image_1.png | | a542966fc193470dab84351c15523042 | media/private/image_2.jpg | Let’s say the above table is represented by the following Django model: from django.db import models class FileCabinet(models.Model): uuid = models.UUIDField( primary_key=True, default=uuid.uuid4, editable=False ) file_path = models....

January 7, 2023

Using tqdm with concurrent.fututes in Python

At my workplace, I was writing a script to download multiple files from different S3 buckets. The script relied on Django ORM, so I couldn’t use Python’s async paradigm to speed up the process. Instead, I opted for boto3 to download the files and concurrent.futures.ThreadPoolExecutor to spin up multiple threads and make the requests concurrently. However, since the script was expected to be long-running, I needed to display progress bars to show the state of execution....

January 6, 2023

Faster bulk_update in Django

Django has a Model.objects.bulk_update method that allows you to update multiple objects in a single pass. While this method is a great way to speed up the update process, oftentimes it’s not fast enough. Recently, at my workplace, I found myself writing a script to update half a million user records and it was taking quite a bit of time to mutate them even after leveraging bulk update. So I wanted to see if I could use multiprocessing with ....

November 30, 2022

Installing Python on macOS with asdf

I’ve just migrated from Ubuntu to macOS for work and am still in the process of setting up the machine. I’ve been a lifelong Linux user and this is the first time I’ve picked up an OS that’s not just another flavor of Debian. Primarily, I work with Python, NodeJS, and a tiny bit of Go. Previously, any time I had to install these language runtimes, I’d execute a bespoke script that’d install:...

November 13, 2022

Save models with update_fields for better performance in Django

TIL that you can specify update_fields while saving a Django model to generate a leaner underlying SQL query. This yields better performance while updating multiple objects in a tight loop. To test that, I’m opening an IPython shell with python manage.py shell -i ipython command and creating a few user objects with the following lines: In [1]: from django.contrib.auth import User In [2]: for i in range(1000): ...: fname, lname = f'foo_{i}', f'bar_{i}' ....

November 9, 2022