Verifying webhook origin via payload hash signing

· 5 min

While working with GitHub webhooks, I discovered a common security pattern a receiver can adopt to verify that incoming webhooks are indeed arriving from GitHub, not from some miscreant trying to carry out a man-in-the-middle attack. After some digging, I found that many other webhook services employ the same practice. Also, check out how Sentry handles webhook verification.

Moreover, GitHub’s documentation demonstrates the pattern in Ruby. So I thought it’d be a good idea to translate that into Python in a more platform-agnostic manner. The core idea of the pattern goes as follows:
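In essence, the receiver recomputes an HMAC-SHA256 digest of the raw request body with a shared secret and compares it against the signature GitHub sends in the `X-Hub-Signature-256` header (a hex digest prefixed with `sha256=`). A minimal Python sketch, assuming you have the raw payload bytes and the header value in hand:

```python
import hashlib
import hmac


def verify_signature(payload: bytes, secret: bytes, signature_header: str) -> bool:
    """Return True if the signature header matches the payload's HMAC."""
    # GitHub sends the hex HMAC-SHA256 digest of the raw request body,
    # prefixed with 'sha256=', in the X-Hub-Signature-256 header.
    expected = "sha256=" + hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison guards against timing attacks.
    return hmac.compare_digest(expected, signature_header)
```

Note the use of `hmac.compare_digest` rather than `==`: a naive string comparison short-circuits on the first mismatched byte, which can leak timing information.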

Recipes from Python SQLite docs

· 19 min

While going through the documentation of Python’s sqlite3 module, I noticed that it’s quite API-driven, where different parts of the module are explained in a prescriptive manner. I, however, learn better from examples, recipes, and narratives. Although a few good recipes already exist in the docs, I thought I’d also list some of the examples I tried out while grokking them.

Executing individual statements

To execute individual statements, you’ll need to use the cursor_obj.execute(statement) primitive.
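For instance, here is a minimal sketch against an in-memory database (the table and rows are illustrative):

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Each execute() call runs exactly one SQL statement.
cursor.execute("CREATE TABLE langs (name TEXT, year INTEGER)")
cursor.execute("INSERT INTO langs VALUES (?, ?)", ("Python", 1991))
conn.commit()

cursor.execute("SELECT name, year FROM langs")
row = cursor.fetchone()  # ('Python', 1991)
```

The `?` placeholders let sqlite3 handle quoting and escaping, which is preferable to interpolating values into the SQL string yourself.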

Prefer urlsplit over urlparse to destructure URLs

· 2 min

TIL from this video by Anthony Sottile that Python’s urlparse is quite slow at parsing URLs. I’ve always used urlparse to destructure URLs and didn’t know that there’s a faster alternative to this in the standard library. The official documentation also recommends the alternative function.

The urlparse function splits a supplied URL into separate components and returns a ParseResult object. Consider this example:

In [1]: from urllib.parse import urlparse

In [2]: url = "https://httpbin.org/get?q=hello&r=22"

In [3]: urlparse(url)
Out[3]: ParseResult(
        scheme='https', netloc='httpbin.org',
        path='/get', params='', query='q=hello&r=22',
        fragment=''
    )

You can see how the function disassembles the URL and builds a ParseResult object with the URL components. Along with this, the urlparse function can also parse an obscure type of URL that you’ll most likely never need. If you look closely at the previous example, you’ll see that there’s a params field in the ParseResult object. This field gets parsed whether you need it or not, and that adds some overhead. The params field will be populated if you have a URL like this:
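For example (the URL here is illustrative), a path segment can carry parameters after a semicolon, and only urlparse bothers to split them out:

```python
from urllib.parse import urlparse, urlsplit

# Path parameters (the part after ';') are an obscure, rarely used feature.
url = "https://example.com/path;lang=en?q=hello"

parsed = urlparse(url)
# urlparse peels ';lang=en' off into the params field:
# parsed.path == '/path', parsed.params == 'lang=en'

split = urlsplit(url)
# urlsplit skips that extra work and leaves the params in the path:
# split.path == '/path;lang=en'
```

Since SplitResult simply has no params field, urlsplit avoids the extra parsing step on every call, which is where the speedup comes from.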

ExitStack in Python

· 6 min

Over the years, I’ve used Python’s contextlib.ExitStack in a few interesting ways. The official ExitStack documentation advertises it as a way to manage multiple context managers and has a couple of examples of how to leverage it. However, neither in the docs nor in GitHub code search could I find examples of some of the more unusual ways I’ve used it in the past. So, I thought I’d document them here.
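Its headline use, per the docs, is entering a variable number of context managers at once; a minimal sketch (the file names are illustrative):

```python
import os
import tempfile
from contextlib import ExitStack

with tempfile.TemporaryDirectory() as tmpdir:
    paths = [os.path.join(tmpdir, f"f{i}.txt") for i in range(3)]

    # ExitStack lets you enter a dynamic number of context managers
    # and unwinds them all, in reverse order, when the block exits,
    # even if opening one of them fails midway.
    with ExitStack() as stack:
        files = [stack.enter_context(open(p, "w")) for p in paths]
        for f in files:
            f.write("hello")

    all_closed = all(f.closed for f in files)  # True
```

This beats deeply nested `with` statements whenever the number of resources is only known at runtime.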

Compose multiple levels of fixtures in pytest

· 4 min

While reading the second version of Brian Okken’s pytest book, I came across this neat trick to compose multiple levels of fixtures. Suppose you want to create a fixture that returns some canned data from a database. Now, let’s say that invoking the fixture multiple times is expensive, and to avoid that you want to run it only once per test session. However, you still want to clear all the database state after each test function runs. Otherwise, a test might inadvertently get coupled with another test that runs before it via the fixture’s shared state. Let’s demonstrate this:
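A sketch of the layering (the fixture names and data here are made up): an expensive session-scoped fixture seeds a fake "database" once, while a cheaper function-scoped fixture wraps it and clears the shared state after each test.

```python
import pytest


@pytest.fixture(scope="session")
def db():
    # Stands in for an expensive one-time setup, e.g. seeding a database.
    yield {"users": []}


@pytest.fixture
def clean_db(db):
    yield db
    # Teardown runs after every test that uses this fixture,
    # so no state leaks from one test into the next.
    db["users"].clear()


def test_add_user(clean_db):
    clean_db["users"].append("alice")
    assert clean_db["users"] == ["alice"]
```

Tests depend on `clean_db`, so they get the session-scoped object without paying the setup cost more than once, yet each one starts from a blank slate.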

Patch where the object is used

· 3 min

I was reading Ned Batchelder’s blog Why your mock doesn’t work and it triggered an epiphany in me about a testing pattern that I’ve been using for a while without being aware that there might be an aphorism on the practice.

Patch where the object is used; not where it’s defined.

To understand it, consider the example below. Here, you have a module containing a function that fetches data from some fictitious database.
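The following self-contained sketch simulates that two-module setup in memory (the module names `db` and `app` are illustrative): `db` defines `get_data`, and `app` imports it via `from db import get_data`.

```python
import sys
import types
from unittest import mock

# Build the 'db' module: defines get_data.
db = types.ModuleType("db")
exec("def get_data():\n    return 'real'", db.__dict__)
sys.modules["db"] = db

# Build the 'app' module: imports get_data and uses it.
app = types.ModuleType("app")
exec(
    "from db import get_data\n"
    "def process():\n"
    "    return get_data().upper()",
    app.__dict__,
)
sys.modules["app"] = app

# Patching where get_data is *defined* doesn't affect app.process,
# because app holds its own reference to the original function.
with mock.patch("db.get_data", return_value="fake"):
    where_defined = app.process()  # still 'REAL'

# Patching where it's *used* swaps the reference app actually calls.
with mock.patch("app.get_data", return_value="fake"):
    where_used = app.process()  # 'FAKE'
```

The `from db import get_data` statement binds the name into `app`'s own namespace at import time, which is why only the second patch takes effect.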

Partially assert callable arguments with 'unittest.mock.ANY'

· 2 min

I just found out that you can use Python’s unittest.mock.ANY to make assertions about certain arguments in a mock call, without caring about the other arguments. This can be handy if you want to test how a callable is called but only want to make assertions about some arguments. Consider the following example:

# test_src.py

import random
import time


def fetch() -> list[float]:
    # Simulate fetching data from a database.
    time.sleep(2)
    return [random.random() for _ in range(4)]


def add(w: float, x: float, y: float, z: float) -> float:
    return w + x + y + z


def process() -> float:
    return add(*fetch())

Let’s say we only want to test the process function. But process ultimately depends on the fetch function, which has multiple side effects: it returns pseudo-random values and waits for 2 seconds on a fictitious network call. Since we only care about process, we’ll mock the other two functions. Here’s how unittest.mock.ANY can make life easier:
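A self-contained sketch of the idea (it inlines the module above as a namespace so the snippet runs on its own; in a real test you’d patch the imported module instead):

```python
import random
from types import SimpleNamespace
from unittest import mock

# Stand-in for the test_src module above (sleep omitted for brevity).
src = SimpleNamespace()
src.fetch = lambda: [random.random() for _ in range(4)]
src.add = lambda w, x, y, z: w + x + y + z
src.process = lambda: src.add(*src.fetch())

with mock.patch.object(src, "add") as mocked_add:
    src.process()

# The four arguments are pseudo-random, but mock.ANY compares equal to
# anything, so we can still assert the *shape* of the call.
mocked_add.assert_called_once_with(mock.ANY, mock.ANY, mock.ANY, mock.ANY)
```

Without `ANY`, asserting this call would require pinning `fetch` to canned values; here the assertion survives whatever the real `fetch` returns.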

Apply constraints with 'assert' in Python

· 4 min

Whenever I need to apply some runtime constraints on a value while building an API, I usually compare the value to an expected range and raise a ValueError if it’s not within the range. For example, let’s define a function that throttles some fictitious operation. The throttle function limits the number of times an operation can be performed by specifying the throttle_after parameter. This parameter defines the number of iterations after which the operation will be halted. The current_iter parameter tracks the current number of times the operation has been performed. Here’s the implementation:
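A sketch of that ValueError-based version (the signature and message are assumptions based on the description above):

```python
def throttle(throttle_after: int, current_iter: int) -> None:
    """Halt the fictitious operation once it has run too many times."""
    # Compare the tracked iteration count against the allowed range
    # and raise if the constraint is violated.
    if current_iter > throttle_after:
        raise ValueError(
            f"cannot perform the operation more than {throttle_after} times"
        )
```

Calling `throttle(3, 2)` passes silently, while `throttle(3, 4)` raises `ValueError`; the post’s point is that `assert current_iter <= throttle_after, "..."` can express the same constraint more tersely, with the caveat that asserts vanish under `python -O`.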

Stream process a CSV file in Python

· 6 min

A common bottleneck when processing large data files is memory. Downloading the file and loading the entire content is surely the easiest way to go. However, it’s likely that you’ll quickly hit OOM errors. Often, whenever I have to deal with large data files that need to be downloaded and processed, I prefer to stream the content line by line and use multiple processes to consume them concurrently.
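The streaming half can be sketched with a generator over a file object (the CSV content is illustrative, and the fan-out to multiple processes is omitted here; in practice the file object could wrap a chunked HTTP response):

```python
import csv
import io

# Stand-in for a large remote file; only the interface matters.
data = io.StringIO("name,price\napple,1.5\nbanana,0.5\n")


def stream_rows(fileobj):
    # csv.DictReader pulls one line at a time from the file object,
    # so the full file is never materialized in memory.
    reader = csv.DictReader(fileobj)
    yield from reader


total = sum(float(row["price"]) for row in stream_rows(data))  # 2.0
```

Because `stream_rows` is lazy, you can hand its output straight to something like `multiprocessing.Pool.imap` to have worker processes consume rows as they arrive.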

Bulk operations in Django with process pool

· 4 min

I’ve rarely been able to take advantage of Django’s bulk_create / bulk_update APIs in production applications, especially in cases where I need to create or update multiple complex objects with a script. Often, these complex objects trigger a chain of signals or need non-trivial setups before any operations can be performed on each of them.

The issue is that bulk_create / bulk_update don’t trigger these signals or expose any hooks to run setup code. The Django docs mention the bulk_create caveats in detail. Here are a few of them: