Disallow large file download from URLs in Python

· 2 min

I was working on a DRF POST API endpoint where the consumer submits a URL pointing to a PDF file, and the system then downloads the file and saves it to an S3 bucket. While this sounds quite straightforward, there’s one big issue. Before I started working on it, the core logic looked like this:

# src.py
from __future__ import annotations

from urllib.request import urlopen
import tempfile
from shutil import copyfileobj


def save_to_s3(src_url: str, dest_url: str) -> None:
    with tempfile.NamedTemporaryFile() as file:
        with urlopen(src_url) as response:
            # This stdlib function saves the content of the file
            # in 'file'.
            copyfileobj(response, file)

        # Flush so the temp file's contents are on disk before upload.
        file.flush()

        # Logic to save file in s3.
        _save_to_s3(dest_url)


if __name__ == "__main__":
    save_to_s3(
        "https://citeseerx.ist.psu.edu/viewdoc/download?"
        "doi=10.1.1.92.4846&rep=rep1&type=pdf",
        "https://s3-url.com",
    )

In the above snippet, there’s no guardrail on how large the target file can be. You could bring the entire server down to its knees by posting a link to a ginormous file: the server would stay busy downloading it, consuming memory, disk, and bandwidth all the while.
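One way to add that guardrail is to reject the download early when the server declares an oversized `Content-Length`, and to cap the number of bytes actually read while streaming, since the header can be missing or wrong. A minimal sketch of that idea (the cap value and helper names are mine, not from the post):

```python
from __future__ import annotations

from urllib.request import urlopen

MAX_BYTES = 20 * 1024 * 1024  # Hypothetical 20 MB cap.


class FileTooLargeError(Exception):
    """Raised when the source exceeds the allowed size."""


def copy_capped(src, dest, max_bytes: int, chunk_size: int = 8192) -> int:
    """Copy 'src' to 'dest' in chunks, bailing out past 'max_bytes'."""
    copied = 0
    while chunk := src.read(chunk_size):
        copied += len(chunk)
        if copied > max_bytes:
            raise FileTooLargeError(f"source exceeds {max_bytes} bytes")
        dest.write(chunk)
    return copied


def download_capped(src_url: str, dest_file, max_bytes: int = MAX_BYTES) -> None:
    with urlopen(src_url) as response:
        # Reject early if the server declares a size above the cap.
        # Content-Length can be absent or lie, so the streaming cap
        # in copy_capped is the real guardrail.
        declared = response.headers.get("Content-Length")
        if declared is not None and int(declared) > max_bytes:
            raise FileTooLargeError(f"declared size {declared} exceeds cap")
        copy_capped(response, dest_file, max_bytes)
```

With this in place, `copyfileobj(response, file)` in the original snippet would be replaced by `copy_capped(response, file, MAX_BYTES)`.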

Declarative payloads with TypedDict in Python

· 6 min

While working with microservices in Python, a common pattern I see is the use of dynamically filled dictionaries as payloads for REST APIs or message queues. To understand what I mean by this, consider the following example:

# src.py
from __future__ import annotations

import json
from typing import Any

import redis  # Do a pip install.


def get_payload() -> dict[str, Any]:
    """Get the 'zoo' payload containing animal names and attributes."""

    payload = {"name": "awesome_zoo", "animals": []}

    names = ("wolf", "snake", "ostrich")
    attributes = (
        {"family": "Canidae", "genus": "Canis", "is_mammal": True},
        {"family": "Viperidae", "genus": "Boas", "is_mammal": False},
        {"family": "Struthionidae", "genus": "Struthio", "is_mammal": False},
    )
    for name, attr in zip(names, attributes):
        payload["animals"].append(  # type: ignore
            {"name": name, "attribute": attr},
        )
    return payload


def save_to_cache(payload: dict[str, Any]) -> None:
    # You'll need to spin up a Redis db before instantiating
    # a connection here.
    r = redis.Redis()
    print("Saving to cache...")
    r.set(f"zoo:{payload['name']}", json.dumps(payload))


if __name__ == "__main__":
    payload = get_payload()
    save_to_cache(payload)

Here, the get_payload function constructs a contrived payload containing the data of an imaginary zoo, and the save_to_cache function stores it in a Redis DB. To execute the above snippet, you’ll need to spin up a Redis database first. You can use Docker to do so. Install and configure Docker on your system and run:
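The declarative alternative the title refers to replaces `dict[str, Any]` with `TypedDict`, so the shape of the payload is spelled out once and checked by the type checker. A minimal sketch of how the zoo payload above could be typed (the class names here are my own, not necessarily the post's):

```python
from __future__ import annotations

from typing import TypedDict


class Attribute(TypedDict):
    family: str
    genus: str
    is_mammal: bool


class Animal(TypedDict):
    name: str
    attribute: Attribute


class ZooPayload(TypedDict):
    name: str
    animals: list[Animal]


def get_payload() -> ZooPayload:
    """Same payload shape as before, but now statically checkable."""
    wolf: Animal = {
        "name": "wolf",
        "attribute": {"family": "Canidae", "genus": "Canis", "is_mammal": True},
    }
    return {"name": "awesome_zoo", "animals": [wolf]}
```

With this, a typo like `payload["animls"]` or a missing key is flagged by mypy instead of surfacing at runtime, and the `# type: ignore` from the dynamic version is no longer needed.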

Uniform error response in Django Rest Framework

· 3 min

Django Rest Framework exposes a neat hook to customize the response payload of your API when errors occur. I was going through Microsoft’s REST API guideline and wanted to make the error response of my APIs more uniform and somewhat similar to this example.

I’ll use a modified version of the quickstart example in the DRF docs to show how to achieve that. Also, we’ll need a POST API to demonstrate the changes better. Here’s the same example with the added POST API. Place this code in the project’s urls.py file.
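The hook in question is DRF's `EXCEPTION_HANDLER` setting, which points at a function that can rewrite the response built by `rest_framework.views.exception_handler`. The reshaping step itself is plain Python; a minimal sketch of wrapping DRF's default error detail in a Microsoft-guideline-style envelope (the field names are my assumption, not the post's exact format):

```python
from __future__ import annotations

from typing import Any


def reshape_error(detail: Any, status_code: int) -> dict[str, Any]:
    """Wrap a DRF-style error detail in a uniform 'error' envelope.

    In a real project this would run inside a custom exception handler
    registered via the EXCEPTION_HANDLER key of the REST_FRAMEWORK setting.
    """
    if isinstance(detail, dict):
        # Validation errors arrive as {field: [messages]}.
        details = [
            {"target": field, "message": str(msg)}
            for field, msgs in detail.items()
            for msg in (msgs if isinstance(msgs, list) else [msgs])
        ]
    else:
        details = [{"target": None, "message": str(detail)}]

    return {
        "error": {
            "code": status_code,
            "message": "One or more errors occurred.",
            "details": details,
        }
    }
```

Every error response then has the same top-level shape, regardless of which view or serializer raised it.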

Effortless API response caching with Python & Redis

· 9 min

Updated on 2023-09-11: Fix broken URLs.

Recently, I was working with Mapbox’s Route optimization API. It tackles the traveling salesman problem: you provide the API with the coordinates of multiple places, and it returns a duration-optimized route between those locations. This is a perfect use case where Redis caching can come in handy. Redis is a fast and lightweight in-memory database with additional persistence options, making it a great candidate for the task at hand. Caching saves you from making redundant API requests, and it can dramatically improve response times as well.
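The caching pattern itself boils down to three steps: hash the request parameters into a key, return the cached response on a hit, and call the API only on a miss. A minimal sketch of that pattern as a decorator, with a plain dict standing in for Redis so it runs without a server (the names are mine, not the post's):

```python
from __future__ import annotations

import hashlib
import json
from typing import Any, Callable

# A plain dict stands in for Redis here; in a real setup you'd swap in
# redis.Redis() and store json.dumps-ed values with an expiry (TTL).
_cache: dict[str, Any] = {}


def cached(fetch: Callable[..., Any]) -> Callable[..., Any]:
    """Cache 'fetch' results keyed by a hash of its arguments."""

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        raw = json.dumps([args, kwargs], sort_keys=True, default=str)
        key = hashlib.sha256(raw.encode()).hexdigest()
        if key not in _cache:
            # Cache miss: make the (expensive) API call once.
            _cache[key] = fetch(*args, **kwargs)
        return _cache[key]

    return wrapper
```

Identical coordinate sets then hash to the same key, so repeated calls hit the cache instead of the API.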