While working with microservices in Python, a common pattern I see is the use of dynamically filled dictionaries as payloads for REST APIs or message queues. To understand what I mean by this, consider the following example:
# src.py
from __future__ import annotations

import json
from typing import Any

import redis  # pip install redis


def get_payload() -> dict[str, Any]:
    """Get the 'zoo' payload containing animal names and attributes."""
    payload = {"name": "awesome_zoo", "animals": []}
    names = ("wolf", "snake", "ostrich")
    attributes = (
        {"family": "Canidae", "genus": "Canis", "is_mammal": True},
        {"family": "Viperidae", "genus": "Boas", "is_mammal": False},
    )
    # zip() stops at the shorter tuple, so 'ostrich' is dropped.
    for name, attr in zip(names, attributes):
        payload["animals"].append(  # type: ignore
            {"name": name, "attribute": attr},
        )
    return payload


def save_to_cache(payload: dict[str, Any]) -> None:
    # You'll need to spin up a Redis db before instantiating
    # a connection here.
    r = redis.Redis()
    print("Saving to cache...")
    r.set(f"zoo:{payload['name']}", json.dumps(payload))


if __name__ == "__main__":
    payload = get_payload()
    save_to_cache(payload)
Here, the get_payload function constructs a payload that the save_to_cache function stores in a Redis DB. The get_payload function returns a dict that represents a contrived payload containing the data of an imaginary zoo. To execute the above snippet, you’ll need to spin up a Redis database first. You can use Docker to do so. Install and configure Docker on your system and run:
docker run -d -p 6379:6379 redis:alpine
If you run the above snippet after starting the Redis server, it’ll run without raising any errors. You can inspect the content saved in Redis with the following command (assuming you’ve got redis-cli and jq installed on your system):
echo "get zoo:awesome_zoo" | redis-cli | jq
This will return the following payload to your console:
{
  "name": "awesome_zoo",
  "animals": [
    {
      "name": "wolf",
      "attribute": {
        "family": "Canidae",
        "genus": "Canis",
        "is_mammal": true
      }
    },
    {
      "name": "snake",
      "attribute": {
        "family": "Viperidae",
        "genus": "Boas",
        "is_mammal": false
      }
    }
  ]
}
Although this workflow works at runtime, there’s a big gotcha here! It’s really difficult to picture the shape of the payload by reading the get_payload function, as it builds the dictionary dynamically. First, it declares a dictionary with two fields: name and animals. Here, name is a string that denotes the name of the zoo. The other field, animals, is a list containing the names and attributes of the animals in the zoo. Later on, the for-loop fills up the dictionary with nested data structures. This chain of operations makes it difficult to picture the final shape of the resulting payload in your mind.
In this case, you’ll have to inspect the content of the Redis cache to fully understand the shape of the data. Writing code in the above manner is effortless, but it makes it really hard for the next person working on the codebase to understand how the payload looks without tapping into the data storage. There’s a better way to declaratively communicate the shape of the payload that doesn’t involve writing unmaintainably large docstrings. Here’s how you can leverage TypedDict and Annotated to achieve this:
# src.py
from __future__ import annotations

import json

# In < Python 3.8, import 'TypedDict' from 'typing_extensions'.
# In < Python 3.9, import 'Annotated' from 'typing_extensions'.
from typing import Annotated, TypedDict

import redis  # pip install redis


class Attribute(TypedDict):
    family: str
    genus: str
    is_mammal: bool


class Animal(TypedDict):
    name: str
    attribute: Attribute


class Zoo(TypedDict):
    name: str
    animals: list[Animal]


def get_payload() -> Zoo:
    """Get the 'zoo' payload containing animal names and attributes."""
    payload: Zoo = {"name": "awesome_zoo", "animals": []}
    names = ("wolf", "snake", "ostrich")
    attributes: tuple[Attribute, ...] = (
        {"family": "Canidae", "genus": "Canis", "is_mammal": True},
        {"family": "Viperidae", "genus": "Boas", "is_mammal": False},
    )
    # zip() stops at the shorter tuple, so 'ostrich' is dropped.
    for name, attr in zip(names, attributes):
        payload["animals"].append({"name": name, "attribute": attr})
    return payload


def save_to_cache(payload: Annotated[Zoo, dict]) -> None:
    # You'll need to spin up a Redis db before instantiating
    # a connection here.
    r = redis.Redis()
    print("Saving to cache...")
    r.set(f"zoo:{payload['name']}", json.dumps(payload))


if __name__ == "__main__":
    payload: Zoo = get_payload()
    save_to_cache(payload)
Notice how I’ve used TypedDict to declare the nested structure of the payload Zoo. At runtime, instances of typed-dict classes behave the same way as normal dicts. Here, Zoo contains two fields: name and animals. The animals field is annotated as list[Animal], where Animal is another typed-dict. The Animal typed-dict houses another typed-dict called Attribute that defines various properties of the animal.
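To make the "behaves like a normal dict" part concrete, here’s a minimal sketch that reuses the Attribute and Animal classes from the snippet above. The variable names (wolf, via_call, serialized) are mine and exist only for illustration.

```python
import json
from typing import TypedDict


class Attribute(TypedDict):
    family: str
    genus: str
    is_mammal: bool


class Animal(TypedDict):
    name: str
    attribute: Attribute


wolf: Animal = {
    "name": "wolf",
    "attribute": {"family": "Canidae", "genus": "Canis", "is_mammal": True},
}

# At runtime the annotated value is a plain dict, not a special class.
is_plain_dict = type(wolf) is dict

# Calling the typed-dict class is just another way to build that dict.
via_call = Animal(name="wolf", attribute=wolf["attribute"])

# Because it's a plain dict, json.dumps() needs no custom encoder.
serialized = json.dumps(wolf)

print(is_plain_dict, via_call == wolf)  # True True
```

This is why json.dumps(payload) in save_to_cache keeps working unchanged after the switch to TypedDict.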
Taking a look at the typed-dict Zoo and following along its nested structure, the final shape of the payload becomes clearer without us having to look for example payloads. Also, Mypy can check whether the payload conforms to the shape of the annotated type. I used Annotated[Zoo, dict] in the input parameter of the save_to_cache function to communicate to the reader that an instance of the class Zoo is a dict that conforms to the contract laid out in the type itself. Annotated can be used to attach arbitrary metadata to a particular type.
At runtime, this snippet will exhibit the same behavior as the previous one, and Mypy approves of it too.
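If you’re curious where the metadata passed to Annotated ends up, here’s a small sketch using typing.get_type_hints and typing.get_args (Python 3.9+). The Zoo class is trimmed down and save_to_cache is reduced to a stub, so treat both as stand-ins for the real ones above.

```python
from typing import Annotated, TypedDict, get_args, get_type_hints


class Zoo(TypedDict):
    name: str
    animals: list


def save_to_cache(payload: Annotated[Zoo, dict]) -> None:
    """Stub standing in for the real function; it does nothing."""


# By default, get_type_hints() strips the Annotated wrapper.
plain = get_type_hints(save_to_cache)

# With include_extras=True, the wrapper and its metadata are kept.
with_extras = get_type_hints(save_to_cache, include_extras=True)

print(plain["payload"] is Zoo)  # True
print(get_args(with_extras["payload"]) == (Zoo, dict))  # True
```

The type checker treats Annotated[Zoo, dict] exactly like Zoo; the extra dict is only a hint for human readers or for tools that choose to read it.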
Handling missing key-value pairs
By default, the type checker will structurally validate the shape of the dict annotated with a TypedDict class, and all the key-value pairs expected by the annotation must be present in the dict. It’s possible to relax this behavior by specifying totality. This can be helpful for dealing with missing fields without letting go of type safety. Consider this:
from __future__ import annotations

from typing import TypedDict


class Attribute(TypedDict):
    family: str
    genus: str
    is_mammal: bool


animal_attribute: Attribute = {
    "family": "Hominidae",
    "genus": "Homo",
}  # Mypy will complain about the missing 'is_mammal' key.
Mypy will complain about the missing key:
src.py:12: error: Missing key "is_mammal" for TypedDict "Attribute"
animal_attribute: Attribute = {
^
Found 1 error in 1 file (checked 1 source file)
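Keep in mind that this error comes from the static checker only; nothing validates the dict when the code runs. If the data arrives untyped (say, from json.loads on a cached value), a runtime check has to be written by hand. Below is a minimal sketch, assuming Python 3.9+ where typed-dict classes expose __required_keys__. The missing_keys helper is hypothetical and not part of any library.

```python
from typing import Any, TypedDict


class Attribute(TypedDict):
    family: str
    genus: str
    is_mammal: bool


def missing_keys(data: dict[str, Any]) -> list[str]:
    """Hypothetical helper: list the Attribute keys absent from 'data'."""
    return sorted(Attribute.__required_keys__.difference(data))


# Untyped data, e.g. something read back from a cache with json.loads().
raw: dict[str, Any] = {"family": "Hominidae", "genus": "Homo"}

print(missing_keys(raw))  # ['is_mammal']
```

This only checks key presence, not value types, so it complements Mypy rather than replacing it.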
You can relax this behavior like this:
...
class Attribute(TypedDict, total=False):
    family: str
    genus: str
    is_mammal: bool
...
Now Mypy will no longer complain about the missing field in the annotated dict. However, it will still disallow arbitrary keys that aren’t defined in the TypedDict. For example:
...
# Mypy will complain as the key 'species' doesn't exist in the TypedDict.
animal_attribute["species"] = "Sapiens"
...
src.py:17: error: TypedDict "Attribute" has no key "species"
animal_attribute["species"] = "Sapiens"
^
Found 1 error in 1 file (checked 3 source files)
make: *** [Makefile:134: mypy] Error 1
Sweet type safety without being too strict about missing fields!
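One caveat: total=False makes every key optional, not just is_mammal. If only some fields may be missing, PEP 589 describes keeping the mandatory keys in a regular typed-dict and adding the optional ones in a total=False subclass. Here’s a minimal sketch of that idea; RequiredAttribute is a name I made up for the base class, and the __required_keys__ and __optional_keys__ attributes need Python 3.9+.

```python
from typing import TypedDict


class RequiredAttribute(TypedDict):
    # These keys must always be present.
    family: str
    genus: str


class Attribute(RequiredAttribute, total=False):
    # This key may be left out.
    is_mammal: bool


human: Attribute = {"family": "Hominidae", "genus": "Homo"}

required = sorted(Attribute.__required_keys__)
optional = sorted(Attribute.__optional_keys__)

# Read optional keys with .get() so a missing key yields None
# instead of raising KeyError.
is_mammal = human.get("is_mammal")

print(required, optional, is_mammal)
# ['family', 'genus'] ['is_mammal'] None
```

With this split, Mypy should still flag a dict that lacks family or genus while accepting one without is_mammal.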