While writing microservices in Python, I like to declaratively define the shape of the data
coming in and out of JSON APIs or NoSQL databases in a separate module. Both TypedDict
and
dataclass
are fantastic tools to communicate the shape of the data with the next person
working on the codebase.
Whenever I need to do some processing on the data before starting to work on that, I prefer to transform the data via dataclasses. Consider this example:
# src.py
from __future__ import annotations
from dataclasses import dataclass
from typing import Any
@dataclass
class WebhookPayload:
"""Save data to DynamoDB."""
url: str
request_payload: dict
response_payload: dict
status_code: int
def to_dynamodb_item(self) -> None: ...
The above class defines the structure of a payload that’ll be saved in a DynamoDB table. To
make things simpler, I want to serialize the request_payload
, response_payload
, and
status_code
fields to JSON string before saving them to the DB. Usually, I’d do it in the
to_dynamodb_item
like this:
# src.py
from __future__ import annotations
from dataclasses import dataclass
import json
from typing import Any
@dataclass
class WebhookPayload:
"""Save data to DynamoDB."""
# Snip...
def to_dynamodb_item(self) -> dict[str, Any]:
request_payload = json.dumps(self.request_payload)
response_payload = json.dumps(self.response_payload)
status_code = json.dumps(self.response_payload)
# ...
dyanmodb_item = ...
return dynamodb_item
However, keeping track of this json.dumps
transformation that’s buried in a method can be
difficult. Also, it can be hard to track the fields that need to be deserialized whenever
you want the rich data structures back. Another disadvantage is that you’ll have to perform
the same transformation again if you need serialized fields in another method. A better way
is to take advantage of the __post_init__
hook exposed by dataclasses. Here’s how you can
do it:
# src.py
from __future__ import annotations
import json
from dataclasses import dataclass, field
from typing import Any
@dataclass
class WebhookPayload:
"""Save data to DynamoDB."""
url: str
request_payload: dict | None
response_payload: dict | None
status_code: int | None
_json_transform: bool = field(default=True, repr=False)
_json_fields: tuple[str, ...] = field(
default=(
"request_payload",
"response_payload",
"status_code",
),
repr=False,
)
def __post_init__(self) -> None:
if not self._json_transform:
return
for field in self._json_fields:
# Here's where the magic happens!
setattr(self, field, json.dumps(getattr(self, field)))
def to_dynamodb_item(self) -> dict[str, Any]: ...
if __name__ == "__main__":
wh = WebhookPayload(
url="https//:httpbin.org/post",
request_payload={"hello": "world"},
response_payload=None,
status_code=None,
)
print(wh)
Running the script will print the following:
WebhookPayload(
url='https//:httpbin.org/post',
request_payload='{"hello": "world"}',
response_payload='null',
status_code='null'
)
Notice, how the intended fields are now JSON encoded. Python calls the __post_init__
hook
of a dataclass after calling the __init__
method. If you don’t generate any init by
decorating the target class with @dataclass(init=False)
, in that case, the __post_init__
hook won’t be executed.
The field
function with repr=False
allows us to exclude the configuration fields like
_json_transform
and _json_fields
from the final __repr__
of the class. Notice that
these two fields are absent in the final representation of the dataclass instance.
You can turn off the JSON conversion by setting the _json_transform
to False
:
# src.py
...
WebhookPayload(
url="https//:httpbin.org/post",
request_payload={"hello": "world"},
response_payload=None,
status_code=None,
_json_transform=False,
)
You can also add or remove fields to be transformed by changing the value of the
_json_fields
iterable of the class:
# src.py
...
WebhookPayload(
url="https//:httpbin.org/post",
request_payload={"hello": "world"},
response_payload=None,
status_code=None,
_json_fields=("status_code",),
)
This will only serialize the status_code
field. Neat!
References
Recent posts
- SSH saga
- Injecting Pytest fixtures without cluttering test signatures
- Explicit method overriding with @typing.override
- Quicker startup with module-level __getattr__
- Docker mount revisited
- Topological sort
- Writing a circuit breaker in Go
- Discovering direnv
- Notes on building event-driven systems
- Bash namerefs for dynamic variable referencing