This morning, someone on Twitter pointed me to PEP 562, which introduces __getattr__ and __dir__ at the module level. While __dir__ helps control which attributes are listed when calling dir(module), __getattr__ is the more interesting addition.
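To make the __dir__ half concrete, here is a minimal sketch. The module name demo and the helper names are made up for illustration, and the module is built in memory with types.ModuleType only so that the snippet runs as a single file; in a real project the source string would simply be the contents of a .py file.

```python
import types

# Source of a hypothetical module; in practice this would live in demo.py.
source = '''
__all__ = ["public_api"]

def public_api():
    return "ok"

def _internal_helper():
    return "hidden"

def __dir__():
    # Advertise only the public names to dir()
    return sorted(__all__)
'''

demo = types.ModuleType("demo")
exec(source, demo.__dict__)

print(dir(demo))                # ['public_api']
print(demo._internal_helper())  # still reachable; __dir__ only affects the listing
```

Note that __dir__ changes what dir() reports, not what can be accessed.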
The __getattr__ method in a module works similarly to how it does in a Python class. For example:
class Cat:
    def __getattr__(self, name: str) -> str:
        if name == "voice":
            return "meow!!"
        raise AttributeError(f"Attribute {name} does not exist")


# Try to access 'voice' on Cat
cat = Cat()
cat.voice  # Returns "meow!!"

# Raises AttributeError: Attribute something_else does not exist
cat.something_else
In this class, __getattr__ is called only when normal attribute lookup fails, so it lets you decide how missing attributes behave. Since Python 3.7, you can also define __getattr__ at the module level to handle attribute access on the module itself.
For instance, if you have a module my_module.py:
# my_module.py
def existing_function() -> str:
    return "I exist!"


def __getattr__(name: str) -> str:
    if name == "dynamic_attribute":
        return "I was generated dynamically!"
    raise AttributeError(f"Module {__name__} has no attribute {name}")
Using this module:
# another_module.py
import my_module
print(my_module.existing_function()) # Prints "I exist!"
print(my_module.dynamic_attribute) # Prints "I was generated dynamically!"
print(my_module.non_existent) # Raises AttributeError
If an attribute isn’t found through the regular lookup (using object.__getattribute__), Python will look for __getattr__ in the module’s __dict__. If found, it calls __getattr__ with the attribute name and returns the result. But if you’re looking up a name directly as a module global, it bypasses __getattr__. This prevents performance issues that would arise from repeatedly invoking __getattr__ for built-in or common attributes.
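The difference between attribute access and a bare global lookup is easy to miss, so here is a small sketch of it. The module name demo_globals and the attribute answer are hypothetical, and the module is again built in memory with types.ModuleType purely to keep the snippet self-contained.

```python
import types

# Source of a hypothetical module with a module-level __getattr__.
source = '''
def __getattr__(name):
    if name == "answer":
        return 42
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

def read_as_global():
    # Bare global lookup inside the module: __getattr__ is never consulted
    return answer
'''

demo_globals = types.ModuleType("demo_globals")
exec(source, demo_globals.__dict__)

# Attribute access on the module object falls back to __getattr__
print(demo_globals.answer)

# The same name used as a plain global inside the module raises NameError
try:
    demo_globals.read_as_global()
except NameError as exc:
    print(f"NameError: {exc}")
```

In other words, the hook fires for module.name style access from the outside, not for names the module's own code refers to.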
One practical use for module-level __getattr__ is lazy-loading heavy dependencies to improve startup performance. Imagine you have a module that relies on a large library but doesn’t need it immediately at import time.
# heavy_module.py
from typing import Any


def __getattr__(name: str) -> Any:
    if name == "np":
        import numpy as np

        globals()["np"] = np  # Cache it in the module's namespace
        return np
    raise AttributeError(f"Module {__name__} has no attribute {name}")
With this setup, importing heavy_module doesn’t immediately import NumPy. Only when you access heavy_module.np does it trigger the import:
# main.py
import heavy_module
# NumPy hasn't been imported yet.
# Code that doesn't need NumPy...
# Now we need NumPy
arr = heavy_module.np.array([1, 2, 3])
print(arr) # NumPy is now imported and used
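The first access to heavy_module.np triggers the NumPy import and pays its one-time cost, but since we cache np with globals()["np"] = np, subsequent accesses are fast: the module now holds the reference to NumPy, so the regular lookup succeeds and __getattr__ is never called again for that name.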
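One way to convince yourself that the cache works is to count how often __getattr__ runs. The sketch below swaps NumPy for the standard library's json module so it runs anywhere; the module name lazy and the calls counter are made up for the demonstration.

```python
import types

# Source of a hypothetical lazy-loading module; json stands in for NumPy.
source = '''
calls = 0

def __getattr__(name):
    global calls
    if name == "json":
        calls += 1
        import json
        globals()["json"] = json  # cache so later lookups skip __getattr__
        return json
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''

lazy = types.ModuleType("lazy")
exec(source, lazy.__dict__)

print("json" in vars(lazy))  # False: nothing is bound before the first access

first = lazy.json   # goes through __getattr__ and performs the import
second = lazy.json  # served by the regular lookup from the module's __dict__

print(first is second)  # True
print(lazy.calls)       # 1: __getattr__ ran only for the first access
```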
This approach is handy in scenarios like CLIs where you want to keep startup quick. For example, if you need to initialize a database connection but only for specific commands, you can defer the setup until needed.
Here’s an example with SQLite (though SQLite connections are quick, imagine a slower connection here):
# db_module.py
import sqlite3

# Caching initialized connection in the global namespace
_connection: sqlite3.Connection | None = None


def __getattr__(name: str) -> sqlite3.Connection:
    if name == "connection":
        global _connection
        if _connection is None:
            print("Initializing database connection...")
            _connection = sqlite3.connect("my_database.db")
        return _connection
    raise AttributeError(f"Module {__name__} has no attribute {name}")
In this setup, nothing is instantiated when you import db_module. The connection is only initialized on the first access of db_module.connection. Later calls use the cached _connection, making subsequent access fast.
Here’s how you might use it in a CLI:
# cli.py
import click

import db_module


@click.group()
def cli() -> None:
    pass


@cli.command()
def greet() -> None:
    click.echo("Hello!")


# Name the command explicitly: Click 7+ would otherwise expose it as "show-data"
@cli.command("show_data")
def show_data() -> None:
    # Initializes the database connection if needed
    conn = db_module.connection
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM my_table")
    results = cursor.fetchall()
    click.echo(f"Data: {results}")


if __name__ == "__main__":
    cli()
When you run python cli.py greet, the CLI starts quickly since it doesn’t initialize the database connection. But running python cli.py show_data accesses db_module.connection, which triggers the connection setup.
This could also be achieved by defining a function that initializes the database connection and caches it for subsequent calls. However, using module-level __getattr__ can be more convenient if you have multiple global variables that require expensive calculations or initializations. Instead of writing separate functions for each variable, you can handle them all within the __getattr__ method.
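As a rough sketch of that last point, a single __getattr__ can dispatch to a table of factories and cache whatever they produce. Everything named here (the settings module, the _LAZY registry, the two loader functions) is hypothetical and only illustrates the shape of the idea; the module is built in memory so the snippet runs as one file.

```python
import types

# Source of a hypothetical module exposing several lazily computed globals.
source = '''
def _load_config():
    # Stand-in for an expensive initialization
    return {"debug": False}

def _load_squares():
    # Stand-in for an expensive calculation
    return [n * n for n in range(5)]

# Registry: attribute name -> zero-argument factory
_LAZY = {"config": _load_config, "squares": _load_squares}

def __getattr__(name):
    factory = _LAZY.get(name)
    if factory is None:
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
    value = factory()
    globals()[name] = value  # cache so the factory runs at most once
    return value
'''

settings = types.ModuleType("settings")
exec(source, settings.__dict__)

print(settings.config)   # {'debug': False}
print(settings.squares)  # [0, 1, 4, 9, 16]
```

Adding another lazy global then means adding one entry to the registry rather than another cached accessor function.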
Here’s one example of using it for a non-trivial case in the wild.