This morning, someone on Twitter pointed me to PEP 562 [1], which introduces __getattr__
and __dir__ at the module level. While __dir__ controls which names are listed when
calling dir(module), __getattr__ is the more interesting addition.
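As a rough sketch of the __dir__ half, the snippet below builds a throwaway module in memory and gives it a module-level __dir__ that reports only the names in __all__. The module name demo_dir and its attributes are made up for illustration; in a real project, __dir__ would simply sit at the top level of your .py file.

```python
import types

# Source for a hypothetical module; normally this would be the contents of demo_dir.py.
source = '''
__all__ = ["public_name"]

public_name = 1
_private_name = 2

def __dir__():
    # dir(module) calls this hook instead of listing every name in __dict__.
    return __all__
'''

demo_dir = types.ModuleType("demo_dir")
exec(source, demo_dir.__dict__)

listed = dir(demo_dir)
print(listed)  # ['public_name']
print(demo_dir._private_name)  # 2 (hidden from dir(), but still accessible)
```

Note that __dir__ only affects what dir() reports; it does not restrict attribute access.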
The __getattr__ method in a module works similarly to how it does in a Python class. For
example:
class Cat:
    def __getattr__(self, name: str) -> str:
        if name == "voice":
            return "meow!!"
        raise AttributeError(f"Attribute {name} does not exist")

# Try to access 'voice' on Cat
cat = Cat()
print(cat.voice)  # Prints "meow!!"

# Raises AttributeError: Attribute something_else does not exist
cat.something_else
In this class, __getattr__ is called only when normal attribute lookup fails, which lets
you decide how missing attributes behave. Since Python 3.7, you can also define
__getattr__ at the module level to handle access to attributes the module doesn’t define.
For instance, if you have a module my_module.py:
# my_module.py
def existing_function() -> str:
    return "I exist!"

def __getattr__(name: str) -> str:
    if name == "dynamic_attribute":
        return "I was generated dynamically!"
    raise AttributeError(f"Module {__name__} has no attribute {name}")
Using this module:
# another_module.py
import my_module
print(my_module.existing_function()) # Prints "I exist!"
print(my_module.dynamic_attribute) # Prints "I was generated dynamically!"
print(my_module.non_existent) # Raises AttributeError
If an attribute isn’t found through the regular lookup (using object.__getattribute__),
Python will look for __getattr__ in the module’s __dict__. If found, it calls
__getattr__ with the attribute name and returns the result. However, looking up a name
as a global from inside the module itself bypasses __getattr__. This is intentional:
otherwise, every lookup of a builtin such as len or print would first go through
__getattr__, which would hurt performance.
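To make the bypass concrete, here is a small sketch that builds a hypothetical module in memory (the name demo_bypass and the helper read_as_global are invented for this example). Attribute access on the module object and "from module import name" both reach __getattr__, while a bare global lookup inside the module raises NameError.

```python
import sys
import types

# Source for a hypothetical module; normally this would live in demo_bypass.py.
source = '''
def __getattr__(name):
    if name == "dynamic_attribute":
        return "I was generated dynamically!"
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

def read_as_global():
    # A bare global lookup checks the module's __dict__ and builtins only.
    return dynamic_attribute
'''

demo_bypass = types.ModuleType("demo_bypass")
exec(source, demo_bypass.__dict__)
sys.modules["demo_bypass"] = demo_bypass  # make it importable by name

# Attribute access on the module object goes through __getattr__.
attribute_value = demo_bypass.dynamic_attribute

# So does "from module import name".
from demo_bypass import dynamic_attribute as imported_value

# A global lookup inside the module does not.
try:
    demo_bypass.read_as_global()
except NameError as exc:
    global_lookup_error = exc
else:
    global_lookup_error = None

print(attribute_value)
print(imported_value)
print(type(global_lookup_error).__name__)  # NameError
```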
One practical use for module-level __getattr__ is lazy-loading heavy dependencies to
improve startup performance. Imagine you have a module that relies on a large library but
doesn’t need it immediately at import time.
# heavy_module.py
from typing import Any

def __getattr__(name: str) -> Any:
    if name == "np":
        import numpy as np

        globals()["np"] = np  # Cache it in the module's namespace
        return np
    raise AttributeError(f"Module {__name__} has no attribute {name}")
With this setup, importing heavy_module doesn’t immediately import NumPy. Only when you
access heavy_module.np does it trigger the import:
# main.py
import heavy_module
# NumPy hasn't been imported yet.
# Code that doesn't need NumPy...
# Now we need NumPy
arr = heavy_module.np.array([1, 2, 3])
print(arr) # NumPy is now imported and used
The first access to heavy_module.np imports NumPy (adding ~150ms, depending on the
machine), but since we cache np with globals()['np'] = np, subsequent accesses are fast:
the name is now found by the regular lookup and __getattr__ is never called for it again.
This approach is handy in scenarios like CLIs where you want to keep startup quick. For example, if you need to initialize a database connection but only for specific commands, you can defer the setup until needed.
Here’s an example with SQLite (though SQLite connections are quick, imagine a slower connection here):
# db_module.py
import sqlite3

# Caching initialized connection in the global namespace
_connection: sqlite3.Connection | None = None

def __getattr__(name: str) -> sqlite3.Connection:
    if name == "connection":
        global _connection
        if _connection is None:
            print("Initializing database connection...")
            _connection = sqlite3.connect("my_database.db")
        return _connection
    raise AttributeError(f"Module {__name__} has no attribute {name}")
In this setup, nothing is instantiated when you import db_module. The connection is only
initialized on the first access of db_module.connection. Later calls use the cached
_connection, making subsequent access fast.
Here’s how you might use it in a CLI:
# cli.py
import click
import db_module

@click.group()
def cli() -> None:
    pass

@cli.command()
def greet() -> None:
    click.echo("Hello!")

# Click 7+ would otherwise name this command "show-data"
@cli.command(name="show_data")
def show_data() -> None:
    # Initializes the database connection if needed
    conn = db_module.connection
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM my_table")
    results = cursor.fetchall()
    click.echo(f"Data: {results}")

if __name__ == "__main__":
    cli()
When you run python cli.py greet, the CLI starts quickly since it doesn’t initialize the
database connection. But running python cli.py show_data accesses db_module.connection,
which triggers the connection setup.
This could also be achieved by defining a function that initializes the database connection
and caches it for subsequent calls. However, using module-level __getattr__ can be more
convenient if you have multiple global variables that require expensive calculations or
initializations. Instead of writing separate functions for each variable, you can handle
them all within a single module-level __getattr__ function.
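One way to handle several lazy globals in one place is a lookup table of factory functions. This is only a sketch under assumptions: the module name lazy_resources, the _FACTORIES table, and both factory functions are hypothetical, and an in-memory SQLite database stands in for a slow connection.

```python
import types

# Source for a hypothetical module; normally this would live in lazy_resources.py.
source = '''
import sqlite3

def _make_connection():
    return sqlite3.connect(":memory:")  # stand-in for a slow connection

def _load_settings():
    return {"debug": False}  # stand-in for an expensive computation

# Maps each lazy attribute name to the function that builds it.
_FACTORIES = {"connection": _make_connection, "settings": _load_settings}

def __getattr__(name):
    try:
        factory = _FACTORIES[name]
    except KeyError:
        raise AttributeError(
            f"module {__name__!r} has no attribute {name!r}"
        ) from None
    value = factory()
    globals()[name] = value  # cache so later lookups skip __getattr__
    return value
'''

lazy_resources = types.ModuleType("lazy_resources")
exec(source, lazy_resources.__dict__)

settings = lazy_resources.settings
first_connection = lazy_resources.connection
second_connection = lazy_resources.connection  # served from the cache
row = first_connection.execute("SELECT 1").fetchone()
has_missing = hasattr(lazy_resources, "missing")
first_connection.close()

print(settings, row, first_connection is second_connection, has_missing)
```

Adding another lazy global then only takes a new factory function and one more entry in the table.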
Here’s one example of using it for a non-trivial case in the wild [2].