Imagine a custom set-like data structure that doesn’t perform hashing and trades performance for tighter memory footprint. Or imagine a dict-like data structure that automatically stores data in a PostgreSQL or Redis database the moment you initialize it; also it lets you to get-set-delete key-value pairs using the usual retrieval-assignment-deletion syntax associated with built-in dictionaries. Custom data structures can give you the power of choice and writing them will make you understand how the built-in data structures in Python are constructed.

One way to understand how built-in objects like dictionary, list, set etc work is to build custom data structures based on them. Python provides several mixin classes in the collection.abc module to design custom data structures that look and act like built-in structures with additional functionalities baked in.

Concepts

To understand how all these work, you’ll need a fair bit of knowledge about Interfaces, Abstract Base Classes, Mixin Classes etc. I’ll build the concept edifice layer by layer where you’ll learn about interfaces first and how they can be created and used via the abc.ABC class. Then you’ll learn how abstract base classes differ from interfaces. After that I’ll introduce mixins and explain how all these concepts can be knitted together to architect custom data structures with amazing capabilities. Let’s dive in.

Interfaces

Python interfaces can help you write classes based on common structures. They ensure that classes that provide similar functionalities will also have similar footprints. Interfaces are not as popular in Python as they are in other statically typed language. The dynamic nature and duck-typing capabilities of Python often make them redundant.

However, in larger applications, interfaces can make you avoid writing code that is poorly encapsulated or build classes that look awfully similar but provide completely unrelated functionalities. Moreover, interfaces implicitly spawn other powerful techniques like mixin classes which can help you achieve DRY nirvana.

Overview

At a high level, an interface acts as a blueprint for designing classes. In Python, an interface is a specialized abstract class that defines one or more abstract methods. Abstract classes differs from concrete classes in the sense that they aren’t intended to stand on their own and the methods they define shouldn’t have any implementation.

Usually, you inherit from an interface and implement the methods defined in the abstract class in a concrete subclass. Interfaces provide the skeletons and concrete classes provide the implementation of the methods based on those skeletons. Depending on the ways you can architect interfaces, they can be segmented into two primary categories.

  • Informal Interfaces
  • Formal Interfaces

Informal interfaces

Informal interfaces are classes which define methods that can be overridden, but there’s no strict enforcement.

Let’s write an informal interface for a simple calculator class:

class ICalc:
    """Informal Interface: Abstract calculator class."""

    def add(self, a, b):
        raise NotImplementedError

    def sub(self, a, b):
        raise NotImplementedError

    def mul(self, a, b):
        raise NotImplementedError

    def div(self, a, b):
        raise NotImplementedError

Notice that the ICalc class has four different methods that don’t give you any implementation. It’s an informal interface because you can still instantiate the class but the methods will raise NotImplementedError if you try to apply them. You’ve to subclass the interface to use it. Let’s do it:

class Calc(ICalc):
    """Concrete Class: Calculator"""

    def add(self, a, b):
        return a + b

    def sub(self, a, b):
        return a - b

    def mul(self, a, b):
        return a * b

    def div(self, a, b):
        return a / b


# Using the class
c = Calc()
print(c.add(1, 2))
print(c.sub(2, 3))
print(c.mul(4, 5))
print(c.div(5, 6))
3
-1
20
0.8333333333333334

Now, you might be wondering why you even need all of these boilerplate code and inheritance when you can directly define the concrete Calc class and call it a day.

Consider the following scenario where you want to add additional functionalities to each of the method of the Calc class. Here, you’ve two options. Either you can mutate the original class and add those extra functionalities to the methods or you can create another class with similar footprint and implement all the methods with the added functionalities.

The first option isn’t always viable and can cause regression in real life scenario. The second approach ensures modularity and is generally quicker to implement since you won’t have to worry about messing up the original concrete class. However, figuring out which methods you’ll need to implement in the extended class can be hard because the concrete class might have additional methods that you don’t want in the extended class.

In this case, instead of figuring out the methods from the concrete Calc class, it’s easier to do so from an established structure defined in the ICalc interface. Interfaces make the process of extending class functionalities more tractable. Let’s make another class that will add logging to all of the methods of the Calc class:

import logging

logging.basicConfig(level=logging.INFO)


class CalcLog(ICalc):
    """Concrete Class: Calculator with logging"""

    def add(self, a, b):
        logging.info(f"Operation: Addition, Arguments: {(a, b)}")
        return a + b

    def sub(self, a, b):
        logging.info(f"Operation: Subtraction, Arguments: {(a, b)}")
        return a - b

    def mul(self, a, b):
        logging.info(f"Operation: Multiplication, Arguments: {(a, b)}")
        return a * b

    def div(self, a, b):
        logging.info(f"Operation: Division, Arguments: {(a, b)}")
        return a / b


# Using the class
clog = CalcLog()
print(clog.add(1, 2))
print(clog.sub(2, 3))
print(clog.mul(4, 5))
print(clog.div(5, 6))
INFO:root:Operation: Addition, Arguments: (1, 2)
INFO:root:Operation: Subtraction, Arguments: (2, 3)
INFO:root:Operation: Multiplication, Arguments: (4, 5)
INFO:root:Operation: Division, Arguments: (5, 6)


3
-1
20
0.8333333333333334

In the above class, I’ve defined another class called CalcLog that basically extends the functionalities of the previously defined Calc class. Here, I’ve inherited from the informal interface ICalc and implemented all the methods with additional info logging capability.

Although writing informal interfaces is trivial, there are multiple issues that plagues them. The user of the interface class can still instantiate it like a normal class and won’t be able to tell the difference between a it and a concrete class until she tries to use any of the methods define inside the interface. Only then the methods will throw exceptions. This can have unintended side effects.

Moreover, informal interfaces won’t compel you to implement all the methods in the subclasses. You can easily get away without implementing a particular method defined in the interface. It won’t complain about the unimplemented methods in the subclasses. However, if you try to use a method that hasn’t been implemented in the subclass, you’ll get an error. This means even if issubclass(ConcreteSubClass, Interface) shows True, you can’t rely on it since it doesn’t give you the guarantee that the ConcreteSubClass has implemented all the methods defined in the Interface.

Let’s create another class FakeCalc an only implement one method defined in the ICalc abstract class:

class FakeCalc(ICalc):
    """Concrete Class: Fake calculator that doesn't implement all the methods
    defined in the interface."""

    def add(self, a, b):
        return a + b


# Using the class
cfake = FakeCalc()
print(cfake.add(1, 2))
print(cfake.sub(2, 3))
3

---------------------------------------------------------------------------

NotImplementedError                       Traceback (most recent call last)

<ipython-input-48-035c519cee55> in <module>
        10 cfake = FakeCalc()
        11 print(cfake.add(1,2))
---> 12 print(cfake.sub(2,3))


<ipython-input-45-255c6a2093b0> in sub(self, a, b)
        6
        7     def sub(self, a, b):
----> 8         raise NotImplementedError
        9
        10     def mul(self, a, b):


NotImplementedError:

Despite not implementing all the methods defined in the ICalc class, I was still able to instantiate the FakeCalc concrete class. However, when I tried to apply a method sub that wasn’t implemented in the concrete class, it gave me an error. Also, issubclass(FakeCalc, ICalc) returns True which can mislead you into thinking that all the methods of the subclass FakeCalc are usable. It can cause subtle bugs can be difficult to detect. Formal interfaces try to overcome these issues.

Formal interfaces

Formal interfaces do not suffer from the problems that plague informal interfaces. So if you want to implement an interface that the users can’t initiate independently and that forces them to implement all the methods in the concrete sub classes, formal interface is the way to go. In Python, the idiomatic way to define formal interfaces is via the abc module. Let’s transform the previously mentioned ICalc interface into a formal one:

from abc import ABC, abstractmethod


class ICalc(ABC):
    """Formal interface: Abstract calculator class."""

    @abstractmethod
    def add(self, a, b):
        pass

    @abstractmethod
    def sub(self, a, b):
        pass

    @abstractmethod
    def mul(self, a, b):
        pass

    @abstractmethod
    def div(self, a, b):
        pass

Here, I’ve imported ABC class and abstractmethod decorator from the abc module of Python’s standard library. The name ABC stands for Abstract Base Class. The interface class needs to inherit from this ABCclass and all the abstract methods need to be decorated using the abstractmethod decorator. If your knowledge on decorators are fuzzy, checkout this in-depth article on python decorators1.

Although, it seems like ICalc has merely inherited from the ABC class, under the hood, a metaclass2 ABCMeta gets attached to the interface which essentially makes sure that you can’t instantiate this class independently. Let’s try to do so and see what happens:

i = ICalc()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-118-a3cb2945d943> in <module>
----> 1 i = ICalc()

TypeError: Can't instantiate abstract class ICalc with abstract methods
add, div, mul, sub

The error message clearly states that you can’t instantiate the class ICalc directly at all. You’ve make a subclass of ICalc and implement all the abstract methods and only then you’ll be able to make an instance of the subclass. The subclassing and implementation part is same as before.

class Calc(ICalc):
    """Concrete calculator class"""

    def add(self, a, b):
        return a + b

    def sub(self, a, b):
        return a - b

    def mul(self, a, b):
        return a * b

    def div(self, a, b):
        return a / b


# Using the class
c = Calc()
print(c.add(1, 2))
print(c.sub(2, 3))
print(c.mul(4, 5))
print(c.div(5, 6))

In the case of formal interface, failing to implement even one abstract method in the subclass will raise TypeError. So you can never write something the like the FakeCalc with a formal interface. This approach is more explicit and if there is an issue, it fails early.

Interfaces vs abstract base classes

You’ve probably seen the term Interface and Abstract Base Class being used interchangeably. However, conceptually they’re different. Interfaces can be thought of as a special case of Abstract Base Classes.

It’s imperative that all the methods of an interface are abstract methods and the classes don’t store any state (instance variables). However, in case of abstract base classes, the methods are generally abstract but there can also be methods that provide implementation (concrete methods) and also, these classes can have instance variables. This generic abstract base classes can get very interesting and they can be used as mixins but more on that in the later sections.

Both interfaces and abstract base classes are similar in the sense that they can’t stand on their own, that means these classes aren’t meant to be instantiated independently. Pay attention to the following snippet to understand how interfaces and abstract base classes differ.

Interface

from abc import ABC, abstractmethod


class InterfaceExample(ABC):
    @abstractmethod
    def method_a(self):
        pass

    @abstractmethod
    def method_b(self):
        pass

Here, all the methods must have to be abstract.

Abstract Base Class

from abc import ABC, abstractmethod


class AbstractBaseClassExample(ABC):
    @abstractmethod
    def method_a(self):
        pass

    @abstractmethod
    def method_b(self):
        pass

    def method_c(self):
        # implement something
        pass

Notice how method_c in the above class is a concrete method and can have implementation.

The two examples above establish the fact that

All interfaces are abstract base classes but not all abstract base classes are interfaces.

A complete example

Before moving on to the next section, let’s see another contrived example to get the idea about the cases where interfaces can come handy. I’ll define an interface called AutoMobile and create three concrete classes called Car, Truck and Bus from it. The interface defines three abstract methods start, accelerate and stop that the concrete classes will need to implement later.

mixins

from abc import ABC, abstractmethod


class Automobile(ABC):
    """Formal interface: Abstract automobile class."""

    @abstractmethod
    def start(self):
        pass

    @abstractmethod
    def accelerate(self):
        pass

    @abstractmethod
    def stop(self):
        pass


class Car(Automobile):
    """Concrete Class: Car"""

    def start(self):
        return "The car is starting"

    def accelerate(self):
        return "The car is accelerating"

    def stop(self):
        return "The car is stopping"


class Truck(Automobile):
    """Concrete Class: Truck"""

    def start(self):
        return "The truck is starting"

    def accelerate(self):
        return "The truck is accelerating"

    def stop(self):
        return "The truck is stopping"


class Bus(Automobile):
    """Concrete Class: Bus"""

    def start(self):
        return "The bus is starting"

    def accelerate(self):
        return "The bus is accelerating"

    def stop(self):
        return "The bus is stopping"


car = Car()
truck = Truck()
bus = Bus()

print(car.start())
print(car.accelerate())
print(car.stop())
print(truck.start())
print(truck.accelerate())
print(truck.stop())
print(bus.start())
print(bus.accelerate())
print(bus.stop())
The car is starting
The car is accelerating
The car is stopping
The truck is starting
The truck is accelerating
The truck is stopping
The bus is starting
The bus is accelerating
The bus is stopping

The above example delineates the use cases for interfaces. When you need to create multiple similar classes, interfaces can provide a basic foundation for the subclasses to build upon. In the next section, I’ll be using formal interfaces to create Mixin classes. So, before understanding mixin classes and how they can be used to inject additional plugins to your classes, it’s important that you understand interfaces and abstract base classes properly.

Mixins

Imagine you’re baking chocolate brownies. Now, you can have them without any extra fluff which is fine or you can top them with cream cheese, caramel sauce, chocolate chips etc. Usually you don’t make the extra toppings yourself, rather you prepare the brownies and use off the shelf toppings. This also gives you the ability to mix and match different combinations of toppings to spruce up the flavors quickly. However, making the the toppings from scratch would be a lengthy process and doing it over an over again can ruin the fun of baking.

While creating software, there’s sometimes a limit to the depth we should go. When pieces of what we’d like to achieve have already been executed well by others, it makes a lot of sense to reuse them. One way to achieve modularity and reusability in object-oriented programming is through a concept called a mixin. Different languages implement the concept of mixin in different ways. In Python, mixins are supported via multiple inheritance.

Overview

In the context of Python especially, a mixin is a parent class that provides functionality to subclasses but is not intended to be instantiated itself. This should already incite deja vu in you since classes that aren’t intended to be instantiated and can have both concrete and abstract methods are basically abstract base classes. Mixins can be regarded as a specific strain of abstract base classes where they can house both concrete and abstract methods but don’t keep any internal states.

These can help you when -

  • You want to provide a lot of optional features for a class.
  • You want to provide a lot of not-optional features for a class, but you want the features in separate classes so that each of them is about one feature (behavior).
  • You want to use one particular feature in many different classes.

Let’s see a contrived example. Consider Werkzeug’s3 request and response system. Werkzeug is a small library that Flask4 depends on. I can make a plain old request object by saying:

from werkzeug import BaseRequest


class Request(BaseRequest):
    pass

If I want to add accept header support, I would make that:

from werkzeug import BaseRequest, AcceptMixin


class Request(AcceptMixin, BaseRequest):
    pass

If I wanted to make a request object that supports accept headers, etags, user agent and authentication support, I could do this:

from werkzeug import (
    BaseRequest,
    AcceptMixin,
    ETagRequestMixin,
    UserAgentMixin,
    AuthenticationMixin,
)


class Request(
    AcceptMixin,
    ETagRequestMixin,
    UserAgentMixin,
    AuthenticationMixin,
    BaseRequest,
):
    pass

The above example might cause you to say, “that’s just multiple inheritance, not really a mixin”, which is can be true in a special case. Indeed, the differences between plain old multiple inheritance and mixin based inheritance collapse when the parent class can be instantiated. Understanding the subtlety in the differences between a mixin class, an abstract base class, an interface and the scope of multiple inheritance is important, so I’ll explore them in a dedicated section.

Differences between interfaces, abstract classes and mixins

In order to better understand mixins, it’s be useful to compare mixins with abstract classes and interfaces from a code/implementation perspective:

Interfaces

Interfaces can contain abstract methods only, no concrete methods and no internal states (instance variables).

Abstract Classes

Abstract classes can contain abstract methods, concrete methods and internal state.

Mixins

Like interfaces, mixins do not contain any internal state. But like abstract classes, they can contain one or more concrete methods. So mixins are basically abstract classes without any internal states.

In Python, these are just conventions because all of the above are defined as classes. However, one trait that is common among interfaces, abstract classes and mixins is that they shouldn’t exist on their own, i.e. shouldn’t be instantiated independently.

A complete example

Before diving into the real-life examples and how mixins can be used to construct custom data structures, let’s have a look at a self-contained example of a mixin class at work:

import inspect
from abc import ABC, abstractmethod
from pprint import pprint


class DisplayFactorMult(ABC):
    """Mixin class that reveals factor calculation details."""

    @abstractmethod
    def multiply(self, x):
        pass

    def multiply_show(self, x):
        result = self.multiply(x)
        print(f"Factor: {self.factor}, Argument: {x},  Result: {result}")
        return result


class FactorMult(DisplayFactorMult):
    """Concrete class that uses the DisplayFactorMult mixin."""

    def __init__(self, factor):
        self.factor = factor

    def multiply(self, x):
        return x * self.factor


# Putting the FactorMult class to use
f = FactorMult(10)
f.multiply_show(20)

# Use the inspect.getmembers method to inspect the methods
pprint(inspect.getmembers(f, predicate=inspect.ismethod))
Factor: 10, Argument: 20,  Result: 200
[('__init__',
    <bound method FactorMult.__init__ of
    <__main__.FactorMult object at 0x7f0f0546bf40>>),
    ('multiply', <bound method FactorMult.multiply of
    <__main__.FactorMult object at 0x7f0f0546bf40>>),
    ('multiply_show', <bound method DisplayFactorMult.multiply_show of
    <__main__.FactorMult object at 0x7f0f0546bf40>>)]

The FactorMult class takes in a number as a factor and the multiply method simply multiplies an argument with the factor. The mixin class DisplayFactorMult provides an additional method multiply_show that enhances the multiply method of the concrete class. Method multiply_show prints the value of the factor, arguments an the result before returning the result. Here, DisplayFactoryMult is a mixin since it houses an abstract method multiply, a concrete method multiply_show and doesn’t store any instance variable.

If you really want to dive deeper into mixins and their real-life use cases, checkout the codebase of the requests5 library. It defines and employs many powerful mixin classes to bestow superpowers upon different concrete classes.

Building powerful custom data structures with mixins

You’ve reached the hall of fame where I’ll be building custom data structures using the mixin classes from the collections.abc module.

Verbose tuple

This is a tuple-like data structure that acts exactly like the built-in tuple but with one exception. It’ll print out the special methods underneath when you perform any operation with it.

from collections.abc import Sequence


class VerboseTuple(Sequence):
    """Custom class that is exactly like a tuple but does some
    extra magic.

    Sequence:
    -------------------
    Inherits From: Reversible, Collection
    Abstract Methods: __getitem__, __len__
    Mixin Methods: __contains__, __iter__, __reversed__, index,
            and count
    """

    def __init__(self, *args):
        self.args = args

    @classmethod
    def _classname(cls):
        # This method just returns the name of the class
        return cls.__name__

    def __getitem__(self, index):
        print(f"Method: __getitem__, Index: {index}")
        return self.args[index]

    def __len__(self):
        print(f"Method: __len__")
        return len(self.args)

    def __repr__(self):
        return f"{self._classname()}{tuple(self.args)}"


vt = VerboseTuple(1, 3, 4)

print(vt)
print(f"Abstract Methods: {set(Sequence.__abstractmethods__)}")
print(
    f"Mixin Methods: { {k for k, v in Sequence.__dict__.items() if callable(v)} }"
)
VerboseTuple(1, 3, 4)
Abstract Methods: {'__len__', '__getitem__'}
Mixin Methods: {'__iter__', '__contains__', 'index', 'count', '__getitem__',
'__reversed__'}

To build the VerboseTuple data structure, first, I’ve inherited the Sequence mixin class from the collections.abc module. The docstring mentions all the abstract and mixin methods provided by the Sequence class. To build the new data structure, you’ll have to implement all the abstract methods defined in the Sequence class and you’ll get all the mixin methods implemented automatically. Notice that the print statement above also reveals the abstract and the mixin methods.

In the following snippet I’ve used some of the functionalities offered by tuple and printed them in a way that will reveal the special methods when they perform any action.

# check __getitem__
print("\n ==== Checking __getitem__ ====")
print(vt[2])

# check __len__
print("\n ==== Checking __len__ ====")
print(len(vt))

# check __contains__
print("\n ==== Checking __contains__ ====")
print(3 in vt)

# check __len__
print("\n ==== Checking __iter__ ====")
for elem in vt:
    print(elem)

# check reverse
print(f"\n ==== Checking reverse ====")
print(list(reversed(vt)))

# check count
print("\n ==== Checking count ====")
print(vt.count(1))
    ==== Checking __getitem__ ====
Method: __getitem__, Index: 2
4

    ==== Checking __len__ ====
Method: __len__
3

    ==== Checking __contains__ ====
Method: __getitem__, Index: 0
Method: __getitem__, Index: 1
True

    ==== Checking __iter__ ====
Method: __getitem__, Index: 0
1
Method: __getitem__, Index: 1
3
Method: __getitem__, Index: 2
4
Method: __getitem__, Index: 3

    ==== Checking reverse ====
Method: __len__
Method: __getitem__, Index: 2
Method: __getitem__, Index: 1
Method: __getitem__, Index: 0
[4, 3, 1]

    ==== Checking count ====
Method: __getitem__, Index: 0
Method: __getitem__, Index: 1
Method: __getitem__, Index: 2
Method: __getitem__, Index: 3
1

The printed statements reveal the corresponding special methods used internally when a particular tuple operation occurs.

Verbose list

This is a list-like data structure that acts exactly like the built-in list but with one exception. Like VerboseTuple, it’ll also print out the special methods underneath when you perform any operation on or with it.`

from collections.abc import MutableSequence


class VerboseList(MutableSequence):
    """Custom class that is exactly like a list but does some
    extra magic.

    MutableSequence:
    -----------------
    Inherits From: Sequence
    Abstract Methods: __getitem__, __setitem__, __delitem__,
            __len__, insert
    Mixin Methods: Inherited Sequence methods and append, reverse,
            extend, pop, remove, and __iadd__
    """

    def __init__(self, *args):
        self.args = list(args)

    @classmethod
    def _classname(cls):
        # This method just returns the name of the class
        return cls.__name__

    def __getitem__(self, index):
        print(f"Method: __getitem__, Index: {index}")
        return self.args[index]

    def __setitem__(self, index, value):
        print(f"Method: __setitem__, Index: {index}, Value: {value}")
        self.args[index] = value

    def __delitem__(self, index):
        print(f"Method: __delitem__, Index: {index}")
        del self.args[index]

    def __len__(self):
        print(f"Method: __len__")
        return len(self.args)

    def __repr__(self):
        return f"{self._classname()}{tuple(self.args)}"

    def insert(self, index, value):
        self.args.insert(index, value)


vl = VerboseList(4, 5, 6)
vl2 = VerboseList(7, 8, 9)

print(vl)
print(f"Abstract Methods: {set(MutableSequence.__abstractmethods__)}")
print(
    f"Mixin Methods: {
        {k for k, v in MutableSequence.__dict__.items() if callable(v)}
    }"
)
VerboseList(4, 5, 6)
Abstract Methods: {'__delitem__', '__len__', '__getitem__', 'insert', '__setitem__'}
Mixin Methods: {'__iadd__', '__setitem__', 'pop', 'append', 'extend', '__delitem__',
'reverse', 'insert', 'clear', 'remove'}

In the above segment, I’ve inherited the MutableSequence mixin class from the collections.abc module. This ensures that the VerboseList object will be mutable. All the abstract methods mentioned in the docstring have been implemented and the output print statements reveal the structure of the custom data structure as well as all the abstract and mixin methods.

In the following snippet, I’ve used some of the functionalities offered by list and printed them in a way that will reveal the special methods when they perform any action.

# check __setitem__
print("\n ==== Checking __setitem__ ====")
vl[1] = 44
print(vl)

# check remove (__delitem__)
print("\n ==== Checking remove ====")
vl.remove(6)
print(vl)

# check extend
print("\n ==== Checking extend ====")
vl.extend([0, 0])
print(vl)

# check pop
print("\n ==== Checking pop ====")
vl.pop(-1)
print(vl)

# check __iadd__
print("\n ==== Checking __iadd__")
vl += vl2
print(vl)
    ==== Checking __setitem__ ====
Method: __setitem__, Index: 1, Value: 44
VerboseList(4, 44, 6)

    ==== Checking remove ====
Method: __getitem__, Index: 0
Method: __getitem__, Index: 1
Method: __getitem__, Index: 2
Method: __delitem__, Index: 2
VerboseList(4, 44)

    ==== Checking extend ====
Method: __len__
Method: __len__
VerboseList(4, 44, 0, 0)

    ==== Checking pop ====
Method: __getitem__, Index: -1
Method: __delitem__, Index: -1
VerboseList(4, 44, 0)

    ==== Checking __iadd__
Method: __getitem__, Index: 0
Method: __len__
Method: __getitem__, Index: 1
Method: __len__
Method: __getitem__, Index: 2
Method: __len__
Method: __getitem__, Index: 3
VerboseList(4, 44, 0, 7, 8, 9)

Verbose frozen dict

Here, VerboseFrozenDict is an immutable data structure that is similar to the built-in dictionaries. Like the previous structures, this also reveals the internal special methods while performing different operations.

from collections.abc import Mapping


class VerboseFrozenDict(Mapping):
    """Custom class that is exactly like an immutable dict but does
    some extra magic.

    Mapping:
    -----------------
    Inherits From: Collection
    Abstract Methods: __getitem__, __iter__, __len__
    Mixin Methods: __contains__, keys, items, values, get, __eq__,
            and __ne__
    """

    def __init__(self, **kwargs):
        self.kwargs = kwargs

    @classmethod
    def _classname(cls):
        # This method just returns the name of the class
        return cls.__name__

    def __getitem__(self, key):
        print(f"Method: __getitem__, Key: {key}")
        return self.kwargs[key]

    def __iter__(self):
        print(f"Method: __iter__")
        return iter(self.kwargs)

    def __len__(self):
        print(f"Method: __len__")
        return len(self.kwargs)

    def __repr__(self):
        return f"{self._classname()}({self.kwargs})"


vf = VerboseFrozenDict(**{"a": "apple"})
vf2 = VerboseFrozenDict(**{"b": "orange", "c": "mango"})

print(vf)
print(f"Abstract Methods: {set(Mapping.__abstractmethods__)}")
print(
    f"Mixin Methods: {
        {k for k, v in Mapping.__dict__.items() if callable(v)}
    }"
)
VerboseFrozenDict({'a': 'apple'})
Abstract Methods: {'__len__', '__getitem__', '__iter__'}
Mixin Methods: {'items', '__contains__', 'values', '__eq__', 'keys', 'get',
'__getitem__'}

In the above segment, I’ve inherited the Mapping mixin class from the collections.abc module. This ensures that the output sequence will be immutable. Just like before, all the abstract methods mentioned in the docstring have been implemented and the output print statements reveal the structure of the custom data structure, all the abstract and mixin methods.

Below the printed output will reveal the special methods used internally when the VerboseFrozenDict objects perform any operation.

# check __getitem__
print("\n ==== Checking __getitem__ ====")
print(vf["a"])

# check __iter__
print("\n ==== Checking __iter__ ====")
for elem in vf:
    print(elem)

# check __len__
print("\n ==== Checking __len__ ====")
print(len(vf))

# check __contains__
print("\n ==== Checking __iter__ ====")
print("a" in vf)

# check keys, values
print(f"\n ==== Checking items, keys, values ====")
print(vf.items())
print(vf.keys())
print(vf.values())

# check get
print("\n ==== Checking get ====")
print(vf.get("b", None))

# check eq & nq
print("\n ==== Checking __eq__, __nq__ ====")
print(vf == vf2)
print(vf != vf2)
    ==== Checking __getitem__ ====
Method: __getitem__, Key: a
apple

    ==== Checking __iter__ ====
Method: __iter__
a

    ==== Checking __len__ ====
Method: __len__
1

    ==== Checking __iter__ ====
Method: __getitem__, Key: a
True

    ==== Checking items, keys, values ====
ItemsView(VerboseFrozenDict({'a': 'apple'}))
KeysView(VerboseFrozenDict({'a': 'apple'}))
ValuesView(VerboseFrozenDict({'a': 'apple'}))

    ==== Checking get ====
Method: __getitem__, Key: b
None

    ==== Checking __eq__, __nq__ ====
Method: __iter__
Method: __getitem__, Key: a
Method: __iter__
Method: __getitem__, Key: b
Method: __getitem__, Key: c
False
Method: __iter__
Method: __getitem__, Key: a
Method: __iter__
Method: __getitem__, Key: b
Method: __getitem__, Key: c
True

Verbose dict

The VerboseDict data structure is the mutable version of VerboseFrozedDict. It supports all the operations of VerboseFrozenDict with some additional features like adding and deleting key-value pairs, updating values corresponding to different keys etc.

from collections.abc import MutableMapping


class VerboseDict(MutableMapping):
    """Custom class that is exactly like a dict but does some
    extra magic.

    MutableMapping:
    -----------------
    Inherits From: Mapping
    Abstract Methods: __getitem__, __setitem__, __delitem__, __iter__,
                __len__
    Mixin Methods: Inherited Mapping methods and pop, popitem, clear,
                update, and setdefault
    """

    def __init__(self, **kwargs):
        self.kwargs = kwargs

    @classmethod
    def _classname(cls):
        # This method just returns the name of the class
        return cls.__name__

    def __getitem__(self, key):
        print(f"Method: __getitem__, Key: {key}")
        return self.kwargs[key]

    def __setitem__(self, key, value):
        print(f"Method: __setitem__, Key: {key}")
        self.kwargs[key] = value

    def __delitem__(self, key):
        print(f"Method: __delitem__, Key: {key}")
        del self.kwargs[key]

    def __iter__(self):
        print(f"Method: __iter__")
        return iter(self.kwargs)

    def __len__(self):
        print(f"Method: __len__")
        return len(self.kwargs)

    def __repr__(self):
        return f"{self._classname()}({self.kwargs})"


vd = VerboseDict(**{"a": "apple", "b": "ball", "c": "cat"})

print(vd)
print(f"Abstract Methods: {set(MutableMapping.__abstractmethods__)}")
print(
    f"Mixin Methods: {
        {k for k, v in MutableMapping.__dict__.items() if callable(v)}
    }"
)
VerboseDict({'a': 'apple', 'b': 'ball', 'c': 'cat'})
Abstract Methods: {
    '__delitem__', '__len__', '__iter__',
    '__getitem__', '__setitem__'
}
Mixin Methods: {
    '__setitem__', 'pop', 'popitem',
    '__delitem__', 'setdefault', 'update',
    'clear'
}

The output statements reveal the structure of the VeboseDict class and the abstract and mixin methods associated with it. The following snippet will print the special methods used internally by the custom data structure (also in the built-in one) while performing different operations.

# check __getitem__
print("\n ==== Checking __setitem__ ====")
vd["a"] = "orange"
print(vd)

# check popitem
print("\n ==== Checking popitem ====")
vd.popitem()
print(vd)

# check update
print("\n ==== Checking update ====")
vd.update({"d": "dog"})
print(vd)

# check clear
print("\n ==== Checking clear ====")
vd.clear()
print(vd)

# check setdefault
print(f"\n ==== Checking setdefault ====")
x = vd.setdefault("a", "pepsi")
print(x)
print(vd)
    ==== Checking __setitem__ ====
Method: __setitem__, Key: a
VerboseDict({'a': 'orange', 'b': 'ball', 'c': 'cat'})

    ==== Checking popitem ====
Method: __iter__
Method: __getitem__, Key: a
Method: __delitem__, Key: a
VerboseDict({'b': 'ball', 'c': 'cat'})

    ==== Checking update ====
Method: __setitem__, Key: d
VerboseDict({'b': 'ball', 'c': 'cat', 'd': 'dog'})

    ==== Checking clear ====
Method: __iter__
Method: __getitem__, Key: b
Method: __delitem__, Key: b
Method: __iter__
Method: __getitem__, Key: c
Method: __delitem__, Key: c
Method: __iter__
Method: __getitem__, Key: d
Method: __delitem__, Key: d
Method: __iter__
VerboseDict({})

    ==== Checking setdefault ====
Method: __getitem__, Key: a
Method: __setitem__, Key: a
pepsi
VerboseDict({'a': 'pepsi'})

Going ballistic with custom data structures

This section discusses two advanced data structures that I mentioned at the beginning of the post.

  • BitSet : Mutable set-like data structure that doesn’t perform hashing.
  • SQLAlchemyDict: Mutable dict-like data structure that can store key-value pairs in any SQLAlchemy supported relational database.

BitSet

This mutable set-like data structure doesn’t perform hashing to store data. It can store integers in a fixed range. While storing integers, BitSet objects use less memory compared to built-in sets.

However, since no hashing happens, it’s slower to perform addition and retrieval compared to built-in sets. The following code snippet was taken directly from Raymond Hettinger’s 2019 PyCon Russia talk6 on advanced data structures.

from collections.abc import MutableSet


class BitSet(MutableSet):
    "Ordered set with compact storage for integers in a fixed range"

    def __init__(self, limit, iterable=()):
        self.limit = limit
        num_bytes = (limit + 7) // 8
        self.data = bytearray(num_bytes)
        self |= iterable

    def _get_location(self, elem):
        if elem < 0 or elem >= self.limit:
            raise ValueError(
                f"{elem!r} must be in range 0 <= elem < {self.limit}"
            )
        return divmod(elem, 8)

    def __contains__(self, elem):
        bytenum, bitnum = self._get_location(elem)
        return bool((self.data[bytenum] >> bitnum) & 1)

    def add(self, elem):
        bytenum, bitnum = self._get_location(elem)
        self.data[bytenum] |= 1 << bitnum

    def discard(self, elem):
        bytenum, bitnum = self._get_location(elem)
        self.data[bytenum] &= ~(1 << bitnum)

    def __iter__(self):
        for elem in range(self.limit):
            if elem in self:
                yield elem

    def __len__(self):
        return sum(1 for elem in self)

    def __repr__(self):
        return (
            f"{type(self).__name__}(limit={self.limit}, iterable={list(self)})"
        )

    def _from_iterable(self, iterable):
        return type(self)(self.limit, iterable)

Let’s inspect the above data structure to understand exactly how much memory we can save. I’ll digress a little here. Normally, you’d use sys.getsizeof to measure the memory footprint of an object where the function reveals the size in bytes.

But there’s a problem. The function sys.getsizeof only reveals the size of the target object, excluding the objects the target objects might be referring to. To understand what I mean, consider the following situation:

Suppose, you have a nested list that looks like this:

lst = [[1], [2, 3], [[4, 5], 6, 7], 8, 9]

When you apply sys.getsizeof function on the list, it shows 96 bytes. This means only the outermost list consumes 96 bytes of memory. Here, sys.getsizeof doesn’t include the size of the nested lists.

The same is true for other data structures. In case of nested dictionaries, sys.getsizeof will not include the size of nested data structures. I’ll only reveal the size of the outermost dictionary object. The following snippet will traverse through the reference tree of a nested object and reveal the true size of it.

from collections.abc import Mapping, Container
from sys import getsizeof


def deep_getsizeof(o: object, ids: None = None) -> int:
    """Find the memory footprint of a Python object.

    This is a recursive function that drills down a Python object graph
    like a dictionary holding nested dictionaries with lists of lists
    and tuples and sets.

    The sys.getsizeof function does a shallow size of only. It counts each
    object inside a container as pointer only regardless of how big it
    really is.

    Params
    ------
     o: object
        The object
     ids: None
        Later an iterable is assigned to store the object ids

     Returns
     --------
     int
        Returns the size of object in bytes
    """

    if ids is None:
        ids = set()

    d = deep_getsizeof
    if id(o) in ids:
        return 0

    r = getsizeof(o)
    ids.add(id(o))

    if isinstance(o, str):
        return r

    if isinstance(o, Mapping):
        return r + sum(d(k, ids) + d(v, ids) for k, v in o.iteritems())

    if isinstance(o, Container):
        return r + sum(d(x, ids) for x in o)

    return r

Let’s use the deep_getsizeof to inspect the size differences between built-in set and BitSet objects.

bs = BitSet(limit=5, iterable=[0, 4])
s = {0, 4}
print(f"Normal Set object: {s}")
print(f"BitSet object: {bs}")
print(f"Size of a normal Set object: {deep_getsizeof(s)} bytes")
print(f"Size of a BitSet object: {deep_getsizeof(bs)} bytes")
Normal Set object: {0, 4}
BitSet object: BitSet(limit=5, iterable=[0, 4])
Size of a normal Set object: 268 bytes
Size of a BitSet object: 100 bytes

The output of the print statements reveal that the BitSet object uses less than half the memory compared to its built-in counterpart!

SQLAlchemyDict

Here goes the second type of custom data structure that I mentioned in the introduction. It’s also a mutable dict-like structure that can automatically store key-value pairs to any SQLAlchemy supported relational database when initialized.

I was inspired to write this one from the same Raymond Hettinger talk that I mentioned before. For demonstration purposes, I’ve chosen SQLite databse to store the key value pairs.

This structure gives you immense power since you can abstract away the entire process of database communication inside the custom object. You’ll perform get-set-delete operations on the object just like you’d do so with built-in dictionary objects and the custom object will take care of storing and updating the data to the target database.

Before running the code snippet below, you’ll need to install SQLAlchemy as an external dependency.

# sqla_dict.py

from collections.abc import MutableMapping
from contextlib import contextmanager
from operator import itemgetter

from sqlalchemy import create_engine, text
from sqlalchemy.exc import OperationalError
from sqlalchemy.orm import sessionmaker


def create_transaction_session(dburl):
    # an engine, which the Session will use for connection resources
    some_engine = create_engine(dburl)

    # create a configured "Session" class
    Session = sessionmaker(bind=some_engine)

    @contextmanager
    def session_scope():
        """Provide a transactional scope around a series of operations."""

        session = Session()
        try:
            yield session
            session.commit()

        except OperationalError:
            pass

        except Exception:
            session.rollback()
            raise
        finally:
            session.close()

    return session_scope


session_scope = create_transaction_session("sqlite:///foo.db")


class SQLAlechemyDict(MutableMapping):
    def __init__(self, dbname, session_scope, items=None, **kwargs):
        self.dbname = dbname
        self.session_scope = session_scope

        if items is None:
            items = []

        with self.session_scope() as session:
            session.execute("CREATE TABLE Dict (key text, value text)")
            session.execute("CREATE UNIQUE INDEX KIndx ON Dict (key)")

        self.update(items, **kwargs)

    def __setitem__(self, key, value):
        if key in self:
            del self[key]

        with self.session_scope() as session:
            session.execute(
                text("INSERT INTO  Dict VALUES (:key, :value)"),
                {"key": key, "value": value},
            )

    def __getitem__(self, key):
        with self.session_scope() as session:
            r = session.execute(
                text("SELECT value FROM Dict WHERE key=:key"),
                {"key": key},
            )
            row = r.fetchone()

            if row is None:
                raise KeyError(key)
            return row[0]

    def __delitem__(self, key):
        if key not in self:
            raise KeyError(key)

        with self.session_scope() as session:
            session.execute(
                text("DELETE FROM Dict WHERE key=:key"), {"key": key}
            )

    def __len__(self):
        with self.session_scope() as session:
            r = session.execute("SELECT COUNT(*) FROM Dict")
            return next(r)[0]

    def __iter__(self):
        with self.session_scope() as session:
            r = session.execute("SELECT key FROM Dict")
            return map(itemgetter(0), r.fetchall())

    def __repr__(self):
        return f"{type(self).__name__}(dbname={self.dbname!r}, items={list(self.items())})"

    def vacuum(self):
        with self.session_scope() as session:
            session.execute("VACUUM;")


if __name__ == "__main__":
    # test the struct
    sqladict = SQLAlechemyDict(
        dbname="foo.db",
        session_scope=session_scope,
        items={"hello": "world"},
    )
    print(sqladict)
    sqladict["key"] = "val"

    for key in sqladict:
        print(key)

    # >>> SQLAlechemyDict(dbname='foo.db', items=[('hello', 'world'), ('key', 'val')])
    # >>> hello
    # >>> key
SQLAlechemyDict(dbname='foo.db', items=[('hello', 'world'), ('key', 'val')])
hello
key

Running the above code snippet will create a SQLite database named foo.db in your current working directory. You can inspect the database with any database viewer and find your key-value pairs there. Everything else is the same as a built-in dictionary object.

Recent posts