Learning from bidict¶
Below is an outline of some of the more fascinating Python corners I got to explore further thanks to working on bidict.
If you are interested in learning more about any of the following, reviewing the (small) codebase could be a great way to get started.
Python’s data model¶
Using
__new__()to bypass default object initialization, e.g. for bettercopy()performance. See_base.py.Overriding
object.__getattribute__()for custom attribute lookup. See Sorted Bidict Recipes.Using
object.__getstate__(),object.__setstate__(), andobject.__reduce__()to make an object pickleable that otherwise wouldn’t be, due to e.g. using weakrefs, as bidicts do (covered further below).Using __slots__ to speed up attribute access and reduce memory usage. Must be careful with pickling and weakrefs. See
BidictBase.__getstate__().What happens when you implement a custom
__eq__()? e.g.a == bvs.b == awhen onlyais an instance of your class? Great write-up in https://eev.ee/blog/2012/03/24/python-faq-equality/Making an immutable type hashable (so it can be inserted into
dicts andsets): Must implement__hash__()such thata == b ⇒ hash(a) == hash(b). See theobject.__hash__()andobject.__eq__()docs. Seebidict.frozenbidict.Consider
FrozenOrderedBidict: its__eq__()is order-insensitive. So all contained items must participate in the hash order-insensitively.Can use collections.abc.Set._hash which provides a pure Python implementation of the same hash algorithm used to hash
frozensets. (SinceItemsViewextendsSet,bidict.frozenbidict.__hash__()just callsItemsView(self)._hash().)- Does this argue for making
collections.abc.Set._hash()non-private? - Why isn’t the C implementation of this algorithm directly exposed in
CPython? Only way to use it is to call
hash(frozenset(self.items())), which wastes memory allocating the ephemeral frozenset, and time copying all the items into it before they’re hashed.
- Does this argue for making
Unlike other attributes, if a class implements
__hash__(), any subclasses of that class will not inherit it. It’s like Python implicitly adds__hash__ = Noneto the body of every class that doesn’t explicitly define__hash__. So if you do want a subclass to inherit a base class’s__hash__()implementation, you have to set that manually, e.g. by adding__hash__ = BaseClass.__hash__in the class body. SeeFrozenOrderedBidict.This is consistent with the fact that
objectimplements__hash__(), but subclasses ofobjectthat override__eq__()are not hashable by default.
Surprising
Mappingcorner cases:- nan as key
- Equivalent but distinct Hashables
- pywat#38
- “Intransitive equality
(of
OrderedDict) was a mistake.” –Raymond Hettinger - Hence __eq__() is order-insensitive for ordered bidicts.
- “Intransitive equality
(of
If an instance of your custom mapping type contains the same items as a mapping of another type, should they compare equal? What if one of the mappings is ordered and the other isn’t? What about returning the
NotImplementedobject?- bidict’s
__eq__()design errs on the side of allowing more type polymorphism on the grounds that this is what the majority of use cases expect, and that it’s more Pythonic. - Any user who does need exact-type-matching equality can just override
bidict’s __eq__()method in a subclass.- If this subclass were also hashable, would it be worth overriding
bidict.frozenbidict.__hash__()too to include the type? - Only point would be to reduce collisions when multiple instances of different
types contained the same items
and were going to be inserted into the same
dictorset(since they’d now be unequal but would hash to the same value otherwise). Probably not worth it.
- If this subclass were also hashable, would it be worth overriding
- bidict’s
Using weakref¶
See Bidict Avoids Reference Cycles. The doubly-linked lists that back ordered bidicts also use weakrefs to avoid creating strong reference cycles.
Other interesting stuff in the standard library¶
reprlibandreprlib.recursive_repr()(but not needed for bidict because there’s no way to insert a bidict into itself)operator.methodcaller()platform.python_implementation- See Missing bidicts in Stdlib!
Subclassing namedtuple() classes¶
To get the performance benefits, intrinsic sortability, etc.
of namedtuple()
while customizing behavior, state, API, etc.,
you can subclass a namedtuple() class.
Just make sure to include __slots__ = (),
or you’ll lose a lot of the performance benefits.
_marker.py contains a small example.
Here’s a larger one:
>>> from collections import namedtuple
>>> from itertools import count
>>> class Node(namedtuple('_Node', 'cost tiebreaker data parent')):
... """Represent nodes in a graph traversal. Suitable for use with e.g. heapq."""
...
... __slots__ = ()
... _counter = count() # break ties between equal-cost nodes, avoid comparing data
...
... # Give call sites a cleaner API for creating new Nodes
... def __new__(cls, cost, data, parent=None):
... tiebreaker = next(cls._counter)
... return super(Node, cls).__new__(cls, cost, tiebreaker, data, parent)
...
... @property
... def depth(self):
... return self.parent.depth + 1 if self.parent else 0
...
... def __repr__(self):
... return 'Node(cost={cost}, data={data!r})'.format(**self._asdict())
>>> start = Node(cost=0, data='foo')
>>> child = Node(cost=5, data='bar', parent=start)
>>> child
Node(cost=5, data='bar')
>>> child.parent
Node(cost=0, data='foo')
>>> child.depth
1
namedtuple()-style dynamic class generation¶
See _named.py.
How to efficiently implement an ordered mapping¶
- Use a backing dict and doubly-linked list.
- See
_orderedbase.py.OrderedDictprovided a good reference.
API Design¶
- Integrating with
collectionsviacollections.abcandabc - Implementing ABCs like
collections.abc.Hashable - Thanks to
Hashableimplementingabc.ABCMeta.__subclasshook__(), any class that implements all the required methods of theHashableinterface (namely,__hash__()) makes it a virtual subclass already, no need to explicitly extend. I.e. As long asFooimplements a__hash__()method,issubclass(Foo, Hashable)will always be True, no need to explicitly subclass viaclass Foo(Hashable): ... collections.abc.Mappingandcollections.abc.MutableMappingdon’t implement__subclasshook__(), so must either explicitly subclass (if you want to inherit any of their implementations) or useabc.ABCMeta.register()(to register as a virtual subclass without inheriting any implementation)- Providing a new open ABC like
BidirectionalMapping - Notice we have
collections.abc.Reversiblebut nocollections.abc.Orderedorcollections.abc.OrderedMapping. Proposed in bpo-28912 but rejected. Would have been useful for bidict’s__repr__()implementation (see_base.py), and potentially for interop with other ordered mapping implementations such as SortedDict - Beyond
collections.abc.Mapping, bidicts implement additional APIs thatdictandOrderedDictimplement.- When creating a new API, making it familiar, memorable, and intuitive is hugely important to a good user experience.
- Making APIs Pythonic
- Zen of Python
- “Errors should never pass silently. Unless explicitly silenced. In the face of ambiguity, refuse the temptation to guess.” → bidict’s default duplication policies
- “Readability counts.”
“There should be one – and preferably only one – obvious way to do it.”
→ an early version of bidict allowed using the
~operator to access.inverseand a special slice syntax likeb[:val]to look up a key by value, but these were removed in preference to the more obvious and readable.inverse-based spellings.
Portability¶
Python 2 vs. Python 3
mostly
dictAPI changes, but also functions likezip(),map(),filter(), etc.If you define a custom
__eq__()on a class, it will not be used for!=comparisons on Python 2 automatically; you must explicitly add an__ne__()implementation that calls your__eq__()implementation. If you don’t,object.__ne__()will be used instead, which behaves likeis not. GOTCHA alert!Python 3 thankfully fixes this.
borrowing methods from other classes:
In Python 2, must grab the
.im_func/__func__attribute off the borrowed method to avoid gettingTypeError: unbound method ...() must be called with ... instance as first argumentSee
_frozenordered.py.
CPython vs. PyPy
- gc / weakref
- primitives’ identities, nan, etc.
- https://bitbucket.org/pypy/pypy/src/dafacc4/pypy/doc/cpython_differences.rst?mode=view
- hence
test_no_reference_cycles(intest_hypothesis.py) is skipped on PyPy
Python Syntax hacks¶
bidict used to support
slice syntax
for looking up keys by value.
See this for an example of how it was implemented.
See #19 for why it was dropped.