Customizing encoding and decoding
Both the encoder and decoder can be customized to support a wider range of types.
Customizing the decoder
There are three ways to customize the decoding behavior, available as keyword arguments to
load(), loads() and CBORDecoder:
semantic_decoders: lets you change how specific semantic tags are decoded
tag_hook: lets you define a catch-all for unhandled semantic tags
object_hook: lets you transform any newly-decoded dicts
Customizing map decoding
The final decoder option allows users to customize how CBOR maps are decoded, using the
object_hook option. This callback takes two arguments: the mapping (a dict or a
frozendict) and the immutable flag. The callback should return either the mapping
passed to it, or another object that should replace it.
Here’s an example that decodes any dict with the key typename set to Point as a Point
instance:
from collections.abc import Mapping
from typing import Any

import cbor2

class Point:
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

def object_hook(value: Mapping[Any, Any], immutable: bool) -> Mapping[Any, Any] | Point:
    if value.get("typename") == "Point":
        return Point(value["x"], value["y"])

    return value

payload = cbor2.dumps({"typename": "Point", "x": 4, "y": 5})
point = cbor2.loads(payload, object_hook=object_hook)
assert isinstance(point, Point)
assert point.x == 4
assert point.y == 5
Note
Make sure you have well-defined rules for when a dict gets special handling, so that the hook does not try to convert every CBOR map the decoder encounters.
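One way to keep such rules tight is to require an exact set of keys before converting a map. The following is a minimal sketch that can be exercised without cbor2 by calling the hook directly; the POINT_KEYS convention and the strict_object_hook name are assumptions made for illustration, not part of the library:

```python
from collections.abc import Mapping
from typing import Any


class Point:
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y


# Hypothetical convention: a map is treated as a Point only if it has
# exactly these keys and nothing else
POINT_KEYS = frozenset({"typename", "x", "y"})


def strict_object_hook(value: Mapping[Any, Any], immutable: bool) -> Any:
    # Convert only maps that match the convention exactly
    if value.get("typename") == "Point" and set(value) == POINT_KEYS:
        return Point(value["x"], value["y"])

    # Every other map is passed through unchanged
    return value
```

A map that merely happens to contain a typename key, but lacks the other fields, is returned as-is instead of raising a KeyError inside the hook.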
Dealing with immutable containers
In rare cases, you may need to decode the next item from the stream as immutable. In practice, this means:
Maps are decoded as frozendict instead of dict
There are two ways your custom decoder callbacks may want to interact with the decoder’s
immutable flag:
Use it to decide what data types to instantiate (e.g. tuple vs list)
Decode an enclosed item as immutable with decoder.decode(immutable=True)
Here’s a simplified example that uses this flag to decode the semantic tag 258 as either a
set or a frozenset, depending on the value of the flag:
import cbor2

def decode_set(decoder: cbor2.CBORDecoder) -> set | frozenset:
    # Ignore value sharing and indefinite containers (length == None)
    # for the sake of simplicity
    items = decoder.decode(immutable=True)  # all set items must be hashable
    return frozenset(items) if decoder.immutable else set(items)

# Encode/decode a regular set
value = {"aa", "bb"}
assert cbor2.loads(cbor2.dumps(value), semantic_decoders={258: decode_set}) == value
# Encode/decode a dict that uses a set as a key (must be frozenset to be used as a dict key)
value = {frozenset(["aa", "bb"]): "value"}
assert cbor2.loads(cbor2.dumps(value), semantic_decoders={258: decode_set}) == value
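The reason immutable containers matter comes from Python itself rather than from cbor2: dict keys and set members must be hashable, and the built-in mutable containers are not. A small sketch in plain Python (the usable_as_key helper is a hypothetical name used only for this illustration):

```python
def usable_as_key(obj: object) -> bool:
    """Return True if obj can serve as a dict key or a set member."""
    try:
        hash(obj)
    except TypeError:
        return False

    return True


# Mutable containers are rejected, their immutable counterparts are accepted
print(usable_as_key([1, 2]))                    # list
print(usable_as_key((1, 2)))                    # tuple
print(usable_as_key({"aa", "bb"}))              # set
print(usable_as_key(frozenset(["aa", "bb"])))   # frozenset

# Immutability has to hold all the way down: a tuple containing a list
# is still unhashable
print(usable_as_key(([1, 2],)))
```

The last case shows why a callback that builds a hashable container also has to request its enclosed items as immutable.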
Customizing the encoder
There are two ways to customize the encoder behavior, available as keyword arguments to
dump(), dumps() and CBOREncoder:
encoders: specifies a mapping of an exact Python type to an encoder callable
default: specifies a “catch-all” encoder callable for objects not matched with any specific encoder callback
Overriding the encoder for a specific Python type
The encoders option allows users to override the encoding behavior for any Python type.
The option takes a dict or any mapping type where the
keys are Python types and the values are encoder callbacks. The encoder callbacks must take two
positional arguments: the encoder instance and the object to be encoded.
Here’s an example of how to add support for encoding a custom type:
import cbor2

class Point:
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

def encode_point(encoder: cbor2.CBOREncoder, value: Point) -> None:
    # Tag number 4000 was chosen arbitrarily
    encoder.encode_semantic(4000, [value.x, value.y])

# prints b'\xd9\x0f\xa0\x82\x04\x05'
print(cbor2.dumps(Point(4, 5), encoders={Point: encode_point}))
This encodes the two fields, x and y, as an array under the (arbitrarily chosen) semantic tag 4000.
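To see where the six output bytes come from, the same byte string can be assembled by hand from the CBOR initial-byte layout (major type in the top three bits, additional information in the low five). This is a minimal sketch in plain Python that covers only the cases needed here; the helper names are made up for illustration and are not part of cbor2:

```python
import struct


def tag_head(tag: int) -> bytes:
    # Major type 6 (tag); additional info 25 means that a 2-byte
    # big-endian unsigned integer follows
    if not 0x100 <= tag <= 0xFFFF:
        raise ValueError("this sketch only covers 2-byte tag numbers")
    return bytes([(6 << 5) | 25]) + struct.pack(">H", tag)


def small_array_head(length: int) -> bytes:
    # Major type 4 (array); lengths below 24 fit into the initial byte
    if not 0 <= length < 24:
        raise ValueError("this sketch only covers arrays shorter than 24 items")
    return bytes([(4 << 5) | length])


def small_uint(value: int) -> bytes:
    # Major type 0 (unsigned integer); values below 24 fit into the initial byte
    if not 0 <= value < 24:
        raise ValueError("this sketch only covers integers below 24")
    return bytes([value])


# tag 4000, then a 2-item array, then the integers 4 and 5
encoded = tag_head(4000) + small_array_head(2) + small_uint(4) + small_uint(5)
assert encoded == b"\xd9\x0f\xa0\x82\x04\x05"
```

Here 0xd9 announces a tag with a 2-byte number, 0x0fa0 is 4000, 0x82 opens a two-element array, and 0x04 and 0x05 are the coordinates.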
Important
The encoder matches types exactly, so an entry in the encoder registry will not match subclasses of the registered type!
Value sharing with custom types
Properly encoding and decoding cyclic references with custom types takes some special care. Suppose you have a custom type like the one below, where any child object can contain a reference to its parent or any other ancestor. Naively trying to serialize such a cyclic object graph raises an error:
from __future__ import annotations
from typing import Any
import cbor2

class MyType:
    def __init__(self, parent: MyType | None = None):
        self.parent = parent
        self.children = []
        if parent:
            self.parent.children.append(self)

def encode_mytype(encoder: cbor2.CBOREncoder, value: MyType):
    # The state has to be serialized separately so that the decoder would have a chance to
    # create an empty instance before the shared value references are decoded
    encoder.encode_semantic(80000, value.__dict__)

def decode_mytype(state: dict[str, Any], immutable: bool) -> MyType:
    # __new__ must be given the class explicitly
    instance = MyType.__new__(MyType)
    instance.__dict__.update(state)
    return instance

parent = MyType()
child1 = MyType(parent)
child2 = MyType(parent)

# ERROR: cbor2.CBOREncodeValueError: cyclic data structure detected
serialized = cbor2.dumps(parent, encoders={MyType: encode_mytype})
To fix this, a few adjustments need to be made:
Value sharing needs to be turned on in the encoder with value_sharing=True
The encoder callback must be decorated with @shareable_encoder
The decoder callback must be decorated with @shareable_decoder
Here is the revised example:
from __future__ import annotations
from collections.abc import Callable
from typing import Any
import cbor2

class MyType:
    def __init__(self, parent: MyType | None = None):
        self.parent = parent
        self.children = []
        if parent:
            self.parent.children.append(self)

@cbor2.shareable_encoder
def encode_mytype(encoder: cbor2.CBOREncoder, value: MyType):
    # The state has to be serialized separately so that the decoder would have a chance to
    # create an empty instance before the shared value references are decoded
    encoder.encode_semantic(80000, value.__dict__)

@cbor2.shareable_decoder
def decode_mytype(immutable: bool) -> tuple[MyType, Callable[[Any], Any]]:
    # The uninitialized instance will be marked as shareable before its state is decoded
    instance = MyType.__new__(MyType)

    def decoder(state: dict[str, Any]) -> MyType:
        instance.__dict__.update(state)
        return instance

    # Return the raw instance and a callback to be run once the state has been decoded
    return instance, decoder

parent = MyType()
child1 = MyType(parent)
child2 = MyType(parent)

# Important: value sharing must be enabled
serialized = cbor2.dumps(parent, encoders={MyType: encode_mytype}, value_sharing=True)
new_parent = cbor2.loads(serialized, semantic_decoders={80000: decode_mytype})
assert new_parent.children[0].parent is new_parent
assert new_parent.children[1].parent is new_parent
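The reason the decoder callback hands back an empty instance first, and fills in its state later, can be shown without cbor2. The following is a conceptual sketch in plain Python, not a description of the library's internals; the Node class and the shared list are hypothetical stand-ins for a custom type and the decoder's table of shareable values:

```python
from __future__ import annotations

from typing import Any


class Node:
    def __init__(self, parent: Node | None = None):
        self.parent = parent
        self.children: list[Node] = []


# Stand-in for the table of values that later references can point to
shared: list[Any] = []

# Phase 1: create an empty instance and publish it before its state exists
parent = Node.__new__(Node)
shared.append(parent)

# Phase 2: build the state, which may now refer back to shared[0]
child = Node.__new__(Node)
child.__dict__.update({"parent": shared[0], "children": []})
parent.__dict__.update({"parent": None, "children": [child]})

# The cycle is intact: the child points at the very same parent object
assert parent.children[0].parent is parent
```

If the parent could only be created after its full state was available, the child would have nothing to point to while that state was being built.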