Basic usage

Serializing and deserializing with cbor2 is pretty straightforward:

from cbor2 import dump, dumps, load, loads

# Serialize an object as a bytestring
data = dumps(['hello', 'world'])

# Deserialize a bytestring
obj = loads(data)

# Efficiently deserialize from a file
with open('input.cbor', 'rb') as fp:
    obj = load(fp)

# Efficiently serialize an object to a file
with open('output.cbor', 'wb') as fp:
    dump(obj, fp)

Some data types, however, require extra considerations, as detailed below.

Date/time handling

The CBOR specification does not support naïve datetimes (that is, datetimes where tzinfo is missing). When the encoder encounters such a datetime, it needs to know which timezone it belongs to. To this end, you can specify a default timezone by passing a tzinfo instance to dump()/dumps() call as the timezone argument. Decoded datetimes are always timezone aware.

By default, datetimes are serialized in a manner that retains their timezone offsets. You can optimize the data stream size by passing datetime_as_timestamp=False to dump()/dumps(), but this causes the timezone offset information to be lost.

In versions prior to 4.2 the encoder would convert a datetime.date object into a datetime.datetime prior to writing. This can cause confusion on decoding so this has been disabled by default in the next version. The behaviour can be re-enabled as follows:

from cbor2 import dumps
from datetime import date, timezone

# Serialize dates as datetimes
encoded = dumps(date(2019, 10, 28), timezone=timezone.utc, date_as_datetime=True)

A default timezone offset must be provided also.

Cyclic (recursive) data structures

If the encoder encounters a shareable object (ie. list or dict) that it has seen before, it will by default raise CBOREncodeError indicating that a cyclic reference has been detected and value sharing was not enabled. CBOR has, however, an extension specification that allows the encoder to reference a previously encoded value without processing it again. This makes it possible to serialize such cyclic references, but value sharing has to be enabled by passing value_sharing=True to dump()/dumps().

Warning

Support for value sharing is rare in other CBOR implementations, so think carefully whether you want to enable it. It also causes some line overhead, as all potentially shareable values must be tagged as such.

String references

When string_referencing=True is passed to dump()/dumps(), if the encoder would encode a string that it has previously encoded and where a reference would be shorter than the encoded string, it instead encodes a reference to the nth sufficiently long string already encoded.

Warning

Support for string referencing is rare in other CBOR implementations, so think carefully whether you want to enable it.

Decoder support for semantic tags

Below is a list of supported semantic tags for decoding.

Tag

Semantics

Python type(s)

0

Standard datetime string

datetime.datetime

1

Epoch-based datetime

datetime.datetime

2

Positive bignum

int

3

Negative bignum

int

4

Decimal fraction

decimal.Decimal

5

Bigfloat

decimal.Decimal

25

String reference

str / bytes

28

Mark shared value

N/A

29

Reference shared value

N/A

30

Rational number

fractions.Fraction

35

Regular expression

re.Pattern (result of re.compile(...))

36

MIME message

email.message.Message

37

Binary UUID

uuid.UUID

52

IPv4 address/network

ipaddress.IPv4Address, ipaddress.IPv4Network or ipaddress.IPv4Interface

54

IPv6 address/network

ipaddress.IPv6Address, ipaddress.IPv6Network or ipaddress.IPv6Interface

100

Epoch-based date

datetime.date

256

String reference namespace

N/A

258

Set of unique items

set

260

Network address (DEPRECATED)

ipaddress.IPv4Address or ipaddress.IPv4Address

261

Network prefix (DEPRECATED))

ipaddress.IPv4Network, ipaddress.IPv4Interface, ipaddress.IPv6Network or ipaddress.IPv6Interface

1004

ISO formatted date string

datetime.date

43000

Single complex number

complex

55799

Self-Described CBOR

object

Any unsupported semantic tags encountered by the decoder will be decoded as CBORTag objects.

Encoder support for built-in CBOR types

Below is the list of Python types encoded as various major CBOR tags or special values.

Python type

Major tag

Subtype

int

0 or 1

bytes

2

bytearray

2

str

3

list

4

tuple

4

collections.abc.Sequence

4

dict

5

collections.abc.Mapping

5

bool

7

20 (False) or 21 (True)

None

7

22

undefined

7

23

float

7

27

Encoder semantic tag support

Below is the list of Python types and the semantic tags they’re encoded as.

Python type

Semantic tag

complex

43000

datetime.datetime

0 or 1, depending on the datetime_as_timestamp setting

datetime.date

1004, 100, 1 or 0, depending on the date_as_datetime and datetime_as_timestamp settings

decimal.Decimal

4 (unless NaN or infinities, which are encoded as floats)

fractions.Fraction

30

frozenset

258

re.Pattern

35

email.mime.text.MIMEText

35

ipaddress.IPv4Address

52

ipaddress.IPv4Interface

52

ipaddress.IPv4Network

52

ipaddress.IPv6Address

54

ipaddress.IPv6Interface

54

ipaddress.IPv6Network

54

set

258

uuid.UUID

37

If you want to write a file that is detected as CBOR by the Unix file utility, wrap your data in a CBORTag object like so:

from cbor2 import dump, CBORTag

with open('output.cbor', 'wb') as fp:
    dump(CBORTag(55799, obj), fp)

This will be ignored on decode and the original data content will be returned.

Use Cases

Here are some things that the cbor2 library could be (and in some cases, is being) used for:

  • Experimenting with network protocols based on CBOR encoding

  • Designing new data storage formats

  • Submitting binary documents to ElasticSearch without base64 encoding overhead

  • Storing and validating file metadata in a secure backup system

  • RPC which supports Decimals with low overhead