kopia lustrzana https://github.com/peterhinch/micropython-samples
SERIALISATION.md: Add CBOR information.
rodzic
d7529baff2
commit
fdf4250f0c
146
SERIALISATION.md
146
SERIALISATION.md
|
@ -16,20 +16,23 @@ I2C or SPI. All these require the data to be presented as linear sequences of
|
||||||
bytes. The problem is how to convert an arbitrary Python object to such a
|
bytes. The problem is how to convert an arbitrary Python object to such a
|
||||||
sequence, and how subsequently to restore the object.
|
sequence, and how subsequently to restore the object.
|
||||||
|
|
||||||
There are numerous standards for achieving this, five of which are readily
|
There are numerous standards for achieving this, six of which are readily
|
||||||
available to MicroPython. Each has its own advantages and drawbacks. In two
|
available to MicroPython. Each has its own advantages and drawbacks. In two
|
||||||
cases the encoded strings aim to be human readable and comprise ASCII
|
cases the encoded strings aim to be human readable and comprise ASCII
|
||||||
characters. In the others they comprise binary `bytes` objects where bytes can
|
characters. In the others they comprise binary `bytes` objects where bytes can
|
||||||
take all possible values. The following are the formats with MicroPython
|
take all possible values. The following are the formats with known MicroPython
|
||||||
support:
|
support:
|
||||||
|
|
||||||
1. ujson (ASCII, official)
|
Self-describing:
|
||||||
|
1. json [ASCII, official](http://docs.micropython.org/en/latest/library/json.html).
|
||||||
2. pickle (ASCII, official)
|
2. pickle (ASCII, official)
|
||||||
3. ustruct (binary, official)
|
3. MessagePack [binary, unofficial](https://github.com/peterhinch/micropython-msgpack)
|
||||||
4. MessagePack [binary, unofficial](https://github.com/peterhinch/micropython-msgpack)
|
4. CBOR [binary, unofficial](https://github.com/alexmrqt/micropython-cbor/tree/master)
|
||||||
5. protobuf [binary, unofficial](https://github.com/dogtopus/minipb)
|
Requiring a schema:
|
||||||
|
5. struct [binary, official](http://docs.micropython.org/en/latest/library/struct.html)
|
||||||
|
6. protobuf [binary, unofficial](https://github.com/dogtopus/minipb)
|
||||||
|
|
||||||
The `ujson` and `pickle` formats produce human-readable byte sequences. These
|
The `json` and `pickle` formats produce human-readable byte sequences. These
|
||||||
aid debugging. The use of ASCII data means that a delimiter can be used to
|
aid debugging. The use of ASCII data means that a delimiter can be used to
|
||||||
identify the end of a message. This is because it is possible to guarantee that
|
identify the end of a message. This is because it is possible to guarantee that
|
||||||
the delimiter will never occur within a message. A delimiter cannot be used
|
the delimiter will never occur within a message. A delimiter cannot be used
|
||||||
|
@ -37,36 +40,45 @@ with binary formats because a message byte can take all possible values
|
||||||
including that of the delimiter. The drawback of ASCII formats is inefficiency:
|
including that of the delimiter. The drawback of ASCII formats is inefficiency:
|
||||||
the byte sequences are relatively long.
|
the byte sequences are relatively long.
|
||||||
|
|
||||||
Numbers 1, 2 and 4 are self-describing: the format includes a definition of its
|
All bar 5 and 6 are are self-describing: the format includes a definition of
|
||||||
structure. This means that the decoding process can re-create the object in the
|
its structure. This means that the decoding process can re-create the object in
|
||||||
absence of information on its structure, which may therefore change at runtime.
|
the absence of information on its structure, which may therefore change at
|
||||||
Self describing formats inevitably are variable length. This is no problem
|
runtime. Self describing formats inevitably are variable length. This is no
|
||||||
where data is being saved to file, but if it is being communicated across a
|
problem where data is being saved to file, but if it is being communicated
|
||||||
link the receiving process needs a means to determine when a complete message
|
across a link the receiving process needs a means to determine when a complete
|
||||||
has been received. In the case of ASCII formats a delimiter may be used but in
|
message has been received. In the case of ASCII formats a delimiter may be used
|
||||||
the case of `MessagePack` this presents something of a challenge.
|
but in the cases of `MessagePack` and `CBOR` this presents something of a
|
||||||
|
challenge.
|
||||||
|
|
||||||
The `ustruct` format is binary: the byte sequence comprises binary data which
|
The `struct` format is binary: the byte sequence comprises binary data which
|
||||||
is neither human readable nor self-describing. The problem of message framing
|
is neither human readable nor self-describing. The problem of message framing
|
||||||
is solved by hard coding a fixed message structure and length which is known to
|
is solved by hard coding a fixed message structure and length which is known to
|
||||||
transmitter and receiver. In simple cases of fixed format data, `ustruct`
|
transmitter and receiver. In simple cases of fixed format data, `struct`
|
||||||
provides a simple, efficient solution.
|
provides a simple, efficient solution.
|
||||||
|
|
||||||
In `protobuf` and `MessagePack` messages are variable length; both can handle
|
In `protobuf`, `CBOR` and `MessagePack` messages are variable length; all can
|
||||||
data whose length varies at runtime. `MessagePack` also allows the message
|
handle data whose length varies at runtime. `MessagePack` and `CBOR` allow the
|
||||||
structure to change at runtime. It is also extensible to enable the efficient
|
message structure to change at runtime. They are also extensible to enable the
|
||||||
coding of additional Python types or instances of user defined classes.
|
efficient coding of additional Python types or instances of user defined classes.
|
||||||
|
|
||||||
The `protobuf` standard requires transmitter and receiver to share a schema
|
The `protobuf` standard requires transmitter and receiver to share a schema
|
||||||
which defines the message structure. Message length may change at runtime, but
|
which defines the message structure. Message length may change at runtime, but
|
||||||
structure may not.
|
structure may not.
|
||||||
|
|
||||||
There has been some discussion of supporting [CBOR](https://cbor.io/). There is a
|
Uniquely `CBOR` has an ability to accept data objects whose size is initially
|
||||||
MicroPython library [here](https://github.com/onetonfoot/micropython-cbor). This
|
unknown. An encoder can receive a declaration that an array is to follow, then a
|
||||||
is a binary format with a focus on minimising message length. I have not yet had
|
sequence of elements. A terminator signals the end of the array. This
|
||||||
time to study this.
|
functionality is not provided in the
|
||||||
|
[MicroPython implementation](https://github.com/alexmrqt/micropython-cbor/tree/master).
|
||||||
|
Implementing such support would require a Python API to be defined.
|
||||||
|
|
||||||
## 1.1 Transmission over unreliable links
|
## 1.1 Protocol References
|
||||||
|
|
||||||
|
[MessagePack](https://github.com/msgpack/msgpack/tree/master)
|
||||||
|
[CBOR](https://cbor.io/)
|
||||||
|
[CBOR spec](https://www.rfc-editor.org/rfc/rfc8949.html)
|
||||||
|
|
||||||
|
## 1.2 Transmission over unreliable links
|
||||||
|
|
||||||
Consider a system where a transmitter periodically sends messages to a receiver
|
Consider a system where a transmitter periodically sends messages to a receiver
|
||||||
over a communication link. An aspect of the message framing problem arises if
|
over a communication link. An aspect of the message framing problem arises if
|
||||||
|
@ -80,7 +92,7 @@ continuous stream of data. In the case of regular bursts of data a timeout can
|
||||||
be used. Otherwise "out of band" signalling is required where the receiver
|
be used. Otherwise "out of band" signalling is required where the receiver
|
||||||
signals the transmitter to request retransmission.
|
signals the transmitter to request retransmission.
|
||||||
|
|
||||||
## 1.2 Concurrency
|
## 1.3 Concurrency
|
||||||
|
|
||||||
In `asyncio` systems the transmitter presents no problem. A message is created
|
In `asyncio` systems the transmitter presents no problem. A message is created
|
||||||
using synchronous code, then transmitted using asynchronous code typically with
|
using synchronous code, then transmitted using asynchronous code typically with
|
||||||
|
@ -90,19 +102,19 @@ is appended.
|
||||||
In the case of ASCII protocols the receiver can use `StreamReader.readline()`
|
In the case of ASCII protocols the receiver can use `StreamReader.readline()`
|
||||||
to await a complete message.
|
to await a complete message.
|
||||||
|
|
||||||
`ustruct` also presents a simple case in that the number of expected bytes is
|
`struct` also presents a simple case in that the number of expected bytes is
|
||||||
known to the receiver which simply awaits that number.
|
known to the receiver which simply awaits that number.
|
||||||
|
|
||||||
The variable length binary protocols present a difficulty in that the message
|
The variable length binary protocols present a difficulty in that the message
|
||||||
length is unknown in advance. A solution is available for `MessagePack`.
|
length is unknown in advance. A solution is available for `MessagePack`.
|
||||||
|
|
||||||
# 2. ujson and pickle
|
# 2. json and pickle
|
||||||
|
|
||||||
These are very similar. `ujson` is documented
|
These are very similar. `json` is documented
|
||||||
[here](http://docs.micropython.org/en/latest/library/ujson.html). `pickle` has
|
[here](http://docs.micropython.org/en/latest/library/json.html). `pickle` has
|
||||||
identical methods so this doc may be used for both.
|
identical methods so this doc may be used for both.
|
||||||
|
|
||||||
The advantage of `ujson` is that JSON strings can be accepted by CPython and by
|
The advantage of `json` is that JSON strings can be accepted by CPython and by
|
||||||
other languages. The drawback is that only a subset of Python object types can
|
other languages. The drawback is that only a subset of Python object types can
|
||||||
be converted to legal JSON strings; this is a limitation of the
|
be converted to legal JSON strings; this is a limitation of the
|
||||||
[JSON specification](http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf).
|
[JSON specification](http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf).
|
||||||
|
@ -114,7 +126,9 @@ The strings produced are incompatible with CPython's `pickle`, but can be
|
||||||
decoded in CPython by using the MicroPython decoder. There is a
|
decoded in CPython by using the MicroPython decoder. There is a
|
||||||
[bug](https://github.com/micropython/micropython/issues/2280) in the
|
[bug](https://github.com/micropython/micropython/issues/2280) in the
|
||||||
MicroPython implementation when running under MicroPython. A workround consists
|
MicroPython implementation when running under MicroPython. A workround consists
|
||||||
of never encoding short strings which change repeatedly.
|
of never encoding short strings which change repeatedly. The official MicroPython
|
||||||
|
library achieves its simplicity by invoking the compiler at runtime. This is
|
||||||
|
costly in RAM.
|
||||||
|
|
||||||
## 2.1 Usage examples
|
## 2.1 Usage examples
|
||||||
|
|
||||||
|
@ -130,11 +144,11 @@ print('Decoded data (partial):', v[3])
|
||||||
```
|
```
|
||||||
JSON. Note that dictionary keys must be strings:
|
JSON. Note that dictionary keys must be strings:
|
||||||
```python
|
```python
|
||||||
import ujson
|
import json
|
||||||
data = {'1':'test', '2':1.414, '3': [11, 12, 13]}
|
data = {'1':'test', '2':1.414, '3': [11, 12, 13]}
|
||||||
s = ujson.dumps(data)
|
s = json.dumps(data)
|
||||||
print('Human readable data:', s)
|
print('Human readable data:', s)
|
||||||
v = ujson.loads(s)
|
v = json.loads(s)
|
||||||
print('Decoded data (partial):', v['3'])
|
print('Decoded data (partial):', v['3'])
|
||||||
```
|
```
|
||||||
|
|
||||||
|
@ -142,15 +156,15 @@ print('Decoded data (partial):', v['3'])
|
||||||
|
|
||||||
In real applications the data, and hence the string length, vary at runtime.
|
In real applications the data, and hence the string length, vary at runtime.
|
||||||
The receiving process needs to know when a complete string has been received or
|
The receiving process needs to know when a complete string has been received or
|
||||||
read from a file. In practice `ujson` and `pickle` do not include newline
|
read from a file. In practice `json` and `pickle` do not include newline
|
||||||
characters in encoded strings. If the data being encoded includes a newline, it
|
characters in encoded strings. If the data being encoded includes a newline, it
|
||||||
is escaped in the string:
|
is escaped in the string:
|
||||||
```python
|
```python
|
||||||
import ujson
|
import json
|
||||||
data = {'1':b'test\nmore', '2':1.414, '3': [11, 12, 13]}
|
data = {'1':b'test\nmore', '2':1.414, '3': [11, 12, 13]}
|
||||||
s = ujson.dumps(data)
|
s = json.dumps(data)
|
||||||
print('Human readable data:', s)
|
print('Human readable data:', s)
|
||||||
v = ujson.loads(s)
|
v = json.loads(s)
|
||||||
print('Decoded data (partial):', v['1'])
|
print('Decoded data (partial):', v['1'])
|
||||||
```
|
```
|
||||||
If this is pasted at the REPL you will observe that the human readable data
|
If this is pasted at the REPL you will observe that the human readable data
|
||||||
|
@ -165,9 +179,9 @@ reading may be done using `readline` methods as in this code fragment where
|
||||||
`u` is a UART instance:
|
`u` is a UART instance:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
s = ujson.dumps(data)
|
s = json.dumps(data)
|
||||||
# pickle produces a bytes object whereas ujson produces a string
|
# pickle produces a bytes object whereas json produces a string
|
||||||
# In the case of ujson it is probably wise to convert to bytes
|
# In the case of json it is probably wise to convert to bytes
|
||||||
u.write(s.encode())
|
u.write(s.encode())
|
||||||
# Pickle:
|
# Pickle:
|
||||||
# u.write(s)
|
# u.write(s)
|
||||||
|
@ -175,14 +189,14 @@ u.write(b'\n')
|
||||||
|
|
||||||
# receiver
|
# receiver
|
||||||
s = u.readline()
|
s = u.readline()
|
||||||
v = ujson.loads(s) # ujson can cope with bytes object
|
v = json.loads(s) # json can cope with bytes object
|
||||||
# ujson and pickle can cope with trailing newline.
|
# json and pickle can cope with trailing newline.
|
||||||
```
|
```
|
||||||
|
|
||||||
# 3. ustruct
|
# 3. struct
|
||||||
|
|
||||||
This is documented
|
This is documented
|
||||||
[here](http://docs.micropython.org/en/latest/library/ustruct.html). The binary
|
[here](http://docs.micropython.org/en/latest/library/struct.html). The binary
|
||||||
format is efficient, but the format of a sequence cannot change at runtime and
|
format is efficient, but the format of a sequence cannot change at runtime and
|
||||||
must be "known" to the decoding process. Records are of fixed length. If data
|
must be "known" to the decoding process. Records are of fixed length. If data
|
||||||
is to be stored in a binary random access file, the fixed record size means
|
is to be stored in a binary random access file, the fixed record size means
|
||||||
|
@ -190,59 +204,59 @@ that the offset of a given record may readily be calculated.
|
||||||
|
|
||||||
Write a 100 record file. Each record comprises three 32-bit integers:
|
Write a 100 record file. Each record comprises three 32-bit integers:
|
||||||
```python
|
```python
|
||||||
import ustruct
|
import struct
|
||||||
fmt = 'iii' # Record format: 3 signed ints
|
fmt = 'iii' # Record format: 3 signed ints
|
||||||
rlen = ustruct.calcsize(fmt) # Record length
|
rlen = struct.calcsize(fmt) # Record length
|
||||||
buf = bytearray(rlen)
|
buf = bytearray(rlen)
|
||||||
with open('myfile', 'wb') as f:
|
with open('myfile', 'wb') as f:
|
||||||
for x in range(100):
|
for x in range(100):
|
||||||
y = x * x
|
y = x * x
|
||||||
z = x * 10
|
z = x * 10
|
||||||
ustruct.pack_into(fmt, buf, 0, x, y, z)
|
struct.pack_into(fmt, buf, 0, x, y, z)
|
||||||
f.write(buf)
|
f.write(buf)
|
||||||
```
|
```
|
||||||
Read record no. 10 from that file:
|
Read record no. 10 from that file:
|
||||||
```python
|
```python
|
||||||
import ustruct
|
import struct
|
||||||
fmt = 'iii'
|
fmt = 'iii'
|
||||||
rlen = ustruct.calcsize(fmt) # Record length
|
rlen = struct.calcsize(fmt) # Record length
|
||||||
buf = bytearray(rlen)
|
buf = bytearray(rlen)
|
||||||
rnum = 10 # Record no.
|
rnum = 10 # Record no.
|
||||||
with open('myfile', 'rb') as f:
|
with open('myfile', 'rb') as f:
|
||||||
f.seek(rnum * rlen)
|
f.seek(rnum * rlen)
|
||||||
f.readinto(buf)
|
f.readinto(buf)
|
||||||
result = ustruct.unpack_from(fmt, buf)
|
result = struct.unpack_from(fmt, buf)
|
||||||
print(result)
|
print(result)
|
||||||
```
|
```
|
||||||
Owing to the fixed record length, integers must be constrained to fit the
|
Owing to the fixed record length, integers must be constrained to fit the
|
||||||
length declared in the format string.
|
length declared in the format string.
|
||||||
|
|
||||||
Binary formats cannot use delimiters as any delimiter character may be present
|
Binary formats cannot use delimiters as any delimiter character may be present
|
||||||
in the data - however the fixed length of `ustruct` records means that this is
|
in the data - however the fixed length of `struct` records means that this is
|
||||||
not a problem.
|
not a problem.
|
||||||
|
|
||||||
For performance oriented applications, `ustruct` is the only serialisation
|
For performance oriented applications, `struct` is the only serialisation
|
||||||
approach which can be used in a non-allocating fashion, by using pre-allocated
|
approach which can be used in a non-allocating fashion, by using pre-allocated
|
||||||
buffers as in the above example.
|
buffers as in the above example.
|
||||||
|
|
||||||
## 3.1 Strings
|
## 3.1 Strings
|
||||||
|
|
||||||
In `ustruct` the `s` data type is normally prefixed by a length (defaulting to
|
In `struct` the `s` data type is normally prefixed by a length (defaulting to
|
||||||
1). This ensures that records are of fixed size, but is potentially inefficient
|
1). This ensures that records are of fixed size, but is potentially inefficient
|
||||||
as shorter strings will still occupy the same amount of space. Longer strings
|
as shorter strings will still occupy the same amount of space. Longer strings
|
||||||
will silently be truncated. Short strings are packed with zeros.
|
will silently be truncated. Short strings are packed with zeros.
|
||||||
|
|
||||||
```python
|
```python
|
||||||
import ustruct
|
import struct
|
||||||
fmt = 'ii30s'
|
fmt = 'ii30s'
|
||||||
rlen = ustruct.calcsize(fmt) # Record length
|
rlen = struct.calcsize(fmt) # Record length
|
||||||
buf = bytearray(rlen)
|
buf = bytearray(rlen)
|
||||||
ustruct.pack_into(fmt, buf, 0, 11, 22, 'the quick brown fox')
|
struct.pack_into(fmt, buf, 0, 11, 22, 'the quick brown fox')
|
||||||
ustruct.unpack_from(fmt, buf)
|
struct.unpack_from(fmt, buf)
|
||||||
ustruct.pack_into(fmt, buf, 0, 11, 22, 'rats')
|
struct.pack_into(fmt, buf, 0, 11, 22, 'rats')
|
||||||
ustruct.unpack_from(fmt, buf) # Packed with zeros
|
struct.unpack_from(fmt, buf) # Packed with zeros
|
||||||
ustruct.pack_into(fmt, buf, 0, 11, 22, 'the quick brown fox jumps over the lazy dog')
|
struct.pack_into(fmt, buf, 0, 11, 22, 'the quick brown fox jumps over the lazy dog')
|
||||||
ustruct.unpack_from(fmt, buf) # Truncation
|
struct.unpack_from(fmt, buf) # Truncation
|
||||||
```
|
```
|
||||||
Output:
|
Output:
|
||||||
```python
|
```python
|
||||||
|
@ -254,8 +268,8 @@ Output:
|
||||||
# 4. MessagePack
|
# 4. MessagePack
|
||||||
|
|
||||||
Of the binary formats this is the easiest to use and can be a "drop in"
|
Of the binary formats this is the easiest to use and can be a "drop in"
|
||||||
replacement for `ujson` as it supports the same four methods `dump`, `dumps`,
|
replacement for `json` as it supports the same four methods `dump`, `dumps`,
|
||||||
`load` and `loads`. An application might initially be developed with `ujson`,
|
`load` and `loads`. An application might initially be developed with `json`,
|
||||||
the protocol being changed to `MessagePack` later. Creation of a `MessagePack`
|
the protocol being changed to `MessagePack` later. Creation of a `MessagePack`
|
||||||
string can be done with:
|
string can be done with:
|
||||||
```python
|
```python
|
||||||
|
|
Ładowanie…
Reference in New Issue