mirror of https://github.com/peterhinch/micropython-samples
Serialisation: add reference to MessagePack.
parent
1f3ee31cc6
commit
a0cef58a8b
141
SERIALISATION.md
|
@ -16,32 +16,79 @@ I2C or SPI. All these require the data to be presented as linear sequences of
|
||||||
bytes. The problem is how to convert an arbitrary Python object to such a
|
bytes. The problem is how to convert an arbitrary Python object to such a
|
||||||
sequence, and how subsequently to restore the object.
|
sequence, and how subsequently to restore the object.
|
||||||
|
|
||||||
I am aware of four ways of achieving this, each with their own advantages and
|
There are numerous standards for achieving this, five of which are readily
|
||||||
drawbacks. In two cases the encoded strings comprise ASCII characters, in the
|
available to MicroPython. Each has its own advantages and drawbacks. In two
|
||||||
other two they are binary (bytes can take all possible values).
|
cases the encoded strings aim to be human readable and comprise ASCII
|
||||||
|
characters. In the others they comprise binary `bytes` objects where bytes can
|
||||||
|
take all possible values. The following are the formats with MicroPython
|
||||||
|
support:
|
||||||
|
|
||||||
1. ujson (ASCII, official)
|
1. ujson (ASCII, official)
|
||||||
2. pickle (ASCII, official)
|
2. pickle (ASCII, official)
|
||||||
3. ustruct (binary, official)
|
3. ustruct (binary, official)
|
||||||
4. protobuf [binary, unofficial](https://github.com/dogtopus/minipb)
|
4. MessagePack [binary, unofficial](https://github.com/peterhinch/micropython-msgpack)
|
||||||
|
5. protobuf [binary, unofficial](https://github.com/dogtopus/minipb)
|
||||||
|
|
||||||
The first two are self-describing: the format includes a definition of its
|
The `ujson` and `pickle` formats produce human-readable byte sequences. These
|
||||||
|
aid debugging. The use of ASCII data means that a delimiter can be used to
|
||||||
|
identify the end of a message. This is because it is possible to guarantee that
|
||||||
|
the delimiter will never occur within a message. A delimiter cannot be used
|
||||||
|
with binary formats because a message byte can take all possible values
|
||||||
|
including that of the delimiter. The drawback of ASCII formats is inefficiency:
|
||||||
|
the byte sequences are relatively long.
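
The delimiter approach can be sketched as follows. This is a minimal illustration rather than code from this repository: it uses CPython's `json` (the MicroPython equivalent being `ujson`), and the names `frame` and `unframe` and the choice of `b"\n"` as delimiter are assumptions made for the example. It relies on the JSON encoder escaping any newline inside a string, so that a raw newline never occurs within a message.

```python
import json  # On MicroPython the equivalent module is ujson

DELIMITER = b"\n"  # Assumed delimiter: the encoder escapes newlines in strings


def frame(obj):
    """Encode obj as JSON text and append the delimiter."""
    return json.dumps(obj).encode() + DELIMITER


def unframe(buf):
    """Split a buffer into complete messages.

    Returns (list of decoded objects, trailing bytes of an incomplete message).
    """
    *lines, rest = buf.split(DELIMITER)
    return [json.loads(line) for line in lines if line], rest


# Two complete messages followed by the start of a third.
stream = frame([1.23, "two\nlines"]) + frame({"a": 1}) + b'{"partial"'
objs, rest = unframe(stream)
print(objs)  # The two complete objects
print(rest)  # Bytes held back until the next delimiter arrives
```

The string containing a newline survives the round trip because it is transmitted as the two characters `\n` rather than as a raw newline byte.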
|
||||||
|
|
||||||
|
Numbers 1, 2 and 4 are self-describing: the format includes a definition of its
|
||||||
structure. This means that the decoding process can re-create the object in the
|
structure. This means that the decoding process can re-create the object in the
|
||||||
absence of information on its structure, which may therefore change at runtime.
|
absence of information on its structure, which may therefore change at runtime.
|
||||||
Further, `ujson` and `pickle` produce human-readable byte sequences which aid
|
Self-describing formats are inevitably variable length. This means that the
|
||||||
debugging. The drawback is inefficiency: the byte sequences are relatively
|
receiving process must be provided with a means to determine when a complete
|
||||||
long. They are variable length. This means that the receiving process must be
|
message has been received. In the case of ASCII formats a delimiter may be used
|
||||||
provided with a means to determine when a complete string has been received.
|
but in the case of `MessagePack` this presents something of a challenge.
|
||||||
|
|
||||||
The `ustruct` and `protobuf` solutions are binary formats: the byte sequences
|
The `ustruct` format is binary: the byte sequence comprises binary data which
|
||||||
comprise binary data which is neither human readable nor self-describing.
|
is neither human readable nor self-describing. The problem of message framing
|
||||||
Binary sequences require that the receiver has information on their structure
|
is solved by hard coding a fixed message structure and length which is known to
|
||||||
in order to decode them. In the case of `ustruct` sequences are of a fixed
|
transmitter and receiver. In simple cases of fixed format data, `ustruct`
|
||||||
length which can be determined from the structure. `protobuf` sequences are
|
provides a simple, efficient solution.
|
||||||
variable length requiring handling discussed below.
|
|
||||||
|
|
||||||
The benefit of binary sequences is efficiency: sequence length is closer to the
|
In `protobuf` and `MessagePack` messages are variable length; both can handle
|
||||||
information-theoretic minimum, compared to the ASCII options.
|
data whose length varies at runtime. `MessagePack` also allows the message
|
||||||
|
structure to change at runtime. It is also extensible to enable the efficient
|
||||||
|
coding of additional Python types or instances of user defined classes.
|
||||||
|
|
||||||
|
The `protobuf` standard requires transmitter and receiver to share a schema
|
||||||
|
which defines the message structure. Message length may change at runtime, but
|
||||||
|
structure may not.
|
||||||
|
|
||||||
|
## 1.1 Transmission over unreliable links
|
||||||
|
|
||||||
|
Consider a system where a transmitter periodically sends messages to a receiver
|
||||||
|
over a communication link. An aspect of the message framing problem arises if
|
||||||
|
that link is unreliable, meaning that bytes may be lost or corrupted in
|
||||||
|
transit. In the case of ASCII formats with a delimiter the receiver, once it
|
||||||
|
has detected the problem, can discard characters until the delimiter is
|
||||||
|
received and then wait for a complete message.
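
A minimal sketch of this recovery, assuming JSON messages delimited by `b"\n"`. The function name `receive_valid` and the simulated data are hypothetical. Detection here relies solely on the decoder rejecting the damaged message; corruption which still yields valid JSON would pass unnoticed, so a real application might add a checksum.

```python
import json  # On MicroPython the equivalent module is ujson


def receive_valid(lines):
    """Yield decoded objects, discarding any message that fails to decode."""
    for line in lines:
        try:
            yield json.loads(line)
        except ValueError:
            # Damaged message: everything up to the delimiter is dropped and
            # reception resumes with the next complete message.
            continue


# Simulated link: the second message lost its leading byte in transit.
stream = b"[1, 2]\n 3, 4]\n[5, 6]\n"
messages = stream.split(b"\n")[:-1]  # Final element is the empty tail
good = list(receive_valid(messages))
print(good)
```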
|
||||||
|
|
||||||
|
In the case of binary formats it is generally impossible to re-synchronise to a
|
||||||
|
continuous stream of data. In the case of regular bursts of data a timeout can
|
||||||
|
be used. Otherwise "out of band" signalling is required where the receiver
|
||||||
|
signals the transmitter to request retransmission.
|
||||||
|
|
||||||
|
## 1.2 Concurrency
|
||||||
|
|
||||||
|
In `uasyncio` systems the transmitter presents no problem. A message is created
|
||||||
|
using synchronous code, then transmitted using asynchronous code typically with
|
||||||
|
a `StreamWriter`. In the case of ASCII protocols a delimiter (usually `b"\n"`)
|
||||||
|
is appended.
|
||||||
|
|
||||||
|
In the case of ASCII protocols the receiver can use `StreamReader.readline()`
|
||||||
|
to await a complete message.
|
||||||
|
|
||||||
|
`ustruct` also presents a simple case in that the number of expected bytes is
|
||||||
|
known to the receiver which simply awaits that number.
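
The two simple receiver cases might look like this. It is a sketch under stated assumptions: it runs under CPython's `asyncio`, the stream is simulated by feeding bytes into a `StreamReader` (`feed_data` and `feed_eof` are used only to make the demo self-contained), and the format string `"<HH"` is a hypothetical message layout. In a real application the reader would wrap a UART or socket.

```python
import asyncio
import json
import struct

FMT = "<HH"  # Hypothetical fixed layout: two unsigned 16-bit integers
SIZE = struct.calcsize(FMT)  # Known to both transmitter and receiver


async def demo():
    reader = asyncio.StreamReader()
    # Simulated incoming data: one delimited JSON message, one fixed-size record
    reader.feed_data(json.dumps([1, 2]).encode() + b"\n")
    reader.feed_data(struct.pack(FMT, 11, 22))
    reader.feed_eof()

    line = await reader.readline()  # ASCII case: await the delimiter
    obj = json.loads(line)

    raw = await reader.readexactly(SIZE)  # Fixed-length case: await SIZE bytes
    record = struct.unpack(FMT, raw)
    return obj, record


print(asyncio.run(demo()))
```

In both cases the receiving task yields to other tasks while waiting, so no polling is needed.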
|
||||||
|
|
||||||
|
The variable length binary protocols present a difficulty in that the message
|
||||||
|
length is unknown in advance. A solution is available for `MessagePack`.
|
||||||
|
|
||||||
# 2. ujson and pickle
|
# 2. ujson and pickle
|
||||||
|
|
||||||
|
@ -198,7 +245,47 @@ Output:
|
||||||
(11, 22, b'the quick brown fox jumps over')
|
(11, 22, b'the quick brown fox jumps over')
|
||||||
```
|
```
|
||||||
|
|
||||||
# 4. Protocol Buffers
|
# 4. MessagePack
|
||||||
|
|
||||||
|
Of the binary formats this is the easiest to use and can be a "drop in"
|
||||||
|
replacement for `ujson` as it supports the same four methods `dump`, `dumps`,
|
||||||
|
`load` and `loads`. An application might initially be developed with `ujson`,
|
||||||
|
the protocol being changed to `MessagePack` later. Creation of a `MessagePack`
|
||||||
|
string can be done with:
|
||||||
|
```python
|
||||||
|
import umsgpack
|
||||||
|
obj = [1.23, 2.56, 89000]
|
||||||
|
msg = umsgpack.dumps(obj) # msg is a bytes object
|
||||||
|
```
|
||||||
|
Retrieval of the object is as follows:
|
||||||
|
```python
|
||||||
|
import umsgpack
|
||||||
|
# Retrieve the message msg
|
||||||
|
obj = umsgpack.loads(msg)
|
||||||
|
```
|
||||||
|
An ingenious feature of the standard is its extensibility. This can be used to
|
||||||
|
add support for additional Python types or user defined classes. This example
|
||||||
|
shows `complex` data being supported as if it were a native type:
|
||||||
|
```python
|
||||||
|
import umsgpack
|
||||||
|
from umsgpack_ext import mpext
|
||||||
|
with open('data', 'wb') as f:
|
||||||
|
    umsgpack.dump(mpext(1 + 4j), f) # mpext() handles extension type
|
||||||
|
```
|
||||||
|
Reading back:
|
||||||
|
```python
|
||||||
|
import umsgpack
|
||||||
|
import umsgpack_ext # Decoder only needs access to this module
|
||||||
|
with open('data', 'rb') as f:
|
||||||
|
    z = umsgpack.load(f)
|
||||||
|
print(z) # z is complex
|
||||||
|
```
|
||||||
|
Please see [this repo](https://github.com/peterhinch/micropython-msgpack). The
|
||||||
|
docs include references to the standard and to other implementations. The repo
|
||||||
|
includes an asynchronous receiver which enables incoming messages to be decoded
|
||||||
|
as they arrive while allowing other tasks to run concurrently.
|
||||||
|
|
||||||
|
# 5. Protocol Buffers
|
||||||
|
|
||||||
This is a [Google standard](https://developers.google.com/protocol-buffers/)
|
This is a [Google standard](https://developers.google.com/protocol-buffers/)
|
||||||
described in [this Wikipedia article](https://en.wikipedia.org/wiki/Protocol_Buffers).
|
described in [this Wikipedia article](https://en.wikipedia.org/wiki/Protocol_Buffers).
|
||||||
|
@ -230,7 +317,7 @@ inner `tuple` are strings, with element 0 defining the field's key. Subsequent
|
||||||
elements define the field's data type; in most cases the data type is defined
|
elements define the field's data type; in most cases the data type is defined
|
||||||
by a single string.
|
by a single string.
|
||||||
|
|
||||||
## 4.1 Installation
|
## 5.1 Installation
|
||||||
|
|
||||||
The library comprises a single file `minipb.py`. It has a dependency, the
|
The library comprises a single file `minipb.py`. It has a dependency, the
|
||||||
`logging` module `logging.py` which may be found in
|
`logging` module `logging.py` which may be found in
|
||||||
|
@ -238,7 +325,7 @@ The library comprises a single file `minipb.py`. It has a dependency, the
|
||||||
On RAM constrained platforms `minipb.py` may be cross-compiled or frozen as
|
On RAM constrained platforms `minipb.py` may be cross-compiled or frozen as
|
||||||
bytecode for even lower RAM consumption.
|
bytecode for even lower RAM consumption.
|
||||||
|
|
||||||
## 4.2 Data types
|
## 5.2 Data types
|
||||||
|
|
||||||
These are listed in
|
These are listed in
|
||||||
[the docs](https://github.com/dogtopus/minipb/wiki/Schema-Representations).
|
[the docs](https://github.com/dogtopus/minipb/wiki/Schema-Representations).
|
||||||
|
@ -256,14 +343,14 @@ a subset may be used which maps onto Python data types:
|
||||||
other platforms with special firmware builds.
|
other platforms with special firmware builds.
|
||||||
7. 'X' An empty field.
|
7. 'X' An empty field.
|
||||||
|
|
||||||
## 4.2.1 Required and Optional fields
|
## 5.2.1 Required and Optional fields
|
||||||
|
|
||||||
If a field is prefixed with `*` it is a `required` field, otherwise it is
|
If a field is prefixed with `*` it is a `required` field, otherwise it is
|
||||||
optional. The field must still exist in the data: the only difference is that
|
optional. The field must still exist in the data: the only difference is that
|
||||||
a `required` field cannot be set to `None`. Optional fields can be useful,
|
a `required` field cannot be set to `None`. Optional fields can be useful,
|
||||||
notably for boolean types which can then represent three states.
|
notably for boolean types which can then represent three states.
|
||||||
|
|
||||||
## 4.3 Application design
|
## 5.3 Application design
|
||||||
|
|
||||||
The following is a minimal example which can be pasted at the REPL:
|
The following is a minimal example which can be pasted at the REPL:
|
||||||
```python
|
```python
|
||||||
|
@ -287,7 +374,7 @@ being saved to a binary file, the file will need an index. Where data is to
|
||||||
be transmitted over an interface each string should be prepended with a fixed
|
be transmitted over an interface each string should be prepended with a fixed
|
||||||
length "size" field. The following example illustrates this.
|
length "size" field. The following example illustrates this.
|
||||||
|
|
||||||
## 4.4 Transmitter/Receiver example
|
## 5.4 Transmitter/Receiver example
|
||||||
|
|
||||||
These examples can't be cut and pasted at the REPL as they assume `send(n)` and
|
These examples can't be cut and pasted at the REPL as they assume `send(n)` and
|
||||||
`receive(n)` functions which access the interface.
|
`receive(n)` functions which access the interface.
|
||||||
|
@ -329,7 +416,7 @@ while True:
|
||||||
# Do something with the received dict
|
# Do something with the received dict
|
||||||
```
|
```
|
||||||
|
|
||||||
## 4.5 Repeating fields
|
## 5.5 Repeating fields
|
||||||
|
|
||||||
This feature enables variable length lists to be encoded. List elements must
|
This feature enables variable length lists to be encoded. List elements must
|
||||||
all be of the same (declared) data type. In this example the `value` and `txt`
|
all be of the same (declared) data type. In this example the `value` and `txt`
|
||||||
|
@ -357,13 +444,13 @@ tx = w.encode(data)
|
||||||
rx = w.decode(tx)
|
rx = w.decode(tx)
|
||||||
print(rx)
|
print(rx)
|
||||||
```
|
```
|
||||||
### 4.5.1 Packed repeating fields
|
### 5.5.1 Packed repeating fields
|
||||||
|
|
||||||
The author of `minipb` [does not recommend](https://github.com/dogtopus/minipb/issues/6)
|
The author of `minipb` [does not recommend](https://github.com/dogtopus/minipb/issues/6)
|
||||||
their use. Their purpose appears to be in the context of fixed-length fields
|
their use. Their purpose appears to be in the context of fixed-length fields
|
||||||
which are outside the scope of pure Python programming.
|
which are outside the scope of pure Python programming.
|
||||||
|
|
||||||
## 4.6 Message fields (nested dicts)
|
## 5.6 Message fields (nested dicts)
|
||||||
|
|
||||||
The concept of message fields is a Protocol Buffer notion. In MicroPython
|
The concept of message fields is a Protocol Buffer notion. In MicroPython
|
||||||
terminology a message field contains a `dict` whose contents are defined by
|
terminology a message field contains a `dict` whose contents are defined by
|
||||||
|
@ -404,7 +491,7 @@ print(rx)
|
||||||
print(rx['nested'][2]['str2']) # Access inner dict instances
|
print(rx['nested'][2]['str2']) # Access inner dict instances
|
||||||
```
|
```
|
||||||
|
|
||||||
### 4.6.1 Recursion
|
### 5.6.1 Recursion
|
||||||
|
|
||||||
This is surely overkill in most MicroPython applications, but for the sake of
|
This is surely overkill in most MicroPython applications, but for the sake of
|
||||||
completeness message fields can be recursive:
|
completeness message fields can be recursive:
|
||||||
|
|