Serialisation: add reference to MessagePack.

2021-07-29 11:53:52 +01:00 · 2021-07-29 11:53:52 +01:00 · a0cef58a8b
commit a0cef58a8b
--- a/SERIALISATION.md
+++ b/SERIALISATION.md
@ -16,32 +16,79 @@ I2C or SPI. All these require the data to be presented as linear sequences of
 bytes. The problem is how to convert an arbitrary Python object to such a
 sequence, and how subsequently to restore the object.
-I am aware of four ways of achieving this, each with their own advantages and
+There are numerous standards for achieving this, five of which are readily
-drawbacks. In two cases the encoded strings comprise ASCII characters, in the
+available to MicroPython. Each has its own advantages and drawbacks. In two
-other two they are binary (bytes can take all possible values).
+cases the encoded strings aim to be human readable and comprise ASCII
 characters. In the others they comprise binary `bytes` objects where bytes can
 take all possible values. The following are the formats with MicroPython
 support:
 1. ujson (ASCII, official)
 2. pickle (ASCII, official)
 3. ustruct (binary, official)
- 4. protobuf [binary, unofficial](https://github.com/dogtopus/minipb)
+ 4. MessagePack [binary, unofficial](https://github.com/peterhinch/micropython-msgpack)
 5. protobuf [binary, unofficial](https://github.com/dogtopus/minipb)
-The first two are self-describing: the format includes a definition of its
+The `ujson` and `pickle` formats produce human-readable byte sequences. These
 aid debugging. The use of ASCII data means that a delimiter can be used to
 identify the end of a message. This is because it is possible to guarantee that
 the delimiter will never occur within a message. A delimiter cannot be used
 with binary formats because a message byte can take all possible values
 including that of the delimiter. The drawback of ASCII formats is inefficiency:
 the byte sequences are relatively long.
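The delimiter guarantee can be illustrated with a minimal sketch. It is written against CPython's `json` module, on the assumption that `ujson` behaves the same way for `dumps` and `loads`. The names `DELIMITER`, `encode_message` and `decode_messages` are hypothetical and chosen only for this illustration.

```python
import json

DELIMITER = b"\n"  # Assumed delimiter: a byte the encoder never emits


def encode_message(obj):
    # json.dumps escapes control characters inside strings, so a raw
    # newline byte cannot occur within the encoded message.
    return json.dumps(obj).encode() + DELIMITER


def decode_messages(buf):
    # Split a buffer of complete messages on the delimiter and decode each.
    return [json.loads(line) for line in buf.split(DELIMITER) if line]


# The first object contains a newline, yet only the two delimiters
# appear as raw newline bytes in the stream.
stream = encode_message({"text": "line one\nline two"}) + encode_message([1, 2, 3])
messages = decode_messages(stream)
```

The embedded newline survives the round trip because it travels as the two-character escape sequence rather than as the delimiter byte.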
 Numbers 1, 2 and 4 are self-describing: the format includes a definition of its
 structure. This means that the decoding process can re-create the object in the
 absence of information on its structure, which may therefore change at runtime.
-Further, `ujson` and `pickle` produce human-readable byte sequences which aid
+Self-describing formats are inevitably variable length. This means that the
-debugging. The drawback is inefficiency: the byte sequences are relatively
+receiving process must be provided with a means to determine when a complete
-long. They are variable length. This means that the receiving process must be
+message has been received. In the case of ASCII formats a delimiter may be used
-provided with a means to determine when a complete string has been received.
+but in the case of `MessagePack` this presents something of a challenge.
-The `ustruct` and `protobuf` solutions are binary formats: the byte sequences
+The `ustruct` format is binary: the byte sequence comprises binary data which
-comprise binary data which is neither human readable nor self-describing.
+is neither human readable nor self-describing. The problem of message framing
-Binary sequences require that the receiver has information on their structure
+is solved by hard coding a fixed message structure and length which is known to
-in order to decode them. In the case of `ustruct` sequences are of a fixed
+transmitter and receiver. In simple cases of fixed format data, `ustruct`
-length which can be determined from the structure. `protobuf` sequences are
+provides a simple, efficient solution.
 variable length requiring handling discussed below.
-The benefit of binary sequences is efficiency: sequence length is closer to the
+In `protobuf` and `MessagePack` messages are variable length; both can handle
-information-theoretic minimum, compared to the ASCII options.
+data whose length varies at runtime. `MessagePack` also allows the message
 structure to change at runtime. It is also extensible to enable the efficient
 coding of additional Python types or instances of user defined classes.
 The `protobuf` standard requires transmitter and receiver to share a schema
 which defines the message structure. Message length may change at runtime, but
 structure may not.
 ## 1.1 Transmission over unreliable links
 Consider a system where a transmitter periodically sends messages to a receiver
 over a communication link. An aspect of the message framing problem arises if
 that link is unreliable, meaning that bytes may be lost or corrupted in
 transit. In the case of ASCII formats with a delimiter the receiver, once it
 has detected the problem, can discard characters until the delimiter is
 received and then wait for a complete message.
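A rough sketch of this recovery, assuming JSON messages delimited by `b"\n"` and using CPython's `json` module. The function name `decode_stream` is hypothetical. It only detects damage that leaves a message unparseable; corruption that happens to yield valid JSON would need a checksum, which is not shown.

```python
import json


def decode_stream(buf, delimiter=b"\n"):
    # Return the objects recovered from buf, skipping damaged messages.
    good = []
    for line in buf.split(delimiter):
        if not line:
            continue
        try:
            good.append(json.loads(line))
        except ValueError:
            # Corrupt or truncated message: everything up to the next
            # delimiter is discarded and decoding resumes after it.
            pass
    return good


# The second message lost its closing brace in transit.
received = b'{"n": 1}\n{"n": 2\n{"n": 3}\n'
recovered = decode_stream(received)
```

Here the damaged second message is dropped while the third is still decoded, because the delimiter marks an unambiguous restart point.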
 In the case of binary formats it is generally impossible to re-synchronise to a
 continuous stream of data. In the case of regular bursts of data a timeout can
 be used. Otherwise "out of band" signalling is required where the receiver
 signals the transmitter to request retransmission.
 ## 1.2 Concurrency
 In `uasyncio` systems the transmitter presents no problem. A message is created
 using synchronous code, then transmitted using asynchronous code typically with
 a `StreamWriter`. In the case of ASCII protocols a delimiter, usually `b"\n"`,
 is appended.
 In the case of ASCII protocols the receiver can use `StreamReader.readline()`
 to await a complete message.
 `ustruct` also presents a simple case in that the number of expected bytes is
 known to the receiver which simply awaits that number.
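The fixed-length case might look like the following sketch. It uses CPython's `asyncio` and `struct` so that it can run on a PC, and assumes the MicroPython `StreamReader` provides an equivalent `readexactly`. The format string `FMT` and the function names are hypothetical.

```python
import asyncio
import struct

FMT = "<hhl"  # Hypothetical record: two int16 and one int32, little-endian
SIZE = struct.calcsize(FMT)  # Message length, known to both ends


async def receive_one(reader):
    # Await exactly one message's worth of bytes, then decode it.
    data = await reader.readexactly(SIZE)
    return struct.unpack(FMT, data)


async def demo():
    # Stand-in for a real interface: feed one packed record to a reader.
    reader = asyncio.StreamReader()
    reader.feed_data(struct.pack(FMT, 11, 22, 33))
    return await receive_one(reader)


result = asyncio.run(demo())
```

Other tasks can run while `receive_one` waits, since the coroutine is suspended until all `SIZE` bytes have arrived.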
 The variable length binary protocols present a difficulty in that the message
 length is unknown in advance. A solution is available for `MessagePack`.
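One general way to frame variable length binary messages is to prepend a fixed-length size field. The sketch below illustrates that idea only; it is not a description of how the `MessagePack` library's receiver works. The two-byte header format and the function names are assumptions made for the example.

```python
import struct

HEADER = "<H"  # Assumed 2-byte little-endian length field
HEADER_SIZE = struct.calcsize(HEADER)


def add_length_prefix(payload):
    # Prepend the payload length so the receiver knows how much to read.
    return struct.pack(HEADER, len(payload)) + payload


def split_messages(buf):
    # Return (complete payloads, leftover bytes of an incomplete message).
    out = []
    while len(buf) >= HEADER_SIZE:
        (n,) = struct.unpack(HEADER, buf[:HEADER_SIZE])
        if len(buf) < HEADER_SIZE + n:
            break  # Remainder of this message has not arrived yet
        out.append(buf[HEADER_SIZE:HEADER_SIZE + n])
        buf = buf[HEADER_SIZE + n:]
    return out, buf


# Two complete messages followed by the first four bytes of a third.
stream = (
    add_length_prefix(b"\x00\x01\n\xff")
    + add_length_prefix(b"abc")
    + add_length_prefix(b"partial")[:4]
)
payloads, rest = split_messages(stream)
```

The payload may contain any byte value, including `b"\n"`, because the receiver relies on the size field and not on a delimiter.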
 # 2. ujson and pickle
@ -198,7 +245,47 @@ Output:
 (11, 22, b'the quick brown fox jumps over')
 ```
-# 4. Protocol Buffers
+# 4. MessagePack
 Of the binary formats this is the easiest to use and can be a "drop in"
 replacement for `ujson` as it supports the same four methods `dump`, `dumps`,
 `load` and `loads`. An application might initially be developed with `ujson`,
 the protocol being changed to `MessagePack` later. Creation of a `MessagePack`
 string can be done with:
 ```python
 import umsgpack
 obj = [1.23, 2.56, 89000]
 msg = umsgpack.dumps(obj)  # msg is a bytes object 
 ```
 Retrieval of the object is as follows:
 ```python
 import umsgpack
 # Retrieve the message msg
 obj = umsgpack.loads(msg)  # loads() decodes; dumps() would re-encode msg
 ```
 An ingenious feature of the standard is its extensibility. This can be used to
 add support for additional Python types or user defined classes. This example
 shows `complex` data being supported as if it were a native type:
 ```python
 import umsgpack
 from umsgpack_ext import mpext
 with open('data', 'wb') as f:
     umsgpack.dump(mpext(1 + 4j), f)  # mpext() handles extension type
 ```
 Reading back:
 ```python
 import umsgpack
 import umsgpack_ext  # Decoder only needs access to this module
 with open('data', 'rb') as f:
     z = umsgpack.load(f)
 print(z)  # z is complex
 ```
 Please see [this repo](https://github.com/peterhinch/micropython-msgpack). The
 docs include references to the standard and to other implementations. The repo
 includes an asynchronous receiver which enables incoming messages to be decoded
 as they arrive while allowing other tasks to run concurrently.
 # 5. Protocol Buffers
 This is a [Google standard](https://developers.google.com/protocol-buffers/)
 described in [this Wikipedia article](https://en.wikipedia.org/wiki/Protocol_Buffers).
@ -230,7 +317,7 @@ inner `tuple` are strings, with element 0 defining the field's key. Subsequent
 elements define the field's data type; in most cases the data type is defined
 by a single string.
-## 4.1 Installation
+## 5.1 Installation
 The library comprises a single file `minipb.py`. It has a dependency, the
 `logging` module `logging.py` which may be found in
@ -238,7 +325,7 @@ The library comprises a single file `minipb.py`. It has a dependency, the
 On RAM constrained platforms `minipb.py` may be cross-compiled or frozen as
 bytecode for even lower RAM consumption.
-## 4.2 Data types
+## 5.2 Data types
 These are listed in
 [the docs](https://github.com/dogtopus/minipb/wiki/Schema-Representations).
@ -256,14 +343,14 @@ a subset may be used which maps onto Python data types:
 other platforms with special firmware builds.
 7. 'X' An empty field.
-## 4.2.1 Required and Optional fields
+## 5.2.1 Required and Optional fields
 If a field is prefixed with `*` it is a `required` field, otherwise it is
 optional. The field must still exist in the data: the only difference is that
 a `required` field cannot be set to `None`. Optional fields can be useful,
 notably for boolean types which can then represent three states.
-## 4.3 Application design
+## 5.3 Application design
 The following is a minimal example which can be pasted at the REPL:
 ```python
@ -287,7 +374,7 @@ being saved to a binary file, the file will need an index. Where data is to
 be transmitted over an interface each string should be prepended with a fixed
 length "size" field. The following example illustrates this.
-## 4.4 Transmitter/Receiver example
+## 5.4 Transmitter/Receiver example
 These examples can't be cut and pasted at the REPL as they assume `send(n)` and
 `receive(n)` functions which access the interface.
@ -329,7 +416,7 @@ while True:
    # Do something with the received dict
 ```
-## 4.5 Repeating fields
+## 5.5 Repeating fields
 This feature enables variable length lists to be encoded. List elements must
 all be of the same (declared) data type. In this example the `value` and `txt`
@ -357,13 +444,13 @@ tx = w.encode(data)
 rx = w.decode(tx)
 print(rx)
 ```
-### 4.5.1 Packed repeating fields
+### 5.5.1 Packed repeating fields
 The author of `minipb` [does not recommend](https://github.com/dogtopus/minipb/issues/6)
 their use. Their purpose appears to be in the context of fixed-length fields
 which are outside the scope of pure Python programming.
-## 4.6 Message fields (nested dicts)
+## 5.6 Message fields (nested dicts)
 The concept of message fields is a Protocol Buffer notion. In MicroPython
 terminology a message field contains a `dict` whose contents are defined by
@ -404,7 +491,7 @@ print(rx)
 print(rx['nested'][2]['str2'])  # Access inner dict instances
 ```
-### 4.6.1 Recursion
+### 5.6.1 Recursion
 This is surely overkill in most MicroPython applications, but for the sake of
 completeness message fields can be recursive: