Added address encoding definition.

2019-12-08 12:22:25 -08:00 · 2019-12-08 12:22:25 -08:00 · 673eefd8c7
commit 673eefd8c7
--- a/M17-Protocol.md
+++ b/M17-Protocol.md
@@ -41,8 +41,8 @@ The M17 Packet format borrows heavily from Ethernet, except the Preamble and Syn
  * **TODO** Something long and arbitrary, like e or Pi.
 * Packet Indicator: 1 byte
  * A value to indicate this is a Packet, not a Stream.
-* Destination address: 6 bytes
-* Source address: 6 bytes
+* Destination address: 6 bytes  (See below for address encoding.)
+* Source address: 6 bytes  (See below for address encoding.)
 * Length: 2 bytes (Number of bytes in payload)
 * Payload: N bytes
 * CRC: 4 bytes (32-bit CRC of the entire frame, not including the preamble or sync byte. Includes Destination, Source, Length, and Payload.)
@@ -89,3 +89,119 @@ CODEC2 3200 needs to send a 64 bit, 4 byte, CODEC frame every 20ms.  Stream fram

 ### File Transfer Stream

+# Address Encoding
+M17 addresses are 48 bits, 6 bytes long.  Callsigns (and other addresses) are encoded into these 6 bytes in the following ways:
+* An address of 0 is invalid.
+  * **TODO** Do we want to use zero as a flag value of some kind?
+* Address values between 1 and 262143999999999 ((40^9)-1) hold up to 9 characters of text, encoded using base40 as described below.
+* Address values between 262144000000000 (40^9) and 281474976710654 ((2^48)-2) are invalid.
+  * **TODO** Can we think of something to do with these 19330976710655 addresses?
+* An address of 0xFFFFFFFFFFFF is a broadcast.  All stations should receive and listen to this message.
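
The ranges listed above can be sketched as a small classifier. This is only an illustration: the enum, macro, and function names are hypothetical and not part of the spec, and it assumes the broadcast value 0xFFFFFFFFFFFF takes precedence over the "invalid" range that surrounds it.

```c
#include <stdint.h>

/* Hypothetical names; the spec text only defines the numeric ranges. */
enum m17_addr_kind {
    M17_ADDR_INVALID,   /* 0, or anything at or above 40^9 except broadcast */
    M17_ADDR_CALLSIGN,  /* 1 .. (40^9)-1, base40 encoded text */
    M17_ADDR_BROADCAST  /* 0xFFFFFFFFFFFF */
};

#define M17_BASE40_LIMIT    262144000000000ULL  /* 40^9 */
#define M17_BROADCAST_VALUE 0xFFFFFFFFFFFFULL   /* (2^48)-1 */

enum m17_addr_kind m17_classify_address(uint64_t addr) {
    /* Assumption: broadcast is checked first so it is not reported as invalid. */
    if (addr == M17_BROADCAST_VALUE)
        return M17_ADDR_BROADCAST;
    if (addr == 0 || addr >= M17_BASE40_LIMIT)
        return M17_ADDR_INVALID;
    return M17_ADDR_CALLSIGN;
}
```

Values that do not fit in 48 bits at all are also reported as invalid here, which seems the least surprising choice for a caller holding the address in a `uint64_t`.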
+
+## Callsign Encoding: base40
+9 characters from an alphabet of 40 possible characters can be encoded into 48 bits, 6 bytes.  The base40 alphabet is:
+* 0: An invalid character; something not in the alphabet was provided.
+* 1-26: 'A' through 'Z'
+* 27-36: '0' through '9'
+* 37: '-'
+* 38: '/'
+* 39: TBD
+
+Encoding is little endian.  That is, the rightmost characters in the encoded string become the most significant bits of the resulting value, and the leftmost character is the least significant base40 digit.
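
To make the ordering concrete, the sketch below computes the value from per-character codes using explicit positional weights: the first (leftmost) character has weight 40^0, the next 40^1, and so on. The function name `base40_value` is hypothetical, and the input is assumed to be at most 9 codes, each already mapped into the range 0-39.

```c
#include <stddef.h>
#include <stdint.h>

/* codes[0] is the leftmost character's base40 code, codes[n-1] the rightmost.
   Assumes n <= 9 so the result stays below 40^9. */
uint64_t base40_value(const unsigned char *codes, size_t n) {
    uint64_t value = 0;
    uint64_t weight = 1;  /* 40^i for position i */
    for (size_t i = 0; i < n; i++) {
        value += codes[i] * weight;
        weight *= 40;
    }
    return value;
}
```

With 'A' = 1 and 'B' = 2, the string "AB" works out to 1 + 2*40 = 81, while "BA" works out to 2 + 1*40 = 42.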
+
+### Example code: encode_callsign_base40()
+```
+#include <stdint.h>
+#include <string.h>
+
+// Encodes up to 9 characters; longer input does not fit in 48 bits.
+uint64_t encode_callsign_base40(const char *callsign) {
+   uint64_t encoded = 0;
+   // Walk from the last character to the first so the rightmost character ends up most significant.
+   for (size_t i = strlen(callsign); i > 0; i--) {
+      char c = callsign[i - 1];
+      encoded *= 40;
+      // If speed is more important than code space, you can replace this with a lookup into a 256 byte array.
+      if (c >= 'A' && c <= 'Z')  // 1-26
+         encoded += c - 'A' + 1;
+      else if (c >= '0' && c <= '9')  // 27-36
+         encoded += c - '0' + 27;
+      else if (c == '-')  // 37
+         encoded += 37;
+      // These are just place holders. If other characters make more sense, change these.
+      // Be sure to change them in the decode array below too.
+      else if (c == '/')  // 38
+         encoded += 38;
+      else if (c == '.')  // 39
+         encoded += 39;
+      // Any other character is invalid and is represented by 0, so nothing is added.
+   }
+   return encoded;
+}
+```
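
The comment in the encoder mentions trading code space for speed with a 256 byte lookup array. A minimal sketch of that variant follows; the names `base40_table`, `base40_table_init`, and `encode_callsign_base40_lut` are hypothetical, and '.' is kept as the placeholder for value 39 only because the example code uses it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zero-initialised, so every character outside the alphabet maps to 0 (invalid). */
static uint8_t base40_table[256];

/* Must be called once before encoding. */
void base40_table_init(void) {
    static const char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-/.";
    for (size_t i = 0; alphabet[i] != '\0'; i++)
        base40_table[(unsigned char)alphabet[i]] = (uint8_t)(i + 1);
}

/* Same ordering as the branch-based encoder: the rightmost character
   ends up in the most significant position. Assumes at most 9 characters. */
uint64_t encode_callsign_base40_lut(const char *callsign) {
    uint64_t encoded = 0;
    for (size_t i = strlen(callsign); i > 0; i--)
        encoded = encoded * 40 + base40_table[(unsigned char)callsign[i - 1]];
    return encoded;
}
```

Building the table at start-up from the alphabet string keeps the encoder and the decode string in step if the placeholder characters change.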
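+### Example code: decode_callsign_base40()
+```
+#include <stdint.h>
+
+// callsign must have room for 10 bytes: up to 9 characters plus the terminating NUL.
+char *decode_callsign_base40(uint64_t encoded, char *callsign) {
+   if (encoded >= 262144000000000) {   // 40^9
+      // Out of the base40 range: return an empty string.
+      *callsign = 0;
+      return callsign;
+   }
+
+   char *p = callsign;
+   // The least significant base40 digit is the first (leftmost) character.
+   for (; encoded > 0; p++) {
+      // Index 0 ('x') stands in for an invalid character.
+      *p = "xABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-/."[encoded % 40];
+      encoded /= 40;
+   }
+   *p = 0;
+   return callsign;
+}
+```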
+
+### Why base40?
+The longest commonly assigned callsign from the FCC is 6 characters. The minimum alphabet of A-Z, 0-9, and a "done" character means the most compact encoding of an American callsign could be: log2(37^6)=31.26 bits, or 4 bytes.
+
+But I'm not convinced that 6 characters is a global maximum.  Also, we want to extend our callsigns (see below).  So we want more than 6 characters.  How many bits do we need to represent more characters?
+* 7 characters: log2(37^7)=36.47 bits, 5 bytes
+* 8 characters: log2(37^8)=41.68 bits, 6 bytes
+* 9 characters: log2(37^9)=46.89 bits, 6 bytes
+* 10 characters: log2(37^10)=52.09 bits, 7 bytes.
+
+Of these, 9 characters into 6 bytes seems to be the sweet spot.  Given 9 characters, how large can we make the alphabet without using more than 6 bytes?
+* 37 alphabet: log2(37^9)=46.89 bits, 6 bytes
+* 38 alphabet: log2(38^9)=47.23 bits, 6 bytes
+* 39 alphabet: log2(39^9)=47.57 bits, 6 bytes
+* 40 alphabet: log2(40^9)=47.90 bits, 6 bytes
+* 41 alphabet: log2(41^9)=48.22 bits, 7 bytes
+
+Given this, 9 characters from an alphabet of 40 possible characters makes maximal use of 6 bytes.
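
The same conclusion can be checked with integer arithmetic instead of logarithms: an alphabet of size k fits 9 characters into 48 bits exactly when k^9 <= 2^48. The helper names below are hypothetical, and the sketch assumes inputs small enough that the power does not overflow 64 bits (true for the 9 character, 48 bit case, since 41^9 is well below 2^64).

```c
#include <stdint.h>

/* base^exp by repeated multiplication; no overflow checking. */
uint64_t ipow_u64(uint64_t base, unsigned exp) {
    uint64_t result = 1;
    while (exp-- > 0)
        result *= base;
    return result;
}

/* Largest alphabet size whose `chars`-character strings all fit in `bits` bits.
   Assumes bits < 64 and that (size+1)^chars never overflows uint64_t. */
unsigned largest_alphabet(unsigned chars, unsigned bits) {
    uint64_t limit = (uint64_t)1 << bits;
    unsigned size = 1;
    while (ipow_u64(size + 1, chars) <= limit)
        size++;
    return size;
}
```

Calling `largest_alphabet(9, 48)` returns 40, matching the table above: 40^9 = 262144000000000 fits below 2^48 = 281474976710656, while 41^9 does not.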
+
+## Callsign Formats
+Government issued callsigns should be able to encode directly with no changes.
+
+### Multiple Stations
+To allow for multiple stations by the same operator, we borrow the use of the '-' character from AX.25.  A callsign such as "KR6ZY-1" is considered a different station than "KR6ZY-2" or even "KR6ZY", but it is understood that these all belong to the same operator, "KR6ZY".
+
+### Temporary Modifiers
+Similarly, suffixes are often added to callsigns to indicate temporary changes of status, such as "KR6ZY/M" for a mobile station, or "KR6ZY/AE" to signify that I have Amateur Extra operating privileges even though the FCC database may not yet be updated.  So the '/' is included in the base40 alphabet.
+
+The difference between '-' and '/' is that callsigns that differ by a '-' suffix are considered different stations, but those that differ by a '/' suffix are NOT.  The '/' suffix is considered to be a temporary modification to the same station.  **TODO** I'm not sure what impact this actually has.
+
+### Interoperability
+It may be desirable to bridge information between M17 and other networks.  The 9 character base40 encoding allows for this:
+
+**TODO** Define more interoperability standards here.  System Fusion? P25? IRLP? AllStar?
+
+#### DMR
+DMR unfortunately doesn't have a guaranteed single namespace.  Individual IDs are reasonably well recognized to be managed by https://www.radioid.net/database/search#! but Talk Groups are much less well managed.  Talk Group XYZ on Brandmeister may be (and often is) different than Talk Group XYZ on a private cBridge system.
+
+* DMR IDs are encoded as: `D<number>`, e.g. `D3106728` for KR6ZY
+* DMR Talk Groups are encoded by their network.  Currently, the following networks are defined:
+  * Brandmeister: `BM<number>`, e.g. `BM31075`
+  * More networks to be defined here.
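
As a rough sketch of the mapping above, the helpers below build the `D<number>` and `BM<number>` text that would then be base40 encoded. The function names and the choice to reject anything longer than 9 characters are assumptions for illustration, not part of the spec.

```c
#include <stdint.h>
#include <stdio.h>

/* out must hold 10 bytes: up to 9 characters plus the terminating NUL.
   Returns 0 on success, -1 if the text would exceed 9 characters. */
int m17_format_dmr_id(uint32_t dmr_id, char out[10]) {
    int n = snprintf(out, 10, "D%lu", (unsigned long)dmr_id);
    return (n > 0 && n <= 9) ? 0 : -1;
}

/* Same contract, for a Brandmeister talk group number. */
int m17_format_bm_talkgroup(uint32_t talkgroup, char out[10]) {
    int n = snprintf(out, 10, "BM%lu", (unsigned long)talkgroup);
    return (n > 0 && n <= 9) ? 0 : -1;
}
```

For example, `m17_format_dmr_id(3106728, buf)` leaves `D3106728` in `buf`, and `m17_format_bm_talkgroup(31075, buf)` leaves `BM31075`. A number too large to fit after the prefix makes the helper return -1, in which case the buffer holds only a truncated string and should not be encoded.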
+
+#### D-Star
+D-Star reflectors have well defined names of the form REFxxxY, which are encoded directly into base40.
+
+**TODO** Individuals?  Just callsigns?
+
+
+#### Interoperability Challenges
+* We'll need to provide a source ID on the other network.  Not sure how to do that, and it'll probably be unique for each network we want to interoperate with.  Maybe write the DMR/BM gateway to automatically look up a callsign in the DMR database and map it to a DMR ID?  Just thinking out loud.
+* We will have to transcode CODEC2 to whatever the other network uses (pretty much AMBE of one flavor or another).  I'd be curious to see how that sounds.