Address Encoding Appendix overhaul, typo fixes

2022-06-19 08:01:00 -04:00 · 2022-06-19 08:01:00 -04:00 · 4869b15284
commit 4869b15284
--- a/pages/02.part-1/03.physical-layer/docs.md
+++ b/pages/02.part-1/03.physical-layer/docs.md
@@ -156,7 +156,7 @@ On the receive side, symbols are converted to randomized payload bits.  Each ran

 ### End of Transmission marker (EoT)

-Every transmission ends with a distinct symbol stream, which shall consist of of 40 ms (192 symbols) of a repeating 0x55 0x5D (+3, +3, +3, +3, +3, +3, -3, +3) pattern.
+Every transmission ends with a distinct symbol stream, which shall consist of 40 ms (192 symbols) of a repeating 0x55 0x5D (+3, +3, +3, +3, +3, +3, -3, +3) pattern.

 ### Carrier-sense Multiple Access (CSMA)

--- a/pages/02.part-1/04.data-link-layer/docs.md
+++ b/pages/02.part-1/04.data-link-layer/docs.md
@@ -81,7 +81,7 @@ During a [Transmission](../physical-layer#transmission), only one LSF Sync Burst

 BERT Sync Bursts, if present, may only follow the Preamble or other BERT frames.

-Multiple Stream and Packet Sync Bursts may be present during a Transmission.
+Multiple Stream or Packet Sync Bursts may be present during a Transmission, depending on the mode.

 <center><span style="font-weight:bold">Table 2</span> Frame Specific Sync Bursts</center>
 Frame Type | Preamble | Sync Burst Bytes | Sync Burst Symbols
@@ -144,7 +144,7 @@ Message                  | CRC Output
 (empty string)           | 0xFFFF
 ASCII string "A"         | 0x206E
 ASCII string "123456789" | 0x772B
-Bytes 0x00 to 0xFF       | 0x1c31
+Bytes 0x00 to 0xFF       | 0x1C31

 #### LSF Contents ECC/FEC

--- a/pages/04.appendix/01.address-encoding/docs.md
+++ b/pages/04.appendix/01.address-encoding/docs.md
@@ -5,138 +5,163 @@ taxonomy:
        - docs
 ---

-M17 uses 48 bits (6 bytes) long addresses. Callsigns (and other addresses) are encoded into these 6 bytes in the following ways:
+M17 uses 48-bit (6-byte) addresses. Callsigns and special purpose addresses are encoded into these 6 bytes in the following ways:

-* An address of 0 is invalid.
-* Address values between 1 and 262143999999999 (which is 409−1), up to 9 characters of text are encoded using base40, described below.
-* Address values between 262144000000000 (409) and 281474976710654 (248−2) are invalid
-* An address of 0xFFFFFFFFFFFF is a broadcast. All stations should receive and listen to this message.
+* An address of 0 is reserved for future use.
+* Address values between 1 and 262143999999999 ($40^{9}−1$) contain up to 9 characters of text encoded using base-40 as described below.
+* Address values between 262144000000000 ($40^{9}$) and 281474976710654 ($2^{48}−2$) are reserved for future use.
+* An address of 0xFFFFFFFFFFFF is a broadcast.

-##### Address Scheme
+### Address Scheme

-Address Range                   | Category  | Number of Addresses | Remarks
+Address Range (base-16)         | Category  | Number of Addresses | Remarks
 -------------                   | --------  | ------------------- | -------
 0x000000000000                  | RESERVED  | 1                   | For future use
-0x000000000001 - 0xee6b27ffffff | Unit ID   | 262143999999999     | 
-0xee6b28000000 - 0xfffffffffffe | RESERVED  | 19330976710655      | For future use
-0xffffffffffff                  | Broadcast | 1                   | Valid only for destination
+0x000000000001 - 0xEE6B27FFFFFF | Unit ID   | 262143999999999     | 
+0xEE6B28000000 - 0xFFFFFFFFFFFE | RESERVED  | 19330976710655      | For future use
+0xFFFFFFFFFFFF                  | Broadcast | 1                   | Valid only for destination

-## Callsign Encoding: base40
+### Callsign Encoding: base-40

-9 characters from an alphabet of 40 possible characters can be encoded into 48 bits, 6 bytes. The base40 alphabet is:
+9 characters from an alphabet of 40 possible characters can be encoded into 48 bits (6 bytes). The base-40 alphabet is:

-* 0: A space. Invalid characters will be replaced with this.
-* 1-26: “A” through “Z”
-* 27-36: “0” through “9”
-* 37: “-” (hyphen)
-* 38: “/” (slash)
-* 39: “.” (dot)
+Value (base-10) | Character | Note
+--------------- | --------- | ----
+0               | ' '       | A space, ASCII 32 (0x20). Invalid characters will be replaced with this.
+1 - 26          | 'A' - 'Z' | Upper case letters, ASCII 65 - 90 (0x41 - 0x5A).
+27 - 36         | '0' - '9' | Numerals, ASCII 48 - 57 (0x30 - 0x39).
+37              | '-'       | Hyphen, ASCII 45 (0x2D).
+38              | '/'       | Forward Slash, ASCII 47 (0x2F).
+39              | '.'       | Dot, ASCII 46 (0x2E).

-Encoding is little endian. That is, the right most characters in the encoded string are the most significant bits in the resulting encoding.
+When computing the base-40 value of the callsign, the leftmost character of the callsign is the least significant digit.  Callsigns must be
+left justified. Leading spaces are not permitted.

-#### Example code: encode_base40()
+After the base-40 value is calculated, the final 6-byte address is the big endian encoded (most significant byte first) representation of the base-40 value. 

-```c
-uint64_t encode_callsign_base40(const char *callsign) {
-  uint64_t encoded = 0;
-  for (const char *p = (callsign + strlen(callsign) - 1); p >= callsign; p-- ) {
-    encoded *= 40;
-    // If speed is more important than code space,
-    // you can replace this with a lookup into a 256 byte array.
-    if (*p >= 'A' && *p <= 'Z') // 1-26
-      encoded += *p - 'A' + 1;
-    else if (*p >= '0' && *p <= '9') // 27-36
-      encoded += *p - '0' + 27;
-    else if (*p == '-') // 37
-      encoded += 37;
-    // These are just place holders. If other characters make more sense,
-    // change these. Be sure to change them in the decode array below too.
-    else if (*p == '/') // 38
-      encoded += 38;
-    else if (*p == '.') // 39
-      encoded += 39;
-    else
-      // Invalid character or space, represented by 0, decoded as a space.
-      //encoded += 0;
-  }
-  return encoded;
-}
+For example, for the callsign AB1CD, the base-40 representation would be DC1BA, and would be calculated as:
+
+('D': $4 \times 40^4$) + ('C': $3 \times 40^3$) + ('1': $28 \times 40^2$) + ('B': $2 \times 40^1$) + ('A': $1 \times 40^0$)
+
+DC1BA (base-40), 0x0000009FDD51 (base-16), 10476881 (base-10)
+
+The final address encoded into the 6-byte LSF/LICH field would be 0x0000009FDD51.
+
+#### Example Encoder
+
+```python
+def encodeM17(call):
+	"""Encode a text string into an M17 address value"""
+	
+	charMap = ' ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-/.'
+
+	# convert to upper case
+	call = call.upper()
+
+	# generate an assert error if more than 9 characters long
+	assert len(call) <= 9, 'Error: <callsign> must be 9 characters or less'
+
+	if call == 'ALL':
+		# handle the special case for Broadcast
+		encoded = 0xFFFFFFFFFFFF
+	else:
+		encoded = 0
+		# loop through the characters starting from the end (right most character)
+		for c in call[::-1]:
+			# find the position of the character in the map
+			value = charMap.find(c)
+
+			# if value < 0, the character was not found
+			# invalid characters are forced to 0
+			if value < 0:
+				value = 0
+
+			# shift the current value by one base-40 character (40 decimal)
+			# and add the current value
+			encoded = encoded*40 + value
+
+	return encoded
 ```

-#### Example code: decode_base40()
+#### Example Decoder

-```c
-char *decode_callsign_base40(uint64_t encoded, char *callsign) {
-  if (encoded >= 262144000000000) { // 40^9
-    *callsign = 0;
-    return callsign;
-  }
-  char *p = callsign;
-  for (; encoded > 0; p++) {
-    *p = " ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-/."[encoded % 40];
-    encoded /= 40;
-  }
-  *p = 0;
+```python
+def decodeM17(encoded):
+	"""Decode an M17 address value to a text string"""

-  return callsign;
-}
+	charMap = ' ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-/.'
+
+	# check for unique values
+	if encoded == 0xFFFFFFFFFFFF:
+		# BROADCAST
+		call = 'ALL'
+	elif encoded == 0:
+		call = 'RESERVED'
+	elif encoded >= 0xEE6B28000000:
+		call = 'RESERVED'
+	else:
+		call = ''
+		while (encoded > 0):
+			call = call + charMap[encoded % 40]
+			encoded = encoded // 40
+
+	return call
 ```

-#### Why base40?
+#### Why base-40?

-The longest commonly assigned callsign from the FCC is 6 characters. The minimum alphabet of A-Z, 0-9, and a “done” character mean the most compact encoding of an American callsign could be: $log2(37^6)=31.26$ bits, or 4 bytes.  
-  
-Some countries use longer callsigns, and the US sometimes issues longer special event callsigns. Also, we want to extend our callsigns (see below). So we want more than 6 characters. How many bits do we need to represent more characters:
+##### Callsign Formats
+
+The [International Telecommunication Union (ITU)](https://www.itu.int/) coordinates radio callsign formats worldwide, with format details specified in ITU [Radio Regulations](https://www.itu.int/pub/R-REG-RR/en) Articles 19.67 through 19.69.  A very extensive [Wikipedia entry for Amateur Radio Call Signs](https://en.wikipedia.org/wiki/Amateur_radio_call_signs) includes implementation details on callsign use around the world.
+
+From the ITU Articles, the longest standard callsign may consist of up to seven characters, with longer temporary special occasion callsigns allowed.  The allowed callsign characters, or "callsign alphabet", are the 26 letters of the English alphabet ('A' through 'Z') and the ten digits ('0' through '9').
+
+##### Secondary Operating Suffixes
+
+Secondary operating suffixes are often added to callsigns to indicate temporary changes of status, such as "AB1CD/M" for a mobile station, or "AB1CD/AE" to signify the station has additional operating privileges, etc. The '/' character will be included in the callsign alphabet.

 ##### Bits per Characters

-Characters | Bits                  | Bytes
---------- | ----                  | -----
-7          | $log2(37^7)=36.47$    | 5
-8          | $log2(37^8)=41.67$    | 6
-9          | $log2(37^9)=46.89$    | 6
-10         | $log2(37^{10})=52.09$ | 7
+The minimum number of allowed callsign characters in the callsign alphabet is 37 ('A' through 'Z', '0' through '9', and '/').  The following table shows how many bytes are required to encode a callsign using an alphabet size of 37.

-Of these, 9 characters into 6 bytes seems the sweet spot. Given 9 characters, how large can we make the alphabet without using more than 6 bytes?
+Callsign Characters | Bits                  | Bytes
+------------------- | ----                  | -----
+7                   | $log_2(37^7)=36.47$    | 5
+8                   | $log_2(37^8)=41.68$    | 6
+9                   | $log_2(37^9)=46.89$    | 6
+10                  | $log_2(37^{10})=52.09$ | 7
+11                  | $log_2(37^{11})=57.30$ | 8
+12                  | $log_2(37^{12})=62.51$ | 8
+13                  | $log_2(37^{13})=67.72$ | 9
+
+Of these, 9 characters into 6 bytes, or 12 characters into 8 bytes, are the most efficient. Given that 9 callsign characters and 6 bytes should be suitable for the majority of use cases, can the callsign alphabet be increased without using more than 6 bytes?

 ##### Alphabet Size vs. Bytes

+The following table shows how many bytes are required to encode a 9 character callsign using callsign alphabet sizes of 37 through 41.
+
 Alphabet Size | Bits               | Bytes
 ------------- | ----               | -----
-37            | $log2(37^9)=46.89$ | 6
-38            | $log2(38^9)=47.23$ | 6
-39            | $log2(39^9)=47.57$ | 6
-40            | $log2(40^9)=47.90$ | 6
-41            | $log2(41^9)=48.22$ | 7
+37            | $log_2(37^9)=46.89$ | 6
+38            | $log_2(38^9)=47.23$ | 6
+39            | $log_2(39^9)=47.57$ | 6
+40            | $log_2(40^9)=47.90$ | 6
+41            | $log_2(41^9)=48.22$ | 7

-Given this, 9 characters from an alphabet of 40 possible characters, makes maximal use of 6 bytes.
+The largest callsign alphabet size able to encode 9 characters into 6 bytes is 40.  This means the minimal callsign alphabet of 37 can be extended with three additional characters.

-## Callsign Formats
+##### Multiple Stations

-Government issued callsigns should be able to encode directly with no changes.
+To indicate multiple stations by the same operator, the '-' character can be used. A callsign such as "AB1CD-1" is considered a different station than "AB1CD-2" or even "AB1CD", but it is understood that these all belong to the same operator, "AB1CD".  The '-' character will be included in the callsign alphabet.

-### Multiple Stations
+##### Fill

-To allow for multiple stations by the same operator, we borrow the use of the ‘-’ character from AX.25 and the SSID field. A callsign such as “AB1CD-1” is considered a different station than “AB1CD-2” or even “AB1CD”, but it is understood that these all belong to the same operator, “AB1CD”
+A space ' ' character is included in the callsign alphabet as a fill character or as a substitute for characters that are not part of the callsign alphabet.

-### Temporary Modifiers
+##### Dot

-Similarly, suffixes are often added to callsign to indicate temporary changes of status, such as “AB1CD/M” for a mobile station, or “AB1CD/AE” to signify that I have Amateur Extra operating privileges even though the FCC database may not yet be updated. So the ‘/’ is included in the base40 alphabet. The difference between ‘-’ and ‘/’ is that ‘-’ are considered different stations, but ‘/’ are NOT. They are considered to be a temporary modification to the same station.
+A dot '.' character is included in the callsign alphabet as ... TBD ...

-### Interoperability
+##### M17 base-40 Callsign Alphabet

-It may be desirable to bridge information between M17 and other networks. The 9 character base40 encoding allows for this:
-
-#### DMR
-
-DMR unfortunately doesn’t have a guaranteed single name space. Individual IDs are reasonably well recognized to be managed by [RadioID.net](https://www.radioid.net/database/search#!) but Talk Groups are much less well managed. Talk Group XYZ on Brandmeister may be (and often is) different than Talk Group XYZ on a private cBridge system.
-
-* DMR IDs are encoded as: D<number> eg: D3106728 for KR6ZY
-* DMR Talk Groups are encoded by their network. Currently, the following networks are defined:
-    * Brandmeister: BM<number> eg: BM31075
-    * DMRPlus: DP<number> eg: DP262
-    * More networks to be defined here.
-    
-#### D-Star
-
-D-Star reflectors have well defined names: REFxxxY which are encoded directly into base40.
+These final additions complete the 40 character M17 callsign alphabet as ' ' (space), 'A' through 'Z', '0' through '9', '-' (hyphen), '/' (forward slash), and '.' (dot).
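
Note on the address-encoding hunk: the new Example Encoder returns an integer, while the text says the final address is that value written as 6 bytes, most significant byte first. A minimal sketch of that last step and its inverse follows. The function names (`encode_base40`, `address_to_bytes`, `bytes_to_address`, `decode_base40`) are hypothetical and not part of the patch. The `'ALL'` broadcast shortcut from the patch's encoder is left out to keep the sketch short, and only the Unit ID range is decoded.

```python
# Hypothetical helper names; only the alphabet, the digit order and the
# byte order are taken from the patch.
CHAR_MAP = ' ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-/.'


def encode_base40(call):
    """Return the integer address for a callsign of up to 9 characters."""
    call = call.upper()
    if len(call) > 9:
        raise ValueError('callsign must be 9 characters or fewer')
    encoded = 0
    # The leftmost character is least significant, so walk right to left.
    for c in reversed(call):
        value = CHAR_MAP.find(c)
        # Characters outside the alphabet are encoded as 0 (space).
        encoded = encoded * 40 + max(value, 0)
    return encoded


def address_to_bytes(address):
    """Serialize an address as 6 bytes, most significant byte first."""
    return address.to_bytes(6, byteorder='big')


def bytes_to_address(raw):
    """Parse 6 big-endian bytes back into an integer address."""
    if len(raw) != 6:
        raise ValueError('an address is exactly 6 bytes')
    return int.from_bytes(raw, byteorder='big')


def decode_base40(address):
    """Return the callsign text for an address in the Unit ID range."""
    if not 0 < address < 40 ** 9:
        raise ValueError('address is outside the Unit ID range')
    call = ''
    while address > 0:
        call += CHAR_MAP[address % 40]
        address //= 40
    return call


raw = address_to_bytes(encode_base40('AB1CD'))
print(raw.hex())                             # 0000009fdd51
print(decode_base40(bytes_to_address(raw)))  # AB1CD
```

The printed bytes match the AB1CD worked example in the patch (10476881 in base-10).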