# font_to_py.py

Convert a font file to Python source code. The principal reason for doing this
is to save RAM on resource-limited targets: the font file may be incorporated
into a firmware build such that it occupies flash memory rather than scarce
RAM. Python code built into firmware is known as frozen bytecode.

## V0.3 notes

8 Sept 2019

Remove redundancy from index file. Emit extra index for sparse fonts, reducing
code size. Add comment field in the output file showing creation command line.
Repo includes the file `extended`. This facilitates creating fonts comprising
the printable ASCII set plus `°μπωϕθαβγδλΩ`. Improvements to `font_test.py`.

###### [Main README](./README.md)

# Dependencies

The utility requires Python 3.2 or greater and the `freetype-py` bindings, which
may be installed using `pip3`. On Linux (you may need a root prompt):

```shell
# apt-get install python3-pip
# pip3 install freetype-py
```

# Usage

`font_to_py.py` is a command line utility written in Python 3. It is run on a
PC. It takes as input a font file with a `ttf` or `otf` extension and a
required height in pixels, and outputs a Python 3 source file. The pixel layout
is determined by command line arguments. By default fonts are stored in variable
pitch form; this may be overridden by a command line argument.

By default the printable ASCII character set (ordinal values 32 to 126
inclusive) is supported (i.e. not including control characters). Command line
arguments can modify this range as required to specify arbitrary sets of
Unicode characters. Non-English and non-contiguous character sets may be
defined.

Further arguments ensure that the byte contents and layout are correct for the
target display hardware. Their usage should be specified in the documentation
for the device driver.

Example usage to produce a file `myfont.py` with height of 23 pixels:  
`font_to_py.py FreeSans.ttf 23 myfont.py`

## Arguments

### Mandatory positional arguments:

 1. Font file path. Must be a ttf or otf file.
 2. Height in pixels.
 3. Output file path. Filename must have a .py extension.

### Optional arguments:

 * `-f` or `--fixed`: If specified, all characters will have the same width. By
 default fonts are assumed to be variable pitch.
 * `-x` or `--xmap`: Specifies horizontal mapping (default is vertical).
 * `-r` or `--reverse`: Specifies bit reversal in each font byte.
 * `-s` or `--smallest`: Ordinal value of smallest character to be stored.
 Default 32 (ASCII space).
 * `-l` or `--largest`: Ordinal value of largest character to be stored.
 Default 126.
 * `-e` or `--errchar`: Ordinal value of character to be rendered if an attempt
 is made to display an out-of-range character. Default 63 (`ord("?")`).
 * `-i` or `--iterate`: Specialist use. See below.
 * `-c` or `--charset`: Restrict the characters in the font to a specific set.
 See below.
 * `-k` or `--charset_file`: Obtain the character set from a file. Typical use
 is for alternative character sets such as Cyrillic: the file must contain the
 character set to be included. An example file is `cyrillic`. Another is
 `extended`, which adds the Unicode characters "° μ π ω ϕ θ α β γ δ λ Ω" to
 those with `ord` values from 32 to 126. Such files will only produce useful
 results if the source font file includes those glyphs.

The `-c` option may be used to reduce the size of the font file by limiting the
character set. If the font file is frozen as bytecode this will not reduce RAM
usage, but it will conserve flash. Example usage for a digital clock font:

```shell
$ font_to_py.py Arial.ttf 20 arial_clock.py -c 1234567890:
```
Example usage with the -k option:  
```shell
font_to_py.py FreeSans.ttf 20 freesans_cyr_20.py -k cyrillic
font_to_py.py -x -k extended FreeSans.ttf 17 font10.py
```

If a character set is specified via `-c` or `-k`, then `--smallest` and
`--largest` should not be specified: these values are computed from the
character set.
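As an illustration of how the range follows from the character set, the sketch below computes the smallest and largest ordinals from a `-c` style string. The helper name `charset_range` is hypothetical and this is not the code used inside `font_to_py.py`.

```python
def charset_range(charset):
    """Return (smallest, largest) ordinal values for a character set string.

    Hypothetical helper: duplicate characters are ignored.
    """
    ordinals = sorted(set(ord(c) for c in charset))
    if not ordinals:
        raise ValueError("empty character set")
    return ordinals[0], ordinals[-1]


# The digital clock example: '0'..'9' are 48..57 and ':' is 58
print(charset_range("1234567890:"))  # (48, 58)
```

For a set such as `extended` the same calculation spans from 32 up to the Greek letters near 1000, which is why such sets are stored using the sparse index described in Appendix 2.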

Any requirement for the `-x` and `-r` arguments will be specified in the device
driver documentation. Bit reversal is required by some display hardware.
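The transformation implied by `-r` can be sketched in a few lines: the order of the eight bits in each byte is reversed, so a pixel held in the most significant bit ends up in the least significant bit. This illustrates the effect only; it is not taken from `font_to_py.py`.

```python
def reverse_bits(byte):
    """Reverse the bit order of an 8-bit value (bit 0 <-> bit 7, and so on)."""
    result = 0
    for _ in range(8):
        # Shift the lowest remaining bit of the input into the result
        result = (result << 1) | (byte & 1)
        byte >>= 1
    return result


print(bin(reverse_bits(0b11010000)))  # 0b1011
```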

There have been reports that producing fonts with Unicode characters outside
the ASCII set from ttf files is unreliable. If expected results are not
achieved, use an otf font. I have successfully created Cyrillic and extended
fonts from a `ttf`, so I suspect the issue may be source fonts lacking the
required glyphs.
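One way to test the missing-glyph hypothesis before converting is to ask FreeType directly. The sketch below assumes a `freetype.Face` object from the `freetype-py` bindings: FreeType returns glyph index 0 for a character code the font does not define. The helper name `missing_glyphs` is hypothetical.

```python
def missing_glyphs(face, charset):
    """Return the characters in charset for which face has no glyph.

    face is assumed to be a freetype.Face (freetype-py). FreeType maps an
    undefined character code to glyph index 0.
    """
    return [c for c in charset if face.get_char_index(ord(c)) == 0]
```

For example, calling it with `freetype.Face('FreeSans.ttf')` and the contents of the `extended` file should return an empty list if the font contains every required glyph.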

The `-i` or `--iterate` argument is intended for specialist applications.
Specifying it causes a generator function `glyphs` to be included in the Python
font file. A generator instantiated with this will yield `bitmap`, `height`, and
`width` for every glyph in the font.
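A sketch of consuming that generator follows. It assumes each item unpacks as `(bitmap, height, width)` in the order listed above, and that `bitmap` supports `len()`; the helper `glyph_stats` and the sample data are hypothetical.

```python
def glyph_stats(glyph_iter):
    """Return (glyph count, total bitmap bytes, widest glyph width).

    glyph_iter is assumed to yield (bitmap, height, width) tuples, as a
    font module built with -i would via myfont.glyphs().
    """
    count = total_bytes = widest = 0
    for bitmap, height, width in glyph_iter:
        count += 1
        total_bytes += len(bitmap)
        widest = max(widest, width)
    return count, total_bytes, widest


# Stand-in for myfont.glyphs(): two hypothetical glyphs
sample = [(bytes(20), 20, 8), (bytes(40), 20, 14)]
print(glyph_stats(sample))  # (2, 60, 14)
```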

### Output

The specified height is a target. The algorithm gets as close to the target
height as possible (usually within one pixel). The actual height achieved is
displayed on completion, along with the width of the widest character.

A warning is issued if the output filename does not have a .py extension, since
the user may not have intended to create a binary font file.

## The font file

Assume that you have used the utility to create a file `myfont.py`. In your
code you will issue:

```python
import myfont
```

The `myfont` module name will then be used to instantiate a `Writer` object
to render strings on demand. A practical example may be studied
[here](./writer/writer_demo.py).
The detailed layout of the Python file may be seen [here](./writer/DRIVERS.md).
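To show roughly what a driver does with the data returned by `get_ch`, the sketch below renders a glyph as text. It assumes a font built with `-x` (horizontal mapping) and without `-r`, each row padded to a whole number of bytes, and the most significant bit of each byte holding the leftmost pixel. Check these assumptions against the layout description linked above; `glyph_to_text` is a hypothetical helper.

```python
def glyph_to_text(bitmap, height, width, on="#", off="."):
    """Render a horizontally mapped glyph as a list of text rows.

    Assumes rows padded to whole bytes, MSB = leftmost pixel.
    """
    bytes_per_row = (width + 7) // 8
    lines = []
    for row in range(height):
        line = []
        for col in range(width):
            byte = bitmap[row * bytes_per_row + col // 8]
            bit = (byte >> (7 - col % 8)) & 1
            line.append(on if bit else off)
        lines.append("".join(line))
    return lines


# Hypothetical 3x3 glyph (a plus sign), one byte per row
for line in glyph_to_text(bytes([0b01000000, 0b11100000, 0b01000000]), 3, 3):
    print(line)
```

With a real font the arguments would come from `bitmap, height, width = myfont.get_ch('a')`; a `memoryview` can be indexed in the same way as `bytes`.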

### Binary font files

There is an option to create a binary font file, specified with a `-b` or
`--binary` command line argument. In this instance the output filename must
not have a `.py` extension. This is primarily intended for the e-paper driver
in applications where the file is to be stored on the display's internal flash
memory rather than using frozen Python modules.

Accessing character data from a random access file is slow, so this technique
is probably only applicable to devices such as e-paper displays, whose update
time is itself slow.

Binary files currently support only the standard ASCII character set. There is
no error character: the device driver must ensure that seeks are within range.
Consequently the following arguments are invalid:

 * -s or --smallest
 * -l or --largest
 * -e or --errchar
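Because there is no error character, the range check has to happen in the driver before any seek. A minimal sketch follows, assuming the binary file holds the printable ASCII set (ordinals 32 to 126 inclusive). It returns a zero-based character index; how that index maps to a byte offset depends on the binary file layout, which is not described here. `checked_index` is a hypothetical name.

```python
# Assumed character range: printable ASCII, 32 to 126 inclusive
SMALLEST = 32
LARGEST = 126


def checked_index(ch, smallest=SMALLEST, largest=LARGEST):
    """Return the zero-based index of ch in the font, or raise ValueError.

    The caller must reject or substitute out-of-range characters, since a
    binary font file has no error character to fall back on.
    """
    code = ord(ch)
    if not smallest <= code <= largest:
        raise ValueError("character {!r} not in font".format(ch))
    return code - smallest


print(checked_index("A"))  # 33
```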

# Dependencies, links and licence

The code is released under the MIT licence. The `font_to_py.py` utility
requires Python 3.2 or later.

The module relies on [Freetype](https://www.freetype.org/) which is included in most Linux distributions.  
It uses the [Freetype Python bindings](http://freetype-py.readthedocs.io/en/latest/index.html)
which will need to be installed.  
My solution draws on the excellent example code written by Daniel Bader. This
may be viewed [here](https://dbader.org/blog/monochrome-font-rendering-with-freetype-and-python)
and [here](https://gist.github.com/dbader/5488053).

# Appendix 1: RAM utilisation Test Results

The supplied `freesans20.py` and `courier20.py` files were frozen as bytecode
on a Pyboard V1.0. The following code was pasted at the REPL:

```python
import gc, micropython
gc.collect()
micropython.mem_info()

import freesans20

gc.collect()
micropython.mem_info()

import courier20

gc.collect()
micropython.mem_info()

def foo():
    addr, height, width = freesans20.get_ch('a')

foo()

gc.collect()
micropython.mem_info()
print(len(freesans20._font) + len(freesans20._index))
```

The memory used was 1712, 2032, 2384 and 2416 bytes. As increments over the
prior state this corresponds to 320, 352 and 32 bytes. The `print` statement
shows the RAM which would be consumed by the data arrays: this was 3956 bytes
for `freesans20`.
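The arithmetic behind these figures, and behind the conclusion below, can be checked directly from the numbers quoted above:

```python
# mem_info() "used" readings quoted above, in bytes
readings = [1712, 2032, 2384, 2416]

# Increment over the prior reading: import freesans20, import courier20, foo()
increments = [b - a for a, b in zip(readings, readings[1:])]
print(increments)  # [320, 352, 32]

# Size of the freesans20 data arrays versus the RAM cost of importing it frozen
ratio = 3956 / increments[0]
print(round(ratio, 1))  # 12.4
```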

The `foo()` function emulates the behaviour of a device driver in rendering a
character to a display. The local variables constitute memory which is
reclaimed on exit from the function. Its additional RAM use was 16 bytes.

## Conclusion

With a font of height 20 pixels the RAM saving was an order of magnitude. The
saving will be greater with larger fonts, since the RAM used by a frozen font is
independent of the size of its data arrays.

# Appendix 2: Recent improvements

The representation of non-contiguous character sets such as the `extended` set
presents a challenge because the ordinal values of the Unicode characters can
be expected to span a range much greater than the number of characters in the
set. Using an index of the type used for the ASCII set would be inefficient as
most of the elements would be null (pointing to the default character).

The code now behaves as follows. If the character set contains no more than 95
characters (including the default), the emitted Python file is as before. This
keeps the code small and efficient for the common (default) case.

Larger character sets are assumed to be sparse. Characters with ordinal values
which place them in the first 95 characters are looked up using the normal
index. Those above use an index optimised for sparse values and a binary search
algorithm.
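The two-level lookup can be sketched as follows. The names and data layout here (a `dense` list of offsets, a sorted `sparse` list of `(ordinal, offset)` pairs, a `default` offset for the error character) are hypothetical and do not reproduce the format of the emitted font file; the sketch only shows the principle of a direct index for the low range plus a binary search for the rest.

```python
def lookup(ch, smallest, dense, sparse, default):
    """Return a glyph offset for ch using a dense index plus a sparse one.

    dense:   offsets for ordinals smallest .. smallest + len(dense) - 1
    sparse:  (ordinal, offset) pairs sorted by ordinal
    default: offset of the error character
    All names and layouts are hypothetical.
    """
    code = ord(ch)
    idx = code - smallest
    if 0 <= idx < len(dense):
        return dense[idx]  # direct lookup for the common case
    # Binary search over the sorted sparse entries
    lo, hi = 0, len(sparse) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key, offset = sparse[mid]
        if key == code:
            return offset
        if key < code:
            lo = mid + 1
        else:
            hi = mid - 1
    return default


dense = list(range(95))  # hypothetical offsets for ordinals 32..126
sparse = [(176, 1000), (937, 1010), (960, 1020)]  # '°', 'Ω', 'π'
print(lookup("A", 32, dense, sparse, 0))  # 33
print(lookup("π", 32, dense, sparse, 0))  # 1020
print(lookup("€", 32, dense, sparse, 0))  # 0
```

The dense index keeps lookups for the ASCII range constant-time, while the sparse table costs only one entry per additional character at the price of a logarithmic search. The search is written as a plain loop so it has no library dependencies.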