Koch method with real words. Prepare a list of words to learn Morse code by the Koch method, using real words.

Andreas Krüger 9dc07a26b1 The companion morse2sound is finally ready.		2024-06-27 13:55:32 +02:00

What is this?

Koch method with real words. Prepare a list of words to learn Morse code by the Koch method, using real words.

TL;DR

If you just want to learn Morse code:

Find a program that can convert text files to Morse code sound. One such program is morse2sound, which in more than one sense is this software's companion.
Choose your language. We have German, British English and American English on offer.
Let your program play lesson_01_0.txt to you at speed 12 wpm = 60 cpm. There are several blanks after each word, so you'll get a small break after each word to comprehend it. You may need to copy to paper at first, but try to copy in your head only.
Do not set up your program to insert additional gaps between the letters of a word; that would slow the words down to below 60 cpm.
Next, let your program play lesson_01_1.txt, then lesson_01_2.txt and so on. Keep track of how many of the words you can copy. Once you are down to five non-copies per lesson file, you are done with lesson 01 and can continue to lesson 02.
The initial lesson 01 teaches you three letters. All other lessons introduce one new letter per lesson. Which letters these are depends on the language. Consult learning_order.json for the details.
Skip to the next lesson whenever you are down to five or fewer non-copies per lesson file. No reason to listen to all files from 0 to 9 just because they are there.
Practice in sessions of about 30 minutes each.
How often you practice is up to you: anything from daily down to twice a week. This depends on how much of a hurry you are in to learn Morse code and how much time you're willing to spend per week.
On each new learning session (day), start by listening again to one file (or more, as needed) of the lesson you stopped at when the previous learning session ended. In that sense, it is not quite "one lesson per day". Proceed when you have five or fewer non-copies in that file.

With this system, you can expect to learn receiving the 26 letters of the alphabet as Morse code within some 25 to 30 half-hour learning sessions. As you have always practiced at 12 wpm or 60 cpm, that is the speed you will have mastered.

This speed is already above the dreaded 50 cpm "plateau of unlearning of thinking", so you'll never get stuck there.

What is that "Koch method"?

Ludwig Koch was a German scientist who developed, in his research, a method of teaching Morse code. This was published in February 1936:

Ludwig Koch (Braunschweig): "Arbeitspsychologische Untersuchung der Tätigkeit bei der Aufnahme von Morsezeichen, zugleich ein neues Anlernverfahren für Funker" (Dissertation der Technischen Hochschule Braunschweig), Zeitschrift für angewandte Psychologie und Charakterkunde, Band 50 Heft 1 u. 2, Februar 1936.

The main results of his research and key points of his method are (crudely) summarized as follows (all speeds in characters per minute, divide by 5 to get to words per minute):

Speeds below 50 invite people to think while listening to Morse code.
As long as thinking is involved, speeds above about 50 cannot be reached.
It is important to hear Morse code as characters "automatically" without thinking.
Thinking doesn't work at speeds above 50. So all Morse code training should be at speeds above that, from the very beginning.
Koch's research established his method as an improvement over the Farnsworth method of sending individual letters at high speed and leaving gaps between them, which reduces the overall speed to much below 50 and so again facilitates thinking. For the record: At the time, the method wasn't yet called by its now popular name "Farnsworth"; Koch calls it "Klangbildverfahren". But it was commonly used.
A moderate speed of 60 is ideal for learning Morse code. (He also experimented with higher initial speeds, but that led to slower learning.)
One should start initial Morse code practice by being offered two different characters only, at speed 60.
During the initial minutes of the original Koch-method course, one listens to those Morse characters, but does not yet know which letters they represent. One simply writes a dot for each letter heard.
Then the characters are disclosed and are copied on paper as heard.
Practice is always in groups of five letters, as common for the encrypted traffic of that day.
When 90 % of the characters are copied correctly by the learning group as a whole, the next (third) letter is introduced.
This continues throughout the course: Each time the group as a whole copies the letters learned thus far 90 % correctly, a new letter is added.
Each new letter is introduced as sound only. It is not initially disclosed which character the letter stands for. At first, students copy a dot for each new letter heard.
Throughout the course, it is stressed to always put a dot for a letter not immediately copied. Trying to think when a letter is not immediately copied is discouraged: Thinking is not likely to help, but likely to hinder copying the next letter.
Course lectures are given in half-hour units. Koch gives reasons why this is the optimal time frame.
Koch recommends against binge learning. Even in the common military setting of the day, Koch recommends half-hour Morse code sessions should not be held more frequently than twice per day. In his own experiments, volunteers from all walks of life were given half-hour sessions two or three times a week.
When the first two half-hour sessions have been completed and the 90 % rule has been followed, the typical course will have picked up 4 different characters. Some courses manage 5.
Koch observes that people who have not picked up 4 characters after two half-hour sessions of learning, fairly typically do not manage to learn Morse code at all, even when putting in a lot of effort. He recommends using these first two sessions as a sort of entry exam.
It should be mentioned that, in his day, the qualification target was speed 100 in copying cipher text and speed 125 in copying plain text.
Each half-hour session starts with a repetition of the previously learned material. Later in the course, in most sessions, only a single new character is introduced.
The letters p x v y q z tend to be particularly difficult to pick up. Some of these characters may require two sessions before the course copies them 90 % and the next character can be introduced.
Koch gives no clear recommendation in which order to learn the letters. The first letters introduced should be quite different-sounding from each other, that much is clear. One of his courses used: h f a b g c d e ch l k i o t m p q r s n u w v y x z. This contradicts a recommendation he also gives: The more difficult letters should be introduced somewhat early in the course, after about a third or so, so they see some repetition later.
Koch recommends dual pitch: Sending "dits" in a slightly lower pitch than "dahs", at first. The difference in pitch should be "quite low" (Koch is not specific here). The general impression is that the difference was not rigorously controlled and may have varied somewhat from course session to course session. This dual pitch is kept up only for the first half or third or so of the course. Thereafter, the pitch difference is gradually reduced, so normal single pitch is reached some time before the end of the course. Koch does not claim dual pitch to be essential, but offers a comparison: A dual-pitch course finished three sessions earlier than a single-pitch companion course.
How long does it take to learn Morse code by the Koch method? During his research, Koch took the shortcut of teaching only the 26 letters of the alphabet. As his courses were taught to conduct research, not to train radio people, digits, punctuation, or pro signs were not introduced. Teaching 26 alphabet characters usually took 24 to 28 half-hour sessions, apparently spread over some 8 to 14 weeks (depending on twice or thrice a week schedule). For the record: The text mentions teaching "ch" (no longer a character in today's international Morse code as specified by the ITU), but all the graphs show 26 characters being taught; the publication is not entirely clear here.

It strikes me that in his 70-page paper, Koch never even mentions Morse code keying training. Sending does not seem to be an issue that needs a lot of concern, once reception is mastered.

That rather fits my own experience and that of many other radio amateurs: Once solid code copy is mastered, keying is reduced to a mere mechanical task. There is some concern about avoiding cramping and "glass fist", yes. But Morse code knowledge straightforwardly transfers from copy to keying.

Koch and today's radio amateur

The speeds are comparable. In Koch's age, the top-notch professional radio men on board of ships or airplanes had passed exams demanding solid copy of 100 characters per minute cipher text or 125 plain, for five minutes. Today's radio amateur "high speed club" demands those same 125 characters per minute, plain text, but for half an hour, mixed reception and sending. In contests, many CW stations use similar speed ranges.

Koch mentions less demanding environments of his days, where average communication speeds of 60-90 were common. Today, we have a lot of Morse code communication going on in these slower 60+ speed ranges, in particular on the lower bands. Quite a few radio amateurs need such relatively slow speeds in order to be able to join the party.

Unfortunately, many have learned to copy Morse code in a way that involves some kind of thinking or mental processing, rather than automatism. Trying to get faster, they have practiced thoroughly, but what they practice is exactly such mental processing. In the end, they find themselves stuck at a speed level of maybe 50 or, with lots more practice, 60 characters per minute.

That's the end speed that can be reached with mental processing, with thinking. In contrast, higher speed requires getting rid of thinking, replacing the processing with automatism. This is by no means impossible, but hard to do.

Many never break that barrier.

That same barrier at about speed 50 cpm was well-known in Koch's days. His brilliant idea was to sneak new learners around it.

But Koch's method can also be used to break the barrier in those that already struggle with it. I happen to know from first-hand personal experience.

Important differences between then and now exist. In Koch's days, radio operators copied to paper all they heard, to be forwarded to the intended recipients. Today's Morse code users, we radio amateurs, are typically ourselves the intended recipients of the messages we receive. There is no need for a paper copy of every character received, if we manage to comprehend the messages directly.

The traditional training back then involved copying five-letter groups of random text to paper. If comprehension independent of paper copy is our goal, this might not be optimal.

I therefore propose doing away with meaningless five-letter groups. If comprehension is to be trained, let us use comprehensible words.

Can we combine Koch's method with real words? That's exactly what this software is about.

Koch with real words

Pretty much same as Koch, but with real words from your mother tongue replacing traditional meaningless five-letter groups.

You may choose to copy them on paper. But I recommend to simply try to understand them in your head. You'll know the word once you "have" it.

Koch starts initial training with two characters. We have to bite the bullet and start with three characters instead. There is hope that we can find a few meaningful words that can be spelled with some chosen three letters.

Which initial three letters allow at least some words to be spelled? How to choose?

That's where software comes in.

Given our alphabet of 26 letters, there are 2600 choices of three different letters from that alphabet. Software can simply and systematically try out all these 2600 choices, and find the choice that allows the most words to be spelled with just those three letters chosen.
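The brute-force search just described can be sketched in a few lines of Python. This is a minimal illustration, not the code from this repository: the function name best_letter_triple and the toy word list are made up for the example, and words are assumed to be lowercase a-z only.

```python
from itertools import combinations
from string import ascii_lowercase


def best_letter_triple(words):
    """Return (count, triple) for the three letters that spell the most words."""
    # Reduce every word to its set of distinct letters once, up front.
    letter_sets = [frozenset(word) for word in words]
    best = (-1, None)
    # combinations() yields every unordered choice of 3 letters out of 26.
    for triple in combinations(ascii_lowercase, 3):
        allowed = frozenset(triple)
        count = sum(1 for letters in letter_sets if letters <= allowed)
        if count > best[0]:
            best = (count, triple)
    return best


# Hypothetical toy word list, NOT the real dictionary.
toy_words = ["papa", "pass", "spa", "sap", "app", "tee", "tat", "ast"]
count, triple = best_letter_triple(toy_words)
print(count, "".join(triple))
```

With the toy list, the winner is the triple a, p, s with five words. On ties, this sketch simply keeps the first triple found; how the real software breaks ties is not covered here.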

In German, those three letters are "a", "p", and "s". 21 short German words can be spelled with just these three letters. In alphabetic order, these are:

aa aas app apps ass papa papas papp pass passa passas ppp pps ps sap saps sas sass spa spas spass

21 different words is not a whole lot. But for the first 20 minutes of practice or so, such a restricted set of words may just be good enough. Then, a new, fourth letter can be added.

Again, software to the rescue. The "t" is a good choice here: adding it gives 29 new German words. And so on. For details, see learning_order.json (of your chosen language).
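Choosing the next letter can be sketched the same way. The following assumes a simple greedy rule (pick the letter that makes the most additional words spellable); whether find_learning_order.py uses exactly this criterion is an assumption on my part. The name best_next_letter and the toy words are hypothetical.

```python
from string import ascii_lowercase


def best_next_letter(words, known):
    """Return (gained, letter): the letter that adds the most new words."""
    known = frozenset(known)
    letter_sets = [frozenset(word) for word in words]
    best = (-1, None)
    for letter in ascii_lowercase:
        if letter in known:
            continue
        extended = known | {letter}
        # Count only words that become spellable because of the new letter.
        gained = sum(
            1 for letters in letter_sets
            if letters <= extended and letter in letters
        )
        if gained > best[0]:
            best = (gained, letter)
    return best


# Hypothetical toy word list, NOT the real dictionary.
toy_words = ["papa", "pass", "spa", "tat", "ast", "past", "tee", "see"]
next_choice = best_next_letter(toy_words, "aps")
print(next_choice)
```

In the toy list, "t" adds three words (tat, ast, past), while "e" adds only one (see), so "t" wins.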

Disadvantages

If you follow these suggestions, you'll learn only the letters and have to deal with numbers and other characters separately later.
The present implementation is based on a dictionary. So odd, funny words are practiced. We might want an improvement that takes word frequency into account.

How to run this?

Installation

Have docker installed.

Preparation

Run, in this directory (that contains this README.md file):

docker build -t registry.invalid/kmrw:latest .

Beware! There's a . at the end of that line. It is needed.

This prepares the software and a spell checking dictionary and packs it all into the Docker image.

If you want to develop the code and rebuild frequently, it is convenient to have a local HTTP proxy, in which case you'd use something like

docker build --build-arg=http_proxy=http://172.18.0.1:3128/ -t registry.invalid/kmrw:latest .

If you don't know what a local HTTP proxy is, you can safely ignore this.

What's this `registry.invalid`?

(Skip this section if you only want the recipe, not the explanation.)

Docker uses a registry, which is a place in the cloud where Docker images are served (and, more likely than not, the company that operates it spies on you). Out of the box, the docker command uses registry-1.docker.io when you don't mention another one.

I would like a way to say "cut out this registry stuff, this particular Docker image lives locally on my computer only and goes nowhere". I would much prefer that a Docker image without a registry were just that, a Docker image without a registry, but that's not how the docker command has been designed. Still, I can achieve the same effect by using a bogus registry name.

The .invalid top level domain has been set aside officially to be just that - invalid. A registry ending with ".invalid" will be nowhere. (Technically, DNS will not yield an IP for any hostname in the .invalid domain.)

Run

This is for the people who want to play with the system and create their own lesson files.

I run the following in a bash.

The docker build ... command needs to happen in the directory that has this README.md file and, more to the point, the Dockerfile.

In contrast, you can run everything that follows in any (one) directory of your choosing.

Why "(one)"? Later commands generally depend on files produced by earlier ones. So, for smoothest sailing, don't switch directories in between calls.

Unless you want to switch languages. One directory per language is an excellent idea.

Some of the reasonably fast running commands produce no output. This follows the time-honored "no news is good news" command line tradition. If you know about exit values, check those.

I run the commands in bash on a Linux system. If you are in a non-Linux environment, you can probably remove the --user="$(id --user):$(id --group)" options. If you remove those, there is one concern: Both the user inside the Docker container and you on the outside need access to the files. But then, rumor has it that on some non-Linux operating systems, the Docker installation handles this for you automatically. I don't know; in my spare time, I never leave Linuxland unless I have to.

All in one swoop

If you are like me and want to run the whole thing in one swoop, for all three languages supported (which takes some 4.5 hours on my laptop):

docker run --rm -ti --mount=type=bind,src="$(pwd)",dst=/fromhost --user="$(id --user):$(id --group)" --workdir=/fromhost registry.invalid/kmrw:latest invoke all

Or you can do stuff for only one language: Replace the all in the above by de-de, or en-gb, or en-us. This cuts the time needed down to about 1.5 hours on my laptop.

A word on my invoke tasks

Those invoke tasks tend to be conservative. What one invocation constructed, the next will not knowingly overwrite. Even invoke --list will show fewer and fewer possible tasks.

If you actually do want to run stuff again, delete the file that was produced.
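The "never knowingly overwrite" behavior boils down to a simple pattern: check for the output file first and do nothing if it is already there. Here is a minimal sketch of that pattern in plain Python. It does not use the invoke library and is not taken from tasks.py; build_once and write_demo are hypothetical names.

```python
import tempfile
from pathlib import Path


def build_once(target, build):
    """Run build(target) only if target does not exist yet.

    Returns True if the file was built, False if it was left alone.
    """
    if target.exists():
        return False
    build(target)
    return True


def write_demo(target):
    # Stand-in for a real, expensive generation step.
    target.write_text("demo\n", encoding="ascii")


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "wordlist.txt"
    first = build_once(out, write_demo)   # builds the file
    second = build_once(out, write_demo)  # skipped, the file exists
    out.unlink()                          # delete the file that was produced
    third = build_once(out, write_demo)   # builds it again

print(first, second, third)
```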

Wordlist generation

This step generates a list of words, only ASCII characters, one word per line, from the material of spell checking dictionaries.

Those spell checking dictionaries and the accompanying software are in the Docker container we built. Currently it has German, and the two English flavors GB and US.

The German command line is:

docker run --rm -ti --mount=type=bind,src="$(pwd)",dst=/fromhost --user="$(id --user):$(id --group)" --workdir=/fromhost registry.invalid/kmrw:latest invoke de-de.mk-wordlist-de-de

For US-English, replace the invoke target de-de.mk-wordlist-de-de with en-us.mk-wordlist-en-us, for British English, with en-gb.mk-wordlist-en-gb.

This spills out a bit of output from the spell check software we use.

If all is well, you'd find a file wordlist.txt in the language sub-directory that contains, one word a line, all words the spell checker knows that are at most 6 characters long.

Reading words in your head gets much harder with increasing word length. Hence the restriction to at most 6 characters.
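The filtering itself can be sketched as follows. This is only an illustration under simplifying assumptions: it drops every candidate that is not plain a-z after lowercasing, whereas the real generate_wordlist.py may well transliterate non-ASCII letters instead. The name filter_words is hypothetical.

```python
import re

MAX_LEN = 6  # the length limit described above
ASCII_WORD = re.compile(r"[a-z]+")


def filter_words(candidates, max_len=MAX_LEN):
    """Keep lowercase a-z words of at most max_len letters, sorted, no duplicates."""
    kept = set()
    for raw in candidates:
        word = raw.strip().lower()
        if len(word) <= max_len and ASCII_WORD.fullmatch(word):
            kept.add(word)
    return sorted(kept)


# Hypothetical input, as a spell checker might list it.
sample = ["Papa", "spa\n", "Straße", "elephant", "spa", "it's"]
wordlist = filter_words(sample)
print(wordlist)
```

Here "Straße" is dropped because of the "ß", "elephant" because it is too long, and "it's" because of the apostrophe.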

Letter count file generation

Next, I use that to generate a count, for each letter, in how many words that letter occurs. The invoke targets are de-de.mk-lettercount-de-de, en-gb.mk-lettercount-en-gb, and en-us.mk-lettercount-en-us.

Each of these results in a plain text file lettercount.txt. Have a look, if you want to.
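What is being counted can be shown in a few lines. This sketch only illustrates the counting rule (each letter counts once per word that contains it); it says nothing about the actual layout of lettercount.txt, and count_letters here is a made-up function, not the script count_letters.py.

```python
from collections import Counter


def count_letters(words):
    """For each letter, count the words that contain it at least once."""
    counts = Counter()
    for word in words:
        # set(word) makes a letter count once per word, however often it repeats.
        counts.update(set(word))
    return counts


# Hypothetical toy word list.
letter_counts = count_letters(["papa", "pass", "spa"])
print(sorted(letter_counts.items()))
```

Note that "p" gets a count of 3, not 4: the double "p" in "papa" counts only once.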

Database generation

Now, the fun stuff with "databases" starts. We generate a database letterset2count.kmrw. (The abbreviation kmrw is intended to stand for "Koch method - real words".) This database contains, for each set of letters, the number of words that can be built from (spelled with) those letters.

That file letterset2count.kmrw will be 268,435,456 bytes long, 2 ** 26 * 4.

An earlier version took roughly 10 hours on my laptop. After some optimization, that's down to about 1 hour 20 minutes.

This program tries to keep you entertained by occasionally posting estimates for when it'll finish. The time printed is (probably) UTC. With each estimate, it also posts a number, counting down from 2**26 = 67108864. As the hard work is done first, all estimates tend to be (decreasingly) pessimistic. The initial estimate is outright ridiculous.

The invoke targets are de-de.mk-lettercount-de-de, en-gb.mk-lettercount-en-gb, and en-us.mk-lettercount-en-us.

Upon completion, this process leaves a database file letterset2count.kmrw in your current directory. For any possible choice of a handful of letters from the alphabet a...z, that database gives you the number of words that can be spelled with only the letters from your chosen handful.
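To get a feeling for what such a table contains, here is a toy version over a four-letter alphabet. It uses a standard technique, the "sum over subsets" transform: first count words by the exact set of letters they use, then accumulate those counts into every larger set. I am not claiming that mk_bin_file.py works this way; this is merely one way to arrive at the same kind of table. The name build_table is hypothetical, and a pure Python list would be far too slow and large for the full 26 letters.

```python
def build_table(words, alphabet):
    """Return (table, bit); table[mask] = number of words spellable from mask."""
    bit = {letter: 1 << i for i, letter in enumerate(alphabet)}
    n = len(alphabet)
    table = [0] * (1 << n)
    # Step 1: count words by the exact set of letters they use.
    for word in words:
        mask = 0
        for letter in word:
            mask |= bit[letter]  # raises KeyError for letters outside alphabet
        table[mask] += 1
    # Step 2: sum over subsets, one bit at a time. Afterwards, table[mask]
    # holds the total over all letter sets contained in mask.
    for i in range(n):
        for mask in range(1 << n):
            if mask & (1 << i):
                table[mask] += table[mask ^ (1 << i)]
    return table, bit


# Hypothetical toy data: a=1, p=2, s=4, t=8.
table, bit = build_table(["papa", "pass", "spa", "tat", "ast"], "apst")
aps = bit["a"] | bit["p"] | bit["s"]
print(table[aps], table[-1])
```

In this toy table, the entry for the set a, p, s is 3 (papa, pass, spa), and the entry for all four letters is 5.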

To be interpreted, that database requires the lettercount.txt to be available as well. The details:

What's the deal with these database files?

You can skip this section if you are only interested in the results, not the details.

Database logic

The databases can be thought of as key/value maps.

The key is a set of letters a-z. Order of letters is ignored, as is repetition, so "aab" and "ba" come down to the same set of letters, hence the same key.

The value is simply an integer between 0 (inclusive) and 2^32 (exclusive): the number of words that can be spelled with just the letters from the set.
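In Python terms, such a key behaves like a frozenset, and the value is a plain count. A minimal sketch (letter_key and spellable_count are names invented for this illustration, not part of the software):

```python
def letter_key(letters):
    """Order and repetition do not matter, so a frozenset is a natural key."""
    return frozenset(letters)


def spellable_count(words, letters):
    """The value for a key: how many words use only letters from the set."""
    allowed = letter_key(letters)
    return sum(1 for word in words if set(word) <= allowed)


same_key = letter_key("aab") == letter_key("ba")
value = spellable_count(["ab", "ba", "abc", "b"], "aab")
print(same_key, value)
```

The word "abc" is not counted, because "c" is not in the set.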

Database file layout

Every individual letter is associated with a power of 2.

Which power of 2 depends on the number of words containing that letter, hence on the language. Common letters get higher powers of two, less common letters get lower ones. To see for yourself, change to the language directory that has the lettercount.txt file and there, execute this:

docker run --rm -ti --mount=type=bind,src=$(pwd),dst=/fromhost --user="$(id --user):$(id --group)" --workdir=/fromhost  registry.invalid/kmrw:latest letter2bitmask.py --lettercount lettercount.txt

Given a set of these letters, I simply "bitwise or" the corresponding powers of two. (If that sounds like a weird or interesting idea to you, I recommend you research "bitmasks".) (If you don't understand "bitwise or", you can think "add". As long as the same number isn't added twice, it boils down to the same thing.)

The result is a number. That number completely determines the set of letters, hence the key into the dictionary.
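The idea can be sketched independently of the real classes. The following is not the Letter2Bitmask implementation, just a minimal stand-in with made-up names (make_bitmasks, to_number, to_letters) and made-up counts. It assumes that the rarest letter gets 2^0, the next rarest 2^1, and so on, with ties broken alphabetically; the tie-breaking rule in particular is my assumption.

```python
def make_bitmasks(letter_counts):
    """Map each letter to a power of two; rarer letters get lower powers."""
    # Sort by count, then alphabetically, so that ties are deterministic.
    ordered = sorted(letter_counts, key=lambda ch: (letter_counts[ch], ch))
    return {ch: 1 << i for i, ch in enumerate(ordered)}


def to_number(letters, masks):
    """Bitwise-or the powers of two of all letters in the set."""
    number = 0
    for ch in set(letters):
        number |= masks[ch]
    return number


def to_letters(number, masks):
    """Recover the sorted list of letters from the number."""
    return sorted(ch for ch, mask in masks.items() if number & mask)


# Hypothetical counts: q is rarest, e is most common.
masks = make_bitmasks({"q": 1, "k": 5, "t": 40, "e": 90})
number = to_number("teke", masks)
print(masks, number, to_letters(number, masks))
```

Repeated letters make no difference: "teke" and "tek" map to the same number, because or-ing the same power of two twice changes nothing.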

If you have a lettercount file in your current directory, you can explore this mapping from sets of letters to numbers and reverse at the interactive python3 prompt. Here is one sample session.

Gory details:

The stuff following ">>>" or "..." is what I typed at the Python prompt. I typed two spaces before map = ..., and the next line was completely empty. (If you know Python, this will not surprise you.)
You may get different numbers if you use a lettercount file different from mine. (This particular example was done with the en-US wordlist.)

andreas@meise:~/amateurfunk/floss/python/koch-method-real-words/en_US
$ docker run --rm -ti --mount=type=bind,src=$(pwd),dst=/fromhost -e HOME=/tmp --user="$(id --user):$(id --group)" --workdir=/fromhost  registry.invalid/kmrw:latest python3
Python 3.7.3 (default, Jul 25 2020, 13:03:44)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from letters_rare_first import from_lettercount_file
>>> from letter2bitmask import Letter2Bitmask
>>> with open('lettercount.txt', 'r') as file:
...   map = Letter2Bitmask(from_lettercount_file(file))
...
>>> map.number("kochmethod")
34826368
>>> map.chars(34826368)
['c', 'd', 'e', 'h', 'k', 'm', 'o', 't']
>>> map.number(['c', 'd', 'e', 'h', 'k', 'm', 'o', 't'])
34826368
>>> [map.number(ch) for ch in ['c', 'd', 'e', 'h', 'k', 'm', 'o', 't']]
[16384, 65536, 33554432, 2048, 128, 8192, 1048576, 131072]
>>> 16384 | 65536 | 33554432 | 2048 | 128 | 8192 | 1048576 | 131072
34826368
>>> 16384 + 65536 + 33554432 + 2048 + 128 + 8192 + 1048576 + 131072
34826368
>>> exit(0)

Letters common in the language are coded to larger numbers. This tremendously helps speed up the database generation and also makes kmrw files more easily compressible (with xz, zip, and the like).

So, what we now have is a key that still represents a set of letters, but actually is a mere integral number.

The database format is rather simple. Here is how it works:

Multiply that number by 4 to get at a position. E.g., the number 34826368 would yield a position of 139305472.
Go to that position in the file, and read the next four bytes.
Interpret those four bytes as a 32-bit integer - and voilà, that is the value.

In my example, that value happened to be 270:

$ docker run --rm -ti --mount=type=bind,src=$(pwd),dst=/fromhost --user="$(id --user):$(id --group)" --workdir=/fromhost  registry.invalid/kmrw:latest /bin/bash -c "dd if=letterset2count.kmrw bs=1 skip=139305472 count=4 | od -td"
4+0 records in
4+0 records out
0000000         270
0000004
4 bytes copied, 0.0213296 s, 0.2 kB/s

That can be verified by straightforwardly counting all words in the wordlist.txt file that consist of those letters:

$ docker run --rm -ti --mount=type=bind,src=$(pwd),dst=/fromhost --user="$(id --user):$(id --group)" --workdir=/fromhost  registry.invalid/kmrw:latest /bin/bash -c "grep -P '^[cdehkmot]+$' wordlist.txt | wc -l"
270
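For those who prefer Python over grep, the same check can be sketched like this. The function name count_spellable and the toy lines are made up; with the real wordlist.txt you would pass the open file instead of the list.

```python
import re


def count_spellable(lines, letters):
    """Count words made up solely of the given letters (like grep | wc -l)."""
    chosen = "".join(sorted(set(letters)))
    if not chosen:
        return 0
    pattern = re.compile("[" + re.escape(chosen) + "]+")
    # fullmatch() plays the role of the ^ and $ anchors in the grep call.
    return sum(1 for line in lines if pattern.fullmatch(line.strip()))


# Hypothetical stand-in for the lines of wordlist.txt.
toy_lines = ["the\n", "them\n", "dock\n", "mode\n", "apple\n", "echo\n", "moth\n"]
matches = count_spellable(toy_lines, "cdehkmot")
print(matches)
```

Of the seven toy words, only "apple" contains letters outside the set, so six remain.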

So that's what a .kmrw file is. (If you know what "memory mapped" means: Internally in my Python program, it's a memory mapped array of integers. That's all.)
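The lookup steps (multiply by 4, seek, read four bytes) can be sketched in Python as well. Two things here are assumptions, not facts from the software: the byte order (little-endian, as is native on most PCs) and the tiny stand-in file, which holds just eight made-up values. The name read_count is hypothetical.

```python
import struct
import tempfile
from pathlib import Path


def read_count(path, key):
    """Read the 32-bit value stored for key at byte offset key * 4."""
    with open(path, "rb") as f:
        f.seek(key * 4)
        data = f.read(4)
    # "<I" means little-endian unsigned 32-bit. The byte order is an assumption.
    (value,) = struct.unpack("<I", data)
    return value


with tempfile.TemporaryDirectory() as tmp:
    # Tiny stand-in file with made-up values for the keys 0..7.
    demo = Path(tmp) / "demo.kmrw"
    demo.write_bytes(struct.pack("<8I", 0, 1, 1, 3, 0, 2, 1, 5))
    value = read_count(demo, 5)

print(value)
```

With the real file and the key from the sample session, the offset would be 34826368 * 4 = 139305472, the same position used with dd above.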

Don't open a .kmrw file with standard tools (unless for you, a hex viewer or similar qualifies as "standard tool").