User:Morenus/MorseCodeOptimization

Abstract
The lengths of Morse codes nearly follow letter frequency, with shorter codes for more common letters, but there are some odd exceptions. Why those exceptions? Could the code be better? Different languages have different letter frequencies, but that doesn't entirely answer the question.

Discussion
Today's Morse code is not quite the same as the code invented by Samuel F. B. Morse with Alfred Vail in the 1830s, now known as American Morse Code.

International Morse Code was originally created in 1848 by Friedrich Clemens Gerke, a German. It was adopted with minor changes as a European standard in 1865, and became widely used for radio in the 1890s. It was never adopted for American telegraphy, as American Morse Code was 5% faster (for American purposes). But American Morse Code died with the telegraph, and the Morse code of today is International Morse Code.



The shortest unit of Morse code is a "dit". In textbook, full-speed Morse, a "dah" is conventionally 3 times as long as a dit. The gap between dits and dahs within a character is the length of one dit; the gap between letters in a word is the length of a dah (3 dits); and the gap between words is 7 dits. Thus, the letter E has the shortest code, at one dit, while the letters Q, Y, and J are tied for longest at 13 dits each, counting the gaps inside each character. The letter E is also the most frequently used letter in English, French, German, Spanish, and Italian, while Q, Y, and J are uncommon. This is no accident; both American and International Morse codes were designed to transmit common letters more quickly.
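The timing rules above can be checked with a short sketch. It assumes the standard International Morse Code patterns for the 26 basic Latin letters; the names `MORSE`, `letter_length`, and `text_length` are made up for this illustration.

```python
# International Morse Code for the 26 basic Latin letters.
MORSE = {
    "A": ".-",   "B": "-...", "C": "-.-.", "D": "-..",  "E": ".",
    "F": "..-.", "G": "--.",  "H": "....", "I": "..",   "J": ".---",
    "K": "-.-",  "L": ".-..", "M": "--",   "N": "-.",   "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.",  "S": "...",  "T": "-",
    "U": "..-",  "V": "...-", "W": ".--",  "X": "-..-", "Y": "-.--",
    "Z": "--..",
}

# All durations are measured in dits.
DIT, DAH = 1, 3
ELEMENT_GAP = 1   # between dits and dahs inside one letter
LETTER_GAP = 3    # between letters in a word
WORD_GAP = 7      # between words


def letter_length(letter: str) -> int:
    """Duration of one letter, including the gaps inside it."""
    code = MORSE[letter.upper()]
    elements = sum(DIT if mark == "." else DAH for mark in code)
    return elements + ELEMENT_GAP * (len(code) - 1)


def text_length(text: str) -> int:
    """Duration of a text made of letters and spaces."""
    total = 0
    for index, word in enumerate(text.split()):
        if index > 0:
            total += WORD_GAP
        total += sum(letter_length(ch) for ch in word)
        total += LETTER_GAP * (len(word) - 1)
    return total


# Letters tied for the longest code.
longest = max(letter_length(ch) for ch in MORSE)
print(sorted(ch for ch in MORSE if letter_length(ch) == longest))
```

With these definitions, `letter_length("E")` is 1, `letter_length("T")` and `letter_length("I")` are both 3, and Q, Y, and J each come out at 13, matching the figures above.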

Is International Morse Code optimal for any one language? Letter frequencies differ somewhat from language to language, as shown in the graphs above and the table below. The table also shows the Morse code for each letter, drawn at its proper relative length. At the top of each language column is a symbol; click on it twice to sort the table by letter frequency for that language. If the code fit that language perfectly, sorting this way would put the shortest Morse codes at the top of the table and the longest at the bottom.

Morse code and relative letter frequencies, for six languages
Try sorting for letter frequencies in the different languages.

Conclusion
International Morse Code is certainly not the best fit for English. French and German seem to fare best, in different ways. Only in French are the six shortest codes assigned to the six most frequent letters, but overall, code length fits German letter frequency most closely. The anomalies that slow the code down for most languages are good only for German. Since it was invented by a German for German use, and became an international standard in Paris, France, perhaps that is not surprising. It was never used for American telegraphy.

For English, Spanish, and Italian, the code for O stands out as far too long. All six languages would benefit from a shorter O. All would benefit from a longer M. If the codes for O and M were switched, International Morse Code would work better in most languages, with a negligible hit to German, and big improvements to English, Spanish, and Italian.
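The effect of swapping two codes can be estimated with a little arithmetic: only the two swapped letters change cost, so the average saving per letter sent is (f_a - f_b) × (len_a - len_b). The sketch below is illustrative only. The function name `swap_saving` is made up, and the frequencies in the example are round placeholder values, not the figures from the table. The lengths follow from the timing rules: O is three dahs plus two gaps (11 dits), and M is two dahs plus one gap (7 dits).

```python
def swap_saving(freq_a: float, len_a: int, freq_b: float, len_b: int) -> float:
    """Average dits saved per letter sent if letters a and b exchange codes.

    freq_a, freq_b: relative frequencies of the two letters (0.0 to 1.0).
    len_a, len_b:   current code lengths of the two letters, in dits.
    A negative result means the swap makes the code slower.
    """
    before = freq_a * len_a + freq_b * len_b
    after = freq_a * len_b + freq_b * len_a
    return before - after


O_LENGTH, M_LENGTH = 11, 7   # dits, from the timing rules

# Placeholder frequencies (assumed for illustration, not from the table):
# a language where O is three times as common as M.
saving = swap_saving(0.075, O_LENGTH, 0.025, M_LENGTH)
print(f"{saving:.2f} dits saved per letter")

# When the two letters are about equally common, the swap barely matters.
print(f"{swap_saving(0.025, O_LENGTH, 0.025, M_LENGTH):.2f}")
```

Because the saving scales with the difference in frequency, a language in which O and M are about equally common gains or loses almost nothing from the swap, while one in which O is far more common than M gains the most.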

A is a very important letter, except in German. It is the second most frequent in Spanish and Italian, and third in English. Yet the code for A is longer than the codes for I and T, which are tied for second-shortest. Every language (except German) would benefit from switching I and A, to shorten the code for A.

The only specific optimization for English appears to be the code for T, its second most common letter. In most other languages, T is much farther down the list. As mentioned above, T is tied with I for the second-shortest code.

There is no code for the German character ß, which stands for SS, but SS is already the right length, as the table shows. The poor French apparently have no codes for œ or ê.