Rare letter combinations and key chords

A bigram is a pair of letters. For various reasons—word games, cryptography, user interface development, etc.—people are interested in knowing which bigrams occur most often, and so such information is easy to find. But sometimes you might want to know which bigrams occur least often, and that’s harder to find. My interest is finding safe key-chord combinations for Emacs.

Peter Norvig calculated frequencies for all pairs of letters based on the corpus Google extracted from millions of books. He gives a table that will show you the frequency of a bigram when you mouse over it. I scraped his HTML page to create a simple CSV version of the data. My file lists bigrams, frequencies to three decimal places, and the raw counts: bigram_frequencies.csv. The file is sorted in decreasing order of frequency.
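Here is a minimal Python sketch of how one might read such a file and list the rarest bigrams first. It assumes each row has the three columns in the order described above (bigram, percentage, raw count). The exact layout of bigram_frequencies.csv, including whether it has a header row, is an assumption, and the sample rows are made-up placeholders rather than real data.

```python
import csv
import io

def read_bigrams(lines):
    """Parse rows assumed to look like: bigram, percent, count."""
    rows = []
    for fields in csv.reader(lines):
        if len(fields) < 3:
            continue  # skip blank or short lines
        bigram, percent, count = (f.strip() for f in fields[:3])
        try:
            rows.append((bigram.upper(),
                         float(percent.rstrip("%")),
                         int(count.replace(",", ""))))
        except ValueError:
            continue  # skip a header row or a malformed line
    return rows

# Placeholder rows for illustration only; these are NOT real frequencies.
sample = io.StringIO("bigram,percent,count\nTH,3.5,100\nJQ,0.000,1\n")
rows = read_bigrams(sample)

# The file is sorted most-frequent first; sort by raw count to see the rarest first.
rarest_first = sorted(rows, key=lambda r: r[2])

# With the real file one would presumably do something like:
#   with open("bigram_frequencies.csv", newline="") as f:
#       rows = read_bigrams(f)
```

Sorting by the raw count rather than the rounded percentage keeps the ordering meaningful among the many bigrams whose percentage rounds to zero.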

The Emacs key-chord module lets you bind pairs of letters to Emacs commands. For example, if you map a command to jk, that command will execute whenever you type j and k in quick succession. In that case, if you want the literal sequence “jk” to appear in a file, pause between typing the j and the k. This may sound like a bad idea, but I haven’t run into any problems using it. It allows you to execute your most frequently used commands very quickly. Also, there’s no danger of conflict, since neither basic Emacs nor any of its common packages uses key chords.

The table below gives bigrams whose percentage frequency rounds to zero at three decimal places. See the data file for details.

Since Q is always followed by U in native English words, it’s safe to combine Q with any other letter. (If you need to type Qatar, just pause a little after typing the Q.) It’s also safe to use any consonant after J and most consonants after Z. (It’s rare for a consonant to follow Z, but not quite rare enough to round to zero. ZH and ZL occur with 0.001% frequency, ZY 0.002% and ZZ 0.003%.)

Double letters make especially convenient key chords since they’re easy to type quickly. JJ, KK, QQ, VV, WW, and YY all have frequency rounding to zero. HH and UU have frequency 0.001% and AA, XX, and ZZ have frequency 0.003%.
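As a rough illustration, the doubled-letter candidates can be picked out with a small filter. The percentages in this sketch are the ones quoted in the paragraph above. The function name and the `threshold` parameter are hypothetical, introduced only for this example.

```python
# Percent frequencies of doubled letters, as quoted in the post.
freq = {
    "JJ": 0.000, "KK": 0.000, "QQ": 0.000,
    "VV": 0.000, "WW": 0.000, "YY": 0.000,
    "HH": 0.001, "UU": 0.001,
    "AA": 0.003, "XX": 0.003, "ZZ": 0.003,
}

def doubled_candidates(freq, threshold=0.0):
    """Return doubled-letter bigrams whose percent frequency is <= threshold."""
    return sorted(
        bigram for bigram, percent in freq.items()
        if len(bigram) == 2 and bigram[0] == bigram[1] and percent <= threshold
    )

strict = doubled_candidates(freq)          # only those rounding to zero
relaxed = doubled_candidates(freq, 0.001)  # also allow 0.001%
```

Raising the threshold trades a few more available chords for a slightly higher chance of triggering one while typing ordinary text.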

Note that the discussion above does not distinguish upper and lower case letters in counting frequencies, but Emacs key chords are case-sensitive. You could make a key chord out of any pair of capital letters unless you like to shout in online discussions, use a lot of acronyms, or write old-school FORTRAN.

Update (2 Feb 2015):

This post only considered ordered bigrams. But Emacs key chords are unordered: they are combinations of keys pressed at or near the same time. This means, for example, that qe would not be a good key chord because although QE is a rare bigram, EQ is not (0.057%). The file unordered_bigram_frequencies.csv gives the combined frequency of each bigram and its reverse (except for double letters, in which case it simply gives the frequency of the double letter).
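The combining step can be sketched in a few lines of Python. This is an illustration of the idea, not necessarily how the unordered file was produced. The function name is hypothetical, the EQ and ZZ values come from the post, and the QE value of 0.000 is an assumption based on its being described as rare.

```python
def unordered_frequencies(ordered):
    """Combine each ordered bigram with its reverse.

    `ordered` maps two-letter strings to percent frequencies. The result is
    keyed by the two letters in alphabetical order. A doubled letter has no
    distinct reverse, so it simply keeps its own frequency.
    """
    combined = {}
    for bigram, percent in ordered.items():
        key = "".join(sorted(bigram.upper()))
        combined[key] = combined.get(key, 0.0) + percent
    # Round to three decimals to match the precision of the source data.
    return {key: round(total, 3) for key, total in combined.items()}

# EQ and ZZ are from the post; QE = 0.000 is an assumed value.
ordered = {"QE": 0.000, "EQ": 0.057, "ZZ": 0.003}
unordered = unordered_frequencies(ordered)
```

Because the key is the sorted pair of letters, QE and EQ land in the same entry, which is why a chord on q and e inherits the higher frequency of EQ.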

Combinations of J and a consonant are still mostly good key chords except for JB (0.023%), JN (0.011%), and JD (0.005%).

Combinations of Q and a consonant are also good key chords except for QS (0.007%), QN (0.006%), and QC (0.005%). And although O is a vowel, QO still makes a good key chord (0.001%).

9 thoughts on “Rare letter combinations and key chords”

  1. I wonder if the statistics differ between code and prose. E.g. there’s a bunch of POSIX calls that start with mq (for message queue) and mk (for make).

  2. Much of the corpus predates the web, but not all of it. And you can tell that books containing source code were included because you’ll see fragments of code in their list of “words.”

  3. Aleksandr Vinokurov

    I agree with Petr: it is rather arguable whether key chords suit coding, especially with CamelCase naming and abbreviations. Plus, how do they coexist with one-key commands (like g in compilation-mode)?

  4. That’s the beauty of a massively configurable editor. You can make up the key chords that make sense for your context, or not use them at all.

  5. By default the order in key-chord-mode doesn’t matter because its intent is for simultaneous (or very close to it) keypresses. Personally I’d rather have ordered key chords. For me it’s a lot easier to type “rolling” key chords than simultaneous ones. Luckily it’s very easy to redefine the key-chord-define function to make it only work with ordered keypresses (just remove the last s-expression in the function). This gives you a lot more options and you don’t lose any functionality. You only have to use 2 bindings in case you want them unordered.
