In this post, I’ll share some typical hurdles that come up when you start a project. Something that sounded easy (getting an online dictionary) turned out to be one of the major challenges in developing our word game of chance.
Language is dynamic and changes over time, with new words added every year. That’s why it’s impossible to keep a perfect dictionary. It’s also quite a challenge to decide which words qualify for the player’s dictionary – some words are just too tough to recall.
We built our dictionary by scraping many, many words from the world wide web. In total, over 25,000 five-letter words made it into our starting database – from “again” to “loofa” (which, of course, is “the dried fibrous part of the fruit of a plant of the genus Luffa; used as a washing sponge or strainer”) to “zupus” (which is not a word; please prove me wrong).
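To make that first pass concrete, here is a minimal Python sketch that reduces scraped text to a set of unique five-letter candidates. The function name and the tokenisation rule (runs of ASCII letters, lowercased) are assumptions for illustration, not necessarily how our scraper worked.

```python
import re


def five_letter_candidates(scraped_text: str) -> set[str]:
    """Return the unique five-letter words found in scraped_text (hypothetical helper)."""
    # Assumption: a "word" is any run of ASCII letters, so "loofa!" -> "loofa".
    # Apostrophes and accented letters split tokens; a real scraper may differ.
    tokens = re.findall(r"[A-Za-z]+", scraped_text)
    # Normalise case and keep only tokens that are exactly five letters long.
    return {token.lower() for token in tokens if len(token) == 5}


if __name__ == "__main__":
    sample = "Again, a loofa! Zupus? The cat sat on the mat; apples, again."
    print(sorted(five_letter_candidates(sample)))
```

Note that "zupus" survives this step: a length check says nothing about whether a string is a real word, which is why a dictionary lookup has to come next.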
Filtering out non-existent words and tough-to-recall words
We then used an online dictionary (shout out to WordsAPI), which helped us a ton in getting to a playworthy dictionary. With this API, we could get a score indicating how common a word is in English, derived from an analysis of years of BBC subtitles.
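Once a frequency score is available for each candidate, the filter itself is a simple threshold. The sketch below assumes the scores have already been fetched into a plain dict (the API call is not shown), and the cutoff `MIN_FREQUENCY` and its scale are hypothetical values chosen for illustration.

```python
from typing import Mapping

# Hypothetical cutoff: words scoring below this are treated as too rare.
# The real threshold, and the scale of the score, are assumptions.
MIN_FREQUENCY = 3.0


def filter_by_frequency(
    candidates: set[str],
    frequency: Mapping[str, float],
    min_frequency: float = MIN_FREQUENCY,
) -> set[str]:
    """Keep candidates that have a known score of at least min_frequency.

    Words with no score at all (e.g. "zupus") are dropped too, which removes
    strings the dictionary does not recognise as words.
    """
    return {
        word
        for word in candidates
        if word in frequency and frequency[word] >= min_frequency
    }


if __name__ == "__main__":
    scores = {"again": 6.1}  # made-up score, for illustration only
    print(filter_by_frequency({"again", "zupus"}, scores))
```

Dropping unscored words in the same pass handles both goals of this section at once: non-existent words and words that are too rare.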
Love it! But after going through the set, we found that many names and far-fetched words still had a high frequency score. So we manually went through the whole list and tried to delete:
- Uncommon words
- Names of people, rivers, places, brands (except countries)
This left us with a dictionary of 2,823 words.
From this point on, we will improve the dictionary reactively. Every month, players can flag words they think are too uncommon to be included, or too common to be excluded. The following month, the community decides by vote whether each flagged word should be included.
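One simple way such a monthly vote could be tallied is a per-word majority. In this minimal sketch, each ballot is a hypothetical `(word, include)` pair, and the minimum-turnout rule (`MIN_VOTES`) and tie handling are assumptions rather than our actual rules.

```python
from collections import Counter
from typing import Iterable

# Hypothetical rule: ignore words that received fewer votes than this.
MIN_VOTES = 10


def apply_votes(
    dictionary: set[str],
    ballots: Iterable[tuple[str, bool]],
    min_votes: int = MIN_VOTES,
) -> set[str]:
    """Return an updated dictionary after a simple majority vote per word.

    A word is added if most votes say "include" and removed if most say
    "exclude". Words with too few votes, or a tied vote, keep their status.
    """
    include: Counter = Counter()
    exclude: Counter = Counter()
    for word, vote_include in ballots:
        (include if vote_include else exclude)[word] += 1

    updated = set(dictionary)
    for word in include.keys() | exclude.keys():
        yes, no = include[word], exclude[word]
        if yes + no < min_votes or yes == no:
            continue  # not enough turnout, or a tie: leave unchanged
        if yes > no:
            updated.add(word)
        else:
            updated.discard(word)
    return updated


if __name__ == "__main__":
    ballots = [("loofa", True)] * 7 + [("loofa", False)] * 5 + [("again", False)] * 3
    print(sorted(apply_votes({"again"}, ballots)))
```

Requiring a minimum turnout keeps a handful of votes from removing an otherwise common word; the actual threshold would need tuning to the size of the player base.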
If a sharp-eyed player finds a word that shouldn’t have qualified in the first place (such as a name or a brand), we will immediately delete the word from our vocabulary and the player.
Thanks in advance to everyone participating in improving our service!