Codenames Helper

In the game Codenames, you need to find a word that is associated with one or more given words. Here, you can input one to six words and press "Get Words", and ten candidate connecting words will appear (it should take a few seconds). In my experience, usually one or two are actually helpful.

The program works by representing each word's meaning as a vector, and finding the words whose vectors maximize the product of cosine similarities with each of your input words' vectors (see the further explanation below). The output words are in decreasing order of relevance.

Explanation:

This project uses word vectors, which are a bunch of numbers that are supposed to represent the meaning of a word.

How does that work in practice? For example, letting k be the vector for the word 'king', m the vector for 'man', w for 'woman', and q for 'queen', we might have:

k - m + w ≈ q.

Also, the Euclidean distance between k and q should be much less than between k and w, and the cosine similarity (see below) between k and q should be much greater than between k and w.
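To make this concrete, here is a minimal sketch of these similarity measures in Python with NumPy (the `vecs` dictionary in the comments is hypothetical, standing in for loaded GloVe vectors):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Dot product divided by the product of the vectors' magnitudes."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical usage, assuming `vecs` maps words to their vectors:
# k, m, w, q = vecs["king"], vecs["man"], vecs["woman"], vecs["queen"]
# np.linalg.norm((k - m + w) - q)   # small distance: the analogy roughly holds
# cosine_similarity(k, q)           # should be much greater than...
# cosine_similarity(k, w)
```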

I'm not going to go into the details of how these vectors are found here. I'll just say that nowadays, word vectors tend to be derived from the first layer of large language models, and before that there were two popular (related) statistical word vector models, word2vec and GloVe.

I use pre-trained GloVe vectors here. Specifically, I'm starting with 50-dimensional vectors trained on a 6 billion-word dataset with a vocabulary of ~400,000 (from the official website). I then take the 40,000 most common words, removing non-words like "42" or "," as well as some foreign words and proper nouns that can't be used as clues in Codenames. I did have to manually add some multi-word phrases (e.g. "ice cream", "new york") that appear in Codenames but not in the GloVe vocabulary, which only contains single words. I estimated their vectors with some combination of the vectors of the individual words: for example, the vector for "ice cream" is the average of the vectors for "ice" and "cream".
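Roughly, the vocabulary construction looks like the sketch below. This is illustrative, not my exact code: it relies on the official GloVe files listing words from most to least frequent, and the `isalpha` filter is only a stand-in for the manual curation described above.

```python
import numpy as np

def load_glove(path: str) -> dict[str, np.ndarray]:
    """Parse a GloVe text file: each line is a word followed by its vector."""
    vecs = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vecs[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vecs

vecs = load_glove("glove.6B.50d.txt")  # file from the official GloVe site

# The file is ordered by corpus frequency, so keep the first 40,000 entries
# that look like real words (this skips tokens like "42" and ",").
vocab = {}
for word, vec in vecs.items():
    if len(vocab) >= 40_000:
        break
    if word.isalpha():
        vocab[word] = vec

# Multi-word Codenames terms aren't in GloVe; estimate them by averaging.
vocab["ice cream"] = (vecs["ice"] + vecs["cream"]) / 2
vocab["new york"] = (vecs["new"] + vecs["york"]) / 2
```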

As described in the GloVe paper (on page 8), I'm calculating similarity scores between two words by taking the cosine similarity between their word vectors (= dot product divided by the product of the vectors' magnitudes). Cosine similarity is the right measure here because the raw vectors vary quite a bit in magnitude. As a shortcut, I first normalize the vectors so that their magnitudes are 1, which means that the cosine similarity is just the dot product.
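Continuing the sketch above, the normalization step might look like this (note that axis=1 normalizes each word's vector, not each feature; see the update at the bottom of this page):

```python
import numpy as np

words = list(vocab)                      # vocab from the sketch above
M = np.stack([vocab[w] for w in words])  # one row per word

# Normalize each row to unit length; with unit vectors, the cosine
# similarity between word i and word j is just the dot product M[i] @ M[j].
M = M / np.linalg.norm(M, axis=1, keepdims=True)
```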

To find the best suggestions for multiple input words, I take the cosine similarity of each word in the vocabulary with each of the input words. Then I take the geometric mean of these cosine similarities, and call that the word's relevance score. The geometric mean penalizes words that are very close to one of the input words but far from the others. The algorithm then just returns the 10 words with the highest relevance scores, excluding the input words themselves.
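A sketch of that scoring step, building on the code above (the clip is my own guard against non-positive similarities, which would break the geometric mean; it's an assumption, not necessarily how the real implementation handles that case):

```python
import numpy as np

def suggest(inputs: list[str], k: int = 10) -> list[str]:
    """Rank vocabulary words by the geometric mean of their cosine
    similarities to the input words (M holds unit-length row vectors)."""
    idx = [words.index(w) for w in inputs]
    sims = M @ M[idx].T                   # shape: (vocab size, number of inputs)
    sims = np.clip(sims, 1e-9, None)      # assumption: floor non-positive sims
    scores = sims.prod(axis=1) ** (1.0 / len(inputs))
    best = np.argsort(scores)[::-1]
    return [words[i] for i in best if words[i] not in inputs][:k]
```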

I use one more optimization. In the base Codenames game, there is a limited vocabulary of 400 words, so I can pre-compute the top 1000 or 3000 closest words to each of these, along with their cosine similarities. Then, when two real Codenames words are given as input, I can just find the intersection of those two sets of similar words and order the intersection by the geometric mean of the similarity scores. This is much faster than calculating cosine similarities with all 40,000 words in the GloVe vocabulary, but in practice both approaches seem fast enough that most of the lag is due to the network delay in querying my server, where the computation is performed.
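The precomputation and lookup might look like this (again a sketch building on the code above; `codenames_words`, standing for the 400-word list, is a hypothetical name):

```python
TOP_N = 1000  # or 3000

# For each official Codenames word, store its TOP_N nearest neighbors.
nearest = {}
for w in codenames_words:
    sims = M @ M[words.index(w)]
    top = np.argsort(sims)[::-1][1:TOP_N + 1]   # index 0 is the word itself
    nearest[w] = {words[i]: float(sims[i]) for i in top}

def suggest_fast(a: str, b: str, k: int = 10) -> list[str]:
    """Intersect the two neighbor sets; rank by geometric mean of sims."""
    common = nearest[a].keys() & nearest[b].keys()
    ranked = sorted(common,
                    key=lambda w: (nearest[a][w] * nearest[b][w]) ** 0.5,
                    reverse=True)
    return ranked[:k]
```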

In the future, it would not be difficult to incorporate avoiding the opposing team's words and the assassin word, but the program as-is struggles with more than two words, so I don't think that would be very useful to implement. It would also be very interesting to input a set of 8 or 9 words (as at the beginning of a game) and find the partition into sets of 2, 3, or 4 that maximizes the relatedness within each set; a brute-force sketch of that idea follows.
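Purely hypothetically, that could look something like this, reusing M and words from the sketches above (`team_words` stands for the 8 or 9 input words, and the scoring here is an arbitrary choice, not something I've tested):

```python
from itertools import combinations

def partitions(items, sizes=(2, 3, 4)):
    """Yield every partition of `items` into parts of the allowed sizes."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for size in sizes:
        for others in combinations(rest, size - 1):
            part = (first,) + others
            remaining = [x for x in rest if x not in others]
            for tail in partitions(remaining, sizes):
                yield [part] + tail

def set_score(part):
    """Average pairwise cosine similarity within one set of words."""
    pairs = list(combinations(part, 2))
    return sum(M[words.index(a)] @ M[words.index(b)] for a, b in pairs) / len(pairs)

best = max(partitions(team_words), key=lambda p: sum(map(set_score, p)))
```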

In the meantime, enjoy! As I'm writing this, I am realizing that many people on the Internet have had the same idea, but it was still cool to put together something that is personally useful.

Update 4/26: This page works properly now. Previously, I was using PyScript to load a larger vocabulary of ~73,300 GloVe vectors into the user's browser and then run the similarity algorithm there, which took a couple hundred megabytes of the user's RAM and was rather slow. I've now offloaded the computation onto a Hugging Face Space, which I query through a proxy hosted on Netlify in order to more securely store my Hugging Face private token. I was also previously normalizing the vectors along the wrong axis: I was normalizing each feature instead of making the length of each vector 1. This came from my misreading the GloVe paper and made the results somewhat less useful.
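For concreteness, the difference between the two normalizations in NumPy (the matrix here is just a random placeholder):

```python
import numpy as np

M = np.random.rand(40_000, 50)  # placeholder word-vector matrix, one row per word

# The bug: axis=0 normalizes each feature column across all words.
wrong = M / np.linalg.norm(M, axis=0, keepdims=True)

# The fix: axis=1 normalizes each word vector (each row) to unit length.
right = M / np.linalg.norm(M, axis=1, keepdims=True)
```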