This site uses the word-finding API Datamuse and the Python package python-datamuse to retrieve data for English words. D3 Observable is used to generate visualizations illustrating this data. The site was built using Django.
Visualizations:
- Word Frequencies: A chart illustrating the frequency of each word in a Word Set. Here, frequency refers to how often a word appears per million words of English text (according to Google Books Ngrams via Datamuse).
- Word Frequencies Scatterplot: For all words in a Word Set, plots the frequency vs. the number of occurrences in the Word Set.
- Related Words: For a given word, displays its related words for one or more word relationship types on a radial dendrogram.
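The data behind the Word Frequencies Scatterplot can be sketched as follows. This is only an illustration, not the site's actual code: it assumes a Word Set is available as a plain list of strings that may contain repeats, and that per-word frequencies have already been fetched into a dictionary. The function name `scatterplot_points` and the sample data are hypothetical.

```python
from collections import Counter


def scatterplot_points(words, frequencies):
    """Pair each distinct word's frequency with its number of occurrences.

    `words` is a hypothetical Word Set given as a list of strings.
    `frequencies` maps a word to its frequency per million words.
    Words with no known frequency are skipped.
    """
    counts = Counter(word.lower() for word in words)
    return [
        {"word": word, "frequency": frequencies[word], "occurrences": count}
        for word, count in counts.items()
        if word in frequencies
    ]


# Sample data for illustration only.
word_set = ["tube", "results", "Tube", "unknownword"]
known_frequencies = {"results": 345.463352, "tube": 59.310244}

points = scatterplot_points(word_set, known_frequencies)
for point in points:
    print(point)
```

Each resulting dictionary corresponds to one point on the plot: frequency on one axis, occurrences in the Word Set on the other.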
Word Relationships
These are the types of relationships that can be used when asking Datamuse for words related to a certain word, identified by the three-letter code used in the rel_[code] Datamuse parameter:
| [code] | Description | Example |
|---|---|---|
| jja | Popular Related Nouns (nouns that are frequently associated with the given word) | gradual → increase |
| jjb | Popular Related Adjectives (adjectives that are frequently associated with the given word) | beach → sandy |
| syn | Synonyms | ocean → sea |
| trg | "Triggers" (words that are statistically associated with the query word in the same piece of text) | cow → milking |
| ant | Antonyms | late → early |
| spc | Direct Hypernyms (words with a similar, but broader meaning) | gondola → boat |
| gen | Direct Hyponyms (words with a similar, but more specific meaning) | boat → gondola |
| com | "Comprises" (things which this is composed of) | car → accelerator |
| par | "Part of" (things of which this is a part) | trunk → tree |
| bga | Frequent followers (words that frequently follow this) | wreak → havoc |
| bgb | Frequent predecessors (words that frequently precede this) | havoc → wreak |
| rhy | Rhymes ("perfect" rhymes) | spade → aid |
| nry | Near rhymes (that is, approximate rhymes) | forest → chorus |
| hom | Homophones (sound-alike words) | course → coarse |
| cns | Consonant matches | sample → simple |
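As a rough illustration of how a code from the table ends up in a request, the sketch below builds a Datamuse query URL using only the Python standard library. The helper name `build_related_words_url` is hypothetical and is not part of python-datamuse or of this site; the `md` and `max` parameters are the ones that appear in the example query in this README.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.datamuse.com/words"


def build_related_words_url(word, code, metadata=None, max_results=None):
    """Return a Datamuse /words URL asking for words related to `word`.

    `code` is one of the three-letter relationship codes, e.g. "jja".
    """
    params = {f"rel_{code}": word}
    if metadata:
        params["md"] = metadata  # e.g. "dpf", as in the README's example query
    if max_results is not None:
        params["max"] = max_results
    return f"{BASE_URL}?{urlencode(params)}"


url = build_related_words_url("test", "jja", metadata="dpf", max_results=3)
print(url)
```

The function only builds the URL; fetching it (for example with python-datamuse or any HTTP client) is left out so that the sketch runs without network access.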
Format of Datamuse results:
Queries to Datamuse return JSON such as the following (the response for http://api.datamuse.com/words?rel_jja=test&md=dpf&max=3):
[
    {
        "word": "results",
        "score": 1001,
        "tags": [
            "n",
            "f:345.463352"
        ],
        "defs": [
            "n\tsomething that results",
            "n\tthe semantic role of the noun phrase whose referent exists only by virtue of the activity denoted by the verb in the clause",
            "n\ta statement that solves a problem or explains how to solve the problem",
            "n\ta phenomenon that follows and is caused by some previous phenomenon"
        ],
        "defHeadword": "result"
    },
    {
        "word": "tube",
        "score": 1000,
        "tags": [
            "n",
            "f:59.310244"
        ],
        "defs": [
            "n\tconduit consisting of a long hollow object (usually cylindrical) used to hold and conduct objects or liquids or gases",
            "n\telectronic device consisting of a system of electrodes arranged in an evacuated glass or metal envelope",
            "n\t(anatomy) any hollow cylindrical body structure",
            "n\ta hollow cylindrical shape",
            "n\telectric underground railway"
        ]
    },
    {
        "word": "scores",
        "score": 999,
        "tags": [
            "n",
            "f:43.123432"
        ],
        "defs": [
            "n\ta large number or amount"
        ]
    }
]
Here, score is an arbitrary number indicating the popularity of the word in English text, or in queries involving word relationships, the strength of the relationship (higher numbers indicate more popular or relevant words).
The items in "tags" indicate either the parts of speech of the word (such as "n" for noun) or the word's frequency in English text according to Google Books Ngrams ("f:...").
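A minimal sketch of pulling these fields apart in Python is shown below. It is not taken from the site's code. The function names are hypothetical, and the set of part-of-speech codes ("n", "v", "adj", "adv", "u") is an assumption based on the Datamuse documentation; any other tag is ignored.

```python
# Assumed part-of-speech codes; tags outside this set are ignored.
PARTS_OF_SPEECH = {"n", "v", "adj", "adv", "u"}


def parse_tags(tags):
    """Split a "tags" list into (parts_of_speech, frequency).

    `frequency` is a float taken from the "f:..." tag, or None if absent.
    """
    parts_of_speech = []
    frequency = None
    for tag in tags:
        if tag.startswith("f:"):
            frequency = float(tag[2:])
        elif tag in PARTS_OF_SPEECH:
            parts_of_speech.append(tag)
    return parts_of_speech, frequency


def parse_defs(defs):
    """Split each "pos<TAB>definition" string into a (pos, definition) pair."""
    return [tuple(definition.split("\t", 1)) for definition in defs]


# One entry copied from the sample response above.
entry = {
    "word": "scores",
    "score": 999,
    "tags": ["n", "f:43.123432"],
    "defs": ["n\ta large number or amount"],
}

pos, freq = parse_tags(entry["tags"])
definitions = parse_defs(entry["defs"])
print(pos, freq, definitions)
```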
Running the site locally
First, install Python 3.7 and the database system PostgreSQL (version 11). After installing PostgreSQL, you will need to create a new database. Open pgAdmin III and create a new database with the name "datamuse_words" and user "postgres".
Then clone datamuse_viz:
git clone git@github.com:joncros/DataMuse_Viz.git
cd DataMuse_Viz
branch 'dev'
The version of the site on the branch "dev" is mostly up to date, but is missing specific configuration settings needed for running the site in production. To switch to this branch, run:
git checkout dev
Then to complete setting up PostgreSQL, open the datamuse-viz file settings.py and look for this section:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'datamuse_words',
        'USER': 'postgres',
        'PASSWORD': 'wxZPAHz89GSHY',
        'HOST': '127.0.0.1',
        'PORT': '5432',
    }
}
On the line 'PASSWORD': 'wxZPAHz89GSHY', replace 'wxZPAHz89GSHY' with the password you set when installing PostgreSQL on your system. (If you are on Linux, PostgreSQL may be pre-installed on your system without a password, in which case you will need to set the password.)
branch 'master'
The version of the site on the master branch is configured to work on the Heroku platform. It also uses redis-queue to offload longer-running tasks to a separate process. Redis-queue is not supported on Windows, so to run the master branch you need to run it from Linux (Ubuntu 18 is suggested, since this is the OS that Heroku uses) or from Windows 10 WSL (Windows Subsystem for Linux).
To run the master branch, type
git checkout master
on the command line. Then create a new text file named .env and add the following lines to it:
DATABASE_URL="postgres://postgres:password@localhost/datamuse_words"
DEBUG=True
LOCAL=True
LOGLEVEL=DEBUG
SECRET_KEY="secret"
replacing "password" in the first line with the password you set when installing PostgreSQL. (If you are on Linux, PostgreSQL may be pre-installed on your system without a password, in which case you will need to set the password.)
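To show how the pieces of DATABASE_URL correspond to the fields of the DATABASES setting used on the dev branch, here is a sketch using only the Python standard library. It is an illustration under assumptions, not the project's own code: the function name `database_settings_from_url` is hypothetical, and the project may well parse the URL with a dedicated package instead.

```python
from urllib.parse import unquote, urlsplit


def database_settings_from_url(url):
    """Map a postgres:// URL onto the keys of a Django database entry."""
    parts = urlsplit(url)
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": parts.path.lstrip("/"),           # database name follows the slash
        "USER": unquote(parts.username or ""),
        "PASSWORD": unquote(parts.password or ""),
        "HOST": parts.hostname or "",
        "PORT": str(parts.port or ""),            # empty string means the default port
    }


settings = database_settings_from_url(
    "postgres://postgres:password@localhost/datamuse_words"
)
print(settings)
```

The URL has the shape `postgres://USER:PASSWORD@HOST/NAME`, so only the password portion needs to change to match a local PostgreSQL installation.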
Then install Redis:
sudo apt install redis-server
In order to be able to run unit tests, install the packages listed in requirements-test.txt (currently, only fakeredis).
Then install the Heroku CLI.
The following additional setup steps apply to either branch:
The required Python packages are listed in requirements-primary.txt. You can install these packages manually using pip, or you can install all of them along with their dependencies using requirements.txt. It is recommended to first create a virtual environment using virtualenvwrapper (Linux or macOS) or virtualenvwrapper-win (Windows). After installing virtualenvwrapper, which provides the mkvirtualenv command, type the following on the command line:
mkvirtualenv datamuse_viz_env
To install all dependencies at once, type:
pip3 install --requirement requirements.txt
To work in this environment from the command line, type
workon datamuse_viz_env
If you use an IDE such as PyCharm, you will also need to set the environment to datamuse_viz_env in the IDE. PyCharm's documentation describes how to do this.
Then run the database migrations that create the database tables. (On Linux, use python instead of py.)
py manage.py migrate
To run the unit tests, type
py manage.py test
If you are on the dev branch, to run the server on your local machine, type
py manage.py runserver
and then visit http://127.0.0.1:8000 in your web browser to view the site.
If you are on the master branch, type
heroku local
and then visit http://127.0.0.1:5000 in your web browser to view the site.