joncros/DataMuse_Viz

This site uses the Datamuse word-finding API and the Python package python-datamuse to retrieve data about English words. D3 Observable is used to generate visualizations illustrating this data. The site was built using Django.

Visualizations:

  • Word Frequencies: A chart illustrating the frequency of each word in a Word Set. Here, frequency refers to how often a word appears per million words of English text (according to Google Books Ngrams via Datamuse).
  • Word Frequencies Scatterplot: For all words in a Word Set, plots the frequency vs. the number of occurrences in the Word Set.
  • Related Words: For a given word, displays its related words for one or more word relationship types on a radial dendrogram.

Word Relationships

These are the types of relationships that can be used when asking Datamuse for words related to a given word. Each is identified by the three-letter code used in the rel_[code] Datamuse parameter:

| [code] | Description | Example |
|--------|-------------|---------|
| jja | Popular related nouns (nouns that are frequently associated with the given word) | gradual → increase |
| jjb | Popular related adjectives (adjectives that are frequently associated with the given word) | beach → sandy |
| syn | Synonyms | ocean → sea |
| trg | "Triggers" (words that are statistically associated with the query word in the same piece of text) | cow → milking |
| ant | Antonyms | late → early |
| spc | Direct hypernyms (words with a similar, but broader meaning) | gondola → boat |
| gen | Direct hyponyms (words with a similar, but more specific meaning) | boat → gondola |
| com | "Comprises" (things of which this is composed) | car → accelerator |
| par | "Part of" (things of which this is a part) | trunk → tree |
| bga | Frequent followers (words that frequently follow this) | wreak → havoc |
| bgb | Frequent predecessors (words that frequently precede this) | havoc → wreak |
| rhy | Rhymes ("perfect" rhymes) | spade → aid |
| nry | Near rhymes (that is, approximate rhymes) | forest → chorus |
| hom | Homophones (sound-alike words) | course → coarse |
| cns | Consonant matches | sample → simple |
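
To illustrate how a relationship code ends up in a request, here is a minimal sketch that builds a query URL by hand. The function name and its defaults are hypothetical and are not taken from this project, which uses the python-datamuse package for its requests; the base URL and the parameters mirror the example query shown in this README.

```python
from urllib.parse import urlencode

# Base endpoint taken from the example query in this README.
DATAMUSE_WORDS_URL = "http://api.datamuse.com/words"


def build_related_words_url(word, code, metadata="dpf", max_results=3):
    """Return a Datamuse URL asking for words related to `word` via rel_[code].

    Hypothetical helper for illustration only.
    """
    params = {
        f"rel_{code}": word,  # relationship type, e.g. rel_jja
        "md": metadata,       # metadata flags: d=definitions, p=parts of speech, f=frequency
        "max": max_results,   # maximum number of results to return
    }
    return f"{DATAMUSE_WORDS_URL}?{urlencode(params)}"


print(build_related_words_url("test", "jja"))
# prints: http://api.datamuse.com/words?rel_jja=test&md=dpf&max=3
```

Swapping the code (for example "syn" or "rhy") changes only the name of the first query parameter.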

Format of Datamuse results:

Queries to Datamuse return JSON such as the following (the response for http://api.datamuse.com/words?rel_jja=test&md=dpf&max=3):

[
  {
    "word": "results",
    "score": 1001,
    "tags": [
      "n",
      "f:345.463352"
    ],
    "defs": [
      "n\tsomething that results",
      "n\tthe semantic role of the noun phrase whose referent exists only by virtue of the activity denoted by the verb in the clause",
      "n\ta statement that solves a problem or explains how to solve the problem",
      "n\ta phenomenon that follows and is caused by some previous phenomenon"
    ],
    "defHeadword": "result"
  },
  {
    "word": "tube",
    "score": 1000,
    "tags": [
      "n",
      "f:59.310244"
    ],
    "defs": [
      "n\tconduit consisting of a long hollow object (usually cylindrical) used to hold and conduct objects or liquids or gases",
      "n\telectronic device consisting of a system of electrodes arranged in an evacuated glass or metal envelope",
      "n\t(anatomy) any hollow cylindrical body structure",
      "n\ta hollow cylindrical shape",
      "n\telectric underground railway"
    ]
  },
  {
    "word": "scores",
    "score": 999,
    "tags": [
      "n",
      "f:43.123432"
    ],
    "defs": [
      "n\ta large number or amount"
    ]
  }
]

Here, score is an arbitrary number that indicates the popularity of the word in English text or, in queries involving word relationships, the strength of the relationship. Higher numbers indicate more popular or more relevant words.

The items in "tags" indicate either the parts of speech of the word (such as "n" for noun) or the word's frequency in English text according to Google Books Ngrams ("f:...").
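
As a rough sketch of how the "tags" list can be separated into parts of speech and a frequency value, consider the following. The helper name split_tags is hypothetical, and treating every tag that does not start with "f:" as a part of speech is a simplifying assumption that holds for the sample response above but may not hold for other metadata flags.

```python
import json

# Abbreviated copy of the sample response shown above ("defs" omitted).
SAMPLE_RESPONSE = """
[
  {"word": "results", "score": 1001, "tags": ["n", "f:345.463352"]},
  {"word": "tube", "score": 1000, "tags": ["n", "f:59.310244"]},
  {"word": "scores", "score": 999, "tags": ["n", "f:43.123432"]}
]
"""


def split_tags(tags):
    """Separate a result's tags into (parts_of_speech, frequency).

    Assumption: any tag without the "f:" prefix is a part of speech.
    """
    parts_of_speech = []
    frequency = None
    for tag in tags:
        if tag.startswith("f:"):
            # Frequency per million words, stored as text after the prefix.
            frequency = float(tag[2:])
        else:
            parts_of_speech.append(tag)
    return parts_of_speech, frequency


for item in json.loads(SAMPLE_RESPONSE):
    pos, freq = split_tags(item.get("tags", []))
    print(item["word"], pos, freq)
```

A result without a "tags" key yields an empty list of parts of speech and a frequency of None.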

Running the site locally

First, install Python 3.7 and the PostgreSQL database system (version 11). After installing PostgreSQL, you will need to create a new database. Open pgAdmin III and create a new database with the name "datamuse_words" and the user "postgres".

Then clone datamuse_viz:

git clone git@github.com:joncros/DataMuse_Viz.git
cd DataMuse_Viz

branch 'dev'

The version of the site on the "dev" branch is mostly up to date, but it lacks the configuration settings needed to run the site in production. To switch to this branch, run:

git checkout dev

Then, to complete the PostgreSQL setup, open the project's settings.py file and look for this section:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'datamuse_words',
        'USER': 'postgres',
        'PASSWORD': 'wxZPAHz89GSHY',
        'HOST': '127.0.0.1',
        'PORT': '5432',
    }
}

On the line 'PASSWORD': 'wxZPAHz89GSHY', replace 'wxZPAHz89GSHY' with the password you set when installing PostgreSQL on your system. (If you are on Linux, PostgreSQL may be pre-installed on your system without a password, in which case you will need to set one.)
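
As an alternative to typing the password into settings.py, the value could be read from an environment variable so that it never lands in version control. The following is only a sketch under that assumption: the variable name POSTGRES_PASSWORD and the helper database_settings are hypothetical and are not part of this project.

```python
import os


def database_settings(environ=os.environ):
    """Build the DATABASES dict, taking the password from the environment.

    POSTGRES_PASSWORD is a hypothetical variable name chosen for this sketch.
    """
    return {
        'default': {
            'ENGINE': 'django.db.backends.postgresql',
            'NAME': 'datamuse_words',
            'USER': 'postgres',
            # Falls back to an empty password if the variable is not set.
            'PASSWORD': environ.get('POSTGRES_PASSWORD', ''),
            'HOST': '127.0.0.1',
            'PORT': '5432',
        }
    }


DATABASES = database_settings()
```

With this in place, the variable would need to be set in the shell before running any manage.py command.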

branch 'master'

The version of the site on the master branch is configured to work on the Heroku platform. It also uses redis-queue to offload longer-running tasks to a separate process. Redis-queue is not supported on Windows, so to run the master branch you need to run it from Linux (Ubuntu 18 is suggested, since this is the OS that Heroku uses) or from WSL (Windows Subsystem for Linux) on Windows 10.

To run the master branch, type

git checkout master

on the command line. Then create a new text file named .env and add the following lines to it:

DATABASE_URL="postgres://postgres:password@localhost/datamuse_words"
DEBUG=True
LOCAL=True
LOGLEVEL=DEBUG
SECRET_KEY="secret"

replacing "password" in the first line with the password you set when installing PostgreSQL. (If you are on Linux, PostgreSQL may be pre-installed on your system without a password, in which case you will need to set one.)
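
To make the structure of DATABASE_URL concrete, the sketch below splits such a URL into the same fields used in a Django DATABASES entry. This README does not show how the master branch itself parses the variable, so the function database_config_from_url is a hypothetical, standard-library-only illustration rather than the project's actual code.

```python
from urllib.parse import unquote, urlsplit


def database_config_from_url(url):
    """Translate a postgres:// URL into a Django-style DATABASES entry.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": parts.path.lstrip("/"),             # database name follows the slash
        "USER": unquote(parts.username or ""),
        "PASSWORD": unquote(parts.password or ""),  # decode percent-escaped characters
        "HOST": parts.hostname or "",
        "PORT": str(parts.port or ""),              # empty when the URL omits a port
    }


config = database_config_from_url(
    "postgres://postgres:password@localhost/datamuse_words"
)
print(config["NAME"], config["USER"], config["HOST"])
# prints: datamuse_words postgres localhost
```

The format is user:password@host/database, so a password containing characters such as "@" or "/" would have to be percent-encoded in the .env file.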

Then install Redis:

sudo apt install redis-server

To be able to run the unit tests, install the packages listed in requirements-test.txt (currently only fakeredis).

Then install the Heroku CLI.

The following additional setup steps apply to either branch:

The required Python packages are listed in requirements-primary.txt. You can install these packages manually using pip, or you can install all of them along with their dependencies using requirements.txt. It is recommended to first create a virtual environment using virtualenvwrapper (Linux or macOS) or virtualenvwrapper-win (Windows). After installing virtualenvwrapper, which provides the mkvirtualenv command, type the following on the command line:

mkvirtualenv datamuse_viz_env

To install all dependencies at once, type:

pip3 install --requirement requirements.txt

To work in this environment from the command line, type

workon datamuse_viz_env

If you use an IDE such as PyCharm, you will also need to set the environment to datamuse_viz_env in the IDE. Instructions to do this in PyCharm are found here.

Then run the database migrations that create the database tables. (On Linux, use python instead of py.)

py manage.py migrate

To run the unit tests, type

py manage.py test

If you are on the dev branch, to run the server on your local machine, type

py manage.py runserver

and then visit http://127.0.0.1:8000 in your web browser to view the site.

If you are on the master branch, type

heroku local

and then visit http://127.0.0.1:5000 in your web browser to view the site.
