This repository is a from-scratch implementation of Word2Vec (Skip-Gram with naïve softmax).
The goal is not performance, but understanding.
Instead of directly using high-level libraries such as gensim, this project rebuilds Word2Vec step by step in order to expose:
- how text is transformed into training data,
- how the probabilistic model $P(o \mid c)$ is defined,
- what is really optimized,
- how gradients modify the embedding matrices,
- and why semantic structure emerges from co-occurrence.
Everything is written with an explicit math → code correspondence, so that every equation in the README has a concrete implementation in the code.
Let the corpus be a sequence of words:

$$w_1, w_2, \dots, w_T, \quad w_t \in \mathcal{V}$$

Let $m$ be the context window radius. For every center position $t$, Skip-Gram generates training pairs:

$$(w_t, w_{t+j}), \quad -m \le j \le m, \; j \ne 0$$

The training set is:

$$\mathcal{D} = \{ (c, o) \mid c = w_t, \; o = w_{t+j}, \; 1 \le |j| \le m \}$$

The parameters of the model are:

$$\theta = (V, U), \quad V, U \in \mathbb{R}^{|\mathcal{V}| \times D}$$

where:

- $V$ is the matrix of center (input) embeddings,
- $U$ is the matrix of context (output) embeddings,
- $D$ is the embedding dimension.

Each word $w \in \mathcal{V}$ has:

- a center vector $v_w \in \mathbb{R}^D$ (a row of $V$),
- a context vector $u_w \in \mathbb{R}^D$ (a row of $U$).
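These parameters can be sketched in PyTorch. `UM` matches the context-matrix name that appears later in the training code; `VM` and the toy sizes are assumptions for illustration:

```python
import torch

VOCAB_SIZE, D = 11, 16  # toy sizes; the real values come from the corpus

# VM holds the center vectors v_w (rows), UM holds the context vectors u_w (rows).
VM = torch.nn.Parameter(torch.randn(VOCAB_SIZE, D) * 0.01)
UM = torch.nn.Parameter(torch.randn(VOCAB_SIZE, D) * 0.01)
```

Both matrices are trainable, so gradients from the loss flow into every row that participates in a batch.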
A small toy corpus is used:
- "king is a man"
- "queen is a woman"
- "boy is a man"
- "girl is a woman"
- "paris is france"
- "rome is italy"
- "france is europe"
- "italy is europe"
After preprocessing (lowercasing, whitespace tokenization, and removal of stopwords such as "is" and "a"), each sentence becomes a sequence of tokens.
Example:
"king is a man" → ["king", "man"]
Let the set of unique words be $\mathcal{V}$.

We define a bijection between words and integer indices:

$$\mathcal{V} \leftrightarrow \{0, 1, \dots, |\mathcal{V}| - 1\}$$

Implemented as two dictionaries:

```python
word2idx[word] = idx
idx2word[idx] = word
```

Two vectors per word:

- Center embedding: $$v_w \in \mathbb{R}^D$$
- Context embedding: $$u_w \in \mathbb{R}^D$$

Stored as two matrices of shape $(|\mathcal{V}|, D)$: one for center vectors and one for context vectors (the latter appears in the code as `self.UM`).
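Building the word ↔ index bijection from the tokenized corpus can be sketched as follows (`build_vocab` is a hypothetical helper name, not from the repo):

```python
def build_vocab(tokenized_sentences):
    """Map each unique word to a stable integer id and back."""
    words = sorted({w for sent in tokenized_sentences for w in sent})
    word2idx = {w: i for i, w in enumerate(words)}
    idx2word = {i: w for w, i in word2idx.items()}
    return word2idx, idx2word

w2i, i2w = build_vocab([["king", "man"], ["queen", "woman"]])
```

Sorting before enumerating makes the ids deterministic across runs, which is convenient when comparing embeddings between experiments.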
Given a window radius $m$, each pair $(c, o)$ of a center word $c$ and one of its context words $o$ is one training example.
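Pair generation can be sketched as a sliding window (`skipgram_pairs` is a hypothetical helper; window radius $m = 1$ here):

```python
def skipgram_pairs(tokens, m=1):
    """Return (center, context) pairs within a window of radius m."""
    pairs = []
    for t, center in enumerate(tokens):
        for j in range(-m, m + 1):
            if j != 0 and 0 <= t + j < len(tokens):
                pairs.append((center, tokens[t + j]))
    return pairs

print(skipgram_pairs(["king", "man"]))  # → [('king', 'man'), ('man', 'king')]
```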
Scores (one logit per vocabulary word):

$$s_w = u_w^\top v_c, \quad w \in \mathcal{V}$$

Probabilities (softmax over the vocabulary):

$$P(o \mid c) = \frac{\exp(u_o^\top v_c)}{\sum_{w \in \mathcal{V}} \exp(u_w^\top v_c)}$$
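The score-then-softmax computation can be checked in isolation with random toy tensors (nothing here is repo code):

```python
import torch

torch.manual_seed(0)
D, V = 4, 6
v_c = torch.randn(D)     # center vector v_c
UM = torch.randn(V, D)   # context matrix U, one row u_w per word

scores = UM @ v_c                     # s_w = u_w^T v_c for all w
probs = torch.softmax(scores, dim=0)  # P(w | c), a distribution over the vocabulary
```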
In code:

```python
logits = v @ self.UM.T             # s_w = u_w^T v_c for every word in the vocabulary
loss = F.cross_entropy(logits, o)  # -log P(o | c), the naive-softmax loss
```

Update rule (stochastic gradient descent with learning rate $\eta$):

$$\theta \leftarrow \theta - \eta \, \nabla_\theta J(\theta)$$
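To confirm that `F.cross_entropy` really is the naive-softmax loss $-\log P(o \mid c)$, it can be compared against an explicit log-softmax (a standalone check, not repo code):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(1, 6)   # one example, vocabulary size 6
o = torch.tensor([2])        # index of the true context word

loss = F.cross_entropy(logits, o)
manual = -torch.log_softmax(logits, dim=1)[0, o.item()]
```

The two values agree to floating-point precision, which is why the code never materializes the softmax explicitly: `cross_entropy` fuses it with the log for numerical stability.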
In code:

```python
opt.zero_grad()   # clear gradients from the previous step
loss.backward()   # backpropagate through the softmax and both embedding matrices
opt.step()        # apply the gradient update to the parameters
```

This project is a minimal, math-first Word2Vec implementation.
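Putting the snippets together, a self-contained toy training loop might look like this; every name, size, and pair below is illustrative rather than taken from the repo:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
V_SIZE, D = 6, 8
VM = torch.randn(V_SIZE, D, requires_grad=True)  # center embeddings V
UM = torch.randn(V_SIZE, D, requires_grad=True)  # context embeddings U
opt = torch.optim.SGD([VM, UM], lr=0.1)

# Toy (center_id, context_id) training pairs.
c = torch.tensor([0, 1, 2])
o = torch.tensor([1, 0, 3])

losses = []
for _ in range(100):
    logits = VM[c] @ UM.T              # scores for every vocabulary word
    loss = F.cross_entropy(logits, o)  # mean naive-softmax loss over the batch
    opt.zero_grad()
    loss.backward()
    opt.step()
    losses.append(loss.item())
```

The loss drops steadily on this toy data, which is exactly the mechanism by which co-occurring words end up with similar vectors.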