A recurrent neural network is used to solve the Named Entity Recognition (NER) problem, here recognizing named entities in Twitter text. The Jupyter notebook and Python code can be found in the "codes" folder. NER is a common task in natural language processing systems. It extracts entities such as persons, organizations, and locations from text.
For example, suppose we want to extract the names of persons and organizations from a text. Then for the input text:
Ian Goodfellow works for Google Brain
a NER model needs to provide the following sequence of tags:
B-PER I-PER O O B-ORG I-ORG
Here the B- and I- prefixes mark the beginning and the inside of an entity, while O stands for outside, i.e. no tag. Markup with this prefix scheme is called BIO markup. It is needed to distinguish consecutive entities of the same type: a new B- tag shows where the second entity starts.
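To make the BIO scheme concrete, here is a minimal sketch that groups tagged tokens back into entities. The function name `extract_entities` is a hypothetical helper written for illustration and is not taken from the notebook.

```python
from typing import List, Tuple


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    entities = []
    current_tokens, current_type = [], None

    def flush():
        # Store the entity collected so far (if any) and reset the buffer.
        nonlocal current_tokens, current_type
        if current_tokens:
            entities.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new entity, even right after another one.
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type continues the open entity.
            current_tokens.append(token)
        else:
            # "O" (or a stray I- tag with no matching open entity) closes it.
            flush()
    flush()
    return entities


tokens = "Ian Goodfellow works for Google Brain".split()
tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG"]
print(extract_entities(tokens, tags))
# [('Ian Goodfellow', 'PER'), ('Google Brain', 'ORG')]
```

With tags `["B-PER", "B-PER"]` for the tokens `["Alice", "Bob"]`, the same function returns two separate persons, which is exactly the case the B- prefix exists for.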
The solution is based on neural networks, specifically on Bi-Directional Long Short-Term Memory networks (Bi-LSTMs).
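As a rough illustration of what "bi-directional" means here, the NumPy sketch below runs one LSTM left to right and another right to left, concatenates their hidden states per token, and projects them onto tag scores. It is not the notebook's TensorFlow model: all names, sizes, and the random weights are assumptions chosen only to show the data flow.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_pass(inputs, W, U, b):
    """Run one LSTM over a sequence; inputs has shape (seq_len, input_dim)."""
    hidden_dim = U.shape[0]
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    outputs = []
    for x in inputs:
        # All four gates come from one affine map, then get split.
        z = x @ W + h @ U + b
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)  # (seq_len, hidden_dim)


def bilstm_tag_scores(inputs, fwd, bwd, W_out, b_out):
    """Per-token tag scores from a forward and a backward LSTM."""
    h_fwd = lstm_pass(inputs, *fwd)
    # The backward LSTM reads the reversed sequence; flip its outputs back.
    h_bwd = lstm_pass(inputs[::-1], *bwd)[::-1]
    h = np.concatenate([h_fwd, h_bwd], axis=1)  # (seq_len, 2 * hidden_dim)
    return h @ W_out + b_out  # (seq_len, n_tags)


rng = np.random.default_rng(0)
# Illustrative sizes: 6 tokens, 5 tags (B-PER, I-PER, B-ORG, I-ORG, O).
seq_len, input_dim, hidden_dim, n_tags = 6, 8, 5, 5


def init_lstm():
    return (rng.normal(scale=0.1, size=(input_dim, 4 * hidden_dim)),
            rng.normal(scale=0.1, size=(hidden_dim, 4 * hidden_dim)),
            np.zeros(4 * hidden_dim))


fwd_params, bwd_params = init_lstm(), init_lstm()
W_out = rng.normal(scale=0.1, size=(2 * hidden_dim, n_tags))
b_out = np.zeros(n_tags)

embeddings = rng.normal(size=(seq_len, input_dim))  # stand-in for word embeddings
scores = bilstm_tag_scores(embeddings, fwd_params, bwd_params, W_out, b_out)
print(scores.shape)  # (6, 5)
```

Because of the backward pass, the score for the first token depends on the last token as well, which is why a Bi-LSTM can use context on both sides of a word when choosing its tag.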
The submitted repository contains all dependencies, i.e. everything you need to install and run the code. Information about the related libraries is given below.
For this task you will need the following libraries:
- Tensorflow — an open-source software library for Machine Intelligence.
- Numpy — a package for scientific computing.
If you have never worked with Tensorflow, you will probably need to read some tutorials while working on this assignment; an introductory Tensorflow tutorial could be a good starting point. The resources used to speed up processing are described in the following paragraphs.
Google has released its own flavour of Jupyter called Colab, which has free GPUs!
Here's how you can use it:
- Open https://colab.research.google.com, click Sign in in the upper right corner, use your Google credentials to sign in.
- Click GITHUB tab, paste https://github.com/hse-aml/natural-language-processing and press Enter
- Choose the notebook you want to open, e.g. week1/week1-MultilabelClassification.ipynb
- Click File -> Save a copy in Drive... to save your progress in Google Drive
- If you need a GPU, click Runtime -> Change runtime type and select GPU in Hardware accelerator box
- Execute the following code in the first cell to download the dependencies (uncomment the line for your week number):
! wget https://raw.githubusercontent.com/hse-aml/natural-language-processing/master/setup_google_colab.py -O setup_google_colab.py
import setup_google_colab
# please, uncomment the week you're working on
# setup_google_colab.setup_week1()
# setup_google_colab.setup_week2()
# setup_google_colab.setup_week3()
# setup_google_colab.setup_week4()
# setup_google_colab.setup_project()
# setup_google_colab.setup_honor()
- If you run many notebooks on Colab, they can continue to eat up memory. You can kill them with
! pkill -9 python3
and check with
! nvidia-smi
that GPU memory is freed.
Known issues:
- No support for ipywidgets, so we cannot use fancy tqdm progress bars. For now, we use a simplified version of a progress bar suitable for Colab.
- Blinking animation with IPython.display.clear_output(). It's usable, but we are still looking for a workaround.
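For reference, a simplified progress bar that needs neither ipywidgets nor clear_output() could look like the sketch below. The helper `simple_progress` is hypothetical and is not the implementation shipped with the course code. It rewrites a single line with a carriage return, assuming the notebook output handles "\r".

```python
import sys
from typing import Iterable, Iterator, Optional, TextIO, TypeVar

T = TypeVar("T")


def simple_progress(items: Iterable[T], total: Optional[int] = None,
                    every: int = 1, out: TextIO = sys.stdout) -> Iterator[T]:
    """Yield items while printing a plain-text counter (hypothetical helper)."""
    if total is None:
        try:
            total = len(items)  # works for lists, ranges, etc.
        except TypeError:
            total = None  # generators have no length
    for count, item in enumerate(items, start=1):
        yield item
        if count % every == 0 or count == total:
            suffix = f"/{total}" if total is not None else ""
            # "\r" moves back to the line start so the counter overwrites itself.
            out.write(f"\rprocessed {count}{suffix}")
            out.flush()
    out.write("\n")


squares = [x * x for x in simple_progress(range(5))]
```

The `every` parameter limits how often the line is redrawn, which keeps the output light for long loops.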