This repository contains the code for my master's thesis, completed at the University of Stavanger in June 2024. It includes a conda environment file (environment.yml), this README file, and three folders: indexers, experiments, and demo.
- indexers
  - Li_indexer_eng.py
  - Li_indexer_fas.py
  - Li_indexer_rus.py
  - Li_indexer_zho.py
- experiments
  - Li_CLIR_Fas_1000_Retrieve_Translate_Rank.ipynb
  - Li_CLIR_Zho_1000_Retrieve_Translate_Rank.ipynb
  - mlir_experiment.py
  - Li_CLIR_Rus_1000_Retrieve_Translate_Rank.ipynb
  - Li_get_dict_docid_lang.ipynb
- demo
  - clirnews.py
To run the code, you need to have PyLucene installed. The installation instructions below are taken from the official PyLucene website:
PyLucene is completely code-generated by JCC, whose sources are included with the PyLucene sources.
To build PyLucene, a Java Development Kit (JDK) is required; use of the resulting PyLucene binaries requires only a Java Runtime Environment (JRE). A recent C/C++ compiler is also required.
- Starting with release 9.x, Lucene requires Java 11 or above.
- Starting with release 6.x, Lucene requires Java 1.8.
On macOS and Linux, the Temurin JDK is recommended. See "Notes for Linux" on the PyLucene install page for installation instructions on Linux Debian 11.
On any system, if you're upgrading your Java installation, please rebuild JCC as well. You must use the same version of Java for both JCC and PyLucene.
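Before building, it can help to confirm which Java version is on your PATH. The following is a minimal sketch, not part of this repository: the helper names `parse_java_major` and `installed_java_major` are hypothetical, and the parsing assumes the usual `java -version` output format (for example `openjdk version "11.0.2"` or the legacy `java version "1.8.0_292"`).

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy numbering: "1.8.0_292" denotes Java 8.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major():
    """Run `java -version` and parse it; returns None if Java is not on PATH."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    # `java -version` normally writes to stderr, so both streams are checked.
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    major = installed_java_major()
    if major is None:
        print("No usable Java installation found on PATH.")
    elif major >= 11:
        print(f"Java {major} found: sufficient for Lucene 9.x.")
    else:
        print(f"Java {major} found: Lucene 9.x requires Java 11 or above.")
```

This only reports the runtime found on PATH. It does not verify that JCC and PyLucene were built against the same Java version.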
A modern version of setuptools is required for building JCC in shared mode. See JCC's installation instructions for more information.
pushd jcc
# Edit setup.py to match your environment
python setup.py build
sudo python setup.py install
popd
# Edit Makefile to match your environment
make
make test # (look for failures)
sudo make install
For more detailed instructions, please refer to the official documentation.
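Once the build and install steps have finished, a quick import check can confirm that PyLucene is usable from Python. This is a small sketch under the assumption that the install placed the `lucene` module on your Python path. The function name `pylucene_version` is hypothetical, and the headless JVM argument follows the pattern commonly used in the PyLucene samples.

```python
def pylucene_version():
    """Return the installed PyLucene version string, or None if it is not importable."""
    try:
        import lucene
    except ImportError:
        return None
    # initVM() starts the embedded JVM; it must run before any Lucene class is used.
    lucene.initVM(vmargs=["-Djava.awt.headless=true"])
    return lucene.VERSION


if __name__ == "__main__":
    version = pylucene_version()
    if version is None:
        print("PyLucene is not importable; check the JCC and PyLucene build steps.")
    else:
        print("PyLucene version:", version)
```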
- Clone the repository:
    git clone https://github.com/linool/clirnews.git
- Navigate to the repository directory:
    cd clirnews
- Create and activate the conda environment:
    conda env create -f environment.yml
    conda activate clir
- Run the desired Jupyter notebooks or Python scripts from the respective folders (indexers, experiments, or demo).
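If a script fails with missing-package errors, one thing worth checking is whether the `clir` environment is actually active. The sketch below is illustrative only: it relies on the `CONDA_DEFAULT_ENV` variable that `conda activate` sets, and the helper names are hypothetical.

```python
import os


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None if none is set."""
    return environ.get("CONDA_DEFAULT_ENV")


def is_clir_env_active(environ=os.environ):
    """True when the active conda environment is the 'clir' environment used above."""
    return active_conda_env(environ) == "clir"


if __name__ == "__main__":
    if is_clir_env_active():
        print("The 'clir' environment is active.")
    else:
        print("Active environment:", active_conda_env(), "(run: conda activate clir)")
```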
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.