This project provides (a) a minimal Lucene index builder and (b) a minimal SQLite database builder for Medline (abstract collection for PubMed articles).
The Lucene index has three fields: pmid, abstractText, and articleTitle, of which the latter two are searchable. The SQLite database has two fields, pmid and abstract, with an index built on pmid.
This project is used in the preparation of the OAQA BioASQ System.
Two use cases have been configured as exec:exec goals in the pom.xml file.
Build Lucene index
mvn -Ddocs.dir=DOCS_DIR -Dindex.dir=INDEX_DIR exec:exec@index
where DOCS_DIR is the input directory that contains the downloaded .xml.gz or .xml files and INDEX_DIR is the output directory for the Lucene index.
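Before starting a long build, it can help to confirm that DOCS_DIR actually contains the expected input files. The following is a minimal Python sketch, not part of this project: the function name list_medline_inputs is hypothetical, and it assumes the files sit directly inside DOCS_DIR (whether the builder also descends into subdirectories is not stated here).

```python
from pathlib import Path


def list_medline_inputs(docs_dir):
    """Return the .xml and .xml.gz files directly inside docs_dir, sorted.

    Hypothetical helper for a pre-build sanity check. It only looks at the
    top level of the directory, which is an assumption about the layout.
    """
    root = Path(docs_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{docs_dir} is not a directory")
    return sorted(
        p
        for p in root.iterdir()
        if p.is_file() and (p.name.endswith(".xml") or p.name.endswith(".xml.gz"))
    )
```

An empty result suggests that DOCS_DIR points at the wrong location or that the download has not finished.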
Build SQLite database
mvn -Ddocs.dir=DOCS_DIR -Ddb.path=DB_PATH exec:exec@store
where DOCS_DIR is the input directory that contains the downloaded .xml.gz or .xml files and DB_PATH is the output SQLite database file path.
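Once the database exists, abstracts can be looked up by pmid from any SQLite client. The sketch below uses Python's standard sqlite3 module and is illustrative only. The table name is not documented above, so the sketch discovers it from sqlite_master instead of guessing; the helper names and the demo table demo_abstracts are hypothetical, and storing pmid as text is an assumption.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def fetch_abstract(conn, table, pmid):
    """Return the abstract stored for pmid, or None if there is no such row.

    Table names cannot be bound as SQL parameters, so the name is checked
    against the tables that actually exist before it is placed in the query.
    """
    if table not in list_tables(conn):
        raise ValueError(f"unknown table: {table}")
    row = conn.execute(
        f'SELECT abstract FROM "{table}" WHERE pmid = ?', (pmid,)
    ).fetchone()
    return row[0] if row else None


def build_demo_db():
    """Create an in-memory stand-in with the two fields described above.

    The table name, index name, and rows are placeholders, not real data.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE demo_abstracts (pmid TEXT, abstract TEXT)")
    conn.execute("CREATE INDEX demo_pmid_idx ON demo_abstracts (pmid)")
    conn.executemany(
        "INSERT INTO demo_abstracts (pmid, abstract) VALUES (?, ?)",
        [("1", "First placeholder abstract."), ("2", "Second placeholder abstract.")],
    )
    conn.commit()
    return conn
```

For a database produced by the store goal, replace build_demo_db() with sqlite3.connect(DB_PATH), call list_tables to find the real table name, and pass that name to fetch_abstract. Because pmid is indexed, single-row lookups of this kind should not require a full table scan.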