Text extraction:
PDF (Apache PDFBox) - https://pdfbox.apache.org/
ODT, DOCX, DOC (Apache Tika) - https://tika.apache.org/
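A minimal sketch of how the mapping above might be applied in code. It is written in Java because both PDFBox and Tika are Java libraries. `ExtractorChooser`, `Extractor` and `extractorFor` are hypothetical names used only for illustration. The sketch only decides which library should handle a file based on its extension. It does not call PDFBox or Tika, because that would require their jars on the classpath.

```java
import java.util.Locale;

// Hypothetical helper: chooses a text-extraction library by file extension,
// following the PDF -> PDFBox, ODT/DOCX/DOC -> Tika mapping in the note.
// It does not perform any extraction itself.
class ExtractorChooser {

    public enum Extractor { PDFBOX, TIKA, UNSUPPORTED }

    public static Extractor extractorFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // No extension, or the name ends with a dot: nothing to dispatch on.
        if (dot < 0 || dot == fileName.length() - 1) {
            return Extractor.UNSUPPORTED;
        }
        // Compare extensions case-insensitively ("PDF" and "pdf" are the same).
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "pdf":
                return Extractor.PDFBOX;
            case "odt":
            case "docx":
            case "doc":
                return Extractor.TIKA;
            default:
                return Extractor.UNSUPPORTED;
        }
    }

    public static void main(String[] args) {
        String[] names = {"report.PDF", "notes.odt", "letter.docx", "old.doc", "image.png"};
        for (String name : names) {
            System.out.println(name + " -> " + extractorFor(name));
        }
    }
}
```

Running `main` prints one line per sample file name, for example `report.PDF -> PDFBOX` and `image.png -> UNSUPPORTED`. In a real pipeline, each enum value would be replaced by a call into the corresponding library.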