DocumentClassifier

Classifies text documents using Apache Lucene 4.10

Trains a classifier on one set of documents and then tests it on another set. Training can be done in three different ways (LC, the Lucene Standard Analyzer, and an n-gram analyzer). The files in the specified path are indexed first and then used for training. Accuracy on the test set can be evaluated with a confusion matrix.
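To make the confusion-matrix evaluation concrete, here is a minimal sketch in plain Java. It is not taken from this repository: the class name `ConfusionMatrix`, its methods, and the example labels are all hypothetical, and it assumes each test document has one actual and one predicted category.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of this repository.
final class ConfusionMatrix {
    private final List<String> labels;
    private final Map<String, Integer> index = new LinkedHashMap<>();
    private final int[][] counts; // counts[actual][predicted]

    ConfusionMatrix(List<String> labels) {
        this.labels = new ArrayList<>(labels);
        for (String label : this.labels) {
            // Assign each distinct label the next free row/column position.
            if (!index.containsKey(label)) {
                index.put(label, index.size());
            }
        }
        counts = new int[index.size()][index.size()];
    }

    private int positionOf(String label) {
        Integer i = index.get(label);
        if (i == null) {
            throw new IllegalArgumentException("Unknown label: " + label);
        }
        return i;
    }

    // Record one test document: its true category and the classifier's guess.
    void record(String actual, String predicted) {
        counts[positionOf(actual)][positionOf(predicted)]++;
    }

    int count(String actual, String predicted) {
        return counts[positionOf(actual)][positionOf(predicted)];
    }

    // Accuracy = sum of the diagonal / sum of all cells (0.0 if nothing was recorded).
    double accuracy() {
        int correct = 0;
        int total = 0;
        for (int r = 0; r < counts.length; r++) {
            for (int c = 0; c < counts[r].length; c++) {
                total += counts[r][c];
                if (r == c) {
                    correct += counts[r][c];
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    public static void main(String[] args) {
        // Example labels are made up for illustration.
        ConfusionMatrix m = new ConfusionMatrix(Arrays.asList("sports", "politics"));
        m.record("sports", "sports");
        m.record("sports", "politics");
        m.record("politics", "politics");
        m.record("politics", "politics");
        System.out.println("accuracy = " + m.accuracy()); // 3 of 4 correct: 0.75
    }
}
```

Off-diagonal cells show which categories get confused with each other, which a single accuracy figure hides.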

Clone the repository

Open it in your preferred IDE.
Change the paths to the training and test sets in DocumentClassifier.java.
Specify the type of analyzer in the DocumentClassifier method (defaults to std).
Build and run.
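When choosing an analyzer in the steps above, it can help to see what n-gram tokenization produces. The sketch below is plain Java and does not use Lucene or any code from this repository. The class `CharNGrams` is hypothetical, and it assumes lowercased character n-grams; the project's n-gram analyzer may tokenize differently, for example on words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration, not the analyzer used by this project.
final class CharNGrams {
    // Returns every contiguous run of n characters from the lowercased text.
    static List<String> ngrams(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        String normalized = text.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= normalized.length(); i++) {
            grams.add(normalized.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("Lucene", 3)); // prints [luc, uce, cen, ene]
    }
}
```

Compared with whole-word tokens, overlapping n-grams tend to be more tolerant of spelling variation, at the cost of a larger index.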

wimpykid26/DocumentClassifier