Open
Description
A quick solution to this issue in the Java implementation would be to download this code file (from the original C tool), compile it, and execute it from Clojure. The tool marks multi-word phrases in the training text corpus by joining their words with an underscore. (Refer to 'From words to phrases and beyond' from here.)
Below is the code to run the executable placed in /resources in the project directory, either through a Java Runtime instance or, alternatively, by shelling out from Clojure. The input is placed in /resources/train.txt, the output is written to /resources/output/out.txt, and the other word2phrase training parameters take their default values.
(import '(java.io BufferedReader InputStreamReader))
(require '[clojure.string :as string])
(use '[clojure.java.shell :only [sh]])

(let [tmp (-> (System/getProperty "user.dir")
              (.replace "\\" "/")) ; Use forward slashes so the path works on Unix;
                                   ; Windows accepts both path styles.
      res (str tmp "/resources/")]
  ;; Alternative 1 (disabled by `comment`): run the tool through java.lang.Runtime.
  (comment
    (let [proc (.exec (Runtime/getRuntime)
                      (str res "word2phrase.exe"
                           " -train " res "train.txt"
                           " -output " res "output/out.txt"))
          br (BufferedReader. (InputStreamReader. (.getInputStream proc)))]
      (println (string/join "\n" (line-seq br)))
      (.close br)))
  ;; Alternative 2: shell out with clojure.java.shell/sh and print the tool's stdout.
  (println (:out (sh (str res "word2phrase.exe")
                     "-train" (str res "train.txt")
                     "-output" (str res "output/out.txt"))))
  ;; sh reads the process streams on background threads (futures),
  ;; so exit explicitly to keep the JVM from lingering.
  (System/exit 0))
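The snippet above prints whatever the tool writes to standard output, so a failed run can pass unnoticed. A minimal sketch of a stricter variant follows. It relies on the map that `clojure.java.shell/sh` returns (`:exit`, `:out`, `:err`). The helper names `run-checked` and `run-word2phrase` are hypothetical, and creating the output directory up front is an assumption that the tool does not create missing directories itself.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.java.shell :refer [sh]])

(defn run-checked
  "Hypothetical helper: runs `exe` with `args` via clojure.java.shell/sh,
  returns stdout when the exit code is 0, and throws ex-info otherwise."
  [exe & args]
  (let [{:keys [exit out err]} (apply sh exe args)]
    (if (zero? exit)
      out
      (throw (ex-info (str exe " exited with code " exit)
                      {:exit exit :err err})))))

(defn run-word2phrase
  "Hypothetical wrapper around the word2phrase call shown above.
  `res` is the path of the resources directory, ending in a slash."
  [res]
  (let [out-file (str res "output/out.txt")]
    ;; Assumption: the tool does not create missing directories itself.
    (io/make-parents out-file)
    (run-checked (str res "word2phrase.exe")
                 "-train" (str res "train.txt")
                 "-output" out-file)))
```

It could be called as `(run-word2phrase (str (System/getProperty "user.dir") "/resources/"))`. If the executable is missing altogether, `sh` itself throws before an exit code is available, so that case surfaces as a plain IO exception rather than through `ex-info`.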