I need to create a list of words and their frequencies for corpora.
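Try this to get me started: cat comment_file.txt | tr " " "\n" | sort | uniq -c | wc -l from http://www.unix.com/shell-programming-and-scripting/45838-how-read-all-unique-words-text-file.html
Note that the trailing wc -l prints only the number of unique words. Dropping it leaves the per-word counts from uniq -c, and piping those into sort -rn lists the most frequent words first.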
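The same idea can be sketched in Python with the standard library. This is a minimal sketch, not a definitive implementation: the function name word_frequencies and the sample text are made up for illustration, and the tokenization (lowercasing, then keeping runs of letters, digits and apostrophes) is an assumption that differs from the shell pipeline, which splits on spaces only and keeps case and punctuation.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercased word to its count.

    Assumption: a "word" is a run of ASCII letters, digits or apostrophes.
    Real corpora may need a different tokenizer.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


# Hypothetical sample text; for a real corpus, read the file contents instead,
# e.g. open("comment_file.txt", encoding="utf-8").read()
sample = "The cat sat on the mat. The end."
freqs = word_frequencies(sample)

# Print the words from most to least frequent, count first (like uniq -c).
for word, count in freqs.most_common():
    print(count, word)
```

Counter.most_common() returns the (word, count) pairs sorted by descending count, which plays the role of sort | uniq -c | sort -rn in the shell version.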
Here are some links to get me started:
Via Python: