Language models trained with an LSTM on the Lord of the Rings trilogy corpus.
- Two methods were explored:
  - `sparse_categorical_crossentropy` model, where the input sentences were vectorized using `TextVectorization` from Keras, yielding an input shape of `(MAX_SEQ_LEN,)`, where `MAX_SEQ_LEN` is the maximum sequence length of the input sentences (Tx). Each position in the sequence holds the integer index of the corresponding character among the `N_UNIQUE_CHARS` unique characters found in the corpus.
  - `categorical_crossentropy` model, or One-Hot Encoding model, where the input sentences were vectorized into one-hot encoded arrays, yielding an input shape of `(MAX_SEQ_LEN, N_UNIQUE_CHARS)`, where `MAX_SEQ_LEN` is the maximum sequence length of the input sentences (Tx). Each position in the sequence holds a one-hot encoded vector of length `N_UNIQUE_CHARS`, where `N_UNIQUE_CHARS` is the number of unique characters found in the corpus.
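The two input representations above can be sketched as follows. This is a minimal NumPy illustration of the shapes involved, not the project's actual preprocessing code: the mini-corpus is hypothetical, and the integer mapping is built by hand here (in the project itself, Keras' `TextVectorization` layer performs that step for Method 1).

```python
import numpy as np

# Hypothetical mini-corpus standing in for the LotR text.
corpus = ["the ring must be destroyed", "not all those who wander are lost"]

# Character-level vocabulary: N_UNIQUE_CHARS unique characters in the corpus.
chars = sorted(set("".join(corpus)))
N_UNIQUE_CHARS = len(chars)
MAX_SEQ_LEN = max(len(s) for s in corpus)  # Tx
char_to_idx = {c: i for i, c in enumerate(chars)}

# Method 1: integer vectorization -- each sentence becomes a vector of
# shape (MAX_SEQ_LEN,) holding one integer index per character.
int_inputs = np.zeros((len(corpus), MAX_SEQ_LEN), dtype=np.int64)
for i, sentence in enumerate(corpus):
    for t, ch in enumerate(sentence):
        int_inputs[i, t] = char_to_idx[ch] + 1  # shift by 1; 0 is padding

# Method 2: one-hot encoding -- each sentence becomes a matrix of
# shape (MAX_SEQ_LEN, N_UNIQUE_CHARS) with one 1.0 per real character.
onehot_inputs = np.zeros((len(corpus), MAX_SEQ_LEN, N_UNIQUE_CHARS), dtype=np.float32)
for i, sentence in enumerate(corpus):
    for t, ch in enumerate(sentence):
        onehot_inputs[i, t, char_to_idx[ch]] = 1.0

print(int_inputs.shape)     # (num_sentences, MAX_SEQ_LEN)
print(onehot_inputs.shape)  # (num_sentences, MAX_SEQ_LEN, N_UNIQUE_CHARS)
```

The integer representation pairs with `sparse_categorical_crossentropy` (labels as integer class indices), while the one-hot representation pairs with `categorical_crossentropy` (labels as full probability vectors); the two losses compute the same quantity, differing only in label format.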