Thanks for the interesting work!
While going through the implementations, I found the hidden_size parameter a bit confusing. What exactly is the role of hidden_size in the standalone FFM implementations? As far as I can tell, it is not used anywhere.