Track how often each utterance is spoken and each sound is played. Use these counts to warm the in-memory audio node and file name caches. Maybe store the counts server-side too, to share them across users? If so, caching needs more levels than the current yes/no (cached or not cached).
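
A minimal sketch of the idea in TypeScript, under stated assumptions: `UsageTracker`, `warmCache`, the string keys, and the tier thresholds are all hypothetical names and values, not existing code. It counts plays per key, picks the most-used keys for warming, and maps counts to coarse tiers as one possible reading of "more levels than yes/no".

```typescript
// Hypothetical usage tracker: counts how often each utterance/sound key is played.
type Tier = "cold" | "warm" | "hot";

class UsageTracker {
  private counts = new Map<string, number>();

  // Record one playback of the utterance or sound identified by `key`.
  record(key: string): void {
    this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
  }

  // Return up to `limit` keys, most frequently played first.
  topKeys(limit: number): string[] {
    return [...this.counts.entries()]
      .sort((a, b) => b[1] - a[1])
      .slice(0, limit)
      .map(([key]) => key);
  }

  // Map a play count to a coarse cache tier.
  // The thresholds (2 and 10) are arbitrary placeholders.
  tier(key: string): Tier {
    const n = this.counts.get(key) ?? 0;
    if (n >= 10) return "hot";
    if (n >= 2) return "warm";
    return "cold";
  }
}

// Warm a cache by calling a caller-supplied loader for the most-used keys.
// `load` stands in for whatever fills the audio node or file name cache.
function warmCache(
  tracker: UsageTracker,
  limit: number,
  load: (key: string) => void
): void {
  for (const key of tracker.topKeys(limit)) {
    load(key);
  }
}
```

Usage would be to call `record(key)` on every playback and `warmCache(tracker, n, load)` at startup. If the counts were shared server-side, the tier could be computed from the aggregated counts instead of local ones.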