Apart from the text, we will also need its spoken (audio) counterpart. Adapting the model to include a few locations for the demo will be easy, but systematic training that covers many locations will require us to find a way to obtain these recordings.