Why hasn't it been picked up yet by organizations with huge resources to just train a SOTA model (image, video, audio, even text, whatever)? Or has it already?