In video understanding, the coordinates of critical objects change from frame to frame. How should we express them? Should we use the object's coordinates in the first frame where it appears?