This task involves generating detailed descriptions of high-quality videos across various categories. The descriptions focus on spatial relationships, object positioning, and scene dynamics, particularly from a first-person perspective.
DeepAI-Research/Spatial-Scene-Dataset