Merged
59 changes: 35 additions & 24 deletions docs/core_concepts/data_curation/overview.md
@@ -1,7 +1,6 @@
# Data Curation

> **Authors:** [Jingyi Jin](https://www.linkedin.com/in/jingyi-jin) • [Alice Luo](https://www.linkedin.com/in/aliceluoqian)
> **Organization:** NVIDIA
> **Authors:** [Jingyi Jin](https://www.linkedin.com/in/jingyi-jin) • [Alice Luo](https://www.linkedin.com/in/aliceluoqian) > **Organization:** NVIDIA

## Overview

@@ -30,12 +29,13 @@ Data curation is a complex, multi-stage process. As shown below, it systematical

![Comprehensive Data Curation Pipeline](images/data_curation_pipeline.png)

The **Cosmos video curation pipeline**—first established in *Cosmos-Predict1* and later scaled in *Cosmos-Predict2.5*—consists of seven stages:
The **Cosmos video curation pipeline**—first established in _Cosmos-Predict1_ and later scaled in _Cosmos-Predict2.5_—consists of seven stages:

1. **Shot-Aware Video Splitting** – Long-form videos are segmented into coherent clips using shot boundary detection. Short (<5 s) clips are discarded, while longer ones (5–60 s) form the basis for downstream curation.
2. **GPU-Based Transcoding** – Each clip is transcoded in parallel to optimize format, frame rate, and compression quality for model ingestion.
3. **Video Cropping** – Black borders, letterboxing, and spatial padding are removed to ensure consistent aspect ratios.
4. **Filtering** – A multi-stage filtering pipeline removes unsuitable data. Filters include:

- **Aesthetic Quality Filter** – Screens for poor composition or lighting.
- **Motion Filter** – Removes clips with excessive or insufficient movement.
- **OCR Filter** – Detects overlays, watermarks, or subtitles.
@@ -56,13 +56,13 @@ The result is a dataset that is **clean, diverse, and semantically organized**

## From Pre-Training to Post-Training

Although *Cosmos-Predict2.5* operates at petabyte scale, its principles directly inform post-training data practices:
Although _Cosmos-Predict2.5_ operates at petabyte scale, its principles directly inform post-training data practices:

- **Scale down, specialize up:** Post-training uses smaller but more domain-specific datasets.
- **Refine rather than expand:** Instead of collecting more data, focus on *improving alignment* and *removing noise*.
- **Refine rather than expand:** Instead of collecting more data, focus on _improving alignment_ and _removing noise_.
- **Iterate via feedback loops:** Use model evaluation results to guide the next round of curation—closing the loop between data and learning outcomes.

In other words, post-training data curation inherits the *structure* of pre-training pipelines but applies it to **targeted, feedback-driven refinement**.
In other words, post-training data curation inherits the _structure_ of pre-training pipelines but applies it to **targeted, feedback-driven refinement**.

---

@@ -75,33 +75,44 @@ Data sourcing involves acquiring datasets from diverse locations—internal stor

### Cloud Storage Tools

| Tool | Purpose | Best For |
|------|----------|----------|
| **s5cmd** | High-performance S3-compatible storage client | Large-scale parallel transfers |
| **AWS CLI** | Official AWS command-line tool | AWS-native workflows |
| **rclone** | Multi-cloud sync for 70+ providers | Complex multi-cloud setups |
| Tool | Purpose | Best For |
| ----------- | --------------------------------------------- | ------------------------------ |
| **s5cmd** | High-performance S3-compatible storage client | Large-scale parallel transfers |
| **AWS CLI** | Official AWS command-line tool | AWS-native workflows |
| **rclone** | Multi-cloud sync for 70+ providers | Complex multi-cloud setups |
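
As an illustration of the "large-scale parallel transfers" row, here is a minimal Python sketch that assembles an `s5cmd` copy command. The bucket prefix, destination directory, and worker count are hypothetical placeholders, and the helper names are not part of any Cosmos tooling. It assumes `s5cmd` is installed and that credentials are already configured.

```python
import shlex
import subprocess


def s5cmd_copy_command(src_prefix: str, dest_dir: str, num_workers: int = 64) -> list[str]:
    """Build an s5cmd command that copies every object under an S3 prefix.

    The trailing wildcard is passed to s5cmd itself, which expands it and
    copies matching objects in parallel across `num_workers` workers.
    """
    return [
        "s5cmd",
        "--numworkers", str(num_workers),
        "cp",
        f"{src_prefix.rstrip('/')}/*",
        f"{dest_dir.rstrip('/')}/",
    ]


def run(cmd: list[str], dry_run: bool = True) -> int:
    """Print the command when dry_run is set; otherwise execute it."""
    if dry_run:
        print(shlex.join(cmd))
        return 0
    return subprocess.run(cmd, check=True).returncode


if __name__ == "__main__":
    # Hypothetical bucket and local path, for illustration only.
    run(s5cmd_copy_command("s3://example-bucket/raw-videos", "data/raw"))
```

Keeping the command as an argument list avoids shell expansion of the wildcard, and the dry-run default makes it easy to inspect a transfer before starting it.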

### Web Content Tools

| Tool | Purpose | Best For |
|------|----------|----------|
| **HuggingFace CLI** | Access to model/dataset repositories | Community datasets and checkpoints |
| **yt-dlp** | High-throughput video downloader | Batch ingestion and quality selection |
| **wget/curl** | General-purpose file downloaders | API retrieval and recursive crawling |
| Tool | Purpose | Best For |
| ------------------- | ------------------------------------ | ------------------------------------- |
| **HuggingFace CLI** | Access to model/dataset repositories | Community datasets and checkpoints |
| **yt-dlp** | High-throughput video downloader | Batch ingestion and quality selection |
| **wget/curl** | General-purpose file downloaders | API retrieval and recursive crawling |

### Physical AI Datasets

For developers working with Cosmos models, NVIDIA provides open, curated, commercial-grade datasets in the **[NVIDIA Physical AI Collection](https://huggingface.co/collections/nvidia/physical-ai)** on Hugging Face, including:

- Autonomous vehicle datasets (driving scenes, synthetic data, teleoperation)
- Robotics datasets (GR00T, manipulation, grasping, navigation)
- Smart spaces and warehouse datasets
- Domain-specific training and evaluation datasets

These datasets are designed to work seamlessly with Cosmos models and can serve as starting points for domain-specific post-training workflows.
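
A dataset from the collection could be pulled locally along the following lines. This is a sketch, not an official workflow: the repository ID `nvidia/EXAMPLE-DATASET` and the `*.mp4` pattern are hypothetical placeholders, and it assumes the third-party `huggingface_hub` package is installed.

```python
from pathlib import Path


def build_snapshot_args(repo_id: str, local_root: str, patterns=None) -> dict:
    """Assemble keyword arguments for huggingface_hub.snapshot_download."""
    return {
        "repo_id": repo_id,
        "repo_type": "dataset",  # the collection hosts dataset repos, not models
        "local_dir": str(Path(local_root) / repo_id.split("/")[-1]),
        # Restricting patterns avoids downloading an entire large dataset.
        "allow_patterns": list(patterns) if patterns else None,
    }


def download_dataset(repo_id: str, local_root: str, patterns=None) -> str:
    """Download (part of) a dataset repo and return the local directory."""
    # Imported lazily so the helper above works without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(**build_snapshot_args(repo_id, local_root, patterns))


if __name__ == "__main__":
    # Hypothetical repository ID; substitute a real one from the collection.
    print(download_dataset("nvidia/EXAMPLE-DATASET", "data", patterns=["*.mp4"]))
```

Some datasets may require accepting terms on the dataset page or authenticating first, so check the page before scripting a bulk download.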

### Data Processing Tools

| Tool | Purpose | Best For |
|------|----------|----------|
| **ffmpeg** | Video transcoding and frame extraction | Reformatting and quality control |
| **PIL/Pillow** | Python imaging library | Lightweight image manipulation |
| Tool | Purpose | Best For |
| -------------- | -------------------------------------- | -------------------------------- |
| **ffmpeg** | Video transcoding and frame extraction | Reformatting and quality control |
| **PIL/Pillow** | Python imaging library | Lightweight image manipulation |

### Quality Control Tools

| Tool | Purpose | Best For |
|------|----------|----------|
| **OpenCV** | Computer vision toolkit | Visual inspection and analysis |
| **FFprobe** | Metadata extraction | Duration, codec, and resolution stats |
| Tool | Purpose | Best For |
| ----------- | ----------------------- | ------------------------------------- |
| **OpenCV** | Computer vision toolkit | Visual inspection and analysis |
| **FFprobe** | Metadata extraction | Duration, codec, and resolution stats |
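
To make the FFprobe row concrete, the sketch below reads duration, codec, and resolution from `ffprobe` JSON output and applies the 5–60 s clip-length range mentioned in the pipeline description. Dropping clips longer than 60 s is an assumption of this sketch (the text only says clips under 5 s are discarded), and the function names are illustrative rather than part of the Cosmos curation code.

```python
import json
import subprocess

MIN_CLIP_SECONDS = 5.0   # clips shorter than this are discarded (per the pipeline text)
MAX_CLIP_SECONDS = 60.0  # assumed upper bound, taken from the 5-60 s range


def parse_probe(probe_json: str) -> dict:
    """Extract duration, codec, and resolution from ffprobe JSON output."""
    data = json.loads(probe_json)
    # Pick the first video stream; audio and subtitle streams are ignored.
    video = next(s for s in data["streams"] if s.get("codec_type") == "video")
    return {
        "duration": float(data["format"]["duration"]),  # ffprobe reports a string
        "codec": video["codec_name"],
        "width": int(video["width"]),
        "height": int(video["height"]),
    }


def probe(path: str) -> dict:
    """Run ffprobe on a file and return the parsed metadata."""
    cmd = [
        "ffprobe", "-v", "error",
        "-print_format", "json",
        "-show_format", "-show_streams",
        path,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_probe(out)


def keep_clip(meta: dict) -> bool:
    """Return True when the clip duration falls inside the accepted range."""
    return MIN_CLIP_SECONDS <= meta["duration"] <= MAX_CLIP_SECONDS
```

Separating parsing from the subprocess call lets the duration rule be checked against stored metadata without re-reading the video files.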

---

12 changes: 12 additions & 0 deletions docs/faq.md
@@ -62,6 +62,17 @@ The Cosmos platform provides the following capabilities:

**Post-training:** Cosmos WFMs are fully customizable to develop downstream vision, robotics or autonomous vehicle foundation models tailored for customer data. Post-training can be done to change output type, output quantity, output quality, output style or output point of view.

### Where can I find datasets for training Physical AI models?

NVIDIA provides curated, open, commercial-grade datasets for Physical AI development on the [NVIDIA Physical AI Collection](https://huggingface.co/collections/nvidia/physical-ai) on Hugging Face. This collection includes datasets for:

- **Autonomous vehicles**: Driving scenes, synthetic data, and teleoperation datasets
- **Robotics**: GR00T, manipulation, grasping, and navigation datasets
- **Smart spaces and warehouses**: Multi-camera tracking, detection, and spatial intelligence datasets
- **Domain-specific training and evaluation**: Specialized datasets for various Physical AI applications

These datasets are designed to work seamlessly with Cosmos models and can serve as starting points for domain-specific post-training workflows.

### How do Cosmos models differ from other video foundation models?

Cosmos world foundation models are designed specifically for physical AI applications. The models are openly available and customizable, with Cosmos Predict and Cosmos Reason supporting post-training for autonomous vehicle, robotics, and vision-action generation models.
@@ -279,6 +290,7 @@ Existing NVIDIA Omniverse Enterprise (NVOVE) licenses can be used for Cosmos ent
- **Documentation**: Comprehensive guides in each repository
- **Examples**: Reference implementations and tutorials
- **Community Forums**: Engage with other developers
- **Physical AI Datasets**: Access curated datasets for autonomous vehicles, robotics, smart spaces, and warehouse environments on the [NVIDIA Physical AI Collection](https://huggingface.co/collections/nvidia/physical-ai) on Hugging Face

#### Official Channels

21 changes: 12 additions & 9 deletions docs/index.md
@@ -25,6 +25,8 @@ The Cosmos Cookbook is an open-source resource where NVIDIA and the broader Phys

We welcome contributions—from new examples and workflow improvements to bug fixes and documentation updates. Together, we can evolve best practices and accelerate the adoption of Cosmos models across domains.

**📊 Physical AI Datasets:** Access curated datasets for autonomous vehicles, intelligent transportation systems, robotics, smart spaces, and warehouse environments on the [NVIDIA Physical AI Collection](https://huggingface.co/collections/nvidia/physical-ai) on Hugging Face.

## Case Study Recipes

The cookbook includes comprehensive use cases demonstrating real-world applications across the Cosmos platform.
@@ -60,18 +62,18 @@ The cookbook includes comprehensive use cases demonstrating real-world applicati

#### Vision-language reasoning and quality control

| **Workflow** | **Description** | **Link** |
|--------------|-----------------|----------|
| **Training** | Physical plausibility check for video quality assessment | [Video Rewards](recipes/post_training/reason1/physical-plausibility-check/post_training.md) |
| **Training** | Spatial AI understanding for warehouse environments | [Spatial AI Warehouse](recipes/post_training/reason1/spatial-ai-warehouse/post_training.md) |
| **Training** | Intelligent transportation scene understanding and analysis | [Intelligent Transportation](recipes/post_training/reason1/intelligent-transportation/post_training.md) |
| **Training** | AV video captioning and visual question answering for autonomous vehicles | [AV Video Caption VQA](recipes/post_training/reason1/av_video_caption_vqa/post_training.md) |
| **Training** | Temporal localization for MimicGen robot learning data generation | [Temporal Localization](recipes/post_training/reason1/temporal_localization/post_training.md) |
| **Workflow** | **Description** | **Link** |
| ------------ | ------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- |
| **Training** | Physical plausibility check for video quality assessment | [Video Rewards](recipes/post_training/reason1/physical-plausibility-check/post_training.md) |
| **Training** | Spatial AI understanding for warehouse environments | [Spatial AI Warehouse](recipes/post_training/reason1/spatial-ai-warehouse/post_training.md) |
| **Training** | Intelligent transportation scene understanding and analysis | [Intelligent Transportation](recipes/post_training/reason1/intelligent-transportation/post_training.md) |
| **Training** | AV video captioning and visual question answering for autonomous vehicles | [AV Video Caption VQA](recipes/post_training/reason1/av_video_caption_vqa/post_training.md) |
| **Training** | Temporal localization for MimicGen robot learning data generation | [Temporal Localization](recipes/post_training/reason1/temporal_localization/post_training.md) |

### **Cosmos Curator**

| **Workflow** | **Description** | **Link** |
|--------------|-----------------|----------|
| **Workflow** | **Description** | **Link** |
| ------------ | ---------------------------------------------------- | ------------------------------------------------------------------------------- |
| **Curation** | Curate video data for Cosmos Predict 2 post-training | [Predict 2 Data Curation](recipes/data_curation/predict2_data/data_curation.md) |

### **End-to-End Workflows**
@@ -124,6 +126,7 @@ Visual examples of Cosmos Transfer results across Physical AI domains:
This cookbook provides flexible entry points for both **inference** and **training** workflows. Each section contains runnable scripts, technical recipes, and complete examples.

- **Inference workflows:** [Getting Started](getting_started/setup.md) for setup and immediate model deployment
- **Physical AI datasets:** [NVIDIA Physical AI Collection](https://huggingface.co/collections/nvidia/physical-ai) on Hugging Face for curated datasets across domains
- **Data processing:** [Data Processing & Analysis](core_concepts/data_curation/overview.md) for content analysis workflows
- **Training workflows:** [Model Training & Fine-tuning](core_concepts/post_training/overview.md) for domain adaptation
- **Case study recipes:** [Case Study Recipes](#case-study-recipes) organized by application area