215 changes: 215 additions & 0 deletions contrib/models/Isaac-0.2-2B/README.md
@@ -0,0 +1,215 @@
# Contrib Model: PerceptronAI Isaac-0.2-2B-Preview VLM

NeuronX Distributed Inference (NxDI) implementation of the PerceptronAI Isaac-0.2-2B-Preview Vision-Language Model. Isaac combines a Qwen3 text backbone with a SigLIP2 vision encoder and a 2-layer MLP projector that applies pixel shuffle to the vision features.

## Model Information

- **HuggingFace ID:** [`PerceptronAI/Isaac-0.2-2B-Preview`](https://huggingface.co/PerceptronAI/Isaac-0.2-2B-Preview)
- **Model Type:** VLM with SigLIP2 vision encoder, pixel shuffle, MLP projector, and Qwen3 text decoder
- **License:** CC-BY-NC-4.0 (non-commercial)
- **Requires:** `trust_remote_code=True`

## Architecture Details

### Text Backbone (Qwen3)

| Spec | Isaac 2B |
|---|---:|
| **Layers** | 28 |
| **Hidden Size** | 2048 |
| **Head Dim** | 128 |
| **Attention Heads** | 16 |
| **KV Heads** | 8 |
| **Intermediate Size** | 6144 |
| **Vocabulary Size** | 151,936 |
| **Max Position Embeddings** | 40,960 |
| **Position Encoding** | RoPE (mRoPE-capable) |
| **Normalization** | RMSNorm |
| **Activation** | SiLU |
| **Total Parameters** | 2.57B |

### SigLIP2 Vision Encoder

| Spec | Value |
|---|---:|
| **Layers** | 27 |
| **Hidden Size** | 1152 |
| **Head Dim** | 72 |
| **Attention Heads** | 16 |
| **KV Heads** | 16 |
| **Intermediate Size** | 4304 |
| **Activation** | GELU (approximate) |
| **Image Size** | 256×256 |
| **Patch Size** | 16 |
| **Pixel Shuffle Scale** | 2 |
| **Vision Tokens per Image** | 64 |
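
The token count in the table follows from the image size, patch size, and pixel shuffle scale. The sketch below reproduces that arithmetic; the function name is hypothetical and not part of `isaac_neuron`, and it assumes square images and a pixel shuffle that merges `scale × scale` neighboring patches into one token.

```python
def vision_tokens_per_image(image_size: int, patch_size: int, shuffle_scale: int) -> int:
    """Count the tokens left after patchifying and pixel-shuffling a square image.

    Hypothetical helper for illustration; not part of the isaac_neuron package.
    """
    patches_per_side = image_size // patch_size
    if image_size % patch_size or patches_per_side % shuffle_scale:
        raise ValueError("image, patch, and shuffle sizes must divide evenly")
    # Pixel shuffle folds a (scale x scale) block of patches into one token.
    merged_per_side = patches_per_side // shuffle_scale
    return merged_per_side ** 2


tokens = vision_tokens_per_image(image_size=256, patch_size=16, shuffle_scale=2)
# Each merged token concatenates scale**2 patch embeddings of width 1152.
projector_in_features = 1152 * 2 ** 2

print(tokens)                 # 64
print(projector_in_features)  # 4608
```

The second value matches the input width of the first projector layer listed under MLP Projector.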

### MLP Projector

| Spec | Value |
|---|---:|
| **Layer 1** | Linear(4608 → 18432, no bias) + SiLU |
| **Layer 2** | Linear(18432 → 2048, no bias) |
| **Parameters** | ~122M |
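
As a quick check of the parameter figure, the two bias-free linear layers can be counted directly. This is plain arithmetic on the shapes in the table; `linear_params` is a hypothetical helper name.

```python
def linear_params(in_features: int, out_features: int, bias: bool = False) -> int:
    """Number of trainable parameters in a dense layer (illustrative helper)."""
    return in_features * out_features + (out_features if bias else 0)


layer1 = linear_params(4608, 18432)   # 84,934,656 weights
layer2 = linear_params(18432, 2048)   # 37,748,736 weights
total = layer1 + layer2

print(f"{total:,} parameters (about {total / 1e6:.1f}M)")
```

The sum is 122,683,392, in line with the ~122M listed above.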

## Validation Results

- **Validated:** 2026-04-30
- **Configuration:** trn2.3xlarge, TP=1, batch_size=1, seq_len=1024, bfloat16

### Accuracy

| Test | Status | Result |
|------|--------|--------|
| Text logit cosine (5 prompts) | PASS | avg 0.99998 vs CPU ref |
| Top-1 token match | PASS | 100% match (8/8 prompts) |
| Image+text generation | PASS | Coherent descriptions |
| TP=2 accuracy | PASS | cosine 0.99997 |
| TP=4 accuracy | PASS | cosine 0.99997 |
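
For readers unfamiliar with the two text metrics, the sketch below shows one common way to compute them from a pair of logit vectors. It is a dependency-free illustration, not the code in `validate_text_logits.py`; the function names and the toy logits are assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length logit vectors."""
    if len(a) != len(b):
        raise ValueError("logit vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top1_match(a: list[float], b: list[float]) -> bool:
    """True when both logit vectors select the same argmax token."""
    return max(range(len(a)), key=a.__getitem__) == max(range(len(b)), key=b.__getitem__)


# Toy values standing in for CPU reference logits and Neuron logits.
cpu_logits = [2.0, -1.0, 0.5, 3.0]
neuron_logits = [2.01, -0.99, 0.49, 2.98]

similarity = cosine_similarity(cpu_logits, neuron_logits)
same_token = top1_match(cpu_logits, neuron_logits)
print(f"cosine={similarity:.5f} top1_match={same_token}")
```

In a real comparison the vectors would span the full 151,936-entry vocabulary and the scores would be averaged over prompts.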

### Performance (trn2.3xlarge, TP=1, BS=1)

| Metric | seq_len=1024 | seq_len=4096 |
|--------|-------------|-------------|
| **TKG Throughput** | 110.7 tok/s | 94.0 tok/s |
| **TPOT** | 9.0 ms | 10.6 ms |
| **TTFT** | 9.0 ms | 10.6 ms |
| **Image+text tok/s** | 108.7 tok/s | 93.1 tok/s |
| **Projected DP=4** | ~443 tok/s | ~376 tok/s |
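
The DP=4 row is a projection, not a measurement. A minimal sketch of the arithmetic, assuming throughput scales linearly when four independent TP=1 replicas run side by side (no contention for shared resources); the helper names are hypothetical:

```python
def throughput_from_tpot(tpot_ms: float) -> float:
    """Tokens per second implied by a time-per-output-token in milliseconds."""
    return 1000.0 / tpot_ms


def projected_dp_throughput(single_replica_tok_s: float, dp_degree: int) -> float:
    """Linear-scaling estimate for dp_degree independent replicas."""
    return single_replica_tok_s * dp_degree


# Measured single-replica throughput from the table above.
for seq_len, tok_s in ((1024, 110.7), (4096, 94.0)):
    projected = projected_dp_throughput(tok_s, dp_degree=4)
    print(f"seq_len={seq_len}: {tok_s} tok/s x 4 = {projected:.1f} tok/s")

# TPOT and throughput are two views of the same measurement.
print(f"{throughput_from_tpot(9.0):.1f} tok/s implied by a 9.0 ms TPOT")
```

The implied value from the rounded TPOT (about 111 tok/s) differs slightly from the measured 110.7 tok/s because the TPOT is reported to one decimal place.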

**Compilation time:** ~196s (one-time, seq_len=1024)

### GPU Comparison (L40S, vLLM 0.20.0, CUDA graphs enabled)

| Metric | L40S GPU | trn2 Neuron (TP=1) | trn2 Neuron (DP=4) |
|--------|----------|---------------------|---------------------|
| **TPOT (short input)** | 5.75 ms | 9.0 ms | — |
| **Throughput (short input)** | 174 tok/s | 111 tok/s | ~443 tok/s |
| **TPOT (long input)** | 6.09 ms | 9.0 ms | — |
| **Throughput (long input)** | 164 tok/s | 111 tok/s | ~443 tok/s |

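- **Per-core:** L40S is ~1.6x faster than a single NeuronCore (174 vs. 111 tok/s)
- **Per-device (DP=4):** trn2.3xlarge is projected to deliver ~2.5x the throughput of one L40S (~443 vs. 174 tok/s); the DP=4 figure is extrapolated from the TP=1 measurement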
- GPU benchmark: L40S with vLLM 0.20.0, batch_size=1, CUDA graphs enabled (default)
- Neuron benchmark: trn2.3xlarge, TP=1, batch_size=1, bfloat16, CTE flash attention
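
The ratios quoted in this comparison can be recomputed from the short-input throughput row. The snippet is only arithmetic on the table values; `speedup` is a hypothetical helper name.

```python
def speedup(faster_tok_s: float, slower_tok_s: float) -> float:
    """Throughput ratio between two systems."""
    if slower_tok_s <= 0:
        raise ValueError("throughput must be positive")
    return faster_tok_s / slower_tok_s


l40s_tok_s = 174.0        # L40S, short input
neuron_tp1_tok_s = 111.0  # one TP=1 replica on trn2
neuron_dp4_tok_s = 443.0  # projected, four replicas

per_core = speedup(l40s_tok_s, neuron_tp1_tok_s)
per_device = speedup(neuron_dp4_tok_s, l40s_tok_s)
print(f"L40S vs. one replica: {per_core:.2f}x")
print(f"trn2 DP=4 vs. L40S:   {per_device:.2f}x")
```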

## Usage

```python
import torch
from transformers import AutoConfig, AutoTokenizer
from neuronx_distributed_inference.models.config import NeuronConfig, OnDeviceSamplingConfig
from neuronx_distributed_inference.utils.hf_adapter import load_pretrained_config

from isaac_neuron.modeling_isaac import (
    NeuronIsaacForConditionalGeneration,
    IsaacInferenceConfig,
)

model_path = "/path/to/Isaac-0.2-2B-Preview"
compiled_path = "/path/to/compiled/model"

# Configure
text_config = NeuronConfig(
    batch_size=1,
    seq_len=1024,
    torch_dtype=torch.bfloat16,
    tp_degree=1,
    is_continuous_batching=True,
    ctx_batch_size=1,
    enable_bucketing=True,
    context_encoding_buckets=[1024],
    token_generation_buckets=[1024],
    on_device_sampling_config=OnDeviceSamplingConfig(
        dynamic=True, do_sample=True, deterministic=True,
        top_k=1, global_topk=256, top_k_kernel_enabled=True,
    ),
    attn_kernel_enabled=True,  # CTE flash attention
    fused_qkv=False,
    mlp_kernel_enabled=False,
)

vision_config = NeuronConfig(
    batch_size=1, seq_len=1024, torch_dtype=torch.bfloat16,
    tp_degree=1, is_continuous_batching=True, ctx_batch_size=1,
    enable_bucketing=True, buckets=[1],
)

hf_config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
config = IsaacInferenceConfig(
    text_neuron_config=text_config,
    vision_neuron_config=vision_config,
    load_config=load_pretrained_config(hf_config=hf_config),
)
config.image_token_index = 151655 # <|image_pad|>

# Compile and load
model = NeuronIsaacForConditionalGeneration(model_path, config)
model.compile(compiled_path, debug=False)
model.load(compiled_path)

# Generate (see integration tests for full examples)
```
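
When choosing a generation length for the configuration above, it can help to budget the 1024-token sequence explicitly. The sketch below is a standalone illustration, not part of the NxDI or `isaac_neuron` API; it assumes each image occupies 64 positions in the sequence (the vision token count listed earlier) and that prompt plus generated tokens must fit within `seq_len`.

```python
SEQ_LEN = 1024                # matches seq_len in the configuration above
VISION_TOKENS_PER_IMAGE = 64  # fixed for 256x256 inputs


def max_new_tokens(text_tokens: int, num_images: int, seq_len: int = SEQ_LEN) -> int:
    """Positions left for generation after the prompt (hypothetical helper)."""
    used = text_tokens + num_images * VISION_TOKENS_PER_IMAGE
    remaining = seq_len - used
    if remaining <= 0:
        raise ValueError(f"prompt uses {used} positions, but seq_len is {seq_len}")
    return remaining


# A 40-token text prompt with one image leaves 1024 - (40 + 64) positions.
print(max_new_tokens(text_tokens=40, num_images=1))  # 920
```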

## Compatibility Matrix

| Instance/Version | SDK 2.29 | SDK 2.28 and earlier |
|------------------|----------|----------------------|
| trn2.3xlarge (TP=1) | Tested | Not tested |
| trn2.3xlarge (TP=2) | Tested | Not tested |
| trn2.3xlarge (TP=4) | Tested | Not tested |
| trn1 | Not tested | Not tested |
| inf2 | Not tested | Not tested |
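
The tested TP degrees all divide the text backbone's 16 attention heads and 8 KV heads evenly. The sketch below shows the resulting per-rank head counts, assuming heads are split evenly across ranks; how NxDI actually shards or replicates KV heads is not covered here, and `heads_per_rank` is a hypothetical helper.

```python
def heads_per_rank(num_heads: int, num_kv_heads: int, tp_degree: int) -> tuple[int, int]:
    """Query and KV heads held by each rank under an even split."""
    if num_heads % tp_degree or num_kv_heads % tp_degree:
        raise ValueError(f"tp_degree={tp_degree} does not divide the head counts evenly")
    return num_heads // tp_degree, num_kv_heads // tp_degree


for tp in (1, 2, 4):
    q_heads, kv_heads = heads_per_rank(num_heads=16, num_kv_heads=8, tp_degree=tp)
    print(f"TP={tp}: {q_heads} query heads, {kv_heads} KV heads per rank")
```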

## Known Limitations

- **Batch size:** Only BS=1 supported (NxDI VLM framework limitation, shared with all VLM contribs)
- **MLP NKI kernel:** Not compatible at TP=1 (intermediate=6144 exceeds SBUF capacity). Keep `mlp_kernel_enabled=False` to use the default kernels.
- **QKV NKI kernel:** Not compatible (Q/K layernorm incompatible with fused QKV kernel)
- **Image size:** Fixed at 256×256 (64 vision tokens per image)
- **License:** CC-BY-NC-4.0 — non-commercial use only

## Testing

Run integration tests:

```bash
# Set up environment
source /opt/aws_neuronx_venv_pytorch_2_9_nxd_inference/bin/activate
export PYTHONPATH=/path/to/neuronx-distributed-inference/contrib/models/Isaac-0.2-2B/src:$PYTHONPATH

# Run validation
cd contrib/models/Isaac-0.2-2B
python test/integration/run_isaac.py
```

## Module Structure

```
contrib/models/Isaac-0.2-2B/
├── README.md
├── src/
│   └── isaac_neuron/
│       ├── __init__.py
│       ├── modeling_isaac.py          # VLM orchestrator + config + state dict mapping
│       ├── modeling_isaac_text.py     # Text model (NeuronBaseModel + Qwen3 layers)
│       ├── modeling_isaac_vision.py   # Vision wrapper + MLP projector + pixel shuffle
│       ├── ndxi_patch.py              # SDK 2.29 compatibility patches
│       ├── utils.py                   # QKV fusion + pixel shuffle utilities
│       └── siglip/
│           ├── modeling_siglip.py     # SigLIP2 vision encoder
│           └── layers.py              # OutputChannelParallelConv2d
└── test/
    └── integration/
        ├── run_isaac.py                 # Main compilation + generation test
        ├── benchmark.py                 # Formal benchmark script
        ├── test_tp.py                   # TP=2/4 validation
        ├── validate_text_logits.py      # Text logit validation vs CPU
        ├── validate_tkg.py              # TKG multi-token validation
        ├── validate_image_text.py       # Image+text E2E validation
        └── validate_vision_encoder.py   # Vision encoder sanity checks
```

## Example Checkpoint

- [`PerceptronAI/Isaac-0.2-2B-Preview`](https://huggingface.co/PerceptronAI/Isaac-0.2-2B-Preview)