8 changes: 8 additions & 0 deletions docs/_apps/smolvlm2-captioner/index.md
@@ -0,0 +1,8 @@
---
layout: posts
classes: wide
title: smolvlm2-captioner
date: 1970-01-01T00:00:00+00:00
---
Applies SmolVLM2-2.2B-Instruct multimodal model to video frames for image captioning.
- [v0.1](v0.1) ([`@kelleyl`](https://github.com/kelleyl))
113 changes: 113 additions & 0 deletions docs/_apps/smolvlm2-captioner/v0.1/index.md
@@ -0,0 +1,113 @@
---
layout: posts
classes: wide
title: "SmolVLM2 Captioner (v0.1)"
date: 2025-11-20T14:50:10+00:00
---
## About this version

- Submitter: [kelleyl](https://github.com/kelleyl)
- Submission Time: 2025-11-20T14:50:10+00:00
- Prebuilt Container Image: [ghcr.io/clamsproject/app-smolvlm2-captioner:v0.1](https://github.com/clamsproject/app-smolvlm2-captioner/pkgs/container/app-smolvlm2-captioner/v0.1)
- Release Notes

(no notes provided by the developer)

## About this app (See raw [metadata.json](metadata.json))

**Applies SmolVLM2-2.2B-Instruct multimodal model to video frames for image captioning.**

- App ID: [http://apps.clams.ai/smolvlm2-captioner/v0.1](http://apps.clams.ai/smolvlm2-captioner/v0.1)
- App License: Apache 2.0
- Source Repository: [https://github.com/clamsproject/app-smolvlm2-captioner](https://github.com/clamsproject/app-smolvlm2-captioner) ([source tree of the submitted version](https://github.com/clamsproject/app-smolvlm2-captioner/tree/v0.1))


#### Inputs
(**Note**: "*" as a property value means that the property is required but can be any value.)

- [http://mmif.clams.ai/vocabulary/VideoDocument/v1](http://mmif.clams.ai/vocabulary/VideoDocument/v1) (required)
(of any properties)

- [http://mmif.clams.ai/vocabulary/ImageDocument/v1](http://mmif.clams.ai/vocabulary/ImageDocument/v1) (required)
(of any properties)

- [http://mmif.clams.ai/vocabulary/TimeFrame/v6](http://mmif.clams.ai/vocabulary/TimeFrame/v6) (required)
(of any properties)



#### Configurable Parameters
(**Note**: _Multivalued_ means the parameter can have one or more values.)

- `frameInterval`: optional, defaults to `30`

- Type: integer
- Multivalued: False


> The interval at which to extract frames from the video if there are no timeframe annotations. Default is every 30 frames.
- `defaultPrompt`: optional, defaults to `Describe what is shown in this video frame. Analyze the purpose of this frame in the context of a news video. Transcribe any text present.`

- Type: string
- Multivalued: False


> default prompt to use for timeframes not specified in the promptMap. If set to `-`, timeframes not specified in the promptMap will be skipped.
- `promptMap`: optional, defaults to `[]`

- Type: map
- Multivalued: True


> Mapping of labels of the input timeframe annotations to new prompts. Must be formatted as "IN_LABEL:PROMPT" (with a colon). To pass multiple mappings, use this parameter multiple times. By default, any timeframe labels not mapped to a prompt will be used with the `defaultPrompt`. To skip timeframes with a particular label, pass `-` as the prompt value. To skip all timeframes not specified in the promptMap, set the `defaultPrompt` parameter to `-`.
- `config`: optional, defaults to `config/default.yaml`

- Type: string
- Multivalued: False


> Name of the config file to use.
- `num_beams`: optional, defaults to `1`

- Type: integer
- Multivalued: False


> Number of beams for beam search during text generation. Default is 1. Higher values may improve quality but increase generation time.
- `pretty`: optional, defaults to `false`

- Type: boolean
- Multivalued: False
- Choices: **_`false`_**, `true`


> The JSON body of the HTTP response will be re-formatted with 2-space indentation
- `runningTime`: optional, defaults to `false`

- Type: boolean
- Multivalued: False
- Choices: **_`false`_**, `true`


> The running time of the app will be recorded in the view metadata
- `hwFetch`: optional, defaults to `false`

- Type: boolean
- Multivalued: False
- Choices: **_`false`_**, `true`


> The hardware information (architecture, GPU and vRAM) will be recorded in the view metadata
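
The interaction between `promptMap` and `defaultPrompt` described above can be sketched as follows. This is only an illustration of the documented rules, not the app's actual code: the function names `parse_prompt_map` and `resolve_prompt` are hypothetical, and splitting each entry at the first colon is an assumption (the parameter description only says the format is "IN_LABEL:PROMPT").

```python
SKIP = "-"  # documented marker meaning "do not caption this timeframe"


def parse_prompt_map(entries):
    """Turn a list of "IN_LABEL:PROMPT" strings into a dict.

    Assumption: the entry is split at the first colon, so a prompt
    may itself contain colons.
    """
    mapping = {}
    for entry in entries:
        label, sep, prompt = entry.partition(":")
        if not sep:
            raise ValueError(f"expected IN_LABEL:PROMPT, got {entry!r}")
        mapping[label.strip()] = prompt.strip()
    return mapping


def resolve_prompt(label, mapping, default_prompt):
    """Return the prompt for a timeframe label, or None to skip it."""
    # Labels missing from the map fall back to the default prompt.
    prompt = mapping.get(label, default_prompt)
    return None if prompt == SKIP else prompt
```

With a map built from `["slate:Transcribe the slate.", "bars:-"]`, a `slate` timeframe gets its own prompt, a `bars` timeframe is skipped, and any other label uses `defaultPrompt` (or is skipped as well when `defaultPrompt` is `-`).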


#### Outputs
(**Note**: "*" as a property value means that the property is required but can be any value.)

(**Note**: Not all output annotations are always generated.)

- [http://mmif.clams.ai/vocabulary/Alignment/v1](http://mmif.clams.ai/vocabulary/Alignment/v1)
(of any properties)

- [http://mmif.clams.ai/vocabulary/TextDocument/v1](http://mmif.clams.ai/vocabulary/TextDocument/v1)
(of any properties)

89 changes: 89 additions & 0 deletions docs/_apps/smolvlm2-captioner/v0.1/metadata.json
@@ -0,0 +1,89 @@
{
"name": "SmolVLM2 Captioner",
"description": "Applies SmolVLM2-2.2B-Instruct multimodal model to video frames for image captioning.",
"app_version": "v0.1",
"mmif_version": "1.1.0",
"app_license": "Apache 2.0",
"identifier": "http://apps.clams.ai/smolvlm2-captioner/v0.1",
"url": "https://github.com/clamsproject/app-smolvlm2-captioner",
"input": [
{
"@type": "http://mmif.clams.ai/vocabulary/VideoDocument/v1",
"required": true
},
{
"@type": "http://mmif.clams.ai/vocabulary/ImageDocument/v1",
"required": true
},
{
"@type": "http://mmif.clams.ai/vocabulary/TimeFrame/v6",
"required": true
}
],
"output": [
{
"@type": "http://mmif.clams.ai/vocabulary/Alignment/v1"
},
{
"@type": "http://mmif.clams.ai/vocabulary/TextDocument/v1"
}
],
"parameters": [
{
"name": "frameInterval",
"description": "The interval at which to extract frames from the video if there are no timeframe annotations. Default is every 30 frames.",
"type": "integer",
"default": 30,
"multivalued": false
},
{
"name": "defaultPrompt",
"description": "default prompt to use for timeframes not specified in the promptMap. If set to `-`, timeframes not specified in the promptMap will be skipped.",
"type": "string",
"default": "Describe what is shown in this video frame. Analyze the purpose of this frame in the context of a news video. Transcribe any text present.",
"multivalued": false
},
{
"name": "promptMap",
"description": "Mapping of labels of the input timeframe annotations to new prompts. Must be formatted as \"IN_LABEL:PROMPT\" (with a colon). To pass multiple mappings, use this parameter multiple times. By default, any timeframe labels not mapped to a prompt will be used with the `defaultPrompt`. To skip timeframes with a particular label, pass `-` as the prompt value. To skip all timeframes not specified in the promptMap, set the `defaultPrompt` parameter to `-`.",
"type": "map",
"default": [],
"multivalued": true
},
{
"name": "config",
"description": "Name of the config file to use.",
"type": "string",
"default": "config/default.yaml",
"multivalued": false
},
{
"name": "num_beams",
"description": "Number of beams for beam search during text generation. Default is 1. Higher values may improve quality but increase generation time.",
"type": "integer",
"default": 1,
"multivalued": false
},
{
"name": "pretty",
"description": "The JSON body of the HTTP response will be re-formatted with 2-space indentation",
"type": "boolean",
"default": false,
"multivalued": false
},
{
"name": "runningTime",
"description": "The running time of the app will be recorded in the view metadata",
"type": "boolean",
"default": false,
"multivalued": false
},
{
"name": "hwFetch",
"description": "The hardware information (architecture, GPU and vRAM) will be recorded in the view metadata",
"type": "boolean",
"default": false,
"multivalued": false
}
]
}
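
As a rough illustration of how the parameters declared above might be supplied, the sketch below encodes them as a URL query string, repeating `promptMap` once per mapping as its description requires. Passing parameters in the query string of an HTTP request is an assumption about how the app is served, and the host and port in the example URL are hypothetical; only the parameter names and types come from the metadata.

```python
from urllib.parse import urlencode


def build_query(params):
    """Encode parameters; list values become repeated keys (doseq=True)."""
    return urlencode(params, doseq=True)


params = {
    "frameInterval": 60,   # integer, single-valued
    "num_beams": 2,        # integer, single-valued
    "pretty": "true",      # boolean, sent as text
    # multivalued map: one "IN_LABEL:PROMPT" entry per mapping
    "promptMap": ["slate:Transcribe any text present.", "bars:-"],
}

query = build_query(params)
# Hypothetical endpoint; adjust to wherever the container is listening.
url = f"http://localhost:5000/?{query}"
```

Encoding the values this way also escapes the colon and spaces inside each prompt, which matters because prompts are free text.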
5 changes: 5 additions & 0 deletions docs/_apps/smolvlm2-captioner/v0.1/submission.json
@@ -0,0 +1,5 @@
{
"time": "2025-11-20T14:50:10+00:00",
"submitter": "kelleyl",
"image": "ghcr.io/clamsproject/app-smolvlm2-captioner:v0.1"
}
10 changes: 10 additions & 0 deletions docs/_data/app-index.json
@@ -1,4 +1,14 @@
{
"http://apps.clams.ai/smolvlm2-captioner": {
"description": "Applies SmolVLM2-2.2B-Instruct multimodal model to video frames for image captioning.",
"latest_update": "2025-11-20T14:50:10+00:00",
"versions": [
[
"v0.1",
"kelleyl"
]
]
},
"http://apps.clams.ai/tonedetection": {
"description": "Detects spans of monotonic audio within an audio file",
"latest_update": "2025-11-20T08:01:02+00:00",
2 changes: 1 addition & 1 deletion docs/_data/apps.json

Large diffs are not rendered by default.
