clamsproject · clams-bot · Nov 20, 2025
diff --git a/docs/_apps/smolvlm2-captioner/index.md b/docs/_apps/smolvlm2-captioner/index.md
@@ -0,0 +1,8 @@
+---
+layout: posts
+classes: wide
+title: smolvlm2-captioner
+date: 1970-01-01T00:00:00+00:00
+---
+Applies SmolVLM2-2.2B-Instruct multimodal model to video frames for image captioning.
+- [v0.1](v0.1) ([`@kelleyl`](https://github.com/kelleyl))
diff --git a/docs/_apps/smolvlm2-captioner/v0.1/index.md b/docs/_apps/smolvlm2-captioner/v0.1/index.md
@@ -0,0 +1,113 @@
+---
+layout: posts
+classes: wide
+title: "SmolVLM2 Captioner (v0.1)"
+date: 2025-11-20T14:50:10+00:00
+---
+## About this version
+
+- Submitter: [kelleyl](https://github.com/kelleyl)
+- Submission Time: 2025-11-20T14:50:10+00:00
+- Prebuilt Container Image: [ghcr.io/clamsproject/app-smolvlm2-captioner:v0.1](https://github.com/clamsproject/app-smolvlm2-captioner/pkgs/container/app-smolvlm2-captioner/v0.1)
+- Release Notes
+
+    (no notes provided by the developer)
+
+## About this app (See raw [metadata.json](metadata.json))
+
+**Applies SmolVLM2-2.2B-Instruct multimodal model to video frames for image captioning.**
+
+- App ID: [http://apps.clams.ai/smolvlm2-captioner/v0.1](http://apps.clams.ai/smolvlm2-captioner/v0.1)
+- App License: Apache 2.0
+- Source Repository: [https://github.com/clamsproject/app-smolvlm2-captioner](https://github.com/clamsproject/app-smolvlm2-captioner) ([source tree of the submitted version](https://github.com/clamsproject/app-smolvlm2-captioner/tree/v0.1))
+
+
+#### Inputs
+(**Note**: "*" as a property value means that the property is required but can be any value.)
+
+- [http://mmif.clams.ai/vocabulary/VideoDocument/v1](http://mmif.clams.ai/vocabulary/VideoDocument/v1) (required)
+(of any properties)
+
+- [http://mmif.clams.ai/vocabulary/ImageDocument/v1](http://mmif.clams.ai/vocabulary/ImageDocument/v1) (required)
+(of any properties)
+
+- [http://mmif.clams.ai/vocabulary/TimeFrame/v6](http://mmif.clams.ai/vocabulary/TimeFrame/v6) (required)
+(of any properties)
+
+
+
+#### Configurable Parameters
+(**Note**: _Multivalued_ means the parameter can have one or more values.)
+
+- `frameInterval`: optional, defaults to `30`
+
+    - Type: integer
+    - Multivalued: False
+
+
+    > The interval at which to extract frames from the video if there are no timeframe annotations. Default is every 30 frames.
+- `defaultPrompt`: optional, defaults to `Describe what is shown in this video frame. Analyze the purpose of this frame in the context of a news video. Transcribe any text present.`
+
+    - Type: string
+    - Multivalued: False
+
+
+    > Default prompt to use for timeframes not specified in the `promptMap`. If set to `-`, timeframes not specified in the `promptMap` will be skipped.
+- `promptMap`: optional, defaults to `[]`
+
+    - Type: map
+    - Multivalued: True
+
+
+    > Mapping of labels of the input timeframe annotations to new prompts. Each mapping must be formatted as "IN_LABEL:PROMPT" (with a colon). To pass multiple mappings, use this parameter multiple times. By default, any timeframe label not mapped to a prompt is captioned with the `defaultPrompt`. To skip timeframes with a particular label, pass `-` as the prompt value. To skip all timeframes not specified in the `promptMap`, set the `defaultPrompt` parameter to `-`.
+- `config`: optional, defaults to `config/default.yaml`
+
+    - Type: string
+    - Multivalued: False
+
+
+    > Name of the config file to use.
+- `num_beams`: optional, defaults to `1`
+
+    - Type: integer
+    - Multivalued: False
+
+
+    > Number of beams for beam search during text generation. Default is 1. Higher values may improve quality but increase generation time.
+- `pretty`: optional, defaults to `false`
+
+    - Type: boolean
+    - Multivalued: False
+    - Choices: **_`false`_**, `true`
+
+
+    > The JSON body of the HTTP response will be re-formatted with 2-space indentation
+- `runningTime`: optional, defaults to `false`
+
+    - Type: boolean
+    - Multivalued: False
+    - Choices: **_`false`_**, `true`
+
+
+    > The running time of the app will be recorded in the view metadata
+- `hwFetch`: optional, defaults to `false`
+
+    - Type: boolean
+    - Multivalued: False
+    - Choices: **_`false`_**, `true`
+
+
+    > The hardware information (architecture, GPU and vRAM) will be recorded in the view metadata
+
+
+#### Outputs
+(**Note**: "*" as a property value means that the property is required but can be any value.)
+
+(**Note**: Not all output annotations are always generated.)
+
+- [http://mmif.clams.ai/vocabulary/Alignment/v1](http://mmif.clams.ai/vocabulary/Alignment/v1)
+(of any properties)
+
+- [http://mmif.clams.ai/vocabulary/TextDocument/v1](http://mmif.clams.ai/vocabulary/TextDocument/v1)
+(of any properties)
+
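The prompt-selection rules described for `promptMap` and `defaultPrompt` above can be sketched as follows. This is an illustration of the documented behavior only, not the app's source: `parse_prompt_map` and `prompt_for_label` are hypothetical helpers, the labels `slate`, `bars`, and `chyron` are made-up examples, and splitting on the first colon (so that prompts may themselves contain colons) is an assumption.

```python
from typing import Dict, List, Optional

SKIP = "-"  # per the parameter description, "-" means "skip these timeframes"


def parse_prompt_map(entries: List[str]) -> Dict[str, str]:
    """Turn ["IN_LABEL:PROMPT", ...] into {label: prompt} (hypothetical helper)."""
    mapping: Dict[str, str] = {}
    for entry in entries:
        # Assumption: split on the first colon only, so prompts may contain colons.
        label, sep, prompt = entry.partition(":")
        if not sep:
            raise ValueError(f"expected IN_LABEL:PROMPT, got {entry!r}")
        mapping[label.strip()] = prompt.strip()
    return mapping


def prompt_for_label(
    label: str, mapping: Dict[str, str], default_prompt: str
) -> Optional[str]:
    """Return the prompt for a timeframe label, or None if it should be skipped."""
    prompt = mapping.get(label, default_prompt)
    return None if prompt == SKIP else prompt


prompts = parse_prompt_map(["slate:Transcribe the slate text.", "bars:-"])
print(prompt_for_label("slate", prompts, "Describe this frame."))   # mapped prompt
print(prompt_for_label("bars", prompts, "Describe this frame."))    # skipped label
print(prompt_for_label("chyron", prompts, "Describe this frame."))  # falls back to default
print(prompt_for_label("chyron", prompts, "-"))                     # default set to skip
```

Returning `None` for skipped labels keeps the "skip" decision separate from the prompt text, so a caller can filter timeframes before any frame is sent to the model.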
diff --git a/docs/_apps/smolvlm2-captioner/v0.1/metadata.json b/docs/_apps/smolvlm2-captioner/v0.1/metadata.json
@@ -0,0 +1,89 @@
+{
+  "name": "SmolVLM2 Captioner",
+  "description": "Applies SmolVLM2-2.2B-Instruct multimodal model to video frames for image captioning.",
+  "app_version": "v0.1",
+  "mmif_version": "1.1.0",
+  "app_license": "Apache 2.0",
+  "identifier": "http://apps.clams.ai/smolvlm2-captioner/v0.1",
+  "url": "https://github.com/clamsproject/app-smolvlm2-captioner",
+  "input": [
+    {
+      "@type": "http://mmif.clams.ai/vocabulary/VideoDocument/v1",
+      "required": true
+    },
+    {
+      "@type": "http://mmif.clams.ai/vocabulary/ImageDocument/v1",
+      "required": true
+    },
+    {
+      "@type": "http://mmif.clams.ai/vocabulary/TimeFrame/v6",
+      "required": true
+    }
+  ],
+  "output": [
+    {
+      "@type": "http://mmif.clams.ai/vocabulary/Alignment/v1"
+    },
+    {
+      "@type": "http://mmif.clams.ai/vocabulary/TextDocument/v1"
+    }
+  ],
+  "parameters": [
+    {
+      "name": "frameInterval",
+      "description": "The interval at which to extract frames from the video if there are no timeframe annotations. Default is every 30 frames.",
+      "type": "integer",
+      "default": 30,
+      "multivalued": false
+    },
+    {
+      "name": "defaultPrompt",
+      "description": "Default prompt to use for timeframes not specified in the `promptMap`. If set to `-`, timeframes not specified in the `promptMap` will be skipped.",
+      "type": "string",
+      "default": "Describe what is shown in this video frame. Analyze the purpose of this frame in the context of a news video. Transcribe any text present.",
+      "multivalued": false
+    },
+    {
+      "name": "promptMap",
+      "description": "Mapping of labels of the input timeframe annotations to new prompts. Each mapping must be formatted as \"IN_LABEL:PROMPT\" (with a colon). To pass multiple mappings, use this parameter multiple times. By default, any timeframe label not mapped to a prompt is captioned with the `defaultPrompt`. To skip timeframes with a particular label, pass `-` as the prompt value. To skip all timeframes not specified in the `promptMap`, set the `defaultPrompt` parameter to `-`.",
+      "type": "map",
+      "default": [],
+      "multivalued": true
+    },
+    {
+      "name": "config",
+      "description": "Name of the config file to use.",
+      "type": "string",
+      "default": "config/default.yaml",
+      "multivalued": false
+    },
+    {
+      "name": "num_beams",
+      "description": "Number of beams for beam search during text generation. Default is 1. Higher values may improve quality but increase generation time.",
+      "type": "integer",
+      "default": 1,
+      "multivalued": false
+    },
+    {
+      "name": "pretty",
+      "description": "The JSON body of the HTTP response will be re-formatted with 2-space indentation",
+      "type": "boolean",
+      "default": false,
+      "multivalued": false
+    },
+    {
+      "name": "runningTime",
+      "description": "The running time of the app will be recorded in the view metadata",
+      "type": "boolean",
+      "default": false,
+      "multivalued": false
+    },
+    {
+      "name": "hwFetch",
+      "description": "The hardware information (architecture, GPU and vRAM) will be recorded in the view metadata",
+      "type": "boolean",
+      "default": false,
+      "multivalued": false
+    }
+  ]
+}
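Because `promptMap` is multivalued, the parameter is repeated once per mapping when the parameters in this metadata are passed to the app. Below is a minimal sketch of building such a query string with the Python standard library. It assumes the app is reachable over HTTP at `http://localhost:5000` and takes its parameters as URL query arguments; the base URL, the `build_app_url` helper, and the example labels are illustrative assumptions, not taken from this submission.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_app_url(base_url: str, params: dict) -> str:
    """Append parameters as a query string (hypothetical helper).

    doseq=True expands list values into repeated keys, which is how a
    multivalued parameter such as promptMap is "used multiple times".
    """
    query = urlencode(params, doseq=True)
    return f"{base_url}?{query}" if query else base_url


url = build_app_url(
    "http://localhost:5000",  # assumed address of a locally running container
    {
        "frameInterval": 60,
        "num_beams": 2,
        "promptMap": ["slate:Transcribe the slate text.", "bars:-"],
        "pretty": "true",
    },
)
print(url)

# Round-trip check: the repeated promptMap keys decode back into a list.
print(parse_qs(urlsplit(url).query)["promptMap"])
```

The MMIF input would then be sent as the body of a POST request to this URL; that step is omitted here because it needs a running app instance.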
diff --git a/docs/_apps/smolvlm2-captioner/v0.1/submission.json b/docs/_apps/smolvlm2-captioner/v0.1/submission.json
@@ -0,0 +1,5 @@
+{
+  "time": "2025-11-20T14:50:10+00:00",
+  "submitter": "kelleyl",
+  "image": "ghcr.io/clamsproject/app-smolvlm2-captioner:v0.1"
+}
diff --git a/docs/_data/app-index.json b/docs/_data/app-index.json
@@ -1,4 +1,14 @@
 {
+  "http://apps.clams.ai/smolvlm2-captioner": {
+    "description": "Applies SmolVLM2-2.2B-Instruct multimodal model to video frames for image captioning.",
+    "latest_update": "2025-11-20T14:50:10+00:00",
+    "versions": [
+      [
+        "v0.1",
+        "kelleyl"
+      ]
+    ]
+  },
   "http://apps.clams.ai/tonedetection": {
     "description": "Detects spans of monotonic audio within an audio file",
     "latest_update": "2025-11-20T08:01:02+00:00",

diff --git a/docs/_data/apps.json b/docs/_data/apps.json