Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
79 changes: 70 additions & 9 deletions python/AzureTranslation/README.md
Original file line number Diff line number Diff line change
@@ -1,17 +1,26 @@
# Overview

This repository contains source code for the OpenMPF Azure Cognitive Services
Translation Component. This component utilizes the [Azure Cognitive Services
Translator REST endpoint](https://docs.microsoft.com/en-us/azure/cognitive-services/translator/reference/v3-0-translate)
to translate the content of detection properties. It has been tested against v3.0 of the
API.
Translation Component. This component utilizes the Azure Cognitive Services
Translator REST APIs to translate the content of detection properties.

The component supports both:

- The legacy Translator Text v3.0 API, and
- The newer Translator preview API (`api-version=2025-10-01-preview`), with
optional integration to the Azure Language "Analyze Text" API
(`api-version=2025-11-15-preview`) for language + script detection.

By default, the component uses the latest preview translation API. Legacy
v3.0 behavior can be enabled via the `TRANSLATION_API_VERSION` job property
(see **Primary Job Properties** below).

This component translates the content of existing detection properties,
so it only makes sense to use it with
[feed forward](https://openmpf.github.io/docs/site/Feed-Forward-Guide) and
when it isn't the first element of a pipeline.

When a detection property is translated, the translation is put in to a new
When a detection property is translated, the translation is put into a new
detection property named `TRANSLATION`. The original detection property is not
modified. A property named `TRANSLATION TO LANGUAGE` containing the BCP-47
language code of the translated text will also be added. If the language
Expand All @@ -25,16 +34,29 @@ translate one of the languages. For example, translating
"你叫什么名字? ¿Cómo te llamas?" to English results in
"What is your name? The Cómo te llamas?".

For languages that have multiple scripts (for example, Mongolian or Serbian),
the component will attempt to use script-aware language tags (such as `mn-Cyrl`,
`mn-Mong`, `sr-Cyrl`, `sr-Latn`) when script information is available from the
language detection endpoint or from job properties.


# Required Job Properties
In order for the component to process any jobs, the job properties listed below
must be provided. Neither has a default value.

- `ACS_URL`: Base URL for the Azure Cognitive Services Translator Endpoint.
e.g. `https://api.cognitive.microsofttranslator.com` or
`https://<custom-translate-host>/translator/text/v3.0`. The URL should
not end with `/translate` because two separate endpoints are
used. `ACS_URL + '/translate'` is used for translation.
For example:
- Legacy v3.0:
- `https://api.cognitive.microsofttranslator.com`
- `https://<custom-translate-host>/translator/text/v3.0`
- Preview API:
- `https://<resource-name>.cognitiveservices.azure.com/translator`

The URL should **not** end with `/translate` because the component appends
the appropriate path internally. `ACS_URL + '/translate'` is used for
translation, and `ACS_URL + '/detect'` may be used when falling back to
the legacy detection endpoint.

This property can also be configured using an environment variable
named `MPF_PROP_ACS_URL`.

Expand All @@ -48,6 +70,10 @@ must be provided. Neither has a default value.
- `TO_LANGUAGE`: The BCP-47 language code for the language that the properties
should be translated to.

- `LANGUAGE_DETECTION_ENDPOINT`: Optional base URL for the Azure Language
"Analyze Text" API used for language and script detection. If this parameter is not provided,
the component falls back to the legacy /detect behavior on `ACS_URL` (`v3.0`).

- `FEED_FORWARD_PROP_TO_PROCESS`: Comma-separated list of property names indicating
which properties in the feed-forward track or detection to consider
translating. For example, `TEXT,TRANSCRIPT`. If the first property listed is
Expand Down Expand Up @@ -82,6 +108,41 @@ must be provided. Neither has a default value.
for some Azure deployments. If provided, will be set in the
'Ocp-Apim-Subscription-Region' request header.

- `TRANSLATION_API_VERSION`: Selects which Azure Translator API flavor to use.
This controls both the URL structure and the request/response format.

Supported values:

- `LEGACY` or `3.0` – Uses the original Translator Text v3.0 API.
- `ACS_URL` should point to a v3.0-compatible endpoint such as
`https://api.cognitive.microsofttranslator.com`.

- `LATEST` or `PREVIEW_2025` – Uses the latest preview Translator API (`api-version=2025-10-01-preview`).
- `ACS_URL` should point to a preview-compatible Translator endpoint, such
as `https://<resource-name>.cognitiveservices.azure.com/translator`.

If this property is not set, the component defaults to `LATEST`(`2025-10-01-preview`).

- `SUGGESTED_FROM_SCRIPT`: Optional ISO 15924 script code for the source
text. Only used when script information is not returned by the
detection endpoint. For example, you can set `SUGGESTED_FROM_SCRIPT=Mong`
to suggest that Mongolian input text is in traditional Mongolian script when
language detection is unable to identify the proper script for the given text.


- `DEPLOYMENT_NAME`: Optional preview-only setting that indicates which
Translator deployment to use for the target. When provided and
`TRANSLATION_API_VERSION` is `LATEST`, the deployment name is
passed in the `deploymentName` field for each target, allowing you to route
translations to a specific models of interest.

- `TRANSLATION_TONE`: Optional preview-only setting that configures the tone
for generated translations. Options are `formal`, '`informal`, and `neutral` (default).

- `TRANSLATION_GENDER`: Optional preview-only setting that configures the
preferred grammatical gender in translations Options are `male`, `female`, and `neutral` (default).



# Text Splitter Job Properties
The following settings control the behavior of dividing input text into acceptable chunks
Expand Down
Loading