GitHub - mpivarski/Transcriber-CLI-V3: A rework and optimization of the second transcriber-cli for Math390 Spring 2026

  _______                            _ _                       _____ _      _____ 
 |__   __|                          (_) |                     / ____| |    |_   _|
    | |_ __ __ _ _ __  ___  ___ _ __ _| |__   ___ _ __ ______| |    | |      | |  
    | | '__/ _` | '_ \/ __|/ __| '__| | '_ \ / _ \ '__|______| |    | |      | |  
    | | | | (_| | | | \__ \ (__| |  | | |_) |  __/ |         | |____| |____ _| |_ 
    |_|_|  \__,_|_| |_|___/\___|_|  |_|_.__/ \___|_|          \_____|______|_____|

A command-line interface tool for transcribing herbarium label details from images using AWS Bedrock AI models.

Overview

Transcriber CLI is designed to process and transcribe text from herbarium specimen images:

First Shot: Processes full images to extract label information
Second Shot: Processes each image again, this time together with the first-shot results, for a second pass
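The two passes can be pictured as a small pipeline. The following is a minimal sketch, not the repository's code: `two_shot_transcribe`, the `ModelCall` type, and `fake_model` are hypothetical names, and the stand-in model replaces a real Bedrock call so the example runs offline. How the tool actually combines the first-shot results with the image is an assumption here.

```python
from typing import Callable, Dict

# Hypothetical signature: a model call takes a prompt and image bytes, returns text.
ModelCall = Callable[[str, bytes], str]


def two_shot_transcribe(
    image: bytes,
    prompt: str,
    first_model: ModelCall,
    second_model: ModelCall,
) -> Dict[str, str]:
    """Run a first pass on the image, then a second pass that also sees the first result."""
    first = first_model(prompt, image)
    # Assumed format: the first-shot text is appended to the prompt for review.
    second_prompt = f"{prompt}\n\nFirst-shot transcription to review:\n{first}"
    second = second_model(second_prompt, image)
    return {"first_shot": first, "second_shot": second}


def fake_model(prompt: str, image: bytes) -> str:
    """Stand-in for a real model call so the sketch runs without AWS access."""
    return f"[{len(image)} image bytes, {len(prompt)} prompt chars]"


result = two_shot_transcribe(b"\x89PNG", "Transcribe the label.", fake_model, fake_model)
print(result["first_shot"])
print(result["second_shot"])
```

Passing the model calls in as plain functions keeps the sketch independent of any particular Bedrock client; a real call would take the place of `fake_model`.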

Prerequisites

Python 3.x
AWS account with Bedrock access
Properly configured AWS credentials (for example, set up through the AWS CLI)

Installation

Clone this repository:

git clone https://github.com/rherbst123/Transcriber-CLI-V2
cd Transcriber-CLI-V2

Create a virtual environment (replace the quoted text with a name of your choice):

python3 -m venv "Whatever you want to call the venv"

Install required dependencies:
```
pip install -r requirements.txt
```

Usage

Run the main script:

python TranscribeCLI.py

The tool will:

Ask you to name the run
Ask how many runs to do
Ask you to choose a model for the first shot
Let the first shot run to completion
Ask you to choose a model for the second shot (if chosen)
Write a completed .csv file to your desktop, along with a cost analysis and the raw .json files
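As an illustration of what a per-run cost analysis could involve, here is a minimal sketch that estimates cost from token counts. It assumes the model is billed per 1,000 input and output tokens; `Usage`, `estimate_cost`, and the prices are hypothetical placeholders, not the tool's actual code or real Bedrock rates.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts reported for one model call (hypothetical container)."""
    input_tokens: int
    output_tokens: int


def estimate_cost(usage: Usage, price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Estimate the cost of one call from token counts and per-1,000-token prices."""
    input_cost = usage.input_tokens / 1000 * price_in_per_1k
    output_cost = usage.output_tokens / 1000 * price_out_per_1k
    return input_cost + output_cost


# Placeholder prices for illustration only; look up current rates for the chosen model.
cost = estimate_cost(Usage(input_tokens=2000, output_tokens=500), 0.003, 0.015)
print(f"Estimated cost: ${cost:.4f}")
```

Summing this value over every image in a run, for both shots, would give a run-level total.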

Validation Features

The tool includes three validation features, each of which can be configured in the settings menu:

1. Scientific Name Validation

Validates scientific names against the Global Names Validator on Tropicos
Flags potential issues with taxonomic nomenclature
Provides suggested corrections when available

2. Duplicate Records Validation

Searches the Bryophyte Portal for existing records by catalog number
Identifies potential duplicate specimens in the database
Adds detailed duplicate information to CSV output

3. Duplicate Entries Validation

Searches for existing entries based on collector, collection date, and collection ID
Helps identify specimens that may already be in the database under different catalog numbers
Useful for detecting field collection duplicates
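The idea behind matching on collector, collection date, and collection ID can be sketched locally. This is a simplified illustration over in-memory rows, not the portal search the tool performs; the column names and the `normalize` and `find_duplicate_entries` helpers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so minor formatting differences still match."""
    return " ".join(value.lower().split())


def find_duplicate_entries(rows: List[Dict[str, str]]) -> Dict[Tuple[str, str, str], List[str]]:
    """Group catalog numbers by (collector, date, collection ID); keep groups with more than one."""
    groups: Dict[Tuple[str, str, str], List[str]] = defaultdict(list)
    for row in rows:
        key = (
            normalize(row["collector"]),
            normalize(row["date"]),
            normalize(row["collection_id"]),
        )
        groups[key].append(row["catalog_number"])
    return {key: cats for key, cats in groups.items() if len(cats) > 1}


rows = [
    {"catalog_number": "C0001", "collector": "A. Smith", "date": "1998-05-02", "collection_id": "123"},
    {"catalog_number": "C0002", "collector": "a.  smith", "date": "1998-05-02", "collection_id": "123"},
    {"catalog_number": "C0003", "collector": "B. Jones", "date": "2001-07-14", "collection_id": "45"},
]
duplicates = find_duplicate_entries(rows)
print(duplicates)
```

Here the first two rows share a key after normalization even though their catalog numbers differ, which is the situation this check is meant to surface.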

All validation features can be enabled/disabled in the configuration menu and run automatically after transcription.

Future Updates

Scientific Name Validation (done with the Global Names Validator on Tropicos)
Search in Portal for duplicate catalog numbers
Search in Portal for entries by collector, record number, and date
Automatic segmentation of all images before transcription
Validation using GBIF, iDigBio, and Symbiota
Enhanced genus/species validation

Supported AI Models

The tool supports multiple AWS Bedrock models:

Claude 3 Sonnet
Claude 3.7 Sonnet
Claude 4 Sonnet
Claude 4.5 Sonnet
Claude 4 Opus
Claude 4.1 Opus
Llama 3.2 90B
Llama 4 17B
Amazon Nova Lite, Pro, and Premier
Mistral Pixtral Large

More models will be added as they come out.

Output

Transcription results are saved in:

Desktop/Finished_Transcriptions_"User Entered Run Name"
Contents include:
- Segmented Images
- First and Second shot raw data
- First and Second shot .csv files
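For scripts that need to locate a finished run, the folder name can be rebuilt from the run name. A minimal sketch, assuming the desktop lives at `~/Desktop`; the `output_dir` helper is hypothetical and not part of the tool.

```python
from pathlib import Path
from typing import Optional


def output_dir(run_name: str, base: Optional[Path] = None) -> Path:
    """Return the expected results folder for a run (assumes ~/Desktop by default)."""
    if base is None:
        base = Path.home() / "Desktop"
    return base / f"Finished_Transcriptions_{run_name}"


run_folder = output_dir("demo_run")
print(run_folder)
# List any .csv files if the folder exists; prints nothing otherwise.
if run_folder.is_dir():
    for csv_file in sorted(run_folder.rglob("*.csv")):
        print(csv_file.name)
```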

Prompts

The tool uses specialized prompts for herbarium label transcription, located in the Prompts/ directory. The default prompt (Prompt_1.5.3.txt) is designed to extract detailed information from herbarium labels following specific formatting rules.
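Loading a prompt file from that directory might look like the following. This is a sketch under assumptions: `load_prompt` is a hypothetical helper, and the tool's own loading code may differ.

```python
from pathlib import Path


def load_prompt(prompts_dir: Path, name: str = "Prompt_1.5.3.txt") -> str:
    """Read a prompt file from the prompts directory, listing alternatives if it is missing."""
    path = prompts_dir / name
    if not path.is_file():
        available = sorted(p.name for p in prompts_dir.glob("*.txt"))
        raise FileNotFoundError(
            f"{name} not found in {prompts_dir}; available prompts: {available}"
        )
    return path.read_text(encoding="utf-8")


# Usage (assumes the script runs from the repository root):
# prompt_text = load_prompt(Path("Prompts"))
```

Listing the available files in the error message makes a misspelled prompt name easier to spot.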

Customization

Modify prompts in the Prompts/ directory to adjust transcription behavior
Add or remove models in the AVAILABLE_MODELS list in each transcriber module (the list is updated frequently, so you rarely need to)

Created by Riley Herbst for the Field Museum, with many thanks to Matt Von Konrat, Jeff Gwilliam, and Dan Stille.

