
textract-io


Structured Text Extraction for Scientific & Factual Data

textract_io is a Python package for extracting structured key information from scientific or factual text. It combines regex pattern matching with retry logic to produce accurate, reliable responses, making it well suited to generating summaries, extracting data, or categorizing text based on user prompts. It is a good fit for turning pre-extracted textual data from multimedia sources into concise, structured outputs for research, reporting, or database entry.


🚀 Features

  • Pattern-based extraction: Uses regex patterns to enforce structured output.
  • LLM7 integration: Defaults to ChatLLM7 (from langchain_llm7) for extraction tasks.
  • Flexible LLM support: Easily swap with any LangChain-compatible LLM (OpenAI, Anthropic, Google, etc.).
  • Error handling: Robust retry logic and clear error messages.
  • Environment-aware: Uses LLM7_API_KEY from environment variables or direct API key input.

📦 Installation

pip install textract_io

🔧 Usage

Basic Usage (Default LLM7)

from textract_io import textract_io

response = textract_io(user_input="Your text here...")
print(response)  # List of extracted data matching the pattern
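
A fuller sketch, assuming only that the function returns a list of extracted items (as noted above); the sample text and variable names are illustrative:

import os
from textract_io import textract_io

# Assumes LLM7_API_KEY is already set in the environment (see API Key Configuration below)
text = (
    "Water boils at 100 degrees Celsius at sea level. "
    "The Pacific Ocean is the largest ocean on Earth."
)

facts = textract_io(user_input=text)

# Iterate over the extracted items; their exact shape depends on the extraction pattern
for item in facts:
    print(item)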

Custom LLM Integration

Replace the default ChatLLM7 with any LangChain-compatible LLM (e.g., OpenAI, Anthropic, Google):

OpenAI Example

from langchain_openai import ChatOpenAI
from textract_io import textract_io

llm = ChatOpenAI()  # requires OPENAI_API_KEY in the environment
response = textract_io(user_input="Your text here...", llm=llm)

Anthropic Example

from langchain_anthropic import ChatAnthropic
from textract_io import textract_io

llm = ChatAnthropic(model="claude-3-5-sonnet-20240620")  # requires ANTHROPIC_API_KEY; any supported Claude model works
response = textract_io(user_input="Your text here...", llm=llm)

Google Generative AI Example

from langchain_google_genai import ChatGoogleGenerativeAI
from textract_io import textract_io

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")  # requires GOOGLE_API_KEY; any supported Gemini model works
response = textract_io(user_input="Your text here...", llm=llm)

🔑 API Key Configuration

  • Default: Uses LLM7_API_KEY from environment variables.
  • Manual Override: Pass the API key directly:
    response = textract_io(user_input="Your text...", api_key="your_llm7_api_key")
  • Free API Key: Register at LLM7 Token to get started.
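
Both paths side by side, as a minimal sketch (the key value is a placeholder):

import os
from textract_io import textract_io

# Option 1: rely on the LLM7_API_KEY environment variable
os.environ["LLM7_API_KEY"] = "your_llm7_api_key"
response = textract_io(user_input="Your text...")

# Option 2: pass the key directly, overriding the environment variable
response = textract_io(user_input="Your text...", api_key="your_llm7_api_key")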

📝 Parameters

Parameter    Type                      Description
user_input   str                       The input text to process.
api_key      Optional[str]             LLM7 API key (defaults to the LLM7_API_KEY environment variable).
llm          Optional[BaseChatModel]   Custom LangChain LLM (e.g., ChatOpenAI, ChatAnthropic). Defaults to ChatLLM7.
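
A call naming all three parameters, as an illustrative sketch (values are placeholders; omit llm to keep the default ChatLLM7):

from langchain_openai import ChatOpenAI
from textract_io import textract_io

response = textract_io(
    user_input="The Nile is about 6,650 km long.",  # text to process
    api_key="your_llm7_api_key",                    # optional; falls back to LLM7_API_KEY
    llm=ChatOpenAI(),                               # optional; defaults to ChatLLM7 when omitted
)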

📊 Rate Limits

  • LLM7 Free Tier: Sufficient for most use cases.
  • Upgrade: Supply your own LLM7 API key (directly or via LLM7_API_KEY) for higher limits.

🔄 Error Handling

  • If extraction fails, raises RuntimeError with a descriptive message.
  • Retries internally to improve reliability.
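
In practice, extraction calls can be wrapped as in this minimal sketch (the error message text will vary):

from textract_io import textract_io

try:
    facts = textract_io(user_input="Your text here...")
except RuntimeError as err:
    # Raised once the internal retries are exhausted
    print(f"Extraction failed: {err}")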

📜 License

MIT License (see LICENSE).


📢 Support & Issues

For bugs or feature requests, open an issue on GitHub.


👤 Author

Eugene Evstafev (@chigwell) 📧 hi@euegne.plus

