Structured Text Extraction for Scientific & Factual Data
textract_io is a Python package that extracts structured key information from scientific or factual text. It combines regex pattern matching with retry logic so that responses follow the expected format, which suits generating summaries, extracting data, or categorizing text based on user prompts. It also works on text already extracted from multimedia sources, producing concise, structured output for research, reporting, or database entry.
- Pattern-based extraction: Uses regex patterns to enforce structured output.
- LLM7 integration: Defaults to `ChatLLM7` (from `langchain_llm7`) for extraction tasks.
- Flexible LLM support: Easily swap with any LangChain-compatible LLM (OpenAI, Anthropic, Google, etc.).
- Error handling: Robust retry logic and clear error messages.
- Environment-aware: Uses `LLM7_API_KEY` from environment variables or direct API key input.
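
The pattern-plus-retry idea in the feature list can be sketched roughly as follows. This is not the package's actual implementation: the function name `extract_with_retries`, the `<item>` tag pattern, and the stand-in model are all hypothetical, chosen only to show how a regex can gate a model's reply and trigger another attempt when nothing matches.

```python
import re
from typing import Callable, List


def extract_with_retries(
    generate: Callable[[str], str],
    prompt: str,
    pattern: str,
    max_retries: int = 3,
) -> List[str]:
    """Call `generate` until its reply contains at least one regex match."""
    compiled = re.compile(pattern, re.DOTALL)
    for _ in range(max_retries):
        reply = generate(prompt)
        matches = compiled.findall(reply)
        if matches:
            # Return only the captured groups, trimmed of whitespace.
            return [m.strip() for m in matches]
    raise RuntimeError(
        f"No output matching {pattern!r} after {max_retries} attempts"
    )


# Stand-in for a model (hypothetical): the first reply is malformed,
# the second one follows the expected format.
replies = iter([
    "Sorry, here is the data",
    "<item>H2O boils at 100 C</item><item>at 1 atm</item>",
])
result = extract_with_retries(
    lambda _prompt: next(replies),
    "Extract the facts.",
    r"<item>(.*?)</item>",
)
print(result)
```

The first reply fails the pattern, so the function asks again and returns the two captured items from the second reply. A reply that never matches ends in a `RuntimeError`, which mirrors the error behavior described for the package.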
Install with pip:

```bash
pip install textract_io
```

Basic usage:

```python
from textract_io import textract_io

response = textract_io(user_input="Your text here...")
print(response)  # List of extracted data matching the pattern
```

Replace the default `ChatLLM7` with any LangChain-compatible LLM (e.g., OpenAI, Anthropic, Google):
OpenAI:

```python
from langchain_openai import ChatOpenAI
from textract_io import textract_io

llm = ChatOpenAI()  # reads OPENAI_API_KEY from the environment
response = textract_io(user_input="Your text here...", llm=llm)
```

Anthropic:

```python
from langchain_anthropic import ChatAnthropic
from textract_io import textract_io

# ChatAnthropic requires a model name; this one is only an example.
llm = ChatAnthropic(model="claude-3-5-sonnet-latest")
response = textract_io(user_input="Your text here...", llm=llm)
```

Google:

```python
from langchain_google_genai import ChatGoogleGenerativeAI
from textract_io import textract_io

# ChatGoogleGenerativeAI requires a model name; this one is only an example.
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")
response = textract_io(user_input="Your text here...", llm=llm)
```

- Default: Uses `LLM7_API_KEY` from environment variables.
- Manual Override: Pass the API key directly:

  ```python
  response = textract_io(user_input="Your text...", api_key="your_llm7_api_key")
  ```
- Free API Key: Register at LLM7 Token to get started.
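
The key lookup described above could work along these lines. The package's real resolution code is not shown here, so treat this as a sketch under one assumption: an explicitly passed key takes precedence over the `LLM7_API_KEY` environment variable. The helper name `resolve_api_key` is hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    api_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the explicit key if given, else LLM7_API_KEY from the environment."""
    # `env` is injectable so the lookup can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    if api_key:
        return api_key
    return env.get("LLM7_API_KEY")


fake_env = {"LLM7_API_KEY": "key-from-env"}
print(resolve_api_key("key-from-argument", fake_env))  # explicit argument wins
print(resolve_api_key(None, fake_env))                 # falls back to the environment
print(resolve_api_key(None, {}))                       # nothing configured
```

Returning `None` when no key is configured leaves the decision about a fallback, such as a free tier, to the caller.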
| Parameter | Type | Description |
|---|---|---|
| `user_input` | `str` | The input text to process. |
| `api_key` | `Optional[str]` | LLM7 API key (defaults to the `LLM7_API_KEY` environment variable). |
| `llm` | `Optional[BaseChatModel]` | Custom LangChain LLM (e.g., `ChatOpenAI`, `ChatAnthropic`). Defaults to `ChatLLM7`. |
- LLM7 Free Tier: Sufficient for most use cases.
- Upgrade: Use your own API key or environment variable for higher limits.
- If extraction fails, raises `RuntimeError` with a descriptive message.
- Retries internally to improve reliability.
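
Because a failed extraction surfaces as `RuntimeError`, callers may want a fallback instead of a crash. Below is a minimal sketch of one way to do that. The wrapper `extract_or_default` and the stub `always_fails` are hypothetical and stand in for a real call so the example runs without network access.

```python
from typing import Callable, List


def extract_or_default(
    extract: Callable[[str], List[str]],
    text: str,
    default: List[str],
) -> List[str]:
    """Run `extract` and return `default` if it raises RuntimeError."""
    try:
        return extract(text)
    except RuntimeError as exc:
        # Report the failure and continue with the fallback value.
        print(f"Extraction failed: {exc}")
        return default


def always_fails(text: str) -> List[str]:
    # Stub that imitates an extraction which exhausted its retries.
    raise RuntimeError("no output matched the expected pattern")


fallback = extract_or_default(always_fails, "Some input text", [])
succeeded = extract_or_default(lambda t: [t.upper()], "ok", [])
print(fallback, succeeded)
```

In real use, the first argument would be something like `lambda t: textract_io(user_input=t)`. Catching only `RuntimeError` keeps unrelated bugs, such as a `TypeError`, visible.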
MIT License (see LICENSE).
For bugs or feature requests, open an issue on GitHub.
Eugene Evstafev (@chigwell) 📧 hi@euegne.plus