APT (Actuarial Personalized Tool)-Based Tool-Call sLLM Agent Pipeline for Enhancing Efficiency in Insurance Actuarial Work
The Insurance Analysis Pipeline is an integrated solution designed to perform comprehensive analyses of insurance products. Combining actuarial assessments, product development strategies, and market analysis, the system delivers rigorous, reproducible, and validated insights into insurance products. The architecture emphasizes high-quality outputs through detailed quality metrics such as completeness, consistency, and reliability.
The pipeline executes in the following major stages:
- Environment and Database Initialization (`run_prompts.sh`):
  - Loads database configuration from a `.env` file (e.g., `DB_HOST`, `DB_PORT`, `DB_NAME`, `DB_USER`).
  - Forcefully stops any running PostgreSQL instances and cleans previous data directories to ensure a pristine starting state.
  - Initializes a new PostgreSQL database cluster using `initdb` and updates essential configuration files (including modifications to `listen_addresses` and `pg_hba.conf`).
  - Starts the PostgreSQL server and checks its readiness using `pg_isready`.
  - Creates the required database and executes an initial schema SQL script (e.g., `product_development_schema.sql`) to establish tables, views, and indices.
  - Executes preliminary queries to validate the database connection and display key business metrics (such as the number of insurance products, total policy counts, and aggregated risk premiums).
  - Activates the Conda environment (named "insurance") and adds the project root to the `PYTHONPATH`.
  - Finally, runs the main application via `python process_prompts.py` and, upon completion, gracefully shuts down the PostgreSQL server.
- Prompt Processing and Analysis (`process_prompts.py`):
  - PromptProcessor Class:
    - Reads and parses a prompt file (e.g., `prompt.txt`), splitting content into individual prompts using `[PROMPT]` markers (a minimal parsing sketch follows this list).
    - Initializes critical components, including:
      - A tool model (`watt-tool-8B`) for generating the tool calls required by insurance analysis requests.
      - A response model (`Mistral-Small-24B-Instruct-2501`) for synthesizing final analysis outputs.
      - A tokenizer, updated with special tokens (e.g., a PAD token added with `mean_resizing=False`) to manage embedding initialization.
      - An instance of the Insurance Analysis Engine (`InsuranceEngine`), configured with database connectivity and domain-specific settings.
  - Processing Pipeline:
    - For each prompt, creates a series of validated tool calls to structure the insurance analysis request.
    - Executes these calls sequentially, handling distinct analysis domains such as actuarial evaluation, market analysis, and product assessment.
    - Aggregates the results from all tool calls and feeds them into the `generate_mistral_response` method, which produces a consolidated final output.
    - Computes qualitative and quantitative quality metrics (completeness, consistency, reliability) to validate and benchmark the analysis.
    - Stores the integrated results (including prompt identifiers, raw prompts, detailed domain analyses, quality metrics, execution statuses, and final responses) in a JSON file (`results.json`).
- Detailed Insurance Analysis (`InsuranceEngine`):
  - Located in `src/core/engines/insurance.py`, the InsuranceEngine class is responsible for in-depth handling of insurance requests.
  - Validates incoming parameters, decomposes high-level prompts into specific sub-analyses (e.g., actuarial risk assessment, market positioning, product feasibility), and processes each domain accordingly.
  - Uses a dedicated module (ResultSynthesizer) to aggregate sub-analysis results into a cohesive and comprehensive final outcome.
  - Incorporates quantitative evaluations (e.g., RMSE, MAE, and other domain-specific measures) to ensure the robustness and reliability of the analysis.
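To make the parsing step above concrete, here is a minimal sketch of how prompts might be split on `[PROMPT]` markers; the function name and details are illustrative, not the project's actual API:

```python
from pathlib import Path

def split_prompts(prompt_file: str = "prompt.txt") -> list[str]:
    """Split the prompt file into individual prompts on [PROMPT] markers."""
    text = Path(prompt_file).read_text(encoding="utf-8")
    # Everything between consecutive [PROMPT] markers is one prompt.
    return [block.strip() for block in text.split("[PROMPT]") if block.strip()]
```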
The project is structured as follows:
```
├── run_prompts.sh        # Shell script for environment setup, database initialization, and pipeline execution.
├── process_prompts.py    # Main application for prompt processing and orchestration of analysis tasks.
└── src/core/engines/
    ├── base.py           # Base interface for analysis engines.
    ├── main.py           # Central engine coordinator.
    └── insurance.py      # Domain-specific module that processes detailed insurance analysis requests.
```
Prerequisites:

- Python 3.8 or higher
- PostgreSQL
- Conda for environment management
- Essential Python libraries: pandas, numpy, and psycopg2, among others required by the analysis models.
Before running the pipeline, perform the following setup steps:

- Copy the example environment file to create a local configuration:

  ```
  cp .env.example .env
  ```

- Create and activate the Conda environment:

  ```
  conda env create -f environment.yml
  conda activate insurance_analysis
  ```

- Ensure that your `.env` file is correctly configured with the database settings (e.g., `DB_HOST`, `DB_PORT`, `DB_NAME`, `DB_USER`).
- Execute the shell script:

  ```
  bash run_prompts.sh
  ```

  This script will:
  - Initialize a fresh PostgreSQL server cluster.
  - Set up the necessary database schema and perform initial connectivity checks.
  - Activate the configured Conda environment and adjust the PYTHONPATH.
  - Launch the main prompt processing application.
- Upon completion, review the aggregated results in the generated `results.json` file.
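For reference, the connectivity check performed by the script can be reproduced in a few lines of Python. This is a minimal sketch assuming the `.env` variables above are exported into the environment; the `insurance_products` table name is a placeholder for whatever your schema defines:

```python
import os
import psycopg2  # PostgreSQL driver, also used for bulk loading later on

def check_db_connection() -> None:
    """Connect using the .env settings and print one business metric."""
    conn = psycopg2.connect(
        host=os.environ["DB_HOST"],
        port=os.environ["DB_PORT"],
        dbname=os.environ["DB_NAME"],
        user=os.environ["DB_USER"],
    )
    try:
        with conn.cursor() as cur:
            # Placeholder table; substitute a table from your actual schema.
            cur.execute("SELECT COUNT(*) FROM insurance_products;")
            print("insurance products:", cur.fetchone()[0])
    finally:
        conn.close()

if __name__ == "__main__":
    check_db_connection()
```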
The following are sample CLI commands to execute specific analyses:

- Actuarial Analysis:

  ```
  python process_prompts.py actuarial loss_ratio --claims 1000000 --premiums 1200000
  python process_prompts.py actuarial risk_metrics --historical_losses "[100000,120000,95000]"
  python process_prompts.py actuarial premium_adjustment --target_ratio 0.7
  ```

- Product Development:

  ```
  python process_prompts.py product coverage --type comprehensive --segment young_professionals
  python process_prompts.py product benefits --segment families
  python process_prompts.py product pricing --coverage_amount 100000000 --risk_level 1.2
  ```

- Market Analysis:

  ```
  python process_prompts.py market size --segment senior_citizens
  python process_prompts.py market competition --segment young_professionals
  python process_prompts.py market forecast --segment families
  ```

- Integrated Analysis:

  ```
  python process_prompts.py analyze --target young_professionals --coverage comprehensive --claims 1000000 --premiums 1200000
  ```
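The actual argument handling inside `process_prompts.py` is not shown here; as a hedged illustration only, a CLI shaped like the commands above could be wired with argparse roughly as follows (only the `actuarial` subcommand is sketched):

```python
import argparse

parser = argparse.ArgumentParser(prog="process_prompts.py")
subparsers = parser.add_subparsers(dest="domain", required=True)

# One subparser per analysis domain; only "actuarial" is shown here.
actuarial = subparsers.add_parser("actuarial")
actuarial.add_argument(
    "task", choices=["loss_ratio", "risk_metrics", "premium_adjustment"])
actuarial.add_argument("--claims", type=float)
actuarial.add_argument("--premiums", type=float)
actuarial.add_argument("--historical_losses", type=str)
actuarial.add_argument("--target_ratio", type=float)

args = parser.parse_args()
print(args)  # dispatch to the matching engine would happen here
```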
Below is a graphical representation of how the Insurance Analysis Engine processes an input prompt. If your Markdown viewer supports Mermaid, you can visualize the flowchart:
```mermaid
flowchart LR
    A["Input Prompt: prompt.txt"] --> B["PromptProcessor reads prompt"]
    B --> C["Parse prompt to extract analysis parameters"]
    C --> D["Generate tool calls for each analysis domain"]
    D --> E1["Actuarial Engine: Risk metrics, loss ratio, etc."]
    D --> E2["Product Development Engine: Coverage, benefits, pricing"]
    D --> E3["Market Analysis Engine: Market size, competitor analysis"]
    E1 -->|Actuarial results| F["Collect domain results"]
    E2 -->|Product development results| F
    E3 -->|Market analysis results| F
    F --> G["Aggregate results via ResultSynthesizer"]
    G --> H["Generate final response using response model"]
    H --> I["Output final analysis"]
```
An example prompt configuration file (e.g., `prompt.txt`) may look like:

```
[PROMPT]
Please analyze the loss ratio for a policy with 1,200,000 in premiums and 1,000,000 in claims.
```
Additional configurations and prompt customizations can be added as needed.
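As a sanity check for this example, the loss ratio is simply claims divided by premiums:

```python
claims, premiums = 1_000_000, 1_200_000
loss_ratio = claims / premiums
print(f"{loss_ratio:.3f}")  # 0.833, i.e. roughly an 83.3% loss ratio
```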
- Tool Model: `watt-tool-8B`
  For detailed specifications, refer to the watt-tool-8B Hugging Face page.
- Response Model: `Mistral-Small-24B-Instruct-2501`
  More details are available on the Mistral-Small-24B-Base-2501 Hugging Face page.

These models are integrated to handle their respective task-specific operations (tool-call generation and response synthesis) and provide high-fidelity outputs.
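The PAD-token handling mentioned in the pipeline overview is standard Hugging Face transformers usage. A minimal sketch follows; the model path is a placeholder, and `mean_resizing` requires a reasonably recent transformers release:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/your/model")
model = AutoModelForCausalLM.from_pretrained("path/to/your/model")

# Add a PAD token if the base model lacks one, then grow the embedding
# matrix; mean_resizing=False skips mean-based initialization of new rows.
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({"pad_token": "[PAD]"})
    model.resize_token_embeddings(len(tokenizer), mean_resizing=False)
```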
The analysis pipeline is built on a modular engine framework, where each engine serves a distinct purpose:
- Actuarial Engine: Focuses on quantitative risk assessments, including loss ratio calculations, risk metrics (such as Value at Risk and Expected Loss), and premium adjustments.
- Product Development Engine: Handles the design and optimization of insurance products through coverage structure design, benefit analysis, and pricing strategy formulation.
- Market Analysis Engine: Analyzes market dynamics by assessing market size, competitor activity, demand forecasts, and market share.
- Integrated Insurance Engine: Aggregates results from the individual engines to produce a comprehensive analysis, validating data quality through metrics such as completeness, consistency, and reliability to ensure robust final outputs.
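To give a flavor of the actuarial calculations, here is a self-contained sketch of an empirical Value at Risk and a premium adjustment factor, reusing the sample CLI inputs from earlier. These are standard textbook formulas, not necessarily the engine's exact implementation:

```python
import numpy as np

def historical_var(losses: list[float], confidence: float = 0.95) -> float:
    """Empirical Value at Risk: the loss quantile at the confidence level."""
    return float(np.quantile(losses, confidence))

def premium_adjustment_factor(claims: float, premiums: float,
                              target_ratio: float) -> float:
    """Factor by which premiums must scale for claims/premiums to hit target."""
    return (claims / premiums) / target_ratio

print(historical_var([100_000, 120_000, 95_000]))            # 95th-percentile loss
print(premium_adjustment_factor(1_000_000, 1_200_000, 0.7))  # ~1.19
```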
- Place all your data files in the `data/` directory
- Supported file formats: CSV, JSON, SQL dumps
- Example structure:

  ```
  data/
  ├── raw/                 # Raw data files
  │   ├── claims.csv
  │   ├── policies.csv
  │   └── customers.csv
  ├── processed/           # Processed/transformed data
  └── temp/                # Temporary files
  ```
- The current `example.sql` contains sample schema and data
- Important: Replace `example.sql` with your actual data processing SQL scripts
- Required modifications:
  - Update table schemas to match your data structure
  - Modify data insertion statements
  - Adjust indexes and constraints as needed
  - Update views and stored procedures if necessary
Example of replacing `example.sql`:

```sql
-- Original example.sql
CREATE TABLE example_policies (...);

-- Should be replaced with an actual implementation
CREATE TABLE policies (
    policy_id VARCHAR(50) PRIMARY KEY,
    customer_id VARCHAR(50),
    product_type VARCHAR(100),
    premium DECIMAL(15,2),
    start_date DATE,
    end_date DATE,
    -- Add other necessary fields
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
```

The `process_insurance_data.py` script provides a template for processing insurance data and loading it into the PostgreSQL database. This template needs to be customized according to your specific data structure and requirements.
- InsuranceDataProcessor Class:

  ```python
  import pandas as pd

  class InsuranceDataProcessor:
      def process_quarterly_contracts(self) -> pd.DataFrame:
          # Process quarterly insurance contract data
          pass

      def process_quarterly_claims(self) -> pd.DataFrame:
          # Process quarterly insurance claims data
          pass

      def process_age_based_contracts(self) -> pd.DataFrame:
          # Process age-based contract data
          pass
  ```
- Configure Data Sources:
  - Update file paths in the `data/` directory
  - Modify category mappings in the `category_mapping` dictionary
  - Adjust database connection parameters
- Customize Data Processing:
  - Implement the TODO sections in each processing method
  - Match your CSV column names to the database table structure
  - Add any additional data transformation logic
- Example Usage:

  ```python
  processor = InsuranceDataProcessor()

  # Process and load contracts data
  contracts = processor.process_quarterly_contracts()
  processor.save_to_db(contracts, 'quarterly_insurance_contracts')

  # Process and load claims data
  claims = processor.process_quarterly_claims()
  processor.save_to_db(claims, 'quarterly_insurance_claims')
  ```
- Database Integration:
  - Ensures compatibility with the schema defined in `example.sql`
  - Uses efficient bulk loading with `psycopg2.copy_from` (see the sketch below)
  - Includes error handling and transaction management
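A minimal sketch of what a `save_to_db` helper built on `copy_from` could look like; the exact implementation in the template may differ:

```python
import io

import pandas as pd
import psycopg2

def save_to_db(df: pd.DataFrame, table: str, conn_params: dict) -> None:
    """Bulk-load a DataFrame into PostgreSQL via COPY, inside a transaction."""
    buffer = io.StringIO()
    df.to_csv(buffer, index=False, header=False, sep="\t")
    buffer.seek(0)
    conn = psycopg2.connect(**conn_params)
    try:
        with conn.cursor() as cur:
            cur.copy_from(buffer, table, sep="\t", columns=list(df.columns))
        conn.commit()    # commit only if the whole load succeeded
    except Exception:
        conn.rollback()  # roll back the partial load on any error
        raise
    finally:
        conn.close()
```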
The provided template is designed to work with the example schema in `example.sql`. When implementing your own data processing logic, ensure that the output DataFrame columns match your actual database table structure.
The system expects the following data files:
- Claims by amount data
- Claims by period data
- Health insurance premium data
- Claims by quarter data
- Contracts by quarter data
- Product returns and fees data
- Contracts by age data
Each file should contain the corresponding columns as mapped in `KOREAN_TO_ENGLISH_COLUMNS`.
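Such a mapping is typically a plain dictionary used to rename DataFrame columns. The entries below are hypothetical placeholders, not the project's actual mapping, which must cover every column in your source files:

```python
import pandas as pd

# Hypothetical entries for illustration only.
KOREAN_TO_ENGLISH_COLUMNS = {
    "연도": "year",
    "보험종목": "insurance_category",
    "보험료": "premium",
}

def standardize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Rename Korean source columns to the English names used by the schema."""
    return df.rename(columns=KOREAN_TO_ENGLISH_COLUMNS)
```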
The `process_insurance_data.py` file contains several TODO items that need to be implemented (hedged sketches for a few of them follow this list):
- Category Mappings:

  ```python
  self.category_mapping = {
      'category_1': 'type_a',
      'category_2': 'type_b',
      'category_3': 'type_c'
  }
  ```

  - Replace with your actual insurance category mappings
  - Ensure consistency with your data structure
  - Map all possible categories in your data
- Quarterly Contracts Processing:

  ```python
  def process_quarterly_contracts(self) -> pd.DataFrame:
  ```

  - Implement data loading from your contracts file
  - Map columns according to your schema
  - Handle any data transformations needed
  - Expected columns: year, insurance_category, q1_contracts, q1_premium, etc.
- Quarterly Claims Processing:

  ```python
  def process_quarterly_claims(self) -> pd.DataFrame:
  ```

  - Implement data loading from your claims file
  - Map columns according to your schema
  - Handle any data transformations needed
  - Expected columns: year, insurance_category, q1_accident_count, q1_claims, etc.
- Age-based Contracts Processing:

  ```python
  def process_age_based_contracts(self) -> pd.DataFrame:
  ```

  - Implement age group data restructuring
  - Process data according to the defined age groups
  - Handle aggregations if needed
  - Expected columns: contracts_under_10, risk_premium_under_10, etc.
- Claims by Period Processing:

  ```python
  def process_claims_by_period(self) -> pd.DataFrame:
  ```

  - Implement period-based claims processing
  - Map columns according to your schema
  - Handle any time-based aggregations
  - Expected columns: within_1year_incidents, within_1year_claims, etc.
- Claims by Amount Processing:

  ```python
  def process_claims_by_amount(self) -> pd.DataFrame:
  ```

  - Implement amount-based claims processing
  - Map columns according to your schema
  - Handle any amount range aggregations
  - Expected columns: under_10m_incidents, under_10m_claims, etc.
- Health Insurance Premium Processing:

  ```python
  def process_health_insurance_premium(self) -> pd.DataFrame:
  ```

  - Implement health insurance premium data processing
  - Extract monthly data
  - Handle regional and type-based aggregations
  - Expected columns: region, type, year, month, premium
- Data Validation:
  - Add input data validation
  - Check for required columns
  - Validate data types
  - Handle missing values appropriately
- Error Handling:
  - Add proper error handling for file operations
  - Handle data transformation errors
  - Log errors appropriately
  - Implement graceful fallbacks
- Performance Optimization:
  - Use efficient data processing methods
  - Implement batch processing for large datasets
  - Optimize memory usage
  - Consider parallel processing for large files
- Testing:
  - Add unit tests for each processing function
  - Test with sample data
  - Validate the output data structure
  - Test error handling
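Hedged sketches for a few of these TODOs follow; they build on one another and assume a shared module where pandas is imported. First, one plausible way to fill in the quarterly contracts method (the file path and retained columns are assumptions based on the expected columns listed above):

```python
import pandas as pd

def process_quarterly_contracts(self) -> pd.DataFrame:
    """Load, rename, and lightly clean the quarterly contracts file."""
    # Hypothetical path; point this at your actual file under data/raw/.
    df = pd.read_csv("data/raw/quarterly_contracts.csv")
    df = df.rename(columns=KOREAN_TO_ENGLISH_COLUMNS)
    df["insurance_category"] = df["insurance_category"].map(self.category_mapping)
    # Keep only the columns the database table expects (extend as needed).
    return df[["year", "insurance_category", "q1_contracts", "q1_premium"]]
```

Next, a basic input-validation helper of the kind the Data Validation item calls for (the required columns are placeholders):

```python
REQUIRED_COLUMNS = ["year", "insurance_category"]  # extend per table

def validate(df: pd.DataFrame) -> pd.DataFrame:
    """Check required columns, enforce dtypes, and drop rows missing keys."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    df["year"] = df["year"].astype(int)
    return df.dropna(subset=REQUIRED_COLUMNS)
```

Finally, a unit test shaped like the Testing item, exercising the validation helper above:

```python
import unittest

class TestProcessing(unittest.TestCase):
    def test_validate_rejects_missing_columns(self):
        df = pd.DataFrame({"year": [2024]})  # no insurance_category column
        with self.assertRaises(ValueError):
            validate(df)

if __name__ == "__main__":
    unittest.main()
```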
- Download the required models:

  ```
  python download_model.py
  ```

- Update the model paths in `.env`:

  ```
  MODEL_PATH=path/to/your/model
  RESPONSE_MODEL_PATH=path/to/your/response/model
  ```

- Prepare your prompts in `prompt.txt`
- Run the analysis:

  ```
  python process_prompts.py
  ```

See the LICENSE file for license information.
