This is an LLM-powered text analysis API built with FastAPI that extracts structured information from unstructured text. Here's what it does:
The system takes unstructured text and automatically extracts:
- Title - A concise title for the content
- Summary - A brief summary of the text
- Topics - Main topics (aims for 3)
- Sentiment - Positive, neutral, or negative
- Keywords - Most frequent nouns extracted locally
- Confidence Score - Quality metric based on structural integrity
I designed this system using a modular, service-oriented architecture to ensure a clean separation of concerns between the API layer, business logic, and data access. I chose FastAPI for its high performance, automatic interactive documentation, and excellent data validation with Pydantic. For LLM interaction, I used the instructor library to reliably parse the LLM's output into Pydantic models, which makes the extraction process robust and eliminates manual parsing errors. Finally, SQLAlchemy provides a flexible Object-Relational Mapper (ORM) that abstracts away raw database queries; it let me start with SQLite while keeping it easy to switch to a more powerful database like PostgreSQL in the future.
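As a rough sketch of that instructor-based extraction (the field names, model name, and client setup here are my illustrative assumptions, not the project's actual code):

```python
from typing import List, Literal

from pydantic import BaseModel, Field


class TextAnalysis(BaseModel):
    """Illustrative response model mirroring the feature list above."""
    title: str = Field(description="A concise title for the content")
    summary: str = Field(description="A brief summary of the text")
    topics: List[str] = Field(description="Around three main topics")
    sentiment: Literal["positive", "neutral", "negative"]


def analyze(text: str) -> TextAnalysis:
    # instructor patches the OpenAI client so the completion is parsed
    # straight into the Pydantic model -- no manual JSON handling.
    import instructor
    from openai import OpenAI

    client = instructor.from_openai(OpenAI())
    return client.chat.completions.create(
        model="gpt-4o-mini",  # assumed; the project reads OPENAI_MODEL from .env
        response_model=TextAnalysis,
        messages=[{"role": "user", "content": f"Analyze this text:\n\n{text}"}],
    )
```

Because `response_model` is a Pydantic class, a malformed LLM response raises a validation error instead of silently producing bad data.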
I made several deliberate trade-offs:
- Synchronous API Calls: The `/analyze` and `/analyze-batch` endpoints are synchronous. For a production system, I would move the slow LLM API calls to a background task queue (like Celery) to prevent blocking and API timeouts.
- Simplified Search: The search functionality uses a basic SQL `LIKE` query. This is inefficient for large datasets and lacks linguistic intelligence; it should be replaced with a dedicated full-text search engine (like Elasticsearch) or a vector database for semantic similarity search.
- SQLite Database: I used SQLite because it requires zero setup. For any multi-user or high-write application, I would migrate to PostgreSQL to handle concurrency and scale effectively.
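The `LIKE`-based search trade-off can be illustrated with a minimal stdlib `sqlite3` sketch (the table and column names here are assumptions, not the project's actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE analyses (id INTEGER PRIMARY KEY, topics TEXT, keywords TEXT)")
conn.execute("INSERT INTO analyses (topics, keywords) VALUES (?, ?)",
             ("machine learning, nlp", "model, transformer"))
conn.execute("INSERT INTO analyses (topics, keywords) VALUES (?, ?)",
             ("cooking, recipes", "flour, oven"))


def search(q: str) -> list:
    # Substring match over both columns -- a full table scan with no
    # stemming or ranking, which is why this does not scale.
    pattern = f"%{q}%"
    rows = conn.execute(
        "SELECT id FROM analyses WHERE topics LIKE ? OR keywords LIKE ?",
        (pattern, pattern),
    ).fetchall()
    return [r[0] for r in rows]


print(search("nlp"))   # -> [1]
print(search("oven"))  # -> [2]
```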
See `what_i_would_improve.md` for further discussion of potential improvements.
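To illustrate the background-queue idea from the first trade-off above, here is a minimal stdlib sketch using a thread pool; a real deployment would use Celery (or similar) with a message broker, and `slow_llm_analysis` is a stand-in for the actual LLM call:

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor


def slow_llm_analysis(text: str) -> dict:
    # Stand-in for the slow LLM API call.
    return {"title": text[:20], "sentiment": "neutral"}


executor = ThreadPoolExecutor(max_workers=4)
jobs: dict = {}


def submit_analysis(text: str) -> str:
    """Enqueue the work and return a job id immediately, instead of blocking."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = executor.submit(slow_llm_analysis, text)
    return job_id


def get_result(job_id: str) -> dict:
    """A polling endpoint would call this: the result if ready, else 'pending'."""
    future = jobs[job_id]
    return future.result() if future.done() else {"status": "pending"}
```

The API handler would call `submit_analysis` and return the job id with a 202 status; clients then poll a results endpoint backed by `get_result`.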
You can run this application either locally using a Python virtual environment or via Docker.
Option 1: Docker

Prerequisites:
- Docker installed and running.
Instructions:
- Create the environment file: copy the example file and add your OpenAI API key.

  ```shell
  cp .env.example .env
  ```

  Then edit `.env` and set your values:

  ```shell
  OPENAI_API_KEY="sk-..."
  OPENAI_MODEL="your_open_ai_compatible_model"
  OPENAI_BASE_URL="your_open_ai_base_url"
  ```
- Build the Docker image:

  ```shell
  docker build -t llm-extractor .
  ```
- Run the Docker container: this command starts the application and forwards port 8000.

  ```shell
  docker run --rm -p 8000:8000 --env-file .env llm-extractor
  ```
The API is now running and accessible at http://localhost:8000.
Option 2: Local (Python virtual environment)

Prerequisites:
- Python 3.9+
- A Python virtual environment tool (`venv`)
Instructions:
- Create and activate a virtual environment:

  ```shell
  python -m venv .venv
  source .venv/bin/activate
  ```
- Install dependencies:

  ```shell
  pip install -r requirements.txt
  ```
- Create the environment file: copy the example and add your OpenAI API key.

  ```shell
  cp .env.example .env
  # Edit the .env file with your key
  ```
- Run the application: the `--reload` flag automatically restarts the server when you make code changes.

  ```shell
  uvicorn app.main:app --reload
  ```
The API is now running and accessible at http://127.0.0.1:8000.
Once the application is running, the easiest way to interact with the API is through the auto-generated documentation.
- Open your browser and navigate to http://127.0.0.1:8000/docs.
- You will see the Swagger UI, which provides an interactive interface for all available endpoints.
- `POST /analyze`: Accepts a JSON object with a single `text` field. It processes the text, stores the result, and returns the full analysis object.
- `POST /analyze-batch`: Accepts a JSON object with a `texts` field (a list of strings) and returns a list of analysis objects.
- `GET /search?q={query}`: Searches stored analyses where the `query` string matches one of the extracted topics or keywords.
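For example, `POST /analyze` can be called from Python using only the standard library. This sketch assumes the server is running locally on port 8000; the response fields follow the schema listed at the top of this README.

```python
import json
from urllib.request import Request, urlopen


def build_payload(text: str) -> bytes:
    # The /analyze endpoint expects a JSON object with a single "text" field.
    return json.dumps({"text": text}).encode("utf-8")


def analyze(text: str, base_url: str = "http://127.0.0.1:8000") -> dict:
    """POST the text to /analyze and return the parsed analysis object."""
    req = Request(
        f"{base_url}/analyze",
        data=build_payload(text),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.load(resp)


# With the server running:
# result = analyze("FastAPI is a modern Python web framework.")
# print(result["title"], result["sentiment"])
```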
To ensure the application is working as expected, you can run the integration test suite.
- Make sure you have installed the development dependencies:

  ```shell
  pip install -r requirements.txt
  ```
- Run pytest from the root directory:

  ```shell
  PYTHONPATH=. pytest tests/test_api.py -v
  ```