Rhombus-AI-Home-Test

Single-host Django + React application for browsing CSV and Excel files in Amazon S3, inferring Pandas data types, and previewing the processed result with optional column overrides.

Public deployment: https://rhombus-ai-home-test.onrender.com/

What it does

Connects to S3 using runtime AWS credentials supplied by the user.
Lists supported .csv, .xls, and .xlsx objects from a bucket or prefix.
Profiles columns with conservative inference rules for integers, floats, booleans, dates, datetimes, categories, and complex numbers.
Lets the user override inferred types before reprocessing.
Pages through the processed dataset from the backend so reviewers can inspect full files instead of a capped in-memory sample.
Stores sanitized processing metadata in Django without persisting AWS secrets.
Exposes a local CLI via infer_data_types.py for quick local-file smoke testing.

Inference and performance approach

CSV files are staged from S3 onto local disk, then profiled in chunks so larger datasets do not depend on a single long-lived streaming response.
Repeated preview-page requests can reuse a small bounded cache of staged S3 files to avoid re-downloading the same CSV on every page change or override.
Excel files are supported, but they still load the selected sheet into memory and are capped at 20 MB in this MVP.
Type inference is intentionally conservative: ambiguous short dates stay as text unless overridden, and manual overrides are validated to avoid lossy coercion.
Date and DateTime are both backed by pandas datetime storage internally, but the preview renders them differently so date-only columns stay calendar-shaped.
Category inference is strict for larger datasets and slightly softer for very small repeated-label samples such as grade-like columns.

Stack

Backend: Django 5, Django REST Framework, Pandas, boto3
Frontend: React 19, TypeScript, Vite, Vitest
Deployment shape: single host serving the built frontend from Django

Project structure

backend/: Django project and the data_processing app
frontend/: React + TypeScript frontend
docs/brief/: assignment brief and supporting project notes
examples/: sample datasets for local smoke testing
infer_data_types.py: local CLI wrapper around the shared processing service
Dockerfile: production-oriented single-container deployment

Requirements

Python 3.12 recommended for local work to match the Docker runtime
Node.js 22 or newer recommended
npm 11 or newer
Docker Desktop for container verification and deployment builds

The current dependency set also installs and runs on Python 3.14, but keeping local development on Python 3.12 reduces drift from the containerized runtime.

Local setup

1. Create the backend environment

py -3.12 -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install -r requirements.txt

2. Install the frontend dependencies

cd frontend
npm install
cd ..

3. Apply migrations

python manage.py migrate

Running the app locally

Backend plus built frontend

Build the frontend once, then let Django serve it:

cd frontend
npm run build
cd ..
python manage.py runserver

Open http://127.0.0.1:8000.

Split development mode

Run Django for the API:

python manage.py runserver

In a second terminal, run Vite for the frontend:

cd frontend
npm run dev

Open http://127.0.0.1:5173.

Environment variables

The app works locally without extra configuration, but these variables are supported for deployment:

DJANGO_SECRET_KEY: Django secret key
DJANGO_DEBUG: True or False
DJANGO_ALLOWED_HOSTS: comma-separated hostnames
DJANGO_CSRF_TRUSTED_ORIGINS: comma-separated trusted origins
DJANGO_SQLITE_PATH: optional SQLite database path
DJANGO_FRONTEND_BUILD_DIR: optional override for the built frontend location
WEB_CONCURRENCY: Gunicorn worker count for the container runtime
GUNICORN_TIMEOUT: Gunicorn request timeout in seconds
CSV_CHUNK_SIZE: chunk size used for CSV profiling and preview paging
STAGED_FILE_CACHE_MAX_ITEMS: max number of staged S3 files kept on local disk for reuse (0 disables the cache)
STAGED_FILE_CACHE_TTL_SECONDS: cache lifetime for staged S3 files
PORT: port used by the container startup command

On Render, RENDER_EXTERNAL_HOSTNAME and RENDER_EXTERNAL_URL are injected automatically and merged into Django's trusted host and origin lists. You only need to set DJANGO_ALLOWED_HOSTS or DJANGO_CSRF_TRUSTED_ORIGINS manually if you later add a custom domain.

See .env.example for a starter local or container configuration.

CLI usage

The local CLI uses the same inference service as the web application:

python infer_data_types.py examples/sample_data.csv --preview-rows 5

Optional Excel sheet selection:

python infer_data_types.py path\to\workbook.xlsx --sheet-name Sheet1

API summary

`POST /api/s3/files`

Lists supported S3 objects for a bucket or prefix.

Request body:

{
  "access_key_id": "AKIA...",
  "secret_access_key": "secret",
  "session_token": "",
  "region": "ap-southeast-2",
  "bucket": "demo-bucket",
  "prefix": "incoming/"
}

Project brief alignment

Pandas inference and conversion: the shared backend service profiles and converts CSV/XLS/XLSX data loaded into pandas DataFrames, with explicit handling for object-like mixed columns, dates, numerics, categories, booleans, and complex values.
Large-file handling and tuning: S3 CSVs are staged locally, profiled in chunks, and paged from the backend. Render tuning knobs such as WEB_CONCURRENCY, GUNICORN_TIMEOUT, CSV_CHUNK_SIZE, and staged-file cache settings are documented below.
Django backend: the API exposes S3 browsing, file processing, preview pagination, health checks, and persisted sanitized run metadata.
React frontend: the UI supports S3 connection, file browsing, file processing, schema review, manual overrides, reset/reprocess actions, and processed preview pagination.
Documentation and testing: the repo includes setup/deploy instructions, backend tests, frontend tests, and a local CLI for smoke testing with sample datasets.

`POST /api/data/process`

Processes the selected S3 object, stores the inferred schema, and returns the first preview page.

Request body:

{
  "access_key_id": "AKIA...",
  "secret_access_key": "secret",
  "session_token": "",
  "region": "ap-southeast-2",
  "bucket": "demo-bucket",
  "prefix": "incoming/",
  "object_key": "incoming/sample.csv",
  "sheet_name": "",
  "preview_row_limit": 100,
  "overrides": [
    { "column": "Score", "target_type": "float" }
  ]
}

Response highlights:

runId: stored processing run identifier used for later preview-page requests and display
schema: inferred or overridden schema metadata
previewRows: first processed page of rows
previewPage: page metadata including page, pageSize, totalRows, and totalPages

`POST /api/data/preview`

Returns a specific processed page using the current schema and file context. The API will reuse the stored ProcessingRun when it is available, but the request can also supply the preview context directly so paging still works in stateless or ephemeral deployment environments.

Request body:

{
  "access_key_id": "AKIA...",
  "secret_access_key": "secret",
  "session_token": "",
  "region": "ap-southeast-2",
  "bucket": "demo-bucket",
  "prefix": "incoming/",
  "run_id": 12,
  "object_key": "incoming/sample.csv",
  "file_type": "csv",
  "selected_sheet": "",
  "row_count": 14941,
  "schema": [
    {
      "column": "Score",
      "inferred_type": "float",
      "storage_type": "Float64",
      "display_type": "Float",
      "nullable": true,
      "confidence": 0.97,
      "warnings": [],
      "null_token_count": 0,
      "sample_values": ["90", "75"],
      "allowed_overrides": ["text", "integer", "float", "boolean", "date", "datetime", "category", "complex"]
    }
  ],
  "preview_columns": ["Score"],
  "page": 3,
  "page_size": 50
}

Testing

Backend:

python manage.py test
python manage.py check

Frontend:

cd frontend
npm test
npm run build

CLI smoke test:

python infer_data_types.py examples/sample_data.csv --preview-rows 5

Docker verification

Build the production image:

docker build -t rhombus-home-test .

Run the container locally:

docker run --rm -p 8000:8000 `
  -e DJANGO_SECRET_KEY=replace-me `
  -e DJANGO_DEBUG=False `
  -e DJANGO_ALLOWED_HOSTS=localhost,127.0.0.1 `
  rhombus-home-test

Then open http://127.0.0.1:8000.

Render deployment

Render is the recommended public host for this project because it matches the app's single-container architecture and gives you one public URL for both the Django API and the React frontend.

Create the service

Sign in to Render and create a new Web Service.
Connect the GitHub repository BrownAssassin/Rhombus-AI-Home-Test.
Select the main branch.
Choose Docker as the runtime so Render builds from the repository Dockerfile.
Set the health check path to /api/health/.

Configure environment variables

Add these values in the Render dashboard before the first deploy:

DJANGO_SECRET_KEY: required secret value for Django
DJANGO_DEBUG=False

Optional settings:

DJANGO_ALLOWED_HOSTS: only needed if you later add a custom domain
DJANGO_CSRF_TRUSTED_ORIGINS: only needed if you later add a custom domain
DJANGO_SQLITE_PATH=/app/data/db.sqlite3: only needed if you later attach a persistent disk and want the SQLite file to survive redeploys
WEB_CONCURRENCY=1: recommended for the starter Render instance size
GUNICORN_TIMEOUT=180: gives longer CSV processing requests room to finish
CSV_CHUNK_SIZE=500: keeps CSV memory use conservative during profiling and paging
STAGED_FILE_CACHE_MAX_ITEMS=2: lets repeated paging and reprocessing reuse the same staged S3 file
STAGED_FILE_CACHE_TTL_SECONDS=900: expires staged files after 15 minutes to keep disk use bounded

Render automatically provides PORT, RENDER_EXTERNAL_HOSTNAME, and RENDER_EXTERNAL_URL, and the app is configured to trust those values without any extra setup.

Deploy and verify

Trigger the initial deploy from Render.
Open the generated onrender.com URL after the health check passes.
Confirm GET /api/health/ returns a healthy response.
Run one end-to-end flow through the UI with a real or demo S3 bucket.

Notes and limitations

AWS credentials are accepted at runtime and are intentionally not stored in the database.
CSV handling is chunked after staging the S3 object to a temp file, and repeat requests can reuse a small bounded local cache of staged files to avoid unnecessary re-downloads. If you set STAGED_FILE_CACHE_MAX_ITEMS=0, staged files are cleaned up after each request instead of being reused.
Excel handling is capped at 20 MB in this MVP, and each preview-page request reloads the selected sheet because pandas does not offer the same chunked read path as CSV.
Type inference is intentionally conservative. Ambiguous date columns stay as text unless the user overrides them.
Small repeated-label columns can infer as Category, but high-cardinality or mostly-unique string columns intentionally stay as Text.
The app supports paginated preview browsing across the processed dataset, but it does not export a full transformed file in this MVP.
A Render deployment that keeps SQLite in the container filesystem is suitable for demos, but ProcessingRun history resets whenever the service is rebuilt or restarted.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Rhombus-AI-Home-Test

What it does

Inference and performance approach

Stack

Project structure

Requirements

Local setup

1. Create the backend environment

2. Install the frontend dependencies

3. Apply migrations

Running the app locally

Backend plus built frontend

Split development mode

Environment variables

CLI usage

API summary

`POST /api/s3/files`

Project brief alignment

`POST /api/data/process`

`POST /api/data/preview`

Testing

Docker verification

Render deployment

Create the service

Configure environment variables

Deploy and verify

Notes and limitations

About

Uh oh!

Releases 1

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
backend		backend
docker		docker
docs/brief		docs/brief
examples		examples
frontend		frontend
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
infer_data_types.py		infer_data_types.py
manage.py		manage.py
pyrightconfig.json		pyrightconfig.json
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Rhombus-AI-Home-Test

What it does

Inference and performance approach

Stack

Project structure

Requirements

Local setup

1. Create the backend environment

2. Install the frontend dependencies

3. Apply migrations

Running the app locally

Backend plus built frontend

Split development mode

Environment variables

CLI usage

API summary

POST /api/s3/files

Project brief alignment

POST /api/data/process

POST /api/data/preview

Testing

Docker verification

Render deployment

Create the service

Configure environment variables

Deploy and verify

Notes and limitations

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Contributors

Uh oh!

Languages

`POST /api/s3/files`

`POST /api/data/process`

`POST /api/data/preview`