Website, database schema, and data loading scripts for Mapping the Pinkertons.
- activities: Main table containing surveillance activities/events
- locations: Normalized location data (bars, neighborhoods, streets, etc.)
- activity_locations: Junction table linking activities to locations
- PostgreSQL 12+
- Python 3.13+ (specified in
.python-version) - golang-migrate (for migrations)
uv- Python package manager
curl -LsSf https://astral.sh/uv/install.sh | sh
# After installing uv:
uv install# Create database (if using dedicated database)
createdb detectives
# OR if using shared database (e.g., apiary)
# Just set DB_NAME in .env to the shared database name
# The migrations will create the 'detectives' schema within that databaseNote: This project can work in a shared database (e.g., apiary) alongside other projects. It uses:
- A dedicated PostgreSQL schema:
detectives - A dedicated migration tracking table:
detectives_schema_migrations
This prevents conflicts with other projects in the same database.
cp .env.example .envMigrations require the use of golang-migrate. You can use the Makefile targets which automatically use your .env configuration:
# Run migrations
make migrate-up
# Check migration version
make migrate-version
# Rollback if needed
make migrate-downMigration Isolation: This project uses a custom migration tracking table (detectives_schema_migrations) to allow multiple projects to coexist in the same PostgreSQL database. The Makefile automatically includes x-migrations-table=detectives_schema_migrations in the database URL.
Troubleshooting Migrations:
- If you get "relation already exists" errors on a database where tables exist but migration tracking is missing, force the migration version:
migrate -path db/migrations -database "$(DB_URL)&x-migrations-table=detectives_schema_migrations" force 4 - If migrations fail mid-way (dirty state), use
migrate force <last_good_version>to reset
uv run utils/load_data.py data/el_paso.csvThe script will:
- Parse dates, times, and durations
- Convert Yes/No values to booleans
- Normalize and deduplicate locations
- Create activity-location relationships
- Show progress and summary statistics
Database credentials are loaded from the .env file. The script uses these environment variables:
DB_NAME- Database name (default:detectives)DB_USER- Database user (default:postgres)DB_PASSWORD- Database password (default:postgres)DB_HOST- Database host (default:localhost)DB_PORT- Database port (default:5432)DB_SCHEMA- Database schema name (default:detectives)
The .env.example file includes commented sections for production, development, and local environments. Simply uncomment the block you want to use.
The script handles various duration formats:
5h45m→ 5 hours 45 minutes10h→ 10 hours[10h total]→ 10 hours (brackets and "total" removed)
- Converts
HH:MMformat to time objects - Ignores text values like "Evening", "Afternoon"
Yes→ trueNo→ false- Empty/other → null
Locations are matched on the combination of:
- locality
- street_address
- location_name
This prevents duplicate location records while allowing for nulls in any field.
Note: All queries must use the detectives schema prefix (e.g., detectives.activities).
-- Find all surveillance activities in Smeltertown
SELECT a.*, l.locality, l.location_name
FROM detectives.activities a
JOIN detectives.activity_locations al ON a.id = al.activity_id
JOIN detectives.locations l ON al.location_id = l.id
WHERE l.locality LIKE 'Smeltertown%';
-- Count activities by operative
SELECT operative, COUNT(*) as activity_count
FROM detectives.activities
WHERE operative IS NOT NULL
GROUP BY operative
ORDER BY activity_count DESC;
-- Find all activities at bars
SELECT a.date, a.mode, a.subject, l.location_name, l.locality
FROM detectives.activities a
JOIN detectives.activity_locations al ON a.id = al.activity_id
JOIN detectives.locations l ON al.location_id = l.id
WHERE l.location_type = 'Bar'
ORDER BY a.date;
-- Activities by date range
SELECT date, COUNT(*) as activity_count
FROM detectives.activities
WHERE date BETWEEN '1939-07-27' AND '1939-08-15'
GROUP BY date
ORDER BY date;