Intelligenz Seddeler Archive

Archive and analysis tools for Norske Intelligenssedler, Norway's first newspaper (1763-1920s).

About

This project provides tools to programmatically download and analyze the complete digital collection of Norske Intelligenssedler from the Norwegian National Library.

Collection Statistics:

4,270 digitized issues
Published: 1763-1920s
Location: Oslo
License: Public Domain / Creative Commons

Quick Start

# 1. Build corpus index of all 4,270 available issues
python3 scripts/build_corpus.py --output corpus_index.json

# 2. Download newspapers from a specific time period
python3 scripts/download.py --from-year 1768 --to-year 1770 --limit 10

# 3. Modernize old Norwegian text using AI (requires Ollama)
python3 scripts/modernize_text.py --input data/1768/ --output modernized/

LLM-Powered Text Modernization

This project includes AI-powered modernization of 18th-19th century Norwegian using two LLM options:

Option 1: Local AI with Ollama (Free, Private)

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull language model
ollama pull llama3.2:3b

# Modernize texts locally
python3 scripts/modernize_text.py --input data/ --output modernized/

Option 2: Cloud AI with OpenRouter (Best Quality)

# Set up API key (get from https://openrouter.ai/keys)
export OPENROUTER_API_KEY="your_key_here"
# Or: cp .env.example .env  # and add your key

# Modernize with Claude 3.5 Sonnet (highest quality)
python3 scripts/modernize_text.py --openrouter --input data/ --output modernized/

# Or use a different model
python3 scripts/modernize_text.py --openrouter --model "google/gemini-pro" --input data/

Features

✅ OCR error correction - Fixes mistakes in old scanned text
✅ Language modernization - Old Norwegian (1700s-1900s) → Modern bokmål
✅ Automatic summarization - Brief summaries of each issue
✅ Named entity extraction - People, places, and events
✅ Two backends - Local (Ollama) or Cloud (OpenRouter)

Project Structure

intelligenz/
├── README.md                 # This file
├── WARP.md                   # Internal project documentation
├── shell.nix                 # Nix development environment
├── requirements.txt          # Python dependencies
├── scripts/
│   ├── build_corpus.py       # Fetch list of all issues
│   ├── download.py           # Download newspaper content
│   └── extract_data.py       # Extract and export data
└── data/                     # Downloaded newspaper issues
    └── {year}/{month}/{day}/{issue}.json

Usage Examples

Search for specific content

import dhlab as dh

# Create a corpus of all issues
corpus = dh.Corpus(
    doctype="digavis",
    title="Norske Intelligenssedler",
    from_year=1763,
    to_year=1920
)

# Find concordances for a word
results = corpus.conc(words="handel")

Analyze word frequency over time

from dhlab.api.dhlab_api import ngram_newspapers

# Get frequency of "handel" (trade) over time
freq = ngram_newspapers(
    word="handel",
    title="Norske Intelligenssedler"
)

Data Format

Each downloaded issue is stored as JSON:

{
  "urn": "URN:NBN:no-nb_digavis_norskeintelligenssedler_null_null_17680420_6_16_1",
  "title": "Norske Intelligenssedler",
  "date": "17680420",
  "year": 1768,
  "pages": 4,
  "text": "Full OCR text content...",
  "metadata": {
    "publisher": "...",
    "language": "Norsk NOR"
  }
}

API Documentation

DHLAB Python Library: https://nationallibraryofnorway.github.io/DHLAB/
NB Catalog API: https://api.nb.no/catalog/v1/
DHLAB API: https://api.nb.no/dhlab/

Requirements

Nix package manager
Python 3.10+
Internet connection for API access

License

Code: MIT License Data: Public Domain / CC-BY-NC-ND (varies by issue date)

Acknowledgments

Data provided by the Norwegian National Library (Nasjonalbiblioteket) through their DHLAB API.

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
analysis		analysis
raw_responses		raw_responses
scripts		scripts
searches		searches
translations		translations
.env.example		.env.example
.gitattributes		.gitattributes
.gitignore		.gitignore
.secrets		.secrets
ARCHIVE_STRUCTURE.md		ARCHIVE_STRUCTURE.md
CHRISTIANSANDS_OCR_COMPLETE.md		CHRISTIANSANDS_OCR_COMPLETE.md
CHRISTIANSANDS_RESULTS.md		CHRISTIANSANDS_RESULTS.md
CHRISTIANSANDS_SEARCH.md		CHRISTIANSANDS_SEARCH.md
DATABASE_ARCHITECTURE.md		DATABASE_ARCHITECTURE.md
NAMING_EXAMPLES.md		NAMING_EXAMPLES.md
NEXT_STEPS.md		NEXT_STEPS.md
PROJECT_STATUS.md		PROJECT_STATUS.md
PROJECT_SUMMARY.md		PROJECT_SUMMARY.md
QWEN_OCR_RESULTS.md		QWEN_OCR_RESULTS.md
README.md		README.md
SUMMARY.md		SUMMARY.md
SUMMARY_old.md		SUMMARY_old.md
USAGE_GUIDE.md		USAGE_GUIDE.md
WARP.md		WARP.md
analysis_log.txt		analysis_log.txt
corpus_index.json		corpus_index.json
download_1763_1772_log.txt		download_1763_1772_log.txt
download_log.txt		download_log.txt
final_run.txt		final_run.txt
final_translation_log.txt		final_translation_log.txt
fresh_translation_log.txt		fresh_translation_log.txt
gemini2_log.txt		gemini2_log.txt
intelligenz.db		intelligenz.db
oldest_newspaper_analysis.json		oldest_newspaper_analysis.json
oldest_newspaper_translation.txt		oldest_newspaper_translation.txt
pixtral_log.txt		pixtral_log.txt
requirements.txt		requirements.txt
run_translation.sh		run_translation.sh
shell.nix		shell.nix
speed_test_models.sh		speed_test_models.sh
test_corpus.json		test_corpus.json
test_models.sh		test_models.sh
test_openrouter.sh		test_openrouter.sh
translate_final_log.txt		translate_final_log.txt
translation_log.txt		translation_log.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Intelligenz Seddeler Archive

About

Quick Start

LLM-Powered Text Modernization

Option 1: Local AI with Ollama (Free, Private)

Option 2: Cloud AI with OpenRouter (Best Quality)

Features

Project Structure

Usage Examples

Search for specific content

Analyze word frequency over time

Data Format

API Documentation

Requirements

License

Acknowledgments

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Intelligenz Seddeler Archive

About

Quick Start

LLM-Powered Text Modernization

Option 1: Local AI with Ollama (Free, Private)

Option 2: Cloud AI with OpenRouter (Best Quality)

Features

Project Structure

Usage Examples

Search for specific content

Analyze word frequency over time

Data Format

API Documentation

Requirements

License

Acknowledgments

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages