A robust Python-based scraping tool designed to extract comprehensive university program data from YÖK Atlas. It captures rankings, quotas, and score requirements across all major Turkish university entrance exam score types.
Get everything ready and running in minutes:

```bash
# 1. Install dependencies
uv sync

# 2. Run the full pipeline (scrape all types + finalize + analytics)
python sync.py --headless
```

The project consists of several specialized scripts:
| Script | Purpose |
|---|---|
| `sync.py` | **Orchestrator:** Runs the full pipeline (scraping -> finalization -> analytics). |
| `main.py` | **Scraper:** The core engine using Selenium and BeautifulSoup. |
| `finalize.py` | **Processor:** Normalizes, cleans, and merges raw JSON files into `data.json`. |
| `analytics.py` | **Reporter:** Provides high-level statistics about the collected data. |
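The orchestration in `sync.py` can be pictured as a thin wrapper that runs the other scripts in order. A minimal sketch, assuming the flags are forwarded as shown (the exact arguments `sync.py` passes to `main.py` are an assumption):

```python
import subprocess
import sys


def build_commands(year: int, headless: bool = True) -> list[list[str]]:
    """Commands for each pipeline stage: scrape -> finalize -> analytics."""
    scrape = [sys.executable, "main.py", "--all-types", "--year", str(year)]
    if headless:
        scrape.append("--headless")
    return [
        scrape,
        [sys.executable, "finalize.py"],
        [sys.executable, "analytics.py"],
    ]


def run_pipeline(year: int, headless: bool = True) -> None:
    """Run the stages sequentially, stopping at the first failure."""
    for cmd in build_commands(year, headless):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Using `check=True` means a failed scrape aborts the run before `finalize.py` can merge incomplete data.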
Adding data for a new academic year is now a single-command process.
Use the sync.py script to update everything:
```bash
python sync.py --year 2026 --headless
```

If you want to scrape only specific score types:

```bash
python main.py --score-type say --year 2026 --output data_2026_say.json
```

> **Tip:** The scraper stores the year on each record. The deduplication key is `code:year`, so multiple years can coexist in the same file without conflict.
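The deduplication described above can be sketched as a dictionary merge keyed on `(code, year)` (the function name is illustrative, not taken from the codebase):

```python
def merge_records(existing: list[dict], incoming: list[dict]) -> list[dict]:
    """Merge record lists, deduplicating on the (code, year) key.

    Later records win, so re-scraping a year refreshes its rows
    without disturbing other years of the same program code.
    """
    by_key = {(r["code"], r["year"]): r for r in existing}
    for r in incoming:
        by_key[(r["code"], r["year"])] = r
    return list(by_key.values())
```

Because the key includes the year, a 2025 and a 2026 record for the same program code are kept side by side.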
- `--score-type {say,ea,soz,dil,tyt}`: Specific score type (default: `say`).
- `--output FILE`: Custom output path.
- `--headless`: Run without a browser window.
- `--year YEAR`: Specific year to target.
- `--all-types`: Scrape all types sequentially (with built-in delays).
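These flags map naturally onto `argparse`. A sketch of how the parser might be wired (the help strings and any defaults beyond `--score-type` are assumptions):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the CLI flags listed above."""
    p = argparse.ArgumentParser(description="YÖK Atlas program scraper")
    p.add_argument("--score-type", choices=["say", "ea", "soz", "dil", "tyt"],
                   default="say", help="score type to scrape (default: say)")
    p.add_argument("--output", metavar="FILE", help="custom output path")
    p.add_argument("--headless", action="store_true",
                   help="run Chrome without a browser window")
    p.add_argument("--year", type=int, help="specific year to target")
    p.add_argument("--all-types", action="store_true",
                   help="scrape all score types sequentially")
    return p
```

Note that `argparse` exposes `--score-type` as `args.score_type` and `--all-types` as `args.all_types`.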
- **Scrape:** `main.py` extracts data into `universities_data_{type}.json`.
- **Normalize:** `finalize.py` cleans up fields (like `"Doldu#"` -> `"Doldu"`) and merges the files into `data.json`.
- **Analyze:** `analytics.py` prints a summary of the dataset.
The final `data.json` contains records with the following structure:

```json
{
  "code": "203910830",
  "year": 2025,
  "university_name": "KOÇ ÜNİVERSİTESİ",
  "name": "Karşılaştırmalı Edebiyat",
  "attributes": ["İngilizce", "Burslu", "4 Yıllık"],
  "city": "İSTANBUL",
  "university_type": "Vakıf",
  "scholarship_type": "Burslu",
  "education_type": "Örgün",
  "total_quota": ["3+0", "3+0", "3+0", "3+0"],
  "quota_status": "Doldu",
  "filled_quota": ["3", "3", "3", "3"],
  "max_rank": ["215", "606", "516", "513"],
  "min_score": ["536,38093", "503,50496", "521,12754", "519,65975"],
  "score_type": "dil"
}
```

- Python 3.13+
- Google Chrome (required for Selenium)
- Dependencies: `selenium`, `beautifulsoup4`, `requests`, `lxml`, `webdriver-manager`
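Once finalized, the dataset can be consumed with nothing beyond the standard library. A minimal sketch, assuming `data.json` is a JSON array of records and using the comma decimal separator shown in the sample record (function names are illustrative):

```python
import json


def load_programs(path: str = "data.json") -> list[dict]:
    """Load the finalized dataset (assumed to be a JSON array of records)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def min_scores(record: dict) -> list[float]:
    """Parse 'min_score' strings, which use a comma as the decimal separator."""
    return [float(s.replace(",", ".")) for s in record["min_score"]]
```

For example, `min_scores` turns `"536,38093"` into the float `536.38093`, making the scores sortable and comparable.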
- ✅ Resume Capability: Skips already scraped programs if interrupted.
- ✅ Incremental Saving: Saves data after every page.
- ✅ Anti-Detection: Uses random user agents and realistic delays.
- ✅ Memory Efficient: Headless mode support for low resource usage.
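The resume and incremental-saving behaviors above could be implemented along these lines; this is a sketch under the assumption that output files are JSON arrays keyed by `(code, year)`, and the real logic in `main.py` may differ:

```python
import json
import os


def load_done_keys(path: str) -> set[tuple[str, int]]:
    """Keys of programs already on disk, so an interrupted run can skip them."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return {(r["code"], r["year"]) for r in json.load(f)}


def save_incremental(path: str, records: list[dict]) -> None:
    """Rewrite the output after each scraped page, via a temp file so a
    crash mid-write cannot corrupt the existing data."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
    os.replace(tmp, path)
```

Writing to a temporary file and then calling `os.replace` keeps the last good snapshot intact even if the scraper is killed mid-save.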