Draft
30 commits
3901613
Fix 2020 official communications and cadre management data and folder…
zaeema-n Jan 19, 2026
b4bf8e6
Create 2020 flat folder structure and manifest yaml file
zaeema-n Jan 19, 2026
2f1046e
Create schema for major and minor kinds and relationship names
zaeema-n Jan 19, 2026
f2e2d88
refactor: Singularize minister, category, subcategory, and department…
zaeema-n Jan 19, 2026
2935a36
Create pydantic models schema
zaeema-n Jan 19, 2026
b438c55
Add ingestion and read services
zaeema-n Jan 19, 2026
e05787e
Add a yaml parser
zaeema-n Jan 19, 2026
52e2b6e
Add date utils
zaeema-n Jan 19, 2026
9a4c01d
feat: Implement entity resolver for ministers and departments
zaeema-n Jan 19, 2026
8ed303f
Create ingestion script for traversing down through the yaml file (wi…
zaeema-n Jan 19, 2026
d3e163f
refactor: Remove docstrings from various functions in ingestion and e…
zaeema-n Jan 20, 2026
729acaa
Add file traversal tests
zaeema-n Jan 20, 2026
53ebf54
Implement api refetch logic
zaeema-n Jan 21, 2026
ea5550e
Move ingestion files into ingestion folder
zaeema-n Jan 21, 2026
f4d08da
feat: Add logging for error handling in entity resolver service
zaeema-n Jan 21, 2026
cdb960b
Correct direction of as_department relationships
zaeema-n Jan 22, 2026
67ed080
Move data into statistics folder
zaeema-n Jan 22, 2026
198fda8
Added category creation logic
zaeema-n Jan 22, 2026
d42fa8d
Fix entity creation payload
zaeema-n Jan 22, 2026
b686b60
Correct category creation logic
zaeema-n Jan 22, 2026
9bcdcc7
Fixed category existence check
zaeema-n Jan 22, 2026
2d4b007
Add subcategory creation
zaeema-n Jan 22, 2026
23fde7e
Implement dataset attribute addition with validation and time period …
zaeema-n Jan 22, 2026
db4269f
Add dataset creation logic
zaeema-n Jan 22, 2026
ef2bc21
Modify comments for readability
zaeema-n Jan 23, 2026
d5706eb
Remove unused params and imports
zaeema-n Jan 23, 2026
987eddf
Fix file structure for ministry of foreign relations
zaeema-n Jan 23, 2026
567dc8d
Corrected cadre management data for 2022
zaeema-n Jan 23, 2026
aa10546
Remove year from create_category
zaeema-n Jan 23, 2026
5c528df
Append year to end of dataset name
zaeema-n Jan 23, 2026
9 changes: 8 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -1,4 +1,11 @@
__pycache__/*
# Python cache
__pycache__/
**/__pycache__/
*.py[cod]
*$py.class
*.so

# Environment
.DS_Store
docs/.DS_Store
.env

This file was deleted.

This file was deleted.

@@ -0,0 +1,10 @@

{
"columns": ["Category", "Approved Cadre", "Existing", "Vacancies", "Excess"],
"rows": [
["Senior", 98, 86, 22, 10],
["Tertiary", 8, 2, 6, 0],
["Secondary", 363, 345, 28, 10],
["Primary", 131, 147, 7, 23]
]
}
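
A quick consistency check on the table above, as a minimal sketch. It assumes (the PR does not state this) that the numeric columns are related by Approved Cadre - Existing = Vacancies - Excess. `check_cadre_row` is a hypothetical helper for illustration, not part of the ingestion code.

```python
# Rows copied from the data.json above.
COLUMNS = ["Category", "Approved Cadre", "Existing", "Vacancies", "Excess"]
ROWS = [
    ["Senior", 98, 86, 22, 10],
    ["Tertiary", 8, 2, 6, 0],
    ["Secondary", 363, 345, 28, 10],
    ["Primary", 131, 147, 7, 23],
]


def check_cadre_row(row):
    """Assumed identity: approved - existing == vacancies - excess."""
    _, approved, existing, vacancies, excess = row
    return approved - existing == vacancies - excess


# Collect the categories whose figures do not balance under that assumption.
unbalanced = [row[0] for row in ROWS if not check_cadre_row(row)]
print(unbalanced)
```

Under that assumption all four rows balance, so the list printed is empty.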
@@ -0,0 +1,16 @@
{
"storage_type": "tabular",
"dataset_name": "Staff of the Ministry",
"column_count": 5,
"row_count": 4,
"node_count": null,
"edges_count": null,
"resource": "https://mfa.gov.lk/en/wp-content/uploads/2023/06/APR-2020_-Trilingual-Book.pdf",
"resource_page": "741",
"resource_section": "06.1 Cadre Management",
"resource_identifier": "Staff of the Ministry as at 31.12.2020",
"extracted_date": "2025-01-19",
"extracted_by": "Lanka Data Foundation",
"verified_by": "Lanka Data Foundation",
"remarks": ""
}
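
The `column_count` and `row_count` fields duplicate information already present in data.json, so they can drift apart. The sketch below shows one way to cross-check the pair. `check_counts` is a hypothetical helper, and the inline strings stand in for the two files (metadata trimmed to the fields being checked); it is not taken from the PR's ingestion services.

```python
import json

# Inline stand-ins for data.json and metadata.json shown above.
DATA_JSON = """
{
  "columns": ["Category", "Approved Cadre", "Existing", "Vacancies", "Excess"],
  "rows": [
    ["Senior", 98, 86, 22, 10],
    ["Tertiary", 8, 2, 6, 0],
    ["Secondary", 363, 345, 28, 10],
    ["Primary", 131, 147, 7, 23]
  ]
}
"""
METADATA_JSON = '{"storage_type": "tabular", "column_count": 5, "row_count": 4}'


def check_counts(data, metadata):
    """Return a list of mismatch descriptions; empty when the files agree."""
    problems = []
    n_cols = len(data["columns"])
    if n_cols != metadata["column_count"]:
        problems.append(f"column_count is {metadata['column_count']}, data has {n_cols}")
    if len(data["rows"]) != metadata["row_count"]:
        problems.append(f"row_count is {metadata['row_count']}, data has {len(data['rows'])}")
    # Every row should carry exactly one value per declared column.
    for index, row in enumerate(data["rows"]):
        if len(row) != n_cols:
            problems.append(f"row {index} has {len(row)} values, expected {n_cols}")
    return problems


problems = check_counts(json.loads(DATA_JSON), json.loads(METADATA_JSON))
print(problems or "metadata matches data")
```

Returning a list instead of raising on the first mismatch lets a caller report every problem in a dataset folder in one pass.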
@@ -0,0 +1,11 @@

{
"columns": ["Category", "Approved Cadre", "Existing", "Vacancies/Excess"],
"rows": [
["Senior/ Tertiary (Home Based)", 257, 179, 78],
["Secondary (Home Based)", 263, 245, 18],
["Primary (Home Based)", 7, 6, 1],
["Secondary (Locally Recruited)", 255, 174, 81],
["Primary (Locally Recruited)", 326, 278, 48]
]
}
@@ -0,0 +1,16 @@
{
"storage_type": "tabular",
"dataset_name": "Staff of Missions",
"column_count": 4,
"row_count": 5,
"node_count": null,
"edges_count": null,
"resource": "https://mfa.gov.lk/en/wp-content/uploads/2023/06/APR-2020_-Trilingual-Book.pdf",
"resource_page": "741",
"resource_section": "06.1 Cadre Management",
"resource_identifier": "Staff of Missions",
"extracted_date": "2025-01-19",
"extracted_by": "Lanka Data Foundation",
"verified_by": "Lanka Data Foundation",
"remarks": ""
}
@@ -0,0 +1,30 @@
{
"columns": ["District", "Number of Rooms"],
"rows": [
["Colombo", 8396],
["Galle", 6370],
["Gampaha", 3956],
["Kandy", 3482],
["Kalutara", 3465],
["Matale", 1942],
["Hambantota", 1863],
["Nuwara Eliya", 1850],
["Matara", 1717],
["Badulla", 1484],
["Anuradhapura", 1301],
["Puttalam", 1185],
["Batticaloa", 949],
["Ampara", 720],
["Polonnaruwa", 678],
["Trincomalee", 671],
["Ratnapura", 605],
["Moneragala", 544],
["Jaffna", 522],
["Kurunegala", 454],
["Kegalle", 365],
["Vavuniya", 81],
["Kilinochchi", 63],
["Mullaitivu", 50],
["Mannar", 37]
]
}
@@ -0,0 +1,16 @@
{
"storage_type": "tabular",
"dataset_name": "Accommodations by District",
"column_count": 2,
"row_count": 25,
"node_count": null,
"edges_count": null,
"resource": "https://www.sltda.gov.lk/storage/common_media/Annual%20Stastistical%20Report%202021%20-Final%2025.4.20223624932970.pdf",
"resource_page": "31",
"resource_section": "Distribution of accommodation capacity by districts -2020",
"resource_identifier": "Chart 17. II",
"extracted_date": "2025-09-04",
"extracted_by": "Lanka Data Foundation",
"verified_by": "Lanka Data Foundation",
"remarks": "Dataset contains accommodation statistics by district for 2020, showing the number of rooms in each of the 25 districts of Sri Lanka. Columns: District and Number of Rooms. Data ranges from Colombo (8,396 rooms) to Mannar (37 rooms)."
}
@@ -0,0 +1,14 @@
{
"columns": ["Province", "Number of Rooms"],
"rows": [
["Western Province", 15817],
["Southern Province", 9950],
["Central Province", 7274],
["Eastern Province", 2340],
["Uva Province", 2028],
["North Central Province", 1979],
["North Western Province", 1639],
["Sabaragamuwa Province", 970],
["Northern Province", 753]
]
}
@@ -0,0 +1,16 @@
{
"storage_type": "tabular",
"dataset_name": "Accommodations by Province",
"column_count": 2,
"row_count": 9,
"node_count": null,
"edges_count": null,
"resource": "https://www.sltda.gov.lk/storage/common_media/Annual%20Stastistical%20Report%202021%20-Final%2025.4.20223624932970.pdf",
"resource_page": "30",
"resource_section": "Distribution of accommodation capacity by province – 2020",
"resource_identifier": "Chart 17: I",
"extracted_date": "2025-09-04",
"extracted_by": "Lanka Data Foundation",
"verified_by": "Lanka Data Foundation",
"remarks": "Dataset contains accommodation statistics by province for 2020, showing the number of rooms in each of the 9 provinces of Sri Lanka. Columns: Province and Number of Rooms. Data ranges from Western Province (15,817 rooms) to Northern Province (753 rooms)."
}
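
The district and province accommodation tables describe the same room counts at two levels, so one can be checked against the other. The sketch below sums the district figures per province and compares them with the province file. The district-to-province mapping is Sri Lanka's standard administrative grouping and is supplied here as an assumption; it does not appear in the PR. `rollup_mismatches` is a hypothetical helper.

```python
# District figures from the "Accommodations by District" data.json.
DISTRICT_ROOMS = {
    "Colombo": 8396, "Galle": 6370, "Gampaha": 3956, "Kandy": 3482,
    "Kalutara": 3465, "Matale": 1942, "Hambantota": 1863,
    "Nuwara Eliya": 1850, "Matara": 1717, "Badulla": 1484,
    "Anuradhapura": 1301, "Puttalam": 1185, "Batticaloa": 949,
    "Ampara": 720, "Polonnaruwa": 678, "Trincomalee": 671,
    "Ratnapura": 605, "Moneragala": 544, "Jaffna": 522,
    "Kurunegala": 454, "Kegalle": 365, "Vavuniya": 81,
    "Kilinochchi": 63, "Mullaitivu": 50, "Mannar": 37,
}

# Province figures from the "Accommodations by Province" data.json.
PROVINCE_ROOMS = {
    "Western Province": 15817, "Southern Province": 9950,
    "Central Province": 7274, "Eastern Province": 2340,
    "Uva Province": 2028, "North Central Province": 1979,
    "North Western Province": 1639, "Sabaragamuwa Province": 970,
    "Northern Province": 753,
}

# Assumed administrative mapping (not part of the PR).
PROVINCE_DISTRICTS = {
    "Western Province": ["Colombo", "Gampaha", "Kalutara"],
    "Southern Province": ["Galle", "Matara", "Hambantota"],
    "Central Province": ["Kandy", "Matale", "Nuwara Eliya"],
    "Eastern Province": ["Batticaloa", "Ampara", "Trincomalee"],
    "Uva Province": ["Badulla", "Moneragala"],
    "North Central Province": ["Anuradhapura", "Polonnaruwa"],
    "North Western Province": ["Puttalam", "Kurunegala"],
    "Sabaragamuwa Province": ["Ratnapura", "Kegalle"],
    "Northern Province": ["Jaffna", "Vavuniya", "Kilinochchi", "Mullaitivu", "Mannar"],
}


def rollup_mismatches():
    """Map province -> (sum of districts, province figure) where they differ."""
    mismatches = {}
    for province, districts in PROVINCE_DISTRICTS.items():
        total = sum(DISTRICT_ROOMS[name] for name in districts)
        if total != PROVINCE_ROOMS[province]:
            mismatches[province] = (total, PROVINCE_ROOMS[province])
    return mismatches


mismatches = rollup_mismatches()
print(mismatches)
```

For the figures in this PR the district sums agree with all nine province totals, so the dictionary printed is empty.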
@@ -0,0 +1,7 @@
{
"columns": ["Year", "Value in Rupees(Mn)", "Value in USD(Mn)"],
"rows": [
["2020", 126608.1, 682.5]
]
}

@@ -0,0 +1,16 @@
{
"storage_type": "tabular",
"dataset_name": "Annual Tourism Receipts",
"column_count": 3,
"row_count": 1,
"node_count": null,
"edges_count": null,
"resource": "https://www.sltda.gov.lk/storage/common_media/Annual%20Stastistical%20Report%202021%20-Final%2025.4.20223624932970.pdf",
"resource_page": "7",
"resource_section": "Highlight of the year - 2020",
"resource_identifier": "Foreign exchange earnings",
"extracted_date": "2025-09-07",
"extracted_by": "Lanka Data Foundation",
"verified_by": "Lanka Data Foundation",
"remarks": "Dataset contains annual tourism receipts for 2020. Columns: Year, Value in Rupees(Mn), and Value in USD(Mn). Data shows tourism receipts of 126,608.1 million rupees (682.5 million USD) for the year 2020."
}
11 changes: 11 additions & 0 deletions data/statistics/2020_flat/datasets/Arrivals by Age/data.json
@@ -0,0 +1,11 @@
{
"columns": ["Age Group", "Percentage"],
"rows": [
["3-19", 8.1],
["20-29", 16.8],
["30-39", 24.5],
["40-49", 15.9],
["50-59", 15.5],
["60 & Over", 19.2]
]
}
16 changes: 16 additions & 0 deletions data/statistics/2020_flat/datasets/Arrivals by Age/metadata.json
@@ -0,0 +1,16 @@
{
"storage_type": "tabular",
"dataset_name": "Arrivals by Age",
"column_count": 2,
"row_count": 6,
"node_count": null,
"edges_count": null,
"resource": "https://www.sltda.gov.lk/storage/common_media/Annual%20Stastistical%20Report%202021%20-Final%2025.4.20223624932970.pdf",
"resource_page": "26",
"resource_section": "Distribution of tourists by age group (percentage) –2020",
"resource_identifier": "Chart 14",
"extracted_date": "2025-09-04",
"extracted_by": "Lanka Data Foundation",
"verified_by": "Lanka Data Foundation",
"remarks": "Dataset contains tourist arrival statistics by age group for 2020, showing the distribution of tourists by age as percentages. Columns: Age Group and Percentage. Data includes 6 age groups ranging from 3-19 (8.1%) to 60 & Over (19.2%), with the 30-39 age group having the highest percentage (24.5%)."
}
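
Because this dataset is a percentage distribution, a simple sanity check is that the shares add up to roughly 100 and that the group named in the remarks as the largest really is. The sketch below does both; the 0.05 tolerance for rounding in the source chart is an assumption.

```python
import math

# Rows copied from the "Arrivals by Age" data.json.
ROWS = [
    ["3-19", 8.1],
    ["20-29", 16.8],
    ["30-39", 24.5],
    ["40-49", 15.9],
    ["50-59", 15.5],
    ["60 & Over", 19.2],
]

# Sum the percentage column and compare against 100 with a small tolerance.
total = sum(percentage for _, percentage in ROWS)
sums_to_100 = math.isclose(total, 100.0, abs_tol=0.05)

# Find the age group with the highest share.
largest_group = max(ROWS, key=lambda row: row[1])[0]

print(sums_to_100, largest_group)
```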
35 changes: 35 additions & 0 deletions data/statistics/2020_flat/datasets/Arrivals by Carrier/data.json
@@ -0,0 +1,35 @@
{
"columns": ["Airline", "Number of Passengers"],
"rows": [
["Aeroflot", 17943],
["Air Arabia", 8880],
["Air Asia", 8151],
["Air China", 3004],
["Air India", 16029],
["Cathay Pacific", 4316],
["China Eastern Airlines", 5944],
["China Southern Airlines", 770],
["Chongqing", 987],
["Edelweiss", 3934],
["Emirates", 62845],
["Etihad Airways", 11653],
["Fly Dubai", 14215],
["Gulf Air", 2045],
["Indigo", 21968],
["Korean Airlines", 3687],
["Kuwait Airways", 3652],
["Lot Polish", 7922],
["Malaysian Airlines", 4098],
["Oman Air", 8110],
["Qatar Airways", 59882],
["Saudia", 1342],
["Silk Air", 2631],
["Singapore Airlines", 14],
["Spicejet", 8033],
["Sri Lankan Airlines", 148251],
["Thai Airways", 2805],
["Turkish Airlines", 4520],
["Ukraine International Airlines", 222],
["Charter Flights", 41680]
]
}
@@ -0,0 +1,16 @@
{
"storage_type": "tabular",
"dataset_name": "Arrivals by Carrier",
"column_count": 2,
"row_count": 30,
"node_count": null,
"edges_count": null,
"resource": "https://www.sltda.gov.lk/storage/common_media/Annual%20Stastistical%20Report%202021%20-Final%2025.4.20223624932970.pdf",
"resource_page": "63",
"resource_section": "Tourist Arrivals by Country of Residence and Carrier -2020",
"resource_identifier": "Table 08",
"extracted_date": "2025-09-04",
"extracted_by": "Lanka Data Foundation",
"verified_by": "Lanka Data Foundation",
"remarks": "Dataset contains tourist arrival statistics by airline carrier for 2020, showing the number of passengers for each carrier. Columns: Airline and Number of Passengers. Data includes 30 entries (29 named airlines plus charter flights), with Sri Lankan Airlines having the highest number of passengers (148,251), followed by Emirates (62,845) and Qatar Airways (59,882)."
}
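
The ranking quoted in the remarks can be reproduced directly from data.json. The sketch below sorts the rows by passenger count and separates the aggregate "Charter Flights" entry from the named airlines; treating that row as a non-airline aggregate is an interpretation, not something the PR states.

```python
# Rows copied from the "Arrivals by Carrier" data.json.
ROWS = [
    ["Aeroflot", 17943], ["Air Arabia", 8880], ["Air Asia", 8151],
    ["Air China", 3004], ["Air India", 16029], ["Cathay Pacific", 4316],
    ["China Eastern Airlines", 5944], ["China Southern Airlines", 770],
    ["Chongqing", 987], ["Edelweiss", 3934], ["Emirates", 62845],
    ["Etihad Airways", 11653], ["Fly Dubai", 14215], ["Gulf Air", 2045],
    ["Indigo", 21968], ["Korean Airlines", 3687], ["Kuwait Airways", 3652],
    ["Lot Polish", 7922], ["Malaysian Airlines", 4098], ["Oman Air", 8110],
    ["Qatar Airways", 59882], ["Saudia", 1342], ["Silk Air", 2631],
    ["Singapore Airlines", 14], ["Spicejet", 8033],
    ["Sri Lankan Airlines", 148251], ["Thai Airways", 2805],
    ["Turkish Airlines", 4520], ["Ukraine International Airlines", 222],
    ["Charter Flights", 41680],
]

# Rank carriers by passenger count, highest first.
ranked = sorted(ROWS, key=lambda row: row[1], reverse=True)
top_three = ranked[:3]

# Count the named airlines, excluding the aggregate charter entry.
named_airlines = [name for name, _ in ROWS if name != "Charter Flights"]

print(top_three)
print(len(ROWS), len(named_airlines))
```

The top three entries are Sri Lankan Airlines, Emirates and Qatar Airways, matching the remarks, and the 30 rows break down into 29 named airlines plus the charter entry.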