Access, retrieve, and work with Canadian Census data and geography.
pycancensus is a Python package that provides integrated, convenient, and uniform access to Canadian Census data and geography retrieved using the CensusMapper API. This package produces analysis-ready tidy DataFrames and spatial data in multiple formats, with full equivalence to the R cancensus library.
- Full R Library Equivalence: Verified 100% data compatibility with R cancensus
- Enhanced API Reliability: Production-grade error handling and retry logic
- Vector Hierarchy Functions: Navigate census variable relationships as in R cancensus
- Improved Data Quality: Fixed column naming and data processing issues
- Comprehensive Testing: 450+ integration tests covering real-world scenarios
- National-Level Support: Added level='C' for Canada-wide baseline comparisons
- Download Census data and geography in analysis-ready format
- Support for multiple Census years: 2021, 2016, 2011, 2006, 2001, 1996
- All Census geographic levels: C (Canada), PR, CMA, CD, CSD, CT, DA, EA, DB
- Taxfiler data at Census Tract level (2000-2018)
- list_census_vectors() - Browse all available variables
- search_census_vectors() - Search variables by keyword
- parent_census_vectors() - Navigate variable hierarchies upward
- child_census_vectors() - Navigate variable hierarchies downward
- find_census_vectors() - Enhanced variable search with fuzzy matching
- GeoPandas integration for spatial analysis
- Multiple resolution options (simplified/high)
- Seamless geometry + data integration
- Production-grade error handling with helpful messages
- Automatic retry logic with exponential backoff
- Connection pooling for improved performance
- Rate limiting to respect API constraints
- Comprehensive caching system
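Retry with exponential backoff is a standard pattern. The sketch below is a generic, stdlib-only illustration of it, with jitter added. It is not pycancensus's internal implementation. `TransientError`, `backoff_delays` and `call_with_retry` are hypothetical names used only here, and the delay parameters are illustrative assumptions.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure such as HTTP 429 or 503."""


def backoff_delays(retries: int = 4, base: float = 1.0, factor: float = 2.0,
                   max_delay: float = 30.0) -> List[float]:
    """Delay before each retry: base * factor**attempt, capped at max_delay."""
    return [min(max_delay, base * factor ** attempt) for attempt in range(retries)]


def call_with_retry(func: Callable[[], T], retries: int = 4, base: float = 1.0,
                    factor: float = 2.0, max_delay: float = 30.0, jitter: bool = True,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, retrying on TransientError up to `retries` times with backoff."""
    for delay in backoff_delays(retries, base, factor, max_delay):
        try:
            return func()
        except TransientError:
            # "Full jitter": wait a random fraction of the capped delay so that
            # many clients hitting the same limit do not retry in lockstep.
            sleep(random.uniform(0, delay) if jitter else delay)
    # Final attempt: any error now propagates to the caller.
    return func()


if __name__ == "__main__":
    demo_state = {"calls": 0}

    def flaky() -> str:
        demo_state["calls"] += 1
        if demo_state["calls"] < 3:
            raise TransientError("simulated 503")
        return "ok"

    # Two failures, then success; sleeping is skipped for the demo.
    print(call_with_retry(flaky, jitter=False, sleep=lambda s: None))  # ok
```

Injecting `sleep` keeps the helper testable without real waiting. The cap on `max_delay` stops the delay from growing without bound on long outages.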
Install from PyPI:
pip install pycancensus
Or install the latest development version from GitHub:
pip install git+https://github.com/dshkol/pycancensus.git
For development:
git clone https://github.com/dshkol/pycancensus.git
cd pycancensus
pip install -e ".[dev]"
pycancensus requires a valid CensusMapper API key. You can obtain a free API key by signing up for a CensusMapper account.
Set your API key as an environment variable:
export CANCENSUS_API_KEY="your_api_key_here"
Or set it programmatically:
import pycancensus as pc
pc.set_api_key("your_api_key_here")
Full documentation is available at pycancensus.readthedocs.io.
The documentation includes:
- Getting Started Tutorial - Learn the basics
- Working with Geographic Data - Maps and spatial analysis
- Example Gallery - Real-world usage examples
- API Reference - Complete function documentation
- R to Python Migration Guide - For R cancensus users
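The hierarchy functions used in the quick start (`child_census_vectors()`, `parent_census_vectors()`) navigate parent/child links between census vectors. The sketch below illustrates that idea with a stdlib-only toy. It assumes each vector record carries a `parent_vector` field, as the R cancensus vector list does. The `Vector` class, the `children`/`parents` helpers and the `v_toy_*` IDs are all hypothetical and do not reflect pycancensus's actual implementation or real census vectors.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Vector:
    vector: str
    label: str
    parent_vector: Optional[str]  # None for top-level vectors


def children(table: List[Vector], vector: str, leaves_only: bool = False) -> List[str]:
    """All descendants of `vector` (depth-first), following parent_vector links down."""
    by_parent: Dict[str, List[str]] = {}
    for row in table:
        if row.parent_vector is not None:
            by_parent.setdefault(row.parent_vector, []).append(row.vector)
    result: List[str] = []
    stack = list(reversed(by_parent.get(vector, [])))
    while stack:
        current = stack.pop()
        kids = by_parent.get(current, [])
        # With leaves_only, keep only vectors that have no children of their own.
        if not leaves_only or not kids:
            result.append(current)
        stack.extend(reversed(kids))
    return result


def parents(table: List[Vector], vector: str) -> List[str]:
    """Ancestors of `vector`, nearest first, following parent_vector links up."""
    parent_of = {row.vector: row.parent_vector for row in table}
    result: List[str] = []
    current = parent_of.get(vector)
    while current is not None:
        result.append(current)
        current = parent_of.get(current)
    return result


# Toy hierarchy for illustration only (not real census vectors).
TOY = [
    Vector("v_toy_1", "Total", None),
    Vector("v_toy_2", "Group A", "v_toy_1"),
    Vector("v_toy_3", "Group A, subgroup 1", "v_toy_2"),
    Vector("v_toy_4", "Group A, subgroup 2", "v_toy_2"),
    Vector("v_toy_5", "Group B", "v_toy_1"),
]

if __name__ == "__main__":
    print(children(TOY, "v_toy_1"))  # ['v_toy_2', 'v_toy_3', 'v_toy_4', 'v_toy_5']
    print(parents(TOY, "v_toy_4"))   # ['v_toy_2', 'v_toy_1']
```

Indexing children by parent once makes each downward walk linear in the table size. Upward walks only need a vector-to-parent map.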
import pycancensus as pc
# Set your API key
pc.set_api_key("your_api_key_here")
# List available datasets
datasets = pc.list_census_datasets()
# Discover variables with new hierarchy functions
vectors = pc.list_census_vectors("CA21")
income_vars = pc.search_census_vectors("income", "CA21")
related_vars = pc.child_census_vectors("v_CA21_1", dataset="CA21")
# Get census data
data = pc.get_census(
    dataset="CA21",
    regions={"CMA": "35535"},  # Toronto CMA
    vectors=["v_CA21_1", "v_CA21_2", "v_CA21_3"],  # Population 2021, population 2016, % change 2016 to 2021
    level="CSD"
)
# Get census data with geography for mapping
geo_data = pc.get_census(
    dataset="CA21",
    regions={"PR": "35"},  # Ontario
    vectors=["v_CA21_1"],  # Total population
    level="CSD",
    geo_format="geopandas"  # Returns GeoDataFrame
)
# Advanced: Compare multiple Census years
data_2021 = pc.get_census("CA21", {"CSD": "5915022"}, ["v_CA21_1"], "CSD")  # Vancouver CSD, population 2021
data_2016 = pc.get_census("CA16", {"CSD": "5915022"}, ["v_CA16_401"], "CSD")  # Same CSD, population 2016
# Search for housing-related variables
housing = pc.search_census_vectors("dwelling", "CA21")
# Navigate variable hierarchies
population_base = "v_CA21_1"
breakdowns = pc.child_census_vectors(population_base, dataset="CA21")
parent_categories = pc.parent_census_vectors(population_base, dataset="CA21")
# Enhanced search with fuzzy matching
income_vectors = pc.find_census_vectors("CA21", "median household income")
pycancensus includes production-grade error handling:
import pycancensus as pc
from pycancensus.resilience import CensusAPIError, RateLimitError
try:
    data = pc.get_census("CA21", {"PR": "35"}, ["v_CA21_1"], "PR")
except RateLimitError as e:
    print(f"Rate limited: {e}")
    print(f"Retry after: {e.retry_after} seconds")
except CensusAPIError as e:
    print(f"API error: {e}")
    print(f"Suggestion: {e.suggestion}")
pycancensus includes comprehensive testing to ensure reliability and R equivalence:
- 4/4 tests passing with full data equivalence
- Identical results for vector listing, data retrieval, and multi-region queries
- Automated testing against R cancensus library
- 6 real-world scenarios covering typical data analysis workflows
- Provincial population analysis, demographic breakdowns, income analysis
- Vector hierarchy navigation, time series comparisons, geographic analysis
- Performance benchmarking with large datasets
- Error handling with invalid regions/vectors
- Large dataset performance testing
- API resilience and retry logic validation
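At its core, cross-validation against R is a cell-by-cell comparison of two result tables. A minimal stdlib sketch follows. It assumes both results have been loaded as lists of dicts keyed by a region identifier column, named `GeoUID` here as an assumption. `compare_tables` is a hypothetical helper, not the project's actual test code in `tests/cross_validation/`.

```python
import math
from typing import Dict, List, Mapping, Sequence

Row = Mapping[str, object]


def compare_tables(left: Sequence[Row], right: Sequence[Row], key: str = "GeoUID",
                   rel_tol: float = 1e-9, abs_tol: float = 0.0) -> List[str]:
    """Return human-readable differences between two tables keyed by `key`."""
    left_by_key: Dict[object, Row] = {row[key]: row for row in left}
    right_by_key: Dict[object, Row] = {row[key]: row for row in right}
    problems: List[str] = []

    # Regions present in only one of the two results.
    for k in sorted(set(left_by_key) ^ set(right_by_key), key=str):
        side = "left" if k in left_by_key else "right"
        problems.append(f"{key}={k} only in {side}")

    # Cell-by-cell comparison for regions present in both.
    for k in sorted(set(left_by_key) & set(right_by_key), key=str):
        a, b = left_by_key[k], right_by_key[k]
        for col in sorted(set(a) | set(b)):
            if col not in a or col not in b:
                problems.append(f"{key}={k}: column {col} missing on one side")
                continue
            x, y = a[col], b[col]
            if isinstance(x, (int, float)) and isinstance(y, (int, float)):
                # NaN never compares equal, so treat two NaNs (missing values) as a match.
                if isinstance(x, float) and isinstance(y, float) and math.isnan(x) and math.isnan(y):
                    continue
                # int 100 and float 100.0 match; tiny floating-point noise is tolerated.
                if not math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol):
                    problems.append(f"{key}={k}: {col} differs ({x!r} vs {y!r})")
            elif x != y:
                problems.append(f"{key}={k}: {col} differs ({x!r} vs {y!r})")
    return problems
```

In a real cross-check, `left` might come from pycancensus and `right` from R cancensus output exported to CSV. Note that `csv.DictReader` yields strings, so numeric columns would need converting before comparison.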
# Run the test suite
python -m pytest tests/ -v
# Run cross-validation against R
python tests/cross_validation/test_r_equivalence.py
# Run integration scenarios
python tests/integration/test_comprehensive_scenarios.py
See tests/cross_validation/results/ for detailed test results and validation reports.
Contributions are welcome! Please see CONTRIBUTING.md for guidelines on:
- Development setup
- Running tests
- Code style (Black, flake8)
- Submitting pull requests
- Reporting issues
This project is licensed under the MIT License - see the LICENSE file for details.
This package is explicitly a Python port of the R cancensus package.
Subject to the Statistics Canada Open Data License Agreement, licensed products using Statistics Canada data should employ the following acknowledgement of source:
Acknowledgment of Source
(a) You shall include and maintain the following notice on all licensed rights of the Information:
- Source: Statistics Canada, name of product, reference date. Reproduced and distributed on an "as is" basis with the permission of Statistics Canada.
(b) Where any Information is contained within a Value-added Product, you shall include on such Value-added Product the following notice:
- Adapted from Statistics Canada, name of product, reference date. This does not constitute an endorsement by Statistics Canada of this product.