jfobrycki/birthdefectdata

Final Project, December 2019

Overview

The goal of this map is to identify areas that have birth defect surveillance programs, and areas where programs could be started, by comparing the locations of existing surveillance programs with total birth numbers by country. These surveillance programs are important for collecting and combining data about the types of birth defects that occur across regions and around the world. This information can then be used to identify which birth defects are most common and whether there are temporal or geographic trends in their distribution. The limitations of the current map and avenues for next steps are discussed in the Future efforts section.

The final submission for this project is a static map page.

Data sources

This project uses three primary data sources. More information about how the data sources were combined and edited is provided in the Methodology section.

Total births

  • The United Nations maintains projections of global birth rates by country and by region. This resource is available from this downloads page. The specific dataset of interest is Fertility Indicators.

Birth defects surveillance

  • The International Clearinghouse for Birth Defects Surveillance and Research (ICBDSR) is one place where multiple entities combine data about birth defects. There are other places where this information is combined, for example EUROCAT. For this map, the ICBDSR provides a list of places where surveillance programs occur. This list can be copied and pasted into a text document or Excel file.

Country maps

  • A shapefile that contains countries of the world is available from Natural Earth, specifically the Admin 0 – Countries option without boundary lakes. A direct download of this file is available here.

    Data summary

    1. The UN Fertility Indicators dataset.
    2. A list of places where surveillance programs occur from the ICBDSR.
    3. The Natural Earth country shapefile without boundary lakes.

    Methodology

    Datasets

    The UN Fertility Indicators

    This dataset can be loaded into QGIS and will appear as WPP2019_Fertility_By_Age. As noted on the UN's page listed above, this file contains:

  • Fertility indicators, by age, annually and 5-year periods, from 1950-1955 to 2095-2100.
  • ASFR: Age-specific fertility rate (births per 1,000 women)
  • PASFR: Percentage age-specific fertility rate
  • Births: Number of births, both sexes combined (thousands)

    The key variable of interest is the Births column. Rather than looking at all 5-year ranges, the final map targets the next 5-year period, 2020 to 2025, and shows the total number of births for each country.

    After opening the table in QGIS, several steps need to be handled with SQL. Each country has many rows of data, covering the number of births by mother's age, the different 5-year ranges going up to 2100, and several different population growth estimates.

    For this map, I selected the following options as shown in the SQL command I used.

    To apply this SQL, I first created a new virtual layer by navigating to Layer > Create Layer > New Virtual Layer in QGIS.

    The code selects five variables that are carried into the new virtual layer. These variables are listed after the SELECT command and include the sum of births. By summing across the mother's age categories, I could generate an estimate of the total number of births in a country. The FROM command specified that the new virtual layer should be created from the WPP2019_Fertility_By_Age dataset. The WHERE command kept only the table rows in which MidPeriod was equal to 2023 and the population growth estimate (Variant) was "No change". Finally, the LocID column provided unique country codes that worked as a grouping variable.
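The original SQL was shown as an image, so the following is only a sketch of what such a query might look like, not the author's exact command. It runs the query against a small in-memory SQLite table from Python. The column names Location and AgeGrp, the choice of which five columns to select, and all of the toy numbers are assumptions made for illustration; only LocID, Variant, MidPeriod, Births, and the table name come from the text above.

```python
import sqlite3

# Toy rows that mimic a few columns of WPP2019_Fertility_By_Age.
# All country names and numbers here are invented for illustration.
ROWS = [
    # LocID, Location, Variant, MidPeriod, AgeGrp, Births (thousands)
    (1, "Country A", "No change", 2023, "15-19", 10.0),
    (1, "Country A", "No change", 2023, "20-24", 30.5),
    (1, "Country A", "Medium", 2023, "15-19", 9.0),      # other variant: filtered out
    (1, "Country A", "No change", 2028, "15-19", 11.0),  # other period: filtered out
    (2, "Country B", "No change", 2023, "15-19", 2.0),
    (2, "Country B", "No change", 2023, "20-24", 3.25),
]

# Assumed reconstruction of the query described in the text:
# five selected columns, one of which is the sum of births.
QUERY = """
SELECT LocID, Location, Variant, MidPeriod, SUM(Births) AS Births
FROM WPP2019_Fertility_By_Age
WHERE MidPeriod = 2023 AND Variant = 'No change'
GROUP BY LocID
ORDER BY LocID
"""


def five_year_births(rows):
    """Load rows into an in-memory SQLite table and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE WPP2019_Fertility_By_Age ("
        "LocID INTEGER, Location TEXT, Variant TEXT, "
        "MidPeriod INTEGER, AgeGrp TEXT, Births REAL)"
    )
    con.executemany(
        "INSERT INTO WPP2019_Fertility_By_Age VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


totals = five_year_births(ROWS)
for row in totals:
    print(row)
```

Each output row holds one country's births summed over the mother's age groups for the selected period and variant. A similar SELECT statement could be pasted into the query box of the QGIS virtual layer dialog, adjusted to the actual column names in the loaded table.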

    The result was a new virtual layer that looked like this.

    In the image, the highlighted row 236 shows the results for the United States. The Births cell indicates that between 2020 and 2025 (5 years), about 20,093,000 children will be born. The Births values are in thousands, as noted above.

    To double check this, I visited the CDC's page about Births and Natality. This page showed that the total number of births in 2018 was about 3.8 million. An estimate of 20 million births over 5 years (about 4 million per year) seems reasonable, and this indicates the SQL query worked correctly.
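The comparison above can be written out as a short calculation. This is a minimal sketch using only the two figures quoted in the text; the variable names are made up for illustration.

```python
# Figures quoted in the text above.
births_2020_2025_thousands = 20093   # UN estimate for the United States, in thousands
period_years = 5
cdc_births_2018 = 3.8e6              # approximate CDC total for 2018

# Convert thousands to a count and spread it evenly over the period.
annual_estimate = births_2020_2025_thousands * 1000 / period_years

# Ratio of the projected annual figure to the observed 2018 figure.
ratio = annual_estimate / cdc_births_2018

print(f"Projected births per year: {annual_estimate:,.0f}")
print(f"Ratio to CDC 2018 total:   {ratio:.2f}")
```

A ratio close to 1 suggests that the summed value is on the right scale, for example that no age group or variant was counted twice.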

    The ICBDSR Locations

    The data on ICBDSR locations was only available as a table on the website. This information was copied and pasted into an Excel document called Surveillance_Locations_v01. I used text to columns to split the data into columns. After editing and adding some columns, I saved this file as a CSV.

    Countries of the world

    The shapefile from Natural Earth was downloaded, unzipped, and loaded into QGIS.

    Merging

    A challenge in making the final map was that the three datasets had no common linking code or country numbering system. An initial join between the Natural Earth shapefile and the five-year birth totals by country missed over 30 countries because of spelling or naming variations. One option that I tried was fuzzy matching, which can match names whose spellings only partially agree.

    The option I used to give the different data sources a common element for merging was to create the column by hand. This involved using Excel and conditional formatting to identify where spelling differences occurred between the country names in the Natural Earth list of countries and those in the new virtual layer made from the UN dataset. Along with the country spelling, I used the three-letter country code available in the Natural Earth dataset, ADM0_A3.
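The name comparison described above can also be sketched in code. The example below is not the Excel workflow that was actually used. It is a hypothetical illustration that separates exact matches from names needing review, and it uses Python's standard difflib module to suggest close spellings. The short name lists and the helper match_names are assumptions made for illustration.

```python
import difflib

# Illustrative Natural Earth names with their ADM0_A3 codes (a tiny, assumed sample).
NATURAL_EARTH = {
    "France": "FRA",
    "Russia": "RUS",
    "Vietnam": "VNM",
    "South Korea": "KOR",
}

# Illustrative UN-style spellings of the same countries.
UN_NAMES = ["France", "Viet Nam", "Russian Federation", "Republic of Korea"]


def match_names(un_names, ne_codes, cutoff=0.6):
    """Split UN names into exact matches, fuzzy suggestions, and unmatched names."""
    exact, suggested, unmatched = {}, {}, []
    candidates = list(ne_codes)
    for name in un_names:
        if name in ne_codes:
            exact[name] = ne_codes[name]
            continue
        # Best single candidate whose similarity ratio reaches the cutoff, if any.
        close = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
        if close:
            suggested[name] = close[0]
        else:
            unmatched.append(name)
    return exact, suggested, unmatched


exact, suggested, unmatched = match_names(UN_NAMES, NATURAL_EARTH)
print("exact:", exact)
print("suggested (review by hand):", suggested)
print("unmatched (edit by hand):", unmatched)
```

In this toy sample, fuzzy matching catches a small spelling variation such as "Viet Nam" but not a different naming convention such as "Republic of Korea", which is consistent with finishing the matching by hand and relying on the three-letter code.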

    The end result of this process was the following:

  • The edited UN dataset had country names consistent with the Natural Earth dataset and a column of three-letter country codes.
  • The list of surveillance programs also had the same spelling as the Natural Earth dataset and the three letter country code.

    Data note

    When working with the edited UN dataset as a CSV file, I rounded the number of births to the nearest whole number. Without rounding, QGIS did not seem to read the value correctly and I was not able to convert it to a number. After rounding to the nearest whole number, the births column was read as a string, which could then be converted to a number.

    Now that the three data files had similar naming systems, joins were used to add the UN birth data and the ICBDSR data to the Natural Earth shapefile.
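The joins were done in QGIS, but the logic can be sketched in a few lines. This is a minimal illustration under stated assumptions: the helper join_by_code is hypothetical, the output field names Births and Surveillance are made up, and apart from the United States births figure quoted earlier, the values are invented.

```python
# Attribute rows standing in for the Natural Earth shapefile (assumed sample).
countries = [
    {"ADM0_A3": "USA", "NAME": "United States of America"},
    {"ADM0_A3": "FRA", "NAME": "France"},
    {"ADM0_A3": "ATA", "NAME": "Antarctica"},
]

# Five-year births in thousands, keyed by three-letter code.
# The USA value is quoted in the text; the FRA value is invented.
births_by_code = {"USA": 20093, "FRA": 3600}

# Codes of countries with a listed surveillance program (illustrative only).
surveillance_codes = {"USA"}


def join_by_code(countries, births_by_code, surveillance_codes):
    """Left-join births and a surveillance flag onto each country row."""
    joined = []
    for country in countries:
        code = country["ADM0_A3"]
        joined.append(
            {
                **country,
                # None marks a country with no matching birth record.
                "Births": births_by_code.get(code),
                "Surveillance": code in surveillance_codes,
            }
        )
    return joined


joined = join_by_code(countries, births_by_code, surveillance_codes)
for row in joined:
    print(row)
```

Every country row is kept, so a country with no birth record still appears on the map. A missing value shows up as None here, much as a failed join leaves an empty field in the QGIS attribute table.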

    Mapping

    With all data joined together, I created two visualizations.

    The first was a choropleth map of the total number of births by country. A graduated, single-color ramp was selected because the values shown were a single variable that was increasing in magnitude. Four categories were selected to show contrasts among the countries while avoiding too many shades of green on the map.
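The text does not say which classification mode was used to form the four categories. As one possibility, the sketch below assumes quantile breaks, which place roughly the same number of countries in each class. The birth values are invented, and the helper classify is hypothetical.

```python
import bisect
import statistics

# Invented five-year birth totals (thousands) for eight hypothetical countries.
births = [100, 200, 300, 400, 500, 600, 700, 800]

# Three cut points split the sorted values into four quantile classes.
breaks = statistics.quantiles(births, n=4)


def classify(value, breaks):
    """Return the class index (0 = lightest shade, 3 = darkest) for a value."""
    return bisect.bisect_right(breaks, value)


classes = [classify(value, breaks) for value in births]
print("breaks:", breaks)
print("classes:", classes)
```

Other modes, such as equal interval or natural breaks, would give different class boundaries, so the choice affects which countries share a shade.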

    The second visualization highlighted the countries where some form of birth defect surveillance was occurring. Rather than combining both pieces of information in one map, I chose to show them in two maps. Data can be considered illuminating because it provides new information on a given topic; with incomplete data, a situation cannot be fully characterized. The two maps are meant to convey this visual contrast. A single map with country outlines marking which countries have birth defect surveillance is also a possibility.

    For display, I selected the World Robinson projection, EPSG 54030, because its rounded edges give the map a globe-like feel. I found this projection listed on a page from The Future Mapping Company.

    Limitations

    These maps show the countries that are listed by the ICBDSR as having surveillance programs. There are many surveillance programs that can occur at local, regional, and national levels. This map is not intended to convey that collectively we have no information about birth defects in other countries. Instead, the map is intended to help spur discussion about ways to organize and combine datasets across the world to improve our collective understanding of birth defects. Additionally, surveillance programs do not always operate on a national scale. These maps show an entire country because that was a reasonable unit for merging with the Natural Earth dataset. There are also multi-country efforts that are not shown on these maps.

    Future efforts

  • Create a map that shows the coverage areas of birth defect surveillance programs within countries
  • Add layers that indicate which parts of the world are covered by other birth defect surveillance programs
  • Generate estimates for the total number of births that are covered by surveillance programs
  • Analyze locations where new surveillance programs could be started to help increase our collective understanding of birth defects
  • Build a zoomable, interactive map that shows the locations of hospitals and clinics that participate in surveillance programs, with shaded areas indicating the areas served by each hospital or clinic

    Document Summary

    The final documents used for these two maps are as follows:

  • The Five Year Birth Totals by Country - calculated from the UN dataset, as a CSV file
  • The locations of each surveillance program - edited from the ICBDSR website
  • The countries shapefile from Natural Earth
  • The map of total births between 2020 and 2025, available as a 1200 px or 8000 px wide PNG
  • The map of birth defect surveillance locations, available as a 1200 px or 8000 px wide PNG