- Final Project
The goal of this map is to identify areas that have birth defect surveillance programs, and other areas where programs could be started, by comparing locations with surveillance programs to total birth numbers by country. These surveillance programs are important for collecting and combining data about the types of birth defects that occur across regions and around the world. This information can then be used to identify which birth defects are most common and whether there are temporal or geographic trends in their distribution. The limitations of the current map and avenues for next steps are discussed in the [Future Efforts](#future-efforts) section.
The final submission for this project is a static map page.
This project uses three primary data sources. More information about how the data sources were combined and edited is provided in the Methodology section.
Data summary
- Total births: the UN Fertility Indicators dataset.
- Birth defects surveillance: a list of places where surveillance programs occur, from the ICBDSR.
- Country maps: the Natural Earth country shapefile without boundary lakes.
This dataset can be loaded into QGIS and will appear as WPP2019_Fertility_By_Age. As noted on the UN's page listed above, this file contains
The key variable of interest is the Birth column. Rather than looking at all 5-year ranges, the final map targets the next 5-year period, 2020 to 2025, and shows the total number of births for each country.
After opening the table in QGIS, several steps need to be handled with SQL, because each country has several rows of data: the number of births by mother's age, the different 5-year ranges going up to 2100, and several different population growth estimates.
For this map, I selected the following options as shown in the SQL command I used.
To apply this SQL, I first created a new virtual layer by navigating to Layer > Create Layer > New Virtual Layer in QGIS.
The code selects five variables that are carried into the new virtual layer. These variables are listed after the SELECT command and include the sum of births. By summing across mother's age categories, I could generate an estimate of the total number of births in a country. The FROM command specifies that the new virtual layer should be created from the WPP2019_Fertility_By_Age dataset. The WHERE command keeps only the table rows in which MidPeriod is equal to 2023 and the population growth estimate (Variant) is "No change". Finally, the LocID column provides unique country codes that work as the grouping variable.
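The query described above can be sketched outside QGIS with Python's built-in `sqlite3` module. This is a minimal illustration, not the exact command used for the map: the `Location`, `AgeGrp`, and `Births` column names and the `TotalBirths` alias are assumptions, and every row in the sample table is made up.

```python
import sqlite3

# Tiny stand-in for the WPP2019_Fertility_By_Age table.
# Column names other than LocID, MidPeriod, and Variant are assumptions,
# and all values below are invented for illustration.
rows = [
    # LocID, Location, Variant, MidPeriod, AgeGrp, Births (thousands)
    (1, "Country A", "No change", 2023, "15-19", 10.5),
    (1, "Country A", "No change", 2023, "20-24", 30.0),
    (1, "Country A", "Medium", 2023, "15-19", 11.0),     # other variant, dropped
    (1, "Country A", "No change", 2028, "15-19", 9.0),   # other period, dropped
    (2, "Country B", "No change", 2023, "15-19", 4.0),
    (2, "Country B", "No change", 2023, "20-24", 6.25),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE WPP2019_Fertility_By_Age "
    "(LocID INTEGER, Location TEXT, Variant TEXT, MidPeriod INTEGER, "
    "AgeGrp TEXT, Births REAL)"
)
conn.executemany(
    "INSERT INTO WPP2019_Fertility_By_Age VALUES (?, ?, ?, ?, ?, ?)", rows
)

# Four descriptive columns plus the summed births, one row per country.
query = """
SELECT LocID, Location, Variant, MidPeriod, SUM(Births) AS TotalBirths
FROM WPP2019_Fertility_By_Age
WHERE MidPeriod = 2023 AND Variant = 'No change'
GROUP BY LocID
ORDER BY LocID
"""
totals = conn.execute(query).fetchall()
for row in totals:
    print(row)
```

The same SELECT, FROM, WHERE, and GROUP BY structure is what the virtual layer dialog in QGIS expects, with the layer name in place of the table name.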
The result was a new virtual layer that looked like this.
In the image, row 236 is highlighted and shows the results for the United States. This row indicates that between 2020 and 2025 (5 years), about 20,093,000 children will be born. The birth column results are in thousands, as noted above.
To double-check this, I visited the CDC's page about Births and Natality. This page showed the total number of births in 2018 was about 3.8 million. An estimate of 20 million births over 5 years seems reasonable, which indicates the SQL query worked correctly.
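The arithmetic behind this sanity check can be written out as a short sketch. The two input figures come from the text above; the variable names are only illustrative.

```python
# UN total for the United States, 2020 to 2025, reported in thousands.
births_thousands_5yr = 20_093
# Approximate CDC figure for births in 2018.
cdc_births_2018 = 3_800_000

# Convert from thousands and spread over the 5-year period.
births_per_year = births_thousands_5yr * 1_000 / 5
ratio = births_per_year / cdc_births_2018

print(f"Implied births per year: {births_per_year:,.0f}")
print(f"Ratio to CDC 2018 figure: {ratio:.2f}")
```

The implied yearly figure of roughly 4 million is within a few percent of the CDC number, which is the sense in which the 5-year total looks reasonable.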
The ICBDSR location data was only available as a table on the website. This information was copied and pasted into an Excel document called Surveillance_Locations_v01. I used Text to Columns to split the data into columns. After editing and adding some columns, I saved this file as a CSV.
The shapefile from Natural Earth was downloaded, unzipped, and loaded into QGIS.
A challenge in making the final map was that the three datasets had no common linking code or country numbering system. An initial spatial join between the Natural Earth shapefile and the five-year birth totals by country missed over 30 countries because of spelling or naming variations. One option I tried was fuzzy matching, which allows partial spelling matches.
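As a rough illustration of what fuzzy matching on country names involves, the sketch below uses `difflib` from the Python standard library. It is not the tool used for this project, and the name lists, the `best_match` helper, and the 0.6 cutoff are all assumptions chosen for the example.

```python
import difflib

# Illustrative name lists; the real datasets contain far more entries.
natural_earth_names = ["United States of America", "Russia", "South Korea", "Bolivia"]

def best_match(name, candidates, cutoff=0.6):
    """Return the closest candidate spelling, or None if nothing is close enough."""
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

for un_name in ["United States", "Bolivia", "Russian Federation"]:
    print(un_name, "->", best_match(un_name, natural_earth_names))
```

Names that share most of their spelling are paired up, but names that differ substantially can fall below the cutoff and come back unmatched, so a manual pass is still needed.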
The option I used to give the different data sources a common element for merging was to create the column by hand. This involved using Excel and conditional formatting to identify where country names were spelled differently in the Natural Earth list of countries and the new virtual layer made from the UN dataset. Along with the country spelling, I used the three-letter country code available in the Natural Earth dataset, ADM0_A3.
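The manual process above was done in Excel, but the same idea can be sketched in Python: flag the names that do not match exactly, then fill in their codes from a hand-made lookup. The sample names and the `manual_codes` table are illustrative assumptions, not the full list used for the map.

```python
# Natural Earth country name -> ADM0_A3 code (small illustrative sample).
natural_earth = {
    "United States of America": "USA",
    "Russia": "RUS",
    "Bolivia": "BOL",
}
# Country names as they might appear in the UN-derived layer.
un_names = [
    "United States of America",
    "Russian Federation",
    "Bolivia (Plurinational State of)",
]

# Names with no exact match need review, like the cells Excel would highlight.
unmatched = sorted(set(un_names) - set(natural_earth))

# Hand-entered corrections for the flagged names.
manual_codes = {
    "Russian Federation": "RUS",
    "Bolivia (Plurinational State of)": "BOL",
}

def code_for(name):
    """Look up ADM0_A3 by exact name first, then by the manual table."""
    return natural_earth.get(name) or manual_codes.get(name)

un_codes = {name: code_for(name) for name in un_names}
print(unmatched)
print(un_codes)
```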
The end result of this process was the following:
Data note
When working with the edited UN dataset as a CSV file, I rounded the number of births to the nearest whole number. Without rounding, QGIS did not seem to read the value correctly and I was not able to convert it to a number. After rounding to the nearest whole number, the births column was read as a string that could then be converted to a number.
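The rounding step was done in a spreadsheet, but a scripted equivalent might look like the sketch below. The column names and values in the sample CSV text are made up for illustration.

```python
import csv
import io

# Made-up CSV content standing in for the edited UN file.
raw = "ADM0_A3,Births\nAAA,20093.456\nBBB,1234.6\n"

reader = csv.DictReader(io.StringIO(raw))
cleaned = [
    # Parse the text value, then round to the nearest whole number.
    {"ADM0_A3": row["ADM0_A3"], "Births": round(float(row["Births"]))}
    for row in reader
]
print(cleaned)
```

Writing the rounded integers back out gives a births column with no decimal part, which is the form that could be converted to a number in QGIS.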
Now that the three data files had similar naming systems, joins were used to add the UN birth data and the ICBDSR data to the Natural Earth shapefile.
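Conceptually, these joins attach attributes to each country record through the shared code. The sketch below shows that logic with plain Python dictionaries. The placeholder codes, values, and the `Births` and `Surveillance` field names are assumptions for illustration, not the real data.

```python
# Placeholder country records standing in for the Natural Earth attribute table.
countries = [
    {"ADM0_A3": "AAA", "NAME": "Country A"},
    {"ADM0_A3": "BBB", "NAME": "Country B"},
    {"ADM0_A3": "CCC", "NAME": "Country C"},
]
# Five-year birth totals (thousands) keyed by the shared code; values are made up.
births = {"AAA": 20093, "BBB": 1235}
# Codes of countries with a listed surveillance program; made up.
surveillance = {"AAA"}

# Left join: every country is kept, and missing births become None.
joined = [
    {
        **country,
        "Births": births.get(country["ADM0_A3"]),
        "Surveillance": country["ADM0_A3"] in surveillance,
    }
    for country in countries
]
for record in joined:
    print(record)
```

Keeping every country, even those with no match, mirrors how a table join in QGIS leaves unmatched features in the layer with empty attribute values.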
With all data joined together, two types of visualizations were created.
The first is a choropleth map of the total number of births by country. A graduated, single-color ramp was selected because the values shown are a single variable that increases in magnitude. Four categories were selected to show contrasts among the countries while avoiding too many shades of green on the map.
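To show how values fall into four graduated classes, here is a small sketch using quantile breaks. The text does not state which classification mode was used in QGIS, so the quantile method is an assumption, and the birth totals are made up.

```python
import bisect
import statistics

# Made-up five-year birth totals (thousands) for eight countries.
births = [100, 200, 300, 400, 500, 600, 700, 800]

# Three cut points split the values into four classes.
breaks = statistics.quantiles(births, n=4)

# Class 0 is the lightest shade, class 3 the darkest.
classes = [bisect.bisect_right(breaks, value) for value in births]

print(breaks)
print(classes)
```

With quantile breaks each class holds about the same number of countries. Other modes, such as equal interval or natural breaks, would place the cut points differently.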
The second visualization highlights the countries where some form of birth defect surveillance is occurring. Rather than showing both pieces of information in one map, I chose to show the information in two maps. Data can be considered illuminating because it provides new information on a given topic. With incomplete data, it might not be possible to characterize an entire situation. I used the visual contrast between the two maps to convey this. A single map with country outlines showing which countries have birth defects surveillance is a possibility too.
For display, I selected the World Robinson projection (ESRI:54030, listed in some QGIS versions as EPSG:54030) because its rounded edges give the map a globe-like feel. I found this projection listed on a page from The Future Mapping Company.
These maps show the countries that are listed on the ICBDSR as having surveillance programs. There are many surveillance programs that can occur at local, regional, and national levels. This map is not intended to convey that collectively we have no information about birth defects in other countries. Instead, the map is intended to help spur discussion about ways to help organize and combine datasets across the world to improve our collective understanding of birth defects. Additionally, surveillance programs do not always operate on a national scale. These maps show an entire country because that was a reasonable merge to the Natural Earth dataset. There are multi-country efforts that were not shown on these maps.
The final documents used for these two maps are as follows: