Basic pipeline:
                                 store_infousa.py             BDEEPInfousa R package
Household_Ethnicity_<year>.txt --------------------> Postgres ----------------------> R ------> Further
       (Raw TXT files)                               Database                        Data       Processing ...
store_infousa.py converts the InfoUSA raw data from TXT files into a PostgreSQL database. The script takes the year as an argument. For example, to store year 2006, execute the following on the database machine:
python3 store_infousa.py 2006
The script uses SQLAlchemy (Reference here) to create the database table and insert into it. Unlike the Zillow data, the InfoUSA data can be loaded into a pandas data frame, so rows can be inserted into the database in chunks, which gives better performance. Note that the variable DTYPEIN holds the column types used by pandas when reading the file, while DTYPE holds the types used by the database engine. These two must be kept consistent.
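A minimal sketch of the chunked-insert approach is below. The connection string, delimiter, table name, and column names are placeholders for illustration; the actual definitions live in store_infousa.py.

    # Sketch: chunked load of one year's Household_Ethnicity file into Postgres.
    # Assumes a local Postgres instance and illustrative column names.
    import sys

    import pandas as pd
    from sqlalchemy import create_engine
    from sqlalchemy.types import Integer, Text

    YEAR = sys.argv[1]  # e.g. "2006"

    # DTYPEIN: types used by pandas when reading the raw TXT file.
    DTYPEIN = {"household_id": "Int64", "ethnicity_code": "string"}
    # DTYPE: types used by the database engine; must stay consistent with DTYPEIN.
    DTYPE = {"household_id": Integer(), "ethnicity_code": Text()}

    engine = create_engine("postgresql://user:password@localhost:5432/infousa")  # placeholder credentials

    reader = pd.read_csv(
        f"Household_Ethnicity_{YEAR}.txt",
        sep="|",               # assumed delimiter
        dtype=DTYPEIN,
        chunksize=100_000,     # insert by chunks for better performance
    )
    for chunk in reader:
        chunk.to_sql(
            f"household_ethnicity_{YEAR}",  # illustrative table name
            engine,
            if_exists="append",
            index=False,
            dtype=DTYPE,
        )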
To transfer data from the database into .rds files, we use the BDEEPInfousa R package. The package sets up a direct connection to the database and retrieves the data; a type reference table is also provided. See the package folder for details.
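The R code itself lives in the package folder. For orientation only, here is a rough Python sketch of the kind of database pull the package performs; the connection string and table name are placeholders, and the real workflow goes through the R package, which writes the result to an .rds file.

    # Sketch (not the package's code): the database pull approximated in Python.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("postgresql://user:password@localhost:5432/infousa")  # placeholder
    # Read one year's table back out of Postgres; the R package then serializes it to .rds.
    df = pd.read_sql("SELECT * FROM household_ethnicity_2006", engine)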
The InfoUSA data includes a predicted ethnicity for each recorded name, stored in a separate column. This information matters to researchers studying cultural differences and discrimination. Here, we analyzed how consistent the InfoUSA predictions are with those from another commonly used method, the R WRU package. See the folder for more details.
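One simple way to quantify that consistency is the share of records on which the two methods agree, together with a cross-tabulation. A minimal sketch follows, assuming a hypothetical file predictions.csv with columns infousa_ethnicity and wru_ethnicity holding the two predictions on a common coding; the actual analysis is in the folder referenced above.

    # Sketch: agreement between the InfoUSA and WRU ethnicity predictions.
    import pandas as pd

    df = pd.read_csv("predictions.csv")  # placeholder file with both predictions per record

    # Overall agreement rate between the two classifications.
    agreement = (df["infousa_ethnicity"] == df["wru_ethnicity"]).mean()
    print(f"Agreement rate: {agreement:.1%}")

    # Cross-tabulation showing where the two methods disagree.
    print(pd.crosstab(df["infousa_ethnicity"], df["wru_ethnicity"]))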