Languages: In my day job as an academic, I use Python for data analysis, as demonstrated here. I am also proficient in SQL and bash scripting, and can work with C++ code as required. I'm eager to learn other languages.
I've enclosed the following Python files, organized into two sets depending on which aspects of my coding ability you'd like to assess.
Please note: in production, these code snippets would always be accompanied by unit tests. Those are not included here.
Examples of ETL of business data:
- These were written as part of a 3-day take-home exam and represent time-constrained work.
- `take-home/data_processing.py` and `/data_cleaning.py`: These files use `pandas` to clean raw data on listings, hosts, and users (e.g., de-duplicating rows, transforming date-time columns, and sense-checking numerical data). During the processing phase, they also merge the relevant tables to simplify analysis.
- `take-home/Data-analysis.ipynb`: A Python notebook that uses the modules above to derive business insights. It presents a subset of my findings from the take-home exam, which I summarized in a 10-page deck.
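The cleaning and merging steps described above can be illustrated with a minimal `pandas` sketch. This is not the code from the take-home files: the function names (`clean_listings`, `merge_listings_hosts`) and the column names (`listing_id`, `host_id`, `created_at`, `price`, `host_since`) are hypothetical, chosen only to show de-duplication, date-time parsing, a numeric sense check, and a validated merge.

```python
import pandas as pd


def clean_listings(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step for a raw listings table."""
    # De-dupe: keep the first row for each listing ID
    df = raw.drop_duplicates(subset="listing_id", keep="first").copy()
    # Transform date-time columns; unparseable strings become NaT instead of raising
    df["created_at"] = pd.to_datetime(df["created_at"], errors="coerce")
    # Sense-check numbers: coerce to numeric, then drop missing or non-positive prices
    df["price"] = pd.to_numeric(df["price"], errors="coerce")
    df = df[df["price"] > 0]
    return df.reset_index(drop=True)


def merge_listings_hosts(listings: pd.DataFrame, hosts: pd.DataFrame) -> pd.DataFrame:
    # validate="many_to_one" raises a MergeError if host_id is duplicated in hosts
    return listings.merge(hosts, on="host_id", how="left", validate="many_to_one")


# Usage with made-up example data
raw = pd.DataFrame({
    "listing_id": [1, 1, 2, 3],
    "host_id": [10, 10, 20, 10],
    "created_at": ["2021-01-05", "2021-01-05", "not a date", "2021-03-10"],
    "price": ["100", "100", "85.5", "-1"],
})
hosts = pd.DataFrame({"host_id": [10, 20], "host_since": ["2019-06-01", "2020-02-15"]})
merged = merge_listings_hosts(clean_listings(raw), hosts)
```

Here the duplicate row for listing 1 is dropped, listing 3 fails the price check, and listing 2 keeps its row but gets a missing (`NaT`) timestamp, so the choice between coercing and dropping bad dates stays explicit.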
Examples of more advanced Python for analysis of physics data:
- `simulations/classes.py`: The Python classes here allow me to (1) query a MongoDB NoSQL database and load the matching simulation trajectory files, (2) normalize the data as needed, and (3) run spatial/clustering analyses and save the results as class variables. This data pipeline works in tandem with `parallelize.py` below to allow fast, reproducible analysis.
- `simulations/parallelize.py`: My academic research involves analyzing many time-series data files in aggregate. This file defines a `multiprocessing`-based class that lets me run analyses in parallel with just two lines of code, yielding speed-ups of 20x.
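A two-line parallel interface of the kind described above might look roughly like the sketch below. It is a hypothetical illustration, not the contents of `parallelize.py`: the class name `ParallelRunner` and the stand-in analysis function `analyze_file` are invented here, and the sketch assumes a POSIX system so it can use the `"fork"` start method.

```python
import multiprocessing as mp
from typing import Any, Callable, Iterable, List, Optional


class ParallelRunner:
    """Hypothetical sketch: map one analysis function over many data files in parallel."""

    def __init__(self, func: Callable[[Any], Any], processes: Optional[int] = None):
        self.func = func
        self.processes = processes  # None -> one worker per CPU core

    def run(self, inputs: Iterable[Any]) -> List[Any]:
        # Assumption: POSIX "fork" start method, so functions defined in a script
        # or notebook reach the workers without an `if __name__ == "__main__"` guard
        ctx = mp.get_context("fork")
        with ctx.Pool(processes=self.processes) as pool:
            # map() blocks until every task finishes and preserves input order
            return pool.map(self.func, list(inputs))


def analyze_file(path: str) -> int:
    # Stand-in for a real per-trajectory analysis; here it just measures the path length
    return len(path)


# The "two lines": build the runner, then run it over the file list
runner = ParallelRunner(analyze_file, processes=2)
results = runner.run(["run_a.dat", "run_bb.dat", "run_ccc.dat"])
```

Because each file is analyzed independently, the work is embarrassingly parallel; the speed-up achievable this way depends on core count and per-file cost, and the sketch itself makes no performance claim.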
If you'd like more examples of how I organize a coding project, I also have two self-contained class projects from grad school. Both links below point to Python notebooks containing the final analyses.
