Using supervised learning classification techniques, this project builds a model that predicts whether a company will be audited by a Big Four firm or by another accounting firm.
Target Variable: 1) Big Four, or 2) Other
Features:
- Cash Flows (feature engineered to indicate inflow/outflow)
  - CAPEX
  - NCF - Business Acquisitions and Disposals
  - NCF - Cash & Cash Equivalents
  - NCF - Financing
  - Issuance/Repayments of Debt Securities
  - Issuance/Purchase of Equity Shares
  - Payment of Dividends & Other Cash Distributions
  - NCF - Investing
  - NCF - Operations
- Balance Sheet
  - Assets
  - Cash and Cash Equivalents
  - Liabilities
  - Debt
  - Equity
- Earnings
  - Revenue
  - EBITDA
  - Equity
- Market
  - Market Capitalization
  - Enterprise Value
- Industry
- Location (State)
- Exchange
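The note that the cash flow features were "feature engineered to indicate inflow/outflow" can be read in more than one way. The following is a minimal sketch of one plausible interpretation, in which each net cash flow amount is reduced to its sign. The column names (`ncff`, `ncfi`), the sample values, and the helper `flow_direction` are hypothetical and are not taken from the project's notebooks.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC"],
    "ncff": [1_500_000.0, -320_000.0, 0.0],    # net cash flow from financing
    "ncfi": [-800_000.0, 250_000.0, np.nan],   # net cash flow from investing
})


def flow_direction(series: pd.Series) -> pd.Series:
    """Map amounts to 1 (inflow), -1 (outflow), or 0 (zero or missing)."""
    return np.sign(series.fillna(0)).astype(int)


# Add one direction column per cash flow column, keeping the raw amounts.
cash_flow_cols = ["ncff", "ncfi"]
for col in cash_flow_cols:
    df[f"{col}_direction"] = flow_direction(df[col])

print(df)
```

Treating missing values as 0 is an assumption made for this sketch; the project may have imputed or dropped them instead.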
Data Sources:
- PCAOB Auditor Search: https://pcaobus.org/resources/auditorsearch
- Quandl - Core US Fundamentals Data: https://www.quandl.com/databases/SF1/data
Tools:
- Python
- PostgreSQL
- Pandas
- NumPy
- matplotlib
- seaborn
- GeoPandas
- scikit-learn
- SQLAlchemy
Database Exploration, Creation, and Querying
- explore_csv.ipynb
- create_database.sql
- query.sql
Preprocessing and EDA
- import_preprocess.ipynb
- eda_viz.ipynb
Modeling and Metrics
- knn.ipynb
- logistic_regression.ipynb
- naive_bayes.ipynb
- decision_tree.ipynb
- random_forest.ipynb
- xgboost.ipynb
- ensemble.ipynb
- roc_curve.ipynb
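The modeling notebooks listed above suggest a workflow of fitting several classifiers, combining them in an ensemble, and comparing them on ROC metrics. The following is a rough sketch of that kind of comparison using scikit-learn. It runs on synthetic data from `make_classification` rather than the project's dataset, leaves out XGBoost to stay within scikit-learn, and uses a soft-voting ensemble as an assumption; the hyperparameters are placeholders, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real features; y=1 plays the role of "Big Four".
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale-sensitive models are wrapped in a pipeline with StandardScaler.
models = {
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Soft voting averages predicted probabilities across the base models.
models["ensemble"] = VotingClassifier(
    estimators=list(models.items()), voting="soft"
)

# Fit each model and score it by area under the ROC curve on held-out data.
auc_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    auc_scores[name] = roc_auc_score(y_test, proba)

for name, auc in sorted(auc_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: AUC = {auc:.3f}")
```

AUC is used here because it summarizes the ROC curve in a single number that does not depend on a classification threshold, which makes models easier to rank side by side.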