|
| 1 | +--- |
| 2 | +jupytext: |
| 3 | + formats: md:myst |
| 4 | + text_representation: |
| 5 | + extension: .md |
| 6 | + format_name: myst |
| 7 | +kernelspec: |
| 8 | + display_name: Python 3 |
| 9 | + language: python |
| 10 | + name: python3 |
| 11 | +--- |
| 12 | + |
| 13 | +(Chap_BasicEmpirMethods)= |
| 14 | +# Basic Empirical Methods |
| 15 | + |
| 16 | +Put basic empirical methods here. {numref}`ExerBasicEmpir_MultLinRegress` |
| 17 | + |
| 18 | + |
| 19 | +(SecBasicEmpirExercises)= |
| 20 | +## Exercises |
| 21 | + |
| 22 | +```{exercise-start} Multiple linear regression |
| 23 | +:label: ExerBasicEmpir_MultLinRegress |
| 24 | +:class: green |
| 25 | +``` |
| 26 | +For this problem, you will use the [`Auto.csv`](https://github.com/OpenSourceEcon/CompMethods/tree/main/data/BasicEmpirMethods/Auto.csv) dataset in the [`/data/BasicEmpirMethods/`](https://github.com/OpenSourceEcon/CompMethods/tree/main/data/BasicEmpirMethods) folder of the repository for this book.[^Auto] This dataset includes 397 observations on miles per gallon (`mpg`), number of cylinders (`cylinders`), engine displacement (`displacement`), horsepower (`horsepower`), vehicle weight (`weight`), acceleration (`acceleration`), vehicle year (`year`), vehicle origin (`origin`), and vehicle name (`name`). |
| 27 | +1. Import the data using the [`pandas.read_csv()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) function. Look for characters that seem out of place that might indicate missing values. Replace them with missing values using the `na_values=...` option. |
| 28 | +2. Produce a scatterplot matrix which includes all of the quantitative variables `mpg`, `cylinders`, `displacement`, `horsepower`, `weight`, `acceleration`, `year`, `origin`. Call your DataFrame of quantitative variables `df_quant`. [Use the pandas scatterplot function in the code block below.] |
| 29 | +```python |
| 30 | +from pandas.plotting import scatter_matrix |
| 31 | + |
| 32 | +scatter_matrix(df_quant, alpha=0.3, figsize=(6, 6), diagonal='kde') |
| 33 | +``` |
| 34 | +3. Compute the correlation matrix for the quantitative variables ($8\times 8$) using the [`pandas.DataFrame.corr()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.corr.html) method. |
| 35 | +4. Estimate the following multiple linear regression model of $mpg_i$ on all other quantitative variables, where $u_i$ is an error term for each observation, using Python's `statsmodels.api.OLS()` function. |
| 36 | + \begin{equation*} |
| 37 | + \begin{split} |
| 38 | + mpg_i &= \beta_0 + \beta_1 cylinders_i + \beta_2 displacement_i + \beta_3 horsepower_i \\ |
| 39 | + &\qquad + \beta_4 weight_i + \beta_5 acceleration_i + \beta_6 year_i + \beta_7 origin_i + u_i |
| 40 | + \end{split} |
| 41 | + \end{equation*} |
| 42 | + * Which of the coefficients are statistically significant at the 1\% level? |
| 43 | + * Which of the coefficients are NOT statistically significant at the 10\% level? |
| 44 | + * Give an interpretation in words of the estimated coefficient $\hat{\beta}_6$ on $year_i$ using the estimated value of $\hat{\beta}_6$. |
| 45 | +5. Looking at your scatterplot matrix from part (2), what are the three variables that look most likely to have a nonlinear relationship with $mpg_i$? |
| 46 | + * Estimate a new multiple regression model by OLS in which you include squared terms on the three variables you identified as having a nonlinear relationship to $mpg_i$ as well as a squared term on $acceleration_i$. |
| 47 | + * Report your adjusted R-squared statistic. Is it better or worse than the adjusted R-squared from part (4)? |
| 48 | + * What happened to the statistical significance of the $displacement_i$ variable coefficient and the coefficient on its squared term? |
| 49 | + * What happened to the statistical significance of the coefficient on the $cylinders_i$ variable? |
| 50 | +6. Using the regression model from part (5) and the `.predict()` method of the fitted results, what would be the predicted miles per gallon $mpg$ of a car with 6 cylinders, displacement of 200, horsepower of 100, a weight of 3,100, acceleration of 15.1, model year of 1999, and origin of 1? |
| 51 | +```{exercise-end} |
| 52 | +``` |
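
The data-import and correlation steps in parts (1) and (3) can be sketched on a small stand-in dataset. This is a minimal sketch, not a solution: the inline CSV text, its values, and the `?` placeholder for missing entries are assumptions made for illustration, so inspect the real `Auto.csv` file to decide what to pass to `na_values`.

```python
import io

import pandas as pd

# Tiny inline stand-in for a CSV file. The rows and the "?" placeholder
# are hypothetical and are NOT copied from Auto.csv.
csv_text = """mpg,cylinders,horsepower,name
18.0,8,130,car a
25.0,4,?,car b
30.5,4,78,car c
15.0,8,150,car d
"""

# na_values lists the strings that read_csv should treat as missing (NaN).
df = pd.read_csv(io.StringIO(csv_text), na_values=["?"])

# Keep the numeric columns and drop rows that contain missing values.
df_quant = df[["mpg", "cylinders", "horsepower"]].dropna()

# Pairwise Pearson correlations between the selected columns.
corr = df_quant.corr()

print(df.isna().sum())
print(corr.round(3))
```

Without `na_values`, a column containing a stray placeholder string would be read as text rather than as numbers, which would break the later plotting and regression steps.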
| 53 | + |
| 54 | + |
| 55 | +(SecBasicEmpirFootnotes)= |
| 56 | +## Footnotes |
| 57 | + |
| 58 | +The footnotes from this chapter. |
| 59 | + |
| 60 | +[^Auto]: The [`Auto.csv`](https://github.com/OpenSourceEcon/CompMethods/tree/main/data/BasicEmpirMethods/Auto.csv) dataset comes from {cite}`JamesEtAl:2017` (ch. 3) and is also available at http://www-bcf.usc.edu/~gareth/ISL/data.html. |