Commit 645a603

further extended the readme by an example
1 parent 1e6b3c7 commit 645a603

1 file changed, +64 -9 lines changed

README.md

Lines changed: 64 additions & 9 deletions
@@ -3,9 +3,9 @@
 This repo contains a prototype implementation **DoubleML-Serverless** of distributed double machine learning with a serverless infrastructure
 using [AWS Lambda](https://aws.amazon.com/lambda).
 A detailed discussion of this prototype can be found in the paper "Distributed Double Machine Learning with a Serverless Architecture" (Kurz, 2021).
-**DoubleML-Serverless** is an extension for serverless cloud computing of the Python package **DoubleML**.
-**DoubleML** is available via PyPI [https://pypi.org/project/DoubleML](https://pypi.org/project/DoubleML) and on GitHub [https://github.com/DoubleML/doubleml-for-py](https://github.com/DoubleML/doubleml-for-py).
-Also see [https://docs.doubleml.org](https://docs.doubleml.org) for a detailed documentation and user guide for the **DoubleML** package.
+DoubleML-Serverless is an extension for serverless cloud computing of the Python package **DoubleML**.
+DoubleML is available via PyPI [https://pypi.org/project/DoubleML](https://pypi.org/project/DoubleML) and on GitHub [https://github.com/DoubleML/doubleml-for-py](https://github.com/DoubleML/doubleml-for-py).
+Also see [https://docs.doubleml.org](https://docs.doubleml.org) for a detailed documentation and user guide for the DoubleML package.

 ## Getting started

@@ -47,19 +47,74 @@ There are two options for deployment:

 2. The second option for deployment is based on AWS Serverless Application Model (AWS SAM).

-2.1 Setup the AWS SAM CLI as described here: [https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-getting-started.html](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-getting-started.html)
+2.1 Setup the AWS SAM CLI as described here: [https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-getting-started.html](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-getting-started.html)

-2.2 To deploy the application use the following commands (for more information see [https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/what-is-sam.html](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/what-is-sam.html))
+2.2 To deploy the application use the following commands (for more information see [https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/what-is-sam.html](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/what-is-sam.html))
+```
+cd aws_lambda_app
+sam build
+sam deploy --guided
+```

+### Estimating a partially linear regression model with double machine learning and serverless scaling using AWS Lambda
+
+To demonstrate the functionality of DoubleML-Serverless we revisit the Pennsylvania Reemployment Bonus experiment
+and estimate the effect of provisioning a cash bonus on the unemployment duration as studied in Chernozhukov et al. (2018).
+This example is also discussed in the accompanying paper to the DoubleML-Serverless package (Kurz, 2021).
+
+We first load the data using functionalities from the DoubleML package.
+```python
+from doubleml.datasets import fetch_bonus
+df_bonus = fetch_bonus('DataFrame')
 ```
-cd aws_lambda_app
-sam build
-sam deploy --guided
+
+The class `DoubleMLDataS3` serves as data-backend for DoubleML-Serverless model classes.
+It is inherited from the `DoubleML` class `DoubleMLData`.
+We initialize an object of the `DoubleMLDataS3` for the bonus data and upload it to the S3 bucket `doubleml-serverless-data` used for the data transfer to AWS Lambda.
+```python
+from doubleml_serverless import DoubleMLDataS3
+
+dml_data_bonus = DoubleMLDataS3(
+    'doubleml-serverless-data', 'bonus_data.csv',
+    df_bonus,
+    y_col='inuidur1',
+    d_cols='tg',
+    x_cols=['female', 'black', 'othrace',
+            'dep1', 'dep2', 'q2', 'q3',
+            'q4', 'q5', 'q6', 'agelt35',
+            'agegt54', 'durable', 'lusd', 'husd'])
+dml_data_bonus.store_and_upload_to_s3()
 ```

-### Estimating a partially linear regression model with double machine learning and serverless scaling using AWS Lambda
+To estimate the nuisance functions we use a random forest regressor which averages over 500 trees.
+We further apply repeated cross-fitting with 5 folds and 100 repetitions/splits.
+```python
+from doubleml_serverless import DoubleMLPLRServerless
+from sklearn.base import clone
+from sklearn.ensemble import RandomForestRegressor
+
+ml = RandomForestRegressor(n_estimators = 500)
+ml_g = clone(ml)
+ml_m = clone(ml)
+dml_lambda_plr_bonus = DoubleMLPLRServerless(
+    'LambdaCVPredict', 'eu-central-1',
+    dml_data_bonus, ml_g, ml_m,
+    n_folds=5, n_rep=100)
+```

+To estimate the model locally we can call `dml_lambda_plr_bonus.fit()`.
+Estimation on AWS Lambda is achieved via `dml_lambda_plr_bonus.fit_aws_lambda()`.
+Note that you will be charged for all used resources in the AWS account you deployed the serverless application to.
+```python
+dml_lambda_plr_bonus.fit_aws_lambda()
+```

+A summary of the estimation result is available via the property `dml_lambda_plr_bonus.summary`.
+Some metrics about the estimation on AWS Lambda can be obtained via the property `dml_lambda_plr_bonus.aws_lambda_metrics`.

 ## References
+
+Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W. and Robins, J. (2018),
+Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21: C1-C68. doi:[10.1111/ectj.12097](https://doi.org/10.1111/ectj.12097).
+
 Kurz, M.S. 2020. "Distributed Double Machine Learning with a Serverless Architecture". Unpublished Working Paper.
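
Note on the added README example: the new heading refers to a partially linear regression (PLR) model without stating it. For orientation, the PLR model in the notation of Chernozhukov et al. (2018) is written out below. The mapping to the example is an assumption read off the `DoubleMLDataS3` call in the diff: Y would be `inuidur1`, D would be `tg`, and X would be the columns listed in `x_cols`. The parameter of interest is the treatment effect θ₀; the learner `ml_m` targets m₀, and `ml_g` is the learner for the outcome-side nuisance function.

```latex
\begin{aligned}
Y &= D\,\theta_0 + g_0(X) + \zeta, & \mathbb{E}[\zeta \mid D, X] &= 0,\\
D &= m_0(X) + V,                   & \mathbb{E}[V \mid X] &= 0.
\end{aligned}
```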

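Note on the cross-fitting settings in the added README text: with `n_folds=5` and `n_rep=100`, each nuisance function has to be fitted on 500 different training samples, and these fits do not depend on each other. The sketch below only enumerates such tasks with the standard library to make the count concrete. The helper `repeated_kfold_tasks` and the sample size of 20 are hypothetical and chosen for illustration; how DoubleML-Serverless actually groups these fits into AWS Lambda invocations is not shown in this diff.

```python
import random


def repeated_kfold_tasks(n_obs, n_folds, n_rep, seed=0):
    """Return one (rep, fold, test_indices) tuple per cross-fitting task.

    Hypothetical helper for illustration; not part of DoubleML-Serverless.
    """
    rng = random.Random(seed)
    tasks = []
    for rep in range(n_rep):
        indices = list(range(n_obs))
        rng.shuffle(indices)  # a fresh random partition for every repetition
        for fold in range(n_folds):
            # every n_folds-th shuffled index goes into this fold's test set
            test_idx = sorted(indices[fold::n_folds])
            tasks.append((rep, fold, test_idx))
    return tasks


# n_obs=20 is an arbitrary illustrative size, not the size of the bonus data
tasks = repeated_kfold_tasks(n_obs=20, n_folds=5, n_rep=100)
print(len(tasks))  # 5 folds x 100 repetitions = 500 tasks per nuisance function
```

With two learners (`ml_g` and `ml_m`) this amounts to 1000 independent model fits, which is the kind of workload that can be spread over many short-lived function calls.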