This is the repository for the work-in-progress study
Euler, Simone, Joachim Gassen, and Jonas Materna (2025): Disclosure Intensity Effects of Disclosure Regulations: European Evidence.
While the paper relies on commercial Orbis "flat file" data, this repository provides the code to reproduce our simulated results as well as the code that generates our findings and the paper based on the Orbis data.
Before running the code, you need to configure the repository by copying the file _discint.env to discint.env and editing it. It is safe to leave the default settings as they are, but you can set the log file location by changing the variable LOG_FILE.
The other options are only relevant if you have access to BvD Orbis data and want to reproduce the full paper. The processing of the Orbis data is quite memory intensive. If you run the code in an environment with low memory (less than 64GB of RAM and/or in a development container, see below), you should set the variable LOW_MEMORY to true (the default).
In this case, you also need to configure the following variables (an example configuration is sketched after the list):
- DUCKDB_FILE: Set this to the path where the DuckDB database file should be stored. The file will be created if it does not exist.
- DUCKDB_MEMORY_LIMIT: Set this to the maximum amount of memory DuckDB can use. A good first guess is to set this to about 40% of your available RAM when running the code in a development environment.
- DUCKDB_THREADS: Set this to the number of threads DuckDB can use. The maximum is the number of CPU cores available. Smaller values will reduce memory usage.
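For illustration, a low-memory configuration could look roughly like this. All values are placeholders you should adapt to your system, and the exact syntax should follow the template in _discint.env:

```
# Illustrative values only -- adapt paths and limits to your system
LOG_FILE=log/discint.log
LOW_MEMORY=true
DUCKDB_FILE=data/generated/orbis.duckdb
# Roughly 40% of the RAM available to the container/session
DUCKDB_MEMORY_LIMIT=24GB
DUCKDB_THREADS=4
```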
The default configuration has been tested to work on an M2 MacBook with 64 GB of RAM running Docker. If you have more memory available, you can increase the memory limit and/or the number of threads to speed up the processing.
We suggest using the development container provided in the repository to reproduce our results. You can open the repository in a container in VS Code by following these instructions.
Our simulated data is provided in the data/precomputed folder. You can create the R objects for the figures and tables presented in the paper by running make results_sim in the terminal, which sources code/res_simulations.R. This will generate and store all result objects in the file data/generated/results_sim.rdata.
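Once make results_sim has finished, you can inspect the generated objects from an R session. A minimal sketch (the names of the individual result objects are defined in code/res_simulations.R):

```r
# Load the result objects produced by `make results_sim`
load("data/generated/results_sim.rdata")

# List the objects that were restored into the workspace
ls()
```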
If you want to reproduce the simulation data itself, simply delete (or rename) the data file data/precomputed/sim_data_1000.rds and run make results_sim again. This will take several hours (monitor the logs) but should eventually result in the same findings.
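For example, if you prefer renaming over deleting (the backup file name below is arbitrary), from an R session:

```r
# Move the precomputed simulation data out of the way so that
# `make results_sim` regenerates it from scratch
file.rename(
  "data/precomputed/sim_data_1000.rds",
  "data/precomputed/sim_data_1000.rds.bak"
)
```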
To reproduce the full paper, you need to download the BvD Orbis flat files manually, convert them to parquet format (one possible conversion approach is sketched after the file list), and place them into the folder data/pulled/. These are the files that we used (downloaded in December 2024):
- data/pulled/industry_global_financials_and_ratios_eur.parquet
- data/pulled/legal_info.parquet
- data/pulled/industry_classifications.parquet
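The Orbis flat files are not shipped as parquet, so the conversion is up to you. As a minimal sketch, assuming the flat files are CSV-style text exports (the input file name and location below are hypothetical), the conversion could be done from R via DuckDB:

```r
# Sketch: convert one Orbis flat file (assumed to be a CSV-style export)
# to parquet using DuckDB. Adjust the input file name and repeat for each file.
library(DBI)
library(duckdb)

con <- dbConnect(duckdb::duckdb())
dbExecute(con, "
  COPY (SELECT * FROM read_csv_auto('orbis_export/legal_info.csv'))
  TO 'data/pulled/legal_info.parquet' (FORMAT PARQUET)
")
dbDisconnect(con, shutdown = TRUE)
```

Any other tool that writes parquet should work as well, as long as the resulting files carry the names listed above.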
Then, you should be able to run the analysis and build the paper by running make all.
If you have read or even reproduced our study, we would be super interested in hearing your views. Besides reaching out to us via email, you can also start a public discussion by opening a GitHub issue in this repository.
This project has received financial support from the TRR 266 "Accounting for Transparency".