Replication package for the paper:
"Theory Building from Data Strategy Studies: Aggregating Evidence on Model Quantization in Deep Learning Systems" submitted to the Empirical Software Engineering Journal.
This replication package consists of the following components:
Data:
- Raw, external, interim, and processed data are stored in the data directory.
Source Code:
- Located in the src directory, it includes scripts for data processing, analysis, and evidence extraction.
- Key modules:
- data/papers/entities.py & data/papers/knowledge_extraction.py: Define the structure and data extraction logic for the papers analyzed.
- data/download.py: Downloads the list of papers from arXiv and merges them with the Scopus list.
- data/selection/llm.py: Implements logic for selecting studies using Gemini 3.0 Flash.
Jupyter Notebooks:
- Located in the notebooks directory, these notebooks contain the analysis and visualization of the data.
- Notebooks include:
- 1.0-llm-promt-refinement.ipynb: Refines the prompt for the LLMs and the selection of the LLM to use.
- 2.0-model-quantization-paper-selection.ipynb: Filters the raw list of papers using the selected model, Gemini 3.0 Flash.
- 3.0-final-selection-analysis.ipynb: Analyzes the final selection of papers.
- 4.0-paper-metadata-analysis.ipynb: Analyzes metadata from selected papers.
- 5.0-evidence-analysis.ipynb: Analyzes evidence extracted from the papers and generates the forest plot.
Documentation:
- data/processed/evidence-diagrams-mapping.md: Links to evidence diagrams generated during the study.
- data/processed/{paperkey}/metadata.json: Contains metadata for the specific paper.
- data/processed/{paperkey}/systematic-studies-quality-evaluation.md: Contains the filled quality evaluation form for the specific paper.
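As an illustration of how these per-paper files might be read programmatically, here is a minimal Python sketch. The helper name, the "example-paper" key, and the "title" field are invented placeholders; only the data/processed/{paperkey}/metadata.json layout is taken from this package.

```python
import json
import tempfile
from pathlib import Path

def load_paper_metadata(processed_dir: str) -> dict:
    """Collect metadata.json from every {paperkey} subdirectory."""
    metadata = {}
    for meta_file in Path(processed_dir).glob("*/metadata.json"):
        # The parent directory name is the paper key.
        metadata[meta_file.parent.name] = json.loads(meta_file.read_text())
    return metadata

# Demo on a temporary layout mimicking data/processed/ (field names invented):
with tempfile.TemporaryDirectory() as tmp:
    paper_dir = Path(tmp) / "example-paper"
    paper_dir.mkdir()
    (paper_dir / "metadata.json").write_text('{"title": "Example"}')
    print(load_paper_metadata(tmp))  # {'example-paper': {'title': 'Example'}}
```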
The project is organized as follows:
├── data/
│ ├── raw/ <- Contains the original list of papers retrieved from Scopus
│ ├── external/ <- Contains the raw data obtained from the selected papers
│ ├── interim/ <- Contains the interim data used in the analysis
│ └── processed/ <- Contains the processed data used in the analysis
│ └── evidence-diagrams-mapping.md <- Contains links to the evidence diagrams
├── notebooks/
│ ├── 1.0-llm-promt-refinement.ipynb
│ ├── 2.0-model-quantization-paper-selection.ipynb
│ ├── 3.0-second-selection-analysis.ipynb
│ ├── 4.0-paper-metadata-analysis.ipynb
│ └── 5.0-evidence-analysis.ipynb
├── reports/
│ └── figures/
├── src/
│ ├── data/
│ │ ├── papers/ <- Contains the logic for extracting and analyzing data from papers
│ │ │ ├── entities.py
│ │ │ └── knowledge_extraction.py
│ │ ├── download.py
│ │ └── selection/ <- Utility functions for selecting studies using LLMs, including the prompt
│ │ └── llm.py
│ ├── forestplot/ <- Utility functions for generating the forest plot
│ ├── effect_intensity.py <- Definition of the effect intensity thresholds
│ ├── run_evidence_extraction.py
│ └── config.py
├── .pre-commit-config.yaml
├── dot-env-template <- Template for environment variables
├── requirements.txt <- List of Python dependencies
├── uv.lock <- Environment lock file
├── LICENSE
├── pyproject.toml <- Project configuration file
└── README.md
Setup:
Clone the repository:
git clone <repository-url>
cd green-tactics-synthesis
Install dependencies:
The project is managed with uv. To install the dependencies, run:
uv sync
Alternatively, you can use pip to install the dependencies listed in requirements.txt:
pip install -r requirements.txt
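If you run the LLM-dependent steps outside Docker, the environment variables described in dot-env-template must be available. A minimal sketch, assuming only the GEMINI_API_KEY variable name used in the Docker instructions (the touch line is a stand-in so the snippet is self-contained; in a real clone, dot-env-template already exists):

```shell
touch dot-env-template                    # stand-in; the real template ships with the repo
cp dot-env-template .env                  # create a local environment file from the template
echo "GEMINI_API_KEY=your_key" >> .env    # add your own API key
```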
Using Docker (recommended for reproducibility):
A pre-built Docker image is available on Docker Hub:
docker pull santidr/model-quantization-aggregation
Run the container with Jupyter Lab:
docker run -it -p 8888:8888 santidr/model-quantization-aggregation
To use LLM features (paper selection), pass your API key:
docker run -it -p 8888:8888 \
  -e GEMINI_API_KEY=your_key \
  santidr/model-quantization-aggregation
To persist data changes, mount local directories:
docker run -it -p 8888:8888 \
  -v $(pwd)/data:/app/data \
  -v $(pwd)/reports:/app/reports \
  santidr/model-quantization-aggregation
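The two invocations above can also be combined into a single command; as a sketch, using the same image, variable name, and mount points shown above:

```shell
docker run -it -p 8888:8888 \
  -e GEMINI_API_KEY=your_key \
  -v $(pwd)/data:/app/data \
  -v $(pwd)/reports:/app/reports \
  santidr/model-quantization-aggregation
```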
Getting the Data:
Run the download script to fetch the list of papers from arXiv and merge it with the Scopus list:
python src/data/download.py
We do not provide the raw data from the selected papers, to avoid potential copyright issues. However, each paper's README file, located in the data/external directory, provides instructions on how to obtain its data.
Extracting the evidence:
- Use the run_evidence_extraction.py module to extract the evidence from the selected papers.
Explore the data with Jupyter Notebooks:
- Open the Jupyter notebooks in the notebooks directory to explore the data and analysis.
- Ensure all required data is placed in the appropriate directories.
- For any issues or questions, please contact the authors of the paper.
This project is licensed under the Apache 2.0 License. See the LICENSE file for details.