Replication Package

Replication package for the paper:

"Theory Building from Data Strategy Studies: Aggregating Evidence on Model Quantization in Deep Learning Systems" submitted to the Empirical Software Engineering Journal.

Contents

This replication package consists of the following components:

  1. Data:

    • Raw, external, interim, and processed data are stored in the data directory.
  2. Source Code:

    • Scripts for downloading the paper list, selecting studies with LLMs, and extracting evidence are stored in the src directory.
  3. Jupyter Notebooks:

    • Notebooks for prompt refinement, paper selection, and evidence analysis are stored in the notebooks directory.
  4. Documentation:

    • data/processed/evidence-diagrams-mapping.md: Links to the evidence diagrams generated during the study.
    • data/processed/{paperkey}/metadata.json: Metadata for the specific paper.
    • data/processed/{paperkey}/systematic-studies-quality-evaluation.md: The filled-in quality evaluation form for the specific paper.
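
As a minimal sketch of how the per-paper documentation can be consumed programmatically (assuming standard JSON files; the paper key "smith2021" below is a hypothetical example, not a real key from the dataset):

```python
import json
from pathlib import Path


def load_paper_metadata(paper_key, base_dir="data/processed"):
    """Load the metadata.json for a given paper key.

    The paper key is the name of a subdirectory under data/processed/,
    as described in the Documentation section above.
    """
    metadata_path = Path(base_dir) / paper_key / "metadata.json"
    with metadata_path.open(encoding="utf-8") as f:
        return json.load(f)


# Example with a hypothetical paper key:
# metadata = load_paper_metadata("smith2021")
```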

Project Structure

The project is organized as follows:

├── data/
│   ├── raw/                                <- Contains the original list of papers retrieved from Scopus
│   ├── external/                           <- Contains the raw data obtained from the selected papers
│   ├── interim/                            <- Contains the interim data used in the analysis
│   └── processed/                          <- Contains the processed data used in the analysis
│       └── evidence-diagrams-mapping.md    <- Contains links to the evidence diagrams
├── notebooks/
│   ├── 1.0-llm-promt-refinement.ipynb
│   ├── 2.0-model-quantization-paper-selection.ipynb
│   ├── 3.0-second-selection-analysis.ipynb
│   ├── 4.0-paper-metadata-analysis.ipynb
│   └── 5.0-evidence-analysis.ipynb
├── reports/
│   └── figures/
├── src/
│   ├── data/
│   │   ├── papers/                         <- Contains the logic for extracting and analyzing data from papers
│   │   │   ├── entities.py
│   │   │   └── knowledge_extraction.py
│   │   ├── download.py
│   │   └── selection/                      <- Utility functions for selecting studies using LLMs,
│   │       └── llm.py                         including the prompt
│   ├── forestplot/                         <- Utility functions for generating the forest plot
│   ├── effect_intensity.py                 <- Definition of the effect intensity thresholds
│   ├── run_evidence_extraction.py
│   └── config.py
├── .pre-commit-config.yaml
├── dot-env-template                        <- Template for environment variables
├── requirements.txt                        <- List of Python dependencies
├── uv.lock                                 <- Environment lock file
├── LICENSE
├── pyproject.toml                          <- Project configuration file
└── README.md

Usage Instructions

  1. Setup:

    • Clone the repository:

      git clone <repository-url>
      cd model-quantization-aggregation
    • Install dependencies:
      The project is managed with uv. To install the dependencies, run:

      uv sync

      Alternatively, you can use pip to install the dependencies listed in requirements.txt:

      pip install -r requirements.txt
    • Using Docker (recommended for reproducibility):
      A pre-built Docker image is available on Docker Hub:

      docker pull santidr/model-quantization-aggregation

      Run the container with Jupyter Lab:

      docker run -it -p 8888:8888 santidr/model-quantization-aggregation

      To use LLM features (paper selection), pass your API key:

      docker run -it -p 8888:8888 \
        -e GEMINI_API_KEY=your_key \
        santidr/model-quantization-aggregation

      To persist data changes, mount local directories:

      docker run -it -p 8888:8888 \
        -v $(pwd)/data:/app/data \
        -v $(pwd)/reports:/app/reports \
        santidr/model-quantization-aggregation
  2. Getting the Data:

    • Run the download script to fetch the list of papers from arXiv and merge it with the Scopus list:

      python src/data/download.py
    • We do not provide the raw data from the selected papers to avoid potential copyright issues. Instead, each paper's README file in the data/external directory explains how to obtain its data.

  3. Extracting the Evidence:

    • Run the evidence extraction script:

      python src/run_evidence_extraction.py

  4. Exploring the Data with Jupyter Notebooks:

    • Open the Jupyter notebooks in the notebooks directory to explore the data and analysis.
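
Before opening the notebooks, it can help to check which papers under data/processed already have their metadata files in place. A small sketch (the directory layout follows the structure described above; the helper itself is not part of the repository):

```python
from pathlib import Path


def list_processed_papers(base_dir="data/processed"):
    """Return the paper keys under base_dir that contain a metadata.json.

    Each paper key is a subdirectory of data/processed/, as described in
    the Documentation section of this README.
    """
    base = Path(base_dir)
    if not base.is_dir():
        return []
    return sorted(
        p.name for p in base.iterdir()
        if p.is_dir() and (p / "metadata.json").is_file()
    )


# Example:
# print(list_processed_papers())
```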

Notes

  • Ensure all required data is placed in the appropriate directories.
  • For any issues or questions, please contact the authors of the paper.

License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.
