Welcome! This repository contains the source code and experimental results for our paper, "A Causal Perspective on Measuring, Explaining and Mitigating Smells in LLM-Generated Code".
Our work investigates the prevalence of code smells in code generated by large language models (LLMs), provides insights into their causes, and explores strategies for mitigation. Here you will find all materials necessary to reproduce our analyses and findings.
To use a GPU for predictions, ensure you have PyTorch installed and a compatible GPU available.
Check your GPU setup with:
```bash
!nvidia-smi
```

Example output:

```text
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 12.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  Off  | 00000000:61:00.0 Off |                    0 |
| N/A   33C    P0    34W / 250W |      4MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
```
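You can also verify from Python that PyTorch detects the GPU. A minimal check (assuming a standard PyTorch installation):

```python
import torch

# Verify that PyTorch can see a CUDA-capable GPU before running predictions.
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("No compatible GPU found; falling back to CPU.")
```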
- Create a virtual environment (using conda, mamba, or virtualenv) and activate it:

  ```bash
  mamba create -n code-smell-env
  conda activate code-smell-env
  ```

- Navigate to the project base path:

  ```bash
  cd CodeSmells
  ```

- Install dependencies:

  ```bash
  pip install .
  ```

- Compile and install the code_smell_lib using nbdev:

  ```bash
  cd code_smell_lib
  nbdev_export
  pip install .
  ```
Note: Some dependencies may not be included in `requirements.txt`. Install them manually if prompted.
- The dataset folder contains the datasets (`CodeSmellData`) used for our experiments.
- The notebooks folder contains Jupyter notebooks for each research question.
| Research Question | Subfolder | Notebook File | Function/Explanation |
|---|---|---|---|
| RQ1 (Measure) | baseline | 01_information_gain.ipynb | Computes information gain for baseline model comparisons. |
| RQ1 (Measure) | robustness_generation | 05_1_data_engineering-CodeLlama.ipynb | Processes logits for robustness generation experiments (CodeLlama model). |
| RQ1 (Measure) | robustness_generation | 06_alignment_and_aggregation.ipynb | Aggregates and aligns logits for robustness generation experiments. |
| RQ1 (Measure) | robustness_transformations | 04_1_data_engineering-CodeLlama.ipynb | Processes logits for robustness transformation experiments (CodeLlama model). |
| RQ1 (Measure) | robustness_transformations | 05_alignment_and_aggregation.ipynb | Aggregates and aligns logits for robustness transformation experiments. |
| RQ2 (Explain) | causal_analysis | 01_dataset_preprocessing.ipynb | Preprocesses datasets for causal analysis experiments. |
| RQ2 (Explain) | causal_analysis | 02_causal_analysis.ipynb | Performs causal analysis to identify relationships between code smells and model outputs. |
| RQ2 (Explain) | causal_analysis | 03_result_analysis.ipynb | Analyzes and visualizes results from causal analysis. |
| RQ2 (Explain) | prompting | 04_alignment_and_aggregation.ipynb | Aggregates and aligns logits for PSC computation in prompting experiments. |
| RQ3 (Mitigation) | mitigation | 01_dataset_preparation.ipynb | Prepares datasets for mitigation experiments. |
| RQ3 (Mitigation) | mitigation | 02_extractor_CausalLM.ipynb | Extracts logits from CausalLM models for mitigation analysis. |
| RQ3 (Mitigation) | mitigation | 03_2_data_engineering-CausalLM.ipynb | Processes extracted logits for mitigation experiments. |
| RQ3 (Mitigation) | mitigation | 04_alignment_and_aggregation.ipynb | Aggregates and aligns logits to compute Propensity Smelly Score (PSC) for mitigation. |
| RQ3 (Mitigation) | mitigation | 05_analysis.ipynb | Performs statistical analysis and visualization for mitigation results. |
| RQ4 (Pipeline) | pipeline | 01_extractor_CausalLM.ipynb | Extracts logits from CausalLM models for pipeline experiments. |
| RQ4 (Pipeline) | pipeline | 02_2_data_engineering-CausalLM.ipynb | Processes logits for pipeline experiments. |
| RQ4 (Pipeline) | pipeline | 03_alignment_and_aggregation.ipynb | Aggregates and aligns logits for PSC computation in pipeline experiments. |
| RQ4 (Survey) | survey | result_analysis.ipynb | Analyzes survey results related to code smell perceptions. |
| All RQs | extension | models.md | Documents the models used in all experiments. |
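Several notebooks above align and aggregate model logits into per-token probabilities before computing the Propensity Smelly Score (PSC). The exact computation is defined in the notebooks themselves; the sketch below only illustrates the general idea of extracting the probability of the actually generated token and the maximum probability over the vocabulary from a logit matrix (the function name and tensor shapes are assumptions for illustration, not the repository's API):

```python
import torch
import torch.nn.functional as F

def token_probabilities(logits: torch.Tensor, generated_ids: torch.Tensor):
    """Illustrative helper: convert per-step logits into token probabilities.

    logits:        (seq_len, vocab_size) raw scores for each generated position
    generated_ids: (seq_len,) ids of the tokens the model actually produced
    """
    probs = F.softmax(logits, dim=-1)  # normalize scores over the vocabulary
    # Probability assigned to the token that was actually generated at each step.
    actual_prob = probs.gather(1, generated_ids.unsqueeze(1)).squeeze(1)
    # Highest probability over the whole vocabulary at each step.
    max_prob = probs.max(dim=-1).values
    return actual_prob, max_prob
```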
- baseline: Notebooks for baseline model comparisons and metrics.
- causal_analysis: Notebooks for dataset preprocessing, causal analysis, and result visualization.
- mitigation: Notebooks for preparing data, extracting logits, engineering features, aggregating results, and analyzing mitigation strategies.
- pipeline: Notebooks for extracting, processing, and aggregating logits in pipeline experiments.
- prompting: Notebooks for aggregation and analysis in prompting experiments.
- robustness_generation: Notebooks for robustness generation experiments, including data engineering and aggregation.
- robustness_transformations: Notebooks for robustness transformation experiments, including data engineering and aggregation.
- survey: Notebooks for analyzing survey data on code smell perceptions.
This folder contains the nbdev source notebooks that implement and document the main components of the code_smell_lib library. Each notebook focuses on a specific aspect of code smell detection and analysis:
- 00_ast_utils.ipynb: Utility functions for parsing and analyzing Python Abstract Syntax Trees (ASTs), essential for identifying structural code smells.
- 01_pos_tagging.ipynb: Implements part-of-speech tagging for code tokens, supporting advanced code analysis and smell detection.
- 02_smell_detectors.ipynb: Core logic for detecting various code smells, including heuristics and rule-based approaches.
- 03_metrics.ipynb: Defines metrics for quantifying code smells and evaluating code quality.
- 04_data_processing.ipynb: Functions for loading, cleaning, and transforming code datasets used in experiments.
- 05_visualization.ipynb: Tools for visualizing code smell distributions and analysis results.
Each notebook is designed for literate programming: code cells implement functionality, while markdown cells explain usage and design decisions. The code is exported to Python modules using nbdev_export, ensuring that documentation and implementation remain synchronized.
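For readers unfamiliar with nbdev, the pattern looks roughly as follows: a `#| default_exp` directive names the target module, and every cell marked `#| export` is written to that module when `nbdev_export` runs. The cell below is an illustrative sketch (the module and function names are hypothetical, not taken from the actual notebooks):

```python
#| default_exp ast_utils
# Directive cell: declares that exported cells in this notebook are written to
# the library module `ast_utils` when `nbdev_export` is run.

#| export
import ast

def count_function_defs(source: str) -> int:
    """Count function definitions (including nested ones) in a Python source string."""
    tree = ast.parse(source)
    return sum(isinstance(node, ast.FunctionDef) for node in ast.walk(tree))
```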
The scripts folder contains shell-executable versions of the main analysis pipelines found in the notebooks. Each subfolder corresponds to a research question or experimental setting (e.g., causal_analysis, mitigation, pipeline, prompting, robustness_generation, robustness_transformations) and includes scripts and configuration files tailored for batch execution.
Key points:
- Purpose: These scripts automate the execution of data preprocessing, model inference, code smell detection, and result aggregation steps, making it easy to reproduce experiments from the command line.
- Structure: The folder structure mirrors the organization of the main notebooks, with each subfolder containing the relevant Python scripts and shell scripts (`run_script.sh`) for its experimental stage.
- Usage: To run an experiment, navigate to the desired subfolder and execute the provided shell script. Logs and outputs are saved in dedicated directories for easy inspection.
This setup enables reproducible, large-scale experiments and is ideal for running analyses on remote servers or clusters.
The survey folder contains materials and analysis related to the user study conducted for our research. This study investigates human perceptions of code smells in generated code.
- control.pdf and treatment.pdf: These files contain the survey forms shown to participants. The control version presents code snippets without explicit code smell annotations, while the treatment version includes additional information or highlighting related to code smells.
- result_analysis.ipynb: This notebook performs statistical analysis of the survey responses. It processes Likert-scale ratings from participants, compares control and treatment groups, and applies significance tests (e.g., Mann-Whitney U) to assess the impact of code smell annotations on user judgments.
Use this folder to reproduce the survey analysis and explore how code smell explanations affect developer perceptions.
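As a rough illustration of the group comparison performed in result_analysis.ipynb, a Mann-Whitney U test on Likert-scale ratings can be run as sketched below (the ratings shown are made-up placeholders, not actual survey data):

```python
from scipy.stats import mannwhitneyu

# Hypothetical Likert-scale ratings (1-5) from the two survey groups.
control_ratings = [3, 4, 2, 3, 5, 4, 3, 2]
treatment_ratings = [4, 5, 4, 3, 5, 5, 4, 4]

# Two-sided Mann-Whitney U test: do the two groups' rating distributions differ?
statistic, p_value = mannwhitneyu(control_ratings, treatment_ratings, alternative="two-sided")
print(f"U = {statistic:.1f}, p = {p_value:.3f}")
```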
The causation plots folder contains additional visualizations that were excluded from the main paper but provide valuable insights into the robustness of Propensity Smelly Score (PSC) for each type of code smell. These plots illustrate how PSC and related metrics behave under various robustness experiments, helping to further understand the stability and reliability of code smell detection across different scenarios.
- The plots are organized by robustness experiment and include both mean and median statistics for actual and maximum probabilities associated with code smells.
- You will find visualizations such as `code_smell_actual_prob_mean`, `code_smell_actual_prob_median`, `code_smell_max_prob_mean`, and `code_smell_max_prob_median` in PDF and PNG formats.
- These results can be used to explore the sensitivity of PSC to code transformations and generation settings, offering deeper context for interpreting the main findings.
Researchers interested in the detailed behavior of PSC under robustness conditions can refer to these plots for supplementary analysis.
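For orientation, the mean and median statistics shown in these plots correspond to a straightforward aggregation over per-sample probabilities. The sketch below assumes a hypothetical table with columns code_smell, actual_prob, and max_prob (the column names and values are illustrative, not the repository's actual schema):

```python
import pandas as pd

# Hypothetical per-sample probabilities, one row per generated code snippet.
df = pd.DataFrame({
    "code_smell":  ["long_method", "long_method", "magic_number", "magic_number"],
    "actual_prob": [0.62, 0.71, 0.35, 0.41],
    "max_prob":    [0.80, 0.88, 0.55, 0.60],
})

# Mean and median of actual and maximum probabilities per code smell,
# mirroring the statistics reported in the causation plots.
summary = df.groupby("code_smell")[["actual_prob", "max_prob"]].agg(["mean", "median"])
print(summary)
```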