Welcome! This repository hosts the source code of CodeSmellEval and the results of our paper ‘How Propense Are Large Language Models at Producing Code Smells? A Benchmarking Study’.
Large Language Models (LLMs) have shown significant potential in automating software engineering tasks, particularly in code generation. However, current evaluation benchmarks, which primarily focus on accuracy, fall short in assessing the quality of the code generated by these models, specifically their tendency to produce code smells. To address this limitation, we introduce CodeSmellEval, a benchmark designed to evaluate the propensity of LLMs for generating code smells. Our benchmark includes a novel metric: Propensity Smelly Score (PSC), and a curated dataset of method-level code smells: CodeSmellData. To demonstrate the use of CodeSmellEval, we conducted a case study with two state-of-the-art LLMs, CodeLlama and Mistral. The results reveal that both models tend to generate code smells, such as simplifiable-condition and consider-merging-isinstance. These findings highlight the effectiveness of our benchmark in evaluating LLMs, providing valuable insights into their reliability and their propensity to introduce code smells in code generation tasks.
If you want to use a GPU to perform the predictions, please make sure that PyTorch is correctly installed and that a GPU is available (https://pytorch.org); a quick check from Python is shown after the nvidia-smi output below.
!nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.108.03 Driver Version: 510.108.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A40 Off | 00000000:01:00.0 Off | 0 |
| 0% 69C P0 295W / 300W | 29143MiB / 46068MiB | 82% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA A40 Off | 00000000:25:00.0 Off | 0 |
| 0% 30C P8 33W / 300W | 26MiB / 46068MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA A40 Off | 00000000:41:00.0 Off | 0 |
| 0% 30C P8 35W / 300W | 26MiB / 46068MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA A40 Off | 00000000:61:00.0 Off | 0 |
| 0% 35C P0 82W / 300W | 26MiB / 46068MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 4 NVIDIA A40 Off | 00000000:81:00.0 Off | 0 |
| 0% 28C P8 32W / 300W | 26MiB / 46068MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 5 NVIDIA A40 Off | 00000000:A1:00.0 Off | 0 |
| 0% 58C P0 90W / 300W | 42659MiB / 46068MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 6 NVIDIA A40 Off | 00000000:C1:00.0 Off | 0 |
| 0% 62C P0 296W / 300W | 42659MiB / 46068MiB | 100% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 7 NVIDIA A40 Off | 00000000:E1:00.0 Off | 0 |
| 0% 67C P0 321W / 300W | 42659MiB / 46068MiB | 100% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3048 G /usr/lib/xorg/Xorg 23MiB |
| 0 N/A N/A 1807667 C python3.8 29117MiB |
| 1 N/A N/A 3048 G /usr/lib/xorg/Xorg 23MiB |
| 2 N/A N/A 3048 G /usr/lib/xorg/Xorg 23MiB |
| 3 N/A N/A 3048 G /usr/lib/xorg/Xorg 23MiB |
| 4 N/A N/A 3048 G /usr/lib/xorg/Xorg 23MiB |
| 5 N/A N/A 3048 G /usr/lib/xorg/Xorg 23MiB |
| 5 N/A N/A 2970647 C julia 42633MiB |
| 6 N/A N/A 3048 G /usr/lib/xorg/Xorg 23MiB |
| 6 N/A N/A 1778799 C julia 42633MiB |
| 7 N/A N/A 3048 G /usr/lib/xorg/Xorg 23MiB |
| 7 N/A N/A 1783425 C julia 42633MiB |
+-----------------------------------------------------------------------------+
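To confirm from Python that PyTorch can actually use these GPUs, a minimal check (assuming PyTorch is already installed) is:

```python
import torch

# Report whether CUDA is available and list the devices PyTorch can see.
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
else:
    print("No GPU detected; the predictions will fall back to CPU.")
```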
Create a virtual environment; you can use conda, mamba, or virtualenv. Then activate the environment and go to the project base path.
mamba create -n code-smell-eval
cd CodeSmells

Now, install the dependencies using the package manager:
pip install .

The dataset folder contains the dataset (CodeSmellData) that we used for our experiments in the proposed benchmark.
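For a quick look at the data, the sketch below loads it with pandas. The file name dataset/code_smell_data.csv and the inspected fields are hypothetical placeholders; adapt them to the actual files shipped in the dataset folder.

```python
import pandas as pd

# Hypothetical file name: replace it with the actual file inside the dataset folder.
df = pd.read_csv("dataset/code_smell_data.csv")

print(df.shape)              # number of method-level samples and columns
print(df.columns.tolist())   # inspect the available fields
print(df.head())             # peek at the first few samples
```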
The nbs folder contains the notebooks to replicate our experiments.
| Notebook | Function |
|---|---|
| 00_dataset_curation | Contains methods to clean the dataset and select only the samples that meet the conditions of the two models that were used. |
| 01_extractor_CausalLM | Contains methods to extract the logits of the predictions for the samples extracted in the previous notebook. |
| 02_data_engineering_XX | Contains methods to process the logits extracted in the previous steps, for each of the two models used in the experiments. |
| 03_alignment_and_aggregation | Contains methods to compute the PSC (Propensity Smelly Score) by aggregating the logits of the tokens associated with each type of smell (see the sketch after this table). |
| 04_global_analysis | Contains methods to analyze the aggregation results and perform statistical analysis of the obtained distributions. |
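To illustrate the intuition behind the pipeline in 01_extractor_CausalLM and 03_alignment_and_aggregation, the sketch below extracts per-token probabilities from a causal LM and averages them over the tokens of a snippet. This is a simplified illustration of a PSC-like score, not the exact computation implemented in the notebooks; the model checkpoint and the way smelly tokens are selected here are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint: the case study uses CodeLlama and Mistral models.
model_name = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Toy snippet containing a simplifiable condition (x == True).
code = "def check(x):\n    if x == True:\n        return 1\n"

inputs = tokenizer(code, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # (1, seq_len, vocab_size)

# Probability the model assigns to each actual next token.
probs = torch.softmax(logits[:, :-1, :], dim=-1)
next_ids = inputs["input_ids"][:, 1:]
token_probs = probs.gather(-1, next_ids.unsqueeze(-1)).squeeze(-1)[0]

# Simplified PSC-like score: mean probability over the snippet's tokens.
# In the benchmark, only the logits of the tokens associated with a given
# smell type are aggregated.
psc_like = token_probs.mean().item()
print(f"Mean token probability (PSC-like score): {psc_like:.4f}")
```

Repeating this per smell type over the CodeSmellData samples, and aggregating only the tokens aligned with each smell, corresponds to the alignment and aggregation steps described in the table above.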