Official implementation for the paper:
Mapping Consumer Voice into Engineering Insight: A Structured Language Model-Driven Design Support Framework for Electric Vehicles
Published in Journal of Engineering Design
Article page: https://www.tandfonline.com/doi/full/10.1080/09544828.2026.2639933
DOI: https://doi.org/10.1080/09544828.2026.2639933
This repository contains the code, processed artifacts, and demo application for the SCSI-SLM framework. The project studies how large language models and structured retrieval can transform unstructured consumer reviews into engineering design insight for electric vehicles. The full pipeline includes:
- structured semantic encoding of user reviews,
- product-side and user-side modeling,
- engineering knowledge graph construction,
- a hybrid RAG application for interactive design support.

Highlights:

- End-to-end research pipeline from raw EV review data to engineering insight generation.
- Modular implementation aligned with the paper's sections.
- Processed outputs, analysis reports, figures, and a runnable RAG demo.
- Support for both vector retrieval and graph retrieval in the final application.
Repository structure:

| Path | Paper Section | Purpose |
|---|---|---|
| `00_Raw_Data/` | Sec. 3.1.1 | Raw consumer review datasets used in the study. |
| `01_SSE_Analysis/` | Sec. 3.1 | Data cleaning, LLM tag extraction, and engineering-dimension mapping. |
| `02_User_Modeling/` | Sec. 3.2 | Importance-performance analysis and user preference clustering. |
| `03_Knowledge_Graph/` | Sec. 3.3 | Neo4j-based engineering design knowledge graph construction. |
| `04_RAG_APP/` | Sec. 4.3 | Interactive hybrid retrieval and reasoning system. |
| `docs/images/` | Figures | Representative figures used in the manuscript and README. |
Requirements:

- Python 3.8+
- Docker and Docker Compose for Neo4j deployment
- OpenAI-compatible API key for LLM-powered steps
- Neo4j database for graph construction and graph retrieval
Install the Python dependencies from the project root:
```bash
pip install -r requirements.txt
```

Core dependencies include:

- `pandas`, `numpy`, `scikit-learn`
- `jieba`
- `matplotlib`, `seaborn`, `plotly`
- `langchain`, `langchain-openai`, `langchain-community`, `openai`
- `chromadb`, `neo4j`, `streamlit`
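After installation, a quick way to confirm that the core packages are importable is a check like the one below. This is a sketch rather than a repository script: the helper `missing_dependencies` is hypothetical, and the pip-to-import name mapping (for example, `scikit-learn` is imported as `sklearn`) is written by hand from the list above, so it may need updating if `requirements.txt` changes.

```python
"""Sketch: verify that the README's core dependencies can be imported."""
from importlib.util import find_spec

# Hand-written mapping from pip distribution name to top-level import name.
CORE_DEPENDENCIES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "jieba": "jieba",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "plotly": "plotly",
    "langchain": "langchain",
    "langchain-openai": "langchain_openai",
    "langchain-community": "langchain_community",
    "openai": "openai",
    "chromadb": "chromadb",
    "neo4j": "neo4j",
    "streamlit": "streamlit",
}


def missing_dependencies(deps=CORE_DEPENDENCIES):
    """Return pip names whose top-level module cannot be located (nothing is imported)."""
    return [pip_name for pip_name, module in deps.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All core dependencies are importable.")
```

`find_spec` only locates modules without executing them, so the check stays fast even for heavy packages such as `langchain`.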
The repository includes both source data and intermediate outputs from the research workflow.
- Raw review data is stored in `00_Raw_Data/`.
- Cleaned review outputs are stored in `01_SSE_Analysis/1_Data_Preprocessing/outputs/`.
- User modeling outputs are stored under `02_User_Modeling/.../outputs/`.
- The knowledge graph module contains graph-building scripts and Neo4j setup files.
If you use the data or code in academic work, please cite the paper listed below.
The project is organized as a staged workflow rather than a single training script.
Stage 1, Structured Semantic Encoding (Sec. 3.1), converts raw consumer reviews into structured engineering tokens.
Key files:

- `01_SSE_Analysis/1_Data_Preprocessing/cleaning_pipeline.py`
- `01_SSE_Analysis/2_Dimension_Construction/tag_extraction_refinement.py`
- `01_SSE_Analysis/2_Dimension_Construction/dimension_mapping.json`
Expected outputs:
- cleaned comments
- refined feature tags
- mapped engineering dimensions
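The dimension-mapping step can be pictured as a lookup from refined tags to engineering dimensions. The sketch below assumes, hypothetically, that `dimension_mapping.json` maps each dimension name to a list of tags; the real file's schema is not documented here, and the English tags and dimension names are illustrative placeholders (reviews are segmented with `jieba`, so the study's actual tags are likely Chinese). `invert_mapping` and `map_review_tags` are hypothetical helpers.

```python
import json
from collections import Counter

# HYPOTHETICAL example in the spirit of dimension_mapping.json; the real
# file's schema, dimension names, and tags may differ.
EXAMPLE_MAPPING_JSON = """
{
  "Powertrain": ["acceleration", "motor noise"],
  "Battery & Charging": ["range", "charging speed"],
  "Interior": ["seat comfort", "infotainment"]
}
"""


def invert_mapping(mapping):
    """Turn {dimension: [tags]} into {normalized_tag: dimension} for O(1) lookups."""
    tag_to_dim = {}
    for dimension, tags in mapping.items():
        for tag in tags:
            tag_to_dim[tag.strip().lower()] = dimension
    return tag_to_dim


def map_review_tags(tags, tag_to_dim):
    """Count a review's refined tags per engineering dimension.

    Tags with no entry in the mapping are counted under "Unmapped".
    """
    return Counter(tag_to_dim.get(tag.strip().lower(), "Unmapped") for tag in tags)


tag_to_dim = invert_mapping(json.loads(EXAMPLE_MAPPING_JSON))
print(map_review_tags(["Range", "charging speed", "paint quality"], tag_to_dim))
# Counter({'Battery & Charging': 2, 'Unmapped': 1})
```

Inverting the mapping once keeps each per-review lookup constant-time, and counting unmapped tags makes gaps in the mapping easy to spot.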
Stage 2, User Modeling (Sec. 3.2), derives both product-side and user-side signals from the structured review data.
Product-side analysis:
- `02_User_Modeling/Product_IPA_Analysis/ipa_quantification.py`
- outputs include `car_model_scores.csv`, `feature_statistics.json`, and per-model IPA figures
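Importance-performance analysis (IPA) places each feature on two axes and splits the plane at a pair of cut-offs. The sketch below shows the classic four-quadrant scheme with the column means as cut-offs. How `ipa_quantification.py` actually measures importance and performance is not described here; the columns `importance` (read as share of mentions) and `performance` (read as mean sentiment), the helper `ipa_quadrants`, and all numbers are hypothetical placeholders, not results from the paper.

```python
import pandas as pd


def ipa_quadrants(df, importance_col="importance", performance_col="performance"):
    """Label each row with an IPA quadrant, using the column means as cut-offs."""
    imp_cut = df[importance_col].mean()
    perf_cut = df[performance_col].mean()

    def label(row):
        high_imp = row[importance_col] >= imp_cut
        high_perf = row[performance_col] >= perf_cut
        if high_imp and not high_perf:
            return "Concentrate here"  # important but underperforming
        if high_imp and high_perf:
            return "Keep up the good work"
        if high_perf:
            return "Possible overkill"  # performs well but matters less
        return "Low priority"

    out = df.copy()  # leave the caller's DataFrame untouched
    out["quadrant"] = out.apply(label, axis=1)
    return out


# Hypothetical feature-level inputs for one car model (illustrative values only).
features = pd.DataFrame({
    "feature": ["range", "charging", "interior", "infotainment"],
    "importance": [0.40, 0.30, 0.20, 0.10],  # e.g. share of mentions
    "performance": [0.2, 0.7, 0.8, 0.3],     # e.g. mean sentiment
})
result = ipa_quadrants(features)
print(result[["feature", "quadrant"]])
```

Mean-based cut-offs are one common convention; scale midpoints or medians are alternatives, and the choice shifts which features land in "Concentrate here".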
User-side analysis:
- `02_User_Modeling/User_Preference_Clustering/preference_profiling.py`
- `02_User_Modeling/User_Preference_Clustering/persona_visualization.py`
- outputs include user vectors, cluster characteristics, reports, and visualization figures
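A minimal sketch of preference clustering, assuming each user is represented by a vector of attention weights over engineering dimensions and using k-means as a stand-in; the algorithm, feature construction, and number of clusters in `preference_profiling.py` may differ. The data is synthetic, and `cluster_users` is a hypothetical helper.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_users(user_vectors, n_clusters=3, seed=42):
    """Cluster users by preference vector; return labels and per-cluster mean profiles."""
    scaled = StandardScaler().fit_transform(user_vectors.values)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(scaled)
    # Profiles use the unscaled vectors so they read as attention shares.
    profiles = user_vectors.groupby(labels).mean()
    return labels, profiles


# Synthetic users: three groups of 20 around distinct attention patterns.
rng = np.random.default_rng(0)
centers = np.array([[0.7, 0.2, 0.1], [0.1, 0.7, 0.2], [0.2, 0.1, 0.7]])
rows = np.vstack([c + rng.normal(0, 0.02, size=(20, 3)) for c in centers])
users = pd.DataFrame(rows, columns=["Battery & Charging", "Powertrain", "Interior"])

labels, profiles = cluster_users(users, n_clusters=3)
print(profiles.round(2))
```

Standardizing first keeps dimensions with larger raw variance from dominating the Euclidean distances k-means relies on, while the mean profiles stay interpretable as persona descriptions.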
Stage 3, Knowledge Graph Construction (Sec. 3.3), organizes the extracted entities and relations into a Neo4j-based engineering design knowledge graph.
Key files:
- `03_Knowledge_Graph/main.py`
- `03_Knowledge_Graph/scripts/init_database.cypher`
- `03_Knowledge_Graph/src/knowledge_graph_builder.py`
- `03_Knowledge_Graph/docker-compose.yml`
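For orientation, the sketch below shows one common way to load analysis results into Neo4j with the official Python driver: batch the records and upsert them with `UNWIND` and `MERGE`. The node labels (`CarModel`, `Feature`, `Dimension`), relationship types, and properties are hypothetical and not taken from `init_database.cypher` or `knowledge_graph_builder.py`; `build_rows` and `write_triples` are illustrative helpers.

```python
# HYPOTHETICAL schema for illustration: the labels, relationship types, and
# properties used by 03_Knowledge_Graph may differ (see init_database.cypher).
UPSERT_QUERY = """
UNWIND $rows AS row
MERGE (m:CarModel {name: row.model})
MERGE (f:Feature {name: row.feature})
MERGE (d:Dimension {name: row.dimension})
MERGE (m)-[r:HAS_FEATURE]->(f)
SET r.mentions = row.mentions, r.sentiment = row.sentiment
MERGE (f)-[:BELONGS_TO]->(d)
"""


def build_rows(records):
    """Normalize analysis records into plain dicts usable as Cypher parameters."""
    return [
        {
            "model": str(rec["model"]),
            "feature": str(rec["feature"]),
            "dimension": str(rec["dimension"]),
            "mentions": int(rec.get("mentions", 0)),
            "sentiment": float(rec.get("sentiment", 0.0)),
        }
        for rec in records
    ]


def write_triples(driver, records, database="neo4j"):
    """Upsert all records in one round trip.

    `driver` is assumed to be a neo4j.Driver, e.g. created with
    neo4j.GraphDatabase.driver(uri, auth=(username, password)).
    MERGE makes re-runs idempotent: nodes and relationships are not duplicated.
    """
    with driver.session(database=database) as session:
        session.run(UPSERT_QUERY, rows=build_rows(records)).consume()


# Illustrative record only, not data from the study.
example = [{"model": "Model A", "feature": "range", "dimension": "Battery & Charging",
            "mentions": 120, "sentiment": 0.35}]
print(build_rows(example))
```

Sending all rows in one parameterized query avoids a round trip per record, and passing values as parameters rather than formatting them into the Cypher string prevents injection and lets Neo4j reuse the cached query plan.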
Stage 4, the hybrid RAG application (Sec. 4.3), exposes the research outputs through an interactive Streamlit interface.
Key files:
- `04_RAG_APP/app.py`
- `04_RAG_APP/run.py`
- `04_RAG_APP/load_vector_data.py`
- `04_RAG_APP/core/rag_engine.py`
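Hybrid retrieval needs some way to merge vector hits and graph hits into a single context list. One simple, widely used option is Reciprocal Rank Fusion (RRF), sketched below; it is named here as an illustrative technique, and `rag_engine.py` may combine the two sources differently. The helper `reciprocal_rank_fusion` and the item IDs are hypothetical.

```python
def reciprocal_rank_fusion(ranked_lists, k=60, top_n=5):
    """Fuse several ranked lists of item IDs with Reciprocal Rank Fusion.

    Each item scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1; k=60 is a commonly used default.
    """
    scores = {}
    for ranked in ranked_lists:
        for rank, item_id in enumerate(ranked, start=1):
            scores[item_id] = scores.get(item_id, 0.0) + 1.0 / (k + rank)
    fused = sorted(scores, key=scores.get, reverse=True)
    return fused[:top_n]


vector_hits = ["review_12", "review_07", "review_03"]          # e.g. from a vector store
graph_hits = ["review_07", "fact_range_battery", "review_12"]  # e.g. from a Cypher query
print(reciprocal_rank_fusion([vector_hits, graph_hits]))
# ['review_07', 'review_12', 'fact_range_battery', 'review_03']
```

RRF uses only ranks, so it needs no calibration between similarity scores from the vector store and the ordering produced by a graph query; items found by both retrievers rise to the top.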
To run the interactive application locally:
Create a root-level `.env` file based on `.env.example`.
Required variables include:

- `OPENAI_API_KEY`
- `NEO4J_URI`
- `NEO4J_USERNAME`
- `NEO4J_PASSWORD`
- `NEO4J_DATABASE`
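Before starting the app, it can help to fail fast when a variable is missing. The sketch below checks the five variables listed above; `missing_env_vars` is a hypothetical helper, and loading `.env` through `python-dotenv` is an assumption (that package is not in the core dependency list, so the import is treated as optional).

```python
import os

# Variable names from the README; values are expected in the root-level .env.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "NEO4J_URI",
    "NEO4J_USERNAME",
    "NEO4J_PASSWORD",
    "NEO4J_DATABASE",
)


def missing_env_vars(required=REQUIRED_VARS, environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    try:
        # ASSUMPTION: optional python-dotenv copies .env entries into os.environ.
        from dotenv import load_dotenv
        load_dotenv()
    except ImportError:
        pass
    missing = missing_env_vars()
    print("Missing:", ", ".join(missing) if missing else "none")
```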
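From `04_RAG_APP/` or `03_Knowledge_Graph/`, start the database with Docker Compose:

```bash
docker-compose up -d neo4j
```

Then, from the project root, load the vector data:

```bash
cd 04_RAG_APP
python load_vector_data.py
```

Finally, launch the app from the project root:

```bash
cd 04_RAG_APP
streamlit run app.py
```

Or:

```bash
cd 04_RAG_APP
python run.py
```

The default web interface is available at http://localhost:8501.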
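If the app cannot reach the graph database, one quick diagnostic is whether anything is listening on the expected ports. This sketch assumes that `docker-compose.yml` publishes Neo4j's default Bolt port, 7687, on localhost; `is_port_open` is a hypothetical helper, and a successful TCP connection only shows that the port is open, not that the credentials in `.env` are valid.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and timeouts
        return False


if __name__ == "__main__":
    # ASSUMPTION: Neo4j's default Bolt port (7687) is published on localhost.
    print("Neo4j Bolt (7687):", is_port_open("localhost", 7687))
    # Streamlit's default port, matching http://localhost:8501.
    print("Streamlit (8501):", is_port_open("localhost", 8501))
```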
The SCSI-SLM framework supports multiple applications in electric vehicle product development.
Typical scenarios include:
- Product manager decision support based on consumer feedback
- Engineering design insight mining from large-scale reviews
- Knowledge-grounded product planning
An example application prototype built on this framework:

- EV Product Manager Decision Support System (EV-PM-DSS): https://github.com/DonkeyKing01/EV-PM-DSS
Notes:

- This repository provides the full research pipeline as modular code and includes many intermediate outputs.
- Some stages rely on external LLM services and therefore require valid API credentials.
- The final demo depends on artifacts generated by the earlier modules and on a running Neo4j instance.
- Exact outputs may vary across model providers, prompts, or data versions.
If you find this repository useful, please cite:
```bibtex
@article{Jin2026Mapping,
  title   = {Mapping Consumer Voice into Engineering Insight: A Structured Language Model-Driven Design Support Framework for Electric Vehicles},
  author  = {Qingyang Jin and Luyao Wang and Wenyu Yuan and Danni Chang},
  journal = {Journal of Engineering Design},
  year    = {2026},
  doi     = {10.1080/09544828.2026.2639933}
}
```


