AI Assistant Usage in Student Life – Classification Project

This project analyzes a synthetic dataset of AI assistant usage among students. It covers exploratory data analysis (EDA), feature engineering, and preprocessing, and builds a classification model to predict whether a student will use the AI assistant again in the future.


📂 Dataset

The dataset is publicly available on Kaggle:

AI Assistant Usage in Student Life (Synthetic)

It contains 10,000 records with the following columns:

  • SessionID – Unique identifier for each AI assistant usage session
  • StudentLevel – Education level (e.g., Undergraduate, Graduate, High School)
  • Discipline – Field of study (e.g., Computer Science, Psychology)
  • SessionDate – Date of the AI assistant session
  • SessionLengthMin – Duration of the session in minutes
  • TotalPrompts – Number of prompts sent during the session
  • TaskType – Type of task performed (e.g., Studying, Coding)
  • AI_AssistanceLevel – Level of AI assistance (1 to 5)
  • FinalOutcome – Outcome of the session (e.g., Assignment Completed)
  • UsedAgain – Target variable (1 = Yes, 0 = No)
  • SatisfactionRating – Satisfaction rating given by the student
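
As a quick schema check, a miniature frame with the same columns can be built like this (all values below are made up for illustration; the real Kaggle file has 10,000 rows):

```python
import pandas as pd

# Two made-up rows mirroring the dataset's eleven columns.
df = pd.DataFrame({
    "SessionID": ["S001", "S002"],
    "StudentLevel": ["Undergraduate", "Graduate"],
    "Discipline": ["Computer Science", "Psychology"],
    "SessionDate": ["2024-03-01", "2024-03-02"],
    "SessionLengthMin": [25.0, 40.0],
    "TotalPrompts": [10, 18],
    "TaskType": ["Coding", "Studying"],
    "AI_AssistanceLevel": [3, 5],
    "FinalOutcome": ["Assignment Completed", "Idea Drafted"],
    "UsedAgain": [1, 0],
    "SatisfactionRating": [4.5, 3.0],
})
print(df.shape)  # (2, 11)
```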

📊 Project Workflow

1. EDA (Exploratory Data Analysis)

  • Checked dataset shape, data types, and missing values
  • Summary statistics (describe())
  • Class distribution visualization for the target variable
  • Distribution plots for numeric variables
  • Count plots for categorical variables
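
The checks above can be sketched as follows (a tiny stand-in frame is used here; in the notebook the same calls run on the full 10,000-row CSV):

```python
import pandas as pd

# Small illustrative frame with a subset of the real columns.
df = pd.DataFrame({
    "StudentLevel": ["Undergraduate", "Graduate", "High School", "Undergraduate"],
    "SessionLengthMin": [25.0, 40.0, 12.5, 33.0],
    "TotalPrompts": [10, 18, 4, 12],
    "UsedAgain": [1, 0, 1, 1],
})

print(df.shape)                        # rows x columns
print(df.dtypes)                       # data type of each column
print(df.isnull().sum())               # missing values per column
print(df.describe())                   # summary statistics for numeric columns
print(df["UsedAgain"].value_counts())  # class distribution of the target
# The notebook's plots would use e.g. seaborn countplot (categoricals)
# and histplot (numerics) on these same columns.
```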

2. Feature Engineering

Additional features were created to improve model performance:

  • Date-based features extracted from SessionDate:
    • year
    • month
    • dayofweek
    • day
    • weekofyear
  • Interaction-based feature:
    • prompts_per_min = TotalPrompts / SessionLengthMin
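
The features listed above follow directly from pandas datetime accessors plus one ratio column; a minimal sketch (sample dates are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "SessionDate": ["2024-03-01", "2024-12-30"],
    "TotalPrompts": [10, 18],
    "SessionLengthMin": [25.0, 40.0],
})

# Date-based features extracted from SessionDate.
df["SessionDate"] = pd.to_datetime(df["SessionDate"])
df["year"] = df["SessionDate"].dt.year
df["month"] = df["SessionDate"].dt.month
df["dayofweek"] = df["SessionDate"].dt.dayofweek          # Monday = 0
df["day"] = df["SessionDate"].dt.day
df["weekofyear"] = df["SessionDate"].dt.isocalendar().week

# Interaction-based feature: prompt density per minute.
df["prompts_per_min"] = df["TotalPrompts"] / df["SessionLengthMin"]
print(df[["year", "dayofweek", "prompts_per_min"]])
```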

3. Preprocessing

  • Encoding categorical variables with LabelEncoder (for simplicity)
  • Scaling numeric features using StandardScaler
  • SMOTE applied to handle class imbalance
  • Train-test split (80/20 ratio)

4. Modeling

Models tested:

  • Logistic Regression
  • Random Forest Classifier
  • XGBoost Classifier

Final chosen model: RandomForestClassifier

  • Achieved accuracy: ~75%
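
A fit of the chosen model looks like the sketch below. Since the real CSV is not bundled here, `make_classification` stands in for the preprocessed features; the ~75% figure above comes from the notebook, not from this toy data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed feature matrix.
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # test-set accuracy
```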

5. Evaluation

  • Confusion Matrix
  • Classification Report (Precision, Recall, F1-score)
  • Accuracy Score
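
The three evaluation outputs come straight from `sklearn.metrics`; a self-contained example with hand-made predictions:

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hand-made labels for illustration: 6 of 8 predictions are correct.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

print(confusion_matrix(y_true, y_pred))        # [[TN FP] [FN TP]]
print(classification_report(y_true, y_pred))   # precision, recall, F1 per class
acc = accuracy_score(y_true, y_pred)
print(acc)  # 0.75
```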

Dependencies: imbalanced-learn, xgboost
