Feature Engineering Pipeline + Dynamic Visualization Integration #6

Open
avijitmandal2004 wants to merge 1 commit into garvjain7:feature from avijitmandal2004:feature

Conversation


@avijitmandal2004 avijitmandal2004 commented Apr 17, 2026

🚀 Changes Implemented:

• Integrated complete Feature Engineering pipeline in ML Engine

  • Date feature extraction
  • Business metrics (avg_order_value, derived_sales)
  • Relationship detection between columns
  • Automatic categorical encoding

• Fixed dataset processing pipeline

  • Corrected file validation flow
  • Ensured proper CSV/XLSX handling
  • Improved pipeline execution stability

• Connected frontend with backend APIs

  • Dynamic dataset analysis rendering
  • Removed hardcoded values from visualization
  • Enabled real-time data-driven UI

• Improved overall system flow
Upload → Processing → Feature Engineering → Visualization
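
The PR description does not include the pipeline code itself, so the following is only a minimal sketch of two of the listed steps (date feature extraction and categorical encoding). It is written in JavaScript for illustration; the function names `extractDateFeatures` and `labelEncode`, the generated column suffixes, and the example column names are all hypothetical and may not match the ML Engine.

```javascript
// Hypothetical helper: derive year/month/day-of-week columns from a date column.
function extractDateFeatures(rows, dateColumn) {
  return rows.map((row) => {
    const date = new Date(row[dateColumn]);
    if (Number.isNaN(date.getTime())) {
      // Rows with unparseable dates are copied through unchanged.
      return { ...row };
    }
    return {
      ...row,
      [`${dateColumn}_year`]: date.getUTCFullYear(),
      [`${dateColumn}_month`]: date.getUTCMonth() + 1, // 1-12
      [`${dateColumn}_dayofweek`]: date.getUTCDay(), // 0 = Sunday
    };
  });
}

// Hypothetical helper: map each distinct category to an integer code,
// assigned in order of first appearance.
function labelEncode(rows, column) {
  const codes = new Map();
  const encoded = rows.map((row) => {
    const value = row[column];
    if (!codes.has(value)) codes.set(value, codes.size);
    return { ...row, [`${column}_encoded`]: codes.get(value) };
  });
  return { rows: encoded, mapping: Object.fromEntries(codes) };
}
```

Both helpers return new row objects rather than mutating the input, which keeps the original upload available for the later cleaning steps.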


✅ Result:

• Fully dynamic dashboard (no static data)
• Backend-driven analytics working successfully
• End-to-end ML pipeline running without errors
• Feature engineering validated through pipeline execution

Summary by Sourcery

Enhance the end-to-end data cleaning and visualization experience and improve backend dataset/visualization resolution and auth API integration.

New Features:

  • Allow visualizations to be generated when only an X-axis is selected by aggregating counts and updating chart titles and stats accordingly.
  • Add step navigation validation and a user-facing popup to enforce sequential completion of data cleaning steps and prevent skipping unresolved issues.

Bug Fixes:

  • Improve robustness of cleaned/original data and dashboard config lookup by handling temp dataset IDs, scanning user dataset directories, and gracefully handling missing files.
  • Ensure the employee cleaning page loads from original raw data when available, with fallback to cleaned data if necessary.

Enhancements:

  • Refine chart color schemes, gradients, tooltips, legends, and pie/donut rendering for more polished and informative visualizations on the employee visualization page.
  • Adjust cleaning workflow stats display to differentiate numeric aggregations from categorical totals and clarify chart headings.
  • Update frontend authentication and upload flows to use a configurable API base URL and align upload endpoint paths with the backend.
  • Display user full name where available and persist it in local storage for a better personalized employee experience.


coderabbitai Bot commented Apr 17, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 970d22d1-43b4-48f1-82d9-2f57b2cdac11

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.



sourcery-ai Bot commented Apr 17, 2026

Reviewer's Guide

Implements a more robust, fully dynamic end-to-end pipeline from dataset upload through cleaning, feature engineering, and visualization, including smarter dataset file resolution on the backend, activity logging, and richer, data-driven charts and cleaning UX on the frontend.

Sequence diagram for EmployeeCleaningPage data_fetch_and_step_control

sequenceDiagram
    actor Employee
    participant CleaningPage as EmployeeCleaningPage
    participant Backend as BackendNode
    participant CleanedCtrl as CleanedDataController
    participant OriginalCtrl as OriginalDataController
    participant FS as MLEngine_filesystem

    Employee->>CleaningPage: Open cleaning page with datasetId
    CleaningPage->>Backend: GET /original-data/:datasetId (with token)
    Backend->>OriginalCtrl: getOriginalData
    OriginalCtrl->>FS: locate raw_data.csv via possiblePaths including temp ids
    alt raw_data_found
        OriginalCtrl->>FS: read raw_data.csv
        FS-->>OriginalCtrl: csvText
        OriginalCtrl-->>Backend: parsed headers rows success true
        Backend-->>CleaningPage: json { success true rows headers }
        CleaningPage->>CleaningPage: set tableRows and cleanedRows
    else raw_data_missing_or_error
        OriginalCtrl-->>Backend: json { success false message }
        Backend-->>CleaningPage: json { success false }
        CleaningPage->>Backend: GET /cleaned-data/:datasetId
        Backend->>CleanedCtrl: getCleanedData
        CleanedCtrl->>FS: locate cleaned_data.csv via possiblePaths including temp ids or any dataset folder
        FS-->>CleanedCtrl: cleaned_data.csv
        CleanedCtrl-->>Backend: parsed headers rows success true
        Backend-->>CleaningPage: json { success true rows headers }
        CleaningPage->>CleaningPage: set tableRows and cleanedRows
    end

    loop User navigates cleaning steps
        Employee->>CleaningPage: Click step indicator or Next
        CleaningPage->>CleaningPage: handleStepClick(stepId)
        CleaningPage->>CleaningPage: canProceedToStep(targetStep)
        alt trying_to_skip_multiple_steps
            CleaningPage->>Employee: show stepRestrictionPopup cannot skip steps
        else unresolved_null_values_in_step1
            CleaningPage->>Employee: show popup configure null handling
        else unresolved_duplicates_in_step2
            CleaningPage->>Employee: show popup configure duplicates handling
        else allowed_to_proceed
            CleaningPage->>CleaningPage: setCurrentStep(stepId)
            alt stepId == 5 and feature engineering not started
                CleaningPage->>FS: startFeatStream (feature engineering pipeline)
                FS-->>CleaningPage: streaming feature updates
            end
        end
    end
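
The fetch order in the diagram above (original data first, cleaned data as a fallback) could be sketched roughly as follows. This is an illustration only: `loadRowsWithFallback` is a hypothetical name, and `fetchJson` is an injected helper assumed to resolve to the parsed JSON body (`{ success, headers, rows }`), not a function from the PR.

```javascript
// Sketch: try the original-data endpoint, then fall back to cleaned data.
// `fetchJson(path)` is a hypothetical injected helper returning parsed JSON.
async function loadRowsWithFallback(datasetId, fetchJson) {
  const endpoints = [
    `/original-data/${datasetId}`,
    `/cleaned-data/${datasetId}`,
  ];
  for (const endpoint of endpoints) {
    try {
      const body = await fetchJson(endpoint);
      if (body && body.success) {
        return { source: endpoint, headers: body.headers, rows: body.rows };
      }
    } catch (err) {
      // A network or parse error on one endpoint should not block the fallback.
    }
  }
  // Neither endpoint produced data; the caller decides how to show this.
  return { source: null, headers: [], rows: [] };
}
```

Returning the `source` alongside the rows lets the page tell the user whether it is showing raw or already-cleaned data.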

Entity relationship diagram for updated_users_schema

erDiagram
    COMPANIES {
      uuid company_id PK
      text name
      text domain
      timestamp created_at
    }

    USERS {
      uuid user_id PK
      uuid company_id FK
      text full_name
      text first_name
      text last_name
      text email
      text password_hash
      text phone
      text address
      text role
      text department
      text designation
      boolean is_active
      timestamp last_login
      timestamp created_at
    }

    COMPANIES ||--o{ USERS : has

Flow diagram for VisualizationPage dynamic_chart_data_and_ui_logic

flowchart TD
    subgraph Inputs
      DataHeaders[headers]
      DataRows[rows]
      ColumnTypes[columnTypes]
      ColumnStats[columnStats]
      ChartXAxis[chartXAxis state]
      ChartYAxis[chartYAxis state]
      ChartType[chartType state]
      Aggregation[aggregation state]
    end

    Start[Init effect on data load]
    Start --> DataHeaders
    Start --> ColumnTypes

    Start -->|auto select axes based on columnTypes| AutoAxes[set initial chartXAxis and chartYAxis]
    AutoAxes --> ChartXAxis
    AutoAxes --> ChartYAxis

    subgraph ChartDataComputation
      DecideAxes{YAxis selected?}
      DecideAxes -->|no| CountOnly[Group by X and count occurrences]
      DecideAxes -->|yes| WithY[Group by X and aggregate Y]

      CountOnly --> CountGrouped[Top10 categories with counts]
      WithY --> CheckNumericY{Y is numeric?}
      CheckNumericY -->|yes| NumAgg[sum count max min for numeric Y]
      CheckNumericY -->|no| CatAgg[count occurrences and raw values]
    end

    DataRows --> DecideAxes
    ChartXAxis --> DecideAxes
    ChartYAxis --> DecideAxes
    ColumnTypes --> CheckNumericY

    CountGrouped --> ChartData[chartData array]
    NumAgg --> ChartData
    CatAgg --> ChartData

    subgraph ChartStats
      StatsInput[chartData + columnTypes + chartYAxis]
      StatsInput --> StatsCheck{Y numeric?}
      StatsCheck -->|yes| NumStats[compute totalSum totalCount avg max min]
      StatsCheck -->|no| CatStats[compute totalCount only]
    end

    ChartData --> StatsInput
    NumStats --> ChartStatsOut[chartStats object]
    CatStats --> ChartStatsOut

    subgraph Rendering
      DecideEmpty{chartData empty?}
      ChartData --> DecideEmpty
      DecideEmpty -->|yes| EmptyState[Prompt to select columns and show count hint]
      DecideEmpty -->|no| RenderMainChart

      RenderMainChart --> Title[chartTitle based on axes]
      RenderMainChart --> BarFlow[Bar Line Area Pie branches]

      BarFlow -->|bar| BarChartNode[BarChart with gradients and colors]
      BarFlow -->|line| LineChartNode[LineChart with gradient shadow and activeDot]
      BarFlow -->|area| AreaChartNode[AreaChart with gradient fill and shadow]
      BarFlow -->|pie| PieChartNode[PieChart with PIE_COLORS custom tooltip legend]

      ChartStatsOut --> ToolbarStats[Top toolbar stats numeric or categorical]
    end

    Inputs --> DecideAxes
    Inputs --> RenderMainChart
    Inputs --> ToolbarStats

    subgraph SideDonut
      SideCat[choose categorical column with limited uniques]
      SideNum[choose numeric column]
      SideConditions{valid catCol numCol and rows?}
      SideConditions -->|yes| BuildPieData[group and sum top categories]
      SideConditions -->|no| NoSideChart[render nothing]
      BuildPieData --> DonutChart[Donut PieChart with gradients tooltip legend]
    end

    DataHeaders --> SideCat
    ColumnTypes --> SideCat
    ColumnStats --> SideCat
    DataHeaders --> SideNum
    ColumnTypes --> SideNum
    DataRows --> SideConditions
    SideCat --> SideConditions
    SideNum --> SideConditions
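
As a rough illustration of the ChartStats branch in the flowchart above, the sketch below computes the numeric summary when Y is numeric and only a total count otherwise. The entry shape `{ name, value, count }` and the function name `computeChartStats` are assumptions, as is the choice to define `avg` as `totalSum / totalCount`; the actual page may differ.

```javascript
// Sketch: summary stats for the toolbar.
// Assumed entry shape: { name, value, count }, where `value` is the aggregated
// Y for the category and `count` is the number of rows in it.
function computeChartStats(chartData, yIsNumeric) {
  const totalCount = chartData.reduce((sum, d) => sum + d.count, 0);
  if (!yIsNumeric || chartData.length === 0) {
    // Categorical Y (or no data): only a count is meaningful.
    return { totalCount };
  }
  const values = chartData.map((d) => d.value);
  const totalSum = values.reduce((sum, v) => sum + v, 0);
  return {
    totalSum,
    totalCount,
    avg: totalCount > 0 ? totalSum / totalCount : 0,
    max: Math.max(...values),
    min: Math.min(...values),
  };
}
```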

File-Level Changes

Change Details Files
Enhance visualization page to be fully data-driven with better defaults and richer chart interactions/styles.
  • Expanded color palette and introduced separate pie chart color configuration for more consistent theming.
  • Initialized chart X/Y axes based on detected column types and allowed charts to work when only X is selected (count aggregation).
  • Refactored chart data computation to use local xAxis/yAxis variables, support occurrence counting when Y is empty, and tightened numeric checks for stats.
  • Improved empty-state messaging and chart titles to reflect whether Y is selected or not.
  • Upgraded bar, line, area, and pie charts with gradients, shadows, animation, customized labels, tooltips, and legends, including guardrails when pie data is missing.
  • Enhanced side donut chart computation and rendering with new gradients, tooltips, legends, and defensive checks on headers/rows.
frontend-react/src/pages/employee/VisualizationPage.jsx
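
For the X-only case described above, a count aggregation might look like the sketch below. The helper name `countByCategory`, the `"(empty)"` label for missing values, and the `{ name, count }` output shape are assumptions for illustration; only the "group by X, count, keep the top 10" behaviour comes from the reviewer's guide.

```javascript
// Sketch: when no Y axis is selected, group rows by the X column and count them.
function countByCategory(rows, xAxis, limit = 10) {
  const counts = new Map();
  for (const row of rows) {
    const raw = row[xAxis];
    // Assumption: missing values are bucketed under a single "(empty)" label.
    const key = raw === null || raw === undefined || raw === "" ? "(empty)" : String(raw);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([name, count]) => ({ name, count }))
    .sort((a, b) => b.count - a.count) // largest categories first
    .slice(0, limit);
}
```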
Strengthen cleaning wizard UX and data loading by enforcing sequential steps, using original data when possible, and gating feature streaming.
  • Changed initial data fetch to prefer original data endpoint with fallback to cleaned data so all cleaning steps can be applied, updating local state accordingly.
  • Introduced a step restriction popup state and helper to prevent skipping multiple steps or advancing when nulls/duplicates are unconfigured, with contextual messages.
  • Centralized step navigation into a handler that checks eligibility, starts feature engineering streaming only when arriving at the last step, and wired all step-related buttons to it.
  • Added a full-screen modal popup component to explain why a step cannot be skipped and prompt the user to complete previous steps first.
frontend-react/src/pages/employee/EmployeeCleaningPage.jsx
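
The step gating described above could be expressed as a pure function along these lines. `canProceedToStep` is named in the reviewer's guide, but the signature, the `state` fields, the messages, and the rule that backward navigation is always allowed are assumptions made for this sketch.

```javascript
// Sketch of the step-eligibility check; returns whether navigation is allowed
// and, if not, the message a restriction popup could show.
// `state` fields are hypothetical: hasNulls, nullHandlingConfigured,
// hasDuplicates, duplicateHandlingConfigured.
function canProceedToStep(currentStep, targetStep, state) {
  // Assumption: going back to an earlier step is always allowed.
  if (targetStep <= currentStep) return { allowed: true, message: "" };

  if (targetStep > currentStep + 1) {
    return { allowed: false, message: "Steps cannot be skipped. Complete the previous steps first." };
  }
  if (currentStep === 1 && state.hasNulls && !state.nullHandlingConfigured) {
    return { allowed: false, message: "Configure null handling before continuing." };
  }
  if (currentStep === 2 && state.hasDuplicates && !state.duplicateHandlingConfigured) {
    return { allowed: false, message: "Configure duplicate handling before continuing." };
  }
  return { allowed: true, message: "" };
}
```

Keeping the check free of React state makes it straightforward to unit test separately from the click handler that calls it.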
Improve backend dataset file discovery for cleaned, original, and visualization config data, including temp IDs and user-wide fallbacks, plus logging.
  • Expanded possible dataset paths for cleaned and original data and visualization config to also check temp-prefixed IDs and IDs with temp- stripped.
  • Added a secondary search that scans all dataset subfolders under a user directory for cleaned data, raw data, or dashboard config files when direct paths fail.
  • Logged successful file discovery and directory read failures for cleaned data, original data, and visualization configs to aid debugging.
  • Ensured original data read failures return a clear not-found error instead of crashing.
backend-node/src/controllers/cleanedDataController.js
backend-node/src/controllers/visualizationController.js
backend-node/src/controllers/datasetController.js
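
A minimal sketch of the temp-ID handling described above, assuming the prefix is literally `temp-` as the guide states. The function names, the directory layout (`baseDir/userId/datasetId/fileName`), and the folder-matching helper are hypothetical; the last helper shows one possible way to keep a user-wide scan from returning an unrelated dataset.

```javascript
// Sketch: IDs worth checking for a requested dataset ID.
function candidateDatasetIds(datasetId) {
  const id = String(datasetId);
  const stripped = id.startsWith("temp-") ? id.slice("temp-".length) : id;
  // A Set removes duplicates while keeping insertion order.
  return [...new Set([id, stripped, `temp-${stripped}`])];
}

// Hypothetical layout: <baseDir>/<userId>/<datasetId>/<fileName>.
function buildPossiblePaths(baseDir, userId, datasetId, fileName) {
  return candidateDatasetIds(datasetId).map((id) => [baseDir, userId, id, fileName].join("/"));
}

// Restrict a directory scan to folders that belong to the requested dataset.
function filterMatchingFolders(folderNames, datasetId) {
  const allowed = new Set(candidateDatasetIds(datasetId));
  return folderNames.filter((name) => allowed.has(name));
}
```

For example, `buildPossiblePaths("uploads", "u1", "temp-42", "raw_data.csv")` yields paths for `temp-42` and `42` only, so a folder belonging to another dataset is never considered.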
Add activity logging and enrichment when accessing original data for cleaning.
  • Wrapped original-data route with middleware that looks up dataset metadata and uploader information from the database.
  • Computed a friendly userName and datasetName and called the cleaning activity logger when original data is accessed.
  • Ensured that logging errors do not block the main request path by falling through to the controller via next().
backend-node/src/routes/cleanedDataRoutes.js
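
The "log, but never block the request" behaviour described above might be structured like this Express-style middleware. Everything here is a sketch: `createActivityLogMiddleware`, the injected `lookupDataset` and `logActivity` functions, and the field names `dataset_name` and `action` are hypothetical stand-ins for whatever the route actually uses.

```javascript
// Sketch: best-effort activity logging in front of the original-data controller.
// `lookupDataset(datasetId)` and `logActivity(entry)` are hypothetical injected
// async functions (for example a database query and a logger).
function createActivityLogMiddleware({ lookupDataset, logActivity }) {
  return async function logOriginalDataAccess(req, res, next) {
    try {
      const dataset = await lookupDataset(req.params.datasetId);
      const userName = (dataset && (dataset.full_name || dataset.name)) || "Unknown user";
      const datasetName = (dataset && dataset.dataset_name) || req.params.datasetId;
      await logActivity({ userName, datasetName, action: "original_data_accessed" });
    } catch (err) {
      // Logging is best-effort: a failed lookup or write must not fail the request.
    }
    // Always hand over to the controller, whether or not logging succeeded.
    next();
  };
}
```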
Extend user schema and surface full_name consistently in the UI.
  • Updated users table schema to rename generic name to full_name and add role, department, designation, and last_login fields.
  • Updated frontend layout and API client to use full_name with a fallback to name when showing the logged-in user and storing userName in localStorage.
backend-node/src/config/schema.sql
frontend-react/src/layout/EmployeeLayout.jsx
frontend-react/src/services/api.js
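
The `full_name` with fallback to `name` lookup could be isolated in a small helper such as the one below. The helper names and the `"User"` default are assumptions; the `userName` storage key is taken from the description above, and `storage` is injected so the sketch works with `localStorage` or any object exposing `setItem`.

```javascript
// Sketch: prefer full_name, fall back to name, then to a default label.
function resolveDisplayName(user, fallback = "User") {
  if (!user) return fallback;
  const candidate = [user.full_name, user.name].find(
    (value) => typeof value === "string" && value.trim() !== ""
  );
  return candidate ? candidate.trim() : fallback;
}

// Persist the resolved name; in the browser, pass window.localStorage.
function persistUserName(user, storage) {
  const displayName = resolveDisplayName(user);
  storage.setItem("userName", displayName);
  return displayName;
}
```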
Make frontend authentication and upload API calls environment-configurable instead of hardcoded localhost URLs.
  • Introduced an API_URL constant based on VITE_API_BASE_URL with a localhost default in auth, admin, employee login, signup, and dashboard pages.
  • Replaced hardcoded axios.post URLs for login, signup, and upload with API_URL-based paths so environments can be configured via .env.
frontend-react/src/pages/AdminLogin.jsx
frontend-react/src/pages/Auth.jsx
frontend-react/src/pages/EmployeeDashboard.jsx
frontend-react/src/pages/EmployeeLogin.jsx
frontend-react/src/pages/Login.jsx
frontend-react/src/pages/Signup.jsx
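
One way to centralize the configurable base URL is sketched below. `VITE_API_BASE_URL` comes from the description above; the helper names, the `http://localhost:5000` default port, and the example endpoint path are assumptions, since the PR text only says "a localhost default".

```javascript
// Sketch: resolve the API base URL from an env object, with a localhost default.
// The default port is a placeholder, not taken from the PR.
function resolveApiUrl(env, fallback = "http://localhost:5000") {
  const configured = env && env.VITE_API_BASE_URL;
  const base =
    typeof configured === "string" && configured.trim() !== "" ? configured.trim() : fallback;
  // Drop trailing slashes so joined paths never contain "//".
  return base.replace(/\/+$/, "");
}

// Join the base URL and an endpoint path with exactly one slash between them.
function apiPath(baseUrl, path) {
  return `${baseUrl}/${path.replace(/^\/+/, "")}`;
}
```

In a Vite app this would typically be called once as `resolveApiUrl(import.meta.env)` and the result shared across the login, signup, and upload pages instead of redefining the constant in each file.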

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.


@sourcery-ai sourcery-ai Bot left a comment


Hey - I've left some high level feedback:

  • In the backend controllers (getCleanedData, getOriginalData, getVisualization, getDashboardConfig), the fallback that scans any dataset folder for a user and returns the first match can easily return the wrong dataset/dashboard for a given datasetId; consider tightening this to only match the requested ID (or a clearly related temp ID) to avoid cross-dataset leakage and confusing UI results.
  • The visualization components now have quite a bit of duplicated configuration/logic for colors, gradients, tooltips, and legends across multiple chart types and pie/donut charts; extracting shared helper components or utility functions for these concerns would simplify VisualizationPage.jsx and make future styling changes less error-prone.
  • There are several new console.log statements in backend controllers used for path discovery and error tracing; if these are meant for debugging, consider gating them behind an environment-based logging utility or removing them in production to avoid noisy logs and potential information leakage.
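
As one possible shape for the environment-gated logging suggested in the last point, the sketch below only emits debug output outside production unless a flag is set. `createDebugLogger`, the `[dataset-debug]` prefix, and the `DEBUG_DATASET_PATHS` variable are hypothetical; `NODE_ENV` is the usual Node convention, and whether the project sets it is an assumption.

```javascript
// Sketch: a debug logger that stays quiet in production unless explicitly enabled.
// `env` would normally be process.env; `sink` defaults to console.log.
function createDebugLogger(env, sink = console.log) {
  const enabled = env.NODE_ENV !== "production" || env.DEBUG_DATASET_PATHS === "true";
  return function debugLog(...args) {
    if (enabled) sink("[dataset-debug]", ...args);
  };
}
```

A controller could then call `debugLog("found file at", filePath)` in place of a bare `console.log`, so path details are not written to production logs by default.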


2 participants