Feature Engineering Pipeline + Dynamic Visualization Integration #6
avijitmandal2004 wants to merge 1 commit into garvjain7:feature
…ualization with pie chart fixes
Important: Review skipped. Auto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configuration.
Configuration used: defaults; Review profile: CHILL; Plan: Pro
Reviewer's Guide

Implements a more robust, fully dynamic end-to-end pipeline from dataset upload through cleaning, feature engineering, and visualization. This includes smarter dataset file resolution on the backend, activity logging, and richer, data-driven charts and cleaning UX on the frontend.

Sequence diagram for EmployeeCleaningPage data_fetch_and_step_control

```mermaid
sequenceDiagram
    actor Employee
    participant CleaningPage as EmployeeCleaningPage
    participant Backend as BackendNode
    participant CleanedCtrl as CleanedDataController
    participant OriginalCtrl as OriginalDataController
    participant FS as MLEngine_filesystem
    Employee->>CleaningPage: Open cleaning page with datasetId
    CleaningPage->>Backend: GET /original-data/:datasetId (with token)
    Backend->>OriginalCtrl: getOriginalData
    OriginalCtrl->>FS: locate raw_data.csv via possiblePaths including temp ids
    alt raw_data_found
        OriginalCtrl->>FS: read raw_data.csv
        FS-->>OriginalCtrl: csvText
        OriginalCtrl-->>Backend: parsed headers rows success true
        Backend-->>CleaningPage: json { success true rows headers }
        CleaningPage->>CleaningPage: set tableRows and cleanedRows
    else raw_data_missing_or_error
        OriginalCtrl-->>Backend: json { success false message }
        Backend-->>CleaningPage: json { success false }
        CleaningPage->>Backend: GET /cleaned-data/:datasetId
        Backend->>CleanedCtrl: getCleanedData
        CleanedCtrl->>FS: locate cleaned_data.csv via possiblePaths including temp ids or any dataset folder
        FS-->>CleanedCtrl: cleaned_data.csv
        CleanedCtrl-->>Backend: parsed headers rows success true
        Backend-->>CleaningPage: json { success true rows headers }
        CleaningPage->>CleaningPage: set tableRows and cleanedRows
    end
    loop User navigates cleaning steps
        Employee->>CleaningPage: Click step indicator or Next
        CleaningPage->>CleaningPage: handleStepClick(stepId)
        CleaningPage->>CleaningPage: canProceedToStep(targetStep)
        alt trying_to_skip_multiple_steps
            CleaningPage->>Employee: show stepRestrictionPopup cannot skip steps
        else unresolved_null_values_in_step1
            CleaningPage->>Employee: show popup configure null handling
        else unresolved_duplicates_in_step2
            CleaningPage->>Employee: show popup configure duplicates handling
        else allowed_to_proceed
            CleaningPage->>CleaningPage: setCurrentStep(stepId)
            alt stepId == 5 and feature engineering not started
                CleaningPage->>FS: startFeatStream (feature engineering pipeline)
                FS-->>CleaningPage: streaming feature updates
            end
        end
    end
```
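The step-gating branch of the loop above can be sketched as a pure function, which makes the rules easy to unit test outside the React component. This is a minimal sketch, not the PR's `canProceedToStep`. The state shape (`CleaningState`, `unresolvedNulls`, `unresolvedDuplicates`) and the `StepDecision` type are hypothetical. The sketch also assumes that navigating back to an earlier step is always allowed, because the diagram does not say.

```typescript
// Hypothetical state shape; the real EmployeeCleaningPage state may differ.
interface CleaningState {
  currentStep: number;
  unresolvedNulls: boolean; // step 1: null handling not yet configured
  unresolvedDuplicates: boolean; // step 2: duplicate handling not yet configured
}

type StepDecision = { allowed: true } | { allowed: false; reason: string };

// Checks run in the same order as the alt branches in the sequence diagram.
function canProceedToStep(state: CleaningState, targetStep: number): StepDecision {
  // Assumption: moving back to (or re-selecting) an earlier step is always allowed.
  if (targetStep <= state.currentStep) return { allowed: true };
  // Forward navigation is limited to the next step only.
  if (targetStep > state.currentStep + 1) {
    return { allowed: false, reason: "Cannot skip steps" };
  }
  if (state.currentStep === 1 && state.unresolvedNulls) {
    return { allowed: false, reason: "Configure null handling first" };
  }
  if (state.currentStep === 2 && state.unresolvedDuplicates) {
    return { allowed: false, reason: "Configure duplicates handling first" };
  }
  return { allowed: true };
}
```

In the component, a click handler would call `setCurrentStep(stepId)` only when `allowed` is true. Otherwise it would show the restriction popup with `reason`.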
Entity relationship diagram for updated_users_schema

```mermaid
erDiagram
    COMPANIES {
        uuid company_id PK
        text name
        text domain
        timestamp created_at
    }
    USERS {
        uuid user_id PK
        uuid company_id FK
        text full_name
        text first_name
        text last_name
        text email
        text password_hash
        text phone
        text address
        text role
        text department
        text designation
        boolean is_active
        timestamp last_login
        timestamp created_at
    }
    COMPANIES ||--o{ USERS : has
```
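As a minimal illustration of the schema, the row types can be mirrored in TypeScript, together with a helper that reflects the one-to-many `COMPANIES ||--o{ USERS` relationship. Several parts of this sketch are assumptions for illustration, not code from this PR:

- `uuid` and `timestamp` columns are represented as strings.
- Every column is treated as required, because the diagram does not show nullability.
- `groupUsersByCompany` is a hypothetical helper.

```typescript
// Row types mirroring the ER diagram. uuid/timestamp columns are modelled as
// strings, and every column is required (the diagram shows no nullability).
interface Company {
  company_id: string; // PK
  name: string;
  domain: string;
  created_at: string;
}

interface User {
  user_id: string; // PK
  company_id: string; // FK -> Company.company_id
  full_name: string;
  first_name: string;
  last_name: string;
  email: string;
  password_hash: string;
  phone: string;
  address: string;
  role: string;
  department: string;
  designation: string;
  is_active: boolean;
  last_login: string;
  created_at: string;
}

// Hypothetical helper: a company has zero or more users (||--o{), so every
// company gets a list, possibly empty.
function groupUsersByCompany<T extends Pick<User, "company_id">>(
  companies: Pick<Company, "company_id">[],
  users: T[],
): Map<string, T[]> {
  const byCompany = new Map<string, T[]>();
  for (const c of companies) byCompany.set(c.company_id, []);
  for (const u of users) {
    const list = byCompany.get(u.company_id);
    if (list) list.push(u); // users whose company_id is unknown are ignored
  }
  return byCompany;
}
```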
Flow diagram for VisualizationPage dynamic_chart_data_and_ui_logic

```mermaid
flowchart TD
    subgraph Inputs
        DataHeaders[headers]
        DataRows[rows]
        ColumnTypes[columnTypes]
        ColumnStats[columnStats]
        ChartXAxis[chartXAxis state]
        ChartYAxis[chartYAxis state]
        ChartType[chartType state]
        Aggregation[aggregation state]
    end
    Start[Init effect on data load]
    Start --> DataHeaders
    Start --> ColumnTypes
    Start -->|auto select axes based on columnTypes| AutoAxes[set initial chartXAxis and chartYAxis]
    AutoAxes --> ChartXAxis
    AutoAxes --> ChartYAxis
    subgraph ChartDataComputation
        DecideAxes{YAxis selected?}
        DecideAxes -->|no| CountOnly[Group by X and count occurrences]
        DecideAxes -->|yes| WithY[Group by X and aggregate Y]
        CountOnly --> CountGrouped[Top10 categories with counts]
        WithY --> CheckNumericY{Y is numeric?}
        CheckNumericY -->|yes| NumAgg[sum count max min for numeric Y]
        CheckNumericY -->|no| CatAgg[count occurrences and raw values]
    end
    DataRows --> DecideAxes
    ChartXAxis --> DecideAxes
    ChartYAxis --> DecideAxes
    ColumnTypes --> CheckNumericY
    CountGrouped --> ChartData[chartData array]
    NumAgg --> ChartData
    CatAgg --> ChartData
    subgraph ChartStats
        StatsInput[chartData + columnTypes + chartYAxis]
        StatsInput --> StatsCheck{Y numeric?}
        StatsCheck -->|yes| NumStats[compute totalSum totalCount avg max min]
        StatsCheck -->|no| CatStats[compute totalCount only]
    end
    ChartData --> StatsInput
    NumStats --> ChartStatsOut[chartStats object]
    CatStats --> ChartStatsOut
    subgraph Rendering
        DecideEmpty{chartData empty?}
        ChartData --> DecideEmpty
        DecideEmpty -->|yes| EmptyState[Prompt to select columns and show count hint]
        DecideEmpty -->|no| RenderMainChart
        RenderMainChart --> Title[chartTitle based on axes]
        RenderMainChart --> BarFlow[Bar Line Area Pie branches]
        BarFlow -->|bar| BarChartNode[BarChart with gradients and colors]
        BarFlow -->|line| LineChartNode[LineChart with gradient shadow and activeDot]
        BarFlow -->|area| AreaChartNode[AreaChart with gradient fill and shadow]
        BarFlow -->|pie| PieChartNode[PieChart with PIE_COLORS custom tooltip legend]
        ChartStatsOut --> ToolbarStats[Top toolbar stats numeric or categorical]
    end
    Inputs --> DecideAxes
    Inputs --> RenderMainChart
    Inputs --> ToolbarStats
    subgraph SideDonut
        SideCat[choose categorical column with limited uniques]
        SideNum[choose numeric column]
        SideConditions{valid catCol numCol and rows?}
        SideConditions -->|yes| BuildPieData[group and sum top categories]
        SideConditions -->|no| NoSideChart[render nothing]
        BuildPieData --> DonutChart[Donut PieChart with gradients tooltip legend]
    end
    DataHeaders --> SideCat
    ColumnTypes --> SideCat
    ColumnStats --> SideCat
    DataHeaders --> SideNum
    ColumnTypes --> SideNum
    DataRows --> SideConditions
    SideCat --> SideConditions
    SideNum --> SideConditions
```
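The `ChartDataComputation` and `ChartStats` subgraphs can be sketched as two pure functions. Keeping them outside `VisualizationPage.jsx` lets them be tested without rendering any charts. This is a minimal sketch under several assumptions:

- The names `Row`, `ChartPoint`, `computeChartData`, and `computeChartStats` are hypothetical.
- A non-numeric Y column is simply counted; the "raw values" output of `CatAgg` is not modelled.
- Groups with no usable numeric Y values are dropped.
- `totalCount` means the number of chart groups.

```typescript
// Hypothetical shapes; the real component state in VisualizationPage.jsx may differ.
type Row = Record<string, string | number | null>;
type Aggregation = "sum" | "count" | "max" | "min";

interface ChartPoint {
  name: string;
  value: number;
}

interface ChartStats {
  totalCount: number; // number of groups (chart points)
  totalSum?: number;
  avg?: number;
  max?: number;
  min?: number;
}

// Groups rows by the X column. Without a numeric Y it counts occurrences per
// group; with a numeric Y it applies the chosen aggregation. Keeps the top
// `limit` groups by value (the "Top10" step in the flowchart).
function computeChartData(
  rows: Row[],
  xKey: string,
  yKey: string | null,
  yIsNumeric: boolean,
  aggregation: Aggregation,
  limit = 10,
): ChartPoint[] {
  const numericY: string | null = yIsNumeric ? yKey : null;
  const groups = new Map<string, number[]>();
  for (const row of rows) {
    const name = String(row[xKey] ?? "(missing)");
    const bucket = groups.get(name) ?? [];
    if (numericY !== null) {
      const raw = row[numericY];
      const y = raw === null || raw === "" ? NaN : Number(raw);
      if (Number.isFinite(y)) bucket.push(y); // skip missing or non-numeric Y
    } else {
      bucket.push(1); // count mode: one entry per occurrence
    }
    groups.set(name, bucket);
  }

  const points: ChartPoint[] = [];
  groups.forEach((values, name) => {
    if (values.length === 0) return; // group had no usable numeric Y values
    let value: number;
    if (numericY === null || aggregation === "count") value = values.length;
    else if (aggregation === "sum") value = values.reduce((a, b) => a + b, 0);
    else if (aggregation === "max") value = Math.max(...values);
    else value = Math.min(...values);
    points.push({ name, value });
  });
  // Highest values first; ties keep first-seen order (sort is stable in ES2019+).
  return points.sort((a, b) => b.value - a.value).slice(0, limit);
}

// Toolbar summary. For a non-numeric Y only the count is reported.
function computeChartStats(points: ChartPoint[], yIsNumeric: boolean): ChartStats {
  const totalCount = points.length;
  if (!yIsNumeric || totalCount === 0) return { totalCount };
  const values = points.map((p) => p.value);
  const totalSum = values.reduce((a, b) => a + b, 0);
  return {
    totalCount,
    totalSum,
    avg: totalSum / totalCount,
    max: Math.max(...values),
    min: Math.min(...values),
  };
}
```

An empty result from `computeChartData` corresponds to the `EmptyState` branch in the flowchart. A non-empty result would be passed unchanged to whichever Bar, Line, Area, or Pie component `chartType` selects.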
Hey, I've left some high-level feedback:
- In the backend controllers (`getCleanedData`, `getOriginalData`, `getVisualization`, `getDashboardConfig`), the fallback that scans any dataset folder for a user and returns the first match can easily return the wrong dataset/dashboard for a given `datasetId`. Consider tightening this to only match the requested ID (or a clearly related temp ID) to avoid cross-dataset leakage and confusing UI results.
- The visualization components now have quite a bit of duplicated configuration/logic for colors, gradients, tooltips, and legends across multiple chart types and pie/donut charts. Extracting shared helper components or utility functions for these concerns would simplify `VisualizationPage.jsx` and make future styling changes less error-prone.
- There are several new `console.log` statements in backend controllers used for path discovery and error tracing. If these are meant for debugging, consider gating them behind an environment-based logging utility or removing them in production to avoid noisy logs and potential information leakage.
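The first and third comments can be addressed together with a strict resolver that only checks the requested ID and any temp IDs explicitly linked to it, combined with a logger that stays silent unless debugging is enabled. This is a minimal sketch, not the controllers' current code. The following are assumptions:

- The directory layout `${baseDir}/${userId}/${id}/${fileName}`.
- The `DEBUG_DATASET_PATHS` variable name.
- The helpers `resolveDatasetFile` and `makeDebugLogger`, which are both hypothetical.

The file-existence check is injected so the lookup can be tested without touching the disk.

```typescript
// Sketch of strict dataset file resolution plus opt-in debug logging.
// The layout `${baseDir}/${userId}/${id}/${fileName}` is an assumption; the real
// possiblePaths lists in the controllers may differ.
type Logger = (message: string) => void;

// Returns a no-op logger unless debugging is enabled, e.g. by passing
// process.env.DEBUG_DATASET_PATHS === "1" (a hypothetical variable name).
function makeDebugLogger(enabled: boolean, sink: Logger = console.log): Logger {
  return enabled ? (message) => sink(`[dataset-paths] ${message}`) : () => {};
}

interface ResolveOptions {
  baseDir: string;
  userId: string;
  datasetId: string;
  tempIds?: string[]; // temp IDs explicitly linked to this datasetId, if any
  exists: (filePath: string) => boolean; // injected for testability
  log?: Logger;
}

function resolveDatasetFile(fileName: string, opts: ResolveOptions): string | null {
  const log: Logger = opts.log ?? (() => {});
  const ids = [opts.datasetId, ...(opts.tempIds ?? [])];
  for (const id of ids) {
    const candidate = `${opts.baseDir}/${opts.userId}/${id}/${fileName}`;
    log(`checking ${candidate}`);
    if (opts.exists(candidate)) return candidate;
  }
  // No "first dataset folder for this user" fallback: a miss is returned as
  // null so the caller can respond with 404 instead of serving another dataset.
  log(`no ${fileName} for dataset ${opts.datasetId}`);
  return null;
}
```

In a Node controller, `exists` could be `fs.existsSync` and `log` could be `makeDebugLogger(process.env.DEBUG_DATASET_PATHS === "1")`. A `null` result maps naturally to a 404 response.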
🚀 Changes Implemented:
• Integrated complete Feature Engineering pipeline in ML Engine
• Fixed dataset processing pipeline
• Connected frontend with backend APIs
• Improved overall system flow
Upload → Processing → Feature Engineering → Visualization
✅ Result:
• Fully dynamic dashboard (no static data)
• Backend-driven analytics working successfully
• End-to-end ML pipeline running without errors
• Feature engineering validated through pipeline execution
Summary by Sourcery
Enhance the end-to-end data cleaning and visualization experience and improve backend dataset/visualization resolution and auth API integration.
New Features:
Bug Fixes:
Enhancements: