
added final logs #5

Open
Frustum01 wants to merge 9 commits into garvjain7:feature from Frustum01:feature/my-change-2

Conversation


@Frustum01 Frustum01 commented Apr 12, 2026

Summary by Sourcery

Add internal dataset insight chatbot service, enhance query and logging pipeline, and improve admin/employee UX around logs, permissions, and dataset cleaning.

New Features:

  • Introduce an internal FastAPI-based dataset insight chatbot service with LLM-powered pandas query generation, sandboxed execution, and a minimal HTML frontend.
  • Add employee-facing features such as dataset meta display in chat, multi-model selection, and dataset access request flows surfaced to admins in the permissions page.
  • Provide employee upload navigation and new dataset cleaning logging endpoints to track cleaning actions from the employee flows.

Bug Fixes:

  • Normalize dataset metadata mapping and cleaned-data lookup queries to use canonical dataset fields and joins.

Enhancements:

  • Extend activity logging to distinguish data modification queries from read-only queries and surface MODIFY events with dedicated styling and summary stats in the admin logs view.
  • Refactor the chat backend to delegate query handling to the new FastAPI engine via HTTP instead of shelling out to a Python script, including richer intent/code data in responses.
  • Improve employee UI polish and responsiveness, including CSS cleanup and dataset listing actions like a top-level upload button.

Documentation:

  • Add a README describing the dataset insight chatbot architecture, endpoints, security model, and deployment options.

- Employee 'Request Admin Access' workflow with approval polling
- Admin PermissionPage to approve/deny dataset modification requests
- Backend role escalation for approved modification queries
- Dataset persistence: write modified DataFrames back to disk
- Activity logs: track QUERY vs MODIFY events with intent detection
- Chatbot History tab in admin Logs page grouped by employee
- Fix sidebar flex-shrink for side-by-side layout stability
- Fix userId reference (req.user.id) for reliable activity logging

coderabbitai Bot commented Apr 12, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 16d78af5-a662-4329-a7e8-411293d034ea

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


sourcery-ai Bot commented Apr 12, 2026

Reviewer's Guide

Implements an internal FastAPI-based query engine with role-aware data modification, wires the existing Node/React app to it, enhances logging of query and cleaning activity (including a new MODIFY event type and employee log view), introduces dataset access request flows, and adds model selection and richer dataset context to the employee chat experience.

Sequence diagram for employee chat query via FastAPI engine

sequenceDiagram
  actor Employee
  participant ERChat as EmployeeChatPage
  participant NodeAPI as Node_chatRoutes
  participant ChatCtrl as chatController_askQuestion
  participant FastAPI as FastAPI_internal_query
  participant LLM as LLM_provider

  Employee->>ERChat: Type question and click Send
  ERChat->>ERChat: Read datasetId, selectedModel, userName
  ERChat->>ERChat: Check localStorage datasetAccessRequests
  ERChat->>NodeAPI: askQuery(datasetId, question, model, portal='employee', isApproved)
  NodeAPI->>ChatCtrl: Forward POST /query body

  ChatCtrl->>ChatCtrl: Resolve datasetDir for datasetId
  ChatCtrl->>FastAPI: POST /internal/query {dataset_id, file_dir_path, question, model, role}

  alt LLM allowed
    FastAPI->>LLM: Generate pandas code from schema and question
    LLM-->>FastAPI: pandas code
    FastAPI->>FastAPI: Safe_execute code on DataFrame
    FastAPI->>LLM: Summarize result to natural language
    LLM-->>FastAPI: answer text
    FastAPI-->>ChatCtrl: {answer, code, intent, confidence}
  else Modify intent but role cannot_modify
    FastAPI-->>ChatCtrl: {answer:'Access Denied', intent:'error'}
  end

  ChatCtrl-->>NodeAPI: JSON {success, answer, code, intent}
  NodeAPI->>ERChat: Response propagated

  ERChat->>ERChat: buildResponseHTML(text, code)
  ERChat-->>Employee: Render AI message with optional code block and Access Denied CTA

Sequence diagram for dataset access request and approval flow

sequenceDiagram
  actor Employee
  participant ERChat as EmployeeChatPage
  participant LocalStorage as browser_localStorage
  actor Admin
  participant AdminPerm as PermissionPage

  Employee->>ERChat: Receive AI message with Access Denied
  ERChat->>ERChat: Detect Access Denied text
  Employee->>ERChat: Click Request Admin Access
  ERChat->>LocalStorage: Read datasetAccessRequests
  ERChat->>LocalStorage: Append {id, user, email, dataset, datasetId, time, status:'pending'}
  ERChat->>ERChat: accessRequests[id]='requested'

  loop Poll for approval
    ERChat->>LocalStorage: Read datasetAccessRequests
    LocalStorage-->>ERChat: Current requests array
    ERChat->>ERChat: If my request status is approved then set accessRequests[id]='approved' and stop polling
  end

  Admin->>AdminPerm: Open PermissionPage
  loop Poll localStorage
    AdminPerm->>LocalStorage: Read datasetAccessRequests
    LocalStorage-->>AdminPerm: Pending requests
    AdminPerm->>AdminPerm: setDatasetAccessRequests(pending)
  end

  Admin->>AdminPerm: Click Grant Access for request id
  AdminPerm->>LocalStorage: Update request.status='approved'
  AdminPerm->>AdminPerm: Remove request from pending list

  Note over ERChat,AdminPerm: Employee polling loop detects approved status
  ERChat->>ERChat: accessRequests[id]='approved'
  Employee->>ERChat: Retry query (now isApproved true)
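The request/approval steps in the diagram above can be sketched as pure helpers over the stored array. In the app these would read and write `localStorage.getItem('datasetAccessRequests')`; the helper names here are illustrative, not the actual component code.

```javascript
// Append a new pending access request, as the employee chat page does
// when the user clicks "Request Admin Access".
function appendRequest(requests, { id, user, email, dataset, datasetId }) {
  return [...requests, { id, user, email, dataset, datasetId, time: Date.now(), status: 'pending' }];
}

// Mark a request approved, as PermissionPage does on "Grant Access".
function approveRequest(requests, id) {
  return requests.map(r => (r.id === id ? { ...r, status: 'approved' } : r));
}

// What the employee-side polling loop checks on each tick.
function isApproved(requests, id) {
  return requests.some(r => r.id === id && r.status === 'approved');
}
```

Note the review's caveat below: because this state lives entirely in client-side localStorage, it is spoofable and not multi-user safe; the same shape could back a server-side API instead.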

Class diagram for updated chat, logs, and permission components

classDiagram
  class EmployeeChatPage {
    +string selectedModel
    +object selectedDataset
    +object datasetMeta
    +array messages
    +object accessRequests
    +function handleSend(text)
    +function handleRequestAccess(id)
    +function loadDatasets()
    +function loadDatasetMeta(datasetId)
  }

  class EmployeeLogsView {
    +array logs
    +string searchTerm
    +function formatDate(date)
    +function formatDuration(seconds)
    +function getMethodClass(eventType)
    +function getEventIcon(eventType)
  }

  class LogsPage {
    +array logs
    +string activeTab
    %% activeTab: system or employee_logs
    +number totalLogins
    +number totalCleans
    +number totalModifies
    +number totalQueries
    +function fetchData()
    +function clearFilters()
  }

  class PermissionPage {
    +array users
    +array pendingUsers
    +array datasetAccessRequests
    +function fetchData()
    +function handleApprove(email)
    +function handleReject(email)
    +function handleDatasetAccessAction(id, newStatus)
  }

  class ApiService {
    +function askQuery(datasetId, question, model, portal, isApproved)
    +function getDatasetAnalysis(datasetId)
    +function cleanDataset(datasetId, detail)
  }

  EmployeeChatPage --> ApiService : uses askQuery
  EmployeeChatPage --> ApiService : uses getDatasetAnalysis
  EmployeeChatPage --> ApiService : uses cleanDataset (via other pages)
  LogsPage --> EmployeeLogsView : renders
  PermissionPage --> PermissionPage : polls_localStorage_for_datasetAccessRequests

File-Level Changes

Change Details Files
Add role-aware FastAPI "Dataset Insight Chatbot" service and wire Node chat controller to call it instead of the legacy Python script.
  • Introduce a standalone FastAPI backend that handles auth, dataset upload, schema extraction, LLM-based pandas code generation, sandboxed execution, logging, and an internal /internal/query endpoint for the Node service.
  • Replace the Node chatController exec-based call to a Python script with an HTTP call to the FastAPI /internal/query endpoint, constructing a payload with dataset path, model, role, and approval status.
  • Standardize FastAPI responses to include answer, code, result, intent, and confidence and adapt the Node controller to surface answer/code to the frontend.
dataset-insight-chatbot/backend/main.py
dataset-insight-chatbot/backend/requirements.txt
dataset-insight-chatbot/backend/Dockerfile
dataset-insight-chatbot/docker-compose.yml
backend-node/src/controllers/chatController.js
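A minimal sketch of the payload the Node chatController might send to the FastAPI `/internal/query` endpoint, following the field names in the sequence diagram above; `buildInternalQueryPayload` and `FASTAPI_URL` are illustrative names, not the actual code.

```javascript
// Build the body for POST /internal/query; snake_case keys match the
// sequence diagram (dataset_id, file_dir_path, question, model, role).
function buildInternalQueryPayload({ datasetId, fileDirPath, question, model, role }) {
  return {
    dataset_id: datasetId,
    file_dir_path: fileDirPath,
    question,
    model,
    // FastAPI denies modify intents when the role cannot modify.
    role,
  };
}

// The controller would then POST it over HTTP instead of shelling out, e.g.:
// const res = await fetch(`${FASTAPI_URL}/internal/query`, {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json' },
//   body: JSON.stringify(payload),
// });
// const { answer, code, intent, confidence } = await res.json();
```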
Enhance query logging and distinguish MODIFY vs QUERY events throughout the backend and admin logs UI.
  • Extend query logging to carry an intent and log MODIFY events when the model marks a modification intent.
  • Update activity logging helpers and routes so cleaning and query actions log richer detail, including dataset name resolution from the datasets table.
  • Adjust admin LogsPage to recognize the MODIFY event type, show a dedicated styling/icon, and add employee-centric log grouping and a tabbed System Activity vs Employee Logs view.
backend-node/src/controllers/activityController.js
backend-node/src/routes/chatRoutes.js
backend-node/src/routes/cleanedDataRoutes.js
backend-node/src/routes/datasetRoutes.js
frontend-react/src/pages/admin/LogsPage.jsx
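The class diagram above names `getMethodClass` and `getEventIcon` helpers; a hedged sketch of how they might map the new MODIFY event type to dedicated styling alongside existing event types (the class strings and icons are assumptions, not the real stylesheet values):

```javascript
// Map an activity event type to a CSS badge class; MODIFY gets its own
// highlighted style to distinguish data changes from read-only queries.
function getMethodClass(eventType) {
  switch (eventType) {
    case 'MODIFY': return 'method-badge method-modify';
    case 'QUERY':  return 'method-badge method-query';
    case 'CLEAN':  return 'method-badge method-clean';
    case 'LOGIN':  return 'method-badge method-login';
    default:       return 'method-badge';
  }
}

// Pick an icon per event type, falling back to a neutral document icon.
function getEventIcon(eventType) {
  const icons = { MODIFY: '✏️', QUERY: '🔍', CLEAN: '🧹', LOGIN: '🔑' };
  return icons[eventType] || '📄';
}
```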
Improve employee chat experience with model selection, dataset metadata, access control UX, and code-aware responses.
  • Allow employees to choose the underlying LLM model (Groq/OpenAI/Ollama) from the chat UI and pass the selection to askQuery.
  • Fetch dataset analysis metadata for the selected dataset and render real row/column counts and column chips instead of static placeholders.
  • Thread dataset access approval through localStorage-based requests (raised from chat when an Access Denied message is returned) and visual feedback in both EmployeeChatPage and reusable QueryAssistant BotMessage.
  • Update askQuery API helper to accept model, portal, and isApproved flags and return backend code snippets to render in the UI.
frontend-react/src/pages/employee/EmployeeChatPage.jsx
frontend-react/src/components/QueryAssistant.jsx
frontend-react/src/services/api.js
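A sketch of the extended `askQuery` helper's request shape, now carrying model, portal, and isApproved alongside the question. The endpoint path, `API_BASE`, and `buildAskQueryRequest` are assumptions based on the change description, not the actual api.js code.

```javascript
const API_BASE = '/api'; // assumed base path

// Construct the request the extended askQuery helper would send.
function buildAskQueryRequest(datasetId, question, model, portal = 'employee', isApproved = false) {
  return {
    url: `${API_BASE}/chat/query`,
    body: { datasetId, question, model, portal, isApproved },
  };
}

// askQuery itself would POST this and return the backend's
// { success, answer, code, intent } payload for the UI to render.
```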
Log employee cleaning actions as structured backend events and wire the frontend cleaning flows to call the new logging endpoint.
  • Add a cleanDataset API helper that posts cleaning detail to the backend /datasets/:id/clean route.
  • Update employee cleaning pages to call cleanDataset when users download cleaned CSVs or proceed to visualization, and on column-clean trigger.
  • Change the datasetRoutes cleaning log to record a completed cleaning event with a detail message from the client.
frontend-react/src/services/api.js
frontend-react/src/pages/employee/EmployeeCleaningPage.jsx
frontend-react/src/pages/employee/ColumnCleaningPage.jsx
backend-node/src/routes/datasetRoutes.js
backend-node/src/controllers/activityController.js
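The `cleanDataset` helper described above might build its request like the sketch below; the exact route shape is an assumption based on the `/datasets/:id/clean` path named in the change details.

```javascript
// Build the POST that logs a cleaning action with a human-readable detail
// message (e.g. fired on CSV download or column-clean trigger).
function buildCleanRequest(datasetId, detail) {
  return {
    url: `/api/datasets/${encodeURIComponent(datasetId)}/clean`,
    method: 'POST',
    body: { detail }, // e.g. 'Downloaded cleaned CSV'
  };
}
```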
Improve employee navigation and dataset UX (upload entry point, layout user name, and styling tweaks).
  • Add an "Upload Data" nav item and a topbar Upload Dataset button on the employee datasets page that routes to the upload page.
  • Align EmployeeLayout user name sourcing with the updated auth flow (using getMe / localStorage) and resolve merge markers that diverged between localStorage and sessionStorage implementations.
  • Apply minor CSS formatting/whitespace normalization in Employee CSS and tighten stat card animation rules.
frontend-react/src/layout/EmployeeLayout.jsx
frontend-react/src/pages/employee/EmployeeDatasetsPage.jsx
frontend-react/src/styles/Employee.css
Introduce a minimal static frontend for the new chatbot service and scratch utilities for DB inspection.
  • Add a standalone HTML/JS frontend for the Dataset Insight Bot FastAPI service with login, dataset upload, schema view, chat UI, and model selector.
  • Include a Node scratch script to inspect the datasets table schema via pg pool for debugging.
  • Add project docs and env scaffolding for the new chatbot microservice.
dataset-insight-chatbot/frontend/index.html
dataset-insight-chatbot/README.md
dataset-insight-chatbot/.env.example
backend-node/scratch-test.js

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help


@sourcery-ai sourcery-ai Bot left a comment


Hey - I've found 4 issues and left some high-level feedback:

  • There are unresolved merge conflict markers in several files (e.g. frontend-react/src/services/api.js, frontend-react/src/layout/EmployeeLayout.jsx, backend-node/src/controllers/uploadController.js); these need to be resolved and the chosen storage/field names made consistent before merging.
  • The dataset access request/approval flow is implemented entirely via localStorage on the client (datasetAccessRequests in EmployeeChatPage and PermissionPage), which is trivially spoofable and not multi-user safe; consider moving this workflow (storage, approvals, and polling) to backend APIs tied to authenticated users.
  • The new FastAPI dataset-insight-chatbot service introduces a parallel auth and dataset handling stack; if it is intended to be the production path, consider aligning its models and permission checks more closely with the existing Node backend (e.g. dataset identifiers, roles, and logging) to avoid duplicated logic and inconsistent behavior between the two paths.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- There are unresolved merge conflict markers in several files (e.g. `frontend-react/src/services/api.js`, `frontend-react/src/layout/EmployeeLayout.jsx`, `backend-node/src/controllers/uploadController.js`); these need to be resolved and the chosen storage/field names made consistent before merging.
- The dataset access request/approval flow is implemented entirely via `localStorage` on the client (`datasetAccessRequests` in EmployeeChatPage and PermissionPage), which is trivially spoofable and not multi-user safe; consider moving this workflow (storage, approvals, and polling) to backend APIs tied to authenticated users.
- The new FastAPI `dataset-insight-chatbot` service introduces a parallel auth and dataset handling stack; if it is intended to be the production path, consider aligning its models and permission checks more closely with the existing Node backend (e.g. dataset identifiers, roles, and logging) to avoid duplicated logic and inconsistent behavior between the two paths.

## Individual Comments

### Comment 1
<location path="frontend-react/src/services/api.js" line_range="33-39" />
<code_context>
<<<<<<< HEAD
      localStorage.setItem('role', user.role);
      localStorage.setItem('userName', user.name || user.full_name || 'User');
=======
      sessionStorage.setItem('role', user.role);
      sessionStorage.setItem('userName', user.name);
>>>>>>> upstream/feature
</code_context>
<issue_to_address>
**issue (bug_risk):** Resolve the merge conflict in getMe to use a single storage strategy for role/userName.

`getMe` still has merge conflict markers and two different storage strategies (localStorage vs sessionStorage), which will break the build and can desync auth state. Choose a single storage mechanism, remove the conflict markers, and verify that all callers (e.g., `EmployeeLayout` and access control) read/write from that same storage location.
</issue_to_address>

### Comment 2
<location path="frontend-react/src/layout/EmployeeLayout.jsx" line_range="20-29" />
<code_context>
<<<<<<< HEAD
  const [userName, setUserName] = useState(() => {
    const val = localStorage.getItem('userName');
    return (val && val !== 'undefined' && val !== 'null') ? val : 'Employee';
  });
  const safeName = userName || 'Employee';
  const userInitials = safeName.split(' ').map(n => n[0]).join('').toUpperCase().slice(0, 2);

  useEffect(() => {
    getMe().then(user => {
</code_context>
<issue_to_address>
**issue (bug_risk):** Clean up merge conflict and normalize how the employee name is sourced and stored.

This file still contains merge conflict markers and two different approaches to `userName` (localStorage + `getMe` vs sessionStorage), which will break the build and can desync identity/role handling from `getMe`. Resolve the conflict by choosing a single source of truth (e.g., always call `getMe`, then persist the result in one storage mechanism) and align usage with that convention across the app.
</issue_to_address>

### Comment 3
<location path="backend-node/src/controllers/uploadController.js" line_range="63-70" />
<code_context>

       if (!companyId) throw new Error("No company in database environment");

+<<<<<<< HEAD
+      const fileHash = crypto.randomBytes(16).toString('hex');
+      const insertResult = await pool.query(
+        `INSERT INTO datasets (dataset_id, company_id, uploaded_by, dataset_name, name, hash, upload_status)
+=======
       await pool.query(
         `INSERT INTO datasets (dataset_id, company_id, uploaded_by, dataset_name, file_name, file_size, upload_status)
+>>>>>>> upstream/feature
          VALUES ($1, $2, $3, $4, $5, $6, $7)`,
         [
</code_context>
<issue_to_address>
**issue (bug_risk):** Fix the merge conflict in dataset insertion so column names and values line up correctly.

The `INSERT` statement still contains merge markers and two incompatible column lists: `(dataset_id, company_id, uploaded_by, dataset_name, name, hash, upload_status)` vs `(dataset_id, company_id, uploaded_by, dataset_name, file_name, file_size, upload_status)`. Resolve the conflict, decide whether you’re storing `file_hash` and/or `file_size`, choose the correct `name`/`file_name` column, and ensure the final column list and values match the actual `datasets` table schema.
</issue_to_address>
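The review leaves the column choice (`name`/`file_name`, `hash`/`file_size`) to the author, so no resolution is sketched here; one neutral safeguard is a check that the resolved statement's column list, `$n` placeholders, and parameter array all have matching lengths, which catches exactly the mismatch this comment warns about. `checkInsert` is an illustrative helper, not project code.

```javascript
// Verify an INSERT's column count, $n placeholder count, and parameter
// array length all agree; throws on the mismatch a bad conflict
// resolution would introduce.
function checkInsert(sql, params) {
  const cols = sql.match(/\(([^)]*)\)\s*VALUES/i)[1].split(',').length;
  const placeholders = (sql.match(/\$\d+/g) || []).length;
  if (cols !== placeholders || placeholders !== params.length) {
    throw new Error(`column/placeholder/param mismatch: ${cols}/${placeholders}/${params.length}`);
  }
  return true;
}
```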

### Comment 4
<location path="backend-node/src/routes/chatRoutes.js" line_range="52-53" />
<code_context>
           req.activityUserId,
-          req.activityUserName || userEmail?.split('@')[0],
-          userEmail,
+          req.activityUserName || req.activityUserEmail?.split('@')[0],
+          req.activityUserEmail,
           req.activityDatasetId,
           req.activityDatasetName || datasetId,
</code_context>
<issue_to_address>
**issue (bug_risk):** Use a defined email source when logging query activity instead of req.activityUserEmail.

`req.activityUserEmail` is never set in this route, while `userEmail` is derived from `req.user`. As a result, this change will log `undefined` for both email and name. Please keep `userEmail` as a fallback for both fields, e.g. `req.activityUserName || userEmail?.split('@')[0]` and `req.activityUserEmail || userEmail`.
</issue_to_address>
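The fallback the comment suggests can be sketched as below, using `userEmail` (derived from `req.user`) whenever the activity fields are unset so neither name nor email logs as undefined; `resolveLogIdentity` is an illustrative wrapper, not the route's actual code.

```javascript
// Resolve the name/email pair to log, preferring the activity fields but
// falling back to the authenticated user's email and its local part.
function resolveLogIdentity(req, userEmail) {
  const email = req.activityUserEmail || userEmail;
  const name = req.activityUserName || email?.split('@')[0];
  return { name, email };
}
```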

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

