
chat_bot modified and logs integrated #4

Open
Frustum01 wants to merge 9 commits into garvjain7:feature from Frustum01:feature/my-change

Frustum01 commented Apr 12, 2026

Summary by Sourcery

Integrate a new FastAPI-based Dataset Insight Chatbot backend and UI, wire it into the existing Node/React app for secure, logged data querying and modification, and enhance admin/employee flows with chatbot activity views, dataset metadata, access requests, and permission-aware model selection.

New Features:

  • Add a dedicated chatbot microservice (FastAPI) that performs schema-based LLM querying and safe pandas execution with support for Groq, OpenAI, and Ollama models.
  • Introduce a standalone web UI and Docker setup for the Dataset Insight Chatbot, including upload, query, and visualization flows.
  • Add a chatbot history tab to the admin logs page that groups QUERY and MODIFY events by user and surfaces modification metadata.
  • Expose dataset analysis metadata to the employee chat page and allow selecting different LLM backends when asking questions.
  • Enable employees to request dataset modification access from within chat responses, and surface those requests to admins for approval in the permissions page.

Bug Fixes:

  • Correct dataset and user lookup in chat and cleaned data routes to use the new datasets schema and safer user identifiers.
  • Harden user name handling in the employee layout and API layer to avoid undefined/null values in local storage and UI badges.
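The name-handling hardening above comes down to never letting `null`, `undefined`, or their stringified forms reach a badge or initials. `localStorage` returns `null` for a missing key and stores `undefined`/`null` as the literal strings `"undefined"`/`"null"`. A minimal sketch, assuming hypothetical helpers `safeDisplayName` and `initials` and reusing the `'Employee User'` fallback that EmployeeChatPage already uses; this is not the PR's actual EmployeeLayout code.

```javascript
// Hypothetical helpers; not the PR's actual EmployeeLayout implementation.
const INVALID_NAMES = new Set(['', 'undefined', 'null']);

function safeDisplayName(raw, fallback = 'Employee User') {
  // localStorage returns null for missing keys, and storing undefined/null
  // saves the literal strings "undefined"/"null", so reject all of those.
  if (typeof raw !== 'string') return fallback;
  const trimmed = raw.trim();
  return INVALID_NAMES.has(trimmed.toLowerCase()) ? fallback : trimmed;
}

function initials(name) {
  // First letter of up to two whitespace-separated words, upper-cased.
  return safeDisplayName(name)
    .split(/\s+/)
    .slice(0, 2)
    .map((part) => part[0].toUpperCase())
    .join('');
}

console.log(safeDisplayName('undefined')); // Employee User
console.log(initials('  ada lovelace ')); // AL
```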

Enhancements:

  • Refactor chat routing and activity logging to capture query intent, distinguish data modifications from read-only queries, and log richer details to the activity system.
  • Update admin logs styling and icons to highlight MODIFY events separately and clarify chatbot-related traffic.
  • Improve employee dataset and chat UX with upload shortcuts, dynamic column chips from backend metadata, and inline access-denied guidance.
  • Replace the Node.js shell-based Python invocation with an HTTP integration to the new FastAPI query engine, including model selection and role-aware access control.
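Distinguishing data modifications from read-only queries reduces to mapping the intent returned by the query engine onto an event type. The reviewer's guide names the intents `modify` and `insight`. A minimal sketch, assuming only an explicit `modify` intent marks a write; the function names are hypothetical and the PR's `logQueryActivity` may differ.

```javascript
// Hypothetical sketch of intent-based event typing for activity logs.
function eventTypeForIntent(intent) {
  // Only an explicit "modify" intent (case-insensitive) is logged as MODIFY;
  // anything else, including a missing intent, is logged as QUERY.
  return String(intent || '').trim().toLowerCase() === 'modify' ? 'MODIFY' : 'QUERY';
}

function describeEvent(eventType, datasetName, queryText) {
  const verb = eventType === 'MODIFY' ? 'Modified dataset' : 'Queried dataset';
  return `${verb} "${datasetName}": ${queryText}`;
}

const type = eventTypeForIntent('modify');
console.log(type); // MODIFY
console.log(describeEvent(eventTypeForIntent('insight'), 'sales.csv', 'Top region?'));
// Queried dataset "sales.csv": Top region?
```

Falling back to `QUERY` keeps unknown intents out of the MODIFY view; if audit completeness matters more than noise, inverting the default is a reasonable alternative.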

Deployment:

  • Add Dockerfile and docker-compose configuration to run the chatbot backend and static frontend alongside the existing stack.

Documentation:

  • Add README documentation describing the Dataset Insight Chatbot architecture, endpoints, security model, and local/Docker usage.

Chores:

  • Add a scratch Node.js script to inspect the datasets table schema during development.

- Employee 'Request Admin Access' workflow with approval polling
- Admin PermissionPage to approve/deny dataset modification requests
- Backend role escalation for approved modification queries
- Dataset persistence: write modified DataFrames back to disk
- Activity logs: track QUERY vs MODIFY events with intent detection
- Chatbot History tab in admin Logs page grouped by employee
- Fix sidebar flex-shrink for side-by-side layout stability
- Fix userId reference (req.user.id) for reliable activity logging
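The Chatbot History tab groups QUERY and MODIFY events by employee. A minimal sketch of that grouping, assuming a hypothetical log shape `{ userId, userName, eventType, timestamp }`; the real activity-log columns are not shown in this PR page.

```javascript
// Hypothetical log shape: { userId, userName, eventType, timestamp }.
function groupChatbotHistory(logs) {
  const byUser = new Map();
  for (const log of logs) {
    // Only chatbot traffic belongs in this view.
    if (log.eventType !== 'QUERY' && log.eventType !== 'MODIFY') continue;
    const key = log.userId ?? 'unknown';
    if (!byUser.has(key)) {
      byUser.set(key, { userId: key, userName: log.userName || 'Unknown user', events: [], modifyCount: 0 });
    }
    const group = byUser.get(key);
    group.events.push(log);
    if (log.eventType === 'MODIFY') group.modifyCount += 1;
  }
  // Newest events first within each user's group.
  for (const group of byUser.values()) {
    group.events.sort((a, b) => new Date(b.timestamp) - new Date(a.timestamp));
  }
  return [...byUser.values()];
}

// Illustrative input only.
const history = groupChatbotHistory([
  { userId: 1, userName: 'Asha', eventType: 'QUERY', timestamp: '2026-04-12T10:00:00Z' },
  { userId: 1, userName: 'Asha', eventType: 'MODIFY', timestamp: '2026-04-12T10:05:00Z' },
  { userId: 2, userName: 'Ravi', eventType: 'LOGIN', timestamp: '2026-04-12T10:06:00Z' },
]);
console.log(history.length); // 1
console.log(history[0].modifyCount); // 1
```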

sourcery-ai Bot commented Apr 12, 2026

Reviewer's Guide

Integrates a new FastAPI-based dataset insight chatbot microservice and wires it into the existing Node/React app, adds role- and intent-aware logging for chat activity (including data modifications), enhances employee chat UX with model selection, dataset metadata, and access request flows, and extends the admin UI with chatbot history and dataset access approvals.

Sequence diagram for employee chat query through Node to FastAPI chatbot

sequenceDiagram
  actor Employee
  participant Browser as EmployeeChatPage
  participant Api as askQuery_api
  participant NodeRoute as Node_/query_route
  participant ChatCtrl as ChatController_askQuestion
  participant FastAPI as FastAPI_internal_query
  participant ActCtrl as ActivityController_logQueryActivity
  participant DB as Postgres_DB

  Employee->>Browser: Type message and click Send
  Browser->>Browser: handleSend(text)
  Browser->>Api: askQuery(datasetId, text, selectedModel, 'employee', isApproved)

  Api->>NodeRoute: POST /query {datasetId, question, model, portal, isApproved}
  NodeRoute->>ChatCtrl: askQuestion(req, res)

  ChatCtrl->>DB: Resolve dataset dir for datasetId
  DB-->>ChatCtrl: datasetDir

  ChatCtrl->>FastAPI: POST /internal/query {dataset_id, file_dir_path, question, model, role}
  Note right of FastAPI: Extract schema
  FastAPI->>FastAPI: call_llm() to generate pandas code
  FastAPI->>FastAPI: safe_execute(code, df)
  FastAPI->>FastAPI: call_llm() to summarize result
  FastAPI-->>ChatCtrl: {answer, code, intent, confidence}

  ChatCtrl-->>NodeRoute: JSON {success, answer, code, intent}

  NodeRoute->>ActCtrl: logQueryActivity(userId, userName, userEmail, datasetId, datasetName, queryText, status, duration, intent)
  ActCtrl->>DB: INSERT activity log (event_type QUERY or MODIFY)
  DB-->>ActCtrl: ok

  NodeRoute-->>Api: HTTP 200 {answer, code, intent}
  Api-->>Browser: response

  Browser->>Browser: buildResponseHTML({text: answer, code})
  Browser-->>Employee: Render bot message and optional code
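The Node-to-FastAPI hop in this diagram sends five fields to `/internal/query`. A minimal sketch of assembling that body, assuming a hypothetical helper `buildInternalQueryPayload` and lowercase model identifiers `groq`/`openai`/`ollama` (the summary names the three backends but not their exact identifiers). Unlike the PR's `model || "groq"`, unknown models fall back to the default here, and `role` must be supplied by the server.

```javascript
// Hypothetical helper; field names follow the diagram, model ids are assumed.
const ALLOWED_MODELS = new Set(['groq', 'openai', 'ollama']);

function buildInternalQueryPayload({ datasetId, datasetDir, question, model }, serverRole) {
  if (!datasetId || !datasetDir) throw new Error('dataset could not be resolved');
  if (typeof question !== 'string' || question.trim() === '') {
    throw new Error('question must be a non-empty string');
  }
  if (!serverRole) throw new Error('server-side role is required');
  return {
    dataset_id: datasetId,
    file_dir_path: datasetDir,
    question: question.trim(),
    // Unknown or missing models fall back to "groq", the PR's default.
    model: ALLOWED_MODELS.has(model) ? model : 'groq',
    // Determined by the caller from server-side state, never from req.body.
    role: serverRole,
  };
}

const payload = buildInternalQueryPayload(
  { datasetId: '42', datasetDir: '/data/42', question: ' Top region? ', model: 'claude' },
  'employee',
);
console.log(payload.model); // groq
```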

Class diagram for key chatbot-related components and models

classDiagram
  class EmployeeChatPage {
    +datasetMeta
    +selectedModel
    +accessRequests
    +handleSend(text)
    +handleRequestAccess(id)
  }

  class ApiService {
    +askQuery(datasetId, question, model, portal, isApproved)
    +getDatasetAnalysis(datasetId)
  }

  class ChatController {
    +askQuestion(req, res)
  }

  class InternalQueryRequest {
    +string dataset_id
    +string file_dir_path
    +string question
    +string model
    +string role
  }

  class FastAPIApp {
    +query_dataset(req, current_user)
    +internal_query(req)
    +extract_schema(df)
    +safe_execute(code, df)
    +call_llm(model, system, user)
    +role_can(role, action)
  }

  class ActivityController {
    +logQueryActivity(userId, userName, userEmail, datasetId, datasetName, query, status, durationSeconds, intent)
  }

  class LogsPage {
    +activeTab
    +ChatbotHistoryView(logs)
    +getMethodClass(event)
    +getEventIcon(event)
  }

  class PermissionPage {
    +datasetAccessRequests
    +handleDatasetAccessAction(id, newStatus)
  }

  EmployeeChatPage --> ApiService : uses
  EmployeeChatPage --> LogsPage : generates QUERY
  EmployeeChatPage --> PermissionPage : shares datasetAccessRequests via localStorage

  ApiService --> ChatController : calls /query
  ChatController --> FastAPIApp : axios POST /internal/query
  FastAPIApp --> InternalQueryRequest : accepts
  FastAPIApp --> ActivityController : supplies intent modify/insight

  ActivityController --> LogsPage : event_type QUERY or MODIFY
  PermissionPage --> EmployeeChatPage : updates accessRequests state
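The class diagram lists `role_can(role, action)` on the FastAPI app. That check lives in the Python service; purely to illustrate the deny-by-default idea, here is a JavaScript sketch. The roles match the values chatController can send (`admin`, `employee`, `viewer`); the actions and the matrix itself are hypothetical.

```javascript
// Hypothetical role -> allowed-actions matrix. A Map avoids prototype keys
// such as "constructor" being mistaken for a role.
const PERMISSIONS = new Map([
  ['admin', new Set(['query', 'modify', 'upload'])],
  ['employee', new Set(['query', 'upload'])],
  ['viewer', new Set(['query'])],
]);

function roleCan(role, action) {
  // Unknown roles get no permissions (deny by default).
  const allowed = PERMISSIONS.get(role);
  return allowed ? allowed.has(action) : false;
}

console.log(roleCan('employee', 'modify')); // false
```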

File-Level Changes

Change Details Files
Add a dedicated FastAPI-based Dataset Insight Chatbot service that handles dataset upload, schema extraction, LLM-based pandas code generation, sandboxed execution, and query logging, plus a lightweight HTML frontend and Docker setup.
  • Implement main FastAPI app with auth via PostgreSQL, role permissions, dataset upload/versioning, schema extraction, LLM orchestration (Groq/OpenAI/Ollama), sandboxed pandas execution, and detailed query/audit logging.
  • Add an internal /internal/query endpoint that operates without auth or direct DB access for use by the Node.js backend, including role-aware modify permissions and result formatting.
  • Include a static single-page HTML UI for the chatbot, a README documenting architecture and setup, requirements.txt, Dockerfile, and docker-compose configuration for running backend and frontend containers.
dataset-insight-chatbot/backend/main.py
dataset-insight-chatbot/frontend/index.html
dataset-insight-chatbot/README.md
dataset-insight-chatbot/docker-compose.yml
dataset-insight-chatbot/backend/Dockerfile
dataset-insight-chatbot/backend/requirements.txt
dataset-insight-chatbot/.env.example
dataset-insight-chatbot/backend/.gitignore
Refactor the Node chat controller to delegate query handling to the new FastAPI service and enrich activity logging with intent-based differentiation between READ and MODIFY operations.
  • Change chatController.askQuestion to POST JSON payloads to the FastAPI /internal/query endpoint instead of invoking a local Python script via child_process.exec, mapping model, role, and approval flags.
  • Improve error handling in chatController to surface FastAPI connectivity issues and include generated code in responses.
  • Update chatRoutes.wrapWithActivity to look up dataset metadata from the new schema, capture user id/name/email more robustly, and pass intent to logQueryActivity for correct event typing.
  • Extend activityController.logQueryActivity to classify events as QUERY or MODIFY based on LLM intent and adjust event description text accordingly.
  • Align cleanedDataRoutes dataset lookups with the new datasets schema (dataset_name and user join).
backend-node/src/controllers/chatController.js
backend-node/src/routes/chatRoutes.js
backend-node/src/controllers/activityController.js
backend-node/src/routes/cleanedDataRoutes.js
Enhance the employee chat experience with model selection, dataset metadata, permission-aware queries, and in-chat access request flows.
  • Extend EmployeeChatPage to support multiple LLM backends via a model dropdown, fetch dataset analysis metadata, and display dynamic dataset/column info instead of hardcoded values.
  • Update askQuery API to accept model, portal, and approval flags, and forward them to the backend; pass LLM-generated code back into the chat for richer answers.
  • Add a localStorage-based dataset access request mechanism from EmployeeChatPage that lets users request modify access when the assistant denies permission, with polling for admin approval.
  • Improve EmployeeLayout user name handling and initials generation to be more robust against missing or invalid values.
  • Tidy up Employee.css formatting (whitespace and keyframe style consistency).
frontend-react/src/pages/employee/EmployeeChatPage.jsx
frontend-react/src/services/api.js
frontend-react/src/layout/EmployeeLayout.jsx
frontend-react/src/styles/Employee.css
Extend the admin UI to surface chatbot-related activity and handle dataset modification access requests from employees.
  • Add a ChatbotHistoryView component to LogsPage and a tabbed UI between system logs and per-user chatbot history, including special treatment and styling for MODIFY events and expanded logging limits.
  • Update log visualization helpers to handle a new MODIFY event type and to use a chat icon for QUERY while reserving the database icon for MODIFY.
  • Augment PermissionPage to read dataset access requests from localStorage, display them alongside pending user approvals, and allow admins to approve or deny modify access, updating localStorage state.
  • Adjust the system stats on LogsPage to count MODIFY events separately and tweak labeling/colors to reflect chat-specific metrics.
frontend-react/src/pages/admin/LogsPage.jsx
frontend-react/src/pages/admin/PermissionPage.jsx
Improve employee dataset navigation and uploading flow to align with the new chatbot capabilities.
  • Add an Upload Data navigation item to the employee sidebar layout and ensure user display name is resilient to missing profile fields.
  • Expose an Upload Dataset button in the EmployeeDatasetsPage top bar that routes to the employee upload page, aligning with the new dataset upload capabilities.
frontend-react/src/layout/EmployeeLayout.jsx
frontend-react/src/pages/employee/EmployeeDatasetsPage.jsx
Miscellaneous backend tooling and schema-alignment fixes.
  • Add a scratch-test Node script to inspect the datasets table structure via Postgres for debugging.
  • Adjust uploadController to insert into updated datasets columns (dataset_name, name, hash) and generate a synthetic file hash instead of using file_size.
  • Update datasetController.mapDatasetRow to handle new column names and fallbacks for filename and status fields.
  • Update package-lock.json files to reflect dependency-tree changes (no functional modifications).
backend-node/scratch-test.js
backend-node/src/controllers/uploadController.js
backend-node/src/controllers/datasetController.js
backend-node/package-lock.json
frontend-react/package-lock.json
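The uploadController change above generates a synthetic file hash. If the hash is meant to identify file contents (for example, to spot duplicate uploads), deriving it from the bytes is a common alternative. This is a suggested technique, not what the PR does; `contentHash` is a hypothetical name, and `crypto.createHash` is Node's built-in API.

```javascript
const crypto = require('crypto');

// Hash the uploaded bytes so identical files always get identical hashes.
function contentHash(buffer) {
  return crypto.createHash('sha256').update(buffer).digest('hex');
}

console.log(contentHash(Buffer.from('abc')));
// ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

For large uploads, the same hash object can be fed chunk by chunk from a read stream instead of buffering the whole file.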


coderabbitai Bot commented Apr 12, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e5b82849-af79-485a-b877-3e1924ccd4ae

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


sourcery-ai Bot left a comment

Hey - I've found 4 issues and left some high-level feedback:

  • The dataset access approval flow relies entirely on localStorage (datasetAccessRequests) shared between the employee and admin UIs, which won’t work across different browsers/devices and is trivial to tamper with on the client; consider moving this to a backend-backed permission request model instead of treating localStorage as an authority of record.
  • The new scratch-test.js script in backend-node looks like a local debugging helper and is not referenced anywhere; consider removing it from the repo or moving it under a dedicated tooling/dev directory to avoid confusion.
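The first point suggests replacing the `localStorage` handshake with a backend-backed request model. A minimal sketch of the state such a backend would own, assuming a hypothetical `AccessRequestStore` with `pending`/`approved`/`denied` statuses; the in-memory `Map` stands in for a Postgres table, and the approver's role would come from the authenticated session.

```javascript
// Hypothetical server-side access-request store; a Map stands in for a DB table.
class AccessRequestStore {
  constructor() {
    this.requests = new Map(); // key: `${userId}:${datasetId}`
  }

  key(userId, datasetId) {
    return `${userId}:${datasetId}`;
  }

  request(userId, datasetId) {
    const existing = this.requests.get(this.key(userId, datasetId));
    // Re-requesting keeps a pending or approved request as-is; only a denied one is reopened.
    if (existing && existing.status !== 'denied') return existing;
    const record = { userId, datasetId, status: 'pending', decidedBy: null };
    this.requests.set(this.key(userId, datasetId), record);
    return record;
  }

  decide(actor, userId, datasetId, status) {
    // actor comes from the authenticated session, not from the request body.
    if (actor.role !== 'admin') throw new Error('only admins can decide requests');
    if (status !== 'approved' && status !== 'denied') throw new Error('invalid status');
    const record = this.requests.get(this.key(userId, datasetId));
    if (!record) throw new Error('no such request');
    record.status = status;
    record.decidedBy = actor.id;
    return record;
  }

  isApproved(userId, datasetId) {
    return this.requests.get(this.key(userId, datasetId))?.status === 'approved';
  }
}

const store = new AccessRequestStore();
store.request(7, 'sales');
store.decide({ id: 1, role: 'admin' }, 7, 'sales', 'approved');
console.log(store.isApproved(7, 'sales')); // true
```

With this in place, the chat controller can consult `isApproved` when choosing the role it forwards, and the employee page can poll an API endpoint instead of reading `localStorage`.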
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The dataset access approval flow relies entirely on `localStorage` (`datasetAccessRequests`) shared between the employee and admin UIs, which won’t work across different browsers/devices and is trivial to tamper with on the client; consider moving this to a backend-backed permission request model instead of treating localStorage as an authority of record.
- The new `scratch-test.js` script in `backend-node` looks like a local debugging helper and is not referenced anywhere; consider removing it from the repo or moving it under a dedicated tooling/dev directory to avoid confusion.

## Individual Comments

### Comment 1
<location path="frontend-react/src/pages/employee/EmployeeChatPage.jsx" line_range="150-159" />
<code_context>
+  const handleRequestAccess = (id) => {
</code_context>
<issue_to_address>
**issue (bug_risk):** Access request polling interval is never cleaned up, which can leak timers across the session.

Each `handleRequestAccess` call creates a `setInterval` that runs indefinitely unless the request is approved or removed from `localStorage`, so timers can keep firing after the component unmounts or the user navigates away. Please tie the polling lifecycle to React (e.g., store the interval ID and clear it in a `useEffect` cleanup) or move to a single shared interval keyed by dataset/user instead of per-request intervals.
</issue_to_address>

### Comment 2
<location path="backend-node/src/controllers/chatController.js" line_range="63-64" />
<code_context>
+      dataset_id: datasetId,
+      file_dir_path: datasetDir,
+      question: queryText,
+      model: model || "groq",
+      role: req.body.isApproved ? 'admin' : (req.body.portal === 'employee' ? 'employee' : (req.user?.role || "viewer"))
+    };

</code_context>
<issue_to_address>
**🚨 issue (security):** Trusting `isApproved` from the client to escalate role to admin is a privilege escalation risk.

Because `role` is computed directly from `req.body.isApproved`, a client can POST `{ isApproved: true }` to `/query` and obtain `admin` privileges in the Python service, bypassing any UI checks. Since `internal/query` relies solely on `role` to gate modification actions, this is a privilege‑escalation vulnerability. Please derive `role` and permissions only from server‑side state (e.g., `req.user` and persisted permissions) and pass a backend‑controlled `role` value that cannot be influenced by client JSON.
</issue_to_address>

### Comment 3
<location path="dataset-insight-chatbot/backend/Dockerfile" line_range="12" />
<code_context>
+
+EXPOSE 8000
+
+CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--reload"]
</code_context>
<issue_to_address>
**suggestion (performance):** Running uvicorn with `--reload` inside the container is not ideal for non-dev deployments.

`--reload` adds a file watcher that increases overhead and can behave unpredictably with container filesystems. For non-dev images, drop `--reload` and either run a single-process uvicorn or use a process manager (e.g., gunicorn with uvicorn workers). If this is dev-only, consider documenting that or using a separate `Dockerfile.dev`.

Suggested implementation:

```
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

```

1. If this image is intended for development, consider adding a separate `Dockerfile.dev` that keeps `--reload` and documenting in the README which Dockerfile to use for dev vs prod.
2. For production, you may also want to consider a process manager (e.g. `gunicorn -k uvicorn.workers.UvicornWorker`) if you need multiple workers or more advanced process supervision.
</issue_to_address>

### Comment 4
<location path="dataset-insight-chatbot/README.md" line_range="140-7" />
<code_context>
+
+### POST /upload
+
+```json
+// Response
+{
+  "dataset_id": "sales.csv",
+  "schema": {
+    "row_count": 5000,
+    "columns": {
+      "revenue": {
+        "type": "numeric",
+        "min": 100, "max": 95000, "mean": 4820.5, ...
+      },
+      "region": {
+        "type": "categorical",
+        "unique_count": 4,
+        "top_values": ["North", "South", "East", "West"]
+      }
+    }
+  }
+}
+```
+
+### POST /query
</code_context>
<issue_to_address>
**suggestion:** JSON examples include comments/ellipsis but are fenced as strict JSON

In the `/upload` and `/query` sections, the fenced `json` examples contain `//` comments and `...`, which aren’t valid JSON. To avoid confusing users who might paste them into tools, consider changing the fence (e.g., to `jsonc`), or making the examples valid JSON by removing comments/ellipsis or clearly marking them as illustrative only.

Suggested implementation:

````
### POST /upload

```jsonc
// Example response
{
  "dataset_id": "sales.csv",
  "schema": {
    "row_count": 5000,
    "columns": {
      "revenue": {
        "type": "numeric",
        "min": 100,
        "max": 95000,
        "mean": 4820.5
        // ...additional numeric stats
      },
      "region": {
        "type": "categorical",
        "unique_count": 4,
        "top_values": ["North", "South", "East", "West"]
        // ...additional categorical stats
      }
    }
  }
}
```
````

````
### POST /query

```jsonc
// Example request
{
  "dataset_id": "sales.csv",
  "question": "Which region has the highest average revenue?",
  "model": "ollama" // or "claude"
}

// Example response
```
````
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +150 to +159
const handleRequestAccess = (id) => {
setAccessRequests(prev => ({ ...prev, [id]: 'requested' }));

// Save to localStorage so admin can see it
const reqs = JSON.parse(localStorage.getItem('datasetAccessRequests') || '[]');
const user = localStorage.getItem('userName') || 'Employee User';
const email = localStorage.getItem('email') || 'employee@datainsights.app';
reqs.push({
id,
user,

issue (bug_risk): Access request polling interval is never cleaned up, which can leak timers across the session.

Each handleRequestAccess call creates a setInterval that runs indefinitely unless the request is approved or removed from localStorage, so timers can keep firing after the component unmounts or the user navigates away. Please tie the polling lifecycle to React (e.g., store the interval ID and clear it in a useEffect cleanup) or move to a single shared interval keyed by dataset/user instead of per-request intervals.
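One way to follow this suggestion is a single poller per page that watches every pending request and exposes a `stop` function that React can call on cleanup. A minimal sketch, assuming a hypothetical `createApprovalPoller` and an assumed `checkStatus(id)` returning `'approved'`, `'denied'`, or `'pending'` (reading `localStorage` today, or an API in a backend-backed design). The 5-second default interval is arbitrary, and `timers` is injectable so the sketch can be tested without real timers.

```javascript
// Hypothetical single shared poller for pending access requests.
function createApprovalPoller({
  pendingIds,
  checkStatus,
  onResolved,
  intervalMs = 5000,
  // Wrapped so the browser's setInterval is never called with a foreign `this`.
  timers = { set: (fn, ms) => setInterval(fn, ms), clear: (h) => clearInterval(h) },
}) {
  const pending = new Set(pendingIds);
  let handle = null;

  const stop = () => {
    if (handle !== null) {
      timers.clear(handle);
      handle = null;
    }
  };

  const tick = () => {
    for (const id of [...pending]) {
      const status = checkStatus(id);
      if (status === 'approved' || status === 'denied') {
        pending.delete(id);
        onResolved(id, status);
      }
    }
    // Stop polling once nothing is left to wait for.
    if (pending.size === 0) stop();
  };

  if (pending.size > 0) handle = timers.set(tick, intervalMs);
  return { stop, tick };
}

// Sketch of use inside EmployeeChatPage:
// useEffect(() => {
//   const ids = Object.keys(accessRequests).filter((id) => accessRequests[id] === 'requested');
//   const poller = createApprovalPoller({ pendingIds: ids, checkStatus, onResolved });
//   return poller.stop; // runs on unmount and before the effect re-runs
// }, [accessRequests]);
```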

Comment on lines +63 to +64
model: model || "groq",
role: req.body.isApproved ? 'admin' : (req.body.portal === 'employee' ? 'employee' : (req.user?.role || "viewer"))

🚨 issue (security): Trusting isApproved from the client to escalate role to admin is a privilege escalation risk.

Because role is computed directly from req.body.isApproved, a client can POST { isApproved: true } to /query and obtain admin privileges in the Python service, bypassing any UI checks. Since internal/query relies solely on role to gate modification actions, this is a privilege‑escalation vulnerability. Please derive role and permissions only from server‑side state (e.g., req.user and persisted permissions) and pass a backend‑controlled role value that cannot be influenced by client JSON.
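A minimal sketch of deriving the forwarded role purely from server-side state, assuming `user` is the authenticated `req.user` and `isGrantApproved(userId, datasetId)` is a hypothetical lookup against persisted approvals. It mirrors the PR's current choice of sending `admin` for an approved request; a narrower role would limit the blast radius if the Python service supported one.

```javascript
// Hypothetical server-side role resolution; nothing from req.body is read.
const KNOWN_ROLES = new Set(['admin', 'employee', 'viewer']);

function resolveEffectiveRole(user, datasetId, isGrantApproved) {
  // Missing or unrecognised users are treated as read-only viewers.
  if (!user || !KNOWN_ROLES.has(user.role)) return 'viewer';
  if (user.role === 'admin') return 'admin';
  // Only a persisted, admin-approved grant elevates a non-admin for this dataset.
  return isGrantApproved(user.id, datasetId) ? 'admin' : user.role;
}

// Even if the client posts { isApproved: true }, the result depends only on server state.
console.log(resolveEffectiveRole({ id: 7, role: 'employee' }, 'sales', () => false)); // employee
```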


EXPOSE 8000

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--reload"]

suggestion (performance): Running uvicorn with --reload inside the container is not ideal for non-dev deployments.

--reload adds a file watcher that increases overhead and can behave unpredictably with container filesystems. For non-dev images, drop --reload and either run a single-process uvicorn or use a process manager (e.g., gunicorn with uvicorn workers). If this is dev-only, consider documenting that or using a separate Dockerfile.dev.

Suggested implementation:

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

  1. If this image is intended for development, consider adding a separate Dockerfile.dev that keeps --reload and documenting in the README which Dockerfile to use for dev vs prod.
  2. For production, you may also want to consider a process manager (e.g. gunicorn -k uvicorn.workers.UvicornWorker) if you need multiple workers or more advanced process supervision.


suggestion: JSON examples include comments/ellipsis but are fenced as strict JSON

In the /upload and /query sections, the fenced json examples contain // comments and ..., which aren’t valid JSON. To avoid confusing users who might paste them into tools, consider changing the fence (e.g., to jsonc), or making the examples valid JSON by removing comments/ellipsis or clearly marking them as illustrative only.
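The underlying problem is that strict JSON parsers reject both `//` comments and `...` placeholders. A minimal sketch of a check that a docs lint step could run on every block fenced as `json`; `isStrictJson` is a hypothetical name and relies only on the built-in `JSON.parse`.

```javascript
// Returns true only if the text parses as strict JSON (no comments, no "...").
function isStrictJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

console.log(isStrictJson('{"dataset_id": "sales.csv"}')); // true
console.log(isStrictJson('// Example response\n{"row_count": 5000}')); // false
console.log(isStrictJson('{"min": 100, ...}')); // false
```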
