
Extend the predict_fn in the init_model to support multiple output columns #394

Merged
AnilSorathiya merged 22 commits into main from anilsorathiya/sc-11324/extend-the-predict-fn-in-the-init-model-to
Jul 23, 2025

Conversation

Contributor

@AnilSorathiya AnilSorathiya commented Jul 17, 2025

Pull Request Description

This change extends the predict_fn callable parameter in init_model to support multiple output columns:

  • The function can return either a single value or a dictionary.
  • If it returns a dictionary, the value under the prediction key is captured as the prediction column.
  • The remaining dictionary keys are added as extra columns in the dataset object.

What and why?

Currently, the predict_fn call only stores its output in a single prediction column, so intermediate outputs from the function call can't be stored for traceability.
This PR implements the following:

  • The function can return either a single value or a dictionary.
  • If it returns a dictionary, the value under the prediction key is captured as the prediction column.
  • The remaining dictionary keys are added as extra columns in the dataset object.
    This allows intermediate outputs from LangGraph/workflow runs to be included when invoking assign_predictions. The additional columns are stored in the VM dataset object for enhanced traceability and further analysis in LLM use cases (see the usage sketch below).
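
A minimal usage sketch of the new behavior, assuming the usual vm.init_dataset / vm.init_model / assign_predictions entry points and a per-row predict_fn signature; the input_id values, column names, and the stand-in agent logic are illustrative and not taken from this PR:

```python
import pandas as pd
import validmind as vm

# Hypothetical predict_fn: returns a dictionary instead of a single value.
# The value under the "prediction" key becomes the prediction column; the
# remaining keys are added as extra columns in the dataset object.
def predict_fn(row):
    answer = f"echo: {row['question']}"  # stand-in for a real LangGraph/agent call
    return {
        "prediction": answer,
        "raw_output": {"answer": answer},
        "tools_used": ["search_tool"],
    }

df = pd.DataFrame({"question": ["What is model risk?"], "target": ["definition"]})

# input_id values below are illustrative
vm_dataset = vm.init_dataset(dataset=df, input_id="demo_dataset", target_column="target")
vm_model = vm.init_model(input_id="demo_model", predict_fn=predict_fn)

# The extra dictionary keys ("raw_output", "tools_used") should appear as
# additional columns alongside the prediction column.
vm_dataset.assign_predictions(model=vm_model)
```

Plain (non-dictionary) return values keep the existing behavior of a single prediction column.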

How to test

Run the following notebooks:

  • notebooks/agents/langchain_agent_simple_demo.ipynb
  • notebooks/agents/langgraph_agent_demo.ipynb
  • notebooks/agents/langgraph_agent_simple_demo.ipynb
  • notebooks/code_samples/nlp_and_llm/rag_documentation_demo.ipynb
  • notebooks/code_samples/nlp_and_llm/rag_benchmark_demo.ipynb
  • notebooks/quickstart/quickstart_model_documentation.ipynb
  • notebooks/code_samples/credit_risk/application_scorecard_full_suite.ipynb

What needs special review?

Dependencies, breaking changes, and deployment notes

Release notes

Checklist

  • What and why
  • Screenshots or videos (Frontend)
  • How to test
  • What needs special review
  • Dependencies, breaking changes, and deployment notes
  • Labels applied
  • PR linked to Shortcut
  • Unit tests added (Backend)
  • Tested locally
  • Documentation updated (if required)
  • Environment variable additions/changes documented (if required)

@AnilSorathiya AnilSorathiya added the documentation and enhancement labels on Jul 17, 2025
@AnilSorathiya AnilSorathiya removed the enhancement label on Jul 18, 2025
@AnilSorathiya AnilSorathiya marked this pull request as ready for review July 18, 2025 18:15
Contributor

@cachafla cachafla left a comment


Looks good 🙌. I suggest testing traditional notebooks (quickstart model doc, quickstart regression, etc.) to ensure the existing code continues to work without issues.

Contributor

@juanmleng juanmleng left a comment


LGTM! One of the key notebooks to test is the credit scorecard notebook application_scorecard_full_suite.ipynb.

@github-actions
Contributor

PR Summary

This PR introduces several key enhancements and refactors across the project:

  1. LLM Agent and Notebook Updates:

    • The notebooks now demonstrate an improved LLM-powered tool selection router with enhanced markdown documentation explaining the benefits of intelligent tool routing.
    • The agent functions have been updated to return a structured dictionary that includes the final prediction, raw output, and a list of tools used. Minor formatting changes and refined docstrings make the demo clearer and more instructive.
  2. Utility Module Refactoring:

    • The langchain utility module has been streamlined by removing redundant helper functions (such as extra tool extraction and formatting routines) and consolidating functionality to capture only essential tool output data.
    • Import paths have been updated to reflect the new structure, improving code clarity and reducing unused imports.
  3. Enhanced Dataset Prediction Assignment:

    • The dataset module now supports diverse prediction outputs, including simple values as well as dictionary responses. New helper functions (_handle_deprecated_parameters, _handle_dictionary_predictions, etc.) modularize the process of adding prediction and probability columns (a standalone sketch of the dictionary handling follows this summary).
    • The prediction assignment workflow now properly distinguishes between computed predictions and precomputed prediction values, ensuring correct column naming and consistent data assignment.
    • Comprehensive new unit tests in tests/test_dataset.py cover scenarios for classification, regression, complex dictionary outputs, multiple models, and error-handling (e.g. invalid predict_fn), ensuring robustness of the new functionality.

Overall, these changes enhance the robustness, clarity, and flexibility of both the LLM agent routing demos and the dataset prediction functionality without altering core business logic.
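
A standalone sketch of the dictionary-splitting idea from item 3 above, using plain pandas; the helper name split_prediction_outputs and the default column name are illustrative and not the library's internal API:

```python
from typing import Any, Dict, List, Union
import pandas as pd

def split_prediction_outputs(
    outputs: List[Union[Any, Dict[str, Any]]],
    prediction_column: str = "model_prediction",
) -> pd.DataFrame:
    """Split predict_fn outputs into a prediction column plus extra columns.

    Plain values map straight to the prediction column; dictionaries must
    contain a "prediction" key, and the remaining keys become extra columns.
    """
    rows = []
    for out in outputs:
        if isinstance(out, dict):
            if "prediction" not in out:
                raise ValueError("dictionary outputs must include a 'prediction' key")
            row = {prediction_column: out["prediction"]}
            row.update({k: v for k, v in out.items() if k != "prediction"})
        else:
            row = {prediction_column: out}
        rows.append(row)
    return pd.DataFrame(rows)

# Example: a dictionary output yields a prediction column plus extra columns
outputs = [{"prediction": 1, "tools_used": ["search"], "raw_output": "..."}]
print(split_prediction_outputs(outputs))
```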

Test Suggestions

  • Run all new and existing unit tests in tests/test_dataset.py to ensure all prediction scenarios (classification, regression, complex outputs) are correctly handled (a sketch of such a test follows this list).
  • Perform integration testing of the LLM agent flow to verify that the tool routing correctly identifies and logs the correct tool usage.
  • Simulate edge cases including empty tool outputs and invalid prediction functions to check for proper error handling and warnings.
  • Validate that the structured return values (including the 'tools_used' list) are correctly propagated to downstream processing.
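
A hedged sketch of the kind of unit test suggested in the first bullet, assuming the same entry points as above; the .df accessor and the expectation that extra keys surface under their own column names are assumptions, and the test is not copied from tests/test_dataset.py:

```python
import pandas as pd
import validmind as vm

def dict_predict_fn(row):
    # Illustrative classification output plus extra traceability fields
    return {"prediction": 1, "confidence": 0.9, "tools_used": ["rules"]}

def test_assign_predictions_with_dictionary_output():
    df = pd.DataFrame({"feature": [0.1, 0.2], "target": [0, 1]})
    vm_dataset = vm.init_dataset(dataset=df, input_id="test_dataset", target_column="target")
    vm_model = vm.init_model(input_id="dict_model", predict_fn=dict_predict_fn)

    vm_dataset.assign_predictions(model=vm_model)

    # Extra dictionary keys are expected to surface as additional dataset columns
    assert "confidence" in vm_dataset.df.columns
    assert "tools_used" in vm_dataset.df.columns
```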

@AnilSorathiya AnilSorathiya merged commit 3454dda into main Jul 23, 2025
7 checks passed
@AnilSorathiya AnilSorathiya deleted the anilsorathiya/sc-11324/extend-the-predict-fn-in-the-init-model-to branch July 23, 2025 12:23
