
Draft #10

Open
Kumario1 wants to merge 6 commits into akash-network:main from Kumario1:prince

Conversation


@Kumario1 commented Mar 9, 2025

Added file attachment support for images. (PDF and more to come later)

Vision Model Integration for AkashChat using Ollama's LLaVA

This feature lets users upload images to the chat and have them analyzed. Images are processed by Ollama's LLaVA vision model, and the resulting analysis is prepended to the user's message as context before being sent to the main AI model.

How It Works

  1. The user uploads an image by clicking the upload button in the chat input area
  2. The file is displayed as a preview with an option to remove it
  3. The user types their question about the file
  4. When the user sends the message:
    • Images are processed using Ollama's LLaVA model
  5. The analysis is prepended to the user's message as context
  6. The combined message is sent to the main AI model API
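Steps 5 and 6 can be sketched as a small helper that assembles the combined message. This is an illustrative sketch, not the PR's actual code; the function and field names are hypothetical.

```typescript
// Hypothetical shape of a vision result returned by the analysis step.
interface VisionResult {
  analysis: string;
}

// Prepend the vision analysis to the user's message so the main AI model
// receives the image description as leading context.
function buildMessageWithContext(userMessage: string, vision: VisionResult | null): string {
  if (!vision || vision.analysis.trim() === "") {
    return userMessage; // no image attached: send the message unchanged
  }
  return `[Image analysis]: ${vision.analysis}\n\n${userMessage}`;
}
```

The combined string is what gets sent to the main AI model API in place of the raw user message.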

Components

  • ImageUploadButton.tsx: A reusable component for the file upload button
  • ChatInput.tsx: Modified to include file upload and processing
  • pages/api/vision.ts: API endpoint that processes images using Ollama's LLaVA model
  • utils/app/vision.ts: Handles file uploads and conversions

New Dependencies

  • axios: For making HTTP requests to the Ollama API

Configuration

  1. Install and set up Ollama:

    • Download and install Ollama from https://ollama.ai/
    • Pull the LLaVA model: ollama pull llava
    • Make sure Ollama is running on the default port (11434)
  2. If your Ollama server is running on a different machine or port, update the API endpoint URL in pages/api/vision.ts.
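Rather than editing the URL in pages/api/vision.ts by hand, the endpoint could be read from an environment variable. This is a hypothetical alternative; `OLLAMA_BASE_URL` is an assumed variable name, not something the PR defines.

```typescript
// Resolve the Ollama endpoint from the environment, falling back to the
// default local port. In an API route this would be called with process.env.
function getOllamaBaseUrl(env: Record<string, string | undefined>): string {
  return env.OLLAMA_BASE_URL ?? "http://localhost:11434";
}
```

This keeps deployments against a remote Ollama server configuration-only, with no code changes required.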

Limitations

  • Vision analysis accuracy depends on the quality of the image and the capabilities of the LLaVA model
  • Large files may take longer to process
  • The Ollama server must be running for the vision analysis to work
  • Processing large files may require significant computational resources

zJuuu added a commit that referenced this pull request Mar 31, 2026
* feat: user accounts + shortcuts

+ new tos, privacy notice
+ cookie banner
+ enhance chat context and keyboard shortcuts functionality
+ Added temperature and top-p settings with local storage persistence in ChatContext.
+ Updated ChatProvider to retrieve and save temperature and top-p values.
+ Integrated keyboard shortcuts for various actions in MainLayout and ChatSidebar.
+ Enhanced chat components to support new props for messages and context files.
+ Added KeyboardShortcutsModal for improved user experience in managing shortcuts.

* fix: merge err

* feat: enhance folder management and data synchronization

+ tos, privacy notice

* feat: keyboard shortcuts improvement

* chore: add openai-gpt-oss-120b

* chore: merge errors

* feat: Add GPT-OSS and DeepSeek V3.1 (#7)

* fix: fix OpenAI provider with reasoning content injection in streaming responses

* fix: update fallback model ID and add deploy URL for Akash template

* fix: model availability logic

* fix: filter available models for display and set revalidation period

* fix: enhance model availability checks and set revalidation period for static pages

* fix: simplify model availability checks by removing proxy and chatapi conditions

* fix: implement model ID mapping for API calls and enhance model availability checks

* fix: uncomment cached models retrieval in getAvailableModels function

* refactor: improve readability of model availability checks in getAvailableModels function

* feat: add DeepSeek V3.1 model with enhanced capabilities and update fallback model ID

* chore: comment out deploy URL for DeepSeek V3.1

* chore: cookies

* feat: Add Hermes 4 405B

* fix: uncomment deploy URL for Hermes 4 in model configuration

* chore: ua2

+ account creation

* feat: models from db

* fix: Fix DeepSeek V3.1 thinking and add models (#8)

* fix: DeepSeek V3.1 missing chat_template args

* feat: Add Qwen3 Next 80B A3B model

* refactor: Update Qwen3 Next 80B A3B model details

* chore: comment out deploy URL for Qwen3 Next 80B A3B model

* chore: Improve private mode chat handling and toggling

Adds handlePrivateModeToggle to ChatContext for centralized private mode toggling. Ensures private chats are only cleaned up when private mode is enabled and converts private chats to regular chats when disabling private mode with existing messages. Updates components to use the new toggle handler and prevents toggling private mode if messages exist and not in private mode.
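The toggle rules described above can be modeled as a small decision function. This is an illustrative sketch of the policy, not the PR's actual ChatContext API; all names are hypothetical.

```typescript
interface ToggleResult {
  allowed: boolean;                      // may the toggle proceed?
  convertPrivateChatsToRegular: boolean; // keep the conversation by converting it
}

// privateModeOn: current state; hasMessages: whether the chat has messages.
function evaluatePrivateModeToggle(privateModeOn: boolean, hasMessages: boolean): ToggleResult {
  if (!privateModeOn && hasMessages) {
    // Enabling private mode mid-conversation is blocked.
    return { allowed: false, convertPrivateChatsToRegular: false };
  }
  if (privateModeOn && hasMessages) {
    // Disabling private mode converts private chats to regular chats.
    return { allowed: true, convertPrivateChatsToRegular: true };
  }
  // Empty chat: toggling either way is harmless.
  return { allowed: true, convertPrivateChatsToRegular: false };
}
```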

chore: custom auth0 domain support

chore: Add local storage cleanup utility

Introduced lib/local-storage-manager.ts to monitor and clean up localStorage when usage exceeds a 4MB threshold, prioritizing removal of private chats and pruning old chat messages. Integrated the cleanup check into ChatProvider initialization to help prevent storage quota issues and improve app reliability.
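The cleanup policy described above (4MB threshold, private chats evicted first, then oldest chats) could look roughly like this. The threshold and priorities come from the commit message; the function and type names are hypothetical, not the actual lib/local-storage-manager.ts code.

```typescript
const STORAGE_LIMIT_BYTES = 4 * 1024 * 1024; // 4MB threshold from the commit message

interface StoredChat {
  id: string;
  isPrivate: boolean;
  updatedAt: number; // epoch ms of last activity
}

// Returns the ids of chats to evict: private chats first, oldest first
// within each group, until usage drops back under the threshold.
// sizeOf maps a chat id to its serialized size in bytes.
function selectChatsToEvict(
  chats: StoredChat[],
  usedBytes: number,
  sizeOf: (id: string) => number
): string[] {
  if (usedBytes <= STORAGE_LIMIT_BYTES) return [];
  const ordered = [...chats].sort(
    (a, b) => Number(b.isPrivate) - Number(a.isPrivate) || a.updatedAt - b.updatedAt
  );
  const evict: string[] = [];
  let freed = 0;
  for (const chat of ordered) {
    if (usedBytes - freed <= STORAGE_LIMIT_BYTES) break;
    evict.push(chat.id);
    freed += sizeOf(chat.id);
  }
  return evict;
}
```

Running a check like this on provider initialization is what keeps the app from hitting the browser's localStorage quota.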

* chore: Integrate storage error handling and improve local storage management

* chore: lint

* chore: rate limits from db

+ akashchat migration

* chore: new akash handle

* feat: migrate to akashml

* chore: use static token first

* chore: improve decryption

* chore: use modelid from db

* chore: update dependencies

* chore: bump next

* feat: dev user

+ db access through functions

* chore: make envs optional (#10)

for standalone usage with akashml api key

* fix: gh action

* chore: gh workflow remove version bump step

* chore: update package dependencies and versions

* fix: handle no available WebSocket endpoints gracefully

---------

Co-authored-by: nick134 <nick134-bit@proton.me>