
Neuro-Cond

A containerized application designed to run small and efficient Large Language Models (LLMs) for code generation on GPUs with limited VRAM (4GB or less).

Overview

Neuro-Cond combines several quantization techniques (GGUF, GPTQ, BitsAndBytes) to keep inference fast, and supports CPU-GPU offloading so that larger models can still run within a small VRAM budget. The goal is to make code-specialized LLMs usable on consumer-grade hardware.

Features

  • Run code generation models on GPUs with as little as 4GB VRAM
  • Support for multiple quantization methods (GGUF, GPTQ, BitsAndBytes)
  • CPU-GPU offloading for larger models (a loading sketch follows this list)
  • FastAPI backend with efficient batch processing
  • React frontend with Tailwind CSS
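
The repository's loading code is not shown in this README, so the sketch below only illustrates how 4-bit BitsAndBytes quantization and CPU-GPU offloading are commonly combined with Hugging Face transformers; the model choice and memory caps are assumptions, not values taken from this project.

# Minimal sketch (not the repository's actual code): load a supported
# model in 4-bit with BitsAndBytes and let accelerate offload any layers
# that do not fit into a ~4GB GPU onto the CPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen2.5-Coder-3B"  # one of the supported models

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.float16,  # run compute in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                       # place layers automatically
    max_memory={0: "3GiB", "cpu": "16GiB"},  # keep GPU usage under 4GB
)

With max_memory capped below the card's 4GB, accelerate keeps as many layers as fit on the GPU and serves the remainder from CPU RAM, which is what makes 7B-class models feasible on this hardware.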

Supported Models

  • deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct (MoE; 2.4B active parameters)
  • deepseek-ai/DeepSeek-Coder-V2-Lite-Base (MoE; 2.4B active parameters)
  • google/codegemma-7b-GGUF (7B parameters; see the GGUF sketch after this list)
  • meta-llama/CodeLlama-7b-hf (7B parameters)
  • Qwen/Qwen2.5-Coder-3B (3B parameters)
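
GGUF models such as the codegemma entry above are typically served through llama.cpp rather than transformers. The README does not say which runtime Neuro-Cond uses, so the following llama-cpp-python sketch is an assumption; the file path and n_gpu_layers value are placeholders.

# Hypothetical GGUF loading via llama-cpp-python; the model path and
# n_gpu_layers value are placeholders, not taken from this repository.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/codegemma-7b-Q4_K_M.gguf",  # assumed local file
    n_gpu_layers=20,  # offload 20 layers to the GPU, keep the rest on CPU
    n_ctx=2048,       # context window
)

out = llm("Write a Python function that reverses a string.", max_tokens=128)
print(out["choices"][0]["text"])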

System Requirements

  • Host Operating System: Windows (via WSL 2) or Linux (Ubuntu 20.04+)
  • Docker Runtime: Docker with NVIDIA Container Toolkit enabled
  • GPU Support: NVIDIA GPU with CUDA (Compute Capability 7.0+ recommended; a quick check follows this list)
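
To confirm that CUDA and the NVIDIA Container Toolkit are working, you can query PyTorch from inside the container; this assumes PyTorch is installed in the image, which a transformers-based backend would require.

# Confirm CUDA is visible and check the card's compute capability.
import torch

print(torch.cuda.is_available())            # should print True
print(torch.cuda.get_device_name(0))        # your NVIDIA card's name
print(torch.cuda.get_device_capability(0))  # (7, 0) or higher recommended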

Getting Started

Running with Docker

# Build the Docker image
docker build -t neuro-cond .

# Run the container with GPU support
docker run --gpus all -p 8000:8000 neuro-cond

Access the Application

Open a web browser and navigate to http://localhost:8000 to access the Neuro-Cond user interface.
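
The README does not document the backend's HTTP routes. If you want to call the API directly instead of using the web UI, a request would look roughly like the sketch below; the /generate path and JSON fields are hypothetical and should be replaced with the routes the FastAPI backend actually exposes.

# Hypothetical request to the FastAPI backend; the /generate route and
# payload fields are assumptions, not documented in this README.
import requests

resp = requests.post(
    "http://localhost:8000/generate",  # assumed endpoint
    json={"prompt": "def fibonacci(n):", "max_tokens": 128},
)
print(resp.json())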

License

See the LICENSE file for details.
