Enhancements to Embeddings: latency-optimized / debugging local model #12

A somewhat creative idea: possibly an interesting business concept, and at the very least a unique selling point. No other embedding provider offers this, apart from a hacky do-it-yourself setup built on Hugging Face.

For batch size one:
- query mode
- debugging/testing
- unusual deployments into environments with segregated networks, or places where you cannot provide your `ACCESS_TOKEN`; there it would be useful to run e.g. a BERT model locally

I would suggest adding a `base` (not fine-tuned) encoder model (`bge-large`) with a SentenceTransformers-like setup (`ONNX`-cpu or `CTranslate2`-cpu, neither of which requires `torch`). Users could then switch between local mode and API mode.

`pip install gradientai[local-embedder]`
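
To make the local/API switch more concrete, here is a minimal sketch of how the mode selection might look. Everything in it is hypothetical and not part of the existing `gradientai` package: the names `Embedder`, `LocalEmbedder`, `ApiEmbedder`, and `make_embedder` are made up for illustration, and the actual encoding backends (an ONNX or CTranslate2 CPU model locally, the hosted endpoint remotely) are passed in as plain callables rather than implemented. The only name taken from above is the `ACCESS_TOKEN` environment variable.

```python
import os
from dataclasses import dataclass
from typing import Callable, List, Mapping, Optional, Protocol, Sequence

Vector = List[float]


class Embedder(Protocol):
    """Hypothetical common interface for both modes."""

    def embed(self, texts: Sequence[str]) -> List[Vector]: ...


@dataclass
class LocalEmbedder:
    # Assumption: `encode` wraps a CPU-only backend (e.g. ONNX or CTranslate2).
    encode: Callable[[Sequence[str]], List[Vector]]

    def embed(self, texts: Sequence[str]) -> List[Vector]:
        return self.encode(list(texts))


@dataclass
class ApiEmbedder:
    # Assumption: `request` performs the authenticated call to the hosted API.
    request: Callable[[Sequence[str], str], List[Vector]]
    access_token: str

    def embed(self, texts: Sequence[str]) -> List[Vector]:
        return self.request(list(texts), self.access_token)


def make_embedder(
    mode: str,
    *,
    local_encode: Callable[[Sequence[str]], List[Vector]],
    api_request: Callable[[Sequence[str], str], List[Vector]],
    env: Optional[Mapping[str, str]] = None,
) -> Embedder:
    """Pick a backend: "local", "api", or "auto" (local when no token is set)."""
    if mode not in ("local", "api", "auto"):
        raise ValueError(f"unknown mode: {mode!r}")
    env = os.environ if env is None else env
    token = env.get("ACCESS_TOKEN")
    if mode == "local" or (mode == "auto" and not token):
        return LocalEmbedder(local_encode)
    if not token:
        raise RuntimeError("API mode requires ACCESS_TOKEN to be set")
    return ApiEmbedder(api_request, token)
```

With an `"auto"` mode like this, the same calling code would fall back to the local model in a segregated network where no token is available, and use the API everywhere else.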
