Merged
32 changes: 22 additions & 10 deletions docs/solution/analytics/index.md
@@ -5,15 +5,26 @@
CrateDB provides real-time analytics on raw data stored for the long term.
:::

Some domains of real-time analytics absolutely require access to all
the records and can't live with any down-sampled variants, because records are
unique and need to be accounted for within your analytics queries.

If you find yourself in such a situation, you need a storage system which
manages all the high-volume data in its hot zone, so that it is available at
your fingertips for live querying. Batch jobs to roll up raw data into
analytical results are not an option, because users' queries are too
individual, so you need to run them on real data in real time.
CrateDB eliminates the trade-off between data accessibility and storage costs
by keeping all high-volume raw data in the hot zone without requiring
downsampling or aggregation. Unlike traditional systems that force you to
choose between real-time query capabilities and long-term retention,
CrateDB handles billions of records while maintaining fast query
performance on the full dataset.

Traditional analytics pipelines rely on pre-aggregated rollups or batch
processing to handle query loads, limiting users to predefined metrics
and losing the granularity needed for ad-hoc analysis. CrateDB's
distributed architecture scales horizontally to support
exploratory queries on complete raw datasets in near real time, enabling
analysts to discover insights that would be invisible in downsampled data.
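
The effect of down-sampling on ad-hoc analysis can be illustrated with a small, self-contained Python sketch. The readings, bucket size, and threshold are made-up values, and the helper names are hypothetical. The snippet does not use CrateDB; it only shows why a rollup can hide an event that the raw records reveal.

```python
from statistics import mean

# Hypothetical raw sensor readings as (minute, temperature) pairs.
# All values are 20.0 except for one short spike at minute 7.
raw = [(minute, 20.0) for minute in range(10)]
raw[7] = (7, 95.0)


def downsample(readings, bucket_size):
    """Average readings into fixed-size buckets, keyed by bucket start."""
    buckets = {}
    for minute, value in readings:
        start = minute - minute % bucket_size
        buckets.setdefault(start, []).append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


def above(readings, threshold):
    """Return all (key, value) pairs whose value exceeds the threshold."""
    return [(key, value) for key, value in readings if value > threshold]


rollup = downsample(raw, bucket_size=5)

# The same ad-hoc question, asked of the raw data and of the rollup.
raw_alerts = above(raw, 90.0)
rollup_alerts = above(rollup.items(), 90.0)

print(rollup)         # per-bucket averages
print(raw_alerts)     # the spike is visible in the raw data
print(rollup_alerts)  # the spike is averaged away in the rollup
```

In this sketch the five-minute averages stay well below the threshold, so a query against the rollup finds nothing, while the same query against the raw records finds the spike.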

By keeping all records immediately available for querying, you avoid the
complexity of maintaining separate hot and cold storage tiers, ETL
pipelines for aggregation, or data movement processes. Your analytics
queries run directly on raw data across any time range, delivering the
accuracy and flexibility that business intelligence and data science
teams require.

With CrateDB, which is compatible with PostgreSQL, you can do all of that using plain SQL.
Besides integrating well with commodity systems using standard database
@@ -31,8 +42,9 @@ on top.
:columns: 12 6 3 3

- {ref}`timeseries`
- {ref}`machine-learning`
- {ref}`longterm`
- {ref}`industrial`
- {ref}`machine-learning`
+++
Related topics in the same area.
::::
7 changes: 6 additions & 1 deletion docs/solution/index.md
@@ -3,6 +3,12 @@

# Solutions and use cases

:::{div} sd-text-muted
CrateDB is a distributed and scalable SQL database for storing and analyzing
massive amounts of data in near real-time, even with complex queries. It is
PostgreSQL-compatible, and based on Lucene.
:::

:::{toctree}
:hidden:
time-series/index
@@ -12,7 +18,6 @@ analytics/index
machine-learning/index
:::


## Explanations

:::{div} sd-text-muted
5 changes: 5 additions & 0 deletions docs/solution/industrial/index.md
@@ -80,6 +80,11 @@ production systems in manufacturing, shipping, fulfillment, and logistics.

:::::

:Related:
{ref}`analytics` •
{ref}`longterm-store` •
{ref}`machine-learning`

:Tags:
{tags-primary}`Data Historian`
{tags-primary}`Industrial IoT`
1 change: 1 addition & 0 deletions docs/solution/longterm/index.md
@@ -1,3 +1,4 @@
(longterm)=
(longterm-store)=
(timeseries-longterm)=
(timeseries-long-term-storage)=
25 changes: 25 additions & 0 deletions docs/solution/machine-learning/index.md
@@ -11,6 +11,31 @@ CrateDB provides a vector type natively, and adapters for integrating
with machine learning frameworks.
:::

Modern AI and machine learning applications demand efficient storage and
retrieval of high-dimensional vectors, seamless integration with ML frameworks,
and the ability to combine traditional analytics with semantic search capabilities.
From retrieval-augmented generation (RAG) systems to predictive maintenance models,
organizations need a unified platform that handles vector embeddings, training datasets,
and production model artifacts without juggling multiple specialized systems.

CrateDB unifies vector search, time series analysis, and ML operations in a single
platform. Store and query high-dimensional embeddings using native FLOAT_VECTOR support
with HNSW-based similarity search, integrate directly with LangChain and LlamaIndex for
AI applications, and leverage MLflow and PyCaret for end-to-end MLOps workflows. Whether
you're building semantic search engines, training forecasting models on large time series
datasets, or implementing hybrid search combining full-text and vector similarity, CrateDB
eliminates data movement and infrastructure complexity.
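
To make the idea of similarity search concrete, here is a minimal Python sketch of an exact nearest-neighbour lookup over a handful of embeddings. It is an illustration only and not how CrateDB implements search: it assumes Euclidean distance as the similarity measure, scans every row, and uses made-up three-dimensional vectors and hypothetical names. An HNSW index, as mentioned above, exists precisely to avoid such a full scan by searching approximately.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, rows, k):
    """Return the k rows whose embedding is closest to the query (exact scan)."""
    return sorted(rows, key=lambda row: euclidean(query, row[1]))[:k]


# Hypothetical (name, embedding) rows; real embeddings have far more dimensions.
docs = [
    ("manual", [0.9, 0.1, 0.0]),
    ("invoice", [0.0, 0.2, 0.9]),
    ("guide", [0.8, 0.3, 0.1]),
]

top = [name for name, _ in nearest([1.0, 0.0, 0.0], docs, k=2)]
print(top)
```

The brute-force scan costs time proportional to the number of rows, which is why vector stores rely on an index structure for large collections.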

By keeping vector embeddings, training data, and model metadata in one queryable system,
you avoid the overhead of synchronizing between specialized vector databases, data lakes,
and model registries. Your ML pipelines remain agile, your queries span structured and
vector data seamlessly, and your infrastructure stays lean.

With CrateDB, which is compatible with PostgreSQL, you can do all of that using plain SQL.
Besides integrating well with commodity systems using standard database
access interfaces like ODBC or JDBC, it provides a proprietary HTTP interface
on top.

## Vector store

:::{div}
27 changes: 24 additions & 3 deletions docs/solution/time-series/index.md
@@ -5,9 +5,30 @@
Use CrateDB to store and query massive amounts of time series data.
:::

CrateDB is a distributed and scalable SQL database for storing and analyzing
massive amounts of data in near real-time, even with complex queries. It is
PostgreSQL-compatible, and based on Lucene.
Time series data represents one of the fastest-growing data types across industries,
from IoT sensors and industrial equipment to application metrics and financial transactions.
The challenge lies not just in handling the sheer volume of incoming data points, but in
maintaining query performance across both real-time streams and historical datasets while
managing storage costs effectively.

Traditional databases struggle with the unique characteristics of time series workloads:
high write throughput, time-based queries spanning variable ranges, the need for downsampling
and aggregation, and retention policies that balance storage with analytical requirements.
Many organizations find themselves cobbling together multiple systems—one for ingestion,
another for querying, and yet another for long-term storage—creating operational complexity
and data silos.

CrateDB handles time series data natively through its distributed architecture, combining
high-speed ingestion with powerful SQL analytics across any time range. Its partitioning
capabilities enable efficient data lifecycle management, while built-in functions for
downsampling, interpolation, and time-window operations simplify complex analytical tasks.
You can query billions of data points in seconds, whether analyzing recent trends or exploring
patterns across years of historical data.

With CrateDB, which is compatible with PostgreSQL, you can do all of that using plain SQL.
Besides integrating well with commodity systems using standard database
access interfaces like ODBC or JDBC, it provides a proprietary HTTP interface
on top.
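
As a rough sketch of what talking to the HTTP interface can look like, the following Python snippet builds a request for CrateDB's `/_sql` endpoint, which accepts a JSON body with a `stmt` field and optional `args`. The host and port (`localhost:4200`), the table `sensor_readings`, and its columns `ts` and `temperature` are assumptions for illustration, and `build_sql_request` is a hypothetical helper. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_sql_request(stmt, args=None, base_url="http://localhost:4200"):
    """Build (but do not send) a POST request for the /_sql endpoint."""
    payload = {"stmt": stmt}
    if args is not None:
        payload["args"] = args
    return urllib.request.Request(
        url=f"{base_url}/_sql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hourly averages over a hypothetical time series table.
request = build_sql_request(
    "SELECT DATE_TRUNC('hour', ts) AS hour, AVG(temperature) AS avg_temp "
    "FROM sensor_readings WHERE ts >= ? "
    "GROUP BY DATE_TRUNC('hour', ts) ORDER BY hour",
    args=["2024-01-01"],
)

print(request.full_url)
print(request.data.decode("utf-8"))

# To run it against a reachable cluster (assumption: one is listening locally):
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Passing values through `args` instead of formatting them into the statement keeps the SQL text constant and avoids quoting mistakes.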

::::{grid} 1 2 2 2
:margin: 4 4 0 0