From 859ff3ba0915a0c6ab2acd845578c8c44d161fb0 Mon Sep 17 00:00:00 2001
From: Andreas Motl
Date: Sat, 25 Oct 2025 00:51:34 +0200
Subject: [PATCH 1/4] Solutions: Add explanation text `todo` placeholder items

---
 docs/solution/analytics/index.md        | 9 +++++++++
 docs/solution/machine-learning/index.md | 9 +++++++++
 docs/solution/time-series/index.md      | 9 +++++++++
 3 files changed, 27 insertions(+)

diff --git a/docs/solution/analytics/index.md b/docs/solution/analytics/index.md
index 2519aff8..44a18d62 100644
--- a/docs/solution/analytics/index.md
+++ b/docs/solution/analytics/index.md
@@ -15,6 +15,15 @@ your fingertips, for live querying. Batch jobs to roll up raw data into
 analytical results are not an option, because users' queries are too
 individual, so you need to run them on real data in real time.
 
+:::{todo}
+**Instructions:**
+Elaborate a bit longer about the topic domain and the ingredients of this section
+in an abstract way, concisely highlighting and summarizing relevant benefits,
+like the `../analytics/index`, `../industrial/index`, and `../longterm/index`
+pages are doing it already.
+Use concise language, active voice, and avoid yapping.
+:::
+
 With CrateDB, compatible to PostgreSQL, you can do all of that using plain SQL.
 Other than integrating well with commodity systems using standard database
 access interfaces like ODBC or JDBC, it provides a proprietary HTTP interface
diff --git a/docs/solution/machine-learning/index.md b/docs/solution/machine-learning/index.md
index 936521e7..25d45cdf 100644
--- a/docs/solution/machine-learning/index.md
+++ b/docs/solution/machine-learning/index.md
@@ -11,6 +11,15 @@ CrateDB provides a vector type natively, and adapters for integrating with
 machine learning frameworks.
 :::
 
+:::{todo}
+**Instructions:**
+Elaborate a bit longer about the topic domain and the ingredients of this section
+in an abstract way, concisely highlighting and summarizing relevant benefits,
+like the `../analytics/index`, `../industrial/index`, and `../longterm/index`
+pages are doing it already.
+Use concise language, active voice, and avoid yapping.
+:::
+
 ## Vector store
 
 :::{div}
diff --git a/docs/solution/time-series/index.md b/docs/solution/time-series/index.md
index 8b387e9d..a5212965 100644
--- a/docs/solution/time-series/index.md
+++ b/docs/solution/time-series/index.md
@@ -9,6 +9,15 @@ CrateDB is a distributed and scalable SQL database for storing and analyzing
 massive amounts of data in near real-time, even with complex queries. It is
 PostgreSQL-compatible, and based on Lucene.
 
+:::{todo}
+**Instructions:**
+Elaborate a bit longer about the topic domain and the ingredients of this section
+in an abstract way, concisely highlighting and summarizing relevant benefits,
+like the `../analytics/index`, `../industrial/index`, and `../longterm/index`
+pages are doing it already.
+Use concise language, active voice, and avoid yapping.
+:::
+
 ::::{grid} 1 2 2 2
 :margin: 4 4 0 0
 :padding: 0

From 71a63054a6666c343720ada48706f05c5d46bd41 Mon Sep 17 00:00:00 2001
From: Andreas Motl
Date: Sat, 25 Oct 2025 04:52:00 +0200
Subject: [PATCH 2/4] Solutions: Add explanation texts, by CodeRabbit

Teaser texts have been missing on the "time series", "analytics", and
"machine learning" sections.
---
 docs/solution/analytics/index.md        | 38 +++++++++++++------------
 docs/solution/index.md                  |  7 ++++-
 docs/solution/machine-learning/index.md | 32 +++++++++++++++------
 docs/solution/time-series/index.md      | 36 +++++++++++++++--------
 4 files changed, 74 insertions(+), 39 deletions(-)

diff --git a/docs/solution/analytics/index.md b/docs/solution/analytics/index.md
index 44a18d62..52557b51 100644
--- a/docs/solution/analytics/index.md
+++ b/docs/solution/analytics/index.md
@@ -5,24 +5,26 @@
 CrateDB provides real-time analytics on raw data stored for the long term.
 :::
 
-In all domains of real-time analytics where you absolutely must have access to all
-the records, and can't live with any down-sampled variants, because records are
-unique, and need to be accounted for within your analytics queries.
-
-If you find yourself in such a situation, you need a storage system which
-manages all the high-volume data in its hot zone, to be available right on
-your fingertips, for live querying. Batch jobs to roll up raw data into
-analytical results are not an option, because users' queries are too
-individual, so you need to run them on real data in real time.
-
-:::{todo}
-**Instructions:**
-Elaborate a bit longer about the topic domain and the ingredients of this section
-in an abstract way, concisely highlighting and summarizing relevant benefits,
-like the `../analytics/index`, `../industrial/index`, and `../longterm/index`
-pages are doing it already.
-Use concise language, active voice, and avoid yapping.
-:::
+CrateDB eliminates the trade-off between data accessibility and storage costs
+by keeping all high-volume raw data in the hot zone without requiring
+downsampling or aggregation. Unlike traditional systems that force you to
+choose between real-time query capabilities and long-term retention,
+CrateDB handles billions of unique records while maintaining fast query
+performance on the full dataset.
+
+Traditional analytics pipelines rely on pre-aggregated rollups or batch
+processing to handle query loads, limiting users to predefined metrics
+and losing the granularity needed for ad-hoc analysis. CrateDB's
+distributed architecture scales horizontally to support individual,
+exploratory queries on complete raw datasets in real time, enabling
+analysts to discover insights that would be invisible in downsampled data.
+
+By keeping all records immediately available for querying, you avoid the
+complexity of maintaining separate hot and cold storage tiers, ETL
+pipelines for aggregation, or data movement processes. Your analytics
+queries run directly on raw data across any time range, delivering the
+accuracy and flexibility that business intelligence and data science
+teams require.
 
 With CrateDB, compatible to PostgreSQL, you can do all of that using plain SQL.
 Other than integrating well with commodity systems using standard database
diff --git a/docs/solution/index.md b/docs/solution/index.md
index b2a823cd..5b5472de 100644
--- a/docs/solution/index.md
+++ b/docs/solution/index.md
@@ -3,6 +3,12 @@
 
 # Solutions and use cases
 
+:::{div} sd-text-muted
+CrateDB is a distributed and scalable SQL database for storing and analyzing
+massive amounts of data in near real-time, even with complex queries. It is
+PostgreSQL-compatible, and based on Lucene.
+:::
+
 :::{toctree}
 :hidden:
 time-series/index
@@ -12,7 +18,6 @@ analytics/index
 machine-learning/index
 :::
 
-
 ## Explanations
 
 :::{div} sd-text-muted
diff --git a/docs/solution/machine-learning/index.md b/docs/solution/machine-learning/index.md
index 25d45cdf..259d83ce 100644
--- a/docs/solution/machine-learning/index.md
+++ b/docs/solution/machine-learning/index.md
@@ -11,14 +11,30 @@ CrateDB provides a vector type natively, and adapters for integrating with
 machine learning frameworks.
 :::
 
-:::{todo}
-**Instructions:**
-Elaborate a bit longer about the topic domain and the ingredients of this section
-in an abstract way, concisely highlighting and summarizing relevant benefits,
-like the `../analytics/index`, `../industrial/index`, and `../longterm/index`
-pages are doing it already.
-Use concise language, active voice, and avoid yapping.
-:::
+Modern AI and machine learning applications demand efficient storage and
+retrieval of high-dimensional vectors, seamless integration with ML frameworks,
+and the ability to combine traditional analytics with semantic search capabilities.
+From retrieval-augmented generation (RAG) systems to predictive maintenance models,
+organizations need a unified platform that handles vector embeddings, training datasets,
+and production model artifacts without juggling multiple specialized systems.
+
+CrateDB unifies vector search, time series analysis, and ML operations in a single
+platform. Store and query high-dimensional embeddings using native FLOAT_VECTOR support
+with HNSW-based similarity search, integrate directly with LangChain and LlamaIndex for
+AI applications, and leverage MLflow and PyCaret for end-to-end MLOps workflows. Whether
+you're building semantic search engines, training forecasting models on large time series
+datasets, or implementing hybrid search combining full-text and vector similarity, CrateDB
+eliminates data movement and infrastructure complexity.
+
+By keeping vector embeddings, training data, and model metadata in one queryable system,
+you avoid the overhead of synchronizing between specialized vector databases, data lakes,
+and model registries. Your ML pipelines remain agile, your queries span structured and
+vector data seamlessly, and your infrastructure stays lean.
+
+With CrateDB, compatible to PostgreSQL, you can do all of that using plain SQL.
+Other than integrating well with commodity systems using standard database
+access interfaces like ODBC or JDBC, it provides a proprietary HTTP interface
+on top.
 
 ## Vector store
 
diff --git a/docs/solution/time-series/index.md b/docs/solution/time-series/index.md
index a5212965..1a2ee911 100644
--- a/docs/solution/time-series/index.md
+++ b/docs/solution/time-series/index.md
@@ -5,18 +5,30 @@
 Use CrateDB to store and query massive amounts of time series data.
 :::
 
-CrateDB is a distributed and scalable SQL database for storing and analyzing
-massive amounts of data in near real-time, even with complex queries. It is
-PostgreSQL-compatible, and based on Lucene.
-
-:::{todo}
-**Instructions:**
-Elaborate a bit longer about the topic domain and the ingredients of this section
-in an abstract way, concisely highlighting and summarizing relevant benefits,
-like the `../analytics/index`, `../industrial/index`, and `../longterm/index`
-pages are doing it already.
-Use concise language, active voice, and avoid yapping.
-:::
+Time series data represents one of the fastest-growing data types across industries,
+from IoT sensors and industrial equipment to application metrics and financial transactions.
+The challenge lies not just in handling the sheer volume of incoming data points, but in
+maintaining query performance across both real-time streams and historical datasets while
+managing storage costs effectively.
+
+Traditional databases struggle with the unique characteristics of time series workloads:
+high write throughput, time-based queries spanning variable ranges, the need for downsampling
+and aggregation, and retention policies that balance storage with analytical requirements.
+Many organizations find themselves cobbling together multiple systems—one for ingestion,
+another for querying, and yet another for long-term storage—creating operational complexity
+and data silos.
+
+CrateDB handles time series data natively through its distributed architecture, combining
+high-speed ingestion with powerful SQL analytics across any time range. Its partitioning
+capabilities enable efficient data lifecycle management, while built-in functions for
+downsampling, interpolation, and time-window operations simplify complex analytical tasks.
+You can query billions of data points in seconds, whether analyzing recent trends or exploring
+patterns across years of historical data.
+
+With CrateDB, compatible to PostgreSQL, you can do all of that using plain SQL.
+Other than integrating well with commodity systems using standard database
+access interfaces like ODBC or JDBC, it provides a proprietary HTTP interface
+on top.
 
 ::::{grid} 1 2 2 2
 :margin: 4 4 0 0

From 85dfd6df69e4daab4d7027d3b04ab22a4e5540a6 Mon Sep 17 00:00:00 2001
From: Andreas Motl
Date: Sat, 25 Oct 2025 04:52:19 +0200
Subject: [PATCH 3/4] Solutions: Adjust related links

---
 docs/solution/analytics/index.md  | 3 ++-
 docs/solution/industrial/index.md | 5 +++++
 docs/solution/longterm/index.md   | 1 +
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/docs/solution/analytics/index.md b/docs/solution/analytics/index.md
index 52557b51..51ddf07b 100644
--- a/docs/solution/analytics/index.md
+++ b/docs/solution/analytics/index.md
@@ -42,8 +42,9 @@ on top.
 :columns: 12 6 3 3
 
 - {ref}`timeseries`
-- {ref}`machine-learning`
+- {ref}`longterm`
 - {ref}`industrial`
+- {ref}`machine-learning`
 +++
 Related topics in the same area.
 ::::
diff --git a/docs/solution/industrial/index.md b/docs/solution/industrial/index.md
index 865eced0..229d5b4a 100644
--- a/docs/solution/industrial/index.md
+++ b/docs/solution/industrial/index.md
@@ -80,6 +80,11 @@ production systems in manufacturing, shipping, fulfillment, and logistics.
 
 :::::
 
+:Related:
+ {ref}`analytics` •
+ {ref}`longterm-store` •
+ {ref}`machine-learning`
+
 :Tags:
 {tags-primary}`Data Historian`
 {tags-primary}`Industrial IoT`
diff --git a/docs/solution/longterm/index.md b/docs/solution/longterm/index.md
index 9c971075..461ec904 100644
--- a/docs/solution/longterm/index.md
+++ b/docs/solution/longterm/index.md
@@ -1,3 +1,4 @@
+(longterm)=
 (longterm-store)=
 (timeseries-longterm)=
 (timeseries-long-term-storage)=

From b9037e79739896155c7226997cbadae480e6b9b8 Mon Sep 17 00:00:00 2001
From: Marios Trivyzas <5058131+matriv@users.noreply.github.com>
Date: Mon, 27 Oct 2025 10:57:50 +0100
Subject: [PATCH 4/4] Solutions: Improve wording

---
 docs/solution/analytics/index.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/solution/analytics/index.md b/docs/solution/analytics/index.md
index 51ddf07b..743a37f1 100644
--- a/docs/solution/analytics/index.md
+++ b/docs/solution/analytics/index.md
@@ -9,14 +9,14 @@ CrateDB eliminates the trade-off between data accessibility and storage costs
 by keeping all high-volume raw data in the hot zone without requiring
 downsampling or aggregation. Unlike traditional systems that force you to
 choose between real-time query capabilities and long-term retention,
-CrateDB handles billions of unique records while maintaining fast query
+CrateDB handles billions of records while maintaining fast query
 performance on the full dataset.
 
 Traditional analytics pipelines rely on pre-aggregated rollups or batch
 processing to handle query loads, limiting users to predefined metrics
 and losing the granularity needed for ad-hoc analysis. CrateDB's
-distributed architecture scales horizontally to support individual,
-exploratory queries on complete raw datasets in real time, enabling
+distributed architecture scales horizontally to support
+exploratory queries on complete raw datasets in near real time, enabling
 analysts to discover insights that would be invisible in downsampled data.
 
 By keeping all records immediately available for querying, you avoid the