diff --git a/docs/solution/analytics/index.md b/docs/solution/analytics/index.md
index 2519aff8..743a37f1 100644
--- a/docs/solution/analytics/index.md
+++ b/docs/solution/analytics/index.md
@@ -5,15 +5,26 @@
 CrateDB provides real-time analytics on raw data stored for the long term.
 :::
 
-In all domains of real-time analytics where you absolutely must have access to all
-the records, and can't live with any down-sampled variants, because records are
-unique, and need to be accounted for within your analytics queries.
-
-If you find yourself in such a situation, you need a storage system which
-manages all the high-volume data in its hot zone, to be available right on
-your fingertips, for live querying. Batch jobs to roll up raw data into
-analytical results are not an option, because users' queries are too
-individual, so you need to run them on real data in real time.
+CrateDB eliminates the trade-off between data accessibility and storage costs
+by keeping all high-volume raw data in the hot zone without requiring
+downsampling or aggregation. Unlike traditional systems that force you to
+choose between real-time query capabilities and long-term retention,
+CrateDB handles billions of records while maintaining fast query
+performance on the full dataset.
+
+Traditional analytics pipelines rely on pre-aggregated rollups or batch
+processing to handle query loads, limiting users to predefined metrics
+and losing the granularity needed for ad-hoc analysis. CrateDB's
+distributed architecture scales horizontally to support
+exploratory queries on complete raw datasets in near real time, enabling
+analysts to discover insights that would be invisible in downsampled data.
+
+By keeping all records immediately available for querying, you avoid the
+complexity of maintaining separate hot and cold storage tiers, ETL
+pipelines for aggregation, or data movement processes. Your analytics
+queries run directly on raw data across any time range, delivering the
+accuracy and flexibility that business intelligence and data science
+teams require.
 
 With CrateDB, compatible to PostgreSQL, you can do all of that using plain SQL.
 Other than integrating well with commodity systems using standard database
@@ -31,8 +42,9 @@ on top.
 :columns: 12 6 3 3
 
 - {ref}`timeseries`
-- {ref}`machine-learning`
+- {ref}`longterm`
 - {ref}`industrial`
+- {ref}`machine-learning`
 +++
 Related topics in the same area.
 ::::
diff --git a/docs/solution/index.md b/docs/solution/index.md
index b2a823cd..5b5472de 100644
--- a/docs/solution/index.md
+++ b/docs/solution/index.md
@@ -3,6 +3,12 @@
 
 # Solutions and use cases
 
+:::{div} sd-text-muted
+CrateDB is a distributed and scalable SQL database for storing and analyzing
+massive amounts of data in near real-time, even with complex queries. It is
+PostgreSQL-compatible, and based on Lucene.
+:::
+
 :::{toctree}
 :hidden:
 time-series/index
@@ -12,7 +18,6 @@ analytics/index
 machine-learning/index
 :::
 
-
 ## Explanations
 
 :::{div} sd-text-muted
diff --git a/docs/solution/industrial/index.md b/docs/solution/industrial/index.md
index 865eced0..229d5b4a 100644
--- a/docs/solution/industrial/index.md
+++ b/docs/solution/industrial/index.md
@@ -80,6 +80,11 @@ production systems in manufacturing, shipping, fulfillment, and logistics.
 
 :::::
 
+:Related:
+ {ref}`analytics` •
+ {ref}`longterm-store` •
+ {ref}`machine-learning`
+
 :Tags:
  {tags-primary}`Data Historian`
  {tags-primary}`Industrial IoT`
diff --git a/docs/solution/longterm/index.md b/docs/solution/longterm/index.md
index 9c971075..461ec904 100644
--- a/docs/solution/longterm/index.md
+++ b/docs/solution/longterm/index.md
@@ -1,3 +1,4 @@
+(longterm)=
 (longterm-store)=
 (timeseries-longterm)=
 (timeseries-long-term-storage)=
diff --git a/docs/solution/machine-learning/index.md b/docs/solution/machine-learning/index.md
index 936521e7..259d83ce 100644
--- a/docs/solution/machine-learning/index.md
+++ b/docs/solution/machine-learning/index.md
@@ -11,6 +11,31 @@ CrateDB provides a vector type natively, and adapters for integrating with
 machine learning frameworks.
 :::
 
+Modern AI and machine learning applications demand efficient storage and
+retrieval of high-dimensional vectors, seamless integration with ML frameworks,
+and the ability to combine traditional analytics with semantic search capabilities.
+From retrieval-augmented generation (RAG) systems to predictive maintenance models,
+organizations need a unified platform that handles vector embeddings, training datasets,
+and production model artifacts without juggling multiple specialized systems.
+
+CrateDB unifies vector search, time series analysis, and ML operations in a single
+platform. Store and query high-dimensional embeddings using native FLOAT_VECTOR support
+with HNSW-based similarity search, integrate directly with LangChain and LlamaIndex for
+AI applications, and leverage MLflow and PyCaret for end-to-end MLOps workflows. Whether
+you're building semantic search engines, training forecasting models on large time series
+datasets, or implementing hybrid search combining full-text and vector similarity, CrateDB
+eliminates data movement and infrastructure complexity.
+
+By keeping vector embeddings, training data, and model metadata in one queryable system,
+you avoid the overhead of synchronizing between specialized vector databases, data lakes,
+and model registries. Your ML pipelines remain agile, your queries span structured and
+vector data seamlessly, and your infrastructure stays lean.
+
+With CrateDB, compatible to PostgreSQL, you can do all of that using plain SQL.
+Other than integrating well with commodity systems using standard database
+access interfaces like ODBC or JDBC, it provides a proprietary HTTP interface
+on top.
+
 ## Vector store
 
 :::{div}
diff --git a/docs/solution/time-series/index.md b/docs/solution/time-series/index.md
index 8b387e9d..1a2ee911 100644
--- a/docs/solution/time-series/index.md
+++ b/docs/solution/time-series/index.md
@@ -5,9 +5,30 @@
 Use CrateDB to store and query massive amounts of time series data.
 :::
 
-CrateDB is a distributed and scalable SQL database for storing and analyzing
-massive amounts of data in near real-time, even with complex queries. It is
-PostgreSQL-compatible, and based on Lucene.
+Time series data represents one of the fastest-growing data types across industries,
+from IoT sensors and industrial equipment to application metrics and financial transactions.
+The challenge lies not just in handling the sheer volume of incoming data points, but in
+maintaining query performance across both real-time streams and historical datasets while
+managing storage costs effectively.
+
+Traditional databases struggle with the unique characteristics of time series workloads:
+high write throughput, time-based queries spanning variable ranges, the need for downsampling
+and aggregation, and retention policies that balance storage with analytical requirements.
+Many organizations find themselves cobbling together multiple systems—one for ingestion,
+another for querying, and yet another for long-term storage—creating operational complexity
+and data silos.
+
+CrateDB handles time series data natively through its distributed architecture, combining
+high-speed ingestion with powerful SQL analytics across any time range. Its partitioning
+capabilities enable efficient data lifecycle management, while built-in functions for
+downsampling, interpolation, and time-window operations simplify complex analytical tasks.
+You can query billions of data points in seconds, whether analyzing recent trends or exploring
+patterns across years of historical data.
+
+With CrateDB, compatible to PostgreSQL, you can do all of that using plain SQL.
+Other than integrating well with commodity systems using standard database
+access interfaces like ODBC or JDBC, it provides a proprietary HTTP interface
+on top.
 
 ::::{grid} 1 2 2 2
 :margin: 4 4 0 0