Hybrid time-series and block-column storage database engine written in Java
- Program Description
- General Information
- Application Areas
- Minimum Technical Requirements
- Getting Started with the Project
- Description of the logical structure
- DB architecture
- Algorithms
- Program API
- API sequence diagram
- Used methods
- DStore interface
- Dimension DB class
- Parameters
- Input parameters
- Output parameters
- Code use cases
- Performance
- Download
- Development
- Documentation
- License
- Support
- Contact
Dimension DB is a hybrid columnar database for storing time series data, written in Java.
Key Features:
- Columnar data storage format;
- Storage of time series data;
- Storage format with built-in support for data deduplication and compression;
- Setting the storage format at the table and block level;
- Automatic selection of the deduplication algorithm based on the collected data;
- Use of the Berkley DB database as a backend for local storage;
- Support for basic JDBC data types for PostgreSQL, Oracle, MS SQL and ClickHouse databases;
- In read-only mode, automatic generation of SQL queries to tables with time series data;
The library can be used as a lightweight analytical database for backend data processing systems on the Java platform to solve the following application problems:
- storing collected data and analytical processing of small and medium amounts of data for various embedded Java applications;
- storing and processing Internet of Things (IoT) data on the devices themselves as temporary storage, cache, and other tasks related to local data processing;
- metadata storage for querying external data sources using automatic SQL generation based on the data model for analytical applications.
Dimension DB is compatible with Java 25+ and comes with a small set of dependencies.
Table 1. Hardware requirements
| Parameter | Description |
|---|---|
| CPU and RAM | At least a 500 MHz processor and 250 MB of RAM; actual requirements depend on the volume of processed data |
| CPU architecture | Intel 64-bit (x86_64), AMD 64-bit (x86_64), Arm 64-bit (aarch64) |
| Disk | Free disk space depends on the volume of processed data |
Table 2. Software requirements
| Software | Requirements |
|---|---|
| Java | Java version 25+ |
| Maven | Not lower than version 3 |
| Git | Latest version |
Table 3. Operating system requirements
| Software | Requirements |
|---|---|
| Operating system | Windows, Linux, MacOS |
Make sure you have JDK 25+, Maven, and Git installed:
java -version
mvn -version
git --version
Download the Dimension DB repository sources:
git clone <<url>>
cd DimensionDB
To compile the project sources, run:
mvn clean compile
To run unit tests, run:
mvn clean test
To build the project into a Dimension DB jar file, run:
mvn clean package
To build and install the Dimension DB jar file into your local Maven repository, run:
mvn clean install
Notes:
- Building the project may include running unit tests from DimensionDBRunnerTest and DBaseRunnerUseCasesTest, which take approximately 25 minutes to complete.
- The majority of this time is consumed by the DimensionDBRunnerTest class, which runs 45 tests with various configurations.
- Both DimensionDBRunnerTest and DBaseRunnerUseCasesTest are disabled by default using the @Disabled annotation.
- The DimensionDBRunnerTest class contains tests with different permutation levels, as detailed below:
Table 4. Parameters and statistics for DimensionDBRunnerTest tests
| Test name | PermutationState | Number of configurations | Total tests | Execution time |
|---|---|---|---|---|
| testPermutationNone | NONE | 108 | 4 860 | ~ 9 seconds |
| testPermutationPartial | PARTIAL | 1 944 | 87 480 | ~ 2 minutes |
| testPermutationAll | ALL | 26 244 | 1 180 980 | ~ 25 minutes |
- To reduce build time by skipping these tests, use the following Maven commands:
# Skip DimensionDBRunnerTest during testing phase
mvn clean test -Dtest=!DimensionDBRunnerTest
# Skip during packaging
mvn clean package -Dtest=!DimensionDBRunnerTest
# Skip during installation
mvn clean install -Dtest=!DimensionDBRunnerTest
To use Dimension DB in a Java application, add the Dimension DB jar file as a dependency to your project's pom.xml settings file:
<?xml version="1.0" encoding="UTF-8"?>
<project>
<properties>
<dimension-db-version>25.7.1</dimension-db-version>
</properties>
<!--.....-->
<!--.....-->
<!--.....-->
<dependencies>
<dependency>
<groupId>ru.dimension</groupId>
<artifactId>db</artifactId>
<version>${dimension-db-version}</version>
</dependency>
</dependencies>
</project>
Notes:
- To build and install the latest version of the Dimension DB library into the local Maven repository, download the application source code and run:
mvn clean install
Details on building and installing the project are described in the Building the project section.
- You can find a full list of Dimension DB use cases in your application in the Code use cases section.
To run unit tests, follow the project build steps from the Building the project section.
Preparing the environment for integration tests requires a local database installation using Docker Desktop.
Instructions for installing Docker Desktop for Windows and Linux
Install and run a container with PostgreSQL DB
docker run -d --name db \
-e 'POSTGRES_PASSWORD=postgres' \
-p 5432:5432 \
-v pgdata:/var/lib/postgresql/data \
postgres \
postgres -c 'max_connections=200'
- Install the repository of scripts for building Oracle DB images locally
- Download the DB distribution from the Oracle website. For the Linux x86 platform and DB version 19.3, copy the LINUX.X64_193000_db_home.zip file to the local directory of the OracleDatabase/SingleInstance/dockerfiles/19.3.0 script repository
- Run the buildContainerImage.sh script from the script repository and specify the parameters (example for DB version 19.3.0)
OracleDatabase/SingleInstance/dockerfiles/buildContainerImage.sh -v 19.3.0 -e -i
- Install and run a container with an Oracle DB instance from the command line (example of running the Docker run command for DB version 19.3.0)
docker run --name database19ee \
-p 1523:1521 -p 5522:5500 \
-e ORACLE_SID=orcl -e ORACLE_PDB=pdb_orcl -e ORACLE_PWD=sys \
-e ORACLE_CHARACTERSET=AL32UTF8 \
-v /opt/oracle/oradata oracle/database:19.3.0-ee
Install and run container with MS SQL DB
docker run -e 'ACCEPT_EULA=Y' \
-e 'SA_PASSWORD=QOEfSsa51234!' \
-p 1433:1433 \
-v $(pwd)/drive:/var/opt/mssql/data \
--name mssql \
mcr.microsoft.com/mssql/server
Install and run a container with ClickHouse DB
docker run -d --name clickhouse-server \
-e CLICKHOUSE_USER=admin -e CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT=1 -e CLICKHOUSE_PASSWORD=admin \
-p 8123:8123 -p 9000:9000 \
-v C:\clickhouse:/var/lib/clickhouse clickhouse/clickhouse-server
- Install ClickHouse locally and check the connection to ClickHouse on a local PC using the URL "jdbc:clickhouse://localhost:8123", or use another connection string and adjust the tests accordingly;
- Load the New York taxi trips test dataset into a local ClickHouse instance (pre-prepared partitions loaded into the datasets schema are used);
- Check that the data has loaded
clickhouse-client --query "select count(*) from datasets.trips_mergetree";
- Create a directory for local storage of Dimension DB data (example for Windows: "C:\Users\.temp");
- Load test data into the local Dimension DB database using any of the methods in DimensionDBCHLoadDataTest marked with the @Test annotation;
- Run integration tests in DimensionDBCHQueryGanttTest, DimensionDBCHQueryStackedTest, or DimensionDBCHQueryRawTest, which are marked with the @Test annotation.
Notes:
- Integration tests are disabled with the @Disabled annotation so they are skipped when running Dimension DB unit tests; remove the annotation when you need to load data and run them.
- Test data from the trips_mergetree table is loaded into the datasets schema
CREATE DATABASE datasets;
- trips_mergetree table definition
CREATE TABLE trips_mergetree (
trip_id UInt32,
vendor_id Enum8('1' = 1, '2' = 2, 'CMT' = 3, 'VTS' = 4, 'DDS' = 5, 'B02512' = 10, 'B02598' = 11, 'B02617' = 12, 'B02682' = 13, 'B02764' = 14),
pickup_date Date,
pickup_datetime DateTime,
dropoff_date Date,
dropoff_datetime DateTime,
store_and_fwd_flag UInt8,
rate_code_id UInt8,
pickup_longitude Float64,
pickup_latitude Float64,
dropoff_longitude Float64,
dropoff_latitude Float64,
passenger_count UInt8,
trip_distance Float64,
fare_amount Float32,
extra Float32,
mta_tax Float32,
tip_amount Float32,
tolls_amount Float32,
ehail_fee Float32,
improvement_surcharge Float32,
total_amount Float32,
payment_type_ Enum8('UNK' = 0, 'CSH' = 1, 'CRE' = 2, 'NOC' = 3, 'DIS' = 4),
trip_type UInt8,
pickup FixedString(25),
dropoff FixedString(25),
cab_type Enum8('yellow' = 1, 'green' = 2, 'uber' = 3),
pickup_nyct2010_gid UInt8,
pickup_ctlabel Float32,
pickup_borocode UInt8,
pickup_boroname Enum8('' = 0, 'Manhattan' = 1, 'Bronx' = 2, 'Brooklyn' = 3, 'Queens' = 4, 'Staten Island' = 5),
pickup_ct2010 FixedString(6),
pickup_boroct2010 FixedString(7),
pickup_cdeligibil Enum8(' ' = 0, 'E' = 1, 'I' = 2),
pickup_ntacode FixedString(4),
pickup_ntaname Enum16('' = 0, 'Airport' = 1, 'Allerton-Pelham Gardens' = 2, 'Annadale-Huguenot-Prince\'s Bay-Eltingville' = 3, 'Arden Heights' = 4, 'Astoria' = 5, 'Auburndale' = 6, 'Baisley Park' = 7, 'Bath Beach' = 8, 'Battery Park City-Lower Manhattan' = 9, 'Bay Ridge' = 10, 'Bayside-Bayside Hills' = 11, 'Bedford' = 12, 'Bedford Park-Fordham North' = 13, 'Bellerose' = 14, 'Belmont' = 15, 'Bensonhurst East' = 16, 'Bensonhurst West' = 17, 'Borough Park' = 18, 'Breezy Point-Belle Harbor-Rockaway Park-Broad Channel' = 19, 'Briarwood-Jamaica Hills' = 20, 'Brighton Beach' = 21, 'Bronxdale' = 22, 'Brooklyn Heights-Cobble Hill' = 23, 'Brownsville' = 24, 'Bushwick North' = 25, 'Bushwick South' = 26, 'Cambria Heights' = 27, 'Canarsie' = 28, 'Carroll Gardens-Columbia Street-Red Hook' = 29, 'Central Harlem North-Polo Grounds' = 30, 'Central Harlem South' = 31, 'Charleston-Richmond Valley-Tottenville' = 32, 'Chinatown' = 33, 'Claremont-Bathgate' = 34, 'Clinton' = 35, 'Clinton Hill' = 36, 'Co-op City' = 37, 'College Point' = 38, 'Corona' = 39, 'Crotona Park East' = 40, 'Crown Heights North' = 41, 'Crown Heights South' = 42, 'Cypress Hills-City Line' = 43, 'DUMBO-Vinegar Hill-Downtown Brooklyn-Boerum Hill' = 44, 'Douglas Manor-Douglaston-Little Neck' = 45, 'Dyker Heights' = 46, 'East Concourse-Concourse Village' = 47, 'East Elmhurst' = 48, 'East Flatbush-Farragut' = 49, 'East Flushing' = 50, 'East Harlem North' = 51, 'East Harlem South' = 52, 'East New York' = 53, 'East New York (Pennsylvania Ave)' = 54, 'East Tremont' = 55, 'East Village' = 56, 'East Williamsburg' = 57, 'Eastchester-Edenwald-Baychester' = 58, 'Elmhurst' = 59, 'Elmhurst-Maspeth' = 60, 'Erasmus' = 61, 'Far Rockaway-Bayswater' = 62, 'Flatbush' = 63, 'Flatlands' = 64, 'Flushing' = 65, 'Fordham South' = 66, 'Forest Hills' = 67, 'Fort Greene' = 68, 'Fresh Meadows-Utopia' = 69, 'Ft. 
Totten-Bay Terrace-Clearview' = 70, 'Georgetown-Marine Park-Bergen Beach-Mill Basin' = 71, 'Glen Oaks-Floral Park-New Hyde Park' = 72, 'Glendale' = 73, 'Gramercy' = 74, 'Grasmere-Arrochar-Ft. Wadsworth' = 75, 'Gravesend' = 76, 'Great Kills' = 77, 'Greenpoint' = 78, 'Grymes Hill-Clifton-Fox Hills' = 79, 'Hamilton Heights' = 80, 'Hammels-Arverne-Edgemere' = 81, 'Highbridge' = 82, 'Hollis' = 83, 'Homecrest' = 84, 'Hudson Yards-Chelsea-Flatiron-Union Square' = 85, 'Hunters Point-Sunnyside-West Maspeth' = 86, 'Hunts Point' = 87, 'Jackson Heights' = 88, 'Jamaica' = 89, 'Jamaica Estates-Holliswood' = 90, 'Kensington-Ocean Parkway' = 91, 'Kew Gardens' = 92, 'Kew Gardens Hills' = 93, 'Kingsbridge Heights' = 94, 'Laurelton' = 95, 'Lenox Hill-Roosevelt Island' = 96, 'Lincoln Square' = 97, 'Lindenwood-Howard Beach' = 98, 'Longwood' = 99, 'Lower East Side' = 100, 'Madison' = 101, 'Manhattanville' = 102, 'Marble Hill-Inwood' = 103, 'Mariner\'s Harbor-Arlington-Port Ivory-Graniteville' = 104, 'Maspeth' = 105, 'Melrose South-Mott Haven North' = 106, 'Middle Village' = 107, 'Midtown-Midtown South' = 108, 'Midwood' = 109, 'Morningside Heights' = 110, 'Morrisania-Melrose' = 111, 'Mott Haven-Port Morris' = 112, 'Mount Hope' = 113, 'Murray Hill' = 114, 'Murray Hill-Kips Bay' = 115, 'New Brighton-Silver Lake' = 116, 'New Dorp-Midland Beach' = 117, 'New Springville-Bloomfield-Travis' = 118, 'North Corona' = 119, 'North Riverdale-Fieldston-Riverdale' = 120, 'North Side-South Side' = 121, 'Norwood' = 122, 'Oakland Gardens' = 123, 'Oakwood-Oakwood Beach' = 124, 'Ocean Hill' = 125, 'Ocean Parkway South' = 126, 'Old Astoria' = 127, 'Old Town-Dongan Hills-South Beach' = 128, 'Ozone Park' = 129, 'Park Slope-Gowanus' = 130, 'Parkchester' = 131, 'Pelham Bay-Country Club-City Island' = 132, 'Pelham Parkway' = 133, 'Pomonok-Flushing Heights-Hillcrest' = 134, 'Port Richmond' = 135, 'Prospect Heights' = 136, 'Prospect Lefferts Gardens-Wingate' = 137, 'Queens Village' = 138, 'Queensboro Hill' = 139, 
'Queensbridge-Ravenswood-Long Island City' = 140, 'Rego Park' = 141, 'Richmond Hill' = 142, 'Ridgewood' = 143, 'Rikers Island' = 144, 'Rosedale' = 145, 'Rossville-Woodrow' = 146, 'Rugby-Remsen Village' = 147, 'Schuylerville-Throgs Neck-Edgewater Park' = 148, 'Seagate-Coney Island' = 149, 'Sheepshead Bay-Gerritsen Beach-Manhattan Beach' = 150, 'SoHo-TriBeCa-Civic Center-Little Italy' = 151, 'Soundview-Bruckner' = 152, 'Soundview-Castle Hill-Clason Point-Harding Park' = 153, 'South Jamaica' = 154, 'South Ozone Park' = 155, 'Springfield Gardens North' = 156, 'Springfield Gardens South-Brookville' = 157, 'Spuyten Duyvil-Kingsbridge' = 158, 'St. Albans' = 159, 'Stapleton-Rosebank' = 160, 'Starrett City' = 161, 'Steinway' = 162, 'Stuyvesant Heights' = 163, 'Stuyvesant Town-Cooper Village' = 164, 'Sunset Park East' = 165, 'Sunset Park West' = 166, 'Todt Hill-Emerson Hill-Heartland Village-Lighthouse Hill' = 167, 'Turtle Bay-East Midtown' = 168, 'University Heights-Morris Heights' = 169, 'Upper East Side-Carnegie Hill' = 170, 'Upper West Side' = 171, 'Van Cortlandt Village' = 172, 'Van Nest-Morris Park-Westchester Square' = 173, 'Washington Heights North' = 174, 'Washington Heights South' = 175, 'West Brighton' = 176, 'West Concourse' = 177, 'West Farms-Bronx River' = 178, 'West New Brighton-New Brighton-St. George' = 179, 'West Village' = 180, 'Westchester-Unionport' = 181, 'Westerleigh' = 182, 'Whitestone' = 183, 'Williamsbridge-Olinville' = 184, 'Williamsburg' = 185, 'Windsor Terrace' = 186, 'Woodhaven' = 187, 'Woodlawn-Wakefield' = 188, 'Woodside' = 189, 'Yorkville' = 190, 'park-cemetery-etc-Bronx' = 191, 'park-cemetery-etc-Brooklyn' = 192, 'park-cemetery-etc-Manhattan' = 193, 'park-cemetery-etc-Queens' = 194, 'park-cemetery-etc-Staten Island' = 195),
pickup_puma UInt16,
dropoff_nyct2010_gid UInt8,
dropoff_ctlabel Float32,
dropoff_borocode UInt8,
dropoff_boroname Enum8('' = 0, 'Manhattan' = 1, 'Bronx' = 2, 'Brooklyn' = 3, 'Queens' = 4, 'Staten Island' = 5),
dropoff_ct2010 FixedString(6),
dropoff_boroct2010 FixedString(7),
dropoff_cdeligibil Enum8(' ' = 0, 'E' = 1, 'I' = 2),
dropoff_ntacode FixedString(4),
dropoff_ntaname Enum16('' = 0, 'Airport' = 1, 'Allerton-Pelham Gardens' = 2, 'Annadale-Huguenot-Prince\'s Bay-Eltingville' = 3, 'Arden Heights' = 4, 'Astoria' = 5, 'Auburndale' = 6, 'Baisley Park' = 7, 'Bath Beach' = 8, 'Battery Park City-Lower Manhattan' = 9, 'Bay Ridge' = 10, 'Bayside-Bayside Hills' = 11, 'Bedford' = 12, 'Bedford Park-Fordham North' = 13, 'Bellerose' = 14, 'Belmont' = 15, 'Bensonhurst East' = 16, 'Bensonhurst West' = 17, 'Borough Park' = 18, 'Breezy Point-Belle Harbor-Rockaway Park-Broad Channel' = 19, 'Briarwood-Jamaica Hills' = 20, 'Brighton Beach' = 21, 'Bronxdale' = 22, 'Brooklyn Heights-Cobble Hill' = 23, 'Brownsville' = 24, 'Bushwick North' = 25, 'Bushwick South' = 26, 'Cambria Heights' = 27, 'Canarsie' = 28, 'Carroll Gardens-Columbia Street-Red Hook' = 29, 'Central Harlem North-Polo Grounds' = 30, 'Central Harlem South' = 31, 'Charleston-Richmond Valley-Tottenville' = 32, 'Chinatown' = 33, 'Claremont-Bathgate' = 34, 'Clinton' = 35, 'Clinton Hill' = 36, 'Co-op City' = 37, 'College Point' = 38, 'Corona' = 39, 'Crotona Park East' = 40, 'Crown Heights North' = 41, 'Crown Heights South' = 42, 'Cypress Hills-City Line' = 43, 'DUMBO-Vinegar Hill-Downtown Brooklyn-Boerum Hill' = 44, 'Douglas Manor-Douglaston-Little Neck' = 45, 'Dyker Heights' = 46, 'East Concourse-Concourse Village' = 47, 'East Elmhurst' = 48, 'East Flatbush-Farragut' = 49, 'East Flushing' = 50, 'East Harlem North' = 51, 'East Harlem South' = 52, 'East New York' = 53, 'East New York (Pennsylvania Ave)' = 54, 'East Tremont' = 55, 'East Village' = 56, 'East Williamsburg' = 57, 'Eastchester-Edenwald-Baychester' = 58, 'Elmhurst' = 59, 'Elmhurst-Maspeth' = 60, 'Erasmus' = 61, 'Far Rockaway-Bayswater' = 62, 'Flatbush' = 63, 'Flatlands' = 64, 'Flushing' = 65, 'Fordham South' = 66, 'Forest Hills' = 67, 'Fort Greene' = 68, 'Fresh Meadows-Utopia' = 69, 'Ft. 
Totten-Bay Terrace-Clearview' = 70, 'Georgetown-Marine Park-Bergen Beach-Mill Basin' = 71, 'Glen Oaks-Floral Park-New Hyde Park' = 72, 'Glendale' = 73, 'Gramercy' = 74, 'Grasmere-Arrochar-Ft. Wadsworth' = 75, 'Gravesend' = 76, 'Great Kills' = 77, 'Greenpoint' = 78, 'Grymes Hill-Clifton-Fox Hills' = 79, 'Hamilton Heights' = 80, 'Hammels-Arverne-Edgemere' = 81, 'Highbridge' = 82, 'Hollis' = 83, 'Homecrest' = 84, 'Hudson Yards-Chelsea-Flatiron-Union Square' = 85, 'Hunters Point-Sunnyside-West Maspeth' = 86, 'Hunts Point' = 87, 'Jackson Heights' = 88, 'Jamaica' = 89, 'Jamaica Estates-Holliswood' = 90, 'Kensington-Ocean Parkway' = 91, 'Kew Gardens' = 92, 'Kew Gardens Hills' = 93, 'Kingsbridge Heights' = 94, 'Laurelton' = 95, 'Lenox Hill-Roosevelt Island' = 96, 'Lincoln Square' = 97, 'Lindenwood-Howard Beach' = 98, 'Longwood' = 99, 'Lower East Side' = 100, 'Madison' = 101, 'Manhattanville' = 102, 'Marble Hill-Inwood' = 103, 'Mariner\'s Harbor-Arlington-Port Ivory-Graniteville' = 104, 'Maspeth' = 105, 'Melrose South-Mott Haven North' = 106, 'Middle Village' = 107, 'Midtown-Midtown South' = 108, 'Midwood' = 109, 'Morningside Heights' = 110, 'Morrisania-Melrose' = 111, 'Mott Haven-Port Morris' = 112, 'Mount Hope' = 113, 'Murray Hill' = 114, 'Murray Hill-Kips Bay' = 115, 'New Brighton-Silver Lake' = 116, 'New Dorp-Midland Beach' = 117, 'New Springville-Bloomfield-Travis' = 118, 'North Corona' = 119, 'North Riverdale-Fieldston-Riverdale' = 120, 'North Side-South Side' = 121, 'Norwood' = 122, 'Oakland Gardens' = 123, 'Oakwood-Oakwood Beach' = 124, 'Ocean Hill' = 125, 'Ocean Parkway South' = 126, 'Old Astoria' = 127, 'Old Town-Dongan Hills-South Beach' = 128, 'Ozone Park' = 129, 'Park Slope-Gowanus' = 130, 'Parkchester' = 131, 'Pelham Bay-Country Club-City Island' = 132, 'Pelham Parkway' = 133, 'Pomonok-Flushing Heights-Hillcrest' = 134, 'Port Richmond' = 135, 'Prospect Heights' = 136, 'Prospect Lefferts Gardens-Wingate' = 137, 'Queens Village' = 138, 'Queensboro Hill' = 139, 
'Queensbridge-Ravenswood-Long Island City' = 140, 'Rego Park' = 141, 'Richmond Hill' = 142, 'Ridgewood' = 143, 'Rikers Island' = 144, 'Rosedale' = 145, 'Rossville-Woodrow' = 146, 'Rugby-Remsen Village' = 147, 'Schuylerville-Throgs Neck-Edgewater Park' = 148, 'Seagate-Coney Island' = 149, 'Sheepshead Bay-Gerritsen Beach-Manhattan Beach' = 150, 'SoHo-TriBeCa-Civic Center-Little Italy' = 151, 'Soundview-Bruckner' = 152, 'Soundview-Castle Hill-Clason Point-Harding Park' = 153, 'South Jamaica' = 154, 'South Ozone Park' = 155, 'Springfield Gardens North' = 156, 'Springfield Gardens South-Brookville' = 157, 'Spuyten Duyvil-Kingsbridge' = 158, 'St. Albans' = 159, 'Stapleton-Rosebank' = 160, 'Starrett City' = 161, 'Steinway' = 162, 'Stuyvesant Heights' = 163, 'Stuyvesant Town-Cooper Village' = 164, 'Sunset Park East' = 165, 'Sunset Park West' = 166, 'Todt Hill-Emerson Hill-Heartland Village-Lighthouse Hill' = 167, 'Turtle Bay-East Midtown' = 168, 'University Heights-Morris Heights' = 169, 'Upper East Side-Carnegie Hill' = 170, 'Upper West Side' = 171, 'Van Cortlandt Village' = 172, 'Van Nest-Morris Park-Westchester Square' = 173, 'Washington Heights North' = 174, 'Washington Heights South' = 175, 'West Brighton' = 176, 'West Concourse' = 177, 'West Farms-Bronx River' = 178, 'West New Brighton-New Brighton-St. George' = 179, 'West Village' = 180, 'Westchester-Unionport' = 181, 'Westerleigh' = 182, 'Whitestone' = 183, 'Williamsbridge-Olinville' = 184, 'Williamsburg' = 185, 'Windsor Terrace' = 186, 'Woodhaven' = 187, 'Woodlawn-Wakefield' = 188, 'Woodside' = 189, 'Yorkville' = 190, 'park-cemetery-etc-Bronx' = 191, 'park-cemetery-etc-Brooklyn' = 192, 'park-cemetery-etc-Manhattan' = 193, 'park-cemetery-etc-Queens' = 194, 'park-cemetery-etc-Staten Island' = 195),
dropoff_puma UInt16
) ENGINE = MergeTree() ORDER BY (pickup_date, pickup_datetime);
Mermaid flow chart
flowchart TD
Client{Clients}
Client -->|Write Request| DStore_Write_Interface[Write API]
Client -->|Read Request| DStore_Read_Interface[Read API]
Client -->|Metadata Request| MetadataAPI[Metadata API]
DStore_Write_Interface --> |Direct Insert| LocalBerkleyDB[(Local Berkley DB)]
DStore_Write_Interface --> |Fetch Data| ExternalDB_Write[(External DBs JDBC)]
ExternalDB_Write --> LocalBerkleyDB
DStore_Read_Interface --> |Local Read| LocalBerkleyDB
DStore_Read_Interface --> |External Read| ExternalDB_Read[(External DBs JDBC)]
MetadataAPI --> MetadataStore[Metadata file]
Three data storage formats are supported:
- RAW - string values are stored as Java int keys (dictionary-encoded), all other values (numeric, special, etc.) in a suitable Java type;
- ENUM - values are stored as Java byte keys;
- HISTOGRAM - for each run of repeated values, the start and end index of the run and the value itself are stored.
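The HISTOGRAM idea can be sketched in a few lines of self-contained Java (an illustrative encoder, not the actual Dimension DB storage code): each run of repeated values is stored once, as its start index plus the value; the end index of a run is implied by the start of the next one.

```java
import java.util.ArrayList;
import java.util.List;

public class HistogramSketch {
  // A run of identical values: the index where the run starts, and the value itself.
  record Run(int startIndex, String value) {}

  // Encode a column as (startIndex, value) pairs, one entry per run of repeats.
  static List<Run> encode(List<String> column) {
    List<Run> runs = new ArrayList<>();
    for (int i = 0; i < column.size(); i++) {
      if (i == 0 || !column.get(i).equals(column.get(i - 1))) {
        runs.add(new Run(i, column.get(i)));
      }
    }
    return runs;
  }

  public static void main(String[] args) {
    List<String> column = List.of("A", "A", "A", "B", "B", "A");
    // Three runs instead of six raw values: starts at 0 ("A"), 3 ("B"), 5 ("A")
    System.out.println(encode(column));
  }
}
```

The fewer distinct runs a column has, the better this format compresses; for mostly-unique data, RAW is the better choice.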
Metadata of the storage format, indexing type and compression are stored in the block header.
The main methods of the Dimension DB Java API are located in the DStore interface. A DStore reference is obtained after initializing Dimension DB via the DimensionDB class constructor; examples are presented in the DB initialization algorithm section.
A complete list of methods of the main classes and interfaces with input and output parameters is posted in the Used methods section.
- Define a local directory for storing Dimension DB settings;
- Create the Dimension DB configuration DimensionDBConfig;
- Initialize the backend DB;
- Initialize Dimension DB by calling the DimensionDB class constructor;
- Get a reference to the DStore interface from the DimensionDB class object to access the Dimension DB Java API.
Example of DB initialization for Berkley DB backend
@TempDir
static File databaseDir;
...
DimensionDBConfig dimensionDBConfig = new DimensionDBConfig().setConfigDirectory(databaseDir.getAbsolutePath());
BerkleyDB berkleyDB = new BerkleyDB(databaseDir.getAbsolutePath(), true);
DimensionDB dimensionDB = new DimensionDB(dimensionDBConfig, berkleyDB.getStore());
DStore fStore = dimensionDB.getDStore();
Example of DB initialization for ClickHouse backend
@TempDir
static File databaseDir;
...
BType bType = BType.CLICKHOUSE;
String driverClassName = "com.clickhouse.jdbc.ClickHouseDriver";
String dbUrl = "jdbc:clickhouse://localhost:8123";
SqlDB sqlDB = new SqlDB(bType, driverClassName, dbUrl, "admin", "admin");
BasicDataSource basicDataSource = sqlDB.getDatasource();
DimensionDBConfig dimensionDBConfig = new DimensionDBConfig().setConfigDirectory(databaseDir.getAbsolutePath());
DimensionDB dimensionDB = new DimensionDB(dimensionDBConfig, bType, basicDataSource);
DStore fStore = dimensionDB.getDStore();
Notes:
- The temporary directory databaseDir is created automatically using the @TempDir annotation of the JUnit library;
- For the example with the Berkley DB backend, Dimension DB settings and backend data are located in the same temporary directory databaseDir;
- Code examples are taken from unit and integration tests.
Dimension DB provides a set of methods for a range of data storage and retrieval tasks. These APIs are designed for flexible and efficient work with time series data in a Java environment, both for locally stored data and for external data storage systems with an SQL interface.
- Main API. The main functions for working with data are built into the DStore interface, which provides standardized access to write and read methods. DStore is a key element of the Dimension DB architecture, providing the interface point for client applications to interact with the database.
- Writing data. Several approaches are supported for writing information to the Dimension DB database:
- Direct - inserting data into a local database using an intermediate Java table-type data structure.
- JDBC — insert data into a local database using data received from an external data source via JDBC, which allows integration with other systems that can work with JDBC.
- JDBC Batch — batch loading into a local database using data received from an external data source via JDBC, minimizing the number of transactions for large-scale insertion.
- Reading data. Data is extracted from Dimension DB through APIs, each optimized for a particular type of query:
- Stacked — retrieving data based on aggregation functions such as COUNT, SUM, and AVG, obtaining information on the distribution of data by selected attributes and parameters.
- Gantt — complex distribution calculation using COUNT aggregation at two levels, which is useful for analyzing dependencies and relationships between different indicators.
- Raw — extracting source data in tabular format to view detailed information on the entire array of collected data.
- BatchResultSet — extracting source data in tabular format with row-by-row access to get only part of the rows from the result set in the DB.
- Loading metadata. Working with external data sources requires executing queries against them via JDBC or through Java table structures. Information about tables and data types, used for local storage and for accessing external data sources via JDBC, is loaded and viewed through an API backed by a local Dimension DB metadata storage file.
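The Stacked and Gantt reads described above can be illustrated with plain Java collections (a conceptual sketch, not the Dimension DB API, whose methods operate on column profiles): Stacked is a one-level grouping with an aggregation function, Gantt a two-level COUNT grouping.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupingSketch {
  // Stacked-style read: COUNT of rows grouped by one column.
  static Map<String, Long> stackedCount(List<String[]> rows, int col) {
    Map<String, Long> out = new LinkedHashMap<>();
    for (String[] row : rows) out.merge(row[col], 1L, Long::sum);
    return out;
  }

  // Gantt-style read: COUNT grouped by a first column, then a second one.
  static Map<String, Map<String, Long>> ganttCount(List<String[]> rows, int first, int second) {
    Map<String, Map<String, Long>> out = new LinkedHashMap<>();
    for (String[] row : rows)
      out.computeIfAbsent(row[first], k -> new LinkedHashMap<>())
         .merge(row[second], 1L, Long::sum);
    return out;
  }

  public static void main(String[] args) {
    // Columns: cab_type, payment_type (names borrowed from the trips_mergetree example)
    List<String[]> rows = List.of(
        new String[]{"yellow", "CSH"},
        new String[]{"yellow", "CRE"},
        new String[]{"green", "CSH"});
    System.out.println(stackedCount(rows, 0));   // {yellow=2, green=1}
    System.out.println(ganttCount(rows, 0, 1));  // {yellow={CSH=1, CRE=1}, green={CSH=1}}
  }
}
```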
The APIs for writing data only work with the local Berkley DB key-value data storage. API methods for reading data can work with both local Berkley DB storage and external data sources via JDBC using automatic generation of SQL queries.
Table 5. Supported databases for working with API in Dimension DB
| № | Dimension DB API support | Database | Database type |
|---|---|---|---|
| 1 | Read/Write | Berkley DB | key-value |
| 2 | Read | ClickHouse | analytical |
| 3 | Read | Oracle | relational |
| 4 | Read | PostgreSQL | relational |
| 5 | Read | Microsoft SQL Server | relational |
| 6 | Read | MySQL | relational |
Before you start writing data, you need to set storage parameters and metadata for tables and columns in the SProfile object.
The database table configuration allows you to switch between global and local indexing, and enable or disable data compression on the fly. This is achieved by placing storage type metadata in the block header for both indexing types and a flag to enable or disable compression.
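A minimal sketch of this design (hypothetical type and field names, not the actual block layout): because each block header carries its own storage type and compression flag, two blocks of the same column can be stored with different settings.

```java
public class BlockHeaderSketch {
  enum StorageType { RAW, ENUM, HISTOGRAM }

  // Per-block metadata: each block carries its own storage type and compression flag,
  // so indexing and compression can change between blocks without rewriting the table.
  record BlockHeader(StorageType storageType, boolean compressed) {}

  public static void main(String[] args) {
    // Two blocks of the same column, encoded differently.
    BlockHeader first = new BlockHeader(StorageType.RAW, false);
    BlockHeader second = new BlockHeader(StorageType.HISTOGRAM, true);
    System.out.println(first + " " + second);
  }
}
```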
Mermaid sequence diagram
sequenceDiagram
participant Client
participant DStore_Write as Write API
participant DStore_Read as Read API
participant MetadataAPI as Metadata API
participant LocalDB as Local Berkley DB Storage
participant ExternalDB as External DBs via JDBC
Client->>DStore_Write: Write Request
DStore_Write->>MetadataAPI: Fetch Metadata
MetadataAPI->>DStore_Write: Return Metadata
rect rgba(200, 200, 255, 0.1)
DStore_Write->>LocalDB: Direct Insert
DStore_Write->>ExternalDB: Fetch Data
ExternalDB->>DStore_Write: Return Data
DStore_Write->>LocalDB: Insert from ExternalDB
end
Client->>DStore_Read: Read Request
DStore_Read->>MetadataAPI: Fetch Metadata
MetadataAPI->>DStore_Read: Return Metadata
rect rgba(200, 255, 200, 0.1)
DStore_Read->>LocalDB: Local Read
DStore_Read->>ExternalDB: Fetch Data
ExternalDB->>DStore_Read: Return Data
end
Client->>MetadataAPI: Read Metadata Request
MetadataAPI->>Client: Return Metadata
- The DStore interface contains the main contracts and method signatures for interaction between the Dimension DB database and client applications;
- Initialization of database objects occurs when calling the Dimension DB class constructor;
- The reference to the DStore interface for accessing the application API is called from the Dimension DB class object.
Table 6. List of APIs for working with the Dimension DB database of the DStore interface
| # | Method Name | API Type | Description | Input Parameters | Output Parameters |
|---|---|---|---|---|---|
| 1 | getTProfile | Metadata | Get table metadata by name | String tableName | TProfile |
| 2 | loadDirectTableMetadata | Metadata | Load metadata from SProfile | SProfile sProfile | TProfile |
| 3 | loadJdbcTableMetadata | Metadata | Load metadata via JDBC connection with query | Connection connection, String query, SProfile sProfile | TProfile |
| 4 | loadJdbcTableMetadata | Metadata | Load metadata via JDBC (schema and table) | Connection connection, String sqlSchemaName, String sqlTableName, SProfile sProfile | TProfile |
| 5 | setTimestampColumn | Metadata | Set timestamp column | String tableName, String timestampColumnName | void |
| 6 | putDataDirect | Write | Save data using Java structure | String tableName, List<List> data | void |
| 7 | putDataJdbc | Write | Save data from JDBC ResultSet | String tableName, ResultSet resultSet | long (last row timestamp) |
| 8 | putDataJdbcBatch | Write | Batch save from JDBC ResultSet | String tableName, ResultSet resultSet, Integer batchSize | void |
| 9 | getBlockKeyTailList | Read | Get blocks with keys and ranges | String tableName, long begin, long end | List |
| 10 | getStacked | Read | Get aggregated data | String tableName, CProfile cProfile, GroupFunction groupFunction, CompositeFilter compositeFilter, long begin, long end | List |
| 11 | getGanttCount | Read | Two-level COUNT grouping | String tableName, CProfile firstGrpBy, CProfile secondGrpBy, CompositeFilter compositeFilter, long begin, long end | List |
| 12 | getGanttCount | Read | Two-level COUNT grouping (multithreaded) | String tableName, CProfile firstGrpBy, CProfile secondGrpBy, CompositeFilter compositeFilter, int batchSize, long begin, long end | List |
| 13 | getGanttSum | Read | Two-level SUM grouping | String tableName, CProfile firstGrpBy, CProfile secondGrpBy, CompositeFilter compositeFilter, long begin, long end | List |
| 14 | getDistinct | Read | Get distinct values | String tableName, CProfile cProfile, OrderBy orderBy, CompositeFilter compositeFilter, int limit, long begin, long end | List |
| 15 | getRawDataAll | Read | Get raw data (all columns, no filter) | String tableName, long begin, long end | List<List> |
| 16 | getRawDataAll | Read | Get raw data (all columns, with filter) | String tableName, CProfile cProfileFilter, String filter, long begin, long end | List<List> |
| 17 | getRawDataByColumn | Read | Get raw data for specific column | String tableName, CProfile cProfile, long begin, long end | List<List> |
| 18 | getBatchResultSet | Read | Batch read (regular tables) | String tableName, int fetchSize | BatchResultSet |
| 19 | getBatchResultSet | Read | Batch read (time-series tables) | String tableName, long begin, long end, int fetchSize | BatchResultSet |
| 20 | getFirst | Metadata | Get first timestamp | String tableName, long begin, long end | long |
| 21 | getLast | Metadata | Get last timestamp | String tableName, long begin, long end | long |
| 22 | syncBackendDb | Backend | Sync Berkeley DB to disk | - | void |
| 23 | closeBackendDb | Backend | Close Berkeley DB | - | void |
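Most read methods mirror familiar SQL semantics over a time range. For instance, `getDistinct` (item 14) behaves like `SELECT DISTINCT … ORDER BY … LIMIT n` between `begin` and `end`. A minimal plain-Java sketch of those semantics, with no Dimension DB dependencies and all names hypothetical:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class DistinctSketch {
  // Hypothetical stand-in for a (timestamp, column value) pair.
  record Row(long ts, String value) {}

  // Mirrors getDistinct semantics: filter by [begin, end], dedupe, order, limit.
  static List<String> distinct(List<Row> rows, long begin, long end,
                               boolean asc, int limit) {
    Comparator<String> cmp =
        asc ? Comparator.naturalOrder() : Comparator.<String>naturalOrder().reversed();
    return rows.stream()
        .filter(r -> r.ts() >= begin && r.ts() <= end)
        .map(Row::value)
        .distinct()
        .sorted(cmp)
        .limit(limit)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Row> rows = List.of(
        new Row(1L, "b"), new Row(2L, "a"), new Row(3L, "b"),
        new Row(4L, "c"), new Row(99L, "z"));
    // Range [1, 4] excludes "z"; ascending order, limit 2.
    System.out.println(distinct(rows, 1L, 4L, true, 2)); // [a, b]
  }
}
```

The real method additionally accepts a `CompositeFilter`, applied before deduplication.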
Table 7. DBase class constructors used to initialize DB objects
| Item # | Constructor name | API type | Description | Input parameters | Output parameters |
|---|---|---|---|---|---|
| 1 | DBase | Read/Write | Create Dimension DB storage with a Berkeley DB backend | Dimension DBConfig - local Dimension DB storage configuration, EntityStore - Berkeley DB data store | Constructor |
| 2 | DBase | Read | Create Dimension DB storage for a specific backend type | Dimension DBConfig - local Dimension DB storage configuration, BType backendType - backend type for reading, BasicDataSource basicDataSource - Apache DBCP2 JDBC connection pool object | Constructor |
Table 8. Enum TType for storing supported table types
| Property name | Type | Description |
|---|---|---|
| REGULAR | TType | Table for storing regular (non time series) data |
| TIME_SERIES | TType | Table for storing time series data |
Table 9. Enum IType for storing table indexing type
| Property name | Type | Description |
|---|---|---|
| GLOBAL | IType | The type of index used (RAW, HISTOGRAM, ENUM), defined by the user at the table level for each column in the SProfile input parameters |
| LOCAL | IType | The type of index used (RAW, HISTOGRAM, ENUM), determined automatically by the system at the data block level for each table column |
Table 10. Enum AType for storing the data analysis method for local indexing
| Property name | Type | Description |
|---|---|---|
| ON_LOAD | AType | Data is analyzed upon initial retrieval, and the indexing type for each column in the table is determined based on this analysis |
| FULL_PASS_ONCE | AType | Data is analyzed upon initial retrieval, then on subsequent runs, each column of the table is sequentially analyzed once, and the indexing type is determined based on this information. |
| FULL_PASS_EACH | AType | Same as FULL_PASS_ONCE, except each column of the table is analyzed every time the data write API is called. Each API call analyzes one column, cycling through them in order. |
Table 11. Enum BType for the supported database types used as a backend by the read and write APIs
| Property name | Type | Description |
|---|---|---|
| BERKLEYDB | BType | Berkeley DB |
| CLICKHOUSE | BType | ClickHouse |
| ORACLE | BType | Oracle |
| POSTGRES | BType | PostgreSQL |
| MSSQL | BType | Microsoft SQL Server |
Table 12. Enum GroupFunction for passing the grouping function when calling Stacked API
| Property name | Type | Description |
|---|---|---|
| COUNT | GroupFunction | Calculating the number of occurrences of unique indicators in the data by the cProfile column |
| SUM | GroupFunction | Calculating the sum in the data by the cProfile column |
| AVG | GroupFunction | Calculating the average value in the data by the cProfile column |
Table 13. CSType class for storing storage parameters and data types of table columns
| Property name | Type | Default value | Description |
|---|---|---|---|
| isTimeStamp | boolean | false | Flag indicating the column that stores the timestamp |
| sType | SType | null | Data storage type in the table (RAW, HISTOGRAM, ENUM) |
| cType | CType | null | Java data type used to store data in Berkeley DB (Int, Long, Double, String, etc.) |
| dType | DataType | null | JDBC data type used to convert data to the Berkeley DB Java type (INTEGER, DATE, DATETIME, VARCHAR, etc.) |
Table 14. SType class for storing the table data storage type used in a column or block
| Property name | Type | Default value | Description |
|---|---|---|---|
| RAW | SType | 1 | Data as is or as a key for string data types |
| HISTOGRAM | SType | 2 | Data as a histogram |
| ENUM | SType | 3 | Data as a list of constants |
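For intuition, here is a rough, library-independent sketch of how the same column might be represented under each storage type; the structures below are illustrative only, not the engine's actual on-disk layout:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class STypeSketch {
  // HISTOGRAM: value -> number of occurrences (a compact summary of the column).
  static Map<String, Integer> histogram(List<String> column) {
    Map<String, Integer> h = new LinkedHashMap<>();
    column.forEach(v -> h.merge(v, 1, Integer::sum));
    return h;
  }

  // ENUM: per-row codes into a dictionary of distinct constants.
  static List<Integer> enumCodes(List<String> column, List<String> dictionary) {
    return column.stream().map(dictionary::indexOf).toList();
  }

  public static void main(String[] args) {
    // RAW is simply the column stored as-is, one value per row.
    List<String> raw = List.of("green", "green", "green", "yellow", "green");

    Map<String, Integer> hist = histogram(raw);
    List<String> dictionary = new ArrayList<>(hist.keySet());

    System.out.println(hist);                       // {green=4, yellow=1}
    System.out.println(enumCodes(raw, dictionary)); // [0, 0, 0, 1, 0]
  }
}
```

Low-cardinality columns compress well as HISTOGRAM or ENUM, while high-cardinality columns are usually cheaper as RAW, which is why the LOCAL index type lets the engine pick per block.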
Table 15. CType class for storing the Java data type used to store data in the Berkeley DB table column
| Property name | Type | Default Value | Description |
|---|---|---|---|
| BYTE | int | 1 | Represents a byte |
| INT | int | 2 | Represents an integer |
| LONG | int | 3 | Represents a long integer |
| FLOAT | int | 4 | Represents a float |
| DOUBLE | int | 5 | Represents a double-precision float |
| STRING | int | 6 | Represents a string |
Table 16. DataType class for storing JDBC data types, used to convert column data to the Java storage format in the Berkeley DB backend of Dimension DB
| Property name | Type | Default value | Description |
|---|---|---|---|
| BIT | byte | 0 | Postgres: bit data type |
| ... | ... | ... | ... |
| RAW | byte | 23 | Oracle: raw data |
| ... | ... | ... | ... |
| INT | byte | 31 | MS SQL: integer |
| ... | ... | ... | ... |
| DEC | byte | 47 | ClickHouse: Decimal Numbers |
The full list of properties is available in the source file.
Table 17. Dimension DBaseConfig class for storing settings
| Property name | Type | Default value | Description |
|---|---|---|---|
| configDirectory | String | null | Absolute path to the directory for storing the configuration file with Dimension DB metadata |
| configFileName | String | "metamodel.obj" | Name of the configuration file with Dimension DB metadata |
Table 18. SProfile class for storing table/column settings in the Dimension DB database (used in input parameters)
| Property name | Type | Default value | Description |
|---|---|---|---|
| tableName | String | null | Table name |
| tableType | TType | TIME_SERIES | Table type |
| indexType | IType | GLOBAL | Index type |
| backendType | BType | BERKLEYDB | Backend type for storing data |
| compression | Boolean | FALSE | Whether to use data compression in the table or not |
| csTypeMap | Map<String, CSType> | null | Key-value structure mapping column names to CSTypes |
Table 19. TProfile class for getting table/column settings from metadata (used in output parameters)
| Property name | Type | Default value | Description |
|---|---|---|---|
| tableName | String | null | Table name |
| tableType | TType | null | Table type |
| indexType | IType | null | Index type |
| backendType | BType | null | Backend type |
| compression | Boolean | false | Whether data compression is enabled |
| cProfiles | List | null or empty list | List of table column settings |
Table 20. CProfile class for storing table column settings
| Property name | Type | Default value | Description |
|---|---|---|---|
| colId | int | 0 | Unique identifier |
| colIdSql | int | 0 | Column identifier in SQL |
| colName | String | null | Column name |
| colDbTypeName | String | null | Data type name in the DB |
| colSizeDisplay | int | 0 | Size for display |
| colSizeSqlType | int | 0 | SQL type size |
| csType | CSType | null | CSType storing the storage parameters and data types of the column |
Timestamps are stored as the number of milliseconds elapsed since January 1, 1970 (Unix time).
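For example, timestamp keys in this format can be produced with standard java.time, with no library dependencies:

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class UnixMillis {
  public static void main(String[] args) {
    // January 1, 1970 00:00 UTC is the zero point of Unix time.
    long epoch = ZonedDateTime.of(1970, 1, 1, 0, 0, 0, 0, ZoneOffset.UTC)
        .toInstant().toEpochMilli();
    // A concrete range boundary: 2016-01-01 00:00 UTC in milliseconds.
    long y2016 = ZonedDateTime.of(2016, 1, 1, 0, 0, 0, 0, ZoneOffset.UTC)
        .toInstant().toEpochMilli();
    System.out.println(epoch);  // 0
    System.out.println(y2016);  // 1451606400000
  }
}
```

Values like `y2016` above are what the `begin` and `end` parameters of the read API expect.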
Table 21. StackedColumn class for retrieving data aggregates from the Stacked API
| Property name | Type | Default value | Description |
|---|---|---|---|
| key | long | N/A | Object identifier - timestamp of the beginning of the range by which the values of the aggregates in keyCount, keySum, keyAvg are calculated |
| tail | long | 0 | End timestamp of the range over which the aggregate values in keyCount, keySum, keyAvg are calculated |
| keyCount | Map<String, Integer> | Empty map | Key-value structure for storing the number of occurrences of each indicator in the cProfile table column data when calling the Stacked API (for numeric and symbolic data types) |
| keySum | Map<String, Double> | Empty map | Key-value structure for storing the sum of values for each key (only for numeric data types) |
| keyAvg | Map<String, Double> | Empty map | Key-value structure for storing the average value for each key (only for numeric data types) |
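The relationship between the three maps can be reproduced with plain collections. The sketch below, using hypothetical data and none of the library's classes, fills per-key COUNT and SUM in one pass and derives AVG from them:

```java
import java.util.HashMap;
import java.util.Map;

public class StackedSketch {
  // COUNT per distinct value, as in StackedColumn.keyCount.
  static Map<String, Integer> keyCount(double[] rows) {
    Map<String, Integer> m = new HashMap<>();
    for (double v : rows) m.merge(String.valueOf(v), 1, Integer::sum);
    return m;
  }

  // SUM per distinct value, as in StackedColumn.keySum.
  static Map<String, Double> keySum(double[] rows) {
    Map<String, Double> m = new HashMap<>();
    for (double v : rows) m.merge(String.valueOf(v), v, Double::sum);
    return m;
  }

  // AVG per distinct value, derived as SUM / COUNT (StackedColumn.keyAvg).
  static Map<String, Double> keyAvg(double[] rows) {
    Map<String, Double> avg = new HashMap<>();
    Map<String, Integer> cnt = keyCount(rows);
    keySum(rows).forEach((k, s) -> avg.put(k, s / cnt.get(k)));
    return avg;
  }

  public static void main(String[] args) {
    // Hypothetical values of one numeric column within a single time block.
    double[] rows = {1.0, 2.0, 2.0, 3.0};
    System.out.println(keyCount(rows)); // e.g. {1.0=1, 2.0=2, 3.0=1}
    System.out.println(keySum(rows));
    System.out.println(keyAvg(rows));
  }
}
```

In the real API, one StackedColumn covers the rows between its `key` and `tail` timestamps, and only the map matching the requested GroupFunction is of interest.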
Table 22a. GanttColumnCount class for two-level COUNT aggregation results
| Property name | Type | Default value | Description |
|---|---|---|---|
| key | String | N/A | Key, stores a unique metric in the firstGrpBy table column data when calling the Gantt API |
| gantt | Map<String, Integer> | Empty map | Key-value structure, for storing the number of occurrences of each unique value from secondGrpBy with the key filter by the data from firstGrpBy when calling the Gantt API |
Table 22b. GanttColumnSum class for two-level SUM aggregation results
| Property name | Type | Default value | Description |
|---|---|---|---|
| key | String | N/A | The unique value from the first grouping level column |
| value | Double | 0 | Total sum of all values from the second (numeric) column for this group |
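The shape of a GanttColumnCount result list - one entry per firstGrpBy value, each holding nested secondGrpBy counts - can be sketched independently of the library (hypothetical data and method names):

```java
import java.util.HashMap;
import java.util.Map;

public class GanttSketch {
  // Two-level COUNT: firstGrpBy value -> (secondGrpBy value -> occurrences),
  // the same shape as a list of GanttColumnCount objects.
  static Map<String, Map<String, Integer>> ganttCount(String[][] rows) {
    Map<String, Map<String, Integer>> result = new HashMap<>();
    for (String[] r : rows) {
      result.computeIfAbsent(r[0], k -> new HashMap<>())
            .merge(r[1], 1, Integer::sum);
    }
    return result;
  }

  public static void main(String[] args) {
    // Hypothetical (firstGrpBy, secondGrpBy) pairs.
    String[][] rows = {
        {"Manhattan", "yellow"}, {"Manhattan", "yellow"},
        {"Manhattan", "green"}, {"Brooklyn", "green"}};
    System.out.println(ganttCount(rows));
    // e.g. {Brooklyn={green=1}, Manhattan={green=1, yellow=2}}
  }
}
```

GanttColumnSum collapses the inner map to a single Double: the sum of the numeric second column per first-level key.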
- All code use cases call the Dimension DB API through the DStore interface;
- A detailed description of the contracts and method signatures of the DStore interface is presented in the Used Methods section;
- Before each call to a DStore method, the interface is initialized according to the instructions in the DB Initialization Algorithm section;
- A test case has been developed for each use case in Dimension DBUseCasesCodeTest.
Table 23. Use cases
| # | Method Name | API Type | Description | Use cases |
|---|---|---|---|---|
| 1 | getTProfile | Metadata | Get table metadata by name | getTProfile |
| 2 | loadDirectTableMetadata | Metadata | Load metadata from SProfile | loadDirectTableMetadata |
| 3 | loadJdbcTableMetadata | Metadata | Load metadata via JDBC connection with query | loadJdbcTableMetadata |
| 4 | loadJdbcTableMetadata | Metadata | Load metadata via JDBC (schema and table) | |
| 5 | setTimestampColumn | Metadata | Set timestamp column | setTimestampColumn |
| 6 | putDataDirect | Write | Save data using Java structure | putDataDirect |
| 7 | putDataJdbc | Write | Save data from JDBC ResultSet | putDataJdbc |
| 8 | putDataJdbcBatch | Write | Batch save from JDBC ResultSet | putDataJdbcBatch |
| 9 | getBlockKeyTailList | Read | Get blocks with keys and ranges | getBlockKeyTailList |
| 10 | getStacked | Read | Get aggregated data (COUNT/SUM/AVG) | getStacked |
| 11 | getStacked | Read | Get aggregated data with filter | getStackedCountFilter |
| 12 | getGantt | Read | Two-level COUNT grouping (basic) | getGantt |
| 13 | getGantt | Read | Two-level COUNT grouping (multithreaded) | getGantt |
| 14 | getGantt | Read | Two-level COUNT grouping with filter | getGanttFilter |
| 15 | getGanttSum | Read | Two-level SUM grouping (basic) | getGanttSum |
| 16 | getGanttSum | Read | Two-level SUM grouping with filter | getGanttSumFiltered |
| 17 | getDistinct | Read | Get distinct values (basic) | getDistinct |
| 18 | getDistinct | Read | Get distinct values with filter | getDistinctWithFilter |
| 19 | getRawDataAll | Read | Get raw data (all columns, no filter) | getRawDataAll |
| 20 | getRawDataAll | Read | Get raw data (all columns, with filter) | getRawDataAllFilter |
| 21 | getRawDataByColumn | Read | Get raw data for specific column | getRawDataByColumn |
| 22 | getBatchResultSet | Read | Batch read (regular tables) | getBatchResultSetRegularTable |
| 23 | getBatchResultSet | Read | Batch read (time-series tables) | getBatchResultSetTimeSeriesTable |
| 24 | getFirst | Metadata | Get first timestamp | getFirst |
| 25 | getLast | Metadata | Get last timestamp | getLast |
| 26 | syncBackendDb | Backend | Sync Berkeley DB to disk | |
| 27 | closeBackendDb | Backend | Close Berkeley DB | |
Table 24. Test environment
| Category | Details |
|---|---|
| Processor | AMD Ryzen 5 5600H with Radeon Graphics, 3301 Mhz, 6 Core(s), 12 Logical Processor(s) |
| RAM | 16.0 GB |
| Disk | Generic Flash Disk USB Device - SAMSUNG MZVL2512HCJQ-00B00 (476.94 GB) |
| OS | Microsoft Windows 11 |
| Test data | New York taxi trips |
SQL query to get statistics for the 2016 partition:

```sql
WITH
    `2016_parts` AS (
        SELECT sum(bytes) AS bytes
        FROM system.parts
        WHERE table = 'trips_mergetree'
          AND database = 'datasets'
          AND active
          AND partition LIKE '2016%'
    )
SELECT
    (SELECT count() FROM datasets.trips_mergetree WHERE toYear(pickup_datetime) = 2016) AS exact_row_count,
    formatReadableSize((SELECT bytes FROM `2016_parts`)) AS total_size,
    formatReadableSize((SELECT bytes FROM `2016_parts`) / exact_row_count) AS avg_row_size;
```

Table 25. Data for 2016
| exact_row_count | total_size | avg_row_size |
|---|---|---|
| 78325655 | 6.88 GiB | 94.37 B |
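The avg_row_size column follows from the other two; a quick cross-check (1 GiB taken as 1024³ bytes; minor drift is expected because 6.88 GiB is itself a rounded figure):

```java
public class AvgRowSize {
  public static void main(String[] args) {
    double totalBytes = 6.88 * 1024 * 1024 * 1024; // 6.88 GiB in bytes
    long rows = 78_325_655L;                       // exact_row_count
    double avg = totalBytes / rows;
    // ~94.3 B, in line with the table's 94.37 B (computed from the unrounded total).
    System.out.printf("%.2f B%n", avg);
  }
}
```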
Table 26. Load profiles
| Value\Profile | № 1 | № 2 | № 3 | № 4 |
|---|---|---|---|---|
| TType | TIME_SERIES | TIME_SERIES | TIME_SERIES | TIME_SERIES |
| IType | GLOBAL | LOCAL | LOCAL | LOCAL |
| AType | ON_LOAD | ON_LOAD | FULL_PASS_ONCE | FULL_PASS_EACH |
| Compression | true | true | true | true |
| Load (min) | 31 min 9 sec | 23 min 28 sec | 23 min 36 sec | 23 min 31 sec |
| Size (GB) | 9,015 | 11,993 | 11,331 | 12,382 |
| Avg Rows/Sec | 41 891 | 55 620 | 55 314 | 55 499 |
| Avg MB/Sec | 5,28 | 7,02 | 6,98 | 7,00 |
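The Avg Rows/Sec figures are consistent with the partition's 78,325,655 rows and the load times; for example, for profile № 2 (assuming the full 2016 partition was loaded):

```java
public class Throughput {
  public static void main(String[] args) {
    long rows = 78_325_655L;            // rows in the 2016 partition
    long seconds = 23 * 60 + 28;        // profile 2 load time: 23 min 28 sec
    // Integer division gives ~55629 rows/sec, close to the table's 55 620.
    System.out.println(rows / seconds);
  }
}
```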
Table 27. Performance tests for gantt API (count)
| № | Test name | № 1 | № 2 | № 3 | № 4 |
|---|---|---|---|---|---|
| 1 | getGanttRawRaw | 13.5 / 9.0 | 13.7 / 9.0 | 14.3 / 9.3 | 13.4 / 8.9 |
| 2 | getGanttEnumEnum | 8.5 / 4.3 | 11.3 / 6.9 | 11.2 / 7.0 | 11.0 / 7.0 |
| 3 | getGanttHistHist | 7.5 / 3.9 | 13.5 / 8.5 | 13.3 / 8.6 | 13.1 / 8.5 |
| 4 | getGanttHistRaw | 9.6 / 6.2 | 11.3 / 7.9 | 11.0 / 7.7 | 10.9 / 7.9 |
| 5 | getGanttHistEnum | 6.9 / 3.7 | 12.7 / 8.5 | 12.4 / 8.5 | 12.3 / 8.6 |
| 6 | getGanttEnumRaw | 9.8 / 6.6 | 13.8 / 9.8 | 14.0 / 9.8 | 13.7 / 9.7 |
| 7 | getGanttEnumHist | 6.8 / 4.1 | 17.0 / 10.9 | 15.0 / 10.5 | 15.0 / 10.4 |
| 8 | getGanttRawHist | 9.8 / 6.4 | 14.4 / 9.8 | 14.2 / 10.0 | 13.9 / 9.8 |
| 9 | getGanttRawEnum | 9.9 / 6.1 | 14.3 / 9.3 | 14.0 / 9.4 | 14.2 / 9.4 |
Table 28. Performance tests for gantt API (sum)
| № | Test name | № 1 | № 2 | № 3 | № 4 |
|---|---|---|---|---|---|
| 1 | getGanttSumHistRaw | 5.5 | 7.4 | 7.2 | 7.2 |
| 2 | getGanttSumEnumRaw | 6.5 | 11.2 | 10.8 | 11.2 |
| 3 | getGanttSumRawRaw | 9.3 | 10.0 | 9.8 | 10.1 |
| 4 | getGanttSumHistHist | 4.8 | 6.8 | 6.6 | 6.6 |
| 5 | getGanttSumEnumEnum | 5.1 | 10.0 | 9.9 | 10.1 |
| 6 | getGanttSumRawEnum | 8.4 | 8.7 | 8.6 | 8.9 |
| 7 | getGanttSumHistRawWithFilter | 6.9 | 9.6 | 9.6 | 9.7 |
Table 29. Performance tests for stacked API
| № | Test name | № 1 | № 2 | № 3 | № 4 |
|---|---|---|---|---|---|
| 1 | stackedHist | 5.4 | 6.1 | 8.8 | 6.1 |
| 2 | stackedHistDate | 0.1 | 0.1 | 0.1 | 0.1 |
| 3 | stackedEnum | 5.0 | 10.3 | 8.9 | 8.9 |
| 4 | stackedEnumDate | 0.1 | 0.1 | 0.1 | 0.1 |
| 5 | stackedRaw | 7.8 | 7.8 | 7.9 | 8.0 |
| 6 | stackedRawDate | 0.1 | 0.1 | 0.1 | 0.1 |
Table 30. SQL queries used in the performance tests
| № | Test name | SQL query |
|---|---|---|
| | Gantt API (Count) | |
| 1 | getGanttRawRaw | SELECT pickup_cdeligibil, vendor_id, COUNT(vendor_id) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY pickup_cdeligibil, vendor_id; |
| 2 | getGanttHistRaw | SELECT trip_type, vendor_id, COUNT(vendor_id) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY trip_type, vendor_id; |
| 3 | getGanttRawEnum | SELECT pickup_cdeligibil, cab_type, COUNT(cab_type) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY pickup_cdeligibil, cab_type; |
| 4 | getGanttRawHist | SELECT pickup_cdeligibil, pickup_boroname, COUNT(pickup_boroname) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY pickup_cdeligibil, pickup_boroname; |
| 5 | getGanttEnumEnum | SELECT dropoff_puma, dropoff_borocode, COUNT(dropoff_borocode) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY dropoff_puma, dropoff_borocode; |
| 6 | getGanttEnumHist | SELECT dropoff_boroname, pickup_boroname, COUNT(pickup_boroname) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY dropoff_boroname, pickup_boroname; |
| 7 | getGanttEnumRaw | SELECT dropoff_boroname, vendor_id, COUNT(vendor_id) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY dropoff_boroname, vendor_id; |
| 8 | getGanttHistEnum | SELECT trip_type, dropoff_boroname, COUNT(dropoff_boroname) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY trip_type, dropoff_boroname; |
| 9 | getGanttHistHist | SELECT trip_type, pickup_boroname, COUNT(pickup_boroname) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY trip_type, pickup_boroname; |
| | Gantt API (Sum) | |
| 10 | getGanttSumHistRaw | SELECT trip_type, SUM(fare_amount) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY trip_type; |
| 11 | getGanttSumEnumRaw | SELECT pickup_boroname, SUM(trip_distance) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY pickup_boroname; |
| 12 | getGanttSumRawRaw | SELECT vendor_id, SUM(total_amount) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY vendor_id; |
| 13 | getGanttSumHistHist | SELECT trip_type, SUM(passenger_count) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY trip_type; |
| 14 | getGanttSumEnumEnum | SELECT dropoff_boroname, SUM(dropoff_borocode) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY dropoff_boroname; |
| 15 | getGanttSumRawEnum | SELECT pickup_cdeligibil, SUM(pickup_borocode) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY pickup_cdeligibil; |
| 16 | getGanttSumWithFilter | SELECT trip_type, SUM(fare_amount) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 AND trip_type IN (1, 2) GROUP BY trip_type; |
| | Stacked API | |
| 17 | stackedHist | SELECT trip_type, COUNT(trip_type) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY trip_type; |
| 18 | stackedHistDate | SELECT trip_type, COUNT(trip_type) FROM datasets.trips_mergetree WHERE toYYYYMMDD(pickup_datetime) = 20160101 GROUP BY trip_type; |
| 19 | stackedEnum | SELECT dropoff_boroname, COUNT(dropoff_boroname) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY dropoff_boroname; |
| 20 | stackedEnumDate | SELECT dropoff_boroname, COUNT(dropoff_boroname) FROM datasets.trips_mergetree WHERE toYYYYMMDD(pickup_datetime) = 20160101 GROUP BY dropoff_boroname; |
| 21 | stackedRaw | SELECT vendor_id, COUNT(vendor_id) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY vendor_id; |
| 22 | stackedRawDate | SELECT vendor_id, COUNT(vendor_id) FROM datasets.trips_mergetree WHERE toYYYYMMDD(pickup_datetime) = 20160101 GROUP BY vendor_id; |
- Building and installing Dimension DB from source code into a local Maven repository is described in the Building the project section;
- If you find an error in the code or have a suggestion for improvement, create a ticket;
- Before starting work, check that the project builds and the unit tests pass according to the instructions in the Building the project section;
- Integration tests are used to check the correctness and performance of the Dimension DB API. For more details, see Preparing the Environment to Run Integration Tests and Code Use Cases.
| EN | RU |
|---|---|
| README in English | README на русском |
Code released under the Apache License Version 2.0
Created with the support of the "Innovation Promotion Fund".
Created by @akardapolov - feel free to contact me!