akardapolov/dimension-db
Dimension DB

Dimension DB logo

Hybrid time-series and block-column storage database engine written in Java

Contents

Program Description

General Information

Dimension DB is a hybrid columnar database for storing time-series data, written in Java.

Key Features:

  • Columnar data storage format;
  • Storage of time-series data;
  • Storage format with built-in support for data deduplication and compression;
  • Storage format configurable at the table and block level;
  • Automatic selection of the deduplication algorithm based on the collected data;
  • Berkley DB used as the backend for local storage;
  • Support for basic JDBC data types of PostgreSQL, Oracle, MS SQL and ClickHouse;
  • In read-only mode, automatic generation of SQL queries against tables with time-series data.
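The automatic selection of the deduplication algorithm can be illustrated with a small self-contained sketch. The class below is hypothetical: the thresholds and names are not the actual Dimension DB heuristic, and it treats choosing a storage format (RAW, ENUM or HISTOGRAM, described in the Data storage format section) as the deduplication mechanism. It only shows the idea of picking a format from simple statistics of a column block:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: choose an encoding for a column block based on
// simple statistics of the collected values. Thresholds are illustrative.
public class FormatSelector {

  public enum Format { RAW, ENUM, HISTOGRAM }

  public static Format select(List<String> blockValues) {
    if (blockValues.isEmpty()) {
      return Format.RAW;
    }
    Set<String> distinct = new HashSet<>(blockValues);
    // Count runs of repeated adjacent values: few long runs favor HISTOGRAM.
    int runs = 1;
    for (int i = 1; i < blockValues.size(); i++) {
      if (!blockValues.get(i).equals(blockValues.get(i - 1))) {
        runs++;
      }
    }
    if (runs * 4 <= blockValues.size()) {
      return Format.HISTOGRAM;  // long runs of identical values
    }
    if (distinct.size() <= 127) {
      return Format.ENUM;       // low cardinality fits a byte dictionary
    }
    return Format.RAW;          // high cardinality: store values as-is
  }
}
```

Because the choice is made per block, differently-shaped stretches of the same column can end up with different encodings.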

Return to Contents

Application areas

The library can be used as a lightweight analytical database for backend data-processing systems on the Java platform, for tasks such as:

  • storing collected data and analytical processing of small and medium amounts of data for various embedded Java applications;
  • storing and processing Internet of Things (IoT) data on the devices themselves as temporary storage, cache, and other tasks related to local data processing;
  • metadata storage for querying external data sources using automatic SQL generation based on the data model for analytical applications.

Return to Contents

Minimum technical requirements

Dimension DB is compatible with Java 25+ and comes with a small set of dependencies.

Hardware requirements

Table 1. Hardware requirements

Parameter | Description
CPU and RAM | Processor with a frequency of at least 500 MHz and at least 250 MB of RAM, depending on the volume of processed data
CPU architecture | Intel 64-bit (x86_64), AMD 64-bit (x86_64), Arm 64-bit (aarch64)
Disk | Disk size depends on the volume of processed data

Software requirements

Table 2. Software requirements

Software | Requirements
Java | Java version 25+
Maven | Version 3 or higher
Git | Latest release

Operating system requirements

Table 3. Operating system requirements

Software | Requirements
Operating system | Windows, Linux, macOS

Return to Contents

Getting started with the project

Building the project

Make sure you have JDK 25+, Maven and Git installed:

  java -version
  mvn -version
  git --version

Download the Dimension DB repository sources:

  git clone <<url>>
  cd dimension-db

To compile the project sources, run:

  mvn clean compile

To run unit tests, run:

  mvn clean test

To build the project into a Dimension DB jar file, run:

  mvn clean package

To build and install the Dimension DB jar file into your local Maven repository, run:

  mvn clean install

Notes:

  • Building the project may include running unit tests from DimensionDBRunnerTest and DBaseRunnerUseCasesTest, which take approximately 25 minutes to complete.
  • Most of this time is spent in the DimensionDBRunnerTest class, which runs 45 tests with various configurations.
  • Both DimensionDBRunnerTest and DBaseRunnerUseCasesTest are disabled by default using the @Disabled annotation.
  • The DimensionDBRunnerTest class contains tests with different permutation levels, as detailed below:

Table 4. Parameters and statistics for DimensionDBRunnerTest tests

Test name | PermutationState | Number of configurations | Total tests | Execution time
testPermutationNone | NONE | 108 | 4 860 | ~9 seconds
testPermutationPartial | PARTIAL | 1 944 | 87 480 | ~2 minutes
testPermutationAll | ALL | 26 244 | 1 180 980 | ~25 minutes
  • To reduce build time by skipping these tests, use the following Maven commands:
# Skip DimensionDBRunnerTest during testing phase
mvn clean test -Dtest=!DimensionDBRunnerTest

# Skip during packaging
mvn clean package -Dtest=!DimensionDBRunnerTest

# Skip during installation
mvn clean install -Dtest=!DimensionDBRunnerTest

Return to Contents

Using the project

To use Dimension DB in a Java application, add its jar file as a dependency in your project's pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project>
  <properties>
    <dimension-db-version>25.7.1</dimension-db-version>
  </properties>
  <!--.....-->
  <!--.....-->
  <!--.....-->
  <dependencies>
    <dependency>
      <groupId>ru.dimension</groupId>
      <artifactId>db</artifactId>
      <version>${dimension-db-version}</version>
    </dependency>
  </dependencies>
</project>

Notes:

  • To build and install the latest current version of the Dimension DB library in the local Maven repository, download the application source codes and run:
mvn clean install

Details on building and installing the project are described in the Building the project section

  • You can find a full list of Dimension DB use cases in your application in the Code use cases section.

Return to Contents

Prepare the environment to run unit tests

To run unit tests, you need to follow the steps to build the project from the Building the project section

Return to Contents

Prepare the environment to run integration tests

Preparing the environment to run integration tests requires a local database installation using Docker Desktop

Instructions for installing Docker Desktop for Windows and Linux

Return to Contents

PostgreSQL

Install and run a container with PostgreSQL DB

docker run -d --name db \
-e 'POSTGRES_PASSWORD=postgres' \
-p 5432:5432 \
-v pgdata:/var/lib/postgresql/data \
postgres \
postgres -c 'max_connections=200'

Return to Contents

Oracle
  • Clone the repository of Oracle DB installation scripts locally
  • Download the DB distribution from the Oracle website. For the Linux x86 platform and DB version 19.3, copy the LINUX.X64_193000_db_home.zip file to the OracleDatabase/SingleInstance/dockerfiles/19.3.0 directory of the script repository
  • Run the buildContainerImage.sh script from the script repository and specify the parameters (example for DB version 19.3.0)
    OracleDatabase/SingleInstance/dockerfiles/buildContainerImage.sh -v 19.3.0 -e -i
  • Install and run a container with an Oracle DB instance from the command line (example of running the Docker run command for DB version 19.3.0)
    docker run --name database19ee \
    -p 1523:1521 -p 5522:5500 -e \
    ORACLE_SID=orcl -e ORACLE_PDB=pdb_orcl -e ORACLE_PWD=sys \
    -e ORACLE_CHARACTERSET=AL32UTF8 \
    -v /opt/oracle/oradata oracle/database:19.3.0-ee

Return to Contents

MS SQL

Install and run container with MS SQL DB

docker run -e 'ACCEPT_EULA=Y' \
-e 'SA_PASSWORD=QOEfSsa51234!' \
-p 1433:1433 \
-v $(pwd)/drive:/var/opt/mssql/data \
--name mssql \
mcr.microsoft.com/mssql/server

Return to Contents

ClickHouse

Install and run a container with ClickHouse DB

docker run -d --name clickhouse-server \
-e CLICKHOUSE_USER=admin -e CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT=1 -e CLICKHOUSE_PASSWORD=admin \
-p 8123:8123 -p 9000:9000 \
-v C:\clickhouse:/var/lib/clickhouse clickhouse/clickhouse-server

Return to Contents

Loading test data into ClickHouse
  • Install ClickHouse locally and check the connection on the local PC using the URL "jdbc:clickhouse://localhost:8123", or use another connection string and update the tests accordingly;
  • Load the New York taxi trips test data into the local ClickHouse instance (the format of pre-prepared partitions loaded into the datasets schema is used);
  • Check that the data has loaded: clickhouse-client --query "select count(*) from datasets.trips_mergetree";
  • Create a directory for local storage of Dimension DB data (example for Windows): "C:\Users\.temp";
  • Load the test data into the local Dimension DB database using any of the methods in DimensionDBCHLoadDataTest marked with the @Test annotation;
  • Run the integration tests in DimensionDBCHQueryGanttTest, DimensionDBCHQueryStackedTest or DimensionDBCHQueryRawTest that are marked with the @Test annotation.

Notes:

  • Integration tests carry the @Disabled annotation so that they are skipped when running Dimension DB unit tests; remove it when you need to load data and run the tests.
  • Test data from the trips_mergetree table is loaded into the datasets schema
    CREATE DATABASE datasets;
    
  • Trips_mergetree table definition
    CREATE TABLE trips_mergetree (
      trip_id UInt32,
      vendor_id Enum8('1' = 1, '2' = 2, 'CMT' = 3, 'VTS' = 4, 'DDS' = 5, 'B02512' = 10, 'B02598' = 11, 'B02617' = 12, 'B02682' = 13, 'B02764' = 14),
      pickup_date Date,
      pickup_datetime DateTime,
      dropoff_date Date,
      dropoff_datetime DateTime,
      store_and_fwd_flag UInt8,
      rate_code_id UInt8,
      pickup_longitude Float64,
      pickup_latitude Float64,
      dropoff_longitude Float64,
      dropoff_latitude Float64,
      passenger_count UInt8,
      trip_distance Float64,
      fare_amount Float32,
      extra Float32,
      mta_tax Float32,
      tip_amount Float32,
      tolls_amount Float32,
      ehail_fee Float32,
      improvement_surcharge Float32,
      total_amount Float32,
      payment_type_ Enum8('UNK' = 0, 'CSH' = 1, 'CRE' = 2, 'NOC' = 3, 'DIS' = 4),
      trip_type UInt8,
      pickup FixedString(25),
      dropoff FixedString(25),
      cab_type Enum8('yellow' = 1, 'green' = 2, 'uber' = 3),
      pickup_nyct2010_gid UInt8,
      pickup_ctlabel Float32,
      pickup_borocode UInt8,
      pickup_boroname Enum8('' = 0, 'Manhattan' = 1, 'Bronx' = 2, 'Brooklyn' = 3, 'Queens' = 4, 'Staten Island' = 5),
      pickup_ct2010 FixedString(6),
      pickup_boroct2010 FixedString(7),
      pickup_cdeligibil Enum8(' ' = 0, 'E' = 1, 'I' = 2),
      pickup_ntacode FixedString(4),
      pickup_ntaname Enum16('' = 0, 'Airport' = 1, 'Allerton-Pelham Gardens' = 2, 'Annadale-Huguenot-Prince\'s Bay-Eltingville' = 3, 'Arden Heights' = 4, 'Astoria' = 5, 'Auburndale' = 6, 'Baisley Park' = 7, 'Bath Beach' = 8, 'Battery Park City-Lower Manhattan' = 9, 'Bay Ridge' = 10, 'Bayside-Bayside Hills' = 11, 'Bedford' = 12, 'Bedford Park-Fordham North' = 13, 'Bellerose' = 14, 'Belmont' = 15, 'Bensonhurst East' = 16, 'Bensonhurst West' = 17, 'Borough Park' = 18, 'Breezy Point-Belle Harbor-Rockaway Park-Broad Channel' = 19, 'Briarwood-Jamaica Hills' = 20, 'Brighton Beach' = 21, 'Bronxdale' = 22, 'Brooklyn Heights-Cobble Hill' = 23, 'Brownsville' = 24, 'Bushwick North' = 25, 'Bushwick South' = 26, 'Cambria Heights' = 27, 'Canarsie' = 28, 'Carroll Gardens-Columbia Street-Red Hook' = 29, 'Central Harlem North-Polo Grounds' = 30, 'Central Harlem South' = 31, 'Charleston-Richmond Valley-Tottenville' = 32, 'Chinatown' = 33, 'Claremont-Bathgate' = 34, 'Clinton' = 35, 'Clinton Hill' = 36, 'Co-op City' = 37, 'College Point' = 38, 'Corona' = 39, 'Crotona Park East' = 40, 'Crown Heights North' = 41, 'Crown Heights South' = 42, 'Cypress Hills-City Line' = 43, 'DUMBO-Vinegar Hill-Downtown Brooklyn-Boerum Hill' = 44, 'Douglas Manor-Douglaston-Little Neck' = 45, 'Dyker Heights' = 46, 'East Concourse-Concourse Village' = 47, 'East Elmhurst' = 48, 'East Flatbush-Farragut' = 49, 'East Flushing' = 50, 'East Harlem North' = 51, 'East Harlem South' = 52, 'East New York' = 53, 'East New York (Pennsylvania Ave)' = 54, 'East Tremont' = 55, 'East Village' = 56, 'East Williamsburg' = 57, 'Eastchester-Edenwald-Baychester' = 58, 'Elmhurst' = 59, 'Elmhurst-Maspeth' = 60, 'Erasmus' = 61, 'Far Rockaway-Bayswater' = 62, 'Flatbush' = 63, 'Flatlands' = 64, 'Flushing' = 65, 'Fordham South' = 66, 'Forest Hills' = 67, 'Fort Greene' = 68, 'Fresh Meadows-Utopia' = 69, 'Ft. 
Totten-Bay Terrace-Clearview' = 70, 'Georgetown-Marine Park-Bergen Beach-Mill Basin' = 71, 'Glen Oaks-Floral Park-New Hyde Park' = 72, 'Glendale' = 73, 'Gramercy' = 74, 'Grasmere-Arrochar-Ft. Wadsworth' = 75, 'Gravesend' = 76, 'Great Kills' = 77, 'Greenpoint' = 78, 'Grymes Hill-Clifton-Fox Hills' = 79, 'Hamilton Heights' = 80, 'Hammels-Arverne-Edgemere' = 81, 'Highbridge' = 82, 'Hollis' = 83, 'Homecrest' = 84, 'Hudson Yards-Chelsea-Flatiron-Union Square' = 85, 'Hunters Point-Sunnyside-West Maspeth' = 86, 'Hunts Point' = 87, 'Jackson Heights' = 88, 'Jamaica' = 89, 'Jamaica Estates-Holliswood' = 90, 'Kensington-Ocean Parkway' = 91, 'Kew Gardens' = 92, 'Kew Gardens Hills' = 93, 'Kingsbridge Heights' = 94, 'Laurelton' = 95, 'Lenox Hill-Roosevelt Island' = 96, 'Lincoln Square' = 97, 'Lindenwood-Howard Beach' = 98, 'Longwood' = 99, 'Lower East Side' = 100, 'Madison' = 101, 'Manhattanville' = 102, 'Marble Hill-Inwood' = 103, 'Mariner\'s Harbor-Arlington-Port Ivory-Graniteville' = 104, 'Maspeth' = 105, 'Melrose South-Mott Haven North' = 106, 'Middle Village' = 107, 'Midtown-Midtown South' = 108, 'Midwood' = 109, 'Morningside Heights' = 110, 'Morrisania-Melrose' = 111, 'Mott Haven-Port Morris' = 112, 'Mount Hope' = 113, 'Murray Hill' = 114, 'Murray Hill-Kips Bay' = 115, 'New Brighton-Silver Lake' = 116, 'New Dorp-Midland Beach' = 117, 'New Springville-Bloomfield-Travis' = 118, 'North Corona' = 119, 'North Riverdale-Fieldston-Riverdale' = 120, 'North Side-South Side' = 121, 'Norwood' = 122, 'Oakland Gardens' = 123, 'Oakwood-Oakwood Beach' = 124, 'Ocean Hill' = 125, 'Ocean Parkway South' = 126, 'Old Astoria' = 127, 'Old Town-Dongan Hills-South Beach' = 128, 'Ozone Park' = 129, 'Park Slope-Gowanus' = 130, 'Parkchester' = 131, 'Pelham Bay-Country Club-City Island' = 132, 'Pelham Parkway' = 133, 'Pomonok-Flushing Heights-Hillcrest' = 134, 'Port Richmond' = 135, 'Prospect Heights' = 136, 'Prospect Lefferts Gardens-Wingate' = 137, 'Queens Village' = 138, 'Queensboro Hill' = 139, 
'Queensbridge-Ravenswood-Long Island City' = 140, 'Rego Park' = 141, 'Richmond Hill' = 142, 'Ridgewood' = 143, 'Rikers Island' = 144, 'Rosedale' = 145, 'Rossville-Woodrow' = 146, 'Rugby-Remsen Village' = 147, 'Schuylerville-Throgs Neck-Edgewater Park' = 148, 'Seagate-Coney Island' = 149, 'Sheepshead Bay-Gerritsen Beach-Manhattan Beach' = 150, 'SoHo-TriBeCa-Civic Center-Little Italy' = 151, 'Soundview-Bruckner' = 152, 'Soundview-Castle Hill-Clason Point-Harding Park' = 153, 'South Jamaica' = 154, 'South Ozone Park' = 155, 'Springfield Gardens North' = 156, 'Springfield Gardens South-Brookville' = 157, 'Spuyten Duyvil-Kingsbridge' = 158, 'St. Albans' = 159, 'Stapleton-Rosebank' = 160, 'Starrett City' = 161, 'Steinway' = 162, 'Stuyvesant Heights' = 163, 'Stuyvesant Town-Cooper Village' = 164, 'Sunset Park East' = 165, 'Sunset Park West' = 166, 'Todt Hill-Emerson Hill-Heartland Village-Lighthouse Hill' = 167, 'Turtle Bay-East Midtown' = 168, 'University Heights-Morris Heights' = 169, 'Upper East Side-Carnegie Hill' = 170, 'Upper West Side' = 171, 'Van Cortlandt Village' = 172, 'Van Nest-Morris Park-Westchester Square' = 173, 'Washington Heights North' = 174, 'Washington Heights South' = 175, 'West Brighton' = 176, 'West Concourse' = 177, 'West Farms-Bronx River' = 178, 'West New Brighton-New Brighton-St. George' = 179, 'West Village' = 180, 'Westchester-Unionport' = 181, 'Westerleigh' = 182, 'Whitestone' = 183, 'Williamsbridge-Olinville' = 184, 'Williamsburg' = 185, 'Windsor Terrace' = 186, 'Woodhaven' = 187, 'Woodlawn-Wakefield' = 188, 'Woodside' = 189, 'Yorkville' = 190, 'park-cemetery-etc-Bronx' = 191, 'park-cemetery-etc-Brooklyn' = 192, 'park-cemetery-etc-Manhattan' = 193, 'park-cemetery-etc-Queens' = 194, 'park-cemetery-etc-Staten Island' = 195),
      pickup_puma UInt16,
      dropoff_nyct2010_gid UInt8,
      dropoff_ctlabel Float32,
      dropoff_borocode UInt8,
      dropoff_boroname Enum8('' = 0, 'Manhattan' = 1, 'Bronx' = 2, 'Brooklyn' = 3, 'Queens' = 4, 'Staten Island' = 5),
      dropoff_ct2010 FixedString(6),
      dropoff_boroct2010 FixedString(7),
      dropoff_cdeligibil Enum8(' ' = 0, 'E' = 1, 'I' = 2),
      dropoff_ntacode FixedString(4),
      dropoff_ntaname Enum16('' = 0, 'Airport' = 1, 'Allerton-Pelham Gardens' = 2, 'Annadale-Huguenot-Prince\'s Bay-Eltingville' = 3, 'Arden Heights' = 4, 'Astoria' = 5, 'Auburndale' = 6, 'Baisley Park' = 7, 'Bath Beach' = 8, 'Battery Park City-Lower Manhattan' = 9, 'Bay Ridge' = 10, 'Bayside-Bayside Hills' = 11, 'Bedford' = 12, 'Bedford Park-Fordham North' = 13, 'Bellerose' = 14, 'Belmont' = 15, 'Bensonhurst East' = 16, 'Bensonhurst West' = 17, 'Borough Park' = 18, 'Breezy Point-Belle Harbor-Rockaway Park-Broad Channel' = 19, 'Briarwood-Jamaica Hills' = 20, 'Brighton Beach' = 21, 'Bronxdale' = 22, 'Brooklyn Heights-Cobble Hill' = 23, 'Brownsville' = 24, 'Bushwick North' = 25, 'Bushwick South' = 26, 'Cambria Heights' = 27, 'Canarsie' = 28, 'Carroll Gardens-Columbia Street-Red Hook' = 29, 'Central Harlem North-Polo Grounds' = 30, 'Central Harlem South' = 31, 'Charleston-Richmond Valley-Tottenville' = 32, 'Chinatown' = 33, 'Claremont-Bathgate' = 34, 'Clinton' = 35, 'Clinton Hill' = 36, 'Co-op City' = 37, 'College Point' = 38, 'Corona' = 39, 'Crotona Park East' = 40, 'Crown Heights North' = 41, 'Crown Heights South' = 42, 'Cypress Hills-City Line' = 43, 'DUMBO-Vinegar Hill-Downtown Brooklyn-Boerum Hill' = 44, 'Douglas Manor-Douglaston-Little Neck' = 45, 'Dyker Heights' = 46, 'East Concourse-Concourse Village' = 47, 'East Elmhurst' = 48, 'East Flatbush-Farragut' = 49, 'East Flushing' = 50, 'East Harlem North' = 51, 'East Harlem South' = 52, 'East New York' = 53, 'East New York (Pennsylvania Ave)' = 54, 'East Tremont' = 55, 'East Village' = 56, 'East Williamsburg' = 57, 'Eastchester-Edenwald-Baychester' = 58, 'Elmhurst' = 59, 'Elmhurst-Maspeth' = 60, 'Erasmus' = 61, 'Far Rockaway-Bayswater' = 62, 'Flatbush' = 63, 'Flatlands' = 64, 'Flushing' = 65, 'Fordham South' = 66, 'Forest Hills' = 67, 'Fort Greene' = 68, 'Fresh Meadows-Utopia' = 69, 'Ft. 
Totten-Bay Terrace-Clearview' = 70, 'Georgetown-Marine Park-Bergen Beach-Mill Basin' = 71, 'Glen Oaks-Floral Park-New Hyde Park' = 72, 'Glendale' = 73, 'Gramercy' = 74, 'Grasmere-Arrochar-Ft. Wadsworth' = 75, 'Gravesend' = 76, 'Great Kills' = 77, 'Greenpoint' = 78, 'Grymes Hill-Clifton-Fox Hills' = 79, 'Hamilton Heights' = 80, 'Hammels-Arverne-Edgemere' = 81, 'Highbridge' = 82, 'Hollis' = 83, 'Homecrest' = 84, 'Hudson Yards-Chelsea-Flatiron-Union Square' = 85, 'Hunters Point-Sunnyside-West Maspeth' = 86, 'Hunts Point' = 87, 'Jackson Heights' = 88, 'Jamaica' = 89, 'Jamaica Estates-Holliswood' = 90, 'Kensington-Ocean Parkway' = 91, 'Kew Gardens' = 92, 'Kew Gardens Hills' = 93, 'Kingsbridge Heights' = 94, 'Laurelton' = 95, 'Lenox Hill-Roosevelt Island' = 96, 'Lincoln Square' = 97, 'Lindenwood-Howard Beach' = 98, 'Longwood' = 99, 'Lower East Side' = 100, 'Madison' = 101, 'Manhattanville' = 102, 'Marble Hill-Inwood' = 103, 'Mariner\'s Harbor-Arlington-Port Ivory-Graniteville' = 104, 'Maspeth' = 105, 'Melrose South-Mott Haven North' = 106, 'Middle Village' = 107, 'Midtown-Midtown South' = 108, 'Midwood' = 109, 'Morningside Heights' = 110, 'Morrisania-Melrose' = 111, 'Mott Haven-Port Morris' = 112, 'Mount Hope' = 113, 'Murray Hill' = 114, 'Murray Hill-Kips Bay' = 115, 'New Brighton-Silver Lake' = 116, 'New Dorp-Midland Beach' = 117, 'New Springville-Bloomfield-Travis' = 118, 'North Corona' = 119, 'North Riverdale-Fieldston-Riverdale' = 120, 'North Side-South Side' = 121, 'Norwood' = 122, 'Oakland Gardens' = 123, 'Oakwood-Oakwood Beach' = 124, 'Ocean Hill' = 125, 'Ocean Parkway South' = 126, 'Old Astoria' = 127, 'Old Town-Dongan Hills-South Beach' = 128, 'Ozone Park' = 129, 'Park Slope-Gowanus' = 130, 'Parkchester' = 131, 'Pelham Bay-Country Club-City Island' = 132, 'Pelham Parkway' = 133, 'Pomonok-Flushing Heights-Hillcrest' = 134, 'Port Richmond' = 135, 'Prospect Heights' = 136, 'Prospect Lefferts Gardens-Wingate' = 137, 'Queens Village' = 138, 'Queensboro Hill' = 139, 
'Queensbridge-Ravenswood-Long Island City' = 140, 'Rego Park' = 141, 'Richmond Hill' = 142, 'Ridgewood' = 143, 'Rikers Island' = 144, 'Rosedale' = 145, 'Rossville-Woodrow' = 146, 'Rugby-Remsen Village' = 147, 'Schuylerville-Throgs Neck-Edgewater Park' = 148, 'Seagate-Coney Island' = 149, 'Sheepshead Bay-Gerritsen Beach-Manhattan Beach' = 150, 'SoHo-TriBeCa-Civic Center-Little Italy' = 151, 'Soundview-Bruckner' = 152, 'Soundview-Castle Hill-Clason Point-Harding Park' = 153, 'South Jamaica' = 154, 'South Ozone Park' = 155, 'Springfield Gardens North' = 156, 'Springfield Gardens South-Brookville' = 157, 'Spuyten Duyvil-Kingsbridge' = 158, 'St. Albans' = 159, 'Stapleton-Rosebank' = 160, 'Starrett City' = 161, 'Steinway' = 162, 'Stuyvesant Heights' = 163, 'Stuyvesant Town-Cooper Village' = 164, 'Sunset Park East' = 165, 'Sunset Park West' = 166, 'Todt Hill-Emerson Hill-Heartland Village-Lighthouse Hill' = 167, 'Turtle Bay-East Midtown' = 168, 'University Heights-Morris Heights' = 169, 'Upper East Side-Carnegie Hill' = 170, 'Upper West Side' = 171, 'Van Cortlandt Village' = 172, 'Van Nest-Morris Park-Westchester Square' = 173, 'Washington Heights North' = 174, 'Washington Heights South' = 175, 'West Brighton' = 176, 'West Concourse' = 177, 'West Farms-Bronx River' = 178, 'West New Brighton-New Brighton-St. George' = 179, 'West Village' = 180, 'Westchester-Unionport' = 181, 'Westerleigh' = 182, 'Whitestone' = 183, 'Williamsbridge-Olinville' = 184, 'Williamsburg' = 185, 'Windsor Terrace' = 186, 'Woodhaven' = 187, 'Woodlawn-Wakefield' = 188, 'Woodside' = 189, 'Yorkville' = 190, 'park-cemetery-etc-Bronx' = 191, 'park-cemetery-etc-Brooklyn' = 192, 'park-cemetery-etc-Manhattan' = 193, 'park-cemetery-etc-Queens' = 194, 'park-cemetery-etc-Staten Island' = 195),
      dropoff_puma UInt16
    ) ENGINE = MergeTree() ORDER BY (pickup_date, pickup_datetime);

Return to Contents

Description of the logical structure

DB architecture

Architecture

Mermaid flow chart
flowchart TD
  Client{Clients}
  Client -->|Write Request| DStore_Write_Interface[Write API]
  Client -->|Read Request| DStore_Read_Interface[Read API]
  Client -->|Metadata Request| MetadataAPI[Metadata API]

  DStore_Write_Interface --> |Direct Insert| LocalBerkleyDB[(Local Berkley DB)]
  DStore_Write_Interface --> |Fetch Data| ExternalDB_Write[(External DBs JDBC)]
  ExternalDB_Write --> LocalBerkleyDB

  DStore_Read_Interface --> |Local Read| LocalBerkleyDB
  DStore_Read_Interface --> |External Read| ExternalDB_Read[(External DBs JDBC)]

  MetadataAPI --> MetadataStore[Metadata file]

Return to Contents

Data storage format

Three data storage formats are supported:

  • RAW - values are stored by key: an int key for string data and a suitable Java type for the rest (numeric, special, etc.);
  • ENUM - values are stored as keys of the Java byte type into a dictionary of distinct values;
  • HISTOGRAM - for each run of repeating values from the range, the start and end indexes of the run and the value itself are stored.

Metadata of the storage format, indexing type and compression are stored in the block header.
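The HISTOGRAM layout described above can be illustrated with a small self-contained sketch. The Run record and method names below are hypothetical, not the actual Dimension DB storage classes; they only show the run-based (start index, end index, value) representation:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of HISTOGRAM-style storage: each run of repeating
// values is stored as (startIndex, endIndex, value).
public class HistogramEncoding {

  public record Run(int start, int end, String value) {}

  // Collapse a column into runs of identical adjacent values.
  public static List<Run> encode(List<String> column) {
    List<Run> runs = new ArrayList<>();
    int start = 0;
    for (int i = 1; i <= column.size(); i++) {
      if (i == column.size() || !column.get(i).equals(column.get(start))) {
        runs.add(new Run(start, i - 1, column.get(start)));
        start = i;
      }
    }
    return runs;
  }

  // Expand the runs back into the original column.
  public static List<String> decode(List<Run> runs) {
    List<String> out = new ArrayList<>();
    for (Run r : runs) {
      for (int i = r.start(); i <= r.end(); i++) {
        out.add(r.value());
      }
    }
    return out;
  }
}
```

The round trip is lossless, while columns with long runs shrink to a handful of (start, end, value) triples.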

Return to Contents

Algorithms

The main methods of the Dimension DB Java API are located in the DStore interface. A reference to DStore is obtained after Dimension DB has been initialized by calling the constructor of the DimensionDB class; examples are given in the DB initialization algorithm section.

A complete list of methods of the main classes and interfaces with input and output parameters is posted in the Used methods section.

DB initialization algorithm
  1. Define a local directory for storing Dimension DB settings;
  2. Create the Dimension DB configuration object DimensionDBConfig;
  3. Initialize the backend DB;
  4. Initialize Dimension DB by calling the constructor of the DimensionDB class;
  5. Get a reference to the DStore interface from the DimensionDB class to access the Dimension DB Java API.

Example of DB initialization for Berkley DB backend

@TempDir
static File databaseDir;
...
DimensionDBConfig dimensionDBConfig = new DimensionDBConfig().setConfigDirectory(databaseDir.getAbsolutePath());
BerkleyDB berkleyDB = new BerkleyDB(databaseDir.getAbsolutePath(), true);
DimensionDB dimensionDB = new DimensionDB(dimensionDBConfig, berkleyDB.getStore());
DStore fStore = dimensionDB.getDStore();

Example of DB initialization for ClickHouse backend

@TempDir
static File databaseDir;
...
BType bType = BType.CLICKHOUSE;
String driverClassName = "com.clickhouse.jdbc.ClickHouseDriver";
String dbUrl = "jdbc:clickhouse://localhost:8123";

SqlDB sqlDB = new SqlDB(bType, driverClassName, dbUrl, "admin", "admin");
BasicDataSource basicDataSource = sqlDB.getDatasource();
DimensionDBConfig dimensionDBConfig = new DimensionDBConfig().setConfigDirectory(databaseDir.getAbsolutePath());

DimensionDB dimensionDB = new DimensionDB(dimensionDBConfig, bType, basicDataSource);
DStore fStore = dimensionDB.getDStore();

Notes:

  • The temporary directory databaseDir is created automatically using the @TempDir annotation of the JUnit library;
  • For the example with the Berkley DB backend, Dimension DB settings and backend data are located in the same temporary directory databaseDir;
  • Code examples are taken from unit and integration tests.

Return to Contents

Program API

Dimension DB offers advanced capabilities for interacting with the database through a set of methods covering various data storage and retrieval tasks. These APIs are designed to provide flexibility and efficient handling of time-series data in a Java environment, for both locally stored data and external storage systems with an SQL interface.

  • Main API. The main functions for working with data are built into the DStore interface, which provides standardized access to write and read methods. DStore is a key element of the Dimension DB architecture, providing an interface point for interacting with the database for client applications.

  • Writing data. Several approaches are supported for writing information to the Dimension DB database:

    • Direct - inserting data into a local database using an intermediate Java table-type data structure.
    • JDBC - inserting data into a local database using data received from an external data source via JDBC, which allows integration with other systems that support JDBC.
    • JDBC Batch - batch loading into a local database using data received from an external data source via JDBC, minimizing the number of transactions for large-scale inserts.
  • Reading data. The process of extracting data from Dimension DB uses APIs, each of which is optimized for the corresponding types of queries:

    • Stacked - retrieving data based on aggregation functions such as COUNT, SUM and AVG, providing the distribution of data over selected attributes and parameters.
    • Gantt - complex distribution calculation using COUNT aggregation at two levels, useful for analyzing dependencies and relationships between different indicators.
    • Raw - extracting source data in tabular format to view detailed information across the entire array of collected data.
    • BatchResultSet - extracting source data in tabular format with row-by-row access, to fetch only part of the rows from the result set in the DB.
  • Loading metadata. To work with external data sources, queries are executed against them via JDBC or through Java table structures. Information about tables and data types, used both for local storage and for access to external sources via JDBC, is loaded and viewed through a dedicated API backed by a local Dimension DB metadata storage file.

The APIs for writing data only work with the local Berkley DB key-value data storage. API methods for reading data can work with both local Berkley DB storage and external data sources via JDBC using automatic generation of SQL queries.
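Conceptually, a Stacked read computes the distribution of one column's values over a set of rows. The sketch below is an illustrative, in-memory version of that grouping step only; the real getStacked API operates on stored blocks and also takes a time range, a group function and filters:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch of a Stacked COUNT query: the distribution of one
// column's values over a row set. Illustrative only.
public class StackedSketch {

  public static Map<String, Long> countByColumn(List<Map<String, String>> rows,
                                                String column) {
    Map<String, Long> counts = new LinkedHashMap<>();
    for (Map<String, String> row : rows) {
      String key = row.getOrDefault(column, "");
      counts.merge(key, 1L, Long::sum);  // increment the bucket for this value
    }
    return counts;
  }
}
```

SUM and AVG variants differ only in the value accumulated per bucket instead of a count.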

Table 5. Supported databases for working with API in Dimension DB

# | Dimension DB API support | Database | Database type
1 | Read/Write | Berkley DB | key-value
2 | Read | ClickHouse | analytical
3 | Read | Oracle | relational
4 | Read | PostgreSQL | relational
5 | Read | Microsoft SQL Server | relational
6 | Read | MySQL | relational

Before you start writing data, you need to set storage parameters and metadata for tables and columns in the SProfile object.

The database table configuration allows you to switch between global and local indexing, and enable or disable data compression on the fly. This is achieved by placing storage type metadata in the block header for both indexing types and a flag to enable or disable compression.
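The idea of keeping storage-type metadata and a compression flag in each block header can be sketched as a compact bitfield. The layout below is purely illustrative and is not the actual Dimension DB header format:

```java
// Hypothetical sketch of a per-block header: a few bits record the storage
// format, the indexing type and whether the block is compressed, so each
// block can be decoded independently. Bit layout is illustrative only.
public class BlockHeader {

  public enum Format { RAW, ENUM, HISTOGRAM }
  public enum Index { GLOBAL, LOCAL }

  public static byte encode(Format f, Index i, boolean compressed) {
    int b = f.ordinal();          // bits 0-1: storage format
    b |= i.ordinal() << 2;        // bit 2: indexing type
    if (compressed) b |= 1 << 3;  // bit 3: compression flag
    return (byte) b;
  }

  public static Format format(byte header) {
    return Format.values()[header & 0b11];
  }

  public static Index index(byte header) {
    return Index.values()[(header >> 2) & 0b1];
  }

  public static boolean compressed(byte header) {
    return ((header >> 3) & 0b1) == 1;
  }
}
```

Because the flags travel with each block, switching indexing or compression settings affects only blocks written afterwards, which is what makes the on-the-fly changes possible.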

Return to Contents

API Sequence Diagram

Dimension DB API Call Sequence Diagram

Mermaid sequence diagram
sequenceDiagram
  participant Client
  participant DStore_Write as Write API
  participant DStore_Read as Read API
  participant MetadataAPI as Metadata API
  participant LocalDB as Local Berkley DB Storage
  participant ExternalDB as External DBs via JDBC

  Client->>DStore_Write: Write Request
  DStore_Write->>MetadataAPI: Fetch Metadata
  MetadataAPI->>DStore_Write: Return Metadata

  rect rgba(200, 200, 255, 0.1)
    DStore_Write->>LocalDB: Direct Insert
    DStore_Write->>ExternalDB: Fetch Data
    ExternalDB->>DStore_Write: Return Data
    DStore_Write->>LocalDB: Insert from ExternalDB
  end

  Client->>DStore_Read: Read Request
  DStore_Read->>MetadataAPI: Fetch Metadata
  MetadataAPI->>DStore_Read: Return Metadata

  rect rgba(200, 255, 200, 0.1)
    DStore_Read->>LocalDB: Local Read
    DStore_Read->>ExternalDB: Fetch Data
    ExternalDB->>DStore_Read: Return Data
  end

  Client->>MetadataAPI: Read Metadata Request
  MetadataAPI->>Client: Return Metadata

Return to Contents

Used methods
  • The DStore interface contains the main contracts and method signatures for interaction between the Dimension DB database and client applications;
  • Database objects are initialized when the DimensionDB class constructor is called;
  • The reference to the DStore interface for accessing the application API is obtained from the DimensionDB class object.
DStore interface

Table 6. List of APIs for working with the Dimension DB database of the DStore interface

# Method Name API Type Description Input Parameters Output Parameters
1 getTProfile Metadata Get table metadata by name String tableName TProfile
2 loadDirectTableMetadata Metadata Load metadata from SProfile SProfile sProfile TProfile
3 loadJdbcTableMetadata Metadata Load metadata via JDBC connection with query Connection connection, String query, SProfile sProfile TProfile
4 loadJdbcTableMetadata Metadata Load metadata via JDBC (schema and table) Connection connection, String sqlSchemaName, String sqlTableName, SProfile sProfile TProfile
5 setTimestampColumn Metadata Set timestamp column String tableName, String timestampColumnName void
6 putDataDirect Write Save data using Java structure String tableName, List<List> data void
7 putDataJdbc Write Save data from JDBC ResultSet String tableName, ResultSet resultSet long (last row timestamp)
8 putDataJdbcBatch Write Batch save from JDBC ResultSet String tableName, ResultSet resultSet, Integer batchSize void
9 getBlockKeyTailList Read Get blocks with keys and ranges String tableName, long begin, long end List
10 getStacked Read Get aggregated data String tableName, CProfile cProfile, GroupFunction groupFunction, CompositeFilter compositeFilter, long begin, long end List
11 getGanttCount Read Two-level COUNT grouping String tableName, CProfile firstGrpBy, CProfile secondGrpBy, CompositeFilter compositeFilter, long begin, long end List
12 getGanttCount Read Two-level COUNT grouping (multithreaded) String tableName, CProfile firstGrpBy, CProfile secondGrpBy, CompositeFilter compositeFilter, int batchSize, long begin, long end List
13 getGanttSum Read Two-level SUM grouping String tableName, CProfile firstGrpBy, CProfile secondGrpBy, CompositeFilter compositeFilter, long begin, long end List
14 getDistinct Read Get distinct values String tableName, CProfile cProfile, OrderBy orderBy, CompositeFilter compositeFilter, int limit, long begin, long end List
15 getRawDataAll Read Get raw data (all columns, no filter) String tableName, long begin, long end List<List>
16 getRawDataAll Read Get raw data (all columns, with filter) String tableName, CProfile cProfileFilter, String filter, long begin, long end List<List>
17 getRawDataByColumn Read Get raw data for specific column String tableName, CProfile cProfile, long begin, long end List<List>
18 getBatchResultSet Read Batch read (regular tables) String tableName, int fetchSize BatchResultSet
19 getBatchResultSet Read Batch read (time-series tables) String tableName, long begin, long end, int fetchSize BatchResultSet
20 getFirst Metadata Get first timestamp String tableName, long begin, long end long
21 getLast Metadata Get last timestamp String tableName, long begin, long end long
22 syncBackendDb Backend Sync Berkeley DB data to disk - void
23 closeBackendDb Backend Close the Berkeley DB backend - void
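The read methods above share one convention: rows are addressed by a timestamp column and selected by a begin/end range. As a minimal plain-Java sketch of that range-read semantics (hypothetical illustration, not the library's code; an inclusive range is assumed):

```java
import java.util.*;

// Hypothetical sketch of the time-series read semantics behind
// getRawDataAll: rows are kept in timestamp order, and a [begin, end]
// range selects a contiguous slice. Not the engine's implementation.
public class RangeReadSketch {

    // Each row is a list of column values; element 0 is the timestamp.
    private final NavigableMap<Long, List<Object>> rowsByTs = new TreeMap<>();

    public void put(long ts, Object... values) {
        List<Object> row = new ArrayList<>();
        row.add(ts);
        row.addAll(Arrays.asList(values));
        rowsByTs.put(ts, row);
    }

    // Analogous to getRawDataAll(tableName, begin, end): all columns,
    // rows whose timestamp falls inside the closed range.
    public List<List<Object>> rawDataAll(long begin, long end) {
        return new ArrayList<>(rowsByTs.subMap(begin, true, end, true).values());
    }

    public static void main(String[] args) {
        RangeReadSketch t = new RangeReadSketch();
        t.put(1000L, "vendor_a", 10.5);
        t.put(2000L, "vendor_b", 3.2);
        t.put(3000L, "vendor_a", 7.0);
        System.out.println(t.rawDataAll(1500L, 3000L)); // rows at ts 2000 and 3000
    }
}
```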

Return to Contents

Dimension DB class

Table 7. DBase class where DB objects are initialized

Item # Constructor name API type Description Input parameters Output parameters
1 DBase Read/Write Create Dimension DB storage with the Berkeley DB backend DBConfig - local Dimension DB storage configuration,
EntityStore - Berkeley DB data store
Constructor
2 DBase Read Create Dimension DB storage for a specific backend type DBConfig - local Dimension DB storage configuration,
BType backendType - backend type for reading,
BasicDataSource basicDataSource - Apache DBCP2 JDBC connection pool object
Constructor

Return to Contents

Parameters
Table type - enum TType

Table 8. Enum TType for storing supported table types

Property name Type Description
REGULAR TType Table for storing regular (non-time-series) data
TIME_SERIES TType Table for storing time series data

Return to Contents

Table indexing type - enum IType

Table 9. Enum IType for storing table indexing type

Property name Type Description
GLOBAL IType The type of index used (RAW, HISTOGRAM, ENUM), defined by the user at the table level for each column in the SProfile input parameters
LOCAL IType The type of index used (RAW, HISTOGRAM, ENUM), determined automatically by the system at the data block level for each table column

Return to Contents

Data analysis method type by table - enum AType

Table 10. Enum AType for storing the data analysis method for local indexing

Property name Type Description
ON_LOAD AType Data is analyzed on initial load, and the indexing type for each table column is determined from that analysis
FULL_PASS_ONCE AType Data is analyzed on initial load; on subsequent runs, each table column is additionally analyzed once in sequence,
and the indexing type is determined from that information.
FULL_PASS_EACH AType Same as FULL_PASS_ONCE, except that columns are re-analyzed every time the data write API is called.
Each API call analyzes one column, cycling through them in order.

Return to Contents

Table backend type - enum BType

Table 11. Enum BType for storing supported DB types used as a backend for the API for reading and writing data

Property name Type Description
BERKLEYDB BType Berkeley DB
CLICKHOUSE BType ClickHouse
ORACLE BType Oracle
POSTGRES BType PostgreSQL
MSSQL BType Microsoft SQL Server

Return to Contents

Data grouping function type - enum GroupFunction

Table 12. Enum GroupFunction for passing the grouping function when calling Stacked API

Property name Type Description
COUNT GroupFunction Counts the occurrences of each unique value in the cProfile column
SUM GroupFunction Computes the sum of the values in the cProfile column
AVG GroupFunction Computes the average of the values in the cProfile column

Return to Contents

Table column storage parameters - CSType class

Table 13. CSType class for storing storage parameters and data types of table columns

Property name Type Default value Description
isTimeStamp boolean false Flag indicating the column that stores the timestamp
sType SType null Data storage type in the table (RAW, HISTOGRAM, ENUM)
cType CType null Java data type used to store data in Berkeley DB (Int, Long, Double, String, etc.)
dType DataType null JDBC data type used to convert data to the Berkeley DB Java type (INTEGER, DATE, DATETIME, VARCHAR, etc.)

Return to Contents

Data storage type - enum SType

Table 14. Enum SType for storing the table data storage type used in a column or block

Property name Type Default value Description
RAW SType 1 Data as is or as a key for string data types
HISTOGRAM SType 2 Data as a histogram
ENUM SType 3 Data as a list of constants
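The benefit of HISTOGRAM storage over RAW is easiest to see on a low-cardinality column: instead of keeping every value, a block can store each distinct value once with its occurrence count. A hypothetical plain-Java sketch of that idea (not the engine's actual on-disk format):

```java
import java.util.*;

// Hypothetical sketch contrasting RAW and HISTOGRAM storage for one
// column block. Not the engine's actual encoding.
public class STypeSketch {

    // RAW: values as is - the entry count grows with the row count.
    public static int rawEntries(List<String> block) {
        return block.size();
    }

    // HISTOGRAM: one (value, count) pair per distinct value.
    public static Map<String, Integer> histogram(List<String> block) {
        Map<String, Integer> h = new TreeMap<>();
        for (String v : block) h.merge(v, 1, Integer::sum);
        return h;
    }

    public static void main(String[] args) {
        List<String> tripTypes = Collections.nCopies(1_000, "street-hail");
        System.out.println(rawEntries(tripTypes));       // 1000 raw entries
        System.out.println(histogram(tripTypes).size()); // 1 histogram entry
    }
}
```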

Return to Contents

Java data storage type - enum CType

Table 15. Enum CType for storing the Java data type used to store data in a Berkeley DB table column

Property name Type Default Value Description
BYTE int 1 Represents a byte
INT int 2 Represents an integer
LONG int 3 Represents a long integer
FLOAT int 4 Represents a float
DOUBLE int 5 Represents a double-precision float
STRING int 6 Represents a string

Return to Contents

JDBC Data Storage Type - enum DataType

Table 16. Enum DataType for storing JDBC data types, used to convert column data to the Java storage format in the Berkeley DB backend of a Dimension DB table

Property name Type Default value Description
BIT byte 0 Postgres: bit data type
... ... ... ...
RAW byte 23 Oracle: raw data
... ... ... ...
INT byte 31 MS SQL: integer
... ... ... ...
DEC byte 47 ClickHouse: Decimal Numbers

Full list of properties in file

Return to Contents

Input parameters
DB settings - DBConfig class

Table 17. DBConfig class for storing settings

Property name Type Default value Description
configDirectory String N/A Absolute path to the directory for storing the configuration file with Dimension DB metadata
configFileName String "metamodel.obj" Name of the configuration file with Dimension DB metadata

Return to Contents

SProfile table settings

Table 18. SProfile class for storing table/column settings in the Dimension DB database (used in input parameters)

Property name Type Default value Description
tableName String null Table name
tableType TType TIME_SERIES Table type
indexType IType GLOBAL Index type
backendType BType BERKLEYDB Backend type for storing data
compression Boolean FALSE Whether to use data compression in the table or not
csTypeMap Map<String, CSType> null Key-value structure mapping column names to CSTypes
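To make the defaults in Table 18 concrete, here is a self-contained plain-Java model of the documented settings (the enums are local stubs for illustration; the real SProfile class and enums live in the Dimension DB library):

```java
// Hypothetical, self-contained model of the SProfile defaults from
// Table 18. The nested enums are local stubs, not the library's types.
public class SProfileDefaultsSketch {
    enum TType { REGULAR, TIME_SERIES }
    enum IType { GLOBAL, LOCAL }
    enum BType { BERKLEYDB, CLICKHOUSE, ORACLE, POSTGRES, MSSQL }

    String tableName;                    // null until set by the caller
    TType tableType = TType.TIME_SERIES; // documented default
    IType indexType = IType.GLOBAL;      // documented default
    BType backendType = BType.BERKLEYDB; // documented default
    Boolean compression = Boolean.FALSE; // documented default

    public static void main(String[] args) {
        SProfileDefaultsSketch p = new SProfileDefaultsSketch();
        p.tableName = "trips";
        System.out.println(p.tableType + " " + p.indexType + " " + p.backendType);
        // TIME_SERIES GLOBAL BERKLEYDB
    }
}
```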

Return to Contents

Output Parameters
Table Settings - TProfile Class

Table 19. TProfile Class for Getting Table/Column Settings from Metadata (Used in Output Parameters)

Property Name Type Default Value Description
tableName String null Table Name
tableType TType null Table Type
indexType IType null Index Type
backendType BType null Backend Type
compression Boolean false Whether data compression is enabled
cProfiles List null or empty list List of table column settings

Return to Contents

Table Column Settings - CProfile Class

Table 20. CProfile Class for Storing Table Column Settings

Property Name Type Default Value Description
colId int 0 Unique Identifier
colIdSql int 0 Column Identifier in SQL
colName String null Column Name
colDbTypeName String null Data Type Name in DB
colSizeDisplay int 0 Size for display
colSizeSqlType int 0 SQL Type Size
csType CSType null The CSType type stores the storage parameters and data types of a column

Return to Contents

Object for getting data aggregates from the Stacked API - StackedColumn class

Timestamps store the time in milliseconds elapsed since January 1, 1970 (Unix time).

Table 21. StackedColumn class for getting data aggregates from the Stacked API

Property name Type Default value Description
key long N/A Object identifier - timestamp of the beginning of the range by which the values of the aggregates in keyCount, keySum, keyAvg are calculated
tail long 0 End timestamp of the range over which the aggregate values in keyCount, keySum, keyAvg are calculated
keyCount Map<String, Integer> Empty map Key-value structure for storing the number of occurrences of each indicator in the cProfile table column data when calling the Stacked API (for numeric and symbolic data types)
keySum Map<String, Double> Empty map Key-value structure for storing the sum of values for each key (only for numeric data types)
keyAvg Map<String, Double> Empty map Key-value structure for storing the average value for each key (only for numeric data types)

Return to Contents

Objects for getting data aggregates from the Gantt API - GanttColumnCount and GanttColumnSum classes

Table 22a. GanttColumnCount Class for Two-Level COUNT Aggregation Results

Property name Type Default value Description
key String N/A A unique value from the firstGrpBy column when calling the Gantt API
gantt Map<String, Integer> Empty map Key-value structure storing the number of occurrences of each unique value from secondGrpBy, restricted to the rows where firstGrpBy equals key

Table 22b. GanttColumnSum Class for Two-Level SUM Aggregation Results

Property name Type Default value Description
key String N/A The unique value from the first grouping level column
value Double 0 Total sum of all values from the second (numeric) column for this group
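The shape of the GanttColumnCount result is a two-level grouping: for each distinct value of the first column, the occurrences of each value of the second column are counted. A hypothetical plain-Java sketch of that semantics (not the engine's implementation):

```java
import java.util.*;

// Hypothetical sketch of the two-level COUNT grouping behind the Gantt
// API: outer key = firstGrpBy value, inner map = secondGrpBy value -> count.
public class GanttSketch {

    public static Map<String, Map<String, Integer>> ganttCount(
            List<String> firstGrpBy, List<String> secondGrpBy) {
        Map<String, Map<String, Integer>> out = new TreeMap<>();
        for (int i = 0; i < firstGrpBy.size(); i++) {
            out.computeIfAbsent(firstGrpBy.get(i), k -> new TreeMap<>())
               .merge(secondGrpBy.get(i), 1, Integer::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> boro   = List.of("Manhattan", "Manhattan", "Queens");
        List<String> vendor = List.of("CMT", "VTS", "CMT");
        System.out.println(ganttCount(boro, vendor));
        // {Manhattan={CMT=1, VTS=1}, Queens={CMT=1}}
    }
}
```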

Return to Contents

Code use cases

  • All code use cases call the Dimension DB API through the DStore interface;
  • Detailed descriptions of the DStore interface contracts and method signatures are presented in the Used Methods section;
  • Before each call to a DStore method, the interface is initialized according to the instructions in the DB Initialization Algorithm section;
  • A test case has been developed for each use case in DimensionDBUseCasesCodeTest.

Table 23. Use cases

# Method Name API Type Description Use cases
1 getTProfile Metadata Get table metadata by name getTProfile
2 loadDirectTableMetadata Metadata Load metadata from SProfile loadDirectTableMetadata
3 loadJdbcTableMetadata Metadata Load metadata via JDBC connection with query loadJdbcTableMetadata
4 loadJdbcTableMetadata Metadata Load metadata via JDBC (schema and table)
5 setTimestampColumn Metadata Set timestamp column setTimestampColumn
6 putDataDirect Write Save data using Java structure putDataDirect
7 putDataJdbc Write Save data from JDBC ResultSet putDataJdbc
8 putDataJdbcBatch Write Batch save from JDBC ResultSet putDataJdbcBatch
9 getBlockKeyTailList Read Get blocks with keys and ranges getBlockKeyTailList
10 getStacked Read Get aggregated data (COUNT/SUM/AVG) getStacked
11 getStacked Read Get aggregated data with filter getStackedCountFilter
12 getGantt Read Two-level COUNT grouping (basic) getGantt
13 getGantt Read Two-level COUNT grouping (multithreaded) getGantt
14 getGantt Read Two-level COUNT grouping with filter getGanttFilter
15 getGanttSum Read Two-level SUM grouping (basic) getGanttSum
16 getGanttSum Read Two-level SUM grouping with filter getGanttSumFiltered
17 getDistinct Read Get distinct values (basic) getDistinct
18 getDistinct Read Get distinct values with filter getDistinctWithFilter
19 getRawDataAll Read Get raw data (all columns, no filter) getRawDataAll
20 getRawDataAll Read Get raw data (all columns, with filter) getRawDataAllFilter
21 getRawDataByColumn Read Get raw data for specific column getRawDataByColumn
22 getBatchResultSet Read Batch read (regular tables) getBatchResultSetRegularTable
23 getBatchResultSet Read Batch read (time-series tables) getBatchResultSetTimeSeriesTable
24 getFirst Metadata Get first timestamp getFirst
25 getLast Metadata Get last timestamp getLast
26 syncBackendDb Backend Sync Berkeley DB data to disk
27 closeBackendDb Backend Close the Berkeley DB backend

Return to Contents

Performance

Table 24. Test environment

Category Details
Processor AMD Ryzen 5 5600H with Radeon Graphics, 3301 MHz, 6 Core(s), 12 Logical Processor(s)
RAM 16.0 GB
Disk Generic Flash Disk USB Device; SAMSUNG MZVL2512HCJQ-00B00 (476.94 GB)
OS Microsoft Windows 11
Test data New York taxi orders
SQL query to get statistics for the 2016 partition
WITH
parts_2016 AS (
    SELECT sum(bytes) AS bytes
    FROM system.parts
    WHERE table = 'trips_mergetree'
      AND database = 'datasets'
      AND active
      AND partition LIKE '2016%'
)
SELECT
    (SELECT count() FROM datasets.trips_mergetree WHERE toYear(pickup_datetime) = 2016) AS exact_row_count,
    formatReadableSize((SELECT bytes FROM parts_2016)) AS total_size,
    formatReadableSize((SELECT bytes FROM parts_2016) / exact_row_count) AS avg_row_size;

Table 25. Data for 2016

exact_row_count total_size avg_row_size
78325655 6.88 GiB 94.37 B
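As a quick sanity check of Table 25, avg_row_size is simply total_size divided by exact_row_count; even with the rounded 6.88 GiB figure this reproduces roughly the reported 94.37 B:

```java
// Arithmetic check of Table 25: total_size / exact_row_count should give
// avg_row_size. Using the rounded "6.88 GiB" yields ~94.3 B, consistent
// with the reported 94.37 B (formatReadableSize rounds the total).
public class AvgRowSizeCheck {
    public static double avgRowSizeBytes(double sizeGiB, long rowCount) {
        return sizeGiB * 1024.0 * 1024.0 * 1024.0 / rowCount;
    }

    public static void main(String[] args) {
        System.out.printf("%.1f B%n", avgRowSizeBytes(6.88, 78_325_655L)); // ~94.3 B
    }
}
```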

Table 26. Load profiles

Value\Profile № 1 № 2 № 3 № 4
TType TIME_SERIES TIME_SERIES TIME_SERIES TIME_SERIES
IType GLOBAL LOCAL LOCAL LOCAL
AType ON_LOAD ON_LOAD FULL_PASS_ONCE FULL_PASS_EACH
Compression true true true true
Load time 31 min 9 sec 23 min 28 sec 23 min 36 sec 23 min 31 sec
Size (GB) 9.015 11.993 11.331 12.382
Avg Rows/Sec 41,891 55,620 55,314 55,499
Avg MB/Sec 5.28 7.02 6.98 7.00

Table 27. Performance tests for gantt API (count)

Test name № 1 № 2 № 3 № 4
1 getGanttRawRaw 13.5 / 9.0 13.7 / 9.0 14.3 / 9.3 13.4 / 8.9
2 getGanttEnumEnum 8.5 / 4.3 11.3 / 6.9 11.2 / 7.0 11.0 / 7.0
3 getGanttHistHist 7.5 / 3.9 13.5 / 8.5 13.3 / 8.6 13.1 / 8.5
4 getGanttHistRaw 9.6 / 6.2 11.3 / 7.9 11.0 / 7.7 10.9 / 7.9
5 getGanttHistEnum 6.9 / 3.7 12.7 / 8.5 12.4 / 8.5 12.3 / 8.6
6 getGanttEnumRaw 9.8 / 6.6 13.8 / 9.8 14.0 / 9.8 13.7 / 9.7
7 getGanttEnumHist 6.8 / 4.1 17.0 / 10.9 15.0 / 10.5 15.0 / 10.4
8 getGanttRawHist 9.8 / 6.4 14.4 / 9.8 14.2 / 10.0 13.9 / 9.8
9 getGanttRawEnum 9.9 / 6.1 14.3 / 9.3 14.0 / 9.4 14.2 / 9.4

Table 28. Performance tests for gantt API (sum)

Test name № 1 № 2 № 3 № 4
1 getGanttSumHistRaw 5.5 7.4 7.2 7.2
2 getGanttSumEnumRaw 6.5 11.2 10.8 11.2
3 getGanttSumRawRaw 9.3 10.0 9.8 10.1
4 getGanttSumHistHist 4.8 6.8 6.6 6.6
5 getGanttSumEnumEnum 5.1 10.0 9.9 10.1
6 getGanttSumRawEnum 8.4 8.7 8.6 8.9
7 getGanttSumHistRawWithFilter 6.9 9.6 9.6 9.7

Table 29. Performance tests for stacked API

Test name № 1 № 2 № 3 № 4
1 stackedHist 5.4 6.1 8.8 6.1
2 stackedHistDate 0.1 0.1 0.1 0.1
3 stackedEnum 5.0 10.3 8.9 8.9
4 stackedEnumDate 0.1 0.1 0.1 0.1
5 stackedRaw 7.8 7.8 7.9 8.0
6 stackedRawDate 0.1 0.1 0.1 0.1

Table 30. Queries Table

Test name SQL query
Gantt API (Count)
1 getGanttRawRaw SELECT pickup_cdeligibil, vendor_id, COUNT(vendor_id) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY pickup_cdeligibil, vendor_id;
2 getGanttHistRaw SELECT trip_type, vendor_id, COUNT(vendor_id) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY trip_type, vendor_id;
3 getGanttRawEnum SELECT pickup_cdeligibil, cab_type, COUNT(cab_type) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY pickup_cdeligibil, cab_type;
4 getGanttRawHist SELECT pickup_cdeligibil, pickup_boroname, COUNT(pickup_boroname) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY pickup_cdeligibil, pickup_boroname;
5 getGanttEnumEnum SELECT dropoff_puma, dropoff_borocode, COUNT(dropoff_borocode) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY dropoff_puma, dropoff_borocode;
6 getGanttEnumHist SELECT dropoff_boroname, pickup_boroname, COUNT(pickup_boroname) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY dropoff_boroname, pickup_boroname;
7 getGanttEnumRaw SELECT dropoff_boroname, vendor_id, COUNT(vendor_id) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY dropoff_boroname, vendor_id;
8 getGanttHistEnum SELECT trip_type, dropoff_boroname, COUNT(dropoff_boroname) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY trip_type, dropoff_boroname;
9 getGanttHistHist SELECT trip_type, pickup_boroname, COUNT(pickup_boroname) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY trip_type, pickup_boroname;
Gantt API (Sum)
10 getGanttSumHistRaw SELECT trip_type, SUM(fare_amount) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY trip_type;
11 getGanttSumEnumRaw SELECT pickup_boroname, SUM(trip_distance) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY pickup_boroname;
12 getGanttSumRawRaw SELECT vendor_id, SUM(total_amount) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY vendor_id;
13 getGanttSumHistHist SELECT trip_type, SUM(passenger_count) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY trip_type;
14 getGanttSumEnumEnum SELECT dropoff_boroname, SUM(dropoff_borocode) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY dropoff_boroname;
15 getGanttSumRawEnum SELECT pickup_cdeligibil, SUM(pickup_borocode) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY pickup_cdeligibil;
16 getGanttSumWithFilter SELECT trip_type, SUM(fare_amount) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 AND trip_type IN (1, 2) GROUP BY trip_type;
Stacked API
17 stackedHist SELECT trip_type, COUNT(trip_type) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY trip_type;
18 stackedHistDate SELECT trip_type, COUNT(trip_type) FROM datasets.trips_mergetree WHERE toYYYYMMDD(pickup_datetime) = 20160101 GROUP BY trip_type;
19 stackedEnum SELECT dropoff_boroname, COUNT(dropoff_boroname) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY dropoff_boroname;
20 stackedEnumDate SELECT dropoff_boroname, COUNT(dropoff_boroname) FROM datasets.trips_mergetree WHERE toYYYYMMDD(pickup_datetime) = 20160101 GROUP BY dropoff_boroname;
21 stackedRaw SELECT vendor_id, COUNT(vendor_id) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY vendor_id;
22 stackedRawDate SELECT vendor_id, COUNT(vendor_id) FROM datasets.trips_mergetree WHERE toYYYYMMDD(pickup_datetime) = 20160101 GROUP BY vendor_id;

Download

  • Building and installing Dimension DB from source code into a local Maven repository is described in the Building the project section.

Return to Contents

Development

  • If you find an error in the code or have a suggestion for improvement, create a ticket;
  • Before starting work, verify that the project builds and the unit tests pass, following the instructions in the Building the project section;
  • Integration tests are used to check the correctness and performance of the Dimension DB API. For more details, see Preparing the Environment to Run Integration Tests and Code Use Cases.

Return to Contents

Documentation

EN RU
README in English README in Russian

Return to Contents

License

Code released under the Apache License, Version 2.0.

Return to Contents

Support

Created with the support of the Innovation Promotion Fund.

Return to Contents

Contact

Created by @akardapolov - feel free to contact me!

Return to Contents
