Hybrid time-series and block-column storage database engine written in Java
- Program Description
- General Information
- Application Areas
- Minimum Technical Requirements
- Getting Started with the Project
- Description of the logical structure
- DB architecture
- Algorithms
- Program API
- API sequence diagram
- Used methods
- DStore interface
- Dimension DB class
- Parameters
- Input parameters
- Output parameters
- Code use cases
- Performance
- Download
- Development
- Documentation
- License
- Support
- Contact
Dimension DB is a hybrid columnar database for storing time series data, written in Java.
Key Features:
- Columnar data storage format;
- Storage of time series data;
- Storage format with built-in support for data deduplication and compression;
- Setting the storage format at the table and block level;
- Automatic selection of the deduplication algorithm based on the collected data;
- Use of the Berkley DB database as a backend for local storage;
- Support for basic JDBC data types for PostgreSQL, Oracle, MS SQL and ClickHouse databases;
- In read-only mode, automatic generation of SQL queries to tables with time series data;
The library can be used as a lightweight analytical database for backend data processing systems on the Java platform to solve the following application problems:
- storing collected data and analytical processing of small and medium amounts of data for various embedded Java applications;
- storing and processing Internet of Things (IoT) data on the devices themselves as temporary storage, cache, and other tasks related to local data processing;
- metadata storage for querying external data sources using automatic SQL generation based on the data model for analytical applications.
Dimension DB is compatible with Java 25+ and comes with a small set of dependencies.
Table 1. Hardware requirements
| Parameter | Description |
|---|---|
| CPU and RAM | At least a 500 MHz processor and 250 MB of RAM; actual requirements depend on the volume of processed data |
| CPU architecture | Intel 64-bit (x86_64), AMD 64-bit (x86_64), Arm 64-bit (aarch64) |
| Disk | Free disk space depends on the volume of processed data |
Table 2. Software requirements
| Software | Requirements |
|---|---|
| Java | Java version 25+ |
| Maven | Not lower than version 3 |
| Git | Latest version |
Table 3. Operating system requirements
| Software | Requirements |
|---|---|
| Operating system | Windows, Linux, MacOS |
Make sure you have JDK 25+, Maven, and Git installed:
java -version
mvn -version
git --version
Download the Dimension DB repository sources:
git clone <<url>>
cd DimensionDB
To compile the project sources, run:
mvn clean compile
To run unit tests, run:
mvn clean test
To build the project into a Dimension DB jar file, run:
mvn clean package
To build and install the Dimension DB jar file into your local Maven repository, run:
mvn clean install
Notes:
- Building the project may include running unit tests from DimensionDBRunnerTest and DBaseRunnerUseCasesTest, which take approximately 25 minutes to complete.
- The majority of this time is consumed by the DimensionDBRunnerTest class, which runs 45 tests with various configurations.
- Both DimensionDBRunnerTest and DBaseRunnerUseCasesTest are disabled by default using the @Disabled annotation.
- The DimensionDBRunnerTest class contains tests with different permutation levels, as detailed below:
Table 4. Parameters and statistics for DimensionDBRunnerTest tests
| Test name | PermutationState | Number of configurations | Total tests | Execution time |
|---|---|---|---|---|
| testPermutationNone | NONE | 108 | 4 860 | ~ 9 seconds |
| testPermutationPartial | PARTIAL | 1 944 | 87 480 | ~ 2 minutes |
| testPermutationAll | ALL | 26 244 | 1 180 980 | ~ 25 minutes |
- To reduce build time by skipping these tests, use the following Maven commands:
# Skip DimensionDBRunnerTest during testing phase
mvn clean test -Dtest=!DimensionDBRunnerTest
# Skip during packaging
mvn clean package -Dtest=!DimensionDBRunnerTest
# Skip during installation
mvn clean install -Dtest=!DimensionDBRunnerTest
To use Dimension DB in a Java application, add the Dimension DB jar file as a dependency to your project's pom.xml settings file:
<?xml version="1.0" encoding="UTF-8"?>
<project>
<properties>
<dimension-db-version>25.7.1</dimension-db-version>
</properties>
<!--.....-->
<!--.....-->
<!--.....-->
<dependencies>
<dependency>
<groupId>ru.dimension</groupId>
<artifactId>db</artifactId>
<version>${dimension-db-version}</version>
</dependency>
</dependencies>
</project>
Notes:
- To build and install the latest version of the Dimension DB library into the local Maven repository, download the application source code and run:
mvn clean install
Details on building and installing the project are described in the Building the project section.
- You can find a full list of Dimension DB use cases in your application in the Code use cases section.
To run unit tests, follow the project build steps from the Building the project section.
Preparing the environment for integration tests requires a local database installation using Docker Desktop.
Instructions for installing Docker Desktop for Windows and Linux
Install and run a container with PostgreSQL DB
docker run -d --name db \
-e 'POSTGRES_PASSWORD=postgres' \
-p 5432:5432 \
-v pgdata:/var/lib/postgresql/data \
postgres \
postgres -c 'max_connections=200'
- Install the repository of scripts for building Oracle DB images locally
- Download the DB distribution from the Oracle website. For the Linux x86 platform and DB version 19.3, copy the LINUX.X64_193000_db_home.zip file to the local directory of the OracleDatabase/SingleInstance/dockerfiles/19.3.0 script repository
- Run the buildContainerImage.sh script from the script repository and specify the parameters (example for DB version 19.3.0)
OracleDatabase/SingleInstance/dockerfiles/buildContainerImage.sh -v 19.3.0 -e -i
- Install and run a container with an Oracle DB instance from the command line (example of running the Docker run command for DB version 19.3.0)
docker run --name database19ee \
-p 1523:1521 -p 5522:5500 \
-e ORACLE_SID=orcl -e ORACLE_PDB=pdb_orcl -e ORACLE_PWD=sys \
-e ORACLE_CHARACTERSET=AL32UTF8 \
-v /opt/oracle/oradata oracle/database:19.3.0-ee
Install and run container with MS SQL DB
docker run -e 'ACCEPT_EULA=Y' \
-e 'SA_PASSWORD=QOEfSsa51234!' \
-p 1433:1433 \
-v $(pwd)/drive:/var/opt/mssql/data \
--name mssql \
mcr.microsoft.com/mssql/server
Install and run a container with ClickHouse DB
docker run -d --name clickhouse-server \
-e CLICKHOUSE_USER=admin -e CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT=1 -e CLICKHOUSE_PASSWORD=admin \
-p 8123:8123 -p 9000:9000 \
-v C:\clickhouse:/var/lib/clickhouse clickhouse/clickhouse-server
- Install ClickHouse locally and check the connection to ClickHouse on a local PC using the URL "jdbc:clickhouse://localhost:8123", or use another connection string and adjust the tests accordingly;
- Load the New York taxi trips test dataset into a local ClickHouse instance (pre-prepared partitions loaded into the datasets schema are used);
- Check that the data has loaded
clickhouse-client --query "select count(*) from datasets.trips_mergetree";
- Create a directory for local storage of Dimension DB data (example for Windows: "C:\Users\.temp");
- Load test data into the local Dimension DB database using any of the methods in DimensionDBCHLoadDataTest marked with the @Test annotation;
- Run integration tests in DimensionDBCHQueryGanttTest, DimensionDBCHQueryStackedTest, or DimensionDBCHQueryRawTest, which are marked with the @Test annotation.
Notes:
- Integration tests are disabled with the @Disabled annotation so they are skipped when running Dimension DB unit tests; remove the annotation when you need to load data and run them.
- Test data from the trips_mergetree table is loaded into the datasets schema
CREATE DATABASE datasets;
- trips_mergetree table definition
CREATE TABLE trips_mergetree (
trip_id UInt32,
vendor_id Enum8('1' = 1, '2' = 2, 'CMT' = 3, 'VTS' = 4, 'DDS' = 5, 'B02512' = 10, 'B02598' = 11, 'B02617' = 12, 'B02682' = 13, 'B02764' = 14),
pickup_date Date,
pickup_datetime DateTime,
dropoff_date Date,
dropoff_datetime DateTime,
store_and_fwd_flag UInt8,
rate_code_id UInt8,
pickup_longitude Float64,
pickup_latitude Float64,
dropoff_longitude Float64,
dropoff_latitude Float64,
passenger_count UInt8,
trip_distance Float64,
fare_amount Float32,
extra Float32,
mta_tax Float32,
tip_amount Float32,
tolls_amount Float32,
ehail_fee Float32,
improvement_surcharge Float32,
total_amount Float32,
payment_type_ Enum8('UNK' = 0, 'CSH' = 1, 'CRE' = 2, 'NOC' = 3, 'DIS' = 4),
trip_type UInt8,
pickup FixedString(25),
dropoff FixedString(25),
cab_type Enum8('yellow' = 1, 'green' = 2, 'uber' = 3),
pickup_nyct2010_gid UInt8,
pickup_ctlabel Float32,
pickup_borocode UInt8,
pickup_boroname Enum8('' = 0, 'Manhattan' = 1, 'Bronx' = 2, 'Brooklyn' = 3, 'Queens' = 4, 'Staten Island' = 5),
pickup_ct2010 FixedString(6),
pickup_boroct2010 FixedString(7),
pickup_cdeligibil Enum8(' ' = 0, 'E' = 1, 'I' = 2),
pickup_ntacode FixedString(4),
pickup_ntaname Enum16('' = 0, 'Airport' = 1, 'Allerton-Pelham Gardens' = 2, 'Annadale-Huguenot-Prince\'s Bay-Eltingville' = 3, 'Arden Heights' = 4, 'Astoria' = 5, 'Auburndale' = 6, 'Baisley Park' = 7, 'Bath Beach' = 8, 'Battery Park City-Lower Manhattan' = 9, 'Bay Ridge' = 10, 'Bayside-Bayside Hills' = 11, 'Bedford' = 12, 'Bedford Park-Fordham North' = 13, 'Bellerose' = 14, 'Belmont' = 15, 'Bensonhurst East' = 16, 'Bensonhurst West' = 17, 'Borough Park' = 18, 'Breezy Point-Belle Harbor-Rockaway Park-Broad Channel' = 19, 'Briarwood-Jamaica Hills' = 20, 'Brighton Beach' = 21, 'Bronxdale' = 22, 'Brooklyn Heights-Cobble Hill' = 23, 'Brownsville' = 24, 'Bushwick North' = 25, 'Bushwick South' = 26, 'Cambria Heights' = 27, 'Canarsie' = 28, 'Carroll Gardens-Columbia Street-Red Hook' = 29, 'Central Harlem North-Polo Grounds' = 30, 'Central Harlem South' = 31, 'Charleston-Richmond Valley-Tottenville' = 32, 'Chinatown' = 33, 'Claremont-Bathgate' = 34, 'Clinton' = 35, 'Clinton Hill' = 36, 'Co-op City' = 37, 'College Point' = 38, 'Corona' = 39, 'Crotona Park East' = 40, 'Crown Heights North' = 41, 'Crown Heights South' = 42, 'Cypress Hills-City Line' = 43, 'DUMBO-Vinegar Hill-Downtown Brooklyn-Boerum Hill' = 44, 'Douglas Manor-Douglaston-Little Neck' = 45, 'Dyker Heights' = 46, 'East Concourse-Concourse Village' = 47, 'East Elmhurst' = 48, 'East Flatbush-Farragut' = 49, 'East Flushing' = 50, 'East Harlem North' = 51, 'East Harlem South' = 52, 'East New York' = 53, 'East New York (Pennsylvania Ave)' = 54, 'East Tremont' = 55, 'East Village' = 56, 'East Williamsburg' = 57, 'Eastchester-Edenwald-Baychester' = 58, 'Elmhurst' = 59, 'Elmhurst-Maspeth' = 60, 'Erasmus' = 61, 'Far Rockaway-Bayswater' = 62, 'Flatbush' = 63, 'Flatlands' = 64, 'Flushing' = 65, 'Fordham South' = 66, 'Forest Hills' = 67, 'Fort Greene' = 68, 'Fresh Meadows-Utopia' = 69, 'Ft. 
Totten-Bay Terrace-Clearview' = 70, 'Georgetown-Marine Park-Bergen Beach-Mill Basin' = 71, 'Glen Oaks-Floral Park-New Hyde Park' = 72, 'Glendale' = 73, 'Gramercy' = 74, 'Grasmere-Arrochar-Ft. Wadsworth' = 75, 'Gravesend' = 76, 'Great Kills' = 77, 'Greenpoint' = 78, 'Grymes Hill-Clifton-Fox Hills' = 79, 'Hamilton Heights' = 80, 'Hammels-Arverne-Edgemere' = 81, 'Highbridge' = 82, 'Hollis' = 83, 'Homecrest' = 84, 'Hudson Yards-Chelsea-Flatiron-Union Square' = 85, 'Hunters Point-Sunnyside-West Maspeth' = 86, 'Hunts Point' = 87, 'Jackson Heights' = 88, 'Jamaica' = 89, 'Jamaica Estates-Holliswood' = 90, 'Kensington-Ocean Parkway' = 91, 'Kew Gardens' = 92, 'Kew Gardens Hills' = 93, 'Kingsbridge Heights' = 94, 'Laurelton' = 95, 'Lenox Hill-Roosevelt Island' = 96, 'Lincoln Square' = 97, 'Lindenwood-Howard Beach' = 98, 'Longwood' = 99, 'Lower East Side' = 100, 'Madison' = 101, 'Manhattanville' = 102, 'Marble Hill-Inwood' = 103, 'Mariner\'s Harbor-Arlington-Port Ivory-Graniteville' = 104, 'Maspeth' = 105, 'Melrose South-Mott Haven North' = 106, 'Middle Village' = 107, 'Midtown-Midtown South' = 108, 'Midwood' = 109, 'Morningside Heights' = 110, 'Morrisania-Melrose' = 111, 'Mott Haven-Port Morris' = 112, 'Mount Hope' = 113, 'Murray Hill' = 114, 'Murray Hill-Kips Bay' = 115, 'New Brighton-Silver Lake' = 116, 'New Dorp-Midland Beach' = 117, 'New Springville-Bloomfield-Travis' = 118, 'North Corona' = 119, 'North Riverdale-Fieldston-Riverdale' = 120, 'North Side-South Side' = 121, 'Norwood' = 122, 'Oakland Gardens' = 123, 'Oakwood-Oakwood Beach' = 124, 'Ocean Hill' = 125, 'Ocean Parkway South' = 126, 'Old Astoria' = 127, 'Old Town-Dongan Hills-South Beach' = 128, 'Ozone Park' = 129, 'Park Slope-Gowanus' = 130, 'Parkchester' = 131, 'Pelham Bay-Country Club-City Island' = 132, 'Pelham Parkway' = 133, 'Pomonok-Flushing Heights-Hillcrest' = 134, 'Port Richmond' = 135, 'Prospect Heights' = 136, 'Prospect Lefferts Gardens-Wingate' = 137, 'Queens Village' = 138, 'Queensboro Hill' = 139, 
'Queensbridge-Ravenswood-Long Island City' = 140, 'Rego Park' = 141, 'Richmond Hill' = 142, 'Ridgewood' = 143, 'Rikers Island' = 144, 'Rosedale' = 145, 'Rossville-Woodrow' = 146, 'Rugby-Remsen Village' = 147, 'Schuylerville-Throgs Neck-Edgewater Park' = 148, 'Seagate-Coney Island' = 149, 'Sheepshead Bay-Gerritsen Beach-Manhattan Beach' = 150, 'SoHo-TriBeCa-Civic Center-Little Italy' = 151, 'Soundview-Bruckner' = 152, 'Soundview-Castle Hill-Clason Point-Harding Park' = 153, 'South Jamaica' = 154, 'South Ozone Park' = 155, 'Springfield Gardens North' = 156, 'Springfield Gardens South-Brookville' = 157, 'Spuyten Duyvil-Kingsbridge' = 158, 'St. Albans' = 159, 'Stapleton-Rosebank' = 160, 'Starrett City' = 161, 'Steinway' = 162, 'Stuyvesant Heights' = 163, 'Stuyvesant Town-Cooper Village' = 164, 'Sunset Park East' = 165, 'Sunset Park West' = 166, 'Todt Hill-Emerson Hill-Heartland Village-Lighthouse Hill' = 167, 'Turtle Bay-East Midtown' = 168, 'University Heights-Morris Heights' = 169, 'Upper East Side-Carnegie Hill' = 170, 'Upper West Side' = 171, 'Van Cortlandt Village' = 172, 'Van Nest-Morris Park-Westchester Square' = 173, 'Washington Heights North' = 174, 'Washington Heights South' = 175, 'West Brighton' = 176, 'West Concourse' = 177, 'West Farms-Bronx River' = 178, 'West New Brighton-New Brighton-St. George' = 179, 'West Village' = 180, 'Westchester-Unionport' = 181, 'Westerleigh' = 182, 'Whitestone' = 183, 'Williamsbridge-Olinville' = 184, 'Williamsburg' = 185, 'Windsor Terrace' = 186, 'Woodhaven' = 187, 'Woodlawn-Wakefield' = 188, 'Woodside' = 189, 'Yorkville' = 190, 'park-cemetery-etc-Bronx' = 191, 'park-cemetery-etc-Brooklyn' = 192, 'park-cemetery-etc-Manhattan' = 193, 'park-cemetery-etc-Queens' = 194, 'park-cemetery-etc-Staten Island' = 195),
pickup_puma UInt16,
dropoff_nyct2010_gid UInt8,
dropoff_ctlabel Float32,
dropoff_borocode UInt8,
dropoff_boroname Enum8('' = 0, 'Manhattan' = 1, 'Bronx' = 2, 'Brooklyn' = 3, 'Queens' = 4, 'Staten Island' = 5),
dropoff_ct2010 FixedString(6),
dropoff_boroct2010 FixedString(7),
dropoff_cdeligibil Enum8(' ' = 0, 'E' = 1, 'I' = 2),
dropoff_ntacode FixedString(4),
dropoff_ntaname Enum16('' = 0, 'Airport' = 1, 'Allerton-Pelham Gardens' = 2, 'Annadale-Huguenot-Prince\'s Bay-Eltingville' = 3, 'Arden Heights' = 4, 'Astoria' = 5, 'Auburndale' = 6, 'Baisley Park' = 7, 'Bath Beach' = 8, 'Battery Park City-Lower Manhattan' = 9, 'Bay Ridge' = 10, 'Bayside-Bayside Hills' = 11, 'Bedford' = 12, 'Bedford Park-Fordham North' = 13, 'Bellerose' = 14, 'Belmont' = 15, 'Bensonhurst East' = 16, 'Bensonhurst West' = 17, 'Borough Park' = 18, 'Breezy Point-Belle Harbor-Rockaway Park-Broad Channel' = 19, 'Briarwood-Jamaica Hills' = 20, 'Brighton Beach' = 21, 'Bronxdale' = 22, 'Brooklyn Heights-Cobble Hill' = 23, 'Brownsville' = 24, 'Bushwick North' = 25, 'Bushwick South' = 26, 'Cambria Heights' = 27, 'Canarsie' = 28, 'Carroll Gardens-Columbia Street-Red Hook' = 29, 'Central Harlem North-Polo Grounds' = 30, 'Central Harlem South' = 31, 'Charleston-Richmond Valley-Tottenville' = 32, 'Chinatown' = 33, 'Claremont-Bathgate' = 34, 'Clinton' = 35, 'Clinton Hill' = 36, 'Co-op City' = 37, 'College Point' = 38, 'Corona' = 39, 'Crotona Park East' = 40, 'Crown Heights North' = 41, 'Crown Heights South' = 42, 'Cypress Hills-City Line' = 43, 'DUMBO-Vinegar Hill-Downtown Brooklyn-Boerum Hill' = 44, 'Douglas Manor-Douglaston-Little Neck' = 45, 'Dyker Heights' = 46, 'East Concourse-Concourse Village' = 47, 'East Elmhurst' = 48, 'East Flatbush-Farragut' = 49, 'East Flushing' = 50, 'East Harlem North' = 51, 'East Harlem South' = 52, 'East New York' = 53, 'East New York (Pennsylvania Ave)' = 54, 'East Tremont' = 55, 'East Village' = 56, 'East Williamsburg' = 57, 'Eastchester-Edenwald-Baychester' = 58, 'Elmhurst' = 59, 'Elmhurst-Maspeth' = 60, 'Erasmus' = 61, 'Far Rockaway-Bayswater' = 62, 'Flatbush' = 63, 'Flatlands' = 64, 'Flushing' = 65, 'Fordham South' = 66, 'Forest Hills' = 67, 'Fort Greene' = 68, 'Fresh Meadows-Utopia' = 69, 'Ft. 
Totten-Bay Terrace-Clearview' = 70, 'Georgetown-Marine Park-Bergen Beach-Mill Basin' = 71, 'Glen Oaks-Floral Park-New Hyde Park' = 72, 'Glendale' = 73, 'Gramercy' = 74, 'Grasmere-Arrochar-Ft. Wadsworth' = 75, 'Gravesend' = 76, 'Great Kills' = 77, 'Greenpoint' = 78, 'Grymes Hill-Clifton-Fox Hills' = 79, 'Hamilton Heights' = 80, 'Hammels-Arverne-Edgemere' = 81, 'Highbridge' = 82, 'Hollis' = 83, 'Homecrest' = 84, 'Hudson Yards-Chelsea-Flatiron-Union Square' = 85, 'Hunters Point-Sunnyside-West Maspeth' = 86, 'Hunts Point' = 87, 'Jackson Heights' = 88, 'Jamaica' = 89, 'Jamaica Estates-Holliswood' = 90, 'Kensington-Ocean Parkway' = 91, 'Kew Gardens' = 92, 'Kew Gardens Hills' = 93, 'Kingsbridge Heights' = 94, 'Laurelton' = 95, 'Lenox Hill-Roosevelt Island' = 96, 'Lincoln Square' = 97, 'Lindenwood-Howard Beach' = 98, 'Longwood' = 99, 'Lower East Side' = 100, 'Madison' = 101, 'Manhattanville' = 102, 'Marble Hill-Inwood' = 103, 'Mariner\'s Harbor-Arlington-Port Ivory-Graniteville' = 104, 'Maspeth' = 105, 'Melrose South-Mott Haven North' = 106, 'Middle Village' = 107, 'Midtown-Midtown South' = 108, 'Midwood' = 109, 'Morningside Heights' = 110, 'Morrisania-Melrose' = 111, 'Mott Haven-Port Morris' = 112, 'Mount Hope' = 113, 'Murray Hill' = 114, 'Murray Hill-Kips Bay' = 115, 'New Brighton-Silver Lake' = 116, 'New Dorp-Midland Beach' = 117, 'New Springville-Bloomfield-Travis' = 118, 'North Corona' = 119, 'North Riverdale-Fieldston-Riverdale' = 120, 'North Side-South Side' = 121, 'Norwood' = 122, 'Oakland Gardens' = 123, 'Oakwood-Oakwood Beach' = 124, 'Ocean Hill' = 125, 'Ocean Parkway South' = 126, 'Old Astoria' = 127, 'Old Town-Dongan Hills-South Beach' = 128, 'Ozone Park' = 129, 'Park Slope-Gowanus' = 130, 'Parkchester' = 131, 'Pelham Bay-Country Club-City Island' = 132, 'Pelham Parkway' = 133, 'Pomonok-Flushing Heights-Hillcrest' = 134, 'Port Richmond' = 135, 'Prospect Heights' = 136, 'Prospect Lefferts Gardens-Wingate' = 137, 'Queens Village' = 138, 'Queensboro Hill' = 139, 
'Queensbridge-Ravenswood-Long Island City' = 140, 'Rego Park' = 141, 'Richmond Hill' = 142, 'Ridgewood' = 143, 'Rikers Island' = 144, 'Rosedale' = 145, 'Rossville-Woodrow' = 146, 'Rugby-Remsen Village' = 147, 'Schuylerville-Throgs Neck-Edgewater Park' = 148, 'Seagate-Coney Island' = 149, 'Sheepshead Bay-Gerritsen Beach-Manhattan Beach' = 150, 'SoHo-TriBeCa-Civic Center-Little Italy' = 151, 'Soundview-Bruckner' = 152, 'Soundview-Castle Hill-Clason Point-Harding Park' = 153, 'South Jamaica' = 154, 'South Ozone Park' = 155, 'Springfield Gardens North' = 156, 'Springfield Gardens South-Brookville' = 157, 'Spuyten Duyvil-Kingsbridge' = 158, 'St. Albans' = 159, 'Stapleton-Rosebank' = 160, 'Starrett City' = 161, 'Steinway' = 162, 'Stuyvesant Heights' = 163, 'Stuyvesant Town-Cooper Village' = 164, 'Sunset Park East' = 165, 'Sunset Park West' = 166, 'Todt Hill-Emerson Hill-Heartland Village-Lighthouse Hill' = 167, 'Turtle Bay-East Midtown' = 168, 'University Heights-Morris Heights' = 169, 'Upper East Side-Carnegie Hill' = 170, 'Upper West Side' = 171, 'Van Cortlandt Village' = 172, 'Van Nest-Morris Park-Westchester Square' = 173, 'Washington Heights North' = 174, 'Washington Heights South' = 175, 'West Brighton' = 176, 'West Concourse' = 177, 'West Farms-Bronx River' = 178, 'West New Brighton-New Brighton-St. George' = 179, 'West Village' = 180, 'Westchester-Unionport' = 181, 'Westerleigh' = 182, 'Whitestone' = 183, 'Williamsbridge-Olinville' = 184, 'Williamsburg' = 185, 'Windsor Terrace' = 186, 'Woodhaven' = 187, 'Woodlawn-Wakefield' = 188, 'Woodside' = 189, 'Yorkville' = 190, 'park-cemetery-etc-Bronx' = 191, 'park-cemetery-etc-Brooklyn' = 192, 'park-cemetery-etc-Manhattan' = 193, 'park-cemetery-etc-Queens' = 194, 'park-cemetery-etc-Staten Island' = 195),
dropoff_puma UInt16
) ENGINE = MergeTree() ORDER BY (pickup_date, pickup_datetime);
Mermaid flow chart
flowchart TD
Client{Clients}
Client -->|Write Request| DStore_Write_Interface[Write API]
Client -->|Read Request| DStore_Read_Interface[Read API]
Client -->|Metadata Request| MetadataAPI[Metadata API]
DStore_Write_Interface --> |Direct Insert| LocalBerkleyDB[(Local Berkley DB)]
DStore_Write_Interface --> |Fetch Data| ExternalDB_Write[(External DBs JDBC)]
ExternalDB_Write --> LocalBerkleyDB
DStore_Read_Interface --> |Local Read| LocalBerkleyDB
DStore_Read_Interface --> |External Read| ExternalDB_Read[(External DBs JDBC)]
MetadataAPI --> MetadataStore[Metadata file]
Three data storage formats are supported:
- RAW - string values are stored as Java int keys (dictionary-encoded), all other values (numeric, special, etc.) in a suitable Java type;
- ENUM - values are stored as Java byte keys;
- HISTOGRAM - for each run of repeated values, the start and end index of the run and the value itself are stored.
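The HISTOGRAM idea can be sketched in a few lines of self-contained Java (an illustrative encoder, not the actual Dimension DB storage code): each run of repeated values is stored once, as its start index plus the value; the end index of a run is implied by the start of the next one.

```java
import java.util.ArrayList;
import java.util.List;

public class HistogramSketch {
  // A run of identical values: the index where the run starts, and the value itself.
  record Run(int startIndex, String value) {}

  // Encode a column as (startIndex, value) pairs, one entry per run of repeats.
  static List<Run> encode(List<String> column) {
    List<Run> runs = new ArrayList<>();
    for (int i = 0; i < column.size(); i++) {
      if (i == 0 || !column.get(i).equals(column.get(i - 1))) {
        runs.add(new Run(i, column.get(i)));
      }
    }
    return runs;
  }

  public static void main(String[] args) {
    List<String> column = List.of("A", "A", "A", "B", "B", "A");
    // Three runs instead of six raw values: starts at 0 ("A"), 3 ("B"), 5 ("A")
    System.out.println(encode(column));
  }
}
```

The fewer distinct runs a column has, the better this format compresses; for mostly-unique data, RAW is the better choice.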
Metadata of the storage format, indexing type and compression are stored in the block header.
The main methods of the Dimension DB Java API are located in the DStore interface. A DStore reference is obtained after initializing Dimension DB via the DimensionDB class constructor; examples are presented in the DB initialization algorithm section.
A complete list of methods of the main classes and interfaces with input and output parameters is posted in the Used methods section.
- Define a local directory for storing Dimension DB settings;
- Create the Dimension DB configuration DimensionDBConfig;
- Initialize the backend DB;
- Initialize Dimension DB by calling the DimensionDB class constructor;
- Get a reference to the DStore interface from the DimensionDB class object to access the Dimension DB Java API.
Example of DB initialization for Berkley DB backend
@TempDir
static File databaseDir;
...
DimensionDBConfig dimensionDBConfig = new DimensionDBConfig().setConfigDirectory(databaseDir.getAbsolutePath());
BerkleyDB berkleyDB = new BerkleyDB(databaseDir.getAbsolutePath(), true);
DimensionDB dimensionDB = new DimensionDB(dimensionDBConfig, berkleyDB.getStore());
DStore fStore = dimensionDB.getDStore();
Example of DB initialization for ClickHouse backend
@TempDir
static File databaseDir;
...
BType bType = BType.CLICKHOUSE;
String driverClassName = "com.clickhouse.jdbc.ClickHouseDriver";
String dbUrl = "jdbc:clickhouse://localhost:8123";
SqlDB sqlDB = new SqlDB(bType, driverClassName, dbUrl, "admin", "admin");
BasicDataSource basicDataSource = sqlDB.getDatasource();
DimensionDBConfig dimensionDBConfig = new DimensionDBConfig().setConfigDirectory(databaseDir.getAbsolutePath());
DimensionDB dimensionDB = new DimensionDB(dimensionDBConfig, bType, basicDataSource);
DStore fStore = dimensionDB.getDStore();
Notes:
- The temporary directory databaseDir is created automatically using the @TempDir annotation of the JUnit library;
- For the example with the Berkley DB backend, Dimension DB settings and backend data are located in the same temporary directory databaseDir;
- Code examples are taken from unit and integration tests.
Dimension DB provides a set of methods for a range of data storage and retrieval tasks. These APIs are designed for flexible and efficient work with time series data in a Java environment, both for locally stored data and for external data storage systems with an SQL interface.
- Main API. The main functions for working with data are built into the DStore interface, which provides standardized access to write and read methods. DStore is a key element of the Dimension DB architecture, providing the interface point for client applications to interact with the database.
- Writing data. Several approaches are supported for writing information to the Dimension DB database:
- Direct - inserting data into a local database using an intermediate Java table-type data structure.
- JDBC — insert data into a local database using data received from an external data source via JDBC, which allows integration with other systems that can work with JDBC.
- JDBC Batch — batch loading into a local database using data received from an external data source via JDBC, minimizing the number of transactions for large-scale insertion.
- Reading data. Data is extracted from Dimension DB through APIs, each optimized for a particular type of query:
- Stacked — retrieving data based on aggregation functions such as COUNT, SUM, and AVG, obtaining information on the distribution of data by selected attributes and parameters.
- Gantt — complex distribution calculation using COUNT aggregation at two levels, which is useful for analyzing dependencies and relationships between different indicators.
- Raw — extracting source data in tabular format to view detailed information on the entire array of collected data.
- BatchResultSet — extracting source data in tabular format with row-by-row access to get only part of the rows from the result set in the DB.
- Loading metadata. Working with external data sources requires executing queries against them via JDBC or through Java table structures. Information about tables and data types, used for local storage and for accessing external data sources via JDBC, is loaded and viewed through an API backed by a local Dimension DB metadata storage file.
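The Stacked and Gantt reads described above can be illustrated with plain Java collections (a conceptual sketch, not the Dimension DB API, whose methods operate on column profiles): Stacked is a one-level grouping with an aggregation function, Gantt a two-level COUNT grouping.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupingSketch {
  // Stacked-style read: COUNT of rows grouped by one column.
  static Map<String, Long> stackedCount(List<String[]> rows, int col) {
    Map<String, Long> out = new LinkedHashMap<>();
    for (String[] row : rows) out.merge(row[col], 1L, Long::sum);
    return out;
  }

  // Gantt-style read: COUNT grouped by a first column, then a second one.
  static Map<String, Map<String, Long>> ganttCount(List<String[]> rows, int first, int second) {
    Map<String, Map<String, Long>> out = new LinkedHashMap<>();
    for (String[] row : rows)
      out.computeIfAbsent(row[first], k -> new LinkedHashMap<>())
         .merge(row[second], 1L, Long::sum);
    return out;
  }

  public static void main(String[] args) {
    // Columns: cab_type, payment_type (names borrowed from the trips_mergetree example)
    List<String[]> rows = List.of(
        new String[]{"yellow", "CSH"},
        new String[]{"yellow", "CRE"},
        new String[]{"green", "CSH"});
    System.out.println(stackedCount(rows, 0));   // {yellow=2, green=1}
    System.out.println(ganttCount(rows, 0, 1));  // {yellow={CSH=1, CRE=1}, green={CSH=1}}
  }
}
```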
The APIs for writing data only work with the local Berkley DB key-value data storage. API methods for reading data can work with both local Berkley DB storage and external data sources via JDBC using automatic generation of SQL queries.
Table 5. Supported databases for working with API in Dimension DB
| № | Dimension DB API support | Database | Database type |
|---|---|---|---|
| 1 | Read/Write | Berkley DB | key-value |
| 2 | Read | ClickHouse | analytical |
| 3 | Read | Oracle | relational |
| 4 | Read | PostgreSQL | relational |
| 5 | Read | Microsoft SQL Server | relational |
| 6 | Read | MySQL | relational |
Before you start writing data, you need to set storage parameters and metadata for tables and columns in the SProfile object.
The database table configuration allows you to switch between global and local indexing, and enable or disable data compression on the fly. This is achieved by placing storage type metadata in the block header for both indexing types and a flag to enable or disable compression.
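A minimal sketch of this design (hypothetical type and field names, not the actual block layout): because each block header carries its own storage type and compression flag, two blocks of the same column can be stored with different settings.

```java
public class BlockHeaderSketch {
  enum StorageType { RAW, ENUM, HISTOGRAM }

  // Per-block metadata: each block carries its own storage type and compression flag,
  // so indexing and compression can change between blocks without rewriting the table.
  record BlockHeader(StorageType storageType, boolean compressed) {}

  public static void main(String[] args) {
    // Two blocks of the same column, encoded differently.
    BlockHeader first = new BlockHeader(StorageType.RAW, false);
    BlockHeader second = new BlockHeader(StorageType.HISTOGRAM, true);
    System.out.println(first + " " + second);
  }
}
```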
Mermaid sequence diagram
sequenceDiagram
participant Client
participant DStore_Write as Write API
participant DStore_Read as Read API
participant MetadataAPI as Metadata API
participant LocalDB as Local Berkley DB Storage
participant ExternalDB as External DBs via JDBC
Client->>DStore_Write: Write Request
DStore_Write->>MetadataAPI: Fetch Metadata
MetadataAPI->>DStore_Write: Return Metadata
rect rgba(200, 200, 255, 0.1)
DStore_Write->>LocalDB: Direct Insert
DStore_Write->>ExternalDB: Fetch Data
ExternalDB->>DStore_Write: Return Data
DStore_Write->>LocalDB: Insert from ExternalDB
end
Client->>DStore_Read: Read Request
DStore_Read->>MetadataAPI: Fetch Metadata
MetadataAPI->>DStore_Read: Return Metadata
rect rgba(200, 255, 200, 0.1)
DStore_Read->>LocalDB: Local Read
DStore_Read->>ExternalDB: Fetch Data
ExternalDB->>DStore_Read: Return Data
end
Client->>MetadataAPI: Read Metadata Request
MetadataAPI->>Client: Return Metadata
- The DStore interface contains the main contracts and method signatures for interaction between the Dimension DB database and client applications;
- Initialization of database objects occurs when calling the Dimension DB class constructor;
- The reference to the DStore interface for accessing the application API is called from the Dimension DB class object.
Table 6. List of APIs for working with the Dimension DB database of the DStore interface
| # | Method Name | API Type | Description | Input Parameters | Output Parameters |
|---|---|---|---|---|---|
| 1 | getTProfile | Metadata | Get table metadata by name | String tableName | TProfile |
| 2 | loadDirectTableMetadata | Metadata | Load metadata from SProfile | SProfile sProfile | TProfile |
| 3 | loadJdbcTableMetadata | Metadata | Load metadata via JDBC connection with query | Connection connection, String query, SProfile sProfile | TProfile |
| 4 | loadJdbcTableMetadata | Metadata | Load metadata via JDBC (schema and table) | Connection connection, String sqlSchemaName, String sqlTableName, SProfile sProfile | TProfile |
| 5 | setTimestampColumn | Metadata | Set timestamp column | String tableName, String timestampColumnName | void |
| 6 | putDataDirect | Write | Save data using Java structure | String tableName, List<List> data | void |
| 7 | putDataJdbc | Write | Save data from JDBC ResultSet | String tableName, ResultSet resultSet | long (last row timestamp) |
| 8 | putDataJdbcBatch | Write | Batch save from JDBC ResultSet | String tableName, ResultSet resultSet, Integer batchSize | void |
| 9 | getBlockKeyTailList | Read | Get blocks with keys and ranges | String tableName, long begin, long end | List |
| 10 | getStacked | Read | Get aggregated data | String tableName, CProfile cProfile, GroupFunction groupFunction, CompositeFilter compositeFilter, long begin, long end | List |
| 11 | getGanttCount | Read | Two-level COUNT grouping | String tableName, CProfile firstGrpBy, CProfile secondGrpBy, CompositeFilter compositeFilter, long begin, long end | List |
| 12 | getGanttCount | Read | Two-level COUNT grouping (multithreaded) | String tableName, CProfile firstGrpBy, CProfile secondGrpBy, CompositeFilter compositeFilter, int batchSize, long begin, long end | List |
| 13 | getGanttSum | Read | Two-level SUM grouping | String tableName, CProfile firstGrpBy, CProfile secondGrpBy, CompositeFilter compositeFilter, long begin, long end | List |
| 14 | getDistinct | Read | Get distinct values | String tableName, CProfile cProfile, OrderBy orderBy, CompositeFilter compositeFilter, int limit, long begin, long end | List |
| 15 | getRawDataAll | Read | Get raw data (all columns, no filter) | String tableName, long begin, long end | List<List> |
| 16 | getRawDataAll | Read | Get raw data (all columns, with filter) | String tableName, CProfile cProfileFilter, String filter, long begin, long end | List<List> |
| 17 | getRawDataByColumn | Read | Get raw data for specific column | String tableName, CProfile cProfile, long begin, long end | List<List> |
| 18 | getBatchResultSet | Read | Batch read (regular tables) | String tableName, int fetchSize | BatchResultSet |
| 19 | getBatchResultSet | Read | Batch read (time-series tables) | String tableName, long begin, long end, int fetchSize | BatchResultSet |
| 20 | getFirst | Metadata | Get first timestamp | String tableName, long begin, long end | long |
| 21 | getLast | Metadata | Get last timestamp | String tableName, long begin, long end | long |
| 22 | syncBackendDb | Backend | Sync Berkeley DB to disk | - | void |
| 23 | closeBackendDb | Backend | Close Berkeley DB | - | void |
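Most read methods mirror familiar SQL semantics over a time range. For instance, `getDistinct` (item 14) behaves like `SELECT DISTINCT … ORDER BY … LIMIT n` between `begin` and `end`. A minimal plain-Java sketch of those semantics, with no Dimension DB dependencies and all names hypothetical:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class DistinctSketch {
  // Hypothetical stand-in for a (timestamp, column value) pair.
  record Row(long ts, String value) {}

  // Mirrors getDistinct semantics: filter by [begin, end], dedupe, order, limit.
  static List<String> distinct(List<Row> rows, long begin, long end,
                               boolean asc, int limit) {
    Comparator<String> cmp =
        asc ? Comparator.naturalOrder() : Comparator.<String>naturalOrder().reversed();
    return rows.stream()
        .filter(r -> r.ts() >= begin && r.ts() <= end)
        .map(Row::value)
        .distinct()
        .sorted(cmp)
        .limit(limit)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Row> rows = List.of(
        new Row(1L, "b"), new Row(2L, "a"), new Row(3L, "b"),
        new Row(4L, "c"), new Row(99L, "z"));
    // Range [1, 4] excludes "z"; ascending order, limit 2.
    System.out.println(distinct(rows, 1L, 4L, true, 2)); // [a, b]
  }
}
```

The real method additionally accepts a `CompositeFilter`, applied before deduplication.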
Table 7. DBase class constructors used to initialize DB objects
| Item # | Constructor name | API type | Description | Input parameters | Output parameters |
|---|---|---|---|---|---|
| 1 | DBase | Read/Write | Create Dimension DB storage with a Berkeley DB backend | Dimension DBConfig - local Dimension DB storage configuration, EntityStore - Berkeley DB data store | Constructor |
| 2 | DBase | Read | Create Dimension DB storage for a specific backend type | Dimension DBConfig - local Dimension DB storage configuration, BType backendType - backend type for reading, BasicDataSource basicDataSource - Apache DBCP2 JDBC connection pool object | Constructor |
Table 8. Enum TType for storing supported table types
| Property name | Type | Description |
|---|---|---|
| REGULAR | TType | Table for storing regular (non time series) data |
| TIME_SERIES | TType | Table for storing time series data |
Table 9. Enum IType for storing table indexing type
| Property name | Type | Description |
|---|---|---|
| GLOBAL | IType | The type of index used (RAW, HISTOGRAM, ENUM), defined by the user at the table level for each column in the SProfile input parameters |
| LOCAL | IType | The type of index used (RAW, HISTOGRAM, ENUM), determined automatically by the system at the data block level for each table column |
Table 10. Enum AType for storing the data analysis method for local indexing
| Property name | Type | Description |
|---|---|---|
| ON_LOAD | AType | Data is analyzed upon initial retrieval, and the indexing type for each column in the table is determined based on this analysis |
| FULL_PASS_ONCE | AType | Data is analyzed upon initial retrieval, then on subsequent runs, each column of the table is sequentially analyzed once, and the indexing type is determined based on this information. |
| FULL_PASS_EACH | AType | Same as FULL_PASS_ONCE, except each column of the table is analyzed every time the data write API is called. Each API call analyzes one column, cycling through them in order. |
Table 11. Enum BType for the supported database types used as a backend by the read and write APIs
| Property name | Type | Description |
|---|---|---|
| BERKLEYDB | BType | Berkeley DB |
| CLICKHOUSE | BType | ClickHouse |
| ORACLE | BType | Oracle |
| POSTGRES | BType | PostgreSQL |
| MSSQL | BType | Microsoft SQL Server |
Table 12. Enum GroupFunction for passing the grouping function when calling Stacked API
| Property name | Type | Description |
|---|---|---|
| COUNT | GroupFunction | Calculating the number of occurrences of unique indicators in the data by the cProfile column |
| SUM | GroupFunction | Calculating the sum in the data by the cProfile column |
| AVG | GroupFunction | Calculating the average value in the data by the cProfile column |
Table 13. CSType class for storing storage parameters and data types of table columns
| Property name | Type | Default value | Description |
|---|---|---|---|
| isTimeStamp | boolean | false | Flag indicating the column that stores the timestamp |
| sType | SType | null | Data storage type in the table (RAW, HISTOGRAM, ENUM) |
| cType | CType | null | Java data type used to store data in Berkeley DB (Int, Long, Double, String, etc.) |
| dType | DataType | null | JDBC data type used to convert data to the Berkeley DB Java type (INTEGER, DATE, DATETIME, VARCHAR, etc.) |
Table 14. SType class for storing the table data storage type used in a column or block
| Property name | Type | Default value | Description |
|---|---|---|---|
| RAW | SType | 1 | Data as is or as a key for string data types |
| HISTOGRAM | SType | 2 | Data as a histogram |
| ENUM | SType | 3 | Data as a list of constants |
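For intuition, here is a rough, library-independent sketch of how the same column might be represented under each storage type; the structures below are illustrative only, not the engine's actual on-disk layout:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class STypeSketch {
  // HISTOGRAM: value -> number of occurrences (a compact summary of the column).
  static Map<String, Integer> histogram(List<String> column) {
    Map<String, Integer> h = new LinkedHashMap<>();
    column.forEach(v -> h.merge(v, 1, Integer::sum));
    return h;
  }

  // ENUM: per-row codes into a dictionary of distinct constants.
  static List<Integer> enumCodes(List<String> column, List<String> dictionary) {
    return column.stream().map(dictionary::indexOf).toList();
  }

  public static void main(String[] args) {
    // RAW is simply the column stored as-is, one value per row.
    List<String> raw = List.of("green", "green", "green", "yellow", "green");

    Map<String, Integer> hist = histogram(raw);
    List<String> dictionary = new ArrayList<>(hist.keySet());

    System.out.println(hist);                       // {green=4, yellow=1}
    System.out.println(enumCodes(raw, dictionary)); // [0, 0, 0, 1, 0]
  }
}
```

Low-cardinality columns compress well as HISTOGRAM or ENUM, while high-cardinality columns are usually cheaper as RAW, which is why the LOCAL index type lets the engine pick per block.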
Table 15. CType class for storing the Java data type used to store data in the Berkeley DB table column
| Property name | Type | Default Value | Description |
|---|---|---|---|
| BYTE | int | 1 | Represents a byte |
| INT | int | 2 | Represents an integer |
| LONG | int | 3 | Represents a long integer |
| FLOAT | int | 4 | Represents a float |
| DOUBLE | int | 5 | Represents a double-precision float |
| STRING | int | 6 | Represents a string |
Table 16. DataType class for storing JDBC data types, used to convert column data to the Java storage format in the Berkeley DB backend of Dimension DB
| Property name | Type | Default value | Description |
|---|---|---|---|
| BIT | byte | 0 | Postgres: bit data type |
| ... | ... | ... | ... |
| RAW | byte | 23 | Oracle: raw data |
| ... | ... | ... | ... |
| INT | byte | 31 | MS SQL: integer |
| ... | ... | ... | ... |
| DEC | byte | 47 | ClickHouse: Decimal Numbers |
The full list of properties is available in the source file.
Table 17. Dimension DBaseConfig class for storing settings
| Property name | Type | Default value | Description |
|---|---|---|---|
| configDirectory | String | null | Absolute path to the directory for storing the configuration file with Dimension DB metadata |
| configFileName | String | "metamodel.obj" | Name of the configuration file with Dimension DB metadata |
Table 18. SProfile class for storing table/column settings in the Dimension DB database (used in input parameters)
| Property name | Type | Default value | Description |
|---|---|---|---|
| tableName | String | null | Table name |
| tableType | TType | TIME_SERIES | Table type |
| indexType | IType | GLOBAL | Index type |
| backendType | BType | BERKLEYDB | Backend type for storing data |
| compression | Boolean | FALSE | Whether to use data compression in the table or not |
| csTypeMap | Map<String, CSType> | null | Key-value structure mapping column names to CSTypes |
Table 19. TProfile class for getting table/column settings from metadata (used in output parameters)
| Property name | Type | Default value | Description |
|---|---|---|---|
| tableName | String | null | Table name |
| tableType | TType | null | Table type |
| indexType | IType | null | Index type |
| backendType | BType | null | Backend type |
| compression | Boolean | false | Whether data compression is enabled |
| cProfiles | List | null or empty list | List of table column settings |
Table 20. CProfile class for storing table column settings
| Property name | Type | Default value | Description |
|---|---|---|---|
| colId | int | 0 | Unique identifier |
| colIdSql | int | 0 | Column identifier in SQL |
| colName | String | null | Column name |
| colDbTypeName | String | null | Data type name in the DB |
| colSizeDisplay | int | 0 | Size for display |
| colSizeSqlType | int | 0 | SQL type size |
| csType | CSType | null | CSType storing the storage parameters and data types of the column |
Timestamps are stored as the number of milliseconds elapsed since January 1, 1970 (Unix time).
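For example, timestamp keys in this format can be produced with standard java.time, with no library dependencies:

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class UnixMillis {
  public static void main(String[] args) {
    // January 1, 1970 00:00 UTC is the zero point of Unix time.
    long epoch = ZonedDateTime.of(1970, 1, 1, 0, 0, 0, 0, ZoneOffset.UTC)
        .toInstant().toEpochMilli();
    // A concrete range boundary: 2016-01-01 00:00 UTC in milliseconds.
    long y2016 = ZonedDateTime.of(2016, 1, 1, 0, 0, 0, 0, ZoneOffset.UTC)
        .toInstant().toEpochMilli();
    System.out.println(epoch);  // 0
    System.out.println(y2016);  // 1451606400000
  }
}
```

Values like `y2016` above are what the `begin` and `end` parameters of the read API expect.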
Table 21. StackedColumn class for retrieving data aggregates from the Stacked API
| Property name | Type | Default value | Description |
|---|---|---|---|
| key | long | N/A | Object identifier - timestamp of the beginning of the range by which the values of the aggregates in keyCount, keySum, keyAvg are calculated |
| tail | long | 0 | End timestamp of the range over which the aggregate values in keyCount, keySum, keyAvg are calculated |
| keyCount | Map<String, Integer> | Empty map | Key-value structure for storing the number of occurrences of each indicator in the cProfile table column data when calling the Stacked API (for numeric and symbolic data types) |
| keySum | Map<String, Double> | Empty map | Key-value structure for storing the sum of values for each key (only for numeric data types) |
| keyAvg | Map<String, Double> | Empty map | Key-value structure for storing the average value for each key (only for numeric data types) |
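The relationship between the three maps can be reproduced with plain collections. The sketch below, using hypothetical data and none of the library's classes, fills per-key COUNT and SUM in one pass and derives AVG from them:

```java
import java.util.HashMap;
import java.util.Map;

public class StackedSketch {
  // COUNT per distinct value, as in StackedColumn.keyCount.
  static Map<String, Integer> keyCount(double[] rows) {
    Map<String, Integer> m = new HashMap<>();
    for (double v : rows) m.merge(String.valueOf(v), 1, Integer::sum);
    return m;
  }

  // SUM per distinct value, as in StackedColumn.keySum.
  static Map<String, Double> keySum(double[] rows) {
    Map<String, Double> m = new HashMap<>();
    for (double v : rows) m.merge(String.valueOf(v), v, Double::sum);
    return m;
  }

  // AVG per distinct value, derived as SUM / COUNT (StackedColumn.keyAvg).
  static Map<String, Double> keyAvg(double[] rows) {
    Map<String, Double> avg = new HashMap<>();
    Map<String, Integer> cnt = keyCount(rows);
    keySum(rows).forEach((k, s) -> avg.put(k, s / cnt.get(k)));
    return avg;
  }

  public static void main(String[] args) {
    // Hypothetical values of one numeric column within a single time block.
    double[] rows = {1.0, 2.0, 2.0, 3.0};
    System.out.println(keyCount(rows)); // e.g. {1.0=1, 2.0=2, 3.0=1}
    System.out.println(keySum(rows));
    System.out.println(keyAvg(rows));
  }
}
```

In the real API, one StackedColumn covers the rows between its `key` and `tail` timestamps, and only the map matching the requested GroupFunction is of interest.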
Table 22a. GanttColumnCount class for two-level COUNT aggregation results
| Property name | Type | Default value | Description |
|---|---|---|---|
| key | String | N/A | Key, stores a unique metric in the firstGrpBy table column data when calling the Gantt API |
| gantt | Map<String, Integer> | Empty map | Key-value structure, for storing the number of occurrences of each unique value from secondGrpBy with the key filter by the data from firstGrpBy when calling the Gantt API |
Table 22b. GanttColumnSum class for two-level SUM aggregation results
| Property name | Type | Default value | Description |
|---|---|---|---|
| key | String | N/A | The unique value from the first grouping level column |
| value | Double | 0 | Total sum of all values from the second (numeric) column for this group |
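The shape of a GanttColumnCount result list - one entry per firstGrpBy value, each holding nested secondGrpBy counts - can be sketched independently of the library (hypothetical data and method names):

```java
import java.util.HashMap;
import java.util.Map;

public class GanttSketch {
  // Two-level COUNT: firstGrpBy value -> (secondGrpBy value -> occurrences),
  // the same shape as a list of GanttColumnCount objects.
  static Map<String, Map<String, Integer>> ganttCount(String[][] rows) {
    Map<String, Map<String, Integer>> result = new HashMap<>();
    for (String[] r : rows) {
      result.computeIfAbsent(r[0], k -> new HashMap<>())
            .merge(r[1], 1, Integer::sum);
    }
    return result;
  }

  public static void main(String[] args) {
    // Hypothetical (firstGrpBy, secondGrpBy) pairs.
    String[][] rows = {
        {"Manhattan", "yellow"}, {"Manhattan", "yellow"},
        {"Manhattan", "green"}, {"Brooklyn", "green"}};
    System.out.println(ganttCount(rows));
    // e.g. {Brooklyn={green=1}, Manhattan={green=1, yellow=2}}
  }
}
```

GanttColumnSum collapses the inner map to a single Double: the sum of the numeric second column per first-level key.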
- All code use cases call the Dimension DB API through the DStore interface;
- A detailed description of the contracts and method signatures of the DStore interface is presented in the Used Methods section;
- Before each call to a DStore method, the interface is initialized according to the instructions in the DB Initialization Algorithm section;
- A test case has been developed for each use case in Dimension DBUseCasesCodeTest.
Table 23. Use cases
| # | Method Name | API Type | Description | Use cases |
|---|---|---|---|---|
| 1 | getTProfile | Metadata | Get table metadata by name | getTProfile |
| 2 | loadDirectTableMetadata | Metadata | Load metadata from SProfile | loadDirectTableMetadata |
| 3 | loadJdbcTableMetadata | Metadata | Load metadata via JDBC connection with query | loadJdbcTableMetadata |
| 4 | loadJdbcTableMetadata | Metadata | Load metadata via JDBC (schema and table) | |
| 5 | setTimestampColumn | Metadata | Set timestamp column | setTimestampColumn |
| 6 | putDataDirect | Write | Save data using Java structure | putDataDirect |
| 7 | putDataJdbc | Write | Save data from JDBC ResultSet | putDataJdbc |
| 8 | putDataJdbcBatch | Write | Batch save from JDBC ResultSet | putDataJdbcBatch |
| 9 | getBlockKeyTailList | Read | Get blocks with keys and ranges | getBlockKeyTailList |
| 10 | getStacked | Read | Get aggregated data (COUNT/SUM/AVG) | getStacked |
| 11 | getStacked | Read | Get aggregated data with filter | getStackedCountFilter |
| 12 | getGantt | Read | Two-level COUNT grouping (basic) | getGantt |
| 13 | getGantt | Read | Two-level COUNT grouping (multithreaded) | getGantt |
| 14 | getGantt | Read | Two-level COUNT grouping with filter | getGanttFilter |
| 15 | getGanttSum | Read | Two-level SUM grouping (basic) | getGanttSum |
| 16 | getGanttSum | Read | Two-level SUM grouping with filter | getGanttSumFiltered |
| 17 | getDistinct | Read | Get distinct values (basic) | getDistinct |
| 18 | getDistinct | Read | Get distinct values with filter | getDistinctWithFilter |
| 19 | getRawDataAll | Read | Get raw data (all columns, no filter) | getRawDataAll |
| 20 | getRawDataAll | Read | Get raw data (all columns, with filter) | getRawDataAllFilter |
| 21 | getRawDataByColumn | Read | Get raw data for specific column | getRawDataByColumn |
| 22 | getBatchResultSet | Read | Batch read (regular tables) | getBatchResultSetRegularTable |
| 23 | getBatchResultSet | Read | Batch read (time-series tables) | getBatchResultSetTimeSeriesTable |
| 24 | getFirst | Metadata | Get first timestamp | getFirst |
| 25 | getLast | Metadata | Get last timestamp | getLast |
| 26 | syncBackendDb | Backend | Sync Berkeley DB to disk | |
| 27 | closeBackendDb | Backend | Close Berkeley DB | |
Table 24. Test environment
| Category | Details |
|---|---|
| Processor | AMD Ryzen 5 5600H with Radeon Graphics, 3301 Mhz, 6 Core(s), 12 Logical Processor(s) |
| RAM | 16.0 GB |
| Disk | Generic Flash Disk USB Device - SAMSUNG MZVL2512HCJQ-00B00 (476.94 GB) |
| OS | Microsoft Windows 11 |
| Test data | New York taxi trips |
SQL query to get statistics for the 2016 partition:

```sql
WITH
    `2016_parts` AS (
        SELECT sum(bytes) AS bytes
        FROM system.parts
        WHERE table = 'trips_mergetree'
          AND database = 'datasets'
          AND active
          AND partition LIKE '2016%'
    )
SELECT
    (SELECT count() FROM datasets.trips_mergetree WHERE toYear(pickup_datetime) = 2016) AS exact_row_count,
    formatReadableSize((SELECT bytes FROM `2016_parts`)) AS total_size,
    formatReadableSize((SELECT bytes FROM `2016_parts`) / exact_row_count) AS avg_row_size;
```

Table 25. Data for 2016
| exact_row_count | total_size | avg_row_size |
|---|---|---|
| 78325655 | 6.88 GiB | 94.37 B |
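The avg_row_size column follows from the other two; a quick cross-check (1 GiB taken as 1024³ bytes; minor drift is expected because 6.88 GiB is itself a rounded figure):

```java
public class AvgRowSize {
  public static void main(String[] args) {
    double totalBytes = 6.88 * 1024 * 1024 * 1024; // 6.88 GiB in bytes
    long rows = 78_325_655L;                       // exact_row_count
    double avg = totalBytes / rows;
    // ~94.3 B, in line with the table's 94.37 B (computed from the unrounded total).
    System.out.printf("%.2f B%n", avg);
  }
}
```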
Table 26. Load profiles
| Value\Profile | № 1 | № 2 | № 3 | № 4 |
|---|---|---|---|---|
| TType | TIME_SERIES | TIME_SERIES | TIME_SERIES | TIME_SERIES |
| IType | GLOBAL | LOCAL | LOCAL | LOCAL |
| AType | ON_LOAD | ON_LOAD | FULL_PASS_ONCE | FULL_PASS_EACH |
| Compression | true | true | true | true |
| Load (min) | 31 min 9 sec | 23 min 28 sec | 23 min 36 sec | 23 min 31 sec |
| Size (GB) | 9,015 | 11,993 | 11,331 | 12,382 |
| Avg Rows/Sec | 41 891 | 55 620 | 55 314 | 55 499 |
| Avg MB/Sec | 5,28 | 7,02 | 6,98 | 7,00 |
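The Avg Rows/Sec figures are consistent with the partition's 78,325,655 rows and the load times; for example, for profile № 2 (assuming the full 2016 partition was loaded):

```java
public class Throughput {
  public static void main(String[] args) {
    long rows = 78_325_655L;            // rows in the 2016 partition
    long seconds = 23 * 60 + 28;        // profile 2 load time: 23 min 28 sec
    // Integer division gives ~55629 rows/sec, close to the table's 55 620.
    System.out.println(rows / seconds);
  }
}
```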
Table 27. Performance tests for gantt API (count)
| № | Test name | № 1 | № 2 | № 3 | № 4 |
|---|---|---|---|---|---|
| 1 | getGanttRawRaw | 13.5 / 9.0 | 13.7 / 9.0 | 14.3 / 9.3 | 13.4 / 8.9 |
| 2 | getGanttEnumEnum | 8.5 / 4.3 | 11.3 / 6.9 | 11.2 / 7.0 | 11.0 / 7.0 |
| 3 | getGanttHistHist | 7.5 / 3.9 | 13.5 / 8.5 | 13.3 / 8.6 | 13.1 / 8.5 |
| 4 | getGanttHistRaw | 9.6 / 6.2 | 11.3 / 7.9 | 11.0 / 7.7 | 10.9 / 7.9 |
| 5 | getGanttHistEnum | 6.9 / 3.7 | 12.7 / 8.5 | 12.4 / 8.5 | 12.3 / 8.6 |
| 6 | getGanttEnumRaw | 9.8 / 6.6 | 13.8 / 9.8 | 14.0 / 9.8 | 13.7 / 9.7 |
| 7 | getGanttEnumHist | 6.8 / 4.1 | 17.0 / 10.9 | 15.0 / 10.5 | 15.0 / 10.4 |
| 8 | getGanttRawHist | 9.8 / 6.4 | 14.4 / 9.8 | 14.2 / 10.0 | 13.9 / 9.8 |
| 9 | getGanttRawEnum | 9.9 / 6.1 | 14.3 / 9.3 | 14.0 / 9.4 | 14.2 / 9.4 |
Table 28. Performance tests for gantt API (sum)
| № | Test name | № 1 | № 2 | № 3 | № 4 |
|---|---|---|---|---|---|
| 1 | getGanttSumHistRaw | 5.5 | 7.4 | 7.2 | 7.2 |
| 2 | getGanttSumEnumRaw | 6.5 | 11.2 | 10.8 | 11.2 |
| 3 | getGanttSumRawRaw | 9.3 | 10.0 | 9.8 | 10.1 |
| 4 | getGanttSumHistHist | 4.8 | 6.8 | 6.6 | 6.6 |
| 5 | getGanttSumEnumEnum | 5.1 | 10.0 | 9.9 | 10.1 |
| 6 | getGanttSumRawEnum | 8.4 | 8.7 | 8.6 | 8.9 |
| 7 | getGanttSumHistRawWithFilter | 6.9 | 9.6 | 9.6 | 9.7 |
Table 29. Performance tests for stacked API
| № | Test name | № 1 | № 2 | № 3 | № 4 |
|---|---|---|---|---|---|
| 1 | stackedHist | 5.4 | 6.1 | 8.8 | 6.1 |
| 2 | stackedHistDate | 0.1 | 0.1 | 0.1 | 0.1 |
| 3 | stackedEnum | 5.0 | 10.3 | 8.9 | 8.9 |
| 4 | stackedEnumDate | 0.1 | 0.1 | 0.1 | 0.1 |
| 5 | stackedRaw | 7.8 | 7.8 | 7.9 | 8.0 |
| 6 | stackedRawDate | 0.1 | 0.1 | 0.1 | 0.1 |
Table 30. SQL queries used in the performance tests
| № | Test name | SQL query |
|---|---|---|
| | Gantt API (Count) | |
| 1 | getGanttRawRaw | SELECT pickup_cdeligibil, vendor_id, COUNT(vendor_id) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY pickup_cdeligibil, vendor_id; |
| 2 | getGanttHistRaw | SELECT trip_type, vendor_id, COUNT(vendor_id) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY trip_type, vendor_id; |
| 3 | getGanttRawEnum | SELECT pickup_cdeligibil, cab_type, COUNT(cab_type) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY pickup_cdeligibil, cab_type; |
| 4 | getGanttRawHist | SELECT pickup_cdeligibil, pickup_boroname, COUNT(pickup_boroname) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY pickup_cdeligibil, pickup_boroname; |
| 5 | getGanttEnumEnum | SELECT dropoff_puma, dropoff_borocode, COUNT(dropoff_borocode) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY dropoff_puma, dropoff_borocode; |
| 6 | getGanttEnumHist | SELECT dropoff_boroname, pickup_boroname, COUNT(pickup_boroname) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY dropoff_boroname, pickup_boroname; |
| 7 | getGanttEnumRaw | SELECT dropoff_boroname, vendor_id, COUNT(vendor_id) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY dropoff_boroname, vendor_id; |
| 8 | getGanttHistEnum | SELECT trip_type, dropoff_boroname, COUNT(dropoff_boroname) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY trip_type, dropoff_boroname; |
| 9 | getGanttHistHist | SELECT trip_type, pickup_boroname, COUNT(pickup_boroname) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY trip_type, pickup_boroname; |
| | Gantt API (Sum) | |
| 10 | getGanttSumHistRaw | SELECT trip_type, SUM(fare_amount) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY trip_type; |
| 11 | getGanttSumEnumRaw | SELECT pickup_boroname, SUM(trip_distance) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY pickup_boroname; |
| 12 | getGanttSumRawRaw | SELECT vendor_id, SUM(total_amount) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY vendor_id; |
| 13 | getGanttSumHistHist | SELECT trip_type, SUM(passenger_count) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY trip_type; |
| 14 | getGanttSumEnumEnum | SELECT dropoff_boroname, SUM(dropoff_borocode) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY dropoff_boroname; |
| 15 | getGanttSumRawEnum | SELECT pickup_cdeligibil, SUM(pickup_borocode) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY pickup_cdeligibil; |
| 16 | getGanttSumWithFilter | SELECT trip_type, SUM(fare_amount) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 AND trip_type IN (1, 2) GROUP BY trip_type; |
| | Stacked API | |
| 17 | stackedHist | SELECT trip_type, COUNT(trip_type) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY trip_type; |
| 18 | stackedHistDate | SELECT trip_type, COUNT(trip_type) FROM datasets.trips_mergetree WHERE toYYYYMMDD(pickup_datetime) = 20160101 GROUP BY trip_type; |
| 19 | stackedEnum | SELECT dropoff_boroname, COUNT(dropoff_boroname) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY dropoff_boroname; |
| 20 | stackedEnumDate | SELECT dropoff_boroname, COUNT(dropoff_boroname) FROM datasets.trips_mergetree WHERE toYYYYMMDD(pickup_datetime) = 20160101 GROUP BY dropoff_boroname; |
| 21 | stackedRaw | SELECT vendor_id, COUNT(vendor_id) FROM datasets.trips_mergetree WHERE toYear(pickup_date) = 2016 GROUP BY vendor_id; |
| 22 | stackedRawDate | SELECT vendor_id, COUNT(vendor_id) FROM datasets.trips_mergetree WHERE toYYYYMMDD(pickup_datetime) = 20160101 GROUP BY vendor_id; |
- Building and installing Dimension DB from source code into a local Maven repository is described in the Building the project section;
- If you find an error in the code or have a suggestion for improvement, create a ticket;
- Before starting work, check that the project builds and the unit tests pass according to the instructions in the Building the project section;
- Integration tests are used to check the correctness and performance of the Dimension DB API. For more details, see Preparing the Environment to Run Integration Tests and Code Use Cases.
| EN | RU |
|---|---|
| README in English | README на русском |
Code released under the Apache License Version 2.0
Created with the support of the "Innovation Promotion Fund".
Created by @akardapolov - feel free to contact me!