Only submit changes/deltas when processing bulk requests by kiivihal · Pull Request #191 · delving/hub3

kiivihal · 2023-04-25T08:38:56Z

Re-submitting identical data puts a heavy, unnecessary load on downstream services such as the Elasticsearch index and the Fuseki triplestore. This pull request changes the bulk indexing workflow to first store the data in blob storage (MinIO) and to keep track of the records and their changes in an embedded database (DuckDB). After the final call to drop-orphans, the changes are calculated and only those changes are submitted for processing.
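As a rough illustration of the change-detection step described above, the following sketch compares content hashes from a previous run with the records received in the current run and reports only the differences. It is not the code from this pull request: `Change`, `hashOf`, and `computeChanges` are hypothetical names, and plain Go maps stand in for the embedded database and the blob storage.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// Change describes what should be submitted downstream for one record.
// Hypothetical type, used only for this illustration.
type Change struct {
	ID     string
	Action string // "insert", "update", or "delete"
}

// hashOf returns the hex-encoded SHA-256 digest of a serialized record.
func hashOf(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// computeChanges compares the hashes kept from the previous run with the
// records received in the current run. Unchanged records are skipped.
func computeChanges(previous map[string]string, current map[string][]byte) []Change {
	var changes []Change
	for id, data := range current {
		old, seen := previous[id]
		switch {
		case !seen:
			changes = append(changes, Change{ID: id, Action: "insert"})
		case old != hashOf(data):
			changes = append(changes, Change{ID: id, Action: "update"})
		}
	}
	// Records known from the previous run but missing now are deletions.
	for id := range previous {
		if _, ok := current[id]; !ok {
			changes = append(changes, Change{ID: id, Action: "delete"})
		}
	}
	// Sort by ID so the result does not depend on map iteration order.
	sort.Slice(changes, func(i, j int) bool { return changes[i].ID < changes[j].ID })
	return changes
}

func main() {
	previous := map[string]string{
		"rec-1": hashOf([]byte("unchanged")),
		"rec-2": hashOf([]byte("old body")),
		"rec-3": hashOf([]byte("gone")),
	}
	current := map[string][]byte{
		"rec-1": []byte("unchanged"),
		"rec-2": []byte("new body"),
		"rec-4": []byte("brand new"),
	}
	for _, c := range computeChanges(previous, current) {
		fmt.Println(c.ID, c.Action)
	}
}
```

In this sketch `rec-1` produces no change at all, which is the case the pull request aims to keep away from the index and the triplestore. Hashing only gives stable results if records are serialized deterministically.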

For SPARQL inserts and updates, we remove the 'drop' statement for each named graph and replace it with a delta insert statement. Instead of dropping and rewriting the whole graph, this deletes only the triples that are no longer present and adds the new ones.

gitguardian · 2023-04-25T08:39:00Z

⚠️ GitGuardian has uncovered 4 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request

GitGuardian id	Secret	Commit	Filename
6465341	Generic High Entropy Secret	`edb25ac`	docker-compose.yml
6465341	Generic High Entropy Secret	`edb25ac`	hub3.toml
6465341	Generic High Entropy Secret	`b8ae4ce`	hub3.toml
6753863	Redis Server Password	`81ac24d`	docker-compose.yml

🛠 Guidelines to remediate hardcoded secrets

Understand the implications of revoking this secret by investigating where it is used in your code.
Replace and store your secrets safely. Learn here the best practices.
Revoke and rotate these secrets.
If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future, consider:

following these best practices for managing and storing secrets, including API keys and other credentials
installing secret detection on pre-commit to catch secrets before they leave your machine and to ease remediation.


feat(go.mod): add oklog/ulid v1.3.1 dependency
chore(hub3.toml): change minio endpoint port from 9000 to 9010
refactor(hub3/fragments/graph.go): modify Reader method to return the length of the byte array
refactor(rdfstream.go): reorganize import statements
feat(rdfstream.go): add HubID and OrgID to the fragment metadata in IndexFragments() function
refactor(resource.go): fix typo in CreateDateRange error message
refactor(resource.go): rename year variable to date in padYears function
refactor(resource.go): rename formattedDate variable in padYears function
refactor(resource.go): fix typo in hyphenateDate error message
refactor(resource.go): rename splitDate function to splitPeriod for clarity
refactor(resource.go): improve error messages in SetContextLevels and NewResourceMap functions
fix(resource.go): fix typo in error message in SetContextLevels function
refactor(sparql.go): add omitempty to SparqlUpdate struct fields
feat(config/bulk.go): add StoreRequests field to Bulk struct
feat(config/elasticsearch.go): add LogRequests field to BulkConfig struct
feat(handle_upload.go): add GetGraph method to Service struct
feat(options.go): add SetLogRequests option to Service struct
feat(parser.go): add support for logging raw requests in bulk parser service
feat(parser.go): add support for storing bulk request to disk for debugging
feat(parser.go): add support for storing graphs to MinIO
fix(parser.go): fix variable naming inconsistency in setDataSet function
refactor(parser.go): remove unused code and comments
feat(parser.go): add HubID and OrgID to RDF bulk request
fix(parser.go): use IterTriplesOrdered instead of IterTriples to serialize triples in order
refactor(service.go): reformat code for better readability
feat(service.go): add logRequests boolean option to NewService function

refactor(config): remove unused SparqlUsername and SparqlPassword fields from RDF configuration
feat(config): enable storing only changed triples in the triple store
feat(bulk): add support for storing RDF data in Redis for delta updates
feat(bulk): add support for finding and dropping orphaned graphs in Redis for delta updates
refactor(bulk): remove unused SetDBPath option
refactor(parser.go): remove unused imports and variables
feat(parser.go): add setUpdateDataset and dataset methods to safely access and modify the dataset
feat(parser.go): add storeGraphDeltasOld method to be removed later
feat(parser.go): add storeGraphDeltas method to store graph deltas in redis and S3
feat(parser.go): add dropGraphOrphans method to drop orphan graphs from redis and triple store
feat(parser.go): add incrementRevision method to increment the revision of the dataset
feat(parser.go): add process method to process requests and increment revisions
feat(parser.go): add stats field to Stats struct to track graphs stored
refactor(service.go): remove unused imports and variables
feat(service.go): add Redis support to the bulk service to store and retrieve data
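The revision and orphan handling mentioned in these commits can be pictured roughly as follows. This is only a sketch under assumptions: it supposes each graph records the dataset revision in which it was last seen, uses an in-memory map in place of the actual store, and `findOrphans` is a hypothetical helper rather than the `dropGraphOrphans` method from the commit.

```go
package main

import (
	"fmt"
	"sort"
)

// findOrphans returns the IDs of graphs that were not seen in the current
// revision, i.e. whose last-seen revision is older than the current one.
func findOrphans(lastSeen map[string]int, currentRevision int) []string {
	var orphans []string
	for id, rev := range lastSeen {
		if rev < currentRevision {
			orphans = append(orphans, id)
		}
	}
	// Sort so the result does not depend on map iteration order.
	sort.Strings(orphans)
	return orphans
}

func main() {
	// Assumed bookkeeping: graph ID -> revision in which it was last submitted.
	lastSeen := map[string]int{
		"graph-a": 3,
		"graph-b": 2, // not re-submitted in revision 3
		"graph-c": 3,
		"graph-d": 1, // not re-submitted since revision 1
	}
	for _, id := range findOrphans(lastSeen, 3) {
		fmt.Println("orphan:", id)
	}
}
```

With bookkeeping like this, the orphan pass only needs the current revision number and does not have to compare record contents.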

sonarqubecloud · 2023-05-24T06:10:50Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
6 Code Smells

No Coverage information
0.0% Duplication

kiivihal added 2 commits April 25, 2023 09:40

feat(bulk): added duckdb database to bulk service

c8828b3

feat(bulk): added support for minio blob-storage to bulk service

edb25ac

kiivihal added 5 commits April 25, 2023 12:17

feat(bulk): added support for delta based sparql update queries

b7d7be9

wip(bulk): basic scaffolding for delta storage workflow

f34e8ff

chore(go.mod): update dependencies and remove unused ones to keep the…

ae86dcd

… project up to date and reduce clutter chore(go.sum): update dependencies

kiivihal wants to merge 7 commits into main from feature/bulk-delta-storage
