Benchmarking and Performance
Badger uses an LMDB database. This means that lookups can be performed at lightning speed, whereas other operations are inevitably far more complex to execute, and are effectively constrained by the amount of memory available in the system for optimisation. The benchmark below is based on the fr_sncf_ter dataset, which contains numerous issues. These can be addressed, but the time required depends heavily on the chosen approach. The table represents a baseline: “if you need to scan for all values, this is the cost when done single-threaded”.
One of our most expensive operations involves (batch) updates, for example repairing references. This process requires deleting and recreating the references, inverse references, embedding, and inverse embedding entries. If we skipped the deletion and simply overwrote the entries, the update itself would still work, but stale inverse entries pointing at objects that no longer exist would linger in the database.
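To make the delete-before-recreate requirement concrete, here is a minimal pure-Python sketch. Two dicts stand in for one forward/inverse pair of LMDB sub-databases (the same pattern applies to the embedding tables); the function name and data layout are illustrative assumptions, not Badger's actual schema.

```python
# Illustrative sketch of why a repair must delete before it recreates.
# `forward` maps an object key to its referenced target keys; `inverse`
# maps each target key back to the objects referencing it. These dicts
# stand in for an LMDB forward/inverse sub-database pair.
def repair_references(forward, inverse, obj_key, new_targets):
    # Remove the stale forward entries and their inverse counterparts.
    # Merely overwriting forward[obj_key] would leave `inverse` holding
    # back-pointers for targets that are no longer referenced.
    for old_target in forward.pop(obj_key, []):
        inverse[old_target].remove(obj_key)
    # Recreate both directions from the repaired reference list.
    forward[obj_key] = list(new_targets)
    for target in new_targets:
        inverse.setdefault(target, []).append(obj_key)
```

The same two-step pattern would be repeated for the embedding and inverse embedding tables during a real repair.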
Since LMDB is not a relational database, we cannot simply search for any arbitrary value in a secondary index. We have therefore developed a method that starts from the outward reference (where the computed key is known), then traverses to the inward reference of the referenced objects. From there, all inward reference values are retrieved, deserialised, and compared to match the class, id, and version.
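The two-hop lookup described above can be sketched in pure Python. Plain dicts of lists stand in for the LMDB dupsort sub-databases, and JSON stands in for the project's actual serialiser; all names here are illustrative assumptions.

```python
import json

# Sketch of the two-hop traversal: start from the outward reference whose
# key can be computed, hop to the inward references of the referenced
# objects, then deserialise every candidate value and keep those matching
# (class, id, version). Dicts of lists emulate LMDB dupsort databases.
def find_inward_matches(referencing, referencing_inwards,
                        outward_key, want_class, want_id, want_version):
    matches = []
    for target_key in referencing.get(outward_key, []):
        for raw in referencing_inwards.get(target_key, []):
            obj = json.loads(raw)  # deserialise each candidate value
            if (obj["class"], obj["id"], obj["version"]) == (
                    want_class, want_id, want_version):
                matches.append(obj)
    return matches
```

The deserialise-and-compare step on every inward value is what makes this lookup expensive relative to a plain key fetch.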
Deleting a single record from the reference table alone already cost 600 ms per object. Given that deserialising all objects in the entire database takes 21 seconds, it follows that when updating more than roughly 35 objects, a full table scan might be preferable.
Through targeted benchmarking, we have reduced the 600 ms per-object cost to 60 ms. Under this improvement, the threshold where a full table scan becomes worth considering rises to around 350 objects per batch. Using the benchmark data below, and with this improvement, 200 records take roughly 14 seconds — approximately 68 ms per record.
Even if the per-record operation time could be reduced further, there will always be a crossover point where the batch operation's stable performance outpaces the “smart lookups”. In the current implementation, that point is likely somewhere around 235 objects.
Looking at the full table below, one might wonder: in a large batch operation, what if we simply discarded all entries and recomputed them? We benchmarked this approach as well: processing 126519 objects took 367 seconds, averaging roughly 3 ms per object.
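The break-even points quoted above follow directly from the measured numbers; a back-of-the-envelope reproduction:

```python
# Reproducing the thresholds from the figures given in the text.
full_scan_s = 21.0          # deserialising every object in the database
per_object_before = 0.600   # targeted per-object cost, original
per_object_after = 0.060    # targeted per-object cost, after optimisation

# A full table scan wins once its fixed cost is amortised over the batch.
threshold_before = full_scan_s / per_object_before   # roughly 35 objects
threshold_after = full_scan_s / per_object_after     # roughly 350 objects

# The drop-and-recompute variant, for comparison:
rebuild_total_s = 367.0
entries = 126519
per_object_rebuild = rebuild_total_s / entries       # roughly 3 ms each
```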
| Database | Entries | Time (s) |
|---|---|---|
| Connection | 322 | 0.0509 |
| CoupledJourney | 125 | 0.0072 |
| DayType | 904 | 0.0347 |
| DayTypeAssignment | 904 | 0.0470 |
| DefaultConnection | 4 | 0.0008 |
| DestinationDisplay | 652 | 0.0800 |
| JourneyPart | 455 | 0.0451 |
| JourneyPartCouple | 125 | 0.0117 |
| Line | 556 | 0.1967 |
| Network | 13 | 0.0068 |
| Operator | 3 | 0.0012 |
| PassengerStopAssignment | 3461 | 0.4613 |
| Route | 15723 | 6.8190 |
| RouteLink | 34376 | 1.5615 |
| RoutePoint | 7673 | 0.3293 |
| ScheduledStopPoint | 3461 | 0.4111 |
| ServiceJourney | 22479 | 8.5755 |
| ServiceJourneyPattern | 5659 | 1.4576 |
| StopPlace | 3461 | 0.7745 |
| TopographicPlace | 2773 | 0.1798 |
| TrainNumber | 22479 | 0.5661 |
| TypeOfService | 3 | 0.0007 |
| UicOperatingPeriod | 904 | 0.0326 |
| ValidBetween | 1 | 0.0006 |
| ValueSet | 3 | 0.0010 |
| Total: | 126519 | 21.6529 |
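The single-threaded full scan that the table above measures amounts to walking every value in a sub-database, deserialising it, and filtering. A minimal sketch, with a list of serialised values standing in for an LMDB cursor and JSON standing in for the actual serialiser:

```python
import json

# Baseline full-scan filter: deserialise everything, keep the matches.
# Deserialisation dominates the cost, which is what the 21-second total
# in the table above reflects.
def full_scan_matches(values, want_class, want_id, want_version):
    for raw in values:
        obj = json.loads(raw)
        if (obj["class"], obj["id"], obj["version"]) == (
                want_class, want_id, want_version):
            yield obj
```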
| Database | Entries | Time (s) |
|---|---|---|
| _embedding | 763263 | 6.4301 |
| _embedding_inverse | 763263 | 8.8429 |
| _referencing | 959927 | 12.6878 |
| _referencing_inwards | 959927 | 11.4773 |