SQLite3-backed JSON document database with support for indices and advanced queries.
JSONLiteDB leverages SQLite3 and JSON1 to create a fast JSON document store with easy persistence, indexing capability, and extensible use.
JSONLiteDB provides an easy API with no need to load the entire database into memory, nor dump it when inserting! JSONLiteDB SQLite files are easily usable in other tools with no proprietary formats or encoding. JSONLiteDB is a great replacement for reading a JSON or JSONLines file. Entries can be modified in place. Queries can be indexed for greatly improved query speed and optionally to enforce uniqueness.
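To make the "no proprietary formats" point concrete, here is a minimal sketch of the kind of storage JSONLiteDB builds on: JSON text in a single SQLite column, queried with the JSON1 `json_extract()` function. The table name `items` and column `data` are illustrative here, not a guarantee of JSONLiteDB's exact schema.

```python
# Illustrative sketch (not JSONLiteDB's exact schema): documents are stored
# as JSON text and queried inside SQLite via JSON1, so nothing needs to be
# loaded into Python until a row matches.
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, data TEXT)")
con.execute(
    "INSERT INTO items (data) VALUES (?)",
    (json.dumps({"first": "John", "last": "Lennon"}),),
)

# The filter runs entirely inside SQLite; only matching rows reach Python.
row = con.execute(
    "SELECT data FROM items WHERE json_extract(data, '$.last') = ?",
    ("Lennon",),
).fetchone()
doc = json.loads(row[0])
```

Because the file is plain SQLite, any other SQLite tool can open and query it the same way.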
Similar tools and inspiration:
- TinyDB. The API and process of TinyDB heavily inspired JSONLiteDB. But TinyDB reads the entire JSON database into memory and must dump the entire database on every insertion. That is neither efficient nor scalable, and queries are still O(N).
- Dataset is promising but creates new columns for every key and is very "heavy" with its dependencies. As far as I can tell, there is no native way to support multi-column and/or unique indexes. But still, a very promising tool!
- KenobiDB. Came out while JSONLiteDB was in development. Similar idea with different design decisions. Does not directly support advanced queries or indexes, which can greatly accelerate queries! (Please correct me if I am wrong; I am new to this tool.)
- DictTable (also written by me) is nice but entirely in-memory and not always efficient for non-equality queries.
From PyPI:
$ pip install jsonlitedb
$ pip install jsonlitedb --upgrade
Or directly from Github
$ pip install git+https://github.com/Jwink3101/jsonlitedb.git
With some fake data.
>>> import os
>>>
>>> os.environ["JSONLITEDB_DISABLE_META"] = "true" # avoid churn
>>>
>>> from jsonlitedb import JSONLiteDB
>>> db = JSONLiteDB(":memory:")
>>> # more generally:
>>> # db = JSONLiteDB('my_data.db')
Insert some data. Can use insert() with any number of items or insertmany() with an iterable (insertmany([...]) <--> insert(*[...])).
Can also use a context manager (with db: ...) to batch the insertions (or deletions).
>>> db.insert(
>>> {"first": "John", "last": "Lennon", "born": 1940, "role": "guitar"},
>>> {"first": "Paul", "last": "McCartney", "born": 1942, "role": "bass"},
>>> {"first": "George", "last": "Harrison", "born": 1943, "role": "guitar"},
>>> {"first": "Ringo", "last": "Starr", "born": 1940, "role": "drums"},
>>> {"first": "George", "last": "Martin", "born": 1926, "role": "producer"},
>>> )
>>> len(db)
5
>>> list(db)
[{'first': 'John', 'last': 'Lennon', 'born': 1940, 'role': 'guitar'},
{'first': 'Paul', 'last': 'McCartney', 'born': 1942, 'role': 'bass'},
{'first': 'George', 'last': 'Harrison', 'born': 1943, 'role': 'guitar'},
{'first': 'Ringo', 'last': 'Starr', 'born': 1940, 'role': 'drums'},
{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]
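The insert() calls above, and the `with db:` batching mentioned earlier, ultimately amount to SQLite inserts grouped into a transaction. A minimal sketch of what batching buys (illustrative table name and schema, not JSONLiteDB's internals):

```python
# Many inserts inside one SQLite transaction commit once, instead of once
# per row. JSONLiteDB's `with db:` context manager batches similarly.
import json
import sqlite3

people = [
    {"first": "John", "last": "Lennon"},
    {"first": "Paul", "last": "McCartney"},
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (data TEXT)")
with con:  # one transaction for the whole batch
    con.executemany(
        "INSERT INTO items (data) VALUES (?)",
        [(json.dumps(p),) for p in people],
    )
count = con.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```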
Let's do some simple queries. The default query() returns an iterator, so we either wrap it in a list or call .all().
>>> db.query(first="George").all()
[{'first': 'George', 'last': 'Harrison', 'born': 1943, 'role': 'guitar'},
{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]
>>> # If you only want the first result, you can use db.one().
>>> # On the SQL call, this adds "LIMIT 1"
>>> db.one(first="George", last="Martin")
{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}
>>> # This will also only give you the first row but it is
>>> # less efficient as it doesn't have a "LIMIT 1" clause.
>>> db.query(first="George", last="Martin").one()
{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}
Now let's query with a dictionary to match
>>> # queries return a QueryResult which can be iterated. list(QueryResult) <==> QueryResult.all()
>>> list(db.query({"first": "George"}))
[{'first': 'George', 'last': 'Harrison', 'born': 1943, 'role': 'guitar'},
{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]
Multiple conditions always form an AND query.
>>> db.query({"first": "George", "last": "Martin"}).all()
[{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]
You can also pass separate items, but it makes no difference.
>>> db.query({"first": "George"}, {"last": "Martin"}).all()
[{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]
>>> db.count(first="George")
2
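Under the hood, these AND conditions translate to multiple `json_extract()` comparisons joined by SQL AND. A hedged sketch with the stdlib (illustrative schema, not JSONLiteDB's actual generated statement):

```python
# Sketch of how an AND query maps to SQL: each key/value pair becomes one
# json_extract(...) = ? comparison, joined with AND.
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (data TEXT)")
rows = [
    {"first": "George", "last": "Harrison"},
    {"first": "George", "last": "Martin"},
]
con.executemany(
    "INSERT INTO items (data) VALUES (?)", [(json.dumps(r),) for r in rows]
)

matches = con.execute(
    "SELECT data FROM items "
    "WHERE json_extract(data, '$.first') = ? "
    "AND json_extract(data, '$.last') = ?",
    ("George", "Martin"),
).fetchall()
```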
Query objects enable more complex combinations and inequalities. Query objects can be from the database (db.Query or db.Q) or created on their own (Query() or Q()). They are all the same.
>>> db.query(db.Q.first == "George").all()
[{'first': 'George', 'last': 'Harrison', 'born': 1943, 'role': 'guitar'},
{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]
Note that you need to be careful with parentheses, as the operator precedence of & and | is very high.
>>> db.query((db.Q.first == "George") & (db.Q.last == "Martin")).all()
[{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]
Can do inequalities too
>>> list(db.query(db.Q.born < 1930))
[{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]
Queries support: ==, !=, <, <=, >, >= for normal comparisons.
In addition, they support:
- %: LIKE
- *: GLOB
- @: REGEXP, using Python's re module. Note that this can be disabled for untrusted input.
>>> # This will all be the same
>>> db.query(db.Q.role % "prod%").all() # LIKE
>>> db.query(db.Q.role * "prod*").all() # GLOB
>>> db.query(db.Q.role @ "prod").all() # REGEXP -- Python based
[{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]
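At the SQL level these three operators look like this. SQLite ships LIKE and GLOB; REGEXP only works when a function is registered on the connection, which is roughly what JSONLiteDB does with Python's re module (this is an illustrative sketch, not its actual code):

```python
# LIKE and GLOB are built into SQLite. REGEXP must be registered: SQLite
# evaluates `X REGEXP Y` by calling regexp(Y, X), i.e. regexp(pattern, text).
import json
import re
import sqlite3

con = sqlite3.connect(":memory:")
con.create_function(
    "REGEXP", 2,
    lambda pat, s: s is not None and re.search(pat, s) is not None,
)
con.execute("CREATE TABLE items (data TEXT)")
con.execute(
    "INSERT INTO items (data) VALUES (?)",
    (json.dumps({"role": "producer"}),),
)

q = "SELECT COUNT(*) FROM items WHERE json_extract(data, '$.role') {} ?"
like_n = con.execute(q.format("LIKE"), ("prod%",)).fetchone()[0]
glob_n = con.execute(q.format("GLOB"), ("prod*",)).fetchone()[0]
regex_n = con.execute(q.format("REGEXP"), ("prod",)).fetchone()[0]
```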
JSONLiteDB supports _orderby on query() (and those that wrap it) and query_by_path_exists() (see Advanced Usage)
The inputs are effectively the same as those for a query, but (a) they do not have values assigned and (b) they can take a "+" (ascending, default) or "-" (descending) prefix. See the help for query() for more details, including how it is used with the different forms.
>>> db.query(db.Q.first == "George").all()
[{'first': 'George', 'last': 'Harrison', 'born': 1943, 'role': 'guitar'},
{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]
>>> db.query(db.Q.first == "George", _orderby="-role").all()
[{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'},
{'first': 'George', 'last': 'Harrison', 'born': 1943, 'role': 'guitar'}]
>>> db.query(_orderby=[-db.Q.role, db.Q.last]).all()
[{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'},
{'first': 'George', 'last': 'Harrison', 'born': 1943, 'role': 'guitar'},
{'first': 'John', 'last': 'Lennon', 'born': 1940, 'role': 'guitar'},
{'first': 'Ringo', 'last': 'Starr', 'born': 1940, 'role': 'drums'},
{'first': 'Paul', 'last': 'McCartney', 'born': 1942, 'role': 'bass'}]
You can sort by subkeys and subelements as well with a similar syntax to queries. See query() for more details.
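In SQL terms, _orderby corresponds to an ORDER BY on extracted paths, with "-" mapping to DESC. A minimal illustrative sketch (schema assumed, not JSONLiteDB's generated SQL):

```python
# Sorting happens inside SQLite: ORDER BY json_extract(...) [DESC].
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (data TEXT)")
docs = [{"role": "guitar"}, {"role": "producer"}]
con.executemany(
    "INSERT INTO items (data) VALUES (?)", [(json.dumps(d),) for d in docs]
)

roles = [
    r[0]
    for r in con.execute(
        "SELECT json_extract(data, '$.role') FROM items "
        "ORDER BY json_extract(data, '$.role') DESC"
    )
]
```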
Queries can be greatly accelerated with an index. Note that SQLite is extremely picky about how you write the index! For the most part, if you use the same method to query as you did to create the index, you will be fine. (This is more of an issue with nested queries and advanced query formulations.)
The name of the index is immaterial; it is generated from the fields, so yours may look different.
>>> db.create_index("last")
>>> db.indexes
{'ix_items_1bd45eb5': ['$."last"']}
>>> # of course, with so few items, this makes little difference
>>> list(db.query(last="Martin"))
[{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]
And an index can also be used to enforce uniqueness amongst one or more fields
>>> db.create_index("first", "last", unique=True)
>>> db.indexes
{'ix_items_1bd45eb5': ['$."last"'],
 'ix_items_250e4243_UNIQUE': ['$."first"', '$."last"']}
>>> # db.insert({'first': 'George', 'last': 'Martin', 'type':'FAKE ENTRY'})
>>> # Causes: IntegrityError: UNIQUE constraint failed: index 'ix_items_250e4243_UNIQUE'
See Advanced Usage for more examples including nested queries.
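Uniqueness enforcement like this can be sketched with a raw SQLite unique index over JSON expressions (illustrative; JSONLiteDB generates its own index names and quoted paths):

```python
# A UNIQUE index on json_extract expressions makes SQLite itself reject
# duplicate (first, last) pairs with an IntegrityError.
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (data TEXT)")
con.execute(
    "CREATE UNIQUE INDEX ix_first_last ON items "
    "(json_extract(data, '$.first'), json_extract(data, '$.last'))"
)
con.execute(
    "INSERT INTO items (data) VALUES (?)",
    (json.dumps({"first": "George", "last": "Martin"}),),
)
try:
    con.execute(
        "INSERT INTO items (data) VALUES (?)",
        (json.dumps({"first": "George", "last": "Martin", "fake": True}),),
    )
    duplicate_ok = True
except sqlite3.IntegrityError:
    duplicate_ok = False
```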
Queries are detailed in the db.query() method. All queries and paths can take four basic forms, but query objects are, by far, the most versatile.
| Type | Path (e.g. create_index()) | Query (e.g. query()) | Comments |
|---|---|---|---|
| Plain string | 'itemkey' | {'itemkey':'query_val'} | Limited to a single item |
| JSON Path string | '$.itemkey'<br>'$.itemkey.subkey'<br>'$.itemkey[4]'<br>'$.itemkey.subkey[4]' | {'$.itemkey':'query_val'}<br>{'$.itemkey.subkey':'query_val'}<br>{'$.itemkey[4]':'query_val'}<br>{'$.itemkey.subkey[4]':'query_val'} | Be careful about indices on JSON path strings. See more below |
| Tuples (or lists) | ('itemkey',)<br>('itemkey','subkey')<br>('itemkey',4)<br>('itemkey','subkey',4) | {('itemkey',):'query_val'}<br>{('itemkey','subkey'):'query_val'}<br>{('itemkey',4):'query_val'}<br>{('itemkey','subkey',4):'query_val'} | |
| Query Objects (let db be your database) | db.Q.itemkey<br>db.Q.itemkey.subkey<br>db.Q.itemkey[4]<br>db.Q.itemkey.subkey[4] | db.Q.itemkey == 'query_val'<br>db.Q.itemkey.subkey == 'query_val'<br>db.Q.itemkey[4] == 'query_val'<br>db.Q.itemkey.subkey[4] == 'query_val' | See below. Can also do many more types of comparisons beyond equality |
Note that JSON Path strings presented here are unquoted, but all other methods will quote them. For example, '$.itemkey.subkey' and ('itemkey','subkey') are functionally identical; the latter becomes '$."itemkey"."subkey"'. While they are functionally the same, an index created on one will not be used on the other.
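The "functionally identical" part is easy to verify with raw JSON1 (illustrative schema assumed); the index caveat follows from SQLite matching expression indexes by exact SQL text:

```python
# Both path spellings extract the same value. However, SQLite matches
# expression indexes by the literal SQL text, so an index built with one
# spelling is not used for a query written with the other.
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (data TEXT)")
con.execute(
    "INSERT INTO items (data) VALUES (?)",
    (json.dumps({"itemkey": {"subkey": 7}}),),
)

unquoted = con.execute(
    "SELECT json_extract(data, '$.itemkey.subkey') FROM items"
).fetchone()[0]
quoted = con.execute(
    "SELECT json_extract(data, '$.\"itemkey\".\"subkey\"') FROM items"
).fetchone()[0]
```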
Query Objects provide a great deal more flexibility than other forms.
They can handle normal equality == but can handle inequalities, including !=, <, <=, >, >=.
db.Q.item < 10
db.Q.other_item > 'bla'
They can also handle logic. Note that you must be very careful about parentheses.
(db.Q.item < 10) & (db.Q.other_item > 'bla') # AND
(db.Q.item < 10) | (db.Q.other_item > 'bla') # OR
Note that while something like 10 <= var <= 20 is valid Python, a query must be done like:
(10 <= db.Q.var) & (db.Q.var <= 20)
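A toy sketch of why the chained form cannot work: comparisons on a query object build SQL fragments via operator overloading, and Python's chained `10 <= var <= 20` inserts a truthiness check (`and`) between the two halves, which would discard the first fragment. This is not JSONLiteDB's implementation; the class and attribute names here are invented.

```python
# Toy model of comparison-to-SQL building via operator overloading.
# Python evaluates `10 <= q` by calling q.__ge__(10) (reflected operator),
# which is why `(10 <= db.Q.var) & (db.Q.var <= 20)` works.
class ToyQuery:
    def __init__(self, sql):
        self.sql = sql

    def __le__(self, other):
        return ToyQuery(f"({self.sql} <= {other!r})")

    def __ge__(self, other):
        return ToyQuery(f"({self.sql} >= {other!r})")

    def __and__(self, other):
        return ToyQuery(f"({self.sql} AND {other.sql})")


var = ToyQuery("json_extract(data, '$.var')")
expr = (var >= 10) & (var <= 20)
# `10 <= var <= 20` would instead apply Python's `and`, which calls bool()
# on the first ToyQuery and loses the SQL fragment.
```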
And, as noted in "Basic Usage," they can do SQL LIKE comparisons (db.Q.key % "%Val%"), GLOB comparisons (db.Q.key * "file*.txt"), and REGEXP comparisons (db.Q.key @ "\S+?\.[A-Z]").
REGEXP (the @ query operator) uses Python's re engine and may be unsafe for untrusted patterns.
To disable it, set the environment variable before importing/creating a database:
export JSONLITEDB_DISABLE_REGEX=1
You can also disable it in code (for new connections) after import:
import jsonlitedb
jsonlitedb.DISABLE_REGEX = True
You can mix and match index or attribute for keys. The following are all identical:
- db.Q.itemkey.subkey
- db.Q['itemkey'].subkey
- db.Q['itemkey','subkey']
- db.Q['itemkey']['subkey']
- ...
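As an aside on why disabling REGEXP is effective: SQLite has no built-in regexp() function, so if none is registered on the connection, the @ operator's underlying SQL simply errors instead of matching anything. A small stdlib demonstration:

```python
# Without a registered regexp() function, SQLite rejects the REGEXP
# operator with "no such function: REGEXP".
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (data TEXT)")
try:
    con.execute("SELECT * FROM items WHERE data REGEXP 'x'")
    regexp_available = True
except sqlite3.OperationalError:
    regexp_available = False
```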
JSONLiteDB also installs a tool called "jsonlitedb" that makes it easy to read JSONL and JSON files into a database. This is useful for converting existing databases or appending data. The same workflow is available in the API via db.import_jsonl(...) and db.export_jsonl(...).
For CLI usage only, you can set JSONLITEDB_CLI_TABLE to change the default table name.
Passing --table on a command overrides the environment variable.
$ jsonlitedb insert mydb.db newfile.jsonl
$ cat newdata.jsonl | jsonlitedb insert mydb.db
It can also dump a database to JSONL.
$ jsonlitedb dump mydb.db # stdout
$ jsonlitedb dump mydb.db --output db.jsonl
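The import workflow amounts to reading one JSON document per line and inserting each in one transaction. A hedged stdlib sketch (the real CLI and db.import_jsonl() handle more cases; the table name and schema here are illustrative):

```python
# Minimal JSONL import: each non-blank line is one JSON document.
import io
import json
import sqlite3

jsonl = io.StringIO('{"first": "John"}\n{"first": "Paul"}\n')

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (data TEXT)")
with con:  # one transaction for the whole file
    for line in jsonl:
        line = line.strip()
        if line:
            con.execute(
                "INSERT INTO items (data) VALUES (?)",
                (json.dumps(json.loads(line)),),
            )
n = con.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```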
- Dictionary keys must be strings without a dot, double quote, or square bracket, and may not start with _, +, or -. Some of these may work but could have unexpected and untested consequences.
- Functionally identical queries may not match an index because SQLite is extremely strict about the pattern. Mitigate this by using the same query mechanics for index creation and for the query itself.
- No distinction is made between an entry having a key with a value of None and not having the key at all. However, you can use query_by_path_exists() to query items that have a certain path. There is still no way to mix this existence test with other queries, other than comparing against None.
- While non-dict items such as strings, lists, and tuples are accepted as single items, queries on them do not work reliably.
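The None-vs-missing caveat comes straight from JSON1: json_extract() returns SQL NULL both for a JSON null value and for a missing key, so equality queries cannot tell them apart. A small illustration (schema assumed):

```python
# json_extract returns SQL NULL for both a JSON null and an absent key.
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (data TEXT)")
con.execute("INSERT INTO items (data) VALUES (?)", (json.dumps({"a": None}),))
con.execute("INSERT INTO items (data) VALUES (?)", (json.dumps({"b": 1}),))

vals = [
    r[0] for r in con.execute("SELECT json_extract(data, '$.a') FROM items")
]
```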
Yes and no. The idea is to need no schema at all while still being a notable improvement over a plain JSON file. Plus, if you index the field of interest, you get super-fast queries all the same!
Yes! The idea is simplicity and compatibility. SQLite basically runs everywhere and is widely accepted. It is only a slight step down from JSON Lines in being future proof.
There really aren't any other single-file, embedded [JSON] object storage databases with anywhere near the ubiquity or pedigree of SQLite.
When using duplicates='replace', it essentially deletes and inserts the item rather than replacing it for real (and keeping the rowid internally). Is that intended?
Mostly yes. The alternative was considered but this behavior more closely matches the mental model of the tool.
JSONLiteDB provides a lot of functionality between queries and sorting, but if you need more, just run SQL on the database directly yourself!
Similarly, the minimal CLI can help in some cases but JSONLiteDB is really meant to be accessed as a library.
Yes and no. You can use your own methods to encode the object you insert but since it uses SQLite's JSON1, it must be JSON that gets stored. You could probably hack something else into it but it is not recommended.
We do not reject the use of AI-, LLM-, or agent-driven development, including “vibe coding.” However, we believe it is important to provide appropriate disclosure, as outlined below. We also prefer human-verified code and place high value on the trust users place in this project.
Up until version 0.1.10, there was no use of coding agents and only minimal AI via a chat interface. After that, OpenAI Codex was used to make small changes or perform grunt work. It also helped identify and fix (minor) security gaps.
These changes were all done minimally and with close human review. There was no black-box "vibe-coding" and this largely remains a human-developed tool.
Beginning in 0.3.0, something closer to "vibe-coding" was used to expand the CLI and refactor into files. It was still reviewed and the majority of the critical module remains primarily human-written.