Muramasa is a Datomic interface for Git, built primarily for analytics and repository mining.
It provides a single function, `sync!`, which imports a Git repository into a Datomic database so you can query its history with Datalog.

Features:

- Simple API - a single `sync!` function imports Git repositories
- Incremental sync - only objects not already in the database are synced
- Rich schema - captures commits, trees, blobs, and file relationships
- Powerful queries - use Datalog to analyze Git history
- In-memory or persistent - works with any Datomic database

Requirements:

- Leiningen (to run the demo, tests, and REPL)
- JDK 8 or higher
- A Git repository to analyze

The easiest way to see Muramasa in action is to run the demo:

```bash
./demo.sh
```

Or use Leiningen directly:

```bash
lein run
```

What the demo does:

- Creates an in-memory Datomic database
- Syncs the muramasa repository itself (meta!)
- Runs 5 example queries demonstrating different capabilities:
  - Count total commits
  - Show recent commits with messages
  - List all unique files in the repository
  - Search commits by message content
  - Display object type distribution

Expected output:

```text
Sync Statistics:
Commits synced: 16
Total objects: 100
Repository: /path/to/muramasa

Query 1: Total number of commits
Total commits: 16

Query 2: Most recent 5 commits
- implement sync! function and integration tests
- fix serialization and add transaction preparation
...

Query 3: Unique files in repository
Total unique files: 15
Files: .gitignore, LICENSE, README.md, core.clj, ...

Query 4: Search commits containing 'add'
Found 9 commits: ...

Query 5: Object type distribution
node : 154
tree : 50
blob : 34
commit : 16
```
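
Query 5's distribution corresponds to counting entities per `:git/type`, as in the "Count objects by type" query in the usage examples. Below is a minimal sketch of rendering such `[type count]` pairs as the lines above, assuming the types come back as keywords like `:git.types/commit` and are listed by descending count. `format-type-distribution` is a hypothetical helper, not part of `muramasa.core`, and the demo's actual code may differ:

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper (not part of muramasa.core).
;; Input: [type-keyword count] pairs, assumed to look like the result of
;;   (d/q '[:find ?type (count ?e) :where [?e :git/type ?type]] db)
;; Output: one "name : count" line per type, largest count first.
(defn format-type-distribution [pairs]
  (->> pairs
       (sort-by second >)                            ; descending by count
       (map (fn [[t n]] (str (name t) " : " n)))     ; :git.types/node -> "node"
       (str/join "\n")))

;; Example using the counts from the demo output above:
(println (format-type-distribution
          [[:git.types/commit 16] [:git.types/blob 34]
           [:git.types/node 154] [:git.types/tree 50]]))
;; node : 154
;; tree : 50
;; blob : 34
;; commit : 16
```

Note that the counts reconcile: trees, blobs, and commits add up to the 100 Git objects reported in the sync statistics, while nodes are tree entries rather than Git objects.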

Basic usage:

```clojure
(require '[muramasa.core :as m])
(require '[datomic.api :as d])

;; Create an in-memory database
(def conn (m/scratch-conn))

;; Sync a git repository
(m/sync! conn "/path/to/repo")
;; => {:commits-synced 42, :objects-synced 256, :repo-uri "/path/to/repo"}

;; Sync with verbose output
(m/sync! conn "/path/to/repo" {:verbose true})
;; Prints progress: "Collecting commits...", "Found 42 commits", etc.

;; Incremental sync: only new objects are added
(m/sync! conn "/path/to/repo")
;; => {:commits-synced 0, :objects-synced 0, :repo-uri "/path/to/repo"}
```

Example queries:

```clojure
;; Count total commits
(d/q '[:find (count ?c) .
       :where [?c :git/type :git.types/commit]]
     (d/db conn))
;; => 42

;; Get all commit messages and times
(d/q '[:find ?msg ?time
       :where [?c :git/type :git.types/commit]
              [?c :git.commit/msg ?msg]
              [?c :git.commit/time ?time]]
     (d/db conn))
;; => #{["Initial commit" #inst "2024-01-01T10:00:00"] ...}

;; Find all unique filenames (tree entry names, not full paths)
(d/q '[:find ?name
       :where [?f :file/name ?name]]
     (d/db conn))
;; => #{["README.md"] ["core.clj"] ...}
```

Advanced queries:

```clojure
;; Find commits in a date range.
;; Half-open range [start, end): every commit made during 2024 (UTC).
(d/q '[:find ?msg ?time
       :in $ ?start ?end
       :where [?c :git/type :git.types/commit]
              [?c :git.commit/msg ?msg]
              [?c :git.commit/time ?time]
              [(>= ?time ?start)]
              [(< ?time ?end)]]
     (d/db conn)
     #inst "2024-01-01"
     #inst "2025-01-01")

;; Fulltext search on full commit messages (:git.commit/message is fulltext indexed).
;; The fulltext clause comes first so it produces the candidate commits.
(d/q '[:find ?msg ?sha
       :where [(fulltext $ :git.commit/message "bugfix") [[?c]]]
              [?c :git/type :git.types/commit]
              [?c :git.commit/msg ?msg]
              [?c :git/sha ?sha]]
     (d/db conn))

;; Count objects by type
(d/q '[:find ?type (count ?e)
       :where [?e :git/type ?type]]
     (d/db conn))
;; => #{[:git.types/commit 42] [:git.types/blob 128] ...}

;; Find a specific commit by SHA
(d/q '[:find ?msg ?time
       :in $ ?sha
       :where [?c :git/sha ?sha]
              [?c :git.commit/msg ?msg]
              [?c :git.commit/time ?time]]
     (d/db conn)
     "a1b2c3d4...") ; replace with a full commit SHA
```

Start a REPL session:

```bash
lein repl
```

Then, in the REPL:

```clojure
(require '[muramasa.core :as m])
(require '[datomic.api :as d])

(def conn (m/scratch-conn))
(m/sync! conn ".")

;; Explore the data interactively
(def db (d/db conn))

;; Get an arbitrary commit
(def commit-id (d/q '[:find ?c . :where [?c :git/type :git.types/commit]] db))

;; See all attributes of that commit
(d/touch (d/entity db commit-id))
;; => {:git/sha "abc123...", :git/type :git.types/commit, :git.commit/msg "...", ...}
```

Muramasa creates the following entity types in Datomic:

- Commits (`:git.types/commit`)
  - `:git/sha` - Commit SHA (unique)
  - `:git.commit/msg` - Short commit message
  - `:git.commit/message` - Full commit message (fulltext indexed)
  - `:git.commit/time` - Commit timestamp
- Trees (`:git.types/tree`)
  - `:git/sha` - Tree SHA (unique)
  - `:git.tree/nodes` - Component nodes (tree entries)
- Blobs (`:git.types/blob`)
  - `:git/sha` - Blob SHA (unique)
  - `:git.blob/uri` - URI reference to blob content
- Nodes (`:git.types/node`)
  - `:git.node/type` - Node type (`:git.types/tree` or `:git.types/blob`)
  - `:git.node/filename` - Reference to file entity
  - `:git.node/modeOctal` - File mode (e.g., "100644")
- Files (`:file/name`)
  - `:file/name` - Filename (unique, fulltext indexed)
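
To make the node/file indirection concrete, here is a sketch using plain Clojure maps shaped after the attributes listed above, roughly what a tree pulled together with its component nodes might look like. The SHA, the entries, and the `filenames-in-tree` helper are illustrative assumptions, not Muramasa output or API:

```clojure
;; Hypothetical entity maps shaped after the schema above. The SHA string is
;; a placeholder, and the exact data Muramasa stores may differ.
(def example-tree
  {:git/sha        "placeholder-tree-sha"
   :git/type       :git.types/tree
   ;; Component nodes: one per tree entry.
   :git.tree/nodes [{:git/type          :git.types/node
                     :git.node/type     :git.types/blob
                     ;; The node refers to a file entity; it does not hold the name itself.
                     :git.node/filename {:file/name "README.md"}}
                    {:git/type          :git.types/node
                     :git.node/type     :git.types/tree
                     :git.node/filename {:file/name "src"}}]})

;; Hypothetical helper: sorted names of a tree's direct entries, found by
;; following each node's :git.node/filename ref to the file's :file/name.
;; In Datalog the same join would be:
;;   [?t :git.tree/nodes ?n] [?n :git.node/filename ?f] [?f :file/name ?name]
(defn filenames-in-tree [tree]
  (->> (:git.tree/nodes tree)
       (map (comp :file/name :git.node/filename))
       sort
       vec))

(filenames-in-tree example-tree)
;; => ["README.md" "src"]
```

Storing names on shared file entities means a filename that appears in many trees is represented once, and every node that uses it points to the same entity.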

Schema notes:

- All `:git/sha` attributes are unique identities, enabling upsert semantics
- Full commit messages (`:git.commit/message`) support fulltext search
- Filenames support fulltext search
- Nodes are component entities of trees
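
Because `:git/sha` is a unique identity, transacting an object whose SHA already exists upserts into the existing entity instead of creating a duplicate, and a sync can also skip known objects up front. Below is a minimal sketch of that filtering step in plain Clojure, assuming objects are maps carrying `:git/sha` and that the set of known SHAs has already been read from the database. `new-objects` is hypothetical and not Muramasa's implementation:

```clojure
;; Hypothetical helper (not part of muramasa.core): keep only the objects
;; whose :git/sha is not in the set of SHAs already in the database.
(defn new-objects [known-shas objects]
  (vec (remove #(contains? known-shas (:git/sha %)) objects)))

;; Placeholder SHAs for illustration.
(def known-shas #{"aaa111" "bbb222"})

(new-objects known-shas [{:git/sha "aaa111" :git/type :git.types/commit}
                         {:git/sha "ccc333" :git/type :git.types/blob}])
;; => [{:git/sha "ccc333", :git/type :git.types/blob}]
```

When every object in the repository is already known, the result is empty, which matches the `{:commits-synced 0, :objects-synced 0, ...}` result of a repeated sync.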

Run all tests:

```bash
lein test
```

Run a specific test namespace:

```bash
lein test muramasa.core-test
lein test muramasa.query-test
```

Test coverage:

- 15 tests
- 67 assertions
- 100% passing

Known limitations:

- Tree/parent relationships are temporarily disabled pending improvements to entity reference resolution
- Blob content is not currently persisted to disk (placeholder URIs are used instead)
- No support for git tags yet
- No support for tracking specific branches/refs

Planned work:

- Re-enable tree and parent entity references
- Implement blob persistence to disk
- Add support for git tags
- Add branch/ref tracking
- Query helper functions for common operations
- Performance optimizations for large repositories
Copyright © 2016 Navgeet Agarwal
Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.