
Navgeet/muramasa


Muramasa

Muramasa is a Datomic interface for git, primarily for analytics and mining.

Its core API is a single function, sync!, which syncs a git repository into a Datomic database so that you can query your git history with Datalog.

Features

  • Simple API - Single sync! function to import git repositories
  • Incremental Sync - Only syncs objects that are not already in the database
  • Rich Schema - Captures commits, trees, blobs, and file relationships
  • Powerful Queries - Use Datalog to analyze git history
  • In-Memory or Persistent - Works with any Datomic database
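Because every git object is identified by its SHA, incremental sync can be thought of as a set difference between the SHAs in the repository and the SHAs already stored. The sketch below illustrates that idea in plain Clojure. It is not Muramasa's implementation: the function name `new-objects` and the `{:sha … :type …}` map shape are hypothetical, chosen only for illustration.

```clojure
;; Hypothetical sketch, not Muramasa's actual sync logic.
;; Objects are plain maps keyed by :sha; existing-shas is the set of
;; SHAs already stored in the database.
(defn new-objects
  "Returns only the objects whose :sha is not in existing-shas,
  preserving their original order."
  [objects existing-shas]
  (remove #(contains? existing-shas (:sha %)) objects))

;; Made-up sample objects.
(def repo-objects
  [{:sha "a1" :type :commit}
   {:sha "b2" :type :tree}
   {:sha "c3" :type :blob}])

;; First sync: nothing is stored yet, so all three objects are new.
(count (new-objects repo-objects #{}))                ;; => 3

;; Second sync: every SHA is already stored, so nothing is added.
(count (new-objects repo-objects #{"a1" "b2" "c3"}))  ;; => 0
```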

Prerequisites

  • Leiningen (for running the demo)
  • JDK 8 or higher
  • A git repository to analyze

Quick Start

Run the Demo

The easiest way to see Muramasa in action is to run the demo:

./demo.sh

Or using Leiningen directly:

lein run

What the demo does:

  • Creates an in-memory Datomic database
  • Syncs the muramasa repository itself (meta!)
  • Runs 5 example queries demonstrating different capabilities:
    1. Count total commits
    2. Show recent commits with messages
    3. List all unique files in the repository
    4. Search commits by message content
    5. Display object type distribution

Expected output (abridged; exact counts depend on the current state of the repository):

Sync Statistics:
  Commits synced: 16
  Total objects: 100
  Repository: /path/to/muramasa

Query 1: Total number of commits
  Total commits: 16

Query 2: Most recent 5 commits
  - implement sync! function and integration tests
  - fix serialization and add transaction preparation
  ...

Query 3: Unique files in repository
  Total unique files: 15
  Files: .gitignore, LICENSE, README.md, core.clj, ...

Query 4: Search commits containing 'add'
  Found 9 commits: ...

Query 5: Object type distribution
   node : 154
   tree : 50
   blob : 34
   commit : 16
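Query 4 reports commits whose message contains a search term. How the demo implements this is not shown here; one simple approach, assumed for illustration, is a case-insensitive substring filter applied to commit messages after they have been queried. The helper `messages-containing` is hypothetical, and the third sample message is made up. Note that a plain substring match also hits words such as "address" or "padding".

```clojure
;; Hypothetical helper, not part of muramasa.core.
(defn messages-containing
  "Case-insensitive substring filter over a collection of commit messages."
  [term messages]
  (let [needle (.toLowerCase ^String term)]
    (filter #(.contains (.toLowerCase ^String %) needle) messages)))

;; The first two messages come from the demo output; the third is made up.
(def sample-messages
  ["implement sync! function and integration tests"
   "fix serialization and add transaction preparation"
   "Add README"])

(messages-containing "add" sample-messages)
;; => ("fix serialization and add transaction preparation" "Add README")
```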

Basic Usage

(require '[muramasa.core :as m])
(require '[datomic.api :as d])

;; Create an in-memory database
(def conn (m/scratch-conn))

;; Sync a git repository
(m/sync! conn "/path/to/repo")
;; => {:commits-synced 42, :objects-synced 256, :repo-uri "/path/to/repo"}

;; Sync with verbose output
(m/sync! conn "/path/to/repo" {:verbose true})
;; Prints progress: "Collecting commits...", "Found 42 commits", etc.

;; Incremental sync - only new objects are added
(m/sync! conn "/path/to/repo")
;; => {:commits-synced 0, :objects-synced 0, :repo-uri "/path/to/repo"}
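The value returned by sync! is a plain map, so ordinary Clojure functions can summarize it. The sketch below formats a result in the style of the demo's "Sync Statistics" block and checks whether a re-sync added anything. Both `format-sync-stats` and `up-to-date?` are hypothetical helpers, not part of `muramasa.core`; the two result maps are the ones shown in the example above.

```clojure
;; Hypothetical helpers, not part of muramasa.core.
(defn format-sync-stats
  "Renders a sync! result map in the style of the demo's output."
  [{:keys [commits-synced objects-synced repo-uri]}]
  (str "Sync Statistics:\n"
       "  Commits synced: " commits-synced "\n"
       "  Total objects: " objects-synced "\n"
       "  Repository: " repo-uri))

(defn up-to-date?
  "True when a sync added no objects, i.e. the database already had them all."
  [result]
  (zero? (:objects-synced result)))

(def first-run  {:commits-synced 42, :objects-synced 256, :repo-uri "/path/to/repo"})
(def second-run {:commits-synced 0,  :objects-synced 0,   :repo-uri "/path/to/repo"})

(println (format-sync-stats first-run))
(up-to-date? second-run) ;; => true
```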

Query Examples

Basic Queries

;; Count total commits
(d/q '[:find (count ?c) .
       :where [?c :git/type :git.types/commit]]
     (d/db conn))
;; => 42

;; Get all commit messages and times
(d/q '[:find ?msg ?time
       :where [?c :git/type :git.types/commit]
              [?c :git.commit/msg ?msg]
              [?c :git.commit/time ?time]]
     (d/db conn))
;; => #{["Initial commit" #inst "2024-01-01T10:00:00"] ...}

;; Find all unique filenames
(d/q '[:find ?name
       :where [?f :file/name ?name]]
     (d/db conn))
;; => #{["README.md"] ["core.clj"] ...}
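A relation query such as the `?msg ?time` example above returns an unordered set of tuples, so listing the most recent commits (as the demo's Query 2 does) needs a client-side sort. The sketch below sorts by the instant in descending order; the sample data and the helper name `most-recent` are made up for illustration.

```clojure
;; Made-up sample data in the shape of the ?msg ?time query result:
;; a set of [message instant] tuples.
(def msg-time-results
  #{["Initial commit"  #inst "2024-01-01T10:00:00.000-00:00"]
    ["add schema"      #inst "2024-01-03T09:30:00.000-00:00"]
    ["implement sync!" #inst "2024-01-05T18:45:00.000-00:00"]})

(defn most-recent
  "Returns the messages of the n newest commits, newest first.
  java.util.Date is Comparable, so compare works on the instants."
  [n results]
  (->> results
       (sort-by second #(compare %2 %1)) ; descending by time
       (take n)
       (map first)))

(most-recent 2 msg-time-results)
;; => ("implement sync!" "add schema")
```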

Advanced Queries

;; Find commits in a date range
;; (half-open: start inclusive, end exclusive, so all of Dec 31 is included)
(d/q '[:find ?msg ?time
       :in $ ?start ?end
       :where [?c :git/type :git.types/commit]
              [?c :git.commit/msg ?msg]
              [?c :git.commit/time ?time]
              [(>= ?time ?start)]
              [(< ?time ?end)]]
     (d/db conn)
     #inst "2024-01-01"
     #inst "2025-01-01")

;; Fulltext search on full commit messages
;; (the fulltext clause is placed first: it is the most selective and binds ?c)
(d/q '[:find ?msg ?sha
       :where [(fulltext $ :git.commit/message "bugfix") [[?c]]]
              [?c :git/type :git.types/commit]
              [?c :git.commit/msg ?msg]
              [?c :git/sha ?sha]]
     (d/db conn))

;; Count objects by type
(d/q '[:find ?type (count ?e)
       :where [?e :git/type ?type]]
     (d/db conn))
;; => [[:git.types/commit 42] [:git.types/blob 128] ...]

;; Find a specific commit by SHA
(d/q '[:find ?msg ?time
       :in $ ?sha
       :where [?c :git/sha ?sha]
              [?c :git.commit/msg ?msg]
              [?c :git.commit/time ?time]]
     (d/db conn)
     "a1b2c3d4...")

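The demo's Query 5 output lists object types by count, largest first. The sketch below shows one way to turn `[type count]` rows, as returned by the "count objects by type" query above, into those lines. It assumes `?type` comes back as a keyword, as that query's sample output suggests; if `:git/type` is stored as a reference attribute, the values would first need resolving to idents. The helper name is hypothetical, and the counts are the demo's Query 5 figures.

```clojure
;; Rows in the shape of the "count objects by type" query result.
(def type-counts
  [[:git.types/commit 16] [:git.types/blob 34]
   [:git.types/node 154] [:git.types/tree 50]])

(defn type-distribution-lines
  "Sorts [type count] rows by count, largest first, and renders each
  as \"name : count\" using only the keyword's name part."
  [rows]
  (for [[type n] (sort-by second > rows)]
    (str (name type) " : " n)))

(doseq [line (type-distribution-lines type-counts)]
  (println line))
;; node : 154
;; tree : 50
;; blob : 34
;; commit : 16
```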
Working with the REPL

;; Start a REPL session
lein repl

;; In the REPL:
(require '[muramasa.core :as m])
(require '[datomic.api :as d])

(def conn (m/scratch-conn))
(m/sync! conn ".")

;; Explore the data interactively
(def db (d/db conn))

;; Get an arbitrary commit (the `.` find spec returns a single value)
(def commit-id (d/q '[:find ?c . :where [?c :git/type :git.types/commit]] db))

;; See all attributes of that commit
(d/touch (d/entity db commit-id))
;; => {:git/sha "abc123...", :git/type :git.types/commit, :git.commit/msg "...", ...}

Schema

Muramasa creates the following entity types in Datomic:

Entity Types

  • Commits (:git.types/commit)

    • :git/sha - Commit SHA (unique)
    • :git.commit/msg - Short commit message
    • :git.commit/message - Full commit message (fulltext indexed)
    • :git.commit/time - Commit timestamp
  • Trees (:git.types/tree)

    • :git/sha - Tree SHA (unique)
    • :git.tree/nodes - Component nodes (tree entries)
  • Blobs (:git.types/blob)

    • :git/sha - Blob SHA (unique)
    • :git.blob/uri - URI reference to blob content
  • Nodes (:git.types/node)

    • :git.node/type - Node type (:git.types/tree or :git.types/blob)
    • :git.node/filename - Reference to file entity
    • :git.node/modeOctal - File mode (e.g., "100644")
  • Files (identified by :file/name)

    • :file/name - Filename (unique, fulltext indexed)

Schema Notes

  • :git/sha is a unique-identity attribute, so transacting an object whose SHA already exists updates that entity instead of creating a duplicate (upsert semantics)
  • Commit messages support fulltext search
  • Filenames support fulltext search
  • Nodes are component entities of trees
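The notes above map onto standard Datomic schema options. The sketch below expresses them as attribute maps in the plain-map form accepted by recent Datomic versions. It is not the project's actual schema file: the value types and cardinalities marked "assumed" in the comments are guesses made for illustration.

```clojure
;; Hypothetical schema sketch reflecting the Schema Notes; the real
;; schema may differ in value types, cardinalities, or options.
(def schema-sketch
  [{:db/ident       :git/sha
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one
    :db/unique      :db.unique/identity}   ; upsert by SHA
   {:db/ident       :git.commit/message
    :db/valueType   :db.type/string        ; assumed
    :db/cardinality :db.cardinality/one
    :db/fulltext    true}                  ; enables the fulltext function
   {:db/ident       :file/name
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one
    :db/unique      :db.unique/identity    ; assumed identity, not value
    :db/fulltext    true}
   {:db/ident       :git.tree/nodes
    :db/valueType   :db.type/ref
    :db/cardinality :db.cardinality/many   ; assumed: a tree has many entries
    :db/isComponent true}])                ; nodes are owned by their tree
```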

Development

Running Tests

Run all tests:

lein test

Run specific test namespace:

lein test muramasa.core-test
lein test muramasa.query-test

Test suite:

  • 15 tests
  • 67 assertions
  • 100% passing

Known Limitations

  • Tree/parent relationships are temporarily disabled pending entity reference resolution improvements
  • Blob content is not currently persisted to disk (placeholder URIs are used instead)
  • No support for git tags yet
  • No support for tracking specific branches/refs

Roadmap

  • Re-enable tree and parent entity references
  • Implement blob persistence to disk
  • Add support for git tags
  • Add branch/ref tracking
  • Query helper functions for common operations
  • Performance optimizations for large repositories

License

Copyright © 2016 Navgeet Agarwal

Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.
