Add a git based sync option #441#722
Add a git based sync option #441#722carmiac wants to merge 18 commits intoGothenburgBitFactory:mainfrom
Conversation
and then to bed
djmitche
left a comment
There was a problem hiding this comment.
This is so cool! I would like to take the time to read and experiment with it, but a few initial reactions to the comments and the PR description:
- Optional configuration for the path to the
gitbinary will probably be useful for someone, and should be easy to add - It looks like
remoteaccepts the string "None"? Let's make that anOption<String>instead, if that's the case, or make the empty string the special case. - For cleanup, two observations:
- Git keeps all of the data anyway, so "deleting" a version is really just removing the name from the directory. So, I don't think deleting old versions has much impact.
- When a replica syncs, it needs all versions since the last version it saw. Otherwise, it has to restore from the snapshot and lose any local data since it last sync'd.
- So, it's beneficial to keep versions around for as long as is practical. The cloud servers keep a half-year's worth of versions (
src/server/cloud/server.rs,MAX_VERSION_AGE_SECS), and I think that's reasonable here, too.
I'd be interested to know what @ryneeverett thinks, too!
|
Thanks for the review! I'll take a look at those. My reason behind the git rm was to make the globs when adding new versions faster, but I didn't realize how that interacted with restoration. |
|
Hm, good point, but local globs are probably pretty fast even for 100's of dentries. |
djmitche
left a comment
There was a problem hiding this comment.
A bunch of comments here, but all toward improving the implementation rather than show-stopping issues. I think most of what I've suggested is relatively straightforward, but if necessary some of it can be done in followup PRs.
One general observation is, despite Server being an async trait, this invokes Git synchronously. I think that's fine for the expected use-cases for this sync model, and it's something that can be improved later if desired.
tl;dr: This looks great, and I look forward to merging it after some minor revisions!
| # Suppport for sync to another SQLite database on the same machine | ||
| server-local = ["dep:rusqlite"] | ||
| # Support for sync via Git | ||
| server-git = ["git-sync"] |
There was a problem hiding this comment.
Why the two levels of features here? I think one would be sufficient. The cloud syncs have cloud because it enables some common functionality, but there's no such thing for git. So,
server-git = ["dep:serde_with", "dep:glob", "encryption"]
and update the cfg(feature..) in the code.
| //! | ||
| //! - I haven't done any performance testing, but it seems reasonably quick for manual use. | ||
| //! - Currently is uses the same salt for all files. This isn't great security practice, | ||
| //! but does seem to be what the other servers are doing. |
There was a problem hiding this comment.
This is a good point! In defense of the idea, the salt is used in the key derivation, and we use the same key for every file, so in that sense it's only used once. If there's further concern, let's open an issue about it -- I'm sure others would like to chime in too!
| //! | ||
| //! Notes for Reviewers | ||
| //! | ||
| //! - I haven't done any performance testing, but it seems reasonably quick for manual use. |
There was a problem hiding this comment.
This seems fine. We have not focused on performance testing of the sync operations since most of the time is network-related.
| //! create a 'task' branch and let TaskChampion manage that branch. | ||
| //! - This does support both defining a remote and having `local_only` mode set at the same | ||
| //! time. The idea is that maybe the remote isn't ready yet, or eithe rtemporarily or | ||
| //! permanantly down. Either way, you can use this in local mode in the mean time. |
There was a problem hiding this comment.
What are the risks here? In general, the more user-facing bits of the doc here might be better as docstrings on the configuration enum.
| //! | ||
| //! - Since this shells out to git, it assumes that you havea reasonably functional git | ||
| //! setup. I.e. 'git init', 'git add', 'git commit', etc shoud just work. | ||
| //! - If you are using a remote, 'git push' and 'git pull' shoud work. |
There was a problem hiding this comment.
I suspect there's some room for more robustness here, such as disabling prompts. Maybe we can address that as we find issues, but it might be worth thinking about ahead of time.
I see that nothing needs to parse the output of a git command, so that simplifies things!
| } | ||
|
|
||
| /// Run a git command in a given directory, returning an error if it exits non-zero. | ||
| fn git_cmd(dir: &Path, args: &[&str]) -> Result<()> { |
There was a problem hiding this comment.
This ends up putting a lot of git's output in the cargo test output and probably in the taskwarrior output as well. What do you think of amending this function so that it only shows the output on an unexpected error, or logs it all to log::debug or something similar?
| fs::create_dir_all(local_path)?; | ||
|
|
||
| // Check if path is already a git repo. | ||
| let is_repo = Command::new("git") |
There was a problem hiding this comment.
I think this doesn't use git_cmd because a nonzero exit status is expected, and similar for git checkout below. Maybe git_cmd could be extended to support that situation?
| git_cmd( | ||
| local_path, | ||
| &["clean", "-f", "--", "v-*", "snapshot", "meta"], | ||
| )?; |
There was a problem hiding this comment.
This happens at least twice and could be a helper function!
|
|
||
| /// Fetch and fast-forward to the remote branch. No-op in local-only mode. | ||
| /// If the remote branch does not yet exist (e.g. fresh bare repo), this is also a no-op. | ||
| fn pull(&self) -> Result<()> { |
There was a problem hiding this comment.
This isn't really a pull, since it does a hard reset. Maybe fn reset_to_remote?
| parent_version_id, | ||
| history_segment, | ||
| }; | ||
| let version_path = self.add_version_by_parent_version_id(&version)?; |
There was a problem hiding this comment.
For robustness, should this check that the working copy is clean before starting modifications? If a previous invocation of add_version crashed right here, we'd end up with two different version files being committed.
This add a git backed sync server to TaskChampion.
Repo Layout
v-{parent_uuid}-{child_uuid}, containing encrypted [HistorySegment] bytes.snapshot, containing a JSON wrapper around an encrypted full-state blob.meta) holds the latest version UUID and the encryption salt as JSON.Writes
After each write (
add_version,add_snapshot) the server stages the changed files, creates a commit, and pushes to the remote. If the push is rejected , the commit is rolled back and the caller receives an [AddVersionResult::ExpectedParentVersion] or an [Error] so it can retry.After a snapshot is stored, [
GitSyncServer::cleanup] automatically removes all version files whose history is now captured by the snapshot, keeping the repository compact.Options
There are several configuration options.
branchsets the branch to use for TaskChampion. This would let someone keep tasks alongside a project without dealing with history issues.remotesets the remote. Same format the git uses.local_onlymakespushandpullno-ops. This could be used if the remote isn't ready yet, or is unavailable for some other reason.General Notes for Reviewers