implement backup descriptor for incremental backup#632
implement backup descriptor for incremental backup#632qiyanghe1998 wants to merge 16 commits intomasterfrom
Conversation
mradwan
left a comment
There was a problem hiding this comment.
I am still working on reviewing the unit tests. but posting these comments here to unblock Qiyang from addressing them.
| 0 == str.compare(str.size()-suffix.size(), suffix.size(), suffix); | ||
| } | ||
|
|
||
| inline int64_t remove_leading_zero(std::string& num) { |
There was a problem hiding this comment.
If the function is not expensive prefer functions that don't modify input to avoid accidential bugs.
For example I'd just return the new string and pass num as constant.
|
|
||
| // convert the string to a backup_descriptor map, if any error happens, it will return an empty map | ||
| // TODO: add sst_id_min and sst_id_max parsing | ||
| void ApplicationDBBackupManager::parseBackupDesc(const std::string& contents, |
There was a problem hiding this comment.
Instead of writing our own parser we can just write the backup descriptor in json and parse the json
This will also be very easy to read manually.
There was a problem hiding this comment.
| LOG(INFO) << "S3 Backup " << db->db_name() << " to " << ensure_ends_with_pathsep(FLAGS_s3_incre_backup_prefix) << db->db_name(); | ||
| auto ts = common::timeutil::GetCurrentTimestamp(); | ||
| auto local_path = folly::stringPrintf("%ss3_tmp/%s%d/", rocksdb_dir_.c_str(), db->db_name().c_str(), ts); | ||
| auto local_path = folly::stringPrintf("%ss3_tmp/%s%ld/", rocksdb_dir_.c_str(), db->db_name().c_str(), ts); |
There was a problem hiding this comment.
Can we add directory prefix incremental to distinguish regular backup from incremental backup
| boost::system::error_code remove_last_backup_err; | ||
| boost::system::error_code create_last_backup_err; | ||
| // if we have not removed the last backup, we do not need to download it | ||
| bool need_download = !boost::filesystem::exists(local_path_last_backup + backupDesc); |
There was a problem hiding this comment.
Good idea to
- Only download, avoid looking into local directory.
- When writing, write the backup descriptor last (Since this means when we upload backup descriptor we have completely uploaded last backup with no errors).
Generally having two sources of data means we have to deal with inconsistencies between them that's why I am suggesting only use s3.
|
|
||
| // Create the new backup descriptor file | ||
| std::string backup_desc_contents; | ||
| makeUpBackupDescString(file_to_ts, sst_id_min, sst_id_max, backup_desc_contents); |
There was a problem hiding this comment.
Why do we need to put min_id and max_id is it an optimization or some sort of check?
| std::string backup_desc_contents; | ||
| makeUpBackupDescString(file_to_ts, sst_id_min, sst_id_max, backup_desc_contents); | ||
| common::FileUtil::createFileWithContent(formatted_checkpoint_local_path, backupDesc, backup_desc_contents); | ||
| checkpoint_files.emplace_back(backupDesc); |
There was a problem hiding this comment.
There is no reason to use emplace_back here, just use push_back
There was a problem hiding this comment.
Does this guarantee that backup descriptor will only be uploaded after all files get uploaded?
|
Addressed all the comments. Could you have another look? @mradwan Thanks! |
mradwan
left a comment
There was a problem hiding this comment.
Thanks for addressing the coments. I left one comment regarding leaving another comment if you can leave it that would be great.
Thank you!
| temp_map[item.first] = item.second; | ||
| } | ||
| contents[fileToTs] = temp_map; | ||
| contents[lastBackupTs] = last_ts; |
| std::unordered_map<std::string, int64_t> file_to_ts; | ||
| const string sst_suffix = ".sst"; | ||
|
|
||
| if (last_ts >= 0) { |
There was a problem hiding this comment.
In this case if machine restarts we wouldnt have last_ts.
Can you leave a comment that we need to handle this code doesn't handle this case well?
Thanks!
|
Pushed a new version with backup starter |
Follow Qiyang He's design for incremental backup, this diff adds the backup descriptor for each backup.
Generally, it will create a folder for each backup distinguished by the timestamp. Under each folder, there will be newly added SST files (including those from compaction), backup descriptor, CURRENT, MANIFEST and OPTION. The backup descriptor contains a map from SST filename to timestamp and the range of SST files. You can refer to https://docs.google.com/document/d/1aRO1PJ8qUN-yn6iUK7cisk7qSAsKFhnoyig0_T89jTo/edit#heading=h.oxp5c6sx5wxx for more details.