Skip to content

implement backup descriptor for incremental backup#632

Open
qiyanghe1998 wants to merge 16 commits intomasterfrom
qh/backup-descriptor
Open

implement backup descriptor for incremental backup#632
qiyanghe1998 wants to merge 16 commits intomasterfrom
qh/backup-descriptor

Conversation

@qiyanghe1998
Copy link
Contributor

Follow Qiyang He's design for incremental backup, this diff adds the backup descriptor for each backup.

Generally, it will create a folder for each backup distinguished by the timestamp. Under each folder, there will be newly added SST files (including those from compaction), backup descriptor, CURRENT, MANIFEST and OPTION. The backup descriptor contains a map from SST filename to timestamp and the range of SST files. You can refer to https://docs.google.com/document/d/1aRO1PJ8qUN-yn6iUK7cisk7qSAsKFhnoyig0_T89jTo/edit#heading=h.oxp5c6sx5wxx for more details.

@qiyanghe1998 qiyanghe1998 marked this pull request as ready for review August 5, 2022 21:59
Copy link
Contributor

@mradwan mradwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still working on reviewing the unit tests. but posting these comments here to unblock Qiyang from addressing them.

0 == str.compare(str.size()-suffix.size(), suffix.size(), suffix);
}

inline int64_t remove_leading_zero(std::string& num) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the function is not expensive prefer functions that don't modify input to avoid accidential bugs.

For example I'd just return the new string and pass num as constant.


// convert the string to a backup_descriptor map, if any error happens, it will return an empty map
// TODO: add sst_id_min and sst_id_max parsing
void ApplicationDBBackupManager::parseBackupDesc(const std::string& contents,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of writing our own parser we can just write the backup descriptor in json and parse the json

/// Copyright 2016 Pinterest Inc.

This will also be very easy to read manually.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOG(INFO) << "S3 Backup " << db->db_name() << " to " << ensure_ends_with_pathsep(FLAGS_s3_incre_backup_prefix) << db->db_name();
auto ts = common::timeutil::GetCurrentTimestamp();
auto local_path = folly::stringPrintf("%ss3_tmp/%s%d/", rocksdb_dir_.c_str(), db->db_name().c_str(), ts);
auto local_path = folly::stringPrintf("%ss3_tmp/%s%ld/", rocksdb_dir_.c_str(), db->db_name().c_str(), ts);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add directory prefix incremental to distinguish regular backup from incremental backup

boost::system::error_code remove_last_backup_err;
boost::system::error_code create_last_backup_err;
// if we have not removed the last backup, we do not need to download it
bool need_download = !boost::filesystem::exists(local_path_last_backup + backupDesc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea to

  1. Only download, avoid looking into local directory.
  2. When writing, write the backup descriptor last (Since this means when we upload backup descriptor we have completely uploaded last backup with no errors).

Generally having two sources of data means we have to deal with inconsistencies between them that's why I am suggesting only use s3.


// Create the new backup descriptor file
std::string backup_desc_contents;
makeUpBackupDescString(file_to_ts, sst_id_min, sst_id_max, backup_desc_contents);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to put min_id and max_id is it an optimization or some sort of check?

std::string backup_desc_contents;
makeUpBackupDescString(file_to_ts, sst_id_min, sst_id_max, backup_desc_contents);
common::FileUtil::createFileWithContent(formatted_checkpoint_local_path, backupDesc, backup_desc_contents);
checkpoint_files.emplace_back(backupDesc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no reason to use emplace_back here, just use push_back

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this guarantee that backup descriptor will only be uploaded after all files get uploaded?

@qiyanghe1998
Copy link
Contributor Author

Addressed all the comments. Could you have another look? @mradwan Thanks!

@qiyanghe1998 qiyanghe1998 requested a review from mradwan August 9, 2022 22:47
Copy link
Contributor

@mradwan mradwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the coments. I left one comment regarding leaving another comment if you can leave it that would be great.

Thank you!

temp_map[item.first] = item.second;
}
contents[fileToTs] = temp_map;
contents[lastBackupTs] = last_ts;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

std::unordered_map<std::string, int64_t> file_to_ts;
const string sst_suffix = ".sst";

if (last_ts >= 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case if machine restarts we wouldnt have last_ts.

Can you leave a comment that we need to handle this code doesn't handle this case well?

Thanks!

@qiyanghe1998 qiyanghe1998 requested a review from mradwan August 12, 2022 00:36
@qiyanghe1998
Copy link
Contributor Author

Pushed a new version with backup starter

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants