Skip to content

Commit 526e737

Browse files
Clone: fix race condition that may skip some SST files (#1274)
Summary: A donor thread in the main copy loop took a file to send, released the donor state mutex, and then opened the file. If an ENOENT was received at this point, it was assumed that this was a stale SST file from an older checkpoint that was rolled since. Because the donor state mutex was released between taking of the file and opening it, the following race was possible: 1) Thread 1 takes the file 2) Thread 2 decides to roll the checkpoint, the old checkpoint is deleted 3) Thread 1 tries to open the file, gets ENOENT 4) Thread 2 creates the new checkpoint, the file re-appears, but it's too late. Rolling the checkpoint in a donor state mutex critical section is a possible fix, but such section would do a lot of I/O, serializing the parallel threads. Instead, fix by taking the file and opening it in the same critical section. Pull Request resolved: #1274 Reviewed By: sunshine-Chun Differential Revision: D43629546 Pulled By: hermanlee fbshipit-source-id: 6e7f315
1 parent 1854db1 commit 526e737

File tree

1 file changed

+9
-8
lines changed

1 file changed

+9
-8
lines changed

storage/rocksdb/clone/donor.cc

Lines changed: 9 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -525,7 +525,7 @@ class [[nodiscard]] donor final : public myrocks::clone::session {
525525
}
526526

527527
[[nodiscard]] std::string path(const std::string &fn) const {
528-
assert(get_state() != donor_state::INITIAL);
528+
assert(m_state != donor_state::INITIAL);
529529

530530
return myrocks::has_file_extension(fn, ".log")
531531
? myrocks::rdb_concat_paths(myrocks::get_wal_dir(), fn)
@@ -780,11 +780,6 @@ int donor::copy(const THD *thd, uint task_id, Ha_clone_cbk &cbk) {
780780
m_state == donor_state::FINAL_CHECKPOINT ||
781781
m_state == donor_state::FINAL_CHECKPOINT_WITH_LOGS);
782782

783-
mysql_mutex_unlock(&m_donor_mutex);
784-
785-
auto err = handle_any_error(thd);
786-
if (err != 0) return err;
787-
788783
const auto &name = metadata.get_name();
789784
const auto donor_file_path = path(name);
790785
auto open_flags = O_RDONLY;
@@ -804,18 +799,24 @@ int donor::copy(const THD *thd, uint task_id, Ha_clone_cbk &cbk) {
804799
"Not found, assuming old checkpoint: %s",
805800
donor_file_path.c_str());
806801
assert(myrocks::has_file_extension(donor_file_path, ".sst"));
807-
mysql_mutex_lock(&m_donor_mutex);
808802
const auto erased_count [[maybe_unused]] =
809803
m_in_progress_files.erase(metadata.get_id());
810-
mysql_mutex_unlock(&m_donor_mutex);
811804
assert(erased_count == 1);
805+
mysql_mutex_unlock(&m_donor_mutex);
812806
continue;
813807
}
808+
mysql_mutex_unlock(&m_donor_mutex);
809+
814810
MyOsError(errn, EE_FILENOTFOUND, MYF(0), donor_file_path.c_str());
815811
save_error(ER_CANT_OPEN_FILE, donor_file_path, errn);
816812
return ER_CANT_OPEN_FILE;
817813
}
818814

815+
mysql_mutex_unlock(&m_donor_mutex);
816+
817+
auto err = handle_any_error(thd);
818+
if (err != 0) return err;
819+
819820
#ifdef __APPLE__
820821
if (use_direct_io && fcntl(fd, F_NOCACHE, 1) == -1) {
821822
const auto errn = my_errno();

0 commit comments

Comments
 (0)