Commit 526e737
Clone: fix race condition that may skip some SST files (#1274)
Summary:
A donor thread in the main copy loop took a file to send, released the donor state mutex, and then opened the file. If the open failed with ENOENT, the file was assumed to be a stale SST file from an older checkpoint that had since been rolled.
Because the donor state mutex was released between taking the file and opening it, the following race was possible:
1) Thread 1 takes the file
2) Thread 2 decides to roll the checkpoint, the old checkpoint is deleted
3) Thread 1 tries to open the file, gets ENOENT
4) Thread 2 creates the new checkpoint, the file re-appears, but it's too late: Thread 1 has already treated it as stale and skipped it.
Rolling the checkpoint inside a donor state mutex critical section is a possible fix, but such a section would do a lot of I/O, serializing the parallel threads. Instead, fix by taking the file and opening it in the same critical section.
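The take-and-open pattern described above can be sketched roughly as follows. This is an illustration only, not the actual clone plugin code: `DonorState`, `OpenedFile`, `add_file` and `take_and_open` are hypothetical names, and `std::fopen` stands in for whatever file API the real code uses. The sketch assumes a POSIX-like filesystem, where an already-open handle stays readable even if the file is unlinked afterwards.

```cpp
#include <cassert>
#include <cstdio>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical types for illustration; not the real donor state.
struct OpenedFile {
  std::string path;
  std::FILE *handle = nullptr;  // caller owns the handle and must fclose() it
};

class DonorState {
 public:
  void add_file(std::string path) {
    assert(!path.empty());
    std::lock_guard<std::mutex> lock(m_mutex);
    m_pending.push_back(std::move(path));
  }

  // Take the next pending file AND open it inside one critical section,
  // so no other thread can act on the donor state between the two steps.
  // Returns false when there is nothing left to send.
  bool take_and_open(OpenedFile *out) {
    std::lock_guard<std::mutex> lock(m_mutex);
    while (!m_pending.empty()) {
      std::string path = std::move(m_pending.front());
      m_pending.pop_front();
      if (std::FILE *f = std::fopen(path.c_str(), "rb")) {
        out->path = std::move(path);
        out->handle = f;
        return true;
      }
      // The open failed while the mutex was held, so this sketch treats the
      // file as stale and moves on. Real code would inspect the error and
      // only skip on ENOENT.
    }
    return false;
  }

 private:
  std::mutex m_mutex;
  std::deque<std::string> m_pending;
};
```

Only the queue pop and the open happen under the mutex; reading and sending the file contents would happen after the lock is released, so the parallel copy threads are not serialized on I/O.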
Pull Request resolved: #1274
Reviewed By: sunshine-Chun
Differential Revision: D43629546
Pulled By: hermanlee
fbshipit-source-id: 6e7f3151
Parent 1854db1, commit 526e737
1 file changed, +9 -8 lines changed