Skip to content

Conversation

@riversand963
Copy link
Contributor

@riversand963 riversand963 commented Nov 15, 2022

RocksDB has two public APIs:

  • DB::LockWAL()/DB::UnlockWAL() which acquire/release DBImpl::log_write_mutex,
  • DB::GetSortedWalFiles() which internally calls DBImpl::FindObsoleteFiles() which acquires/releases DBImpl::log_write_mutex.

According to the comment on DBImpl::log_write_mutex_: https://github.com/facebook/rocksdb/blob/7.8.fb/db/db_impl/db_impl.h#L2287:L2288

Note: to avoid dealock, if needed to acquire both log_write_mutex_ and mutex_, the order should be first mutex_ and then log_write_mutex_.

This puts limitations on how applications can use the LockWAL() API. After LockWAL() returns ok, then application
should not perform any operation that acquires mutex_. Currently, the use case of LockWAL() is MyRocks implementing
the MySQL storage engine handlerton lock_hton_log interface. The operation that MyRocks performs after LockWAL()
is GetSortedWalFiless() which not only acquires mutex_, but also log_write_mutex_.

There are two issues:

  1. Applications using these two APIs may hang if one thread calls GetSortedWalFiles() after
    calling LockWAL() because log_write_mutex is not recursive.
  2. Two threads may dead lock due to lock order inversion.

Fix for 1

To fix 1, we can figure out whether the calling thread of FindObsoleteFiles() is holding log_write_mutex.
Therefore, we need to track the thread currently holding a mutex. The tracking logic is put in InstrumentedMutex.

Note that std::thread::id can be reused in the same process if the prior thread finishes. We make sure a thread
holding the mutex will reset owner to a default-constructed std::thread::id before calling Unlock(). Also note
that CondVar::Wait() releases the mutex without resetting owner. But keep in mind that when a thread is not sleeping,
and the thread is holding the mutex, then owner is set to the result of calling std::this_thread::get_id().

Fix for 2

To fix 2, we can update LockWAL() API interface and implementation so that

  • LockWAL() returns information about WALs in the MANIFEST for future verification with actual WAL files
  • LockWAL() acquires mutex before log_write_mutex, moving part of the work from GetSortedWalFiles() to LockWAL()

Note that application should make sure UnLockWAL() is called if and only if LockWAL() returns success.

We also need to add a new API DB::GetSortedWalFilesWithFileDeletionDisabled() which assumes that file deletions
has been disabled and no need to acquire any mutex_.

Test plan:
make check

@riversand963 riversand963 changed the title Conditionally tracking ID of thread holding mutex Conditionally track ID of thread holding mutex Nov 15, 2022
@riversand963
Copy link
Contributor Author

cc @hermanlee

@riversand963
Copy link
Contributor Author

riversand963 commented Nov 15, 2022

Need to address the lock-order-inversion

UPDATE: done

@riversand963
Copy link
Contributor Author

According to comment: https://github.com/facebook/rocksdb/blob/main/db/db_impl/db_impl.h#L2290:L2291, application should always lock DBImpl::mutex_ before DBImpl::log_write_mutex_. Therefore, current DB::LockWAL() implementation is also questionable.

@facebook-github-bot
Copy link
Contributor

@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@riversand963 riversand963 marked this pull request as ready for review November 17, 2022 21:38
@riversand963 riversand963 changed the title Conditionally track ID of thread holding mutex Revise LockWAL API and WAL collection Nov 17, 2022
@facebook-github-bot
Copy link
Contributor

@riversand963 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@riversand963 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@riversand963 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@riversand963 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@riversand963 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@riversand963 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@riversand963 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

}
}
}
log_write_mutex_.Lock();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look like it'd block writes like the original PR (#5146) claimed. Does it? I think explicitly blocking writes (like a write stop) may work better and be simpler.

I also fail to see how the backup worked before without calling DisableFileDeletions(). Did it work?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also fail to see how the backup worked before without calling DisableFileDeletions(). Did it work?

MyRocks backup tools do disable file deletions first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The #5146 PR claimed that: "Walk through the storage engines and lock writes on each engine. For
InnoDB, redo log is locked. For MyRocks, WAL should be locked."
My original impression is that WAL is locked, thus any writes with WAL will be blocked after this call.

Copy link
Contributor Author

@riversand963 riversand963 Nov 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Furthermore, I thought introducing write stop is undesirable for users, but maybe that's over-optimizing for this use case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original impression is that WAL is locked, thus any writes with WAL will be blocked after this call.

OK, I didn't have that impression by looking at how log_write_mutex_ is used. I will test it then since my concern isn't shared.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking another look at https://github.com/facebook/rocksdb/blob/7.9.fb/db/db_impl/db_impl_write.cc#L1213. In fact, all threads calling PreprocessWrite() may be blocked.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It did not work originally but #7516 fixed it, probably without noticing. I still think we should re-think the approach as it feels unstable and unintuitive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also hesitant about exposing log_write_mutex_ to application.
Maybe similar to what innodb does, we don't need keep the log_write_mutex locked as long as we disable file deletion.

const bool track_owner_ = false;

#if defined(OS_WIN) && !defined(_POSIX_THREADS)
std::atomic<port::ThreadId> owner_{kDummyOwner};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also need to account for cond var wait, etc.

@riversand963
Copy link
Contributor Author

Pursue #11020

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants