Optimize new decentralized cluster architecture #366

@CGodiksen

Description

Some features and optimizations were left out of the initial PR that removed the manager, to avoid putting too much into a single PR. This issue lists tasks that can be done to further optimize and extend the new architecture.

  • Look into removing as much metadata as possible from the server's local data folder.
    • In general, we should try to remove as much as possible of the information that is duplicated between the remote data folder and the local data folder of each node.
  • When metadata cannot be removed, saving Delta Lake tables and related metadata should be combined into a single request. This includes when we create tables (and save metadata right after) and when we drop tables (and delete metadata right after).
  • Add a more aggressive retry strategy to ensure that cluster operations are forced through.
  • When we create, drop, and use tables, if we encounter an error, we should try to handle it by synchronizing with the remote data folder.
    • For example, if a node receives data for a table it does not have, it should check the remote object store for the table. Likewise, if data transfer fails because the table does not exist in the remote object store, the node should drop the table locally.
  • Handle dead nodes.
  • Add better load balancing for queries to replace the current random selection.
  • Add a very simple optimization that checks whether the DeltaTable has changed since it was last read and saves the result in the Cluster struct, instead of reading all nodes every time.
  • Add an optimization to cloud nodes so that they handle load balancing automatically among themselves, so the user is not required to always use “get_flight_info()”.
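
To make the load balancing and caching tasks above more concrete, here is a minimal sketch of how they could fit together. It is not the existing implementation: the `Cluster` type below is a hypothetical stand-in, the method names `nodes` and `next_query_node` are made up for illustration, and it assumes that a cheap integer version of the remote table (such as a Delta Lake table version) is available and that the expensive read of all nodes can be passed in as a closure. Round-robin is used only as one simple alternative to random selection.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the cluster state kept by each node.
pub struct Cluster {
    /// Table version the cached node list was read at (assumed to be an integer).
    cached_version: Option<i64>,
    /// Node addresses read at `cached_version`.
    nodes: Vec<String>,
    /// Cursor for round-robin selection of the node that runs the next query.
    next: AtomicUsize,
}

impl Cluster {
    pub fn new() -> Self {
        Self {
            cached_version: None,
            nodes: Vec::new(),
            next: AtomicUsize::new(0),
        }
    }

    /// Return the node list, calling `read_nodes` only if `latest_version`
    /// differs from the version the cached list was read at.
    pub fn nodes<F>(&mut self, latest_version: i64, read_nodes: F) -> &[String]
    where
        F: FnOnce() -> Vec<String>,
    {
        if self.cached_version != Some(latest_version) {
            self.nodes = read_nodes();
            self.cached_version = Some(latest_version);
        }
        &self.nodes
    }

    /// Pick the next node in round-robin order from the cached list,
    /// or `None` if no nodes have been read yet.
    pub fn next_query_node(&self) -> Option<&str> {
        if self.nodes.is_empty() {
            return None;
        }
        let index = self.next.fetch_add(1, Ordering::Relaxed) % self.nodes.len();
        Some(self.nodes[index].as_str())
    }
}
```

With this shape, a caller would first obtain the latest table version, pass it to `nodes` together with a closure that performs the full read, and then call `next_query_node` for each incoming query. Whether the version check is actually cheaper than re-reading the nodes depends on how the remote table is accessed, so that should be measured before adopting this approach.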

When a task in the list is started, we should consider moving it to a separate issue.
