[Article] Performance Issues, Proposed Solutions, and Insights #2

@aaron-iz

The key

The key to optimization is minimizing I/O, especially network I/O. While client-side processing complexity does contribute to performance issues, it remains negligible compared to the impact of I/O operations, even if the client operations are highly inefficient.

The solution? Better data organization: first through compression, and second through more efficient structuring.

Maximizing data density per message.

Currently, data is stored as raw text, with each entry labeled by a unique <NUM>~ prefix that serves as a user key. This makes poor use of the API's resources: each message can hold only 4,000 characters, which sharply limits how much data fits in one message and leads to various performance issues.

There is some smart use of the API's features, for instance using message replies as pointers to locate entries that are split across multiple messages (due to the character limit mentioned above). But this is not enough, and its effect is negligible.

The primary issue is that storing or retrieving data requires many messages, forcing the client to send multiple requests. This hurts performance and risks exceeding rate limits because of the additional API calls. Minimizing I/O operations as much as possible is the main goal here.
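To make the cost concrete, here is a small back-of-the-envelope sketch. It assumes one message costs at least one request and ignores the overhead of the key prefix; `messages_needed` is a hypothetical helper, not something from the repository.

```python
import math

MESSAGE_LIMIT = 4000  # characters per message, as stated above


def messages_needed(entry_chars: int, limit: int = MESSAGE_LIMIT) -> int:
    """Lower bound on the messages (and so requests) one entry occupies.

    Assumption: the key prefix and any per-message overhead are ignored.
    """
    if entry_chars <= 0:
        return 0
    return math.ceil(entry_chars / limit)
```

Under these assumptions a 10,000-character entry needs 3 messages and a 50,000-character entry needs 13, so every read or write of a large entry multiplies the number of API calls.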

  1. One solution is to use images/files, since they can hold more data per message, and the content part of the message could serve as metadata storage. Sadly, Guilded hasn't exposed this feature in the REST API yet, although it is available in the UI client. Read more here.

  2. Another approach is to leverage the encoding characteristics of the message content field, as documented here. Data could be serialized into bytes and then encoded as a string using the message's content encoding. I assume Guilded uses UTF-8, although I couldn't find a definitive answer on this. This is a clever workaround to increase data capacity per message, but I believe the improvement in efficiency would be minimal. Still, we shouldn't rule this option out, as any optimization helps.
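A minimal sketch of the second idea, combined with compression, could look like this. It assumes the limit is counted in characters and that every character of Python's Base85 alphabet survives a round trip through the content field unchanged, which I have not verified against Guilded. `pack` and `unpack` are hypothetical names.

```python
import base64
import zlib

MESSAGE_LIMIT = 4000  # characters per message, as stated above


def pack(data: bytes, limit: int = MESSAGE_LIMIT) -> list[str]:
    """Compress bytes, encode them as ASCII text, and split into message-sized chunks."""
    compressed = zlib.compress(data, level=9)
    text = base64.b85encode(compressed).decode("ascii")
    return [text[i:i + limit] for i in range(0, len(text), limit)]


def unpack(chunks: list[str]) -> bytes:
    """Reverse of pack(): join the chunks, decode, and decompress."""
    return zlib.decompress(base64.b85decode("".join(chunks)))
```

Base85 turns every 4 bytes into 5 characters, so the encoding alone costs 25%. The approach only pays off when compression saves more than that, which is likely for repetitive text rows and unlikely for data that is already compressed.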

The more data a single message holds, the less network I/O is performed.

Structure, algorithms and all of that fun.

Currently, most operations rely on linear scans, and the database essentially functions as a collection of lists, with larger entries implemented as linked lists. This setup isn't scalable; anyone with a basic understanding of data structures knows it's inefficient. So, what can we do? One option is to implement a B-Tree; in Guilded's context, message replies could serve as a natural array of child pointers for each node. There are plenty of other effective options for structuring data more efficiently.
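As a rough illustration of the B-Tree idea, the sketch below models each node as one message and each child pointer as a message ID (standing in for a reply). Everything here is hypothetical: `MessageStore` is an in-memory stand-in for the API, values are kept only in leaves (B+-tree style), the tree is bulk-loaded from sorted keys, and insertion with node splitting is left out.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list                                     # sorted keys (separators in inner nodes)
    children: list = field(default_factory=list)   # child message IDs; empty for leaves
    values: list = field(default_factory=list)     # stored values; leaves only


class MessageStore:
    """In-memory stand-in for the remote API: one get() models one request."""

    def __init__(self):
        self.messages = {}
        self.reads = 0

    def put(self, node: Node) -> int:
        msg_id = len(self.messages)
        self.messages[msg_id] = node
        return msg_id

    def get(self, msg_id: int) -> Node:
        self.reads += 1
        return self.messages[msg_id]


def build(store: MessageStore, items: list, fanout: int = 64) -> int:
    """Bulk-load sorted (key, value) pairs and return the root message ID."""
    if not items:
        return store.put(Node(keys=[]))
    # Leaf level: each leaf holds up to `fanout` entries.
    level = []
    for i in range(0, len(items), fanout):
        chunk = items[i:i + fanout]
        leaf = Node(keys=[k for k, _ in chunk], values=[v for _, v in chunk])
        level.append((chunk[0][0], store.put(leaf)))
    # Inner levels: keys[j] is the smallest key reachable through children[j + 1].
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            inner = Node(keys=[k for k, _ in group[1:]],
                         children=[m for _, m in group])
            parents.append((group[0][0], store.put(inner)))
        level = parents
    return level[0][1]


def search(store: MessageStore, root_id: int, key):
    """Walk from the root to a leaf; costs one read per tree level."""
    node = store.get(root_id)
    while node.children:
        node = store.get(node.children[bisect_right(node.keys, key)])
    i = bisect_right(node.keys, key) - 1
    if i >= 0 and node.keys[i] == key:
        return node.values[i]
    return None
```

With 10,000 keys and a fanout of 64 this tree has three levels, so a lookup reads 3 messages, whereas a linear scan over the same 157 leaf messages could read all of them. Whether 64 entries actually fit in one 4,000-character message depends on entry size.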

Additionally, and stupidly, buffers aren't really used. I do remember throwing the word buffer somewhere within this repository, but it's not a real buffer. Using more buffers can greatly improve performance, as can algorithms that rely on them, such as Block Nested Loop Join and Grace Hash Join. (But if there's one thing I won't be implementing, it's join functionality.)
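One simple form a real buffer could take is a small LRU buffer pool in front of the API, so repeated reads of the same message don't hit the network again. This is only a sketch: `BufferPool` and the `fetch` callable are hypothetical, and invalidation on writes is not handled.

```python
from collections import OrderedDict
from typing import Callable


class BufferPool:
    """Keeps the most recently used messages in memory (LRU eviction)."""

    def __init__(self, fetch: Callable[[int], str], capacity: int = 128):
        self._fetch = fetch          # hypothetical function doing the real API call
        self._capacity = capacity
        self._pages = OrderedDict()  # message ID -> content, oldest first
        self.misses = 0              # number of real fetches performed

    def get(self, msg_id: int) -> str:
        if msg_id in self._pages:
            self._pages.move_to_end(msg_id)   # mark as most recently used
            return self._pages[msg_id]
        self.misses += 1
        content = self._fetch(msg_id)
        self._pages[msg_id] = content
        if len(self._pages) > self._capacity:
            self._pages.popitem(last=False)   # evict the least recently used
        return content
```

Combined with a tree layout, the upper levels of the tree would tend to stay in the pool, so most lookups would only pay for the leaf read.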

Bottom line

Should it be implemented? Probably, for fun, and I might do it soon. Should it be used? Eh, I don't think so. Other than as a fun project, there isn't a real reason to prefer this over SQLite or PostgreSQL.

What can be done? The project is mostly modular (some things are highly coupled, but the project is still small, so major refactoring isn't required), and I can decouple parts even further. The ideas here aren't specific to GuildedSQL; they apply to any API that provides the same functionality as Guilded. I just picked Guilded because no one really uses it.

Labels: enhancement