262 changes: 243 additions & 19 deletions README.md
@@ -66,6 +66,103 @@ Post (
** Traffic(Write) = 2700 * 120 = ~ 0.000324 GB/sec
```


### Feeds

Per-user feeds

```
** RPS(Read) = 10 000 000 * 30 / 86400 = ~ 3473 req/sec

Request user token (200B)

Response (
uid
description
created_at
Photos (x3)
tag
reaction (int)
) ~ 1300B x20 = 26000 B

** Traffic(Read) = 26000 * 3473 = ~0.1 GB/sec

```

After receiving the response, the browser makes another 3 x 20 = 60 requests to S3 for the images.

```
** RPS(Read photos) = 10 000 000 * 60 / 86400 = ~6945 req/sec
** Traffic(Read photos) = 500000 * 6945 = ~ 3.48 GB/sec
```
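
The feed arithmetic above can be reproduced with a short Python sketch. It assumes that `10 000 000` is the daily active user count and that `30` and `60` are requests per user per day; the function names are hypothetical. The payload sizes (26000 B per feed response, 500000 B per photo) are taken from the traffic lines above. The document rounds the RPS values up (3473 and 6945), so the unrounded results here are slightly lower.

```python
SECONDS_PER_DAY = 86_400


def rps(daily_users: int, requests_per_user_per_day: float) -> float:
    """Average requests per second, spread evenly over one day."""
    return daily_users * requests_per_user_per_day / SECONDS_PER_DAY


def traffic_gb_per_sec(rps_value: float, payload_bytes: int) -> float:
    """Traffic in GB/sec, taking 1 GB = 10**9 bytes."""
    return rps_value * payload_bytes / 1e9


feed_rps = rps(10_000_000, 30)    # feed reads
photo_rps = rps(10_000_000, 60)   # 3 photos x 20 posts per feed page

feed_traffic = traffic_gb_per_sec(feed_rps, 26_000)
photo_traffic = traffic_gb_per_sec(photo_rps, 500_000)

print(f"feed:   {feed_rps:.0f} req/sec, {feed_traffic:.2f} GB/sec")
print(f"photos: {photo_rps:.0f} req/sec, {photo_traffic:.2f} GB/sec")
```

This is an average load only; peak-hour factors are not modeled here because the source does not state one.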

#### Disk estimate (Posts (storing) + feeds (reading))

```
RPS(Write s3) = 120 req/sec
RPS(Read s3) = ~6945 req/sec

Traffic(Write s3) = ~0.18 GB/sec
Traffic(Read s3) = ~3.48 GB/sec

Capacity(s3) = 0.18 GB/s * 86400 * 365 = 5677 TB

HDD:
Disks_for_capacity = 5677 TB / 32 TB = 178 Disk
Disks_for_throughput = 3.66 GB/sec / 100 MB/sec = 36.6 = 37 Disk
Disks_for_iops = iops / disk_iops = 7065 / 100 = 71 Disk
Disks = 178

SSD(SATA):
Disks_for_capacity = 5677 TB / 100 TB = 57 Disk
Disks_for_throughput = 3.66 GB/sec / 500 MB/sec = 8 Disk
Disks_for_iops = iops / disk_iops = 7065 / 1000 = 8 Disk
Disks = 57

SSD(NVMe):
Disks_for_capacity = 5677 TB / 30 TB = 190 Disk
Disks_for_throughput = 3.66 GB/sec / 3 GB/sec = 2 Disk
Disks_for_iops = iops / disk_iops = 7065 / 10000 = 1 Disk
Disks = 190

The obvious choice is SSD(SATA) = 57 x 100 TB,
although there are doubts about NVMe as well (because of throughput and IOPS)

[Review comment] It would make sense to cool old photos down to HDD here.
[Author reply] And how can that be shown in numbers?

====================================================================

RPS(Write) - 120 req/sec
RPS(Read) - 3473 req/sec

Traffic(Write) - 0.000324 GB/sec
Traffic(Read) - 0.1 GB/sec

Capacity = 0.000324 GB/sec * 86400 * 365 = 11 TB

HDD:
Disks_for_capacity = 11 TB / 10 TB = 2 Disk
Disks_for_throughput = 0.100324 GB/sec / 100 MB/sec = 101/100 = 2 Disk
Disks_for_iops = iops / disk_iops = 3593 / 100 = 36 Disk
Disks = 36

SSD(SATA):
Disks_for_capacity = 11 TB / 20 TB = 1 Disk
Disks_for_throughput = 0.100324 GB/sec / 500 MB/sec = 1 Disk
Disks_for_iops = iops / disk_iops = 3593 / 1000 = 4 Disk
Disks = 4

SSD(NVMe):
Disks_for_capacity = 11 TB / 10 TB = 2 Disk
Disks_for_throughput = 0.100324 GB/sec / 3 GB/sec = 1 Disk
Disks_for_iops = iops / disk_iops = 3593 / 10000 = 1 Disk
Disks = 2

Choice: SSD(NVMe), 2 x 12 TB (with headroom; feedback would be welcome here:
am I right to go with 2 x 12 TB instead of 1 x 15 TB? It costs more, but I think it makes sense
not to rely on a single disk)

[Review comment] It is not clear what these disks refer to; the disks need to be calculated per subsystem.
[Author reply, devel96, Oct 25, 2025] These are the feeds (metadata); the media was above. They will sit on different disks.
[Review comment] Then it needs to be spelled out more clearly.

```
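
The sizing rule used in the block above (take the largest of the capacity, throughput and IOPS requirements, each rounded up) can be written as a small Python sketch. The `DiskType` class and the `disks_needed` function are hypothetical names; the per-disk figures are the ones quoted in the estimate, and IOPS is taken as read RPS plus write RPS, as the estimate does.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskType:
    name: str
    capacity_tb: float
    throughput_mb_s: float
    iops: int


def disks_needed(capacity_tb: float, throughput_mb_s: float, iops: float,
                 disk: DiskType) -> int:
    """Number of disks required to satisfy the tightest of the three limits."""
    by_capacity = math.ceil(capacity_tb / disk.capacity_tb)
    by_throughput = math.ceil(throughput_mb_s / disk.throughput_mb_s)
    by_iops = math.ceil(iops / disk.iops)
    return max(by_capacity, by_throughput, by_iops)


# Per-disk figures as quoted in the media (S3) estimate.
hdd = DiskType("HDD", 32, 100, 100)
sata = DiskType("SSD (SATA)", 100, 500, 1_000)
nvme = DiskType("SSD (NVMe)", 30, 3_000, 10_000)

# Media: 5677 TB, 3.48 + 0.18 GB/sec, 6945 + 120 req/sec.
media = dict(capacity_tb=5677, throughput_mb_s=3660, iops=7065)

for disk in (hdd, sata, nvme):
    print(disk.name, disks_needed(**media, disk=disk))

# Feed metadata on 10 TB HDDs: 11 TB, ~100.3 MB/sec, 3473 + 120 req/sec.
hdd_10tb = DiskType("HDD 10 TB", 10, 100, 100)
feed_meta_hdd = disks_needed(11, 100.324, 3593, hdd_10tb)
print(hdd_10tb.name, feed_meta_hdd)
```

The sketch counts raw disks only. Replication, RAID overhead and free-space headroom are not part of the source estimate and would increase the numbers.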


### Ratings

```
@@ -76,6 +173,40 @@ User Token + post_uid + reaction (int) = 200 + 16 + 4 = 220 B
** Traffic(Write) = 220 B * 120 req/sec = 27 KB/sec
```

#### Disk estimate (Ratings)

```
Read was added later. I will use roughly the same figures
as in Comments (taking the Write data into account)

RPS(Write) = 120 req/sec
RPS(Read) = 1200 req/sec

Traffic(Write) = 27 KB/sec
Traffic(Read) = 0.0013 GB/sec

Capacity = 27 KB/s * 86400 * 365 = ~1 TB

HDD:
Disks_for_capacity = 1 TB / 2 TB = 1 Disk
Disks_for_throughput = 1.4 MB/sec / 100 MB/sec = 1 Disk
Disks_for_iops = iops / disk_iops = 1320 / 100 = 14 Disk
Disks = 14

SSD(SATA):
Disks_for_capacity = 1 TB / 2 TB = 1 Disk
Disks_for_throughput = 1.4 MB/sec / 500 MB/sec = 1 Disk
Disks_for_iops = iops / disk_iops = 1320 / 1000 = 2 Disk
Disks = 2

SSD(NVMe):
I don't think it is worth calculating: SSD(SATA) is sufficient

I will choose SSD(SATA) 2 x 2 TB

```
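
Every capacity line in these estimates follows the same pattern: write traffic multiplied by the number of seconds in a year. A minimal Python sketch of that step is shown below; `yearly_capacity_tb` is a hypothetical helper, and it assumes decimal units (1 KB = 1000 B, 1 TB = 10**12 B) and a one-year retention horizon, which is what the `* 86400 * 365` factor implies.

```python
SECONDS_PER_YEAR = 86_400 * 365


def yearly_capacity_tb(write_bytes_per_sec: float) -> float:
    """Storage written in one year, in TB (1 TB = 10**12 bytes)."""
    return write_bytes_per_sec * SECONDS_PER_YEAR / 1e12


ratings_tb = yearly_capacity_tb(27_000)   # 27 KB/sec of rating writes
media_tb = yearly_capacity_tb(0.18e9)     # 0.18 GB/sec of photo writes

print(f"ratings: {ratings_tb:.2f} TB/year")
print(f"media:   {media_tb:.0f} TB/year")
```

The estimates in this document round these values up (about 1 TB for ratings and 5677 TB for media).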


### Comments
```
** RPS(Write) = 10 000 000 * 2 / 86400 = 240 req/sec
@@ -95,6 +226,37 @@ User Token + 10 * comment(1000 B) = 200 + 10000 = 10200 B
** Traffic(Read) = 10200 B * 1200 req/sec = ~ 0.013 GB/sec
```

#### Disk estimate (Comments)

```
RPS(Write) = 240 req/sec
RPS(Read) = 1200 req/sec

Traffic(Write) = 2335 KB/sec
Traffic(Read) = 0.013 GB/sec

Capacity = 2335 KB/s * 86400 * 365 = 74 TB

HDD:
Disks_for_capacity = 74 TB / 30 TB = 3 Disk
Disks_for_throughput = 15.3 MB/sec / 100 MB/sec = 1 Disk
Disks_for_iops = iops / disk_iops = 1440 / 100 = 15 Disk
Disks = 15

SSD(SATA):
Disks_for_capacity = 74 TB / 100 TB = 1 Disk
Disks_for_throughput = 15.3 MB/sec / 500 MB/sec = 1 Disk
Disks_for_iops = iops / disk_iops = 1440 / 1000 = 2 Disk
Disks = 2

SSD(NVMe):
I don't think it is worth calculating: SSD(SATA) is sufficient

I will choose SSD(SATA) 2 x 60 TB

```


### Subscriptions
```
** RPS(Write) = 10 000 000 * 0.5 / 86400 = 60 req/sec
@@ -104,6 +266,27 @@ User Token + follow_uid = 200 + 16 = 216 B
** Traffic(Write) = 216 B * 60 req/sec = 13 KB/sec
```

#### Disk estimate (Subscriptions)

```
RPS(Write) = 60 req/sec
Traffic(Write) = 13 KB/sec

Capacity = 13 KB/s * 86400 * 365 = 0.5 TB

HDD:
Disks_for_capacity = 0.5 TB / 1 TB = 1 Disk
Disks_for_throughput = 13 KB/sec / 100 MB/sec = 0.013 / 100 = 1 Disk
Disks_for_iops = iops / disk_iops = 60 / 100 = 1 Disk
Disks = 1

SSD: not calculated, since it is already clear that HDD is quite sufficient

I will choose HDD 2 x 1 TB

```


### Tags (popular places)

I think tags are created or updated when posts are created, so logically the posts RPS should be duplicated here.
@@ -123,31 +306,72 @@ User Token + 50 * tag(100 B) = 200 + 5000 = 5200 B
** Traffic(Read) = 5200 B * 60 req/sec = 312 KB/sec (2496 Kbit/sec)
```

#### Disk estimate (Tags)

[Review comment] Will there be a separate DB for tags?
[Author reply] Yes.
[Review comment] Whether that is really needed is an open question.


```
RPS(Write) = 120 req/sec
RPS(Read) = 60 req/sec

Traffic(Write) = 304 KB/sec
Traffic(Read) = 2496 KB/sec

Capacity = 304 KB/s * 86400 * 365 = 9.6 TB

HDD:
Disks_for_capacity = 9.6 TB / 10 TB = 1 Disk
Disks_for_throughput = 2800 KB/sec / 100 MB/sec = 2.8 / 100 = 1 Disk
Disks_for_iops = iops / disk_iops = 180 / 100 = 2 Disk
Disks = 2

SSD(SATA):
Disks_for_capacity = 9.6 TB / 10 TB = 1 Disk
Disks_for_throughput = 2.8 MB/sec / 500 MB/sec = 1 Disk
Disks_for_iops = iops / disk_iops = 180 / 1000 = 1 Disk
Disks = 1

SSD(NVMe):
I don't think it is worth calculating

I will choose HDD 2 x 10 TB

```

---

No need to read this; these are notes to self

## Storage subsystem and load estimates

| Subsystem | Disk type | Disks | Capacity (TB) | RPS (Read / Write) | Traffic (GB/sec) | Description |
|-------------|-------------|----------------|----------------|---------------------|------------------|----------------------------------------------------------|
| **S3 (photos)** | SSD (SATA) | 57 | **5677 TB** | 6945 / 120 | 3.48 / 0.18 | Main image storage, heavy read load |
| **Posts (metadata)** | NVMe SSD | 2 | **11 TB** | 3473 / 120 | 0.1 / 0.0003 | Main `posts` + `feed` table; latency-critical |
| **Comments** | SSD (SATA) | 2 | **74 TB** | 1200 / 240 | 0.013 / 0.002 | Frequent reads, can be cached in Redis |
| **Ratings** | SSD (SATA) | 2 | **1 TB** | 1200 / 120 | 0.0013 / 0.000027 | Small volume, updated often |
| **Subscriptions (follows)** | HDD | 2 | **0.5 TB** | 60 / 60 | 0.000013 | Low load, cold storage is suitable |
| **Tags** | HDD | 2 | **9.6 TB** | 60 / 120 | 0.0028 / 0.0003 | Rarely updated, top results can be cached |

---

## Final summary

| Metric | Value |
|------------------------------|-----------------------------------------------------------------------------------------------|
| **Total storage volume** | ~ **5773 TB** |
| **Total number of disks** | ~ **67** (mostly for S3) |
| **Main bottleneck** | Reading images from S3: **3.48 GB/sec** |
| **Main hot zone** | Feed (reading posts + images) |
| **Warm storage** | Comments, posts |
| **Cold storage** | Subscriptions, tags, ratings |
| **Storage stack** | **S3 + CDN** (for photos), **PostgreSQL NVMe** (for posts/comments), **HDD** (for tags/follows) |
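
The totals in the summary can be cross-checked by summing the per-subsystem rows of the table above. The Python sketch below only re-adds the capacity and disk-count columns; the dictionary layout is an illustrative choice, not part of the design.

```python
# (capacity in TB, number of disks) per subsystem, copied from the table above.
subsystems = {
    "S3 (photos)": (5677, 57),
    "Posts (metadata)": (11, 2),
    "Comments": (74, 2),
    "Ratings": (1, 2),
    "Subscriptions": (0.5, 2),
    "Tags": (9.6, 2),
}

total_capacity_tb = sum(capacity for capacity, _ in subsystems.values())
total_disks = sum(disks for _, disks in subsystems.values())
s3_capacity_share = subsystems["S3 (photos)"][0] / total_capacity_tb

print(f"total capacity: {total_capacity_tb:.1f} TB")
print(f"total disks:    {total_disks}")
print(f"S3 share:       {s3_capacity_share:.0%}")
```

Summing the rows gives about 5773 TB and 67 disks, with S3 holding roughly 98% of the capacity.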

---

## Conclusion

- More than 90% of total traffic and capacity is **S3**.
- **Photo delivery speed (CDN)** is critical.
- The remaining tables can be stored in **PostgreSQL + Redis cache**.
- The main optimization is **separating the hot (NVMe) and cold (HDD/S3) data tiers**.
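
One way to put numbers on the hot/cold split for photos is sketched below in Python. The two shares (10% of the data is "hot" and receives 90% of the reads) are purely illustrative assumptions and do not come from this estimate; they would have to be measured or agreed on. The per-disk figures are the SATA SSD and 32 TB HDD values used in the S3 estimate. Only read load is modeled, and the function names are hypothetical.

```python
import math

# Per-disk figures reused from the S3 estimate (SATA SSD and 32 TB HDD).
SSD = {"capacity_tb": 100, "throughput_mb_s": 500, "iops": 1_000}
HDD = {"capacity_tb": 32, "throughput_mb_s": 100, "iops": 100}


def disks(capacity_tb: float, throughput_mb_s: float, iops: float, disk: dict) -> int:
    """Disks needed for the tightest of capacity, throughput and IOPS."""
    return max(
        math.ceil(capacity_tb / disk["capacity_tb"]),
        math.ceil(throughput_mb_s / disk["throughput_mb_s"]),
        math.ceil(iops / disk["iops"]),
    )


def tiered(total_tb: float, read_mb_s: float, read_iops: float,
           hot_data_share: float, hot_read_share: float) -> tuple[int, int]:
    """Return (SSD disks for the hot tier, HDD disks for the cold tier)."""
    hot_tb = total_tb * hot_data_share
    cold_tb = total_tb - hot_tb
    cold_read_share = 1 - hot_read_share
    hot = disks(hot_tb, read_mb_s * hot_read_share, read_iops * hot_read_share, SSD)
    cold = disks(cold_tb, read_mb_s * cold_read_share, read_iops * cold_read_share, HDD)
    return hot, cold


# ASSUMPTION: 10% of photos are hot and serve 90% of reads (illustrative only).
hot_ssd, cold_hdd = tiered(5677, 3480, 6945, hot_data_share=0.10, hot_read_share=0.90)
print(f"hot tier: {hot_ssd} x SSD (SATA), cold tier: {cold_hdd} x HDD")
```

Under these assumed shares the cold tier is bound by capacity and the hot tier by throughput and IOPS, so changing the shares moves the split considerably.
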
73 changes: 73 additions & 0 deletions database/diagram.md
@@ -0,0 +1,73 @@
## DB schema

[Review comment] Will everything be in one DB?
[Author reply] In general, no, it should not all be in one DB. I will split it soon.


```
Table users {
  id uuid [pk]
  full_name varchar(255)
  avatar_url text
  created_at timestamptz [default: `now()`]
  updated_at timestamptz [default: `now()`]
}
Table tags {
  id uuid [pk]
  name varchar(50) [unique]
  posts_count int [default: 0]
  created_at timestamptz [default: `now()`]
  updated_at timestamptz [default: `now()`]
}
Table posts {
  id uuid [pk]
  user_id uuid [not null, ref: > users.id]
  description varchar(700)
  tag_id uuid [not null, ref: > tags.id]
  created_at timestamptz [not null, default: `now()`]
  ratings_count int [not null, default: 0]
  ratings_avg numeric(3,2)
}
Table media_asset {
  // [Review comment] Why is all this meta information needed at all?
  // [Author reply] I will remove it.
  id uuid [pk]
  owner_user_id uuid [ref: > users.id]
  filename varchar(255)
  // [Review comment] Does this really need to be stored?
  // [Author reply] No.
  content_type varchar(100)
  file_url text [not null]
  created_at timestamptz [default: `now()`]
}
Table post_photo {
  id uuid [pk]
  post_id uuid [not null, ref: > posts.id]
  media_id uuid [not null, ref: > media_asset.id]
}
Table comments {
  id uuid [pk]
  post_id uuid [not null, ref: > posts.id]
  user_id uuid [not null, ref: > users.id]
  text varchar(500)
  created_at timestamptz [not null, default: `now()`]
}
Table ratings {
  post_id uuid [not null, ref: > posts.id]
  user_id uuid [not null, ref: > users.id]
  value int [not null]
  created_at timestamptz [not null, default: `now()`]
  updated_at timestamptz [not null, default: `now()`]
  indexes {
    // one rating per user per post
    (post_id, user_id) [unique]
  }
}
Table follows {
  follower_user_id uuid [not null, ref: > users.id]
  target_user_id uuid [not null, ref: > users.id]
  created_at timestamptz [not null, default: `now()`]
  indexes {
    target_user_id
    follower_user_id
  }
}
```