Update docs to recommend UUIDv4 explicitly #20031

Draft
wants to merge 1 commit into
base: main
16 changes: 15 additions & 1 deletion src/current/_includes/v25.3/faq/auto-generate-unique-ids.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,14 @@
To auto-generate unique row identifiers, you can use the `gen_random_uuid()`, `uuid_v4()`, or `unique_rowid()` [functions]({% link {{ page.version.version }}/functions-and-operators.md %}#id-generation-functions).
To auto-generate unique row identifiers, you can use the following [functions]({% link {{ page.version.version }}/functions-and-operators.md %}#id-generation-functions):

- [Use `gen_random_uuid()`](#use-gen_random_uuid): Generates a UUIDv4 with `UUID` data type.
- [Use `uuid_v4()`](#use-uuid_v4): Generates a UUIDv4 with `BYTES` data type.
- [Use `unique_rowid()`](#use-unique_rowid): Generates a globally unique ID with `INT` data type.

{{site.data.alerts.callout_success}}
{% include {{ page.version.version }}/sql/use-uuidv4.md %}
{{site.data.alerts.end}}
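
To make the difference between the `UUID` and `BYTES` representations concrete, here is a minimal client-side sketch using Python's standard `uuid` module. It is only an analogy for what `gen_random_uuid()` and `uuid_v4()` return, not a description of how CockroachDB generates these values; the variable names are illustrative.

```python
import uuid

# Generate a random (version 4) UUID on the client side.
generated = uuid.uuid4()

# Text form, comparable to how a UUID-typed column is displayed.
as_text = str(generated)

# Raw 16-byte form, comparable to storing the same value in a BYTES column.
as_bytes = generated.bytes

# Both forms carry the same 128 bits, so the raw bytes convert back losslessly.
restored = uuid.UUID(bytes=as_bytes)

print(generated.version, len(as_bytes), len(as_text))
```

Either representation holds the same 128-bit value; the choice mainly affects the column type and how the ID is displayed and compared in queries.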

#### Use `gen_random_uuid()`

To use the [`UUID`]({% link {{ page.version.version }}/uuid.md %}) column with the `gen_random_uuid()` [function]({% link {{ page.version.version }}/functions-and-operators.md %}#id-generation-functions) as the [default value]({% link {{ page.version.version }}/default-value.md %}):

@@ -34,6 +44,8 @@ SELECT * FROM users;
(3 rows)
~~~

#### Use `uuid_v4()`

Alternatively, you can use the [`BYTES`]({% link {{ page.version.version }}/bytes.md %}) column with the `uuid_v4()` function as the default value:

{% include_cached copy-clipboard.html %}
@@ -72,6 +84,8 @@ In either case, generated IDs will be 128-bit, sufficiently large to generate un

This approach has the disadvantage of creating a primary key that may not be useful in a query directly, which can require a join with another table or a secondary index.

#### Use `unique_rowid()`

If it is important for generated IDs to be stored in the same key-value range, you can use an [integer type]({% link {{ page.version.version }}/int.md %}) with the `unique_rowid()` [function]({% link {{ page.version.version }}/functions-and-operators.md %}#id-generation-functions) as the default value, either explicitly or via the [`SERIAL` pseudo-type]({% link {{ page.version.version }}/serial.md %}):

{% include_cached copy-clipboard.html %}
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@

| Property | UUID generated with `uuid_v4()` | INT generated with `unique_rowid()` | Sequences |
| Property | [UUID]({% link {{ page.version.version }}/uuid.md %}) generated with `uuid_v4()` | INT generated with `unique_rowid()` | Sequences |
|--------------------------------------|-----------------------------------------|-----------------------------------------------|--------------------------------|
| Size | 16 bytes | 8 bytes | 1 to 8 bytes |
| Ordering properties | Unordered | Highly time-ordered | Highly time-ordered |
2 changes: 1 addition & 1 deletion src/current/v25.3/alter-table.md
@@ -1852,7 +1852,7 @@ Suppose that you are storing the data for users of your application in a table c
);
~~~

The primary key of this table is on the `name` column. This is a poor choice, as some users likely have the same name, and all primary keys enforce a `UNIQUE` constraint on row values of the primary key column. Per our [best practices]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#use-functions-to-generate-unique-ids), you should instead use a `UUID` for single-column primary keys, and populate the rows of the table with generated, unique values.
The primary key of this table is on the `name` column. This is a poor choice, as some users likely have the same name, and all primary keys enforce a `UNIQUE` constraint on row values of the primary key column. Per our [best practices]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#use-functions-to-generate-unique-ids), you should instead use a [`UUID`]({% link {{ page.version.version }}/uuid.md %}) for single-column primary keys, and populate the rows of the table with generated, unique values.

You can add a column and change the primary key with a couple of `ALTER TABLE` statements:

2 changes: 1 addition & 1 deletion src/current/v25.3/create-sequence.md
@@ -11,7 +11,7 @@ The `CREATE SEQUENCE` [statement]({% link {{ page.version.version }}/sql-stateme

## Considerations

- Using a sequence is slower than [auto-generating unique IDs with the `gen_random_uuid()`, `uuid_v4()` or `unique_rowid()` built-in functions]({% link {{ page.version.version }}/sql-faqs.md %}#how-do-i-auto-generate-unique-row-ids-in-cockroachdb). Incrementing a sequence requires a write to persistent storage, whereas auto-generating a unique ID does not. Therefore, use auto-generated unique IDs unless an incremental sequence is preferred or required. For more information, see [Unique ID best practices]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#unique-id-best-practices).
- Using a sequence is slower than [auto-generating unique IDs with the `gen_random_uuid()`, `uuid_v4()` or `unique_rowid()` built-in functions]({% link {{ page.version.version }}/sql-faqs.md %}#how-do-i-auto-generate-unique-row-ids-in-cockroachdb) and is likely to cause performance problems due to [hotspots]({% link {{ page.version.version }}/understand-hotspots.md %}). Incrementing a sequence requires a write to persistent storage, whereas auto-generating a unique ID does not. Therefore, use auto-generated unique IDs unless an incremental sequence is preferred or required. For more information, see [Unique ID best practices]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#unique-id-best-practices).
- A column that uses a sequence can have a gap in the sequence values if a transaction advances the sequence and is then rolled back. Sequence updates are committed immediately and aren't rolled back along with their containing transaction. This is done to avoid blocking concurrent transactions that use the same sequence.
- {% include {{page.version.version}}/performance/use-hash-sharded-indexes.md %}
- By default, you cannot create sequences that are [owned by]({% link {{ page.version.version }}/security-reference/authorization.md %}#object-ownership) columns in tables in other databases. You can enable such sequence creation by setting the `sql.cross_db_sequence_owners.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) to `true`.
1 change: 1 addition & 0 deletions src/current/v25.3/hash-sharded-indexes.md
@@ -170,3 +170,4 @@ You can specify a different `bucket_count` via a storage parameter on a hash-sha

- [Indexes]({% link {{ page.version.version }}/indexes.md %})
- [`CREATE INDEX`]({% link {{ page.version.version }}/create-index.md %})
- [`UUID`]({% link {{ page.version.version }}/uuid.md %})
2 changes: 1 addition & 1 deletion src/current/v25.3/liquibase.md
@@ -332,7 +332,7 @@ Liquibase does not [retry transactions]({% link {{ page.version.version }}/trans

Suppose that you want to change the primary key of the `accounts` table from a simple, incrementing [integer]({% link {{ page.version.version }}/int.md %}) (in this case, `id`) to an auto-generated [UUID]({% link {{ page.version.version }}/uuid.md %}), to follow some [CockroachDB best practices]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#unique-id-best-practices). You can make these changes to the schema by creating and executing an additional changeset:

1. Create a SQL file to add a new UUID-typed column to the table:
1. Create a SQL file to add a new [UUID]({% link {{ page.version.version }}/uuid.md %})-typed column to the table:

{% include_cached copy-clipboard.html %}
~~~ shell
2 changes: 1 addition & 1 deletion src/current/v25.3/movr-flask-application.md
@@ -110,7 +110,7 @@ The `User` class has the following attributes:

- `__tablename__`, which holds the stored name of the table in the database. SQLAlchemy requires this attribute for all classes that map to tables.
- All of the other attributes of the `User` class (`id`, `city`, `first_name`, etc.), stored as `Column` objects. These attributes represent columns of the `users` table. The constructor for each `Column` takes the column data type as its first argument, and then any additional arguments, such as `primary_key`.
- To help define column objects, SQLAlchemy also includes classes for SQL data types and column constraints. For the columns in this table, we use `UUID` and `String` data types.
- To help define column objects, SQLAlchemy also includes classes for SQL data types and column constraints. For the columns in this table, we use [`UUID`]({% link {{ page.version.version }}/uuid.md %}) and `String` data types.
- The `__repr__` function, which defines the string representation of the object.

#### The `Vehicle` class
11 changes: 4 additions & 7 deletions src/current/v25.3/performance-best-practices-overview.md
@@ -64,12 +64,7 @@ When a table is created, all columns are stored as a single column family. This

## Unique ID best practices

The best practices for generating unique IDs in a distributed database like CockroachDB are very different than for a legacy single-node database. Traditional approaches for generating unique IDs for legacy single-node databases include:

1. Using the [`SERIAL`]({% link {{ page.version.version }}/serial.md %}) pseudo-type for a column to generate random unique IDs. This can result in a performance bottleneck because IDs generated temporally near each other have similar values and are located physically near each other in a table's storage.
1. Generating monotonically increasing [`INT`]({% link {{ page.version.version }}/int.md %}) IDs by using transactions with roundtrip [`SELECT`]({% link {{ page.version.version }}/select-clause.md %})s, e.g., `INSERT INTO tbl (id, …) VALUES ((SELECT max(id)+1 FROM tbl), …)`. This has a **very high performance cost** since it makes all [`INSERT`]({% link {{ page.version.version }}/insert.md %}) transactions wait for their turn to insert the next ID. You should only do this if your application really does require strict ID ordering. In some cases, using [change data capture (CDC)]({% link {{ page.version.version }}/change-data-capture-overview.md %}) can help avoid the requirement for strict ID ordering. If you can avoid the requirement for strict ID ordering, you can use one of the higher-performance ID strategies outlined in the following sections.

The preceding approaches are likely to create [hotspots](#hotspots) for both reads and writes in CockroachDB. {% include {{page.version.version}}/performance/use-hash-sharded-indexes.md %}
The best practices for generating unique IDs in a distributed database like CockroachDB are very different from those for a legacy single-node database.

To create unique and non-sequential IDs, we recommend the following approaches (listed in order from best to worst performance):

@@ -79,6 +74,8 @@ To create unique and non-sequential IDs, we recommend the following approaches (
| 2. [Use functions to generate unique IDs](#use-functions-to-generate-unique-ids) | Good performance; spreads load well; easy choice | May leave some performance on the table; requires other columns to be useful in queries |
| 3. [Use `INSERT` with the `RETURNING` clause](#use-insert-with-the-returning-clause-to-generate-unique-ids) | Easy to query against; familiar design | Slower performance than the other options; higher chance of [transaction contention](#transaction-contention) |

Traditional approaches using monotonically increasing [`INT`]({% link {{ page.version.version }}/int.md %}) or [`SERIAL`]({% link {{ page.version.version }}/serial.md %}) data types will create [hotspots](#hotspots) for both reads and writes in a distributed database like CockroachDB. {% include {{page.version.version}}/performance/use-hash-sharded-indexes.md %}
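
The hotspot effect can be sketched with a deliberately simplified model. The Python snippet below assumes a 128-bit key space divided into four equal-width ranges; this is an assumption made for illustration, since real ranges are split by size and load rather than at fixed boundaries. The helper names `range_for` and `load_per_range` are hypothetical.

```python
import uuid
from collections import Counter

RANGE_COUNT = 4
KEY_SPACE = 2 ** 128  # size of a 128-bit key space
RANGE_WIDTH = KEY_SPACE // RANGE_COUNT  # assumed equal-width ranges


def range_for(key: int) -> int:
    """Return the index of the (hypothetical) range that holds this key."""
    return key // RANGE_WIDTH


def load_per_range(keys) -> Counter:
    """Count how many writes land in each range."""
    return Counter(range_for(key) for key in keys)


# 1,000 monotonically increasing IDs versus 1,000 random UUIDv4 values.
sequential_load = load_per_range(range(1, 1001))
random_load = load_per_range(uuid.uuid4().int for _ in range(1000))

print("sequential:", dict(sequential_load))
print("random:", dict(sorted(random_load.items())))
```

In this model every sequential ID lands in the same range, while the random IDs spread across all four, which is the intuition behind preferring randomly distributed keys.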

### Use multi-column primary keys

A well-designed multi-column primary key can yield even better performance than a [UUID primary key](#use-functions-to-generate-unique-ids), but it requires more up-front schema design work. To get the best performance, ensure that any monotonically increasing field is located **after** the first column of the primary key. When done right, such a composite primary key should result in:
@@ -155,7 +152,7 @@ Time: 1ms total (execution 1ms / network 0ms)

Note that the above query also follows the [indexing best practice]({% link {{ page.version.version }}/indexes.md %}#best-practices) of indexing all columns in the `WHERE` clause.

### Use functions to generate unique IDs
### Use functions to automatically generate unique IDs

{% include {{ page.version.version }}/faq/auto-generate-unique-ids.md %}

2 changes: 1 addition & 1 deletion src/current/v25.3/row-level-security.md
@@ -404,7 +404,7 @@ GRANT SELECT, INSERT, UPDATE, DELETE ON invoices TO app_dev;

Each application will need to set the tenant context for the session. In this example, you will use the `application_name` session variable to pass in a tenant ID that will later be extracted from the variable.

Specifically, the UUID following the period in `application_name` is the tenant ID. We will use the `current_setting()` function in our RLS policies to extract the ID.
Specifically, the [UUID]({% link {{ page.version.version }}/uuid.md %}) following the period in `application_name` is the tenant ID. We will use the `current_setting()` function in our RLS policies to extract the ID.

{{site.data.alerts.callout_danger}}
For multi-tenancy to work correctly, this setting **must** be reliably managed by the application layer and passed in the connection string.
4 changes: 2 additions & 2 deletions src/current/v25.3/schema-design-table.md
@@ -288,9 +288,9 @@ For detailed reference documentation for each supported constraint, see [the con

To set default values on columns, use the `DEFAULT` constraint. Default values enable you to write queries without the need to specify values for every column.

When combined with [supported SQL functions]({% link {{ page.version.version }}/functions-and-operators.md %}), default values can save resources in your application's persistence layer by offloading computation onto CockroachDB. For example, rather than using an application library to generate unique `UUID` values, you can set a default value to be an automatically-generated `UUID` value with the `gen_random_uuid()` SQL function. Similarly, you could use a default value to populate a `TIMESTAMP` column with the current time of day, using the `now()` function.
When combined with [supported SQL functions]({% link {{ page.version.version }}/functions-and-operators.md %}), default values can save resources in your application's persistence layer by offloading computation onto CockroachDB. For example, rather than using an application library to generate unique [`UUID`]({% link {{ page.version.version }}/uuid.md %}) values, you can set a default value to be an automatically-generated `UUID` value with the `gen_random_uuid()` SQL function. Similarly, you could use a default value to populate a `TIMESTAMP` column with the current time of day, using the `now()` function.

For example, in the `vehicles` table definition in `max_init.sql`, you added a `DEFAULT gen_random_uuid()` clause to the `id` column definition. This set the default value to a generated `UUID` value. Now, add a default value to the `creation_time` column:
For example, in the `vehicles` table definition in `max_init.sql`, you added a `DEFAULT gen_random_uuid()` clause to the `id` column definition. This set the default value to a generated [`UUID`]({% link {{ page.version.version }}/uuid.md %}) value. Now, add a default value to the `creation_time` column:

{% include_cached copy-clipboard.html %}
~~~ sql
1 change: 1 addition & 0 deletions src/current/v25.3/serial.md
@@ -242,3 +242,4 @@ When we insert rows without values in column `a` and display the new rows, we se

- [FAQ: How do I auto-generate unique row IDs in CockroachDB?]({% link {{ page.version.version }}/sql-faqs.md %}#how-do-i-auto-generate-unique-row-ids-in-cockroachdb)
- [Data Types]({% link {{ page.version.version }}/data-types.md %})
- [`UUID`]({% link {{ page.version.version }}/uuid.md %})
4 changes: 2 additions & 2 deletions src/current/v25.3/understand-hotspots.md
@@ -141,7 +141,7 @@ On this page, the phrase _index hotspot_ will be reserved for a hot by write hot

#### Resolving index hotspots

The resolution of the index hotspot often depends on your requirements for the data. If the sequential nature of the index serves no purpose, it is recommended to change the writes into the index to be randomly distributed. Ideally, primary keys in this instance would be set to `UUID`s, if your tolerance for swapover or even downtime allows it.
The resolution of the index hotspot often depends on your requirements for the data. If the sequential nature of the index serves no purpose, it is recommended to change the writes into the index to be randomly distributed. Ideally, primary keys in this instance would be set to [`UUID`]({% link {{ page.version.version }}/uuid.md %})s, if your tolerance for swapover or even downtime allows it.

If inserting in sequential order is important, the index itself can be [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), which means that it is still stored in order, albeit in some number of shards. Consider a `users` table, with a primary key `id INT`, which is hash-sharded with 4 shards, and a hashing function of modulo 4. The following image illustrates this example:
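
The modulo-4 example can also be sketched in a few lines of Python. This only illustrates the idea described above: the `shard_for` function is a hypothetical stand-in that uses plain modulo, as in the example, and is not CockroachDB's actual hashing function.

```python
from collections import defaultdict

BUCKET_COUNT = 4  # matches the 4-shard example in the text


def shard_for(user_id: int, bucket_count: int = BUCKET_COUNT) -> int:
    """Hypothetical hashing function: plain modulo, as in the example."""
    return user_id % bucket_count


def distribute(ids, bucket_count: int = BUCKET_COUNT) -> dict:
    """Group IDs by shard; IDs within a shard keep their insertion order."""
    shards = defaultdict(list)
    for user_id in ids:
        shards[shard_for(user_id, bucket_count)].append(user_id)
    return dict(shards)


# Insert IDs 1 through 12 in sequential order.
shards = distribute(range(1, 13))
for shard, members in sorted(shards.items()):
    print(shard, members)
```

Consecutive inserts rotate through the four shards, so no single shard receives every new write, while the IDs inside each shard remain in order.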

@@ -339,4 +339,4 @@ For a demo on hotspot reduction, watch the following video:

- [Detect Hotspots]({% link {{ page.version.version }}/detect-hotspots.md %})
- [Performance Tuning Recipes: Hotspots]({% link {{ page.version.version }}/performance-recipes.md %}#hotspots)
- [Single hot node]({% link {{ page.version.version }}/query-behavior-troubleshooting.md %}#single-hot-node)
- [Single hot node]({% link {{ page.version.version }}/query-behavior-troubleshooting.md %}#single-hot-node)