From a07b2ad71d74bd473fd1d5b2c3d1da6383e7fee3 Mon Sep 17 00:00:00 2001 From: Rich Loveland Date: Wed, 30 Jul 2025 17:47:59 -0400 Subject: [PATCH] Update docs to recommend UUIDv4 explicitly Fixes DOC-9482 --- .../v25.3/faq/auto-generate-unique-ids.md | 16 +++++++++++++++- .../faq/differences-between-numberings.md | 2 +- src/current/v25.3/alter-table.md | 2 +- src/current/v25.3/create-sequence.md | 2 +- src/current/v25.3/hash-sharded-indexes.md | 1 + src/current/v25.3/liquibase.md | 2 +- src/current/v25.3/movr-flask-application.md | 2 +- .../performance-best-practices-overview.md | 11 ++++------- src/current/v25.3/row-level-security.md | 2 +- src/current/v25.3/schema-design-table.md | 4 ++-- src/current/v25.3/serial.md | 1 + src/current/v25.3/understand-hotspots.md | 4 ++-- src/current/v25.3/uuid.md | 19 +++++++++++++++---- 13 files changed, 46 insertions(+), 22 deletions(-) diff --git a/src/current/_includes/v25.3/faq/auto-generate-unique-ids.md b/src/current/_includes/v25.3/faq/auto-generate-unique-ids.md index ebe3262e6df..2124fb051ab 100644 --- a/src/current/_includes/v25.3/faq/auto-generate-unique-ids.md +++ b/src/current/_includes/v25.3/faq/auto-generate-unique-ids.md @@ -1,4 +1,14 @@ -To auto-generate unique row identifiers, you can use the `gen_random_uuid()`, `uuid_v4()`, or `unique_rowid()` [functions]({% link {{ page.version.version }}/functions-and-operators.md %}#id-generation-functions). +To auto-generate unique row identifiers, you can use the following [functions]({% link {{ page.version.version }}/functions-and-operators.md %}#id-generation-functions): + +- [Use `gen_random_uuid()`](#use-gen_random_uuid): Generates a UUIDv4 with `UUID` data type. +- [Use `uuid_v4()`](#use-uuid_v4): Generates a UUIDv4 with `BYTES` data type. 
+- [Use `unique_rowid()`](#use-unique_rowid): Generates a globally unique ID with `INT` data type. + +{{site.data.alerts.callout_success}} +{% include {{ page.version.version }}/sql/use-uuidv4.md %} +{{site.data.alerts.end}} + +#### Use `gen_random_uuid()` To use the [`UUID`]({% link {{ page.version.version }}/uuid.md %}) column with the `gen_random_uuid()` [function]({% link {{ page.version.version }}/functions-and-operators.md %}#id-generation-functions) as the [default value]({% link {{ page.version.version }}/default-value.md %}): @@ -34,6 +44,8 @@ SELECT * FROM users; (3 rows) ~~~ +#### Use `uuid_v4()` + Alternatively, you can use the [`BYTES`]({% link {{ page.version.version }}/bytes.md %}) column with the `uuid_v4()` function as the default value: {% include_cached copy-clipboard.html %} @@ -72,6 +84,8 @@ In either case, generated IDs will be 128-bit, sufficiently large to generate un This approach has the disadvantage of creating a primary key that may not be useful in a query directly, which can require a join with another table or a secondary index. 
+#### Use `unique_rowid()` + If it is important for generated IDs to be stored in the same key-value range, you can use an [integer type]({% link {{ page.version.version }}/int.md %}) with the `unique_rowid()` [function]({% link {{ page.version.version }}/functions-and-operators.md %}#id-generation-functions) as the default value, either explicitly or via the [`SERIAL` pseudo-type]({% link {{ page.version.version }}/serial.md %}): {% include_cached copy-clipboard.html %} diff --git a/src/current/_includes/v25.3/faq/differences-between-numberings.md b/src/current/_includes/v25.3/faq/differences-between-numberings.md index 80f7fe26d50..95d1425f925 100644 --- a/src/current/_includes/v25.3/faq/differences-between-numberings.md +++ b/src/current/_includes/v25.3/faq/differences-between-numberings.md @@ -1,5 +1,5 @@ -| Property | UUID generated with `uuid_v4()` | INT generated with `unique_rowid()` | Sequences | +| Property | [UUID]({% link {{ page.version.version }}/uuid.md %}) generated with `uuid_v4()` | INT generated with `unique_rowid()` | Sequences | |--------------------------------------|-----------------------------------------|-----------------------------------------------|--------------------------------| | Size | 16 bytes | 8 bytes | 1 to 8 bytes | | Ordering properties | Unordered | Highly time-ordered | Highly time-ordered | diff --git a/src/current/v25.3/alter-table.md b/src/current/v25.3/alter-table.md index b2dd4c6a65f..7a64a0dca66 100644 --- a/src/current/v25.3/alter-table.md +++ b/src/current/v25.3/alter-table.md @@ -1852,7 +1852,7 @@ Suppose that you are storing the data for users of your application in a table c ); ~~~ -The primary key of this table is on the `name` column. This is a poor choice, as some users likely have the same name, and all primary keys enforce a `UNIQUE` constraint on row values of the primary key column. 
Per our [best practices]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#use-functions-to-generate-unique-ids), you should instead use a `UUID` for single-column primary keys, and populate the rows of the table with generated, unique values. +The primary key of this table is on the `name` column. This is a poor choice, as some users likely have the same name, and all primary keys enforce a `UNIQUE` constraint on row values of the primary key column. Per our [best practices]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#use-functions-to-generate-unique-ids), you should instead use a [`UUID`]({% link {{ page.version.version }}/uuid.md %}) for single-column primary keys, and populate the rows of the table with generated, unique values. You can add a column and change the primary key with a couple of `ALTER TABLE` statements: diff --git a/src/current/v25.3/create-sequence.md b/src/current/v25.3/create-sequence.md index e3bf4214fe9..e1f2346fdea 100644 --- a/src/current/v25.3/create-sequence.md +++ b/src/current/v25.3/create-sequence.md @@ -11,7 +11,7 @@ The `CREATE SEQUENCE` [statement]({% link {{ page.version.version }}/sql-stateme ## Considerations -- Using a sequence is slower than [auto-generating unique IDs with the `gen_random_uuid()`, `uuid_v4()` or `unique_rowid()` built-in functions]({% link {{ page.version.version }}/sql-faqs.md %}#how-do-i-auto-generate-unique-row-ids-in-cockroachdb). Incrementing a sequence requires a write to persistent storage, whereas auto-generating a unique ID does not. Therefore, use auto-generated unique IDs unless an incremental sequence is preferred or required. For more information, see [Unique ID best practices]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#unique-id-best-practices). 
+- Using a sequence is slower than [auto-generating unique IDs with the `gen_random_uuid()`, `uuid_v4()` or `unique_rowid()` built-in functions]({% link {{ page.version.version }}/sql-faqs.md %}#how-do-i-auto-generate-unique-row-ids-in-cockroachdb) and is likely to cause performance problems due to [hotspots]({% link {{ page.version.version }}/understand-hotspots.md %}). Incrementing a sequence requires a write to persistent storage, whereas auto-generating a unique ID does not. Therefore, use auto-generated unique IDs unless an incremental sequence is preferred or required. For more information, see [Unique ID best practices]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#unique-id-best-practices). - A column that uses a sequence can have a gap in the sequence values if a transaction advances the sequence and is then rolled back. Sequence updates are committed immediately and aren't rolled back along with their containing transaction. This is done to avoid blocking concurrent transactions that use the same sequence. - {% include {{page.version.version}}/performance/use-hash-sharded-indexes.md %} - By default, you cannot create sequences that are [owned by]({% link {{ page.version.version }}/security-reference/authorization.md %}#object-ownership) columns in tables in other databases. You can enable such sequence creation by setting the `sql.cross_db_sequence_owners.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) to `true`. 
diff --git a/src/current/v25.3/hash-sharded-indexes.md b/src/current/v25.3/hash-sharded-indexes.md index 9ee74d592a8..7151690e19b 100644 --- a/src/current/v25.3/hash-sharded-indexes.md +++ b/src/current/v25.3/hash-sharded-indexes.md @@ -170,3 +170,4 @@ You can specify a different `bucket_count` via a storage parameter on a hash-sha - [Indexes]({% link {{ page.version.version }}/indexes.md %}) - [`CREATE INDEX`]({% link {{ page.version.version }}/create-index.md %}) +- [`UUID`]({% link {{ page.version.version }}/uuid.md %}) diff --git a/src/current/v25.3/liquibase.md b/src/current/v25.3/liquibase.md index b3024c0a791..673fe06f22c 100644 --- a/src/current/v25.3/liquibase.md +++ b/src/current/v25.3/liquibase.md @@ -332,7 +332,7 @@ Liquibase does not [retry transactions]({% link {{ page.version.version }}/trans Suppose that you want to change the primary key of the `accounts` table from a simple, incrementing [integer]({% link {{ page.version.version }}/int.md %}) (in this case, `id`) to an auto-generated [UUID]({% link {{ page.version.version }}/uuid.md %}), to follow some [CockroachDB best practices]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#unique-id-best-practices). You can make these changes to the schema by creating and executing an additional changeset: -1. Create a SQL file to add a new UUID-typed column to the table: +1. Create a SQL file to add a new [UUID]({% link {{ page.version.version }}/uuid.md %})-typed column to the table: {% include_cached copy-clipboard.html %} ~~~ shell diff --git a/src/current/v25.3/movr-flask-application.md b/src/current/v25.3/movr-flask-application.md index c3b8f437a75..68b2a10b900 100644 --- a/src/current/v25.3/movr-flask-application.md +++ b/src/current/v25.3/movr-flask-application.md @@ -110,7 +110,7 @@ The `User` class has the following attributes: - `__tablename__`, which holds the stored name of the table in the database. 
SQLAlchemy requires this attribute for all classes that map to tables. - All of the other attributes of the `User` class (`id`, `city`, `first_name`, etc.), stored as `Column` objects. These attributes represent columns of the `users` table. The constructor for each `Column` takes the column data type as its first argument, and then any additional arguments, such as `primary_key`. -- To help define column objects, SQLAlchemy also includes classes for SQL data types and column constraints. For the columns in this table, we use `UUID` and `String` data types. +- To help define column objects, SQLAlchemy also includes classes for SQL data types and column constraints. For the columns in this table, we use [`UUID`]({% link {{ page.version.version }}/uuid.md %}) and `String` data types. - The `__repr__` function, which defines the string representation of the object. #### The `Vehicle` class diff --git a/src/current/v25.3/performance-best-practices-overview.md b/src/current/v25.3/performance-best-practices-overview.md index 1b88e654f6f..a3e810a84c7 100644 --- a/src/current/v25.3/performance-best-practices-overview.md +++ b/src/current/v25.3/performance-best-practices-overview.md @@ -64,12 +64,7 @@ When a table is created, all columns are stored as a single column family. This ## Unique ID best practices -The best practices for generating unique IDs in a distributed database like CockroachDB are very different than for a legacy single-node database. Traditional approaches for generating unique IDs for legacy single-node databases include: - -1. Using the [`SERIAL`]({% link {{ page.version.version }}/serial.md %}) pseudo-type for a column to generate random unique IDs. This can result in a performance bottleneck because IDs generated temporally near each other have similar values and are located physically near each other in a table's storage. -1. 
Generating monotonically increasing [`INT`]({% link {{ page.version.version }}/int.md %}) IDs by using transactions with roundtrip [`SELECT`]({% link {{ page.version.version }}/select-clause.md %})s, e.g., `INSERT INTO tbl (id, …) VALUES ((SELECT max(id)+1 FROM tbl), …)`. This has a **very high performance cost** since it makes all [`INSERT`]({% link {{ page.version.version }}/insert.md %}) transactions wait for their turn to insert the next ID. You should only do this if your application really does require strict ID ordering. In some cases, using [change data capture (CDC)]({% link {{ page.version.version }}/change-data-capture-overview.md %}) can help avoid the requirement for strict ID ordering. If you can avoid the requirement for strict ID ordering, you can use one of the higher-performance ID strategies outlined in the following sections. - -The preceding approaches are likely to create [hotspots](#hotspots) for both reads and writes in CockroachDB. {% include {{page.version.version}}/performance/use-hash-sharded-indexes.md %} +The best practices for generating unique IDs in a distributed database like CockroachDB are very different than for a legacy single-node database. To create unique and non-sequential IDs, we recommend the following approaches (listed in order from best to worst performance): @@ -79,6 +74,8 @@ To create unique and non-sequential IDs, we recommend the following approaches ( | 2. [Use functions to generate unique IDs](#use-functions-to-generate-unique-ids) | Good performance; spreads load well; easy choice | May leave some performance on the table; requires other columns to be useful in queries | | 3. 
[Use `INSERT` with the `RETURNING` clause](#use-insert-with-the-returning-clause-to-generate-unique-ids) | Easy to query against; familiar design | Slower performance than the other options; higher chance of [transaction contention](#transaction-contention) | +Traditional approaches using monotonically increasing [`INT`]({% link {{ page.version.version }}/int.md %}) or [`SERIAL`]({% link {{ page.version.version }}/serial.md %}) data types will create [hotspots](#hotspots) for both reads and writes in a distributed database like CockroachDB. {% include {{page.version.version}}/performance/use-hash-sharded-indexes.md %} + ### Use multi-column primary keys A well-designed multi-column primary key can yield even better performance than a [UUID primary key](#use-functions-to-generate-unique-ids), but it requires more up-front schema design work. To get the best performance, ensure that any monotonically increasing field is located **after** the first column of the primary key. When done right, such a composite primary key should result in: @@ -155,7 +152,7 @@ Time: 1ms total (execution 1ms / network 0ms) Note that the above query also follows the [indexing best practice]({% link {{ page.version.version }}/indexes.md %}#best-practices) of indexing all columns in the `WHERE` clause. -### Use functions to generate unique IDs +### Use functions to automatically generate unique IDs {% include {{ page.version.version }}/faq/auto-generate-unique-ids.md %} diff --git a/src/current/v25.3/row-level-security.md b/src/current/v25.3/row-level-security.md index 5791c16d78e..ba9545566b0 100644 --- a/src/current/v25.3/row-level-security.md +++ b/src/current/v25.3/row-level-security.md @@ -404,7 +404,7 @@ GRANT SELECT, INSERT, UPDATE, DELETE ON invoices TO app_dev; Each application will need to set the tenant context for the session. In this example, you will use the `application_name` session variable to pass in a tenant ID that will later be extracted from the variable. 
-Specifically, the UUID following the period in `application_name` is the tenant ID. We will use the `current_setting()` function in our RLS policies to extract the ID. +Specifically, the [UUID]({% link {{ page.version.version }}/uuid.md %}) following the period in `application_name` is the tenant ID. We will use the `current_setting()` function in our RLS policies to extract the ID. {{site.data.alerts.callout_danger}} For multi-tenancy to work correctly, this setting **must** be reliably managed by the application layer and passed in the connection string. diff --git a/src/current/v25.3/schema-design-table.md b/src/current/v25.3/schema-design-table.md index dd32022260d..01b774cf40e 100644 --- a/src/current/v25.3/schema-design-table.md +++ b/src/current/v25.3/schema-design-table.md @@ -288,9 +288,9 @@ For detailed reference documentation for each supported constraint, see [the con To set default values on columns, use the `DEFAULT` constraint. Default values enable you to write queries without the need to specify values for every column. -When combined with [supported SQL functions]({% link {{ page.version.version }}/functions-and-operators.md %}), default values can save resources in your application's persistence layer by offloading computation onto CockroachDB. For example, rather than using an application library to generate unique `UUID` values, you can set a default value to be an automatically-generated `UUID` value with the `gen_random_uuid()` SQL function. Similarly, you could use a default value to populate a `TIMESTAMP` column with the current time of day, using the `now()` function. +When combined with [supported SQL functions]({% link {{ page.version.version }}/functions-and-operators.md %}), default values can save resources in your application's persistence layer by offloading computation onto CockroachDB. 
For example, rather than using an application library to generate unique [`UUID`]({% link {{ page.version.version }}/uuid.md %}) values, you can set a default value to be an automatically-generated `UUID` value with the `gen_random_uuid()` SQL function. Similarly, you could use a default value to populate a `TIMESTAMP` column with the current time of day, using the `now()` function. -For example, in the `vehicles` table definition in `max_init.sql`, you added a `DEFAULT gen_random_uuid()` clause to the `id` column definition. This set the default value to a generated `UUID` value. Now, add a default value to the `creation_time` column: +For example, in the `vehicles` table definition in `max_init.sql`, you added a `DEFAULT gen_random_uuid()` clause to the `id` column definition. This set the default value to a generated [`UUID`]({% link {{ page.version.version }}/uuid.md %}) value. Now, add a default value to the `creation_time` column: {% include_cached copy-clipboard.html %} ~~~ sql diff --git a/src/current/v25.3/serial.md b/src/current/v25.3/serial.md index 9a3a74c7eab..5f1fd62c323 100644 --- a/src/current/v25.3/serial.md +++ b/src/current/v25.3/serial.md @@ -242,3 +242,4 @@ When we insert rows without values in column `a` and display the new rows, we se - [FAQ: How do I auto-generate unique row IDs in CockroachDB?]({% link {{ page.version.version }}/sql-faqs.md %}#how-do-i-auto-generate-unique-row-ids-in-cockroachdb) - [Data Types]({% link {{ page.version.version }}/data-types.md %}) +- [`UUID`]({% link {{ page.version.version }}/uuid.md %}) diff --git a/src/current/v25.3/understand-hotspots.md b/src/current/v25.3/understand-hotspots.md index 154fdf93d05..b418f7e2091 100644 --- a/src/current/v25.3/understand-hotspots.md +++ b/src/current/v25.3/understand-hotspots.md @@ -141,7 +141,7 @@ On this page, the phrase _index hotspot_ will be reserved for a hot by write hot #### Resolving index hotspots -The resolution of the index hotspot often depends on your 
requirements for the data. If the sequential nature of the index serves no purpose, it is recommended to change the writes into the index to be randomly distributed. Ideally, primary keys in this instance would be set to `UUID`s, if your tolerance for swapover or even downtime allows it. +The resolution of the index hotspot often depends on your requirements for the data. If the sequential nature of the index serves no purpose, it is recommended to change the writes into the index to be randomly distributed. Ideally, primary keys in this instance would be set to [`UUID`]({% link {{ page.version.version }}/uuid.md %})s, if your tolerance for swapover or even downtime allows it. If inserting in sequential order is important, the index itself can be [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), which means that it is still stored in order, albeit in some number of shards. Consider a `users` table, with a primary key `id INT`, which is hash-sharded with 4 shards, and a hashing function of modulo 4. 
The following image illustrates this example: @@ -339,4 +339,4 @@ For a demo on hotspot reduction, watch the following video: - [Detect Hotspots]({% link {{ page.version.version }}/detect-hotspots.md %}) - [Performance Tuning Recipes: Hotspots]({% link {{ page.version.version }}/performance-recipes.md %}#hotspots) -- [Single hot node]({% link {{ page.version.version }}/query-behavior-troubleshooting.md %}#single-hot-node) \ No newline at end of file +- [Single hot node]({% link {{ page.version.version }}/query-behavior-troubleshooting.md %}#single-hot-node) diff --git a/src/current/v25.3/uuid.md b/src/current/v25.3/uuid.md index 5d097099a99..b21c5246dfd 100644 --- a/src/current/v25.3/uuid.md +++ b/src/current/v25.3/uuid.md @@ -5,10 +5,17 @@ toc: true docs_area: reference.sql --- -The `UUID` (Universally Unique Identifier) [data type]({% link {{ page.version.version }}/data-types.md %}) stores a 128-bit value that is [unique across both space and time](https://www.ietf.org/rfc/rfc4122.txt). +The `UUID` (Universally Unique Identifier) [data type]({% link {{ page.version.version }}/data-types.md %}) implements the UUIDv4 format from [RFC 4122](https://www.ietf.org/rfc/rfc4122.txt). It stores a 128-bit value that is "unique across both space and time, with respect to the space of all UUIDs" as specified by the RFC. + +To auto-generate UUIDs: + +- Use the `gen_random_uuid()` function as the default value of a `UUID` column. +- Use the `uuid_v4()` function as the default value of a [`BYTES`]({% link {{ page.version.version }}/bytes.md %}) column. + +For examples, refer to [Create a table with auto-generated unique row IDs](#create-a-table-with-auto-generated-unique-row-ids). {{site.data.alerts.callout_success}} -To auto-generate unique row identifiers, use [`UUID`]({% link {{ page.version.version }}/uuid.md %}) with the `gen_random_uuid()` function as the default value. 
See the [example](#create-a-table-with-auto-generated-unique-row-ids) below for more details. +{% include {{ page.version.version }}/sql/use-uuidv4.md %} {{site.data.alerts.end}} ## Syntax @@ -17,12 +24,12 @@ You can express `UUID` values using the following formats: Format | Description -------|------------- -Standard [RFC4122](http://www.ietf.org/rfc/rfc4122.txt) format | Hyphen-separated groups of 8, 4, 4, 4, and 12 hexadecimal digits.<br><br>Example: `acde070d-8c4c-4f0d-9d8a-162843c10333` +Standard [RFC4122](http://www.ietf.org/rfc/rfc4122.txt) (UUIDv4) format | Hyphen-separated groups of 8, 4, 4, 4, and 12 hexadecimal digits.<br><br>Example: `acde070d-8c4c-4f0d-9d8a-162843c10333` `BYTES` | `UUID` value specified as a [`BYTES`]({% link {{ page.version.version }}/bytes.md %}) value.<br><br>Example: `b'kafef00ddeadbeed'` Uniform Resource Name | A [Uniform Resource Name (URN)](https://www.ietf.org/rfc/rfc2141.txt) specified as "urn:uuid:" followed by the [RFC4122](http://www.ietf.org/rfc/rfc4122.txt) format.<br><br>Example: `urn:uuid:63616665-6630-3064-6465-616462656564` Alternate PostgreSQL-supported formats | All [alternate `UUID` formats supported by PostgreSQL](https://www.postgresql.org/docs/current/datatype-uuid.html), including the [RFC4122](http://www.ietf.org/rfc/rfc4122.txt) format surrounded by braces, any supported format with upper-case digits, any supported format with some or all hyphens omitted, and any supported format with hyphens after any group of four digits.<br><br>Examples: `{acde070d-8c4c-4f0d-9d8a-162843c10333}`, `ACDE070D-8C4C-4f0D-9d8A-162843c10333`, `acde070d8c4c4f0d9d8a162843c10333`, `acde-070d-8c4c-4f0d-9d8a-1628-43c1-0333` -CockroachDB displays all `UUID` values in the standard [RFC4122](http://www.ietf.org/rfc/rfc4122.txt) format. +CockroachDB displays all `UUID` values in the standard [RFC4122](http://www.ietf.org/rfc/rfc4122.txt) format, and implements the UUIDv4 (random) version from the RFC. ## Size @@ -32,6 +39,10 @@ A `UUID` value is 128 bits in width, but the total storage size is likely to be ### Create a table with manually-entered `UUID` values +{{site.data.alerts.callout_success}} +{% include {{ page.version.version }}/sql/use-uuidv4.md %} +{{site.data.alerts.end}} + #### Create a table with `UUID` in standard [RFC4122](http://www.ietf.org/rfc/rfc4122.txt)-specified format {% include_cached copy-clipboard.html %}