From 1885cd18a2956771a031e6665497f090fd69b979 Mon Sep 17 00:00:00 2001 From: Ryan Kuo Date: Wed, 30 Jul 2025 12:11:34 -0400 Subject: [PATCH 1/5] CITEXT data type --- src/current/v25.3/citext.md | 122 ++++++++++++++++++++++++++++++++ src/current/v25.3/data-types.md | 1 + src/current/v25.3/string.md | 1 + 3 files changed, 124 insertions(+) create mode 100644 src/current/v25.3/citext.md diff --git a/src/current/v25.3/citext.md b/src/current/v25.3/citext.md new file mode 100644 index 00000000000..293f2c21fdb --- /dev/null +++ b/src/current/v25.3/citext.md @@ -0,0 +1,122 @@ +--- +title: CITEXT +summary: The CITEXT data type stores case-insensitive text values. +toc: true +docs_area: reference.sql +--- + +The `CITEXT` [data type]({% link {{ page.version.version }}/data-types.md %}) stores case-insensitive strings. + +All `CITEXT` values are folded to lowercase before comparison. This is handled internally with the [`lower()`]({% link {{ page.version.version }}/functions-and-operators.md %}#string-and-byte-functions) function. + +For example, the following comparison evaluates to `true`: + +{% include_cached copy-clipboard.html %} +~~~ sql +SELECT 'Roach'::CITEXT = 'roach'::CITEXT; +~~~ + +~~~ + ?column? +------------ + t +~~~ + +With `CITEXT`, equality operators (`=`, `!=`, `<>`), ordering operators (`<`, `>`, etc.), and [`STRING` functions]({% link {{ page.version.version }}/functions-and-operators.md %}#string-and-byte-functions), treat values as case-insensitive by default. Refer to the [example](#example). + +Aside from comparisons, `CITEXT` behaves like [`STRING`]({% link {{ page.version.version }}/string.md %}). + +## Syntax + +To declare a `CITEXT` column, use the type name directly in your `CREATE TABLE` statement: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE TABLE logins ( + name CITEXT PRIMARY KEY, + email TEXT NOT NULL +); +~~~ + +## Size + +As with `STRING`, `CITEXT` values should be kept below 64 KB for best performance. Because `CITEXT` values are folded to lowercase on every comparison, `CITEXT` columns and indexes consume marginally more CPU and memory than their `STRING` equivalents, especially on write-heavy workloads. + +## Collations + +`CITEXT` compares values as a `STRING` column with the `und-u-ks-level2` [collation]({% link {{ page.version.version }}/collate.md %}), meaning it is case-insensitive but accent-sensitive. If you need accent-insensitive behavior, consider using `STRING` with a nondeterministic collation instead. + +## Example + +Create and populate a table: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE TABLE logins ( + username CITEXT, + email STRING +); +~~~ + +{% include_cached copy-clipboard.html %} +~~~ sql +INSERT INTO logins VALUES +('roach', 'roach@example.com'), +('juno', 'juno@example.com'); +~~~ + +Because `CITEXT` comparisons are case-insensitive, an equality predicate matches regardless of letter case: + +{% include_cached copy-clipboard.html %} +~~~ sql +SELECT * FROM logins WHERE username = 'Roach'; +~~~ + +~~~ + name | email +--------+-------------------- + roach | roach@example.com +(1 row) +~~~ + +An ordering comparison is also case-insensitive with `CITEXT`. In the following eaxmple, `'Xavi'` is folded to lowercase before the comparison: + +{% include_cached copy-clipboard.html %} +~~~ sql +SELECT username FROM logins WHERE username < 'Xavi'; +~~~ + +~~~ + username +------------ + roach + juno +(2 rows) +~~~ + +For case-sensitive comparisons on `CITEXT` values, cast to `STRING` explicitly. In the default Unicode ordering, the uppercase value is considered less than the lowercase values in the table: + +{% include_cached copy-clipboard.html %} +~~~ sql +SELECT username FROM logins WHERE username::STRING < 'Xavi'; +~~~ + +~~~ + username | email +-----------+-------- +(0 rows) +~~~ + +## Supported casting and conversion + +`CITEXT` values can be [cast]({% link {{ page.version.version }}/data-types.md %}#data-type-conversions-and-casts) to the following data types: + +Type | Details +-----|-------- +`STRING` | Preserves case information when casting to `STRING`. + +## See also + +- [`STRING`]({% link {{ page.version.version }}/string.md %}) +- [Data Types]({% link {{ page.version.version }}/data-types.md %}) +- [`COLLATE`]({% link {{ page.version.version }}/collate.md %}) diff --git a/src/current/v25.3/data-types.md b/src/current/v25.3/data-types.md index ab14f2bee51..b198a07c886 100644 --- a/src/current/v25.3/data-types.md +++ b/src/current/v25.3/data-types.md @@ -16,6 +16,7 @@ Type | Description | Example [`BOOL`]({% link {{ page.version.version }}/bool.md %}) | A Boolean value. | `true` [`BYTES`]({% link {{ page.version.version }}/bytes.md %}) | A string of binary characters. | `b'\141\061\142\062\143\063'` [`COLLATE`]({% link {{ page.version.version }}/collate.md %}) | The `COLLATE` feature lets you sort [`STRING`]({% link {{ page.version.version }}/string.md %}) values according to language- and country-specific rules, known as collations. | `'a1b2c3' COLLATE en` +[`CITEXT`]({% link {{ page.version.version }}/citext.md %}) | Case-insensitive text. | `'Roach'` [`DATE`]({% link {{ page.version.version }}/date.md %}) | A date. | `DATE '2016-01-25'` [`ENUM`]({% link {{ page.version.version }}/enum.md %}) | A user-defined data type comprised of a set of static values. | `ENUM ('club, 'diamond', 'heart', 'spade')` [`DECIMAL`]({% link {{ page.version.version }}/decimal.md %}) | An exact, fixed-point number. | `1.2345` diff --git a/src/current/v25.3/string.md b/src/current/v25.3/string.md index 7fea14d9eca..ee7964b4723 100644 --- a/src/current/v25.3/string.md +++ b/src/current/v25.3/string.md @@ -135,6 +135,7 @@ Type | Details `BIT` | Requires supported [`BIT`]({% link {{ page.version.version }}/bit.md %}) string format, e.g., `'101001'` or `'xAB'`. `BOOL` | Requires supported [`BOOL`]({% link {{ page.version.version }}/bool.md %}) string format, e.g., `'true'`. `BYTES` | For more details, [see here]({% link {{ page.version.version }}/bytes.md %}#supported-conversions). +`CITEXT` | Preserves the original letter case, but value comparisons are treated case-insensitively. Refer to [`CITEXT`]({% link {{ page.version.version }}/citext.md %}). `DATE` | Requires supported [`DATE`]({% link {{ page.version.version }}/date.md %}) string format, e.g., `'2016-01-25'`. `DECIMAL` | Requires supported [`DECIMAL`]({% link {{ page.version.version }}/decimal.md %}) string format, e.g., `'1.1'`. `FLOAT` | Requires supported [`FLOAT`]({% link {{ page.version.version }}/float.md %}) string format, e.g., `'1.1'`. From a09973fd277f1d563e5f6db48d7ec747ec268b0a Mon Sep 17 00:00:00 2001 From: Ryan Kuo Date: Wed, 30 Jul 2025 12:20:23 -0400 Subject: [PATCH 2/5] add callout --- src/current/v25.3/citext.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/src/current/v25.3/citext.md b/src/current/v25.3/citext.md index 293f2c21fdb..9f868a55e21 100644 --- a/src/current/v25.3/citext.md +++ b/src/current/v25.3/citext.md @@ -24,6 +24,10 @@ SELECT 'Roach'::CITEXT = 'roach'::CITEXT; With `CITEXT`, equality operators (`=`, `!=`, `<>`), ordering operators (`<`, `>`, etc.), and [`STRING` functions]({% link {{ page.version.version }}/functions-and-operators.md %}#string-and-byte-functions), treat values as case-insensitive by default. Refer to the [example](#example). +{{site.data.alerts.callout_info}} +Unlike in PostgreSQL, `CITEXT` on CockroachDB is case-insensitive for all Unicode characters that have a defined uppercase/lowercase mapping (e.g., `'É' = 'é'`). +{{site.data.alerts.end}} + Aside from comparisons, `CITEXT` behaves like [`STRING`]({% link {{ page.version.version }}/string.md %}). ## Syntax From 4ab784c0a9d3224c2cbb8049a4e7da66facf32df Mon Sep 17 00:00:00 2001 From: Ryan Kuo Date: Wed, 30 Jul 2025 18:54:03 -0400 Subject: [PATCH 3/5] address reviewer comments; simplify doc --- src/current/v25.3/citext.md | 62 +++++++++++++------------------------ 1 file changed, 21 insertions(+), 41 deletions(-) diff --git a/src/current/v25.3/citext.md b/src/current/v25.3/citext.md index 9f868a55e21..b18709f6932 100644 --- a/src/current/v25.3/citext.md +++ b/src/current/v25.3/citext.md @@ -5,31 +5,14 @@ toc: true docs_area: reference.sql --- -The `CITEXT` [data type]({% link {{ page.version.version }}/data-types.md %}) stores case-insensitive strings. +The `CITEXT` [data type]({% link {{ page.version.version }}/data-types.md %}) represents a case-insensitive string. Like `STRING` values, `CITEXT` values preserve their casing when stored and retrieved. Unlike `STRING` values, comparisons between `CITEXT` values are case-insensitive for all Unicode characters that have a defined uppercase/lowercase mapping (e.g., `'É' = 'é'`). -All `CITEXT` values are folded to lowercase before comparison. This is handled internally with the [`lower()`]({% link {{ page.version.version }}/functions-and-operators.md %}#string-and-byte-functions) function. +Equality operators (`=`, `!=`, `<>`) and ordering operators (`<`, `>`, etc.) treat `CITEXT` values as case-insensitive by default. Refer to the [example](#example). -For example, the following comparison evaluates to `true`: - -{% include_cached copy-clipboard.html %} -~~~ sql -SELECT 'Roach'::CITEXT = 'roach'::CITEXT; -~~~ - -~~~ - ?column? ------------- - t -~~~ - -With `CITEXT`, equality operators (`=`, `!=`, `<>`), ordering operators (`<`, `>`, etc.), and [`STRING` functions]({% link {{ page.version.version }}/functions-and-operators.md %}#string-and-byte-functions), treat values as case-insensitive by default. Refer to the [example](#example). - -{{site.data.alerts.callout_info}} -Unlike in PostgreSQL, `CITEXT` on CockroachDB is case-insensitive for all Unicode characters that have a defined uppercase/lowercase mapping (e.g., `'É' = 'é'`). +{{site.data.alerts.callout_success}} +`CITEXT` compares values as a `STRING` column with the `und-u-ks-level2` [collation]({% link {{ page.version.version }}/collate.md %}), meaning it is case-insensitive but accent-sensitive. {{site.data.alerts.end}} -Aside from comparisons, `CITEXT` behaves like [`STRING`]({% link {{ page.version.version }}/string.md %}). - ## Syntax To declare a `CITEXT` column, use the type name directly in your `CREATE TABLE` statement: @@ -44,11 +27,7 @@ CREATE TABLE logins ( ## Size -As with `STRING`, `CITEXT` values should be kept below 64 KB for best performance. Because `CITEXT` values are folded to lowercase on every comparison, `CITEXT` columns and indexes consume marginally more CPU and memory than their `STRING` equivalents, especially on write-heavy workloads. - -## Collations - -`CITEXT` compares values as a `STRING` column with the `und-u-ks-level2` [collation]({% link {{ page.version.version }}/collate.md %}), meaning it is case-insensitive but accent-sensitive. If you need accent-insensitive behavior, consider using `STRING` with a nondeterministic collation instead. +As with `STRING`, `CITEXT` values should be kept below 64 KB for best performance. Because `CITEXT` values resort to a collation engine on every comparison, `CITEXT` columns and indexes consume marginally more CPU and memory than their `STRING` equivalents. ## Example @@ -65,25 +44,25 @@ CREATE TABLE logins ( {% include_cached copy-clipboard.html %} ~~~ sql INSERT INTO logins VALUES -('roach', 'roach@example.com'), -('juno', 'juno@example.com'); +('Roach', 'Roach@example.com'), +('lincoln', 'lincoln@example.com'); ~~~ Because `CITEXT` comparisons are case-insensitive, an equality predicate matches regardless of letter case: {% include_cached copy-clipboard.html %} ~~~ sql -SELECT * FROM logins WHERE username = 'Roach'; +SELECT * FROM logins WHERE username = 'roach'; ~~~ ~~~ - name | email ---------+-------------------- - roach | roach@example.com + username | email +-----------+-------------------- + Roach | Roach@example.com (1 row) ~~~ -An ordering comparison is also case-insensitive with `CITEXT`. In the following eaxmple, `'Xavi'` is folded to lowercase before the comparison: +An ordering comparison is also case-insensitive with `CITEXT`: {% include_cached copy-clipboard.html %} ~~~ sql @@ -93,12 +72,12 @@ SELECT username FROM logins WHERE username < 'Xavi'; ~~~ username ------------ - roach - juno + Roach + lincoln (2 rows) ~~~ -For case-sensitive comparisons on `CITEXT` values, cast to `STRING` explicitly. In the default Unicode ordering, the uppercase value is considered less than the lowercase values in the table: +For case-sensitive comparisons on `CITEXT` values, cast to `STRING` explicitly. In the default Unicode ordering, an uppercase value is considered less than the lowercase value in the table: {% include_cached copy-clipboard.html %} ~~~ sql @@ -106,9 +85,10 @@ SELECT username FROM logins WHERE username::STRING < 'Xavi'; ~~~ ~~~ - username | email ------------+-------- -(0 rows) + username +------------ + Roach +(1 row) ~~~ ## Supported casting and conversion @@ -121,6 +101,6 @@ Type | Details ## See also -- [`STRING`]({% link {{ page.version.version }}/string.md %}) - [Data Types]({% link {{ page.version.version }}/data-types.md %}) -- [`COLLATE`]({% link {{ page.version.version }}/collate.md %}) +- [`STRING`]({% link {{ page.version.version }}/string.md %}) +- [`COLLATE`]({% link {{ page.version.version }}/collate.md %}) \ No newline at end of file From 8095ac424d10d99ed396e456681e38f198bf0da2 Mon Sep 17 00:00:00 2001 From: Ryan Kuo Date: Thu, 31 Jul 2025 16:24:04 -0400 Subject: [PATCH 4/5] address review comments --- src/current/v25.3/citext.md | 8 -------- 1 file changed, 8 deletions(-) diff --git a/src/current/v25.3/citext.md b/src/current/v25.3/citext.md index b18709f6932..fa1e63f28d1 100644 --- a/src/current/v25.3/citext.md +++ b/src/current/v25.3/citext.md @@ -91,14 +91,6 @@ SELECT username FROM logins WHERE username::STRING < 'Xavi'; (1 row) ~~~ -## Supported casting and conversion - -`CITEXT` values can be [cast]({% link {{ page.version.version }}/data-types.md %}#data-type-conversions-and-casts) to the following data types: - -Type | Details ------|-------- -`STRING` | Preserves case information when casting to `STRING`. - ## See also - [Data Types]({% link {{ page.version.version }}/data-types.md %}) From d4d2ca0d2f414b93c3e381703aed7d95427033da Mon Sep 17 00:00:00 2001 From: Ryan Kuo Date: Fri, 1 Aug 2025 10:57:10 -0400 Subject: [PATCH 5/5] review edits --- src/current/_includes/v25.3/sidebar-data/sql.json | 6 ++++++ src/current/v25.3/citext.md | 2 +- 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/src/current/_includes/v25.3/sidebar-data/sql.json b/src/current/_includes/v25.3/sidebar-data/sql.json index 8759eae97e6..e2bd29ee86e 100644 --- a/src/current/_includes/v25.3/sidebar-data/sql.json +++ b/src/current/_includes/v25.3/sidebar-data/sql.json @@ -1004,6 +1004,12 @@ "/${VERSION}/bytes.html" ] }, + { + "title": "CITEXT", + "urls": [ + "/${VERSION}/citext.html" + ] + }, { "title": "COLLATE", "urls": [ diff --git a/src/current/v25.3/citext.md b/src/current/v25.3/citext.md index fa1e63f28d1..3f91ae6429b 100644 --- a/src/current/v25.3/citext.md +++ b/src/current/v25.3/citext.md @@ -5,7 +5,7 @@ toc: true docs_area: reference.sql --- -The `CITEXT` [data type]({% link {{ page.version.version }}/data-types.md %}) represents a case-insensitive string. Like `STRING` values, `CITEXT` values preserve their casing when stored and retrieved. Unlike `STRING` values, comparisons between `CITEXT` values are case-insensitive for all Unicode characters that have a defined uppercase/lowercase mapping (e.g., `'É' = 'é'`). +The `CITEXT` [data type]({% link {{ page.version.version }}/data-types.md %}) represents a case-insensitive string. Like `STRING` values, `CITEXT` values preserve their casing when stored and retrieved. Unlike `STRING` values, comparisons between `CITEXT` values are case-insensitive for all [Unicode characters](https://en.wikipedia.org/wiki/List_of_Unicode_characters) that have a defined uppercase/lowercase mapping (e.g., `'É' = 'é'`). Equality operators (`=`, `!=`, `<>`) and ordering operators (`<`, `>`, etc.) treat `CITEXT` values as case-insensitive by default. Refer to the [example](#example).