Skip to content

introduce hash-based redaction#33

Open
arjunmahishi wants to merge 4 commits intomasterfrom
hash-redaction
Open

introduce hash-based redaction#33
arjunmahishi wants to merge 4 commits intomasterfrom
hash-redaction

Conversation

@arjunmahishi
Copy link
Contributor

@arjunmahishi arjunmahishi commented Nov 13, 2025

This commit introduces the ability to hash sensitive information instead of fully redacting it. This let's the user correlate entries while still preserving privacy. This is an opt-in feature and requires the user to explicitly mark values as "hashable".

Changes:

  • EnableHashing(salt) is used to turn on hash-based redaction. If this is
    not called, redaction behaves as before (full redaction).
  • If both SafeValue and HashValue interfaces are implemented on a type,
    SafeValue takes precedence.
  • Uses SHA-256 / HMAC-SHA256 (if salt is provided) to hash the sensitive
    parts of the string.
  • A dagger () prefix is added within the redaction markers to denote
    that it should be hashed instead of fully redacted.
  • There are 2 ways to mark a value "hashable" (similar to redaction):
    1. Implementing the HashValue interface directly on a type.
    2. Wrapping a value with the HashString/HashBytes/Hash* functions
  • Hashes are calculated at formatting time and not during string
    creation. Just like redaction.

Example usage

  • Implementing the HashValue interface:

    type UserID string
    func (u UserID) HashValue() string {} // makes the type "hashable"
    
    fmt.Println(redact.Sprintf("%s", UserID("alice")).Redact()) // Output: ‹522b276a›
  • Inline with helper functions:

    fmt.Println(redact.Sprintf("%s", redact.HashString("alice")).Redact()) // Output: ‹522b276a›

Benchmark of hashing enabled vs disabled

image

This change is Reviewable

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces hash-based redaction as an opt-in feature that allows correlating log entries while preserving privacy. When enabled, sensitive values marked with the HashValue interface are hashed (using SHA-1/HMAC-SHA1) instead of being fully redacted, with the hash output denoted by a dagger (†) prefix in redaction markers.

Key Changes:

  • Adds EnableHashing(salt) API to enable hash-based redaction with optional HMAC salt
  • Introduces HashValue interface and wrapper types (HashString, HashInt, etc.) for marking values as hashable
  • Implements hash computation logic using SHA-1/HMAC-SHA1 with truncation to 8 hex characters

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
api.go Exports new HashValue types and EnableHashing function to public API
interfaces/interfaces.go Defines HashValue marker interface and concrete wrapper types for hashable values
internal/markers/constants.go Adds HashPrefix constant (†) and updates regex patterns to handle hash markers
internal/markers/hash.go Implements hash computation logic with SHA-1/HMAC-SHA1 and global state management
internal/markers/hash_test.go Adds unit tests for low-level hash functions with various inputs and salt configurations
internal/markers/markers.go Updates Redact() methods to detect and process hash markers (‹†value›)
internal/rfmt/print.go Adds HashValue interface detection in three locations to wrap values with hash markers during formatting

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@arjunmahishi arjunmahishi force-pushed the hash-redaction branch 2 times, most recently from 53b3995 to 365e09b Compare November 13, 2025 13:55
Copy link
Contributor

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I will review more thoroughly next week. Thanks for working on this.

This commit introduces the ability to hash sensitive information instead
of fully redacting it. This let's the user correlate entries while still
preserving privacy. This is an opt-in feature and requires the user to explicitly mark
values as "hashable".

Changes:
  * EnableHashing(salt) is used to turn on hash-based redaction. If this is
    not called, redaction behaves as before (full redaction).
  * If both SafeValue and HashValue interfaces are implemented on a type,
    SafeValue takes precedence.
  * Uses SHA-1 / HMAC-SHA1 (if salt is provided) to hash the sensitive
    parts of the string.
  * A dagger (†) prefix is added within the redaction markers to denote
    that it should be hashed instead of fully redacted.
  * There are 2 ways to mark a value "hashable" (similar to redaction): 
    1) Implementing the HashValue interface directly on a type.
    2) Wrapping a value with the HashString/HashBytes/Hash* functions
  * Hashes are calculated at formatting time and not during string
    creation. Just like redaction.

**Example usage**

  * Implementing the HashValue interface:

    ```go 
    type UserID string
    func (u UserID) HashValue() string {} // makes the type "hashable"

    fmt.Println(redact.Sprintf("%s", UserID("alice")).Redact()) // Output: ‹522b276a›
    ```

  * Inline with helper functions: 
  
    ```go 
    fmt.Println(redact.Sprintf("%s", redact.HashString("alice")).Redact()) // Output: ‹522b276a›
    ```
@arjunmahishi
Copy link
Contributor Author

Ack. Will address the comments end of the week.

@cockroachlabs-cla-agent
Copy link

cockroachlabs-cla-agent bot commented Feb 18, 2026

CLA assistant check
All committers have signed the CLA.

Copy link
Contributor

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR also contains no tests for any redactable and hashed strings and mixed hashed redactable stuff, etc. Huge test coverage gap.

Copy link

@aa-joshi aa-joshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aa-joshi reviewed 4 files and made 3 comments.
Reviewable status: 4 of 9 files reviewed, 17 unresolved discussions (waiting on Abhinav1299, arjunmahishi, and dhartunian).


internal/markers/hash.go line 1 at r4 (raw file):

// Copyright 2025 The Cockroach Authors.

nit: 2026


internal/markers/markers.go line 94 at r4 (raw file):

// they are redacted like regular markers.
func (s RedactableBytes) Redact() RedactableBytes {
	result := ReStripSensitive.ReplaceAllFunc([]byte(s), func(match []byte) []byte {

same as here: https://reviewable.io/reviews/cockroachdb/redact/33#gh-2823788809

Copy link
Contributor

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. Just a few more edge cases and then we should be good go. Thanks!

// TestHashRedact_RedactableBytesDisabled verifies bytes-path behavior when hashing is disabled.
func TestHashRedact_RedactableBytesDisabled(t *testing.T) {
defer DisableHashing()
DisableHashing()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we disable hashing twice?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be only one.
the defer DisableHashing() was a boilerplate I used for all testcases and then applied the hashing state which was needed as per test. In this test it was specifically testing a disbabled state, so the defer is not needed.

}
}

// TestHashRedact_WithSalt verifies that salted hashing produces different output than unsalted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: These comments on top of each test are noise.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed them and kept only the ones required.

}

// TestHashRedact_ToggleTransitions verifies Enable -> Disable -> Enable transitions.
func TestHashRedact_ToggleTransitions(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test that concurrently enables/disables hashing from multiple goroutines? that will help test the concurrency model.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added!

EnableHashing(nil)
defer DisableHashing()

s := Sprintf("user=%s action=%s", HashString("alice"), "login")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use a string like %s%s with first a hashed and then a non-hashed value, they'll get combined together because of an optimization that combines adjacent redaction zones, that will hash the combined value because the hash marker will apply to both. See if we can avoid that.

Copy link

@visheshbardia visheshbardia Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified startRedactable() to support this and added test cases as well for multiple scenarios under this.

EscapeMark = '?'
EscapeMarkS = string(EscapeMark)
RedactedS = StartS + "×" + EndS
HashPrefix = '†'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this marker needs to be escaped in InternalEscapeBytes so that it doesn't force hash behavior in customer-controlled text. EscapeMarkers does this correctly because it uses ReStripMarkers, but InternalEscapeBytes is different.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, InternalEscapeBytes should also escape leading for consistency with EscapeMarkers. I have added the changes for it along with test cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants