
Reduce allocations in LicenseExpressionTokenizer.HasValidCharacters by caching Regex instance #7237

Open
nareshjo wants to merge 1 commit into NuGet:dev from nareshjo:LicenseExpressionTokenizer-Regex-Alloc


@nareshjo
Contributor

🤖 AI-Generated Pull Request 🤖

This pull request was generated by the VS Perf Rel AI Agent. Please review this AI-generated PR with extra care! For more information, visit our wiki. Please share feedback with TIP Insights


  • Issue: LicenseExpressionTokenizer.HasValidCharacters() constructs a new Regex(...) on every invocation using a hardcoded constant pattern string. Each Regex constructor call triggers internal parsing that allocates RegexParser, RegexTree, RegexNode, RegexCharClass, RegexCode, StringBuilder, and other objects (~12–15 short-lived objects per call).

    This method is called once per license expression parse via the call chain: SearchObject.ProcessSearchResultsAsync → PackageSearchMetadataCacheItem → PackageSearchMetadataContextInfo.Create → PackageSearchMetadata.get_LicenseMetadata → NuGetLicenseExpressionParser.Parse → GetTokens → LicenseExpressionTokenizer.HasValidCharacters.
    For a typical NuGet package search, this produces hundreds of Regex constructions, each allocating ~12–15 short-lived internal objects. Together they contribute measurable GC pressure on a path that the allocation telemetry specifically flagged.

    | Allocation site | Original (per search) | After fix (per search) |
    | --- | --- | --- |
    | new Regex(...) internal objects | ~12–15 × N (N = license parses) | 1 (static, once per process) |

    This matches the allocation stack trace showing StringBuilder allocated inside Regex..ctor called by HasValidCharacters during NuGet package search:

nuget.packagemanagement.visualstudio.dll!SearchObject+<ProcessSearchResultsAsync>d__.MoveNext
nuget.packagemanagement.visualstudio.dll!SearchObject.CacheBackgroundData
nuget.packagemanagement.visualstudio.dll!PackageSearchMetadataCacheItem..ctor
nuget.packagemanagement.visualstudio.dll!PackageSearchMetadataCacheItem.GetVersionInfoContextInfoAsync
nuget.visualstudio.internal.contracts.dll!VersionInfoContextInfo+<CreateAsync>d__.MoveNext
nuget.visualstudio.internal.contracts.dll!PackageSearchMetadataContextInfo.Create
nuget.protocol.dll!PackageSearchMetadata.get_LicenseMetadata
nuget.packaging.dll!NuGetLicenseExpressionParser.Parse
nuget.packaging.dll!NuGetLicenseExpressionParser.GetTokens
nuget.packaging.dll!LicenseExpressionTokenizer.HasValidCharacters
└── system.dll!Regex..ctor
    └── system.dll!RegexParser.Parse → RegexParser.ScanCharClass
        └── RegexCharClass..ctor → TypeAllocated!System.Text.StringBuilder
  • Issue type: Reduce repeated identical allocations from per-call Regex construction on a hot path

  • Proposed fix: Promote the per-call Regex local variable to a private static readonly Regex field with RegexOptions.CultureInvariant. The pattern and matching behavior are identical; Regex.IsMatch is documented thread-safe on constructed instances. RegexOptions.Compiled is intentionally omitted — the pattern is a trivial single character-class match where interpreted mode is sub-microsecond, and Compiled would add a non-collectible DynamicMethod on .NET Framework 4.7.2 with no measurable benefit for this pattern complexity and call frequency.

    This follows the existing convention in the codebase: PackageIdValidator.IdRegex is likewise declared as a private static readonly Regex field and reused for package ID validation.
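
For reviewers who want to see the shape of the change, here is a minimal sketch of the proposed fix. It is not the actual diff. The class name `LicenseCharacterCheck`, the field name `AllowedCharacters`, and the pattern string are assumptions chosen for illustration; the real pattern is the constant already hardcoded in HasValidCharacters.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for LicenseExpressionTokenizer; only the character check is sketched.
public sealed class LicenseCharacterCheck
{
    // Assumed pattern, for illustration only: letters, digits, '.', '-', '+',
    // parentheses, and whitespace.
    private const string AllowedCharactersPattern = @"^[a-zA-Z0-9\.\-\s\+\(\)]+$";

    // Constructed once when the type is initialized and shared by all instances.
    // RegexOptions.Compiled is deliberately not used, per the PR description.
    private static readonly Regex AllowedCharacters =
        new Regex(AllowedCharactersPattern, RegexOptions.CultureInvariant);

    private readonly string _value;

    public LicenseCharacterCheck(string value)
    {
        _value = value ?? throw new ArgumentNullException(nameof(value));
    }

    // Previously a new Regex was built here on every call; now the cached
    // instance is reused, so the call itself constructs no Regex objects.
    public bool HasValidCharacters() => AllowedCharacters.IsMatch(_value);
}
```

Under these assumptions, `new LicenseCharacterCheck("MIT OR Apache-2.0").HasValidCharacters()` returns true, while an expression containing a character outside the class, such as `;`, returns false. The match result is the same as with a per-call Regex because the pattern and options are unchanged; only the lifetime of the Regex object differs.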

Best practices wiki
See related failure in PRISM
ADO work item

@nareshjo nareshjo requested a review from a team as a code owner March 26, 2026 18:01
@nareshjo nareshjo requested review from martinrrm and nkolev92 March 26, 2026 18:01
@dotnet-policy-service dotnet-policy-service bot added the Community PRs created by someone not in the NuGet team label Mar 26, 2026

3 participants