Add per-source telemetry for package IDs containing non-ASCII characters #7213

Open
Copilot wants to merge 5 commits into dev from copilot/add-telemetry-package-id-validation

Copilot AI commented Mar 13, 2026

Adds a boolean property nupkgs.idcontainsnonasciicharacter to the PackageSourceDiagnostics telemetry event, indicating whether any package installed from that source had an ID containing characters outside [A-Za-z0-9.\-]. Since the event is already scoped per-source, this automatically correlates non-standard ID usage to specific feed providers.

Protocol layer

  • Added PackageId property to ProtocolDiagnosticNupkgCopiedEvent (nullable; new 3-param constructor, old 2-param delegates to it)
  • Updated all four raise-sites to pass the package ID:
    • FindPackagesByIdNupkgDownloader → identity.Id
    • LocalPackageArchiveDownloader → _packageIdentity.Id
    • LocalV3FindPackageByIdResource / LocalV2FindPackageByIdResource → id parameter
  • New public API declared in both PublicAPI.Unshipped.txt files
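
The constructor change described above can be sketched roughly as follows. This is an illustration only, not the code from the PR: the class name NupkgCopiedEventSketch is hypothetical, and the Source and FileSize members are assumptions about the shape of the existing event. The point it shows is that the old two-parameter constructor forwards a null package ID, so existing callers keep compiling and downstream code can tell that no ID was supplied.

```csharp
#nullable enable // needed only because this sketch stands alone outside the project

// Hypothetical stand-in for ProtocolDiagnosticNupkgCopiedEvent.
// Source and FileSize are assumed members, not copied from NuGet.Protocol.
public sealed class NupkgCopiedEventSketch
{
    public string Source { get; }
    public long FileSize { get; }

    // Null when the event was raised through the old two-parameter constructor.
    public string? PackageId { get; }

    // Old shape: delegates to the new constructor with no package ID.
    public NupkgCopiedEventSketch(string source, long fileSize)
        : this(source, fileSize, packageId: null)
    {
    }

    // New shape: raise-sites that know the package ID pass it here.
    public NupkgCopiedEventSketch(string source, long fileSize, string? packageId)
    {
        Source = source;
        FileSize = fileSize;
        PackageId = packageId;
    }
}
```

Keeping the two-parameter overload avoids a breaking change to the public API, at the cost of consumers having to handle a null ID.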

Telemetry layer (PackageSourceTelemetry)

  • Data class: new IdContainsNonAsciiCharacter bool, set to true on first non-standard ID seen per source (stays true once set)
  • Detection uses a zero-allocation ReadOnlySpan<char> character loop checking each char against the allowed ranges (A-Z, a-z, 0-9, ., -) — faster than a compiled Regex and produces no heap allocations
  • AddNupkgCopiedData: skips check when PackageId is null (events raised by old constructor)
  • ToTelemetryAsync: emits nupkgs.idcontainsnonasciicharacter alongside the existing nupkgs.copied / nupkgs.bytes
  • PropertyNames.Nupkgs.IdContainsNonAsciiCharacter = "nupkgs.idcontainsnonasciicharacter" added
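
A minimal sketch of the detection loop and the sticky per-source flag, under stated assumptions: PackageIdCheckSketch, SourceDataSketch, and Record are hypothetical names invented for this example, and the real Data class and AddNupkgCopiedData method may be structured differently. Only the allowed character set, the null skip, and the "stays true once set" behavior are taken from the description above.

```csharp
#nullable enable // needed only because this sketch stands alone outside the project
using System;

// Hypothetical helper; the PR's actual method name and location may differ.
public static class PackageIdCheckSketch
{
    // Returns true if any character falls outside A-Z, a-z, 0-9, '.', '-'.
    // Note that ASCII characters such as '_' or ' ' are flagged as well.
    public static bool ContainsNonStandardCharacter(ReadOnlySpan<char> id)
    {
        foreach (char c in id)
        {
            bool allowed =
                (c >= 'A' && c <= 'Z') ||
                (c >= 'a' && c <= 'z') ||
                (c >= '0' && c <= '9') ||
                c == '.' ||
                c == '-';

            if (!allowed)
            {
                return true;
            }
        }

        return false;
    }
}

// Hypothetical stand-in for the per-source Data class.
public sealed class SourceDataSketch
{
    public bool IdContainsNonAsciiCharacter { get; private set; }

    public void Record(string? packageId)
    {
        // Skip events without an ID, and skip the scan once the flag is set.
        if (packageId is null || IdContainsNonAsciiCharacter)
        {
            return;
        }

        IdContainsNonAsciiCharacter =
            PackageIdCheckSketch.ContainsNonStandardCharacter(packageId.AsSpan());
    }
}
```

Because the flag is only ever assigned while it is still false, a later all-standard ID cannot reset it, which matches the mixed-batch case covered by the tests.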

Tests

New AddNupkgCopiedData_* test cases covering: all-standard IDs → false; non-standard IDs (underscore, Unicode, space, @, +) → true; mixed batch with one non-standard → true; null PackageId → false. Existing ToTelemetry_WithData_CreatesTelemetryProperties extended to assert the new property is emitted.

Original prompt

Summary

Add telemetry to track whether a package ID contains characters other than A-Z, a-z, 0-9, ., or -. This will help correlate non-standard package ID usage to specific feed providers (e.g., nuget.org vs myget.org vs others).

Background

The existing telemetry infrastructure in PackageSourceTelemetry.cs already:

  • Logs a PackageSourceDiagnostics event per source that was used during restore/search
  • Identifies well-known feeds via GetMsFeed() (nuget.org, Azure DevOps, GitHub, VS Offline)
  • Logs the raw source URL as PII data (source.url)
  • Tracks nupkg copy events via ProtocolDiagnostics.NupkgCopiedEvent

Requirements

Add a new boolean telemetry property to the PackageSourceDiagnostics event that indicates whether any package installed from that source during the operation had a package ID containing characters outside of the set [A-Za-z0-9.\-] (i.e., anything but ASCII letters, digits, ., or -).

  • The property should be a bool named something like nupkgs.hasnonstandard_id (following the existing naming convention in PropertyNames)
  • It should be tracked per-source in the Data class
  • It should be populated when a nupkg is copied/installed (via ProtocolDiagnostics.NupkgCopiedEvent)
  • The ProtocolDiagnosticNupkgCopiedEvent already has access to the package identity/ID — check if it includes the package ID, and if not, determine the best way to get it (e.g., parse from the file path, or add it to the event)
  • The final telemetry value is emitted in ToTelemetryAsync via AddSourceProperties or a new helper method
  • Since the property correlates to a feed (via the existing source.msfeed or source.url properties already in the event), no additional feed correlation logic is needed — the existing per-source event structure already provides that correlation

Relevant Files

  • Primary file to modify: src/NuGet.Clients/NuGet.VisualStudio.Common/Telemetry/PackageSourceTelemetry.cs

    • Add a HasNonStandardId (or similar) bool field to the Data inner class
    • Add a constant for the new property name to PropertyNames.Nupkgs
    • Update AddNupkgCopiedData to detect non-standard characters in the package ID and set the flag on Data
    • Emit the new property in ToTelemetryAsync (in the nupkgs section)
    • Update AddSourceProperties or create a new helper to emit the new property
  • Check the event type: ProtocolDiagnosticNupkgCopiedEvent (in NuGet.Protocol) — verify if it carries the package ID. If not, determine how to obtain the package ID at the point the event fires.

  • Tests: Add unit tests in test/NuGet.Clients.Tests/NuGet.VisualStudio.Common.Test/Telemetry/PackageSourceTelemetryTests.cs covering:

    • Package IDs with only standard characters → property is false
    • Package IDs with non-standard characters (e.g., Unicode letters, underscores, etc.) → property is true
    • Multiple packages from same source: at least one non-standard → true

Non-Standard Character Definition

A package ID is considered to have a non-standard ID if it contains any character that does NOT match [A-Za-z0-9.\-]. Use a Regex or simple character-by-character check.

Note: Do NOT use #nullable enable directives — the project has nullable enabled at the project level. Do NOT use reflection.

Coding Standards

Follow the NuGet coding guidelines and match the style of the existing code in PackageSourceTelemetry.cs.

Pull Request Requirements

  • Follow the existing PR template
  • Format code before submitting

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: where is the code that logs telemetry of packages used during restore?

Assistant: The restore telemetry code is spread across a few key files. Here's a breakdown of where it all lives:


1. Core Telemetry Initialization & Package Counts — RestoreCommand.cs

This is the heart of the per-project restore telemetry. There are two main areas:

InitializeTelemetry() — Populates general restore metadata at the start of a restore:

private void InitializeTelemetry(TelemetryActivity telemetry, int httpSourcesCount, bool auditEnabled)
{
    telemetry.TelemetryEvent.AddPiiData(ProjectFilePath, _request.Project.FilePath);
    bool isPackageSourceMappingEnabled = _request.PackageSourceMapping?.IsEnabled ?? false;
    telemetry.TelemetryEvent[PackageSourceMappingIsMappingEnabled] = isPackageSourceMappingEnabled;
    telemetry.TelemetryEvent[SourcesCount] = _request.DependencyProviders.RemoteProviders.Count;
    telemetry.TelemetryEvent[HttpSourcesCount] = httpSourcesCount;
    telemetry.TelemetryEvent[LocalSourcesCount] = _request....

</details>



<!-- START COPILOT CODING AGENT SUFFIX -->

*This pull request was created from Copilot chat.*

Copilot AI changed the title from "[WIP] Add telemetry for non-standard package ID tracking" to "Add per-source telemetry for non-standard package ID characters" Mar 13, 2026
Copilot AI requested a review from jeffkl March 13, 2026 17:32
Copilot AI requested a review from jeffkl March 16, 2026 20:04
jeffkl changed the title from "Add per-source telemetry for non-standard package ID characters" to "Add per-source telemetry for package IDs containing non-ASCII characters" Mar 16, 2026
Copilot AI requested a review from jeffkl March 16, 2026 20:11
dotnet-policy-service bot added the "Status:No recent activity" label (PRs that have not had any recent activity and will be closed if the label is not removed) Mar 23, 2026
Copilot AI and others added 5 commits March 30, 2026 10:50
… per feed source

Co-authored-by: jeffkl <17556515+jeffkl@users.noreply.github.com>
Co-authored-by: jeffkl <17556515+jeffkl@users.noreply.github.com>
Co-authored-by: jeffkl <17556515+jeffkl@users.noreply.github.com>
jeffkl force-pushed the copilot/add-telemetry-package-id-validation branch from 7e34992 to 79664c2 Mar 30, 2026 20:12
jeffkl marked this pull request as ready for review March 30, 2026 20:12
jeffkl requested a review from a team as a code owner March 30, 2026 20:12
jeffkl requested a review from donnie-msft March 30, 2026 20:12
dotnet-policy-service bot removed the "Status:No recent activity" label Mar 30, 2026