From f9682c5afb17f5d3aeff7d1c30dc0de3a2a6f0fd Mon Sep 17 00:00:00 2001 From: Arpad Borsos Date: Tue, 21 Mar 2023 13:56:11 +0100 Subject: [PATCH 1/4] Reliable JavaScript/SourceMap processing via `DebugId` We want to make processing / SourceMap-ing of JavaScript stack traces more reliable. To achieve this, we want to uniquely identify a (minified / deployed) JavaScript file using a DebugId. The same DebugId also uniquely identifies the corresponding SourceMap. That way it should be possible to _reliably_ look up the SourceMap corresponding to a JavaScript file, which is necessary to have reliable SourceMap processing. --- README.md | 1 + text/0081-sourcemap-debugid.md | 488 +++++++++++++++++++++++++++++++++ 2 files changed, 489 insertions(+) create mode 100644 text/0081-sourcemap-debugid.md diff --git a/README.md b/README.md index a5decae6..01355490 100644 --- a/README.md +++ b/README.md @@ -40,3 +40,4 @@ This repository contains RFCs and DACIs. Lost? - [0074-source-context-via-links](text/0074-source-context-via-links.md): Source context via links - [0078-escalating-issues](text/0078-escalating-issues.md): Escalating Issues - [0080-issue-states](text/0080-issue-states.md): Issue States +- [0081-sourcemap-debugid](text/0081-sourcemap-debugid.md): Reliable JavaScript/SourceMap processing via `DebugId` diff --git a/text/0081-sourcemap-debugid.md b/text/0081-sourcemap-debugid.md new file mode 100644 index 00000000..b739404a --- /dev/null +++ b/text/0081-sourcemap-debugid.md @@ -0,0 +1,488 @@ +- Start Date: 2023-03-21 +- RFC Type: initiative +- RFC PR: https://github.com/getsentry/rfcs/pull/81 +- RFC Status: draft + +# Summary / Motivation + +We want to make processing / SourceMap-ing of JavaScript stack traces more reliable. +To achieve this, we want to uniquely identify a (minified / deployed) JavaScript file using a `DebugId`. +The same `DebugId` also uniquely identifies the corresponding SourceMap. +That way it should be possible to _reliably_ look up the SourceMap corresponding to +a JavaScript file. + +# Background + +It is currently not possible to _reliably_ find the associated SourceMap for a +JavaScript file. + +A JavaScript stack trace only points to the (minified / transformed) source file +by its URL, such as `https://example.com/file.min.js`, or `/path/to/local/file.min.js`. + +The corresponding SourceMap is often referenced using a `sourceMappingURL` comment +at the end of that file. It is also possible to have a "hidden" SourceMap that is +not referenced in such a way, but is typically found by its filename `{js_filename}.map`. + +However it is not guaranteed that the SourceMap found in such a way actually +corresponds to the JavaScript file in which the error happened. + +A classical example is caching. + +1. An end-user is loading version `1` of `https://example.com/file.min.js`. +2. A new app version `2` is deployed. +3. The user experiences an error. +4. The SourceMap at `https://example.com/file.min.js.map` (version `2`) at this point in time does not correspond to + the code the user was running. + +This problem is even worse at Sentry scale, as at any point in time, errors can come in that happened with arbitrary +versions of the deployed code, sometimes even involving multiple files which might be out-of-sync with each other. + +To work around this problem, Sentry has used the combination of `release` and optional `dist`, together with matching +paths to better associate JavaScript files from one release with SourceMaps uploaded to Sentry. + +However this solution is still not reliable, as mentioned above, even two files loaded in the end-users browser can +belong to a different release, due to caching or other reasons. + +Using a `DebugId`, which uniquely associates the JavaScript file and its corresponding SourceMap, should make source-mapping +a lot more reliable. + +# Supporting Data + +TODO: please fill in the gaps here! + +Sentry has used the `release (+ dist) + abs_path` solution for quite some time and found it inadequate. +A lot of events are not being resolved correctly due to these mismatches, and problems with source-mapping are very +common in customer-support interactions. Up to X% of customer-support questions are related to SourceMaps. + +On the other hand, using a `DebugId` for symbolication of Native crashes and stack traces is working reliably both in +Sentry and in the wider native ecosystem. The Native and C# community has the concept of _Symbol Servers_, which can +serve any debug file based on its `DebugId`, which allows reliable symbolication for any release, at any point in time. + +## Real-world example + +Anecdotally, Sentry itself has recently experienced a hard-to-diagnose problem with SourceMap processing. +The root of the problem was with path-matching, which becomes unnecessary using `DebugId`s. + +The file `https://s1.sentry-cdn.com/_static/dist/sentry/chunks/app_views_issueDetails_index_tsx.fd0d65cc90b6c116f093.js` +appeared as part of an `Error` stack trace. After stripping the domain name prefix, the path that is searched for +was `~/_static/dist/sentry/chunks/app_views_issueDetails_index_tsx.fd0d65cc90b6c116f093.js`. However, due to a +misconfiguration, all the uploaded files had a wrong path associated with them, in this particular case it was +`~/chunks/app_views_issueDetails_index_tsx.fd0d65cc90b6c116f093.js`. With these paths not matching, Sentry was unable +to correctly process this file, as no minified file or SourceMap was found with that path. + +This is a prime example that would be solved by `DebugId`s. + +# Options Considered + +To make `DebugId` work, we need to generate one, and associate it to both the JavaScript file, and its corresponding +SourceMap. + +## The `DebugId` format + +The `DebugId` should have the same format as a standard UUI, specifically: +It should be a 128 bit (16 byte), formatted to a string using base-16 hex encoding like so: + +`XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX` + +## How to generate `DebugId`s? + +There is two options of choosing a `DebugId`: Making it completely random, or make it reproducible by deriving it from +a content hash. + +### Based on JavaScript Content-hash + +This creates a new `DebugId` by hashing the contents of the (minified) JavaScript file. + +**pros** + +- Is fully reproducible. The same JavaScript file will always have the same `DebugId`. +- Works well with existing caching solutions. + +**cons** + +- Increases overhead in server-side SourceMap processing, as one file can potentially be included in multiple _bundles_. + See [_What is an `ArtifactBundle`_](#what-is-an-artifactbundle) below. +- A difference in a source file might not be reflected in the JavaScript file. An example of this might be changes to + whitespace, comments, or code that was dead-code-eliminated by bundlers. + +### Based on SourceMap Content-hash + +This creates a new `DebugId` by hashing the contents of the SourceMap file. The reasoning for this is primarily +motivated by a drawback of the previous option. +It is possible that non-essential changes to the source file, such as whitespace or comments, will be completely +removed by a minifier, and will result in an identical minified output file. +These changes would however create different mappings in the SourceMap file, making it possible to resolve to the +correct original source. + +It is also possible to use a combined content-hash of both the (minified) JavaScript file, and its corresponding SourceMap. + +**pros** + +- Generates a new `DebugId` for changes to source files that would otherwise not lead to changes in the (minified) JavaScript file. +- SourceMap processing will always resolve back to the original source code. + +**cons** + +- Does lead to slightly more cache invalidation. + +### Random `DebugId` + +This option would create a new random `DebugId` for each file, on each build. + +**pros** + +- Simpler server-side SourceMap processing, as one `DebugId` is only included in a single _bundle_, and that one bundle + can serve multiple stack frames for multiple files of the same build. + +**cons** + +- Completely breaks the concept of _caching_, as every file is unique for every build. + +### Example + +Consider the following two files: + +```js +// File A + +/* with some added comment*/ +export function anExportedFunction() { + return anInternalInlinedFunction(); +} + +function anInternalInlinedFunction() { + window.foo = "side effect"; +} +``` + +```js +// File B + +function aCompletelyDifferentFunction() { + // that has its own comments and everything + window.foo = "side effect"; +} + +function deadCodeThatsEliminated() {} + +export function anExportedFunction() { + return aCompletelyDifferentFunction(); +} +``` + +Both of these files will generate the exact same minified file using `terser`, +which will completely inline simple tail calls, and eliminate dead code: + + +```js +export function anExportedFunction(){window.foo="side effect"} +``` + +(Unfortunately, the https://try.terser.org/ REPL does not allow hotlinking, but you can try this for yourself.) + +It is not even necessary that all these functions are defined in the same file. They might have been combined from +several files using a bundler like `rollup`: +[REPL example](https://rollupjs.org/repl/?version=3.20.2&shareable=JTdCJTIyZXhhbXBsZSUyMiUzQW51bGwlMkMlMjJtb2R1bGVzJTIyJTNBJTVCJTdCJTIyY29kZSUyMiUzQSUyMmltcG9ydCUyMCU3QiUyMGFDb21wbGV0ZWx5RGlmZmVyZW50RnVuY3Rpb24lMjAlN0QlMjBmcm9tJTIwJy4lMkZmb28uanMnJTNCJTVDbiU1Q24lMkYqJTIwd2l0aCUyMHNvbWUlMjBhZGRlZCUyMGNvbW1lbnQqJTJGJTVDbmV4cG9ydCUyMGZ1bmN0aW9uJTIwYW5FeHBvcnRlZEZ1bmN0aW9uKCklMjAlN0IlNUNuJTIwJTIwcmV0dXJuJTIwYUNvbXBsZXRlbHlEaWZmZXJlbnRGdW5jdGlvbigpJTNCJTVDbiU3RCUyMiUyQyUyMmlzRW50cnklMjIlM0F0cnVlJTJDJTIybmFtZSUyMiUzQSUyMm1haW4uanMlMjIlN0QlMkMlN0IlMjJjb2RlJTIyJTNBJTIyZXhwb3J0JTIwZnVuY3Rpb24lMjBhQ29tcGxldGVseURpZmZlcmVudEZ1bmN0aW9uKCklMjAlN0IlNUNuJTIwJTIwJTJGJTJGJTIwdGhhdCUyMGhhcyUyMGl0cyUyMG93biUyMGNvbW1lbnRzJTIwYW5kJTIwZXZlcnl0aGluZyU1Q24lMjAlMjB3aW5kb3cuZm9vJTIwJTNEJTIwJTVDJTIyc2lkZSUyMGVmZmVjdCU1QyUyMiUzQiU1Q24lN0QlNUNuJTVDbmZ1bmN0aW9uJTIwZGVhZENvZGVUaGF0c0VsaW1pbmF0ZWQoKSUyMCU3QiU3RCUyMiUyQyUyMmlzRW50cnklMjIlM0FmYWxzZSUyQyUyMm5hbWUlMjIlM0ElMjJmb28uanMlMjIlN0QlNUQlMkMlMjJvcHRpb25zJTIyJTNBJTdCJTIyb3V0cHV0JTIyJTNBJTdCJTIyZm9ybWF0JTIyJTNBJTIyZXMlMjIlN0QlMkMlMjJ0cmVlc2hha2UlMjIlM0F0cnVlJTdEJTdE) + +As these examples show, vastly different code can produce the exact same minified output. However when doing SourceMap +processing, we want to link back to the original code in every one of these cases. + +## How to inject the `DebugId` into the JavaScript file? + +### `//# debugId` comment + +We propose to add a new magic comment to the end of JavaScript files similar to the existing `//# sourceMappingURL` +comment. It should be at the end of the file, preferable as the line right before the `sourceMappingURL`, as the +second line from the bottom + +It should look like this: + +```js +someRandomJSCode(); +//# debugId=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX +//# sourceMappingURL=file.min.js.map +``` + +### Runtime Detection / Resolution of `DebugId` + +In a shiny utopian future, Browsers would directly expose builtin APIs to programmatically access each frame of an `Error`s stack. +This might include the absolute path, the line and column number, and the `DebugId`. +Though the reality of today is that each browser has its own text-based `Error.stack` format, which might even give +completely different line and column numbers across the different browsers. +No programmatic API exists today, and might never exist. At the very least, widespread support for this is years away. + +It is therefore necessary to extract this `DebugId` through other means. + +#### Reading the `//# debugId` comment when capturing Errors + +Current JavaScript stack traces include the absolute path (called `abs_path`) of each stack frame. It should be possible +to load and inspect that file at runtime whenever an error happens. + +An example of this might look like this: + +```js +// cached in the SDK: +const RESOLVED_FRAMES = new Map(); +async function attachDebugMeta(event) { + for (const { abs_path } of allStackFrames(event)) { + if (!RESOLVED_FRAMES.has(abs_path)) { + const rawSource = await fetch(abs_path).then((res) => res.text()); + RESOLVED_FRAMES.set(abs_path, extractDebugIdFromSource(rawSource)); + } + } + + event.debug_meta = resolvedFramesToImages(RESOLVED_FRAMES); +} +``` + +**pros** + +- Does not require injecting any _code_ into the JavaScript files. + +**cons** + +- Needs `async` code to resolve `DebugId`s, and might incur some async fetching / IO when capturing an Error. +- Though any source referenced from the stack trace via `abs_path` is likely in the browser cache already. + +#### Add the `DebugId` to a global at load time + +One solution here is to inject a small snippet of JS which will be executed when the JavaScript file is loaded, and adds +the `DebugId` to a global map. + +An example snippet is here: + + +```js +!function(){try{var e="undefined"!=typeof window?window:"undefined"!=typeof global?global:"undefined"!=typeof self?self:{},n=(new Error).stack;n&&(e._sentryDebugIds=e._sentryDebugIds||{},e._sentryDebugIds[n]="XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX")}catch(e){}}() +``` + +This snippet adds a complete `Error.stack` to a global called `_sentryDebugIds`. +Further post-processing at time of capturing an `Error` is required to extract the `abs_path` from that captured stack. + +**pros** + +- Does not require any async fetching at time of capturing an `Error`. + +**cons** + +- It does however require parsing of the `Error.stack`s in `_sentryDebugIds` at time of capturing the `Error`. +- However this should be cached and only happen once. + +An alternative implementation might use the [`import.meta.url`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/import.meta) +property. This would avoid capturing and post-processing an `Error.stack`, but does require usage of ECMAScript Modules. + + +```js +((globalThis._sentryDebugIds=globalThis._sentryDebugIds||{})[import.meta.url]="XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"); +``` + +**pros** + +- More compact snippet. +- No post-processing required. + +**cons** + +- Depends on usage of ECMAScript Modules. + +## When to inject the `DebugId` into the JavaScript file? + +Deploying JavaScript applications can range from a simple _copy files via ftp_ +to a complex workflow like the following: + +```mermaid +graph TD + transpile[Transpile source files] --> bundle[Bundle source files] + bundle --> minify[Minify bundled chunk] + minify --> fingerprint[Fingerprint minified chunks] + minify --> sentry[Upload release to Sentry] + fingerprint --> upload[Upload assets to CDN] + upload --> propagate[Wait for CDS assets to propagate] + fingerprint --> deploy[Deploy updated asset references] + propagate --> deploy +``` + +In this example, assets are _fingerprinted_, and after being fully propagated +through a global CDN, they are starting to be referenced from the backend +service via HTML. + +_Fingerprinting_ in this case means creating a unique content-hash which is then +used in various of ways: + +- As part of the filename, to give each file a unique and stable filename. +- Use the derived hash for [Subresource Integrity (SRI)](https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity). + +An example may look like this, for a CDN-deployed and fingerprinted reference +to [katex](https://katex.org/docs/browser.html#starter-template): + +```html + +``` + +Not only is the deployment pipeline very complex, it can also involve a variety of tools with varying degree of +integration between them. +The example `