diff --git a/README.md b/README.md index 31f9a3fc..d25ce444 100644 --- a/README.md +++ b/README.md @@ -42,4 +42,5 @@ This repository contains RFCs and DACIs. Lost? - [0078-escalating-issues](text/0078-escalating-issues.md): Escalating Issues - [0079-exception-groups](text/0079-exception-groups.md): Exception Groups - [0080-issue-states](text/0080-issue-states.md): Issue States +- [0081-sourcemap-debugid](text/0081-sourcemap-debugid.md): Reliable JavaScript/SourceMap processing via `DebugId` - [0084-move-docs-to-sentry-repository](text/0084-move-docs-to-sentry-repository.md): Move onboarding docs from sentry-docs over to sentry repository diff --git a/text/0081-sourcemap-debugid.md b/text/0081-sourcemap-debugid.md new file mode 100644 index 00000000..c64ca3ef --- /dev/null +++ b/text/0081-sourcemap-debugid.md @@ -0,0 +1,469 @@ +- Start Date: 2023-03-21 +- RFC Type: initiative +- RFC PR: https://github.com/getsentry/rfcs/pull/81 +- RFC Status: draft + +# Summary / Motivation + +We want to make processing / SourceMap-ing of JavaScript stack traces more reliable. +To achieve this, we want to uniquely identify a (minified / deployed) JavaScript file using a `DebugId`. +The same `DebugId` also uniquely identifies the corresponding SourceMap. +That way it should be possible to _reliably_ look up the SourceMap corresponding to +a JavaScript file. + +# Background + +It is currently not possible to _reliably_ find the associated SourceMap for a +JavaScript file. + +A JavaScript stack trace only points to the (minified / transformed) source file +by its URL, such as `https://example.com/file.min.js`, or `/path/to/local/file.min.js`. + +The corresponding SourceMap is often referenced using a `sourceMappingURL` comment +at the end of that file. It is also possible to have a "hidden" SourceMap that is +not referenced in such a way, but is typically found by its filename `{js_filename}.map`. + +However it is not guaranteed that the SourceMap found in such a way actually +corresponds to the JavaScript file in which the error happened. + +A classical example is caching. + +1. An end-user is loading version `1` of `https://example.com/file.min.js`. +2. A new app version `2` is deployed. +3. The user experiences an error. +4. The SourceMap at `https://example.com/file.min.js.map` (version `2`) at this point in time does not correspond to + the code the user was running. + +This problem is even worse at Sentry scale, as at any point in time, errors can come in that happened with arbitrary +versions of the deployed code, sometimes even involving multiple files which might be out-of-sync with each other. + +To work around this problem, Sentry has used the combination of `release` and optional `dist`, together with matching +paths to better associate JavaScript files from one release with SourceMaps uploaded to Sentry. + +However this solution is still not reliable, as mentioned above, even two files loaded in the end-users browser can +belong to a different release, due to caching or other reasons. + +Using a `DebugId`, which uniquely associates the JavaScript file and its corresponding SourceMap, should make source-mapping +a lot more reliable. + +# Supporting Data + +Sentry has used the `release (+ dist) + abs_path` solution for quite some time and found it inadequate. +A lot of events are not being resolved correctly due to these mismatches, and problems with source-mapping are very +common in customer-support interactions. Up to X% of customer-support questions are related to SourceMaps. + +On the other hand, using a `DebugId` for symbolication of Native crashes and stack traces is working reliably both in +Sentry and in the wider native ecosystem. The Native and C# community has the concept of _Symbol Servers_, which can +serve any debug file based on its `DebugId`, which allows reliable symbolication for any release, at any point in time. + +## Real-world example + +Anecdotally, Sentry itself has recently experienced a hard-to-diagnose problem with SourceMap processing. +The root of the problem was with path-matching, which becomes unnecessary using `DebugId`s. + +The file `https://s1.sentry-cdn.com/_static/dist/sentry/chunks/app_views_issueDetails_index_tsx.fd0d65cc90b6c116f093.js` +appeared as part of an `Error` stack trace. After stripping the domain name prefix, the path that is searched for +was `~/_static/dist/sentry/chunks/app_views_issueDetails_index_tsx.fd0d65cc90b6c116f093.js`. However, due to a +misconfiguration, all the uploaded files had a wrong path associated with them, in this particular case it was +`~/chunks/app_views_issueDetails_index_tsx.fd0d65cc90b6c116f093.js`. With these paths not matching, Sentry was unable +to correctly process this file, as no minified file or SourceMap was found with that path. + +This is a prime example that would be solved by `DebugId`s. + +# Options Considered + +To make `DebugId` work, we need to generate one, and associate it to both the JavaScript file, and its corresponding +SourceMap. + +## The `DebugId` format + +The `DebugId` should have the same format as a standard UUI, specifically: +It should be a 128 bit (16 byte), formatted to a string using base-16 hex encoding like so: + +`XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX` + +## How to generate `DebugId`s? + +There is two options of choosing a `DebugId`: Making it completely random, or make it reproducible / deterministic, derived +from the build. + +Different approaches and examples are shown below, but in summary, we would like the `DebugId` to be reproducible and +deterministic for a build. Building the exact same project and original code should always result in the same assets. +This means the same minified JS files (with embedded `DebugId`), and the same SourceMap file. + +## Deterministic + +As long as the `DebugId` is deterministic, it does not matter how it is being derived. Different tools will use different +methods. And a `DebugId` may differ from one tool to the next, just as the exact output files may differ between tools, +and even tool versions. + +We do not want content hashes to be only based on the minified JavaScript file to also pick up on the original source +cahnges. For a source hash only approach the solutions outlined in [RFC 85](https://github.com/getsentry/rfcs/pull/85) +apply. + +Instead we recommend to generate a new `DebugId` by hashing the contents of the +SourceMap file. The reasoning for this is primarily motivated by a drawback of +the previous option. It is possible that non-essential changes to the source +file, such as whitespace or comments, will be completely removed by a minifier, +and will result in an identical minified output file. These changes would +however create different mappings in the SourceMap file, making it possible to +resolve to the correct original source. + +It is also possible to use a combined content-hash of both the (minified) JavaScript file, and its corresponding SourceMap. + +**pros** + +- Generates a new `DebugId` for changes to source files that would otherwise not lead to changes in the (minified) JavaScript file. +- SourceMap processing will always resolve back to the original source code. + +**cons** + +- Does lead to slightly more cache invalidation. + +### Random + +This option would create a new random `DebugId` for each file, on each build. + +**pros** + +- Simpler server-side SourceMap processing, as one `DebugId` is only included in a single _bundle_, and that one bundle + can serve multiple stack frames for multiple files of the same build. + +**cons** + +- Completely breaks the concept of _caching_, as every file is unique for every build. + +### Example + +Consider the following two files: + +```js +// File A + +/* with some added comment*/ +export function anExportedFunction() { + return anInternalInlinedFunction(); +} + +function anInternalInlinedFunction() { + window.foo = "side effect"; +} +``` + +```js +// File B + +function aCompletelyDifferentFunction() { + // that has its own comments and everything + window.foo = "side effect"; +} + +function deadCodeThatsEliminated() {} + +export function anExportedFunction() { + return aCompletelyDifferentFunction(); +} +``` + +Both of these files will generate the exact same minified file using `terser`, +which will completely inline simple tail calls, and eliminate dead code: + + +```js +export function anExportedFunction(){window.foo="side effect"} +``` + +(Unfortunately, the https://try.terser.org/ REPL does not allow hotlinking, but you can try this for yourself.) + +It is not even necessary that all these functions are defined in the same file. They might have been combined from +several files using a bundler like `rollup`: +[REPL example](https://rollupjs.org/repl/?version=3.20.2&shareable=JTdCJTIyZXhhbXBsZSUyMiUzQW51bGwlMkMlMjJtb2R1bGVzJTIyJTNBJTVCJTdCJTIyY29kZSUyMiUzQSUyMmltcG9ydCUyMCU3QiUyMGFDb21wbGV0ZWx5RGlmZmVyZW50RnVuY3Rpb24lMjAlN0QlMjBmcm9tJTIwJy4lMkZmb28uanMnJTNCJTVDbiU1Q24lMkYqJTIwd2l0aCUyMHNvbWUlMjBhZGRlZCUyMGNvbW1lbnQqJTJGJTVDbmV4cG9ydCUyMGZ1bmN0aW9uJTIwYW5FeHBvcnRlZEZ1bmN0aW9uKCklMjAlN0IlNUNuJTIwJTIwcmV0dXJuJTIwYUNvbXBsZXRlbHlEaWZmZXJlbnRGdW5jdGlvbigpJTNCJTVDbiU3RCUyMiUyQyUyMmlzRW50cnklMjIlM0F0cnVlJTJDJTIybmFtZSUyMiUzQSUyMm1haW4uanMlMjIlN0QlMkMlN0IlMjJjb2RlJTIyJTNBJTIyZXhwb3J0JTIwZnVuY3Rpb24lMjBhQ29tcGxldGVseURpZmZlcmVudEZ1bmN0aW9uKCklMjAlN0IlNUNuJTIwJTIwJTJGJTJGJTIwdGhhdCUyMGhhcyUyMGl0cyUyMG93biUyMGNvbW1lbnRzJTIwYW5kJTIwZXZlcnl0aGluZyU1Q24lMjAlMjB3aW5kb3cuZm9vJTIwJTNEJTIwJTVDJTIyc2lkZSUyMGVmZmVjdCU1QyUyMiUzQiU1Q24lN0QlNUNuJTVDbmZ1bmN0aW9uJTIwZGVhZENvZGVUaGF0c0VsaW1pbmF0ZWQoKSUyMCU3QiU3RCUyMiUyQyUyMmlzRW50cnklMjIlM0FmYWxzZSUyQyUyMm5hbWUlMjIlM0ElMjJmb28uanMlMjIlN0QlNUQlMkMlMjJvcHRpb25zJTIyJTNBJTdCJTIyb3V0cHV0JTIyJTNBJTdCJTIyZm9ybWF0JTIyJTNBJTIyZXMlMjIlN0QlMkMlMjJ0cmVlc2hha2UlMjIlM0F0cnVlJTdEJTdE) + +As these examples show, vastly different code can produce the exact same minified output. However when doing SourceMap +processing, we want to link back to the original code in every one of these cases. + +## How to inject the `DebugId` into the JavaScript file? + +### `//# debugId` comment + +We propose to add a new magic comment to the end of JavaScript files similar to the existing `//# sourceMappingURL` +comment. It should be at the end of the file, preferable as the line right before the `sourceMappingURL`, as the +second line from the bottom + +It should look like this: + +```js +someRandomJSCode(); +//# debugId=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX +//# sourceMappingURL=file.min.js.map +``` + +### Runtime Detection / Resolution of `DebugId` + +In a shiny utopian future, Browsers would directly expose builtin APIs to programmatically access each frame of an `Error`s stack. +This might include the absolute path, the line and column number, and the `DebugId`. +Though the reality of today is that each browser has its own text-based `Error.stack` format, which might even give +completely different line and column numbers across the different browsers. +No programmatic API exists today, and might never exist. At the very least, widespread support for this is years away. + +It is therefore necessary to extract this `DebugId` through other means. + +#### Reading the `//# debugId` comment when capturing Errors + +Current JavaScript stack traces include the absolute path (called `abs_path`) of each stack frame. It should be possible +to load and inspect that file at runtime whenever an error happens. + +An example of this might look like this: + +```js +// cached in the SDK: +const RESOLVED_FRAMES = new Map(); +async function attachDebugMeta(event) { + for (const { abs_path } of allStackFrames(event)) { + if (!RESOLVED_FRAMES.has(abs_path)) { + const rawSource = await fetch(abs_path).then((res) => res.text()); + RESOLVED_FRAMES.set(abs_path, extractDebugIdFromSource(rawSource)); + } + } + + event.debug_meta = resolvedFramesToImages(RESOLVED_FRAMES); +} +``` + +**pros** + +- Does not require injecting any _code_ into the JavaScript files. + +**cons** + +- Needs `async` code to resolve `DebugId`s, and might incur some async fetching / IO when capturing an Error. +- Though any source referenced from the stack trace via `abs_path` is likely in the browser cache already. + +#### Add the `DebugId` to a global at load time + +One solution here is to inject a small snippet of JS which will be executed when the JavaScript file is loaded, and adds +the `DebugId` to a global map. + +Any snippet would need to be injected in the _top_ of JS files, to account for errors that happen during execution of +top-level statements. Additional care needs to be taken for any possible `"use strict"` annotation on the top of the file. +Doing injection on the top of the file would also mean that changes to the SourceMap are needed, more specifically, all +the `line` offsets need to shift down by `1`, for a 1-line snippet. + +An example snippet is here: + + +```js +!function(){try{var e="undefined"!=typeof window?window:"undefined"!=typeof global?global:"undefined"!=typeof self?self:{},f="_sentryDebugIds",n=(new Error).stack;n&&(e[f]=e[f]||{},e[f][n]="XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX")}catch(e){}}() +``` + +This snippet adds a complete `Error.stack` to a global called `_sentryDebugIds`. +Further post-processing at time of capturing an `Error` is required to extract the `abs_path` from that captured stack. + +**pros** + +- Does not require any async fetching at time of capturing an `Error`. +- Compatible with current JS runtimes. + +**cons** + +- Injection needs to modify the mappings of the SourceMap itself. +- It does however require parsing of the `Error.stack`s in `_sentryDebugIds` at time of capturing the `Error`. +- However this should be cached and only happen once. + +An alternative implementation might use the [`import.meta.url`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/import.meta) +property. This would avoid capturing and post-processing an `Error.stack`, but does require usage of ECMAScript Modules. + + +```js +((globalThis._sentryDebugIds=globalThis._sentryDebugIds||{})[import.meta.url]="XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"); +``` + +**pros** + +- More compact snippet. +- No post-processing required. + +**cons** + +- Depends on usage of ECMAScript Modules. + +## When to inject the `DebugId` into the JavaScript file? + +Deploying JavaScript applications can range from a simple _copy files via ftp_ +to a complex workflow like the following: + +```mermaid +graph TD + transpile[Transpile source files] --> bundle[Bundle source files] + bundle --> minify[Minify bundled chunk] + minify --> fingerprint[Fingerprint minified chunks] + minify --> sentry[Upload release to Sentry] + fingerprint --> upload[Upload assets to CDN] + upload --> propagate[Wait for CDS assets to propagate] + fingerprint --> deploy[Deploy updated asset references] + propagate --> deploy +``` + +In this example, assets are _fingerprinted_, and after being fully propagated +through a global CDN, they are starting to be referenced from the backend +service via HTML. + +_Fingerprinting_ in this case means creating a unique content-hash which is then +used in various of ways: + +- As part of the filename, to give each file a unique and stable filename. +- Use the derived hash for [Subresource Integrity (SRI)](https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity). + +An example may look like this, for a CDN-deployed and fingerprinted reference +to [katex](https://katex.org/docs/browser.html#starter-template): + +```html + +``` + +Not only is the deployment pipeline very complex, it can also involve a variety of tools with varying degree of +integration between them. +The example `