Closed
98 changes: 98 additions & 0 deletions text/XXXX-browser-traces.md
@@ -0,0 +1,98 @@
* Start Date: 2022-09-28
* RFC Type: decision
* RFC PR: <link>

# Summary

This RFC proposes requiring a frontend SDK to retain a trace and continue it
until the SDK naturally ends the user session. The change primarily affects the
browser trace model and is intended to better support dynamic sampling and to
create a more coherent user experience.
Comment on lines +7 to +10
Contributor

I believe this affects mobile as well since the concept is similar to the browser trace model, right?

Contributor

Yes, it will also affect mobile, by linking ui.load and the resulting interaction and navigation transactions.


# Motivation

Today there are two ways to start a trace: it can be started on the backend and then
continued on the client, or it can be started on the client. In either case, however,
client SDKs are likely to create more traces, disconnected from the backend trace,
on page navigation or view changes. As a result, the only sensible setup for dynamic
sampling today is to start traces uniformly in the frontend project; otherwise the
dynamic sampling rules of both projects need to be modified, because dynamic sampling
rules apply at the head of the trace.

The secondary motivation is that creating new traces on navigation also wipes out the
causal relationship to what happened before. For instance, it is entirely possible that
the state of the application was corrupted before a client-side navigation, but we lose
that trace relationship, and a user has to piece it back together manually, for instance
by listing all transactions created by a specific user ID.

# Background

We became aware of this problem in two ways recently:

1. Users sometimes want to create a transaction within another transaction. Today there
is no way to link these together for the purpose of dynamic sampling. A separate RFC,
[0014](https://github.com/getsentry/rfcs/pull/14), proposes an explicit way
to carry forward the sampling context for a new transaction started after an already
existing one. It works by explicitly continuing the trace. This solves part of the
issue, but it leaves out the case where the sampling context naturally moves to another
Sentry project.

2. We wanted to change our own tracing integration to start tracing on the server
([Sentry PR #39349](https://github.com/getsentry/sentry/pull/39349)), but this would
require mirroring the sampling settings to another Sentry project and would also affect
API requests detached from user sessions.

# Supporting Data

The [Honeycomb whitepaper on front-end observability](https://www.honeycomb.io/wp-content/uploads/2022/03/Front-end-Observability-Whitepaper-1.pdf)
recommends continuing traces from the server until the natural end of the user session:

> To accomplish this task, you will use the first event (page load) as the start of your trace and
> connect that first event to additional spans to build a full trace of the user session. Each span
> will represent a single thing that you want to track, such as a server request or a user click.

# Options Considered

There are multiple ways in which this problem can be addressed.

## Encouraged Root Trace Project and Session Long Traces

In the most trivial case, the recommendation to customers would be to pick one project that
starts traces for real user sessions. This could be *either* the frontend or the backend, but
customers should be consistent about it. In either case, the client SDK should *continue the
trace* until the browser tab naturally closes.

Contributor

The issue here is with traces that then can be multiple hours long - and so then the value of analyzing a trace is completely lost.

I'm strongly against this because I feel like it'll reduce the value of trace view, and make it harder for us to expand the tracing product.

Contributor Author

That sounds a bit like a UX issue? I don't think it's inherently problematic if a trace is very long; even today we do not draw much value out of the trace view.

To me it feels at least like there is more value to the trace being connected than the trace being split into transactions on every navigation.

Member @dashed, Sep 29, 2022

User inactivity in a browser tab could be another natural way of ending a session, and beginning a new one.

There could be users that leave tabs open for a very long time, and having large time gaps between transactions might seem weird.


The consequence is that ``startTransaction()`` always anchors to the already open trace
and its dynamic sampling context. An extra flag could force the start of a new trace,
but its use would be strongly discouraged.
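
A minimal sketch of this behavior, under stated assumptions: `SessionTrace`, `SamplingContext`, the `forceNewTrace` option, and the key strings are hypothetical names chosen for illustration and are not part of any Sentry SDK. It only shows that every transaction started during the session reuses the open trace and its sampling context unless a new trace is explicitly forced.

```typescript
// Hypothetical sketch: none of these names are a real Sentry SDK API.
interface SamplingContext {
  traceId: string;
  sampleRate: number;
  publicKey: string; // identifies the project whose sampling rules apply
}

interface Transaction {
  name: string;
  traceId: string;
  samplingContext: SamplingContext;
}

// Illustrative ID generator; a real SDK would use a proper random source.
function randomHex(length: number): string {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += Math.floor(Math.random() * 16).toString(16);
  }
  return out;
}

class SessionTrace {
  private context: SamplingContext;

  // `incoming` models a trace continued from the backend (for example via
  // propagated headers); without it the client starts its own trace.
  constructor(incoming?: SamplingContext) {
    this.context = incoming ?? this.freshContext();
  }

  private freshContext(): SamplingContext {
    return { traceId: randomHex(32), sampleRate: 1.0, publicKey: "frontend-key" };
  }

  // Every transaction anchors to the open trace unless a new one is forced.
  startTransaction(name: string, options: { forceNewTrace?: boolean } = {}): Transaction {
    if (options.forceNewTrace) {
      this.context = this.freshContext();
    }
    return { name, traceId: this.context.traceId, samplingContext: this.context };
  }
}

const session = new SessionTrace({
  traceId: "a".repeat(32),
  sampleRate: 0.25,
  publicKey: "backend-key",
});
const pageload = session.startTransaction("pageload");
const navigation = session.startTransaction("navigation /settings");
console.log(pageload.traceId === navigation.traceId); // true
```

In this sketch the navigation transaction keeps both the trace ID and the backend's sampling context, which is the property the option relies on.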

## Alternative A: Detaching Sampling Project from Root Project

An alternative approach would be to allow a transaction to start again on the client but to
continue with the sampling context that came from the server. In that case, the root of the
trace is in fact the frontend for continued transactions after a page navigation; however, the
sampling context is reused from the original server-side request.

In this case, the rule that the root project sets the dynamic sampling context would be
broken up; instead, a transaction could explicitly pick up the sampling context of another
project while still issuing disconnected traces.
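
One way to picture Alternative A is the sketch below. It is an illustration only: `DynamicSamplingContext`, `ClientTransaction`, `startDetached`, and the field names are hypothetical and do not describe an existing SDK API. The point is that the trace ID changes on navigation while the sampling context stays the one issued by the server.

```typescript
// Hypothetical sketch of Alternative A: names are illustrative only.
interface DynamicSamplingContext {
  originTraceId: string; // trace in which the sampling context was created
  originProject: string; // project whose sampling rules apply
  sampleRate: number;
}

interface ClientTransaction {
  name: string;
  traceId: string; // identifies the (possibly new) trace
  sampling: DynamicSamplingContext; // may originate from another project
}

// Illustrative ID generator; a real SDK would use a proper random source.
function newTraceId(): string {
  let out = "";
  for (let i = 0; i < 32; i++) {
    out += Math.floor(Math.random() * 16).toString(16);
  }
  return out;
}

// Starts a trace that is disconnected from the previous one but explicitly
// reuses the sampling context that came from the original server request.
function startDetached(name: string, inherited: DynamicSamplingContext): ClientTransaction {
  return { name, traceId: newTraceId(), sampling: inherited };
}

const serverSampling: DynamicSamplingContext = {
  originTraceId: "b".repeat(32),
  originProject: "backend",
  sampleRate: 0.1,
};

// The page load continues the server's trace directly.
const pageLoadTx: ClientTransaction = {
  name: "pageload",
  traceId: serverSampling.originTraceId,
  sampling: serverSampling,
};

// After a navigation the frontend roots a new trace with the old sampling context.
const navigationTx = startDetached("navigation /settings", pageLoadTx.sampling);
console.log(navigationTx.traceId !== pageLoadTx.traceId); // true
console.log(navigationTx.sampling.originProject); // backend
```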

## Alternative B: Trace to Trace Relationships

A potential alternative would be to continue the current project but allow a trace to annotate
itself as being the successor of another trace. Our data model currently does not have a
trace-to-trace relationship, but such a desire has come up before with session replays. In that
case, when a new trace starts on the client, it can annotate itself as the successor of a prior
trace and take over that sampling context.

Contributor

As an aside, in my eyes there are various correlation ID mechanisms that we chat about, and it would be nice to get an RFC going that establishes them all. Off the top of my head:

  • release id
  • user id
  • session id
  • trace id
  • profile id
  • replay id (basically a session id)

The most important relationships are as follows:

  • 1 user x R releases
  • 1 user x S sessions/replays
  • 1 user x T traces
  • 1 session/replay x T traces
  • 1 profile x 1 trace (though this is changing to 1 profile x T traces iirc)

If we can establish these concepts consistently, it'll be a nice foundation for us to keep building on.

Reply

big +1 here. I still feel like we are moving towards treating a trace more like a session, and it will get more and more confusing with our existing sessions/releases and session replay (which is closest to the standard definition of a session).

The way we are trying to relate transactions across a user session, by expanding a single trace to include more traces for sampling, will I feel just create more fuzzy area: confusion for us and our users.
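
The successor relationship described for Alternative B could be modeled roughly as follows. This is a sketch under assumptions: the `previousTraceId` field, `startSuccessor`, `traceChain`, and the short trace IDs are invented for illustration, since the RFC notes that the data model has no trace-to-trace relationship today.

```typescript
// Hypothetical sketch of Alternative B: names and fields are illustrative only.
interface TraceRecord {
  traceId: string;
  previousTraceId?: string; // set when this trace succeeds a prior one
  sampleRate: number; // stands in for the full sampling context
}

// A new trace annotates itself as the successor of `prior` and takes over
// its sampling context.
function startSuccessor(prior: TraceRecord, traceId: string): TraceRecord {
  return { traceId, previousTraceId: prior.traceId, sampleRate: prior.sampleRate };
}

// Follows the successor links backwards and returns the chain oldest first.
function traceChain(traces: Map<string, TraceRecord>, lastTraceId: string): string[] {
  const chain: string[] = [];
  const seen = new Set<string>(); // guards against accidental cycles
  let current: TraceRecord | undefined = traces.get(lastTraceId);
  while (current !== undefined && !seen.has(current.traceId)) {
    seen.add(current.traceId);
    chain.unshift(current.traceId);
    current =
      current.previousTraceId === undefined
        ? undefined
        : traces.get(current.previousTraceId);
  }
  return chain;
}

// Short IDs are used for readability; real trace IDs would be longer.
const firstTrace: TraceRecord = { traceId: "trace-1", sampleRate: 0.5 };
const secondTrace = startSuccessor(firstTrace, "trace-2");
const thirdTrace = startSuccessor(secondTrace, "trace-3");

const store = new Map<string, TraceRecord>();
for (const trace of [firstTrace, secondTrace, thirdTrace]) {
  store.set(trace.traceId, trace);
}

console.log(traceChain(store, "trace-3").join(" -> ")); // trace-1 -> trace-2 -> trace-3
```

Unlike a session-long trace, each navigation still produces its own trace here; the causal history is recovered by walking the links rather than by sharing one trace ID.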

# Drawbacks

Not addressing this issue might result in user confusion later, as front-end user sessions
are likely to originate in different projects with different sampling rules. However, the
future direction assumes that sampling will eventually happen adaptively, in which case
user confusion is less likely to be an issue.

# Unresolved questions

* This RFC does not attempt to address the issue of traces vs sessions for replays or other
RUM-like situations.