
@tonyherre (Contributor) commented Feb 20, 2023

Adds a captureTimestamp field to RTCEncodedVideoFrameMetadata. It is defined only for frames captured on the local device, and matches both the mediaTime defined by requestVideoFrameCallback() and the timestamp of WebCodecs VideoFrame and EncodedVideoChunk. This allows apps to match encoded frames (after encoding or before decoding) with the corresponding raw frames exposed by the mediacapture-transform APIs.

This essentially adopts the open PR #137, incorporating the review comments made there.



@tonyherre (Contributor Author)

@alvestrand Uploaded as we had discussed. It still needs some editorial work to add links in the right places, but the logic is there. PTAL

aarongable pushed a commit to chromium/chromium that referenced this pull request Feb 27, 2023
Adds a new field captureTimestamp to the RTCEncodedVideoFrameMetadata
idl to expose this at the JS level. The field gets its value from |capture_time_identifier| in its corresponding webrtc::VideoFrame.

Spec change: w3c/webrtc-encoded-transform#173

Bug: webrtc:14878
Change-Id: If0af3fce6d76c1aa299fa3e0c98be397af970696
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4218374
Reviewed-by: Guido Urdaneta <guidou@chromium.org>
Commit-Queue: Palak Agarwal <agpalak@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1110320}
@tonyherre force-pushed the tonyherre-capturetime branch from e5b0a7f to 867d5b8 on April 4, 2023 14:01
@tonyherre changed the title from "Add captureTimestamp to RTCEncodedVideoFrameMetadata" to "Add presentationTimestamp to RTCEncodedVideoFrameMetadata" on Apr 4, 2023
@tonyherre (Contributor Author)

After some more testing, I realized that to match the WebCodecs VideoFrame.timestamp, what we actually want is the presentation timestamp included with the encoded frame; this will also match the mediaTime field of requestVideoFrameCallback. I've updated this PR accordingly. PTAL @alvestrand

index.bs Outdated
<p>
The media presentation timestamp (PTS) in microseconds of the raw frame, matching the
{{VideoFrame/timestamp}} for raw frames which correspond to this frame and the
{{VideoFrameCallbackMetadata/mediaTime}} given if this frame is decoded and rendered.
Collaborator

VideoFrameCallbackMetadata/mediaTime is a double in seconds, maybe we should remove the reference to this one.

Member

> VideoFrameCallbackMetadata/mediaTime is a double in seconds, maybe we should remove the reference to this one.

Do you mean milliseconds? I also found w3c/webcodecs#122 on why WebCodecs uses microseconds for timestamps.

Collaborator

@jan-ivar (Member) commented Apr 6, 2023

Ah, it uses seconds to match mediaElement.currentTime. 🤦

While I think it makes sense for a method on HTMLVideoElement to stay consistent with its attributes, this here is arguably a lower-level API.

To complicate matters further, W3C design principle § 8.3, "Use milliseconds for time measurement", says exactly that.

And in w3c/webcodecs#122 they chose microseconds, out of concern about audio/video drift.

I see the original PR was for captureTimestamp and then it was changed to presentationTimestamp... What's the use case for this value? We may need to consider this to figure out which unit to stay consistent with.
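For reference, the unit mismatch discussed here can be sketched as a few conversion helpers. This is a minimal sketch, assuming rVFC's mediaTime is a double in seconds and WebCodecs VideoFrame.timestamp is an integer count of microseconds; the helper names are hypothetical and not part of any spec.

```typescript
// Hypothetical helpers illustrating the unit mismatch discussed above.
// rVFC VideoFrameCallbackMetadata.mediaTime: double, in seconds.
// WebCodecs VideoFrame.timestamp: integer (long long), in microseconds.
const MICROS_PER_SECOND = 1000000;
const MICROS_PER_MILLI = 1000;

/** Converts a seconds-based mediaTime into an integer microsecond timestamp. */
function secondsToMicros(mediaTimeSeconds: number): number {
  // Round to the nearest microsecond: a double in seconds may not hold the
  // exact microsecond value, so a plain multiply can land just off an integer.
  return Math.round(mediaTimeSeconds * MICROS_PER_SECOND);
}

/** Converts a microsecond timestamp into seconds (the unit rVFC reports). */
function microsToSeconds(timestampMicros: number): number {
  return timestampMicros / MICROS_PER_SECOND;
}

/** Converts a microsecond timestamp into milliseconds (the W3C design principle unit). */
function microsToMillis(timestampMicros: number): number {
  return timestampMicros / MICROS_PER_MILLI;
}
```

The rounding in secondsToMicros is a choice made by this sketch; whichever unit the spec settles on, an app comparing values across APIs has to pick a conversion and rounding rule like this.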

Collaborator

Our model is VideoFrame -> encoder -> encoded chunk -> encoded transform.
It seems best to be consistent with VideoFrame here and remove the reference to rVFC.

Member

I see the OP was edited and originally used microseconds. Can someone clarify what changed? Sorry if I missed it, I find the github thread here a bit hard to follow.

Normally, an issue is opened first for discussion, which tends to leave a thread that is easier to follow; a PR is then opened once the discussion solidifies. (GitHub's PR workflow tends to hide things once they are resolved, which suits a review process more than a discussion, IMHO.)

Contributor Author

Thanks for the discussion and links!
Consistency with VideoFrame is indeed the motivating use case here, as Youenn said: it lets an app associate a raw frame before encoding with the corresponding encoded frame afterwards, when applying transforms on both sides. I'm happy to just remove the reference to rVFC here; that would definitely simplify the text.

Apologies for editing the in-flight PR. The intention was always to have a field here that matched VideoFrame.timestamp, but the previous PR #137, which I was asked to adopt, seemed to mistake which timestamp this corresponded to. Given that no one but Harald had commented here yet, I thought it easier to modify it in place before kicking off the review more widely.
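The use case described here can be sketched as follows: record per-frame data keyed by VideoFrame.timestamp on the raw (mediacapture-transform) side, then look it up from the encoded frame's metadata on the encoded-transform side. This is a minimal sketch, assuming the metadata field carries the same microsecond value as VideoFrame.timestamp (the field was eventually merged under the name timestamp) and that it may be absent for frames not captured locally. FrameMatcher, the *Like interfaces and the eviction bound are hypothetical, not part of any spec.

```typescript
// Minimal stand-ins for the real WebCodecs / WebRTC types (assumption:
// only the fields used here are modelled).
interface RawFrameLike {
  timestamp: number; // VideoFrame.timestamp, microseconds
}
interface EncodedFrameMetadataLike {
  timestamp?: number; // microseconds; may be absent (e.g. non-local frames)
}

// Hypothetical helper: associates app data with raw frames, then retrieves
// it when the corresponding encoded frame reaches the encoded transform.
class FrameMatcher<T> {
  // Map preserves insertion order, so the first key is the oldest entry.
  private readonly pending = new Map<number, T>();
  private readonly maxPending: number;

  constructor(maxPending = 300) {
    this.maxPending = maxPending;
  }

  /** Record app data for a raw frame before it is encoded. */
  onRawFrame(frame: RawFrameLike, info: T): void {
    this.pending.set(frame.timestamp, info);
    // Bound memory: frames dropped by the encoder are never matched,
    // so evict the oldest entry once the limit is exceeded.
    if (this.pending.size > this.maxPending) {
      const oldest = this.pending.keys().next().value;
      if (oldest !== undefined) this.pending.delete(oldest);
    }
  }

  /** Look up (and forget) the data recorded for the matching raw frame. */
  onEncodedFrame(metadata: EncodedFrameMetadataLike): T | undefined {
    if (metadata.timestamp === undefined) return undefined;
    const info = this.pending.get(metadata.timestamp);
    this.pending.delete(metadata.timestamp);
    return info;
  }
}
```

Keying on an integer microsecond value gives an exact map key on both sides, which is one practical reason to mirror VideoFrame.timestamp rather than a seconds-based double.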

@aboba (Contributor) commented Apr 6, 2023

Why is presentationTimestamp defined with a different type than presentationTime as defined in VideoFrameCallbackMetadata? If these are defining the same thing, can we be consistent?

@tonyherre (Contributor Author)

> Why is presentationTimestamp defined with a different type than presentationTime as defined in VideoFrameCallbackMetadata? If these are defining the same thing, can we be consistent?

The primary motivator is for being able to match with VideoFrame.timestamp, which uses a long long timestamp. I was trying to also pull in the rVFC field, per the discussion towards the end of #137, but that was probably a mistake. Per the comment thread with Jan-Ivar and Youenn, it seems to make sense to define this only in reference with the VideoFrame timestamp - I've pushed a commit removing the ref from the text at least. Does that make sense to you @aboba?

@jan-ivar (Member)

Thanks, microseconds makes sense now. However, I remain concerned about the name.

Consistency with VideoFrame and EncodedVideoChunk would mean calling it "timestamp ... The presentation timestamp, given in microseconds. The timestamp is copied from the EncodedVideoChunk corresponding to this VideoFrame."

However, in RTCEncodedVideoFrame timestamp is taken (used for RTP). But there's a comment claiming this "will eventually re-use or extend the equivalent defined in WebCodecs" (presumably EncodedVideoChunk)?


If that is still our intent, we would appear to be at a crossroads. Is it too late to align this?
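To illustrate why the existing RTP-oriented timestamp cannot simply double as the presentation timestamp: the two run on different clocks. This is a minimal sketch, assuming the 90 kHz RTP clock customarily used for video payloads and a random 32-bit initial offset as RTP (RFC 3550) recommends; the function name is hypothetical, and real senders derive RTP timestamps internally rather than through an API like this.

```typescript
// Hypothetical illustration: mapping a microsecond presentation timestamp
// (as in VideoFrame.timestamp) onto a 32-bit RTP timestamp.
const RTP_VIDEO_CLOCK_HZ = 90000; // assumption: common clock rate for video RTP payloads
const RTP_TIMESTAMP_MODULUS = 2 ** 32; // RTP timestamps are unsigned 32-bit and wrap around

function presentationMicrosToRtp(ptsMicros: number, initialOffset: number): number {
  // Assumes ptsMicros >= 0 and 0 <= initialOffset < 2^32.
  // Convert microseconds to 90 kHz ticks.
  const ticks = Math.round((ptsMicros * RTP_VIDEO_CLOCK_HZ) / 1000000);
  // Add the per-stream random offset and wrap into the 32-bit range.
  return (initialOffset + ticks) % RTP_TIMESTAMP_MODULUS;
}
```

For example, with a zero offset a frame at one second of media time is 90000 RTP ticks, while its VideoFrame.timestamp is 1000000; with a random offset the RTP value also wraps, so the two fields are not interchangeable.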

@tonyherre (Contributor Author)

How about using timestamp inside RTCEncodedVideoFrameMetadata for this presentation timestamp field, to match WebCodecs? I'll then create a follow-up PR that also adds rtpTimestamp to this metadata and deprecates/removes the RTCEncodedVideoFrame.timestamp field.

This moves us into more alignment with the WebCodecs type by having a consistent timestamp field, gets away from the confusion of having two timestamps without needing to have an immediate breaking change, and preserves the amount of information available to web apps.
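The proposal can be sketched as a TypeScript shape plus a small helper an app might use during a deprecation period. This is a minimal sketch of the proposal as described in this comment, not the merged IDL: the interface names are hypothetical, timestamp holds the presentation timestamp in microseconds, rtpTimestamp is the proposed follow-up field, and the fallback assumes the legacy RTCEncodedVideoFrame.timestamp keeps carrying the RTP timestamp until it is removed.

```typescript
// Hypothetical shapes mirroring the proposal in this comment (not the real IDL).
interface ProposedVideoFrameMetadata {
  timestamp?: number; // presentation timestamp in microseconds, as in WebCodecs
  rtpTimestamp?: number; // proposed follow-up field: the RTP timestamp
}

interface EncodedVideoFrameLike {
  timestamp: number; // legacy field: RTP timestamp (proposed for deprecation)
  getMetadata(): ProposedVideoFrameMetadata;
}

/** Prefer the new metadata field; fall back to the legacy frame-level field. */
function readRtpTimestamp(frame: EncodedVideoFrameLike): number {
  return frame.getMetadata().rtpTimestamp ?? frame.timestamp;
}

/** Presentation timestamp in microseconds, if the implementation provides it. */
function readPresentationTimestamp(frame: EncodedVideoFrameLike): number | undefined {
  return frame.getMetadata().timestamp;
}
```

Keeping both values available during the transition is what lets the rename happen without an immediate breaking change, as the comment argues.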


@tonyherre (Contributor Author)

I've pushed a commit implementing the name suggestion from my previous comment: the new field is now called just timestamp, to match WebCodecs.
I suggest we land this together with a follow-up similar to this PR, tonyherre#1, to remove the confusion of having two different timestamps on encoded video frames and to continue towards WebCodecs alignment. I'm happy to merge that into this PR as well if that would be simpler.

PTAL @aboba, @jan-ivar, @alvestrand et al

@henbos (Collaborator) commented Apr 25, 2023

This was discussed during the last Virtual Interim, and there was support for it.

@henbos merged commit 442e4c2 into w3c:main on Apr 27, 2023
github-actions bot added a commit that referenced this pull request Apr 27, 2023
SHA: 442e4c2
Reason: push, by henbos

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
aarongable pushed a commit to chromium/chromium that referenced this pull request Dec 13, 2023
The old name was based on a misconception by us when initially
implementing, clarified during the spec discussions and landed there in
w3c/webrtc-encoded-transform#173.

Driveby fix removing the incorrect comment on the value of rtpTimestamp.

Note: still guarded by the as yet unlaunched feature
RTCEncodedVideoFrameAdditionalMetadata

Bug: 1441825
Change-Id: I388b197e93858957a79ed11c43d7a474f80c56a7
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5110960
Reviewed-by: Guido Urdaneta <guidou@chromium.org>
Auto-Submit: Tony Herre <toprice@chromium.org>
Commit-Queue: Guido Urdaneta <guidou@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1236889}

7 participants