diff --git a/files/en-us/web/api/webcodecs_api/audio-data.png b/files/en-us/web/api/webcodecs_api/audio-data.png new file mode 100644 index 000000000000000..6208bcea6f7615f Binary files /dev/null and b/files/en-us/web/api/webcodecs_api/audio-data.png differ diff --git a/files/en-us/web/api/webcodecs_api/audio-encoder-decoder.png b/files/en-us/web/api/webcodecs_api/audio-encoder-decoder.png new file mode 100644 index 000000000000000..d79e2958930b9dd Binary files /dev/null and b/files/en-us/web/api/webcodecs_api/audio-encoder-decoder.png differ diff --git a/files/en-us/web/api/webcodecs_api/codec_selection/index.md b/files/en-us/web/api/webcodecs_api/codec_selection/index.md new file mode 100644 index 000000000000000..5f9e4fd456535b7 --- /dev/null +++ b/files/en-us/web/api/webcodecs_api/codec_selection/index.md @@ -0,0 +1,282 @@ +--- +title: Codec selection +slug: Web/API/WebCodecs_API/Codec_selection +page-type: guide +--- + +{{DefaultAPISidebar("WebCodecs API")}} + +While developers commonly refer to codecs by their code identifier string, such as `vp9` or `h264`, there are many configuration profiles, levels, and other parameters that control exactly how the data is encoded and decoded. + +The [WebCodecs API](/en-US/docs/Web/API/WebCodecs_API) requires working with fully specified codec strings, such as `vp09.00.40.08.00`, instead of ambiguous strings like `vp9` or `h264`. Fully specified codec strings detail not just the codec family but also the profile, level, and other parameters. + +Selecting the correct string depends on your use case, but is primarily influenced by compatibility concerns, and the hardware and software on which you want to run. This guide explains how codec strings work, how to choose the right codecs for [common use cases](#common_use_cases), and common approaches for gracefully falling back to alternative codec strings when your preferences are unavailable. + +## Decoding vs encoding + +When **decoding** a video or audio file, the codec is determined by how the file was originally encoded — you do not choose it. Demuxing libraries such as [Mediabunny](https://mediabunny.dev/) and [web-demuxer](https://github.com/bilibili/web-demuxer) will extract the correct codec string for a given file, which you can supply directly to {{domxref("VideoDecoder")}} or {{domxref("AudioDecoder")}} during configuration. + +When **encoding**, you choose the codec. The rest of this guide covers how to choose a codec. + +## Video codecs + +### Video codec families + +Before choosing a codec string, like `vp09.00.40.08.00` or `avc1.4d0034`, it is worth reviewing the codec families. + +#### H.264 (AVC) + +H.264 is one of the most widely supported codecs across browsers, operating systems, and consumer devices. It is the most common codec used in MP4 files, and applications which encode videos intended for playback in third-party software typically choose H.264 as a pragmatic choice for maximum compatibility. + +While popular, it is worth noting that H.264 is a patented codec. While browser vendors hold licenses covering the H.264 encoder implementations used by WebCodecs, the codec is subject to royalties in certain circumstances. Developers should review usage with legal counsel. + +#### VP9 + +VP9 is an open source codec developed by Google, and offers better compression than H.264 at equivalent quality. VP9 within WebM containers is widely supported across modern browsers, with coverage comparable to or exceeding H.264. 
+ +VP9 within WebM containers is also supported by native video players on Windows (Windows Media Player) and third-party players such as VLC, but currently lacks native playback support on macOS and iOS. + +VP9 is sometimes, but not always, supported as a codec within MP4 files, as support for this configuration depends on playback software. + +VP9 is often chosen for internal use cases for its better compression, or when open source licensing matters. + +#### AV1 + +AV1 is a newer open source codec developed by the [Alliance for Open Media](https://aomedia.org/). AV1 has better compression than both H.264 and VP9 for the same quality, and decode support is now over 90% coverage globally across browsers. + +AV1 encoding support is strong across desktop browsers but limited on Safari and on Android. AV1 offers better quality-per-bit than VP9, but is more computationally intensive to encode. Consumer devices increasingly have support for AV1 hardware acceleration, which can make AV1 encoding more practical. The decision to use AV1 over VP9 typically comes down to whether the better quality-per-bit justifies the additional encoding overhead for a given use case. + +#### HEVC (H.265) + +HEVC offers better compression than H.264 but has significant gaps in browser encoding support outside of Apple platforms. +It is not recommended as a general-purpose encoding target. + +Like H.264, HEVC is a patented codec. The codec is subject to royalties in certain circumstances. Developers should review usage with legal counsel. + +### Codec-container compatibility + +Not all codecs are supported by all containers. +The following table covers the two most common web video containers: + +| Codec | MP4 | WebM | +| ----- | ------- | ---- | +| H.264 | Yes | No | +| VP9 | Partial | Yes | +| AV1 | Partial | Yes | +| HEVC | Yes | No | + +H.264 is the standard codec for MP4. VP9 and AV1 are the standard codecs for WebM. +While VP9 and AV1 have partial MP4 support in some environments, pairing them with WebM is more reliable. + +### Codec string selection + +For each codec family, there are hundreds of possible codec strings. + +Each codec string encodes a **profile** and **level** that determine the capabilities and compatibility of the encoded stream. The profile controls which encoding features are enabled — lower profiles such as Baseline are simpler and more broadly compatible, while higher profiles such as High enable better compression at the cost of requiring more capable hardware. The level sets the maximum resolution and bitrate the stream can use. In general, prefer lower profiles and levels unless you specifically need the higher resolution or compression efficiency. + +The following tables provide a practical starting point for codec strings, with levels and profiles that maximize encoding compatibility. 
+ +#### H.264 + +| Codec string | Profile | Max resolution | Support | +| ------------- | -------- | -------------- | ------------------------------------------------------------------ | +| `avc1.42001f` | Baseline | 720p | [99.6%](https://webcodecsfundamentals.org/codecs/avc1.42001f.html) | +| `avc1.4d0034` | Main | 4K | [98.9%](https://webcodecsfundamentals.org/codecs/avc1.4d0034.html) | +| `avc1.42003e` | Baseline | 8K | [86.8%](https://webcodecsfundamentals.org/codecs/avc1.42003e.html) | +| `avc1.64003e` | High | 8K | [85.9%](https://webcodecsfundamentals.org/codecs/avc1.64003e.html) | + +#### VP9 + +| Codec string | Level | Max resolution | Support | +| ------------------ | ----- | -------------- | ------------------------------------------------------------------------ | +| `vp09.00.30.08.00` | 3 | 720p | [99.98%](https://webcodecsfundamentals.org/codecs/vp09.00.30.08.00.html) | +| `vp09.00.40.08.00` | 4 | 2K | [99.96%](https://webcodecsfundamentals.org/codecs/vp09.00.40.08.00.html) | +| `vp09.00.50.08.00` | 5 | 4K | [99.97%](https://webcodecsfundamentals.org/codecs/vp09.00.50.08.00.html) | +| `vp09.00.61.08.00` | 6.1 | 8K | [99.97%](https://webcodecsfundamentals.org/codecs/vp09.00.61.08.00.html) | + +#### AV1 + +| Codec string | Level | Max resolution | Support | +| --------------- | ----- | -------------- | -------------------------------------------------------------------- | +| `av01.0.05M.08` | 3.1 | 720p | [87.9%](https://webcodecsfundamentals.org/codecs/av01.0.05M.08.html) | +| `av01.0.08M.08` | 4.0 | 1080p | [87.8%](https://webcodecsfundamentals.org/codecs/av01.0.08M.08.html) | +| `av01.0.12M.08` | 5.0 | 4K | [87.8%](https://webcodecsfundamentals.org/codecs/av01.0.12M.08.html) | + +#### HEVC + +| Codec string | Level | Max resolution | Support | +| ------------------ | ----- | -------------- | ----------------------------------------------------------------------- | +| `hvc1.1.6.L120.B0` | 4.0 | 1080p | [73.6%](https://webcodecsfundamentals.org/codecs/hev1.1.6.L120.B0.html) | +| `hvc1.1.6.L150.B0` | 5.0 | 4K | [73.6%](https://webcodecsfundamentals.org/codecs/hvc1.1.6.L150.B0.html) | +| `hvc1.1.6.L180.B0` | 6.0 | 8K | [73.1%](https://webcodecsfundamentals.org/codecs/hvc1.1.6.L180.B0.html) | + +See the [Codec Support Table](https://webcodecsfundamentals.org/datasets/codec-support-table/) for an exhaustive list of potential codec strings, and support across browsers & devices. + +### Codec string format + +The fully qualified codec string encodes the codec family, profile, level, and other parameters that affect which hardware can encode or decode the stream and at what resolution and quality. + +The format for these codec strings is specified in the [W3C codec registry](https://www.w3.org/TR/webcodecs-codec-registry/), and the format is different for each codec family. 
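+
+The per-family breakdowns below document each field. As a concrete illustration of how the fields pack into the string, here is a minimal sketch that unpacks an H.264 string; the `describeAvc1` helper and its profile table are illustrative, not part of any library:
+
+```js
+// Unpack an "avc1.PPCCLL" string into profile, constraint flags, and level
+function describeAvc1(codecString) {
+  const [, params] = codecString.split("."); // e.g., "4d0034"
+  const profileIdc = parseInt(params.slice(0, 2), 16);
+  const constraintFlags = parseInt(params.slice(2, 4), 16);
+  const levelIdc = parseInt(params.slice(4, 6), 16);
+  const profiles = { 0x42: "Baseline", 0x4d: "Main", 0x64: "High" };
+  return {
+    profile: profiles[profileIdc] ?? `profile_idc ${profileIdc}`,
+    constraintFlags,
+    level: levelIdc / 10, // 0x34 = 52 → level 5.2
+  };
+}
+
+console.log(describeAvc1("avc1.4d0034")); // { profile: "Main", constraintFlags: 0, level: 5.2 }
+```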
+ +#### H.264 + +`avc1.4d0034` + +- `avc1` — H.264/AVC codec identifier +- `4d` — Profile IDC in hexadecimal (`4d` = Main profile) +- `00` — Constraint flags +- `34` — Level IDC in hexadecimal (`34` = level 5.2, supports up to 4K) + +#### VP9 + +`vp09.00.40.08.00` + +- `vp09` — VP9 codec identifier +- `00` — Profile +- `40` — Level (`40` = level 4.0, supports up to 2K) +- `08` — Bit depth (8-bit) +- `00` — Chroma subsampling + +#### AV1 + +`av01.0.05M.08` + +- `av01` — AV1 codec identifier +- `0` — Profile (Main) +- `05M` — Level and tier (`05` = level 3.1, `M` = Main tier) +- `08` — Bit depth (8-bit) + +#### HEVC + +`hvc1.1.6.L150.B0` + +- `hvc1` — HEVC codec identifier (MP4/QuickTime variant) +- `1` — Profile (`1` = Main profile) +- `6` — Compatibility flags +- `L150` — Level × 30 (`L150` = level 5.0, supports up to 4K) +- `B0` — Tier and constraint flags (`B0` = Main tier) + +## Audio codecs + +### Opus + +Opus is an open source codec with broad encoding support across browsers and platforms. It is the standard audio codec for WebM files, and the recommended choice for most WebCodecs audio encoding use cases. + +### AAC + +AAC is the standard audio codec for MP4 files and is required when targeting MP4 output. However, AAC encoding support in WebCodecs has notable gaps: it is not supported in Firefox on any platform, or in any browser on desktop Linux. + +AAC encoding is universally supported on Safari versions that support {{domxref("AudioEncoder")}} (Safari 26+), but previous versions of Safari do not support audio encoding in general. + +### MP3 and PCM + +MP3 and PCM are not widely supported as encoding targets, with MP3 encoding not currently supported by any major browser. PCM (uncompressed audio) is available as a {{domxref("AudioData")}} format for raw audio processing, but support for encoding with `AudioEncoder` is limited. + +### Audio codec string reference + +Audio codec strings are simpler than video codec strings. Opus requires no additional parameters; AAC uses a short parameter string. + +| Codec | Codec string | Container | Encoder support | Decoder support | +| ------ | ------------ | --------- | ---------------------------------------------------------------- | ---------------------------------------------------------------- | +| Opus | `opus` | WebM | [96.1%](https://webcodecsfundamentals.org/codecs/opus.html) | [96.5%](https://webcodecsfundamentals.org/codecs/opus.html) | +| AAC | `mp4a.40.2` | MP4 | [90.1%](https://webcodecsfundamentals.org/codecs/mp4a.40.2.html) | [96.4%](https://webcodecsfundamentals.org/codecs/mp4a.40.2.html) | +| MP3 | `mp3` | — | [0%](https://webcodecsfundamentals.org/codecs/mp3.html) | [96.5%](https://webcodecsfundamentals.org/codecs/mp3.html) | +| FLAC | `flac` | — | [0%](https://webcodecsfundamentals.org/codecs/flac.html) | [96.5%](https://webcodecsfundamentals.org/codecs/flac.html) | +| Vorbis | `vorbis` | WebM | [3.8%](https://webcodecsfundamentals.org/codecs/vorbis.html) | [96.5%](https://webcodecsfundamentals.org/codecs/vorbis.html) | +| PCM | `pcm-f32` | — | [8.7%](https://webcodecsfundamentals.org/codecs/pcm-f32.html) | [94.6%](https://webcodecsfundamentals.org/codecs/pcm-f32.html) | + +The lower AAC encoding support figure reflects the platform gaps described above — Firefox (all platforms), desktop Linux (all browsers), and partial support for `AudioEncoder` on Apple devices. AAC has several variants — `mp4a.40.2` (AAC-LC) is the standard choice for encoding. 
`mp4a.40.5` and `mp4a.40.29` correspond to HE-AAC configurations using Spectral Band Replication (SBR), which causes the decoder to output audio at double the configured sample rate. + +PCM is available in several variants: `pcm-f32` (32-bit float), `pcm-s16` (16-bit signed), `pcm-s24` (24-bit signed), `pcm-s32` (32-bit signed), and `pcm-u8` (8-bit unsigned). All variants have equivalent browser support. The `pcm-f32` format matches the `f32-planar` layout used by {{domxref("AudioData")}} and is the most practical choice for raw audio processing. + +Use {{domxref("AudioEncoder/isConfigSupported_static", "AudioEncoder.isConfigSupported()")}} to check support at runtime before configuring an `AudioEncoder`. Note that `AudioEncoder` itself is not available in all browsers — check for its existence with `typeof AudioEncoder !== "undefined"` before calling `isConfigSupported()`. + +## Common use cases + +You need to choose a video codec and an audio codec, along with the container format, together as a package. For practical quickstart guidance, here are some common configurations: + +- **Targeting maximum compatibility** (video intended for playback in third-party software or on a wide range of devices): H.264 (e.g., `avc1.4d0034`) + AAC (`mp4a.40.2`) in an MP4 container is the most common choice in practice. +- **Open-source projects or applications controlling both encoding and playback** (e.g., internal tooling, in-app streaming): VP9 (e.g., `vp09.00.40.08.00`) + Opus (`opus`) in a WebM container is a natural fit — both are open-source, and WebM is the standard container for this combination. +- **Maximum compression** (e.g., large-scale streaming): AV1 + Opus in a WebM container, provided your target audience has sufficient hardware support. Use {{domxref("VideoEncoder/isConfigSupported_static", "VideoEncoder.isConfigSupported()")}} to verify before committing to this combination. + +## Checking support at runtime + +Before encoding, use {{domxref("VideoEncoder/isConfigSupported_static", "VideoEncoder.isConfigSupported()")}} to verify that a given configuration is supported on the current device: + +```js +const { supported } = await VideoEncoder.isConfigSupported({ + codec: "avc1.4d0034", + width: 1920, + height: 1080, +}); +``` + +Since hardware support varies by device, a common pattern is to test codec strings from highest to lowest quality and use the first one supported: + +```js +const candidates = ["avc1.64003e", "avc1.4d0034", "avc1.42003e", "avc1.42001f"]; +let codecString; + +for (const codec of candidates) { + const { supported } = await VideoEncoder.isConfigSupported({ + codec, + width: 1920, + height: 1080, + bitrate: 5_000_000, + framerate: 30, + }); + if (supported) { + codecString = codec; + break; + } +} +``` + +The same pattern applies to VP9: + +```js +const candidates = [ + "vp09.00.61.08.00", + "vp09.00.50.08.00", + "vp09.00.40.08.00", + "vp09.00.10.08.00", +]; +let codecString; + +for (const codec of candidates) { + const { supported } = await VideoEncoder.isConfigSupported({ + codec, + width: 1920, + height: 1080, + bitrate: 5_000_000, + framerate: 30, + }); + if (supported) { + codecString = codec; + break; + } +} +``` + +The same pattern applies to audio. 
Since {{domxref("AudioEncoder")}} is not available in all browsers, check for its existence before calling `isConfigSupported()`: + +```js +if (typeof AudioEncoder !== "undefined") { + const { supported } = await AudioEncoder.isConfigSupported({ + codec: "opus", + sampleRate: 48000, + numberOfChannels: 2, + }); +} +``` + +## See also + +- [WebCodecs Support Dataset](https://zenodo.org/records/19187467) +- [Video processing concepts](/en-US/docs/Web/API/WebCodecs_API/Video_processing_concepts) +- [Using the WebCodecs API](/en-US/docs/Web/API/WebCodecs_API/Using_the_WebCodecs_API) +- [Codec Support Table](https://webcodecsfundamentals.org/datasets/codec-support-table/) on WebCodecs Fundamentals +- {{domxref("VideoEncoder/isConfigSupported_static", "VideoEncoder.isConfigSupported()")}} +- {{domxref("VideoDecoder/isConfigSupported_static", "VideoDecoder.isConfigSupported()")}} +- {{domxref("AudioEncoder/isConfigSupported_static", "AudioEncoder.isConfigSupported()")}} +- {{domxref("AudioDecoder/isConfigSupported_static", "AudioDecoder.isConfigSupported()")}} diff --git a/files/en-us/web/api/webcodecs_api/decoder-demuxer.png b/files/en-us/web/api/webcodecs_api/decoder-demuxer.png new file mode 100644 index 000000000000000..ffb4fbc3ffc29b1 Binary files /dev/null and b/files/en-us/web/api/webcodecs_api/decoder-demuxer.png differ diff --git a/files/en-us/web/api/webcodecs_api/index.md b/files/en-us/web/api/webcodecs_api/index.md index 023c21d4dbdd5a7..9745b1bff555ae5 100644 --- a/files/en-us/web/api/webcodecs_api/index.md +++ b/files/en-us/web/api/webcodecs_api/index.md @@ -6,43 +6,105 @@ page-type: web-api-overview {{DefaultAPISidebar("WebCodecs API")}}{{AvailableInWorkers("window_and_dedicated")}} -The **WebCodecs API** gives web developers low-level access to the individual frames of a video stream and chunks of audio. -It is useful for web applications that require full control over the way media is processed. -For example, video or audio editors, and video conferencing. +The **WebCodecs API** enables web developers to encode and decode video and audio in the browser efficiently (using hardware acceleration) and with very low-level control (processing on a per-frame basis). -## Concepts and Usage +It is useful for web applications that do heavy media processing, or which require low-level control over the way media is encoded. +This includes browser-based video and audio editing, as well as live-streaming and video conferencing. -Many Web APIs use media codecs internally. -For example, the [Web Audio API](/en-US/docs/Web/API/Web_Audio_API), and the [WebRTC API](/en-US/docs/Web/API/WebRTC_API). -However these APIs do not allow developers to work with individual frames of a video stream and unmixed chunks of encoded audio or video. +## Concepts -Web developers have typically used WebAssembly in order to get around this limitation, -and to work with media codecs in the browser. -However, this requires additional bandwidth to download codecs that already exist in the browser, -reducing performance and power efficiency, and adding additional development overhead. +The WebCodecs API provides browser-native interfaces to represent raw video frames, encoded video frames, as well as raw and encoded audio. -The WebCodecs API provides access to codecs that are already in the browser. -It gives access to raw video frames, chunks of audio data, image decoders, audio and video encoders and decoders. 
+| | Video | Audio | +| ----------- | -------------------------------- | -------------------------------- | +| **Raw** | {{domxref("VideoFrame")}} | {{domxref("AudioData")}} | +| **Encoded** | {{domxref("EncodedVideoChunk")}} | {{domxref("EncodedAudioChunk")}} | -### Processing Model +The WebCodecs API also introduces the {{domxref("VideoDecoder")}} and {{domxref("VideoEncoder")}} interfaces, which transform `EncodedVideoChunk` objects into `VideoFrame` objects and vice-versa. -The WebCodecs API uses an asynchronous [processing model](https://w3c.github.io/webcodecs/#codec-processing-model-section). Each instance -of an encoder or decoder maintains an internal, independent processing queue. When queueing a substantial amount of work, it's important to -keep this model in mind. +![VideoEncoder and VideoDecoder](video-encoder-decoder.png) -Methods named `configure()`, `encode()`, `decode()`, and `flush()` operate asynchronously by appending control messages -to the end the queue, while methods named `reset()` and `close()` synchronously abort all pending work and purge the -processing queue. After `reset()`, more work may be queued following a call to `configure()`, but `close()` is a permanent operation. +Likewise, the WebCodecs API also introduces the {{domxref("AudioDecoder")}} and {{domxref("AudioEncoder")}} interfaces, which transform `EncodedAudioChunk` objects into `AudioData` objects and vice-versa. -Methods named `flush()` can be used to wait for the completion of all work that was pending at the time `flush()` was called. However, it -should generally only be called once all desired work is queued. It is not intended to force progress at regular intervals. Calling it -unnecessarily will affect encoder quality and cause decoders to require the next input to be a key frame. +![AudioEncoder and AudioDecoder](audio-encoder-decoder.png) -### Demuxing +Generally there is a 1:1 correspondence between the raw and encoded versions of each media type. Decoding a number of `EncodedVideoChunk` objects will yield the same number of `VideoFrame` objects (and this is also true for audio). -There is currently no API for demuxing media containers. Developers working with containerized media will need to implement their own -or use third party libraries. E.g., [MP4Box.js](https://github.com/gpac/mp4box.js/) or [jswebm](https://github.com/jscodec/jswebm) can be -used to demux audio and video data into {{domxref("EncodedAudioChunk")}} and {{domxref("EncodedVideoChunk")}} objects respectively. +### Video + +A `VideoFrame` represents a video frame, and is tied to actual pixel data on the device's graphics memory, as well as metadata such as the timestamp and duration (in microseconds), format and resolution. A `VideoFrame` can be constructed from any image source, and can also be rendered to a {{domxref("Canvas")}} using any of the canvas rendering methods. + +`EncodedVideoChunk` represents the encoded (compressed) version of the same frame, tied to binary data in regular memory and the same metadata. +The only difference is that it has one additional field: `type`, which can be "key" or "delta", representing whether or not it corresponds to a [key frame](https://webcodecsfundamentals.org/basics/encoded-video-chunk/#key-frames). An `EncodedVideoChunk` typically stores 10 to 100 times less data than its corresponding raw `VideoFrame`. + +![VideoFrame and EncodedVideoChunk](video-frame.png) + +### Audio + +An `AudioData` object represents a number of individual audio samples (1024 is a typical number). 
Audio sample data can be extracted as a {{jsxref("Float32Array")}} via the `copyTo` method. There is no direct integration with the [Web Audio API](/en-US/docs/Web/API/Web_Audio_API); however, the extracted `Float32Array` samples can be copied directly into a {{domxref("AudioBuffer")}} for playback. + +Likewise, the `EncodedAudioChunk` represents the encoded (compressed) version of an `AudioData` object, containing compressed audio sample data. + +![AudioData and EncodedAudioChunk](audio-data.png) + +### Processing model + +The WebCodecs API uses an asynchronous [processing model](https://w3c.github.io/webcodecs/#codec-processing-model-section). Each instance of an encoder or decoder maintains an internal, independent processing queue. When queueing a substantial number of encoded chunks for decoding or frames/samples for encoding, it's important to keep this model in mind. + +Methods named {{domxref("VideoEncoder/configure", "configure()")}}, {{domxref("VideoEncoder/encode", "encode()")}}, {{domxref("VideoDecoder/decode", "decode()")}}, and {{domxref("VideoEncoder/flush", "flush()")}} operate asynchronously by appending control messages to the end of the queue, while methods named {{domxref("VideoEncoder/reset", "reset()")}} and {{domxref("VideoEncoder/close", "close()")}} synchronously abort all pending work and purge the processing queue. After `reset()`, more work may be queued following a call to `configure()`, but `close()` is a permanent operation. These methods work for both Audio and Video decoders/encoders. + +The `flush()` method can be used to wait for the completion of all work that was pending at the time `flush()` was called. However, it should generally only be called once all desired work is queued — it is not intended to force progress at regular intervals. Calling it unnecessarily will affect encoder quality and cause decoders to require the next input to be a key frame. + +### Codecs + +A codec is a specific algorithm for encoding (compressing) and decoding (decompressing) video and audio. There are several industry standard codecs for video, and a separate set of codecs for audio. Here are the major ones supported by the WebCodecs API: + +#### Video codecs + +- H.264 (AVC) + - : The most widely supported video codec. Most MP4 files use H.264. +- VP9 + - : Open source, developed by Google. Better compression than H.264. Commonly used on YouTube and in WebM files. +- AV1 + - : The newest open source codec, with better compression than VP9. Broad decoder support; hardware encoder support is still limited. +- H.265 (HEVC) + - : Better compression than H.264, but with significant gaps in browser support outside of Apple platforms. + +#### Audio codecs + +- Opus + - : Open source, low-latency. The recommended choice for most WebCodecs audio encoding. +- AAC + - : Widely supported. Common in MP4 files. +- MP3 + - : Broadly supported for decoding, but not available as an encoder in WebCodecs. +- PCM + - : Uncompressed audio. No quality loss, but large file sizes. + +The WebCodecs specification supports a particular set of codecs, and individual devices and browsers may only support a subset of those. Encoders and decoders must be configured with fully specified codec strings (such as `"vp09.00.40.08.00"` for VP9 or `"avc1.4d0034"` for H.264) instead of ambiguous codec names like `"vp9"` or `"h264"`. 
The [Codec selection guide](/en-US/docs/Web/API/WebCodecs_API/Codec_selection) provides guidance on choosing an appropriate codec string (see the [Codec Support Table](https://webcodecsfundamentals.org/datasets/codec-support-table/) (webcodecsfundamentals.org) for a complete list of codec strings and their browser support). + +### Muxing and demuxing + +The WebCodecs API only deals with encoding and decoding, with encoded chunks just representing binary data. It does not provide a built-in way to read `EncodedVideoChunk` objects from a video file, or write them to a playable video file. + +Reading encoded chunks from a video file is a completely different process called demuxing, and to fetch `EncodedVideoChunk` objects from a video file, you will need to use a demuxing library such as [Mediabunny](https://mediabunny.dev/) or [web-demuxer](https://github.com/bilibili/web-demuxer). + +![Demuxer](decoder-demuxer.png) + +These libraries will follow the video container specifications (e.g., webm, mp4) to extract the track data and byte offsets for each encoded chunk, and provide methods for extracting the actual chunks from the raw file. + +Likewise, to write to a playable video file, you will need a muxing library, with [Mediabunny](https://mediabunny.dev/) being the primary option. Muxing libraries handle formatting the binary encoded data, and placing it in the correct byte position in the output file stream according to the container specification, so that the output video is playable. + +You can find more information on muxing and demuxing in the [Muxing and Demuxing guide](https://webcodecsfundamentals.org/basics/muxing/) (webcodecsfundamentals.org). + +## Guides + +- [Video processing concepts](/en-US/docs/Web/API/WebCodecs_API/Video_processing_concepts) + - : A brief primer on video processing, covering codecs and containers, muxing and demuxing, and conceptual information that explains how the WebCodecs API implements these concepts. +- [Using the WebCodecs API](/en-US/docs/Web/API/WebCodecs_API/Using_the_WebCodecs_API) + - : In depth guide to how to actually use the WebCodecs API, including how to instantiate and configure encoders and decoders, how to create and consume video frames, and how to extract samples from `AudioData`. +- [Codec selection](/en-US/docs/Web/API/WebCodecs_API/Codec_selection) + - : The WebCodecs API requires codec strings — precise identifiers that specify not just the codec family but also the profile, level, and other parameters. This guide explains how codec strings work and how to choose the right codec for common use cases. ## Interfaces @@ -73,34 +135,49 @@ used to demux audio and video data into {{domxref("EncodedAudioChunk")}} and {{d ## Examples -In the following example, frames are returned from a {{domxref("MediaStreamTrackProcessor")}}, then encoded. -See the full example and read more about it in the article [Video processing with WebCodecs](https://developer.chrome.com/docs/web-platform/best-practices/webcodecs). +### Basic usage + +To instantiate a `VideoEncoder`, we pass an object that specifies a callback function that will be called when `EncodedVideoChunk` instances are available for processing, and an error function that will be called if there are errors. 
+This is shown in the following code:
+
+```js
+const encoder = new VideoEncoder({
+  output(chunk, meta) {
+    // Do something with chunk, typically send to muxing library
+  },
+  error(e) {
+    console.warn(e);
+  },
+});
+```
+
+You then need to configure the encoder with the codec parameter and various other fields.
+
+```js
+encoder.configure({
+  codec: "vp09.00.40.08.00", // See codec selection guide
+  width: 1280,
+  height: 720,
+  bitrate: 1_000_000, // 1 Mbps
+  framerate: 30,
+});
+```
+
+You can then start sending frames to the encoder. A `VideoFrame` can be constructed from a `Canvas`, and should be closed as soon as it has been submitted for encoding.
+
+```js
+for (let i = 0; i < 60; i++) {
+  const frame = new VideoFrame(canvas, { timestamp: (i * 1e6) / 30 }); // 30 fps, in microseconds
+  encoder.encode(frame, { keyFrame: i % 60 === 0 });
+  frame.close(); // Release the frame's memory once it has been submitted
+}
+```
+
+See [Using the WebCodecs API](/en-US/docs/Web/API/WebCodecs_API/Using_the_WebCodecs_API) for more examples.
+
 ## See also
 
 - [Video processing with WebCodecs](https://developer.chrome.com/docs/web-platform/best-practices/webcodecs)
 - [WebCodecs API Samples](https://w3c.github.io/webcodecs/samples/)
+- [WebCodecsFundamentals](https://webcodecsfundamentals.org/)
 - [Real-Time Video Processing with WebCodecs and Streams: Processing Pipelines](https://webrtchacks.com/real-time-video-processing-with-webcodecs-and-streams-processing-pipelines-part-1/)
 - [Video Frame Processing on the Web – WebAssembly, WebGPU, WebGL, WebCodecs, WebNN, and WebTransport](https://webrtchacks.com/video-frame-processing-on-the-web-webassembly-webgpu-webgl-webcodecs-webnn-and-webtransport/)
diff --git a/files/en-us/web/api/webcodecs_api/using_the_webcodecs_api/index.md b/files/en-us/web/api/webcodecs_api/using_the_webcodecs_api/index.md
new file mode 100644
index 000000000000000..524806c0f5d0f9e
--- /dev/null
+++ b/files/en-us/web/api/webcodecs_api/using_the_webcodecs_api/index.md
@@ -0,0 +1,429 @@
+---
+title: Using the WebCodecs API
+slug: Web/API/WebCodecs_API/Using_the_WebCodecs_API
+page-type: guide
+---
+
+{{DefaultAPISidebar("WebCodecs API")}}
+
+This guide covers the basic usage patterns of the WebCodecs API, including how to encode and decode video and audio, as well as how to use {{domxref("VideoFrame")}} and {{domxref("AudioData")}}.
+
+## Encoding video
+
+The basic usage pattern for {{domxref("VideoEncoder")}} starts with instantiation, where you define the `output` and `error` callback functions. The `output` callback receives an `EncodedVideoChunk` and a `metadata` parameter — an `EncodedVideoChunkMetadata` dictionary that contains an optional [decoderConfig](/en-US/docs/Web/API/VideoEncoder/VideoEncoder#decoderconfig) property. This metadata is needed by muxing libraries when muxing to a video file.
+
+```js
+const encoder = new VideoEncoder({
+  output(chunk, meta) {
+    // Do something with chunk, typically send to muxing library
+  },
+  error(e) {
+    // Handle the error
+  },
+});
+```
+
+You then need to configure the encoder with the codec parameter and various other encoding parameters such as width, height, bitrate, and framerate. See the [Codec selection](/en-US/docs/Web/API/WebCodecs_API/Codec_selection) guide for guidance on choosing a codec.
+
+```js
+encoder.configure({
+  codec: "vp09.00.40.08.00", // See codec selection guide
+  width: 1280,
+  height: 720,
+  bitrate: 1_000_000, // 1 Mbps
+  framerate: 30,
+});
+```
+
+You can then start encoding `VideoFrame` objects, passing not only the `VideoFrame` to be encoded but also a `keyFrame` option indicating whether the frame should be encoded as a key frame.
+
+```js
+for (let i = 0; i < 60; i++) {
+  const timestamp = (i * 1e6) / 30; // 30 fps, in microseconds
+  const frame = new VideoFrame(canvas, { timestamp });
+  encoder.encode(frame, { keyFrame: i % 60 === 0 });
+  frame.close();
+}
+```
+
+The first frame encoded should be a key frame — while `VideoEncoder` will automatically force the first frame to be a key frame even if not explicitly flagged, it is good practice to set it explicitly. Typical key frame intervals are once every 30 or 60 frames. Using more key frames increases video file size, while using fewer key frames can result in unstable video playback in some video players.
+
+It is important to close `VideoFrame` objects as soon as they are sent for encoding to avoid memory leaks. `VideoFrame` objects are large enough that applications can crash with fewer than 100 active frames in memory.
+
+Note that `VideoEncoder` also maintains an internal queue of frames awaiting encoding. If you are rendering an animation at 30 fps and calling `encoder.encode(frame)` on each render, but the encoder is only able to encode at 10 fps, the encoder queue will grow until it exhausts video memory and the process crashes.
+
+You therefore need to manage how and when you send frames to the encoder, checking {{domxref("VideoEncoder.encodeQueueSize")}} within your render loop, and ensuring that it does not grow unbounded.
+
+It is possible to use the `dequeue` event to detect when the encode queue shrinks, avoiding the need to poll `encodeQueueSize`.
+
+```js
+encoder.addEventListener("dequeue", (event) => {
+  // Queue up more encoding work
+});
+```
+
+Once you are done sending all frames for encoding, you should call the `flush()` method.
+
+```js
+await encoder.flush();
+```
+
+Depending on the device/browser, the encoder may not return the last few `EncodedVideoChunk` objects until `flush()` is called. Once you are done using the `VideoEncoder` completely, you should call the `close()` method to free up system resources.
+
+```js
+encoder.close();
+```
+
+A `VideoEncoder` may throw an error during the process of encoding for a number of different reasons, such as if the user switches tabs and the browser reclaims the resources. When an error occurs, the encoder transitions permanently to the `"closed"` state. It is not possible to reconfigure a closed encoder — a new `VideoEncoder` instance must be created. The first frame encoded by the new encoder must be a key frame.
+
+```js
+if (encoder.state === "closed") {
+  // Close the old encoder, instantiate and configure a new encoder
+}
+
+encoder.encode(frame, { keyFrame: true });
+```
+
+## Decoding video
+
+Likewise, for decoding video, you start by instantiating the {{domxref("VideoDecoder")}} with the `output` and `error` callback functions, where the `output` callback receives `VideoFrame` objects returned by the decoder.
+
+```js
+const decoder = new VideoDecoder({
+  output(frame) {
+    // Do something with the VideoFrame
+  },
+  error(e) {
+    // Handle the error
+  },
+});
+```
+
+You then need to configure the decoder. If you are decoding a video file, a demuxing library can provide the correct decoder config (see [Muxing and Demuxing](/en-US/docs/Web/API/WebCodecs_API#muxing_and_demuxing)). If streaming video between a WebCodecs sender and receiver, the decoder config is identical to the decoder config contained in the metadata emitted by the `VideoEncoder` that generated the encoded chunks.
+
+```js
+decoder.configure(/* config */);
+```
+
+If you are decoding a video file, you will need a demuxing library to extract video chunks. You can then submit the chunks for decoding. Keep in mind that you should not send just one chunk for decoding and wait for the frame to be output before feeding the next chunk. Depending on the browser/device and the video itself, you may need to send multiple chunks before the decoder begins returning frames, and the minimum number of chunks will depend on the device.
+
+```js
+let chunk_index = 0;
+// Process chunks in batches, not one at a time nor all at once
+for (let i = 0; i < BATCH_LENGTH; i++) {
+  decoder.decode(chunks[chunk_index]);
+  chunk_index++;
+}
+```
+
+Similar to `VideoEncoder`, `VideoDecoder` maintains a decode queue which needs to be managed. If you send thousands of chunks to the `VideoDecoder` at once, the decoder might close or fail, so your application will need to ensure that {{domxref("VideoDecoder.decodeQueueSize")}} does not grow unbounded. Like with the encoder, you can also listen for the `dequeue` event to aid in managing the decode queue.
+
+```js
+decoder.addEventListener("dequeue", (event) => {
+  // Queue up more decoding work
+});
+```
+
+Once you are finished sending all chunks for decoding, you can call `flush()`.
+
+```js
+await decoder.flush();
+```
+
+Depending on the device/browser, the decoder may not return the last few `VideoFrame` objects until `flush()` is called. Once you are done using the `VideoDecoder` completely, you should call the `close()` method to free up system resources.
+
+```js
+decoder.close();
+```
+
+A `VideoDecoder` may throw an error while decoding for a variety of reasons, such as corrupted or missing data in a source `EncodedVideoChunk`. When a decoder fails, it transitions permanently to the `"closed"` state and a new `VideoDecoder` instance must be created. The first chunk decoded by the new decoder must be a key frame, so it is necessary to seek forward from the current position to the next key frame before resuming decoding.
+
+```js
+let chunk_index = 0;
+
+for (let i = 0; i < BATCH_LENGTH; i++) {
+  // Check if the decoder failed
+  if (decoder.state === "closed") {
+    // Seek forward to the next key frame from the current position
+    for (let j = chunk_index; j < chunks.length; j++) {
+      if (chunks[j].type === "key") {
+        chunk_index = j;
+        break;
+      }
+    }
+    // Close the old decoder, instantiate and configure a new decoder
+  }
+  decoder.decode(chunks[chunk_index]);
+  chunk_index++;
+}
+```
+
+## VideoFrame
+
+A {{domxref("VideoFrame")}} represents a single uncompressed video frame, including its pixel data and metadata such as its timestamp. It can be returned by a `VideoDecoder` when decoding encoded video, or constructed from a variety of image sources.
+
+### Creating video frames
+
+A `VideoFrame` can be constructed from any image source. Keep in mind that timestamps are in microseconds.
+
+```js
+const bitmapFrame = new VideoFrame(imgBitmap, { timestamp: 0 });
+const imageFrame = new VideoFrame(htmlImageEl, { timestamp: 0 });
+const videoFrame = new VideoFrame(htmlVideoEl, { timestamp: 0 });
+const canvasFrame = new VideoFrame(canvasEl, { timestamp: 0 });
+```
+
+Constructing a `VideoFrame` from a `Canvas` is typically how you would encode video in a video editing application: source video and images are composited in a canvas context with effects and transformations applied, and the same `Canvas` is both previewed by the user and used as the image source for each `VideoFrame` to be encoded.
+
+You can also directly create a `VideoFrame` from binary data, such as an `ArrayBuffer`; however, you will need to specify the `format` and metadata and ensure that the data being used to construct the frame follows the specified [format](/en-US/docs/Web/API/VideoFrame/format).
+
+```js
+const rgbaFrame = new VideoFrame(rgbaData, {
+  timestamp: 0,
+  format: "RGBA",
+  codedWidth: 1920,
+  codedHeight: 1080,
+});
+```
+
+`VideoFrame` objects are tied to data on graphics memory. When creating a `VideoFrame` from a `Canvas`, `Bitmap`, `Video` or `Image`, data is copied from graphics memory to graphics memory, which is relatively efficient.
+
+A `VideoFrame` constructed from binary data (e.g., `ArrayBuffer` or `Uint8ClampedArray`) will incur a CPU→graphics memory copy operation, which can be a performance penalty if done repeatedly.
+
+Finally, `VideoFrame` objects can also be generated by decoding `EncodedVideoChunk` objects via a `VideoDecoder`, as shown in the [Decoding video](#decoding_video) section above.
+
+### Consuming video frames
+
+Decoded video can also be played back in the browser by rendering `VideoFrame` objects to a `Canvas` via any of the Canvas rendering methods. Different rendering methods have different performance characteristics, which may be relevant when running compute-intensive video processing operations.
+
+#### Canvas2D
+
+Frames can be drawn to a {{domxref("CanvasRenderingContext2D")}} using the `drawImage` method:
+
+```js
+const canvas = new OffscreenCanvas(width, height);
+const ctx = canvas.getContext("2d");
+ctx.drawImage(frame, 0, 0);
+```
+
+While the 2D canvas context has a simple yet flexible API, browsers use different implementations under the hood, resulting in inconsistent and generally worse performance across browsers.
+
+#### BitmapRenderer
+
+Frames can also be rendered to a canvas via the {{domxref("ImageBitmapRenderingContext")}} by creating an {{domxref("ImageBitmap")}} from the frame, and rendering it to the canvas via the `transferFromImageBitmap` method.
+ +```js +const canvas = new OffscreenCanvas(width, height); +const ctx = canvas.getContext("bitmaprenderer"); + +const bitmap = await createImageBitmap(frame); +ctx.transferFromImageBitmap(bitmap); +frame.close(); +``` + +This method involves making a single copy of the frame in graphics memory, resulting in more consistent and generally better performance across browsers than the Canvas2D API while also being relatively simple. + +#### WebGPU + +The most efficient way to render a `VideoFrame` to a canvas is via the [importExternalTexture](/en-US/docs/Web/API/GPUDevice/importExternalTexture) method in WebGPU. + +```js +const externalTexture = device.importExternalTexture({ source: frame }); +``` + +`importExternalTexture` is efficient as it incurs a zero-copy operation, using the exact same `VideoFrame` object in memory within a WebGPU pipeline. It is the most performant method for rendering a `VideoFrame`, but also the most complex to set up. + +### Memory + +Because `VideoFrame` objects can consume significant GPU memory, and video processing involves manipulating many frames per second, extra care should be taken to manage memory and avoid memory leaks in order to avoid application crashes. + +First and foremost, frames must be explicitly released when no longer needed. + +```js +frame.close(); +``` + +When encoding, you can close the frame as soon as you send it for encoding. + +```js +encoder.encode(frame, { keyFrame: true }); +frame.close(); +``` + +You should also close the frames right after rendering. + +```js +ctx.drawImage(frame, 0, 0); +frame.close(); +``` + +When transferring a `VideoFrame` between threads (e.g., a worker), it should be transferred as a [transferable object](/en-US/docs/Web/API/Web_Workers_API/Transferable_objects). + +```js +worker.postMessage(frame, [frame]); +``` + +## Audio + +WebCodecs supports encoding and decoding audio via {{domxref("AudioEncoder")}} and {{domxref("AudioDecoder")}}, using the Opus and AAC codecs. Before working with audio, there are a few important caveats to be aware of: + +- **Pass-through**: If you are transcoding video and do not need to modify the audio, you do not need to decode and re-encode the audio at all. `EncodedAudioChunk` objects can be passed directly from a demuxing library to a muxing library, which is significantly more efficient. +- **Playback**: The WebCodecs API has no built-in audio playback. For playback, use the [Web Audio API](/en-US/docs/Web/API/Web_Audio_API). +- **Format support**: WebCodecs only supports encoding Opus and AAC. For MP3 or other formats, a third-party library is required. + +### Playback + +There is no direct bridge between WebCodecs and the Web Audio API. {{domxref("AudioData")}} objects cannot be passed directly to the Web Audio API, which uses {{domxref("AudioBuffer")}} to represent raw audio. 
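+
+Bridging them by hand therefore means copying samples out of each `AudioData` into an `AudioBuffer`. A minimal sketch of that manual route, assuming `f32-planar` data and an existing {{domxref("AudioContext")}} named `audioContext` (the variable names are illustrative):
+
+```js
+// Copy each plane of the AudioData into the matching channel of a new AudioBuffer
+const audioBuffer = audioContext.createBuffer(
+  audioData.numberOfChannels,
+  audioData.numberOfFrames,
+  audioData.sampleRate,
+);
+for (let channel = 0; channel < audioData.numberOfChannels; channel++) {
+  const samples = new Float32Array(audioData.numberOfFrames);
+  audioData.copyTo(samples, { planeIndex: channel });
+  audioBuffer.copyToChannel(samples, channel);
+}
+audioData.close();
+```
+
+Because this routes every chunk through a CPU-side copy, it is usually not the first choice.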
+
+The recommended approach for playback is to mux `EncodedAudioChunk` objects into an in-memory buffer using a muxing library, then decode that buffer via {{domxref("BaseAudioContext/decodeAudioData", "AudioContext.decodeAudioData()")}}:
+
+```js
+// mux encoded chunks to an ArrayBuffer using a muxing library
+const buffer = await muxAudioToBuffer(encodedChunks);
+const audioBuffer = await audioContext.decodeAudioData(buffer);
+const source = audioContext.createBufferSource();
+source.buffer = audioBuffer;
+source.connect(audioContext.destination);
+source.start();
+```
+
+Alternatively, you can extract raw samples from `AudioData` via `copyTo()` and construct an `AudioBuffer` manually, as sketched above, but this requires a CPU-side data copy for each chunk and is slower.
+
+### Encoding
+
+Audio encoding is simpler than video encoding — there are no key frames, no hardware acceleration concerns, and each `AudioData` produces exactly one `EncodedAudioChunk`. The encoder can be treated as a straightforward async pipeline.
+
+```js
+const encoder = new AudioEncoder({
+  output(chunk) {
+    // Send to muxer
+  },
+  error(e) {
+    console.error(e);
+  },
+});
+
+encoder.configure({
+  codec: "opus",
+  sampleRate: 48000,
+  numberOfChannels: 2,
+});
+
+for (const audioData of rawAudio) {
+  encoder.encode(audioData);
+  audioData.close();
+}
+
+await encoder.flush();
+```
+
+See the [Codec selection](/en-US/docs/Web/API/WebCodecs_API/Codec_selection#audio_codecs) guide for guidance on choosing between Opus and AAC.
+
+### Decoding
+
+Audio decoding follows the same pattern as encoding. The decoder configuration is typically provided by the demuxing library rather than chosen by the developer.
+
+```js
+const decoder = new AudioDecoder({
+  output(audioData) {
+    // Process the AudioData
+    audioData.close();
+  },
+  error(e) {
+    console.error(e);
+  },
+});
+
+// Config comes from the demuxing library
+decoder.configure(decoderConfig);
+
+for (const chunk of encodedChunks) {
+  decoder.decode(chunk);
+}
+
+await decoder.flush();
+```
+
+### AudioData
+
+An {{domxref("AudioData")}} object represents a segment of raw audio, typically covering 0.2–0.5 seconds. Raw samples are extracted as `Float32Array` data using the {{domxref("AudioData.copyTo()")}} method. The extraction pattern depends on the `format` property of the `AudioData` object.
+
+The most common format is `f32-planar`, where each channel is stored in a separate plane. Use `planeIndex` to copy each channel independently:
+
+```js
+// f32-planar: each channel stored separately
+const leftChannel = new Float32Array(audioData.numberOfFrames);
+audioData.copyTo(leftChannel, { planeIndex: 0 });
+
+const rightChannel = new Float32Array(audioData.numberOfFrames);
+audioData.copyTo(rightChannel, { planeIndex: 1 });
+```
+
+The less common `f32` format stores all channels interleaved in a single array (`[L, R, L, R, ...]`). 
In this case, copy the full interleaved buffer and de-interleave manually: + +```js +// f32: channels interleaved in a single array +const interleaved = new Float32Array( + audioData.numberOfFrames * audioData.numberOfChannels, +); +audioData.copyTo(interleaved, { planeIndex: 0 }); + +const leftChannel = new Float32Array(audioData.numberOfFrames); +const rightChannel = new Float32Array(audioData.numberOfFrames); + +for (let i = 0; i < audioData.numberOfFrames; i++) { + leftChannel[i] = interleaved[i * 2]; + rightChannel[i] = interleaved[i * 2 + 1]; +} +``` + +To handle both formats: + +```js +if (audioData.format.includes("planar")) { + // f32-planar: copy each channel by planeIndex +} else { + // f32: copy interleaved, then de-interleave +} +``` + +To construct an `AudioData` from raw samples, the data for all channels must be concatenated into a single `Float32Array` with each channel's samples placed sequentially (matching `f32-planar` layout), and the `numberOfFrames` set to the number of samples per channel: + +```js +const framesPerChunk = 1024; +const data = new Float32Array(framesPerChunk * 2); // 2 channels +data.set(leftChannel, 0); +data.set(rightChannel, framesPerChunk); + +const audioData = new AudioData({ + format: "f32-planar", + sampleRate: 48000, + numberOfFrames: framesPerChunk, + numberOfChannels: 2, + timestamp: sourceAudioData.timestamp, + data, +}); +``` + +Note that certain AAC codec strings (`mp4a.40.5`, `mp4a.40.05` and `mp4a.40.29`) correspond to configurations that use a technique called Spectral Band Replication (SBR), which causes the decoder to output audio at double the sample rate specified in the decoder configuration. Always read `audioData.sampleRate` directly rather than assuming it matches the configured value. + +Like `VideoFrame`, `AudioData` objects must be explicitly closed to free memory: + +```js +audioData.close(); +``` + +While `AudioData` requires much less memory than a `VideoFrame`, raw audio still has a significant memory footprint — an hour of stereo audio at 48kHz is approximately 1.4 GB. For large files, audio should be decoded and processed in batches rather than all at once. 
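+
+One way to batch the work is to throttle submissions against the decode queue using the `dequeue` event described earlier. A minimal sketch, assuming a `chunks` array from a demuxing library and a configured `decoder` whose `output` callback processes and closes each `AudioData`; the `BATCH_SIZE` threshold and `pump` helper are illustrative:
+
+```js
+const BATCH_SIZE = 16;
+let nextChunk = 0;
+
+function pump() {
+  // Keep the decode queue shallow instead of submitting every chunk at once
+  while (nextChunk < chunks.length && decoder.decodeQueueSize < BATCH_SIZE) {
+    decoder.decode(chunks[nextChunk++]);
+  }
+  if (nextChunk === chunks.length) {
+    decoder.removeEventListener("dequeue", pump);
+    decoder.flush();
+  }
+}
+
+decoder.addEventListener("dequeue", pump);
+pump();
+```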
+ +## See also + +- [Video processing concepts](/en-US/docs/Web/API/WebCodecs_API/Video_processing_concepts) +- [Codec selection](/en-US/docs/Web/API/WebCodecs_API/Codec_selection) +- {{domxref("VideoEncoder")}} +- {{domxref("VideoDecoder")}} +- {{domxref("AudioEncoder")}} +- {{domxref("AudioDecoder")}} +- {{domxref("VideoFrame")}} +- {{domxref("AudioData")}} +- {{domxref("EncodedVideoChunk")}} +- {{domxref("EncodedAudioChunk")}} diff --git a/files/en-us/web/api/webcodecs_api/video-encoder-decoder.png b/files/en-us/web/api/webcodecs_api/video-encoder-decoder.png new file mode 100644 index 000000000000000..65b2a1d33beff55 Binary files /dev/null and b/files/en-us/web/api/webcodecs_api/video-encoder-decoder.png differ diff --git a/files/en-us/web/api/webcodecs_api/video-frame.png b/files/en-us/web/api/webcodecs_api/video-frame.png new file mode 100644 index 000000000000000..fb0f3fcf9eaae11 Binary files /dev/null and b/files/en-us/web/api/webcodecs_api/video-frame.png differ diff --git a/files/en-us/web/api/webcodecs_api/video_processing_concepts/bitrate-ladder.png b/files/en-us/web/api/webcodecs_api/video_processing_concepts/bitrate-ladder.png new file mode 100644 index 000000000000000..7862c29ab45fded Binary files /dev/null and b/files/en-us/web/api/webcodecs_api/video_processing_concepts/bitrate-ladder.png differ diff --git a/files/en-us/web/api/webcodecs_api/video_processing_concepts/containers.png b/files/en-us/web/api/webcodecs_api/video_processing_concepts/containers.png new file mode 100644 index 000000000000000..c388c9860a388a4 Binary files /dev/null and b/files/en-us/web/api/webcodecs_api/video_processing_concepts/containers.png differ diff --git a/files/en-us/web/api/webcodecs_api/video_processing_concepts/dct.png b/files/en-us/web/api/webcodecs_api/video_processing_concepts/dct.png new file mode 100644 index 000000000000000..11e37a4ea620ea2 Binary files /dev/null and b/files/en-us/web/api/webcodecs_api/video_processing_concepts/dct.png differ diff --git a/files/en-us/web/api/webcodecs_api/video_processing_concepts/decoder-demuxer.png b/files/en-us/web/api/webcodecs_api/video_processing_concepts/decoder-demuxer.png new file mode 100644 index 000000000000000..ffb4fbc3ffc29b1 Binary files /dev/null and b/files/en-us/web/api/webcodecs_api/video_processing_concepts/decoder-demuxer.png differ diff --git a/files/en-us/web/api/webcodecs_api/video_processing_concepts/frame-diff.png b/files/en-us/web/api/webcodecs_api/video_processing_concepts/frame-diff.png new file mode 100644 index 000000000000000..3590682e015c760 Binary files /dev/null and b/files/en-us/web/api/webcodecs_api/video_processing_concepts/frame-diff.png differ diff --git a/files/en-us/web/api/webcodecs_api/video_processing_concepts/index.md b/files/en-us/web/api/webcodecs_api/video_processing_concepts/index.md new file mode 100644 index 000000000000000..8f9a51255c8964c --- /dev/null +++ b/files/en-us/web/api/webcodecs_api/video_processing_concepts/index.md @@ -0,0 +1,115 @@ +--- +title: Video processing concepts +slug: Web/API/WebCodecs_API/Video_processing_concepts +page-type: guide +--- + +{{DefaultAPISidebar("WebCodecs API")}} + +Before working with the WebCodecs API, it is helpful to understand some foundational concepts around how video works, how it is compressed, and how video files are structured. +This guide covers the key concepts: video frames, codecs, encoding and decoding, containers, and muxing and demuxing. + +## Video frames + +A video is a sequence of images displayed in rapid succession. 
Each image in the sequence is called a **video frame**, and each frame has an associated timestamp indicating when it should be displayed. + +![Video Frames](video-frames.png) + +Each pixel in a video frame is represented by a set of numeric color channel values. Uncompressed, a single 4K frame (~8 million pixels) is approximately 25 MB. At 30 frames per second, one hour of uncompressed 4K video would be over 3 TB, which is impractically large for storage or streaming. + +Codecs were developed in order to compress video, typically by 1-2 orders of magnitude, to be able to practically store and stream video content given typical device network and storage constraints. + +## Codecs + +A **codec** (short for encode/decode) is an algorithm for compressing and decompressing video data. Codecs reduce file size dramatically — typically by a factor of 100 or more through a variety of different techniques. While there are a number of video codecs used within the browser, such as `vp9`, `av1` and `h264`, they all apply some form of the following techniques: + +### Spatial compression + +Codecs selectively remove high-frequency detail from each frame — fine textures and sharp edges that are less perceptible to the human eye. + +![Throwing away high detail information](dct.png) + +The amount of detail removed is controlled by two things: the **bitrate**, which determines how much data the output stream uses, and the **codec string**, which specifies the profile and level that govern the encoding logic. Higher bitrates and more capable profiles preserve more detail at the cost of larger file sizes. The following shows the tradeoff between quality and bitrate, using baseline `vp9` on a 1080p video: + +![Bitrate ladder](bitrate-ladder.png) + +### Temporal compression + +Successive frames in a video are typically visually similar to one another. Instead of encoding each video frame as an independent image, video codecs calculate the difference between frames, and encode just the frame differences in a compact binary representation. Codecs typically use a number of techniques such as motion compensation to reduce the amount of data required to encode frame differences. + +![Frame differences](frame-diff.png) +Codecs will then store the first video frame in a sequence as a key frame, and then store subsequent frames as just frame differences (called delta frames). + +![Key frames vs delta frames](key-frames.png) + +Videos are typically encoded with key frames at regular intervals. To construct the full current frame for display of a given delta frame, you have to decode the previous key frame and all the subsequent delta frames (in order) up to the current delta frame. +In WebCodecs, the `EncodedVideoChunk` interface has a `type` property, which can take the value `"key"` or `"delta"` to indicate whether or not the chunk represents a key frame or a delta frame. + +Because delta frames depend on all previous frames since the last key frame, a decoder cannot start decoding from an arbitrary point in a video — it must always start from a key frame. This has two practical implications: **seeking** to a specific timestamp requires finding the nearest preceding key frame and decoding every frame in order up to the target, and **error recovery** requires skipping forward to the next key frame before resuming decoding. + +When encoding with a `VideoEncoder`, it is possible to determine when to set a frame as a key frame or a delta frame by using the `keyFrame` parameter in the encoder method. 
+
+```js
+encoder.encode(frame, { keyFrame: /* true or false */ });
+```
+
+## Encoding and decoding
+
+### Codec compatibility
+
+For a codec to be useful, you have to be able to both encode video with it (turn raw video into compressed binary data) and decode the same video with it (turn the compressed binary data back into raw video frames). The video industry has therefore coalesced around a handful of standard codecs such as `vp9`, `h264`, `hevc`, and `av1`.
+
+Applications that primarily create video content (e.g., video editing tools), and therefore primarily encode video, typically choose a video codec for encoding in order to maximize compatibility with video player software.
+
+Applications that primarily consume video content (e.g., video player software), and therefore primarily decode video, will typically try to support as many codecs as possible.
+
+Applications that control both encoding and decoding (e.g., a video streaming website) have much more flexibility in codec choice, and can therefore choose codecs based on factors such as cost and encoding speed.
+
+### Encoding is expensive
+
+Encoding is significantly more computationally expensive than decoding, typically by one to two orders of magnitude. Video conferencing applications will often use older codecs such as `vp8` because, although this results in lower quality video for the same bitrate, it is also less computationally expensive than newer codecs like `vp9`.
+
+### Hardware acceleration
+
+Most consumer devices include specialized hardware specifically designed to encode and decode video. Leveraging these specialized chips for encoding and decoding is called hardware acceleration, and can speed up encoding tasks by two orders of magnitude compared to standard CPU-based encoding.
+
+H.264 and H.265 encoding are most commonly hardware accelerated, while hardware-accelerated encoding of VP9 and AV1 is less common. Hardware-accelerated decoding is broadly available for all major codecs, though AV1 decode acceleration is still more limited given its relative newness.
+
+One of the key advantages of the WebCodecs API is the ability to use hardware-accelerated encoding, making applications like video editing and high-performance streaming practical on consumer devices.
+
+## Containers
+
+Codecs only deal with encoding raw media data into a compressed binary form and vice-versa. A video file, such as a WebM, MP4, or MKV file, contains both metadata, such as track information and duration, and encoded media data.
+
+![Containers](containers.png)
+
+Each type of video file has its own container spec, such as the [WebM spec](https://www.w3.org/TR/mse-byte-stream-format-webm/) and the [MP4 Spec](https://github.com/alfg/quick-dive-into-mp4), which specifies how metadata and encoded media should be formatted and stored within the file stream.
+
+A given container format can actually support a variety of different codecs. 
Here are the most common containers and the codecs they support: + +| Container | Video codecs | Audio codecs | +| ------------- | ---------------------- | -------------------- | +| MP4 (.mp4) | H.264, H.265, AV1 | AAC, MP3, Opus | +| WebM (.webm) | VP8, VP9, AV1 | Vorbis, Opus | +| MKV (.mkv) | H.264, H.265, VP9, AV1 | AAC, MP3, Opus, FLAC | +| MPEG-TS (.ts) | H.264, H.265 | AAC, MP3 | +| OGG (.ogg) | Theora | Vorbis, Opus | + +A video player needs to both follow the container spec to extract metadata and encoded chunks (called demuxing), and decode the encoded video/audio in order to play the video file. + +While the {{domxref("HTMLVideoElement")}} handles both demuxing and decoding, and primarily supports MP4 and WebM formats, the WebCodecs API does not deal with container formats. + +To play a video with WebCodecs, it is necessary to both demux the file (typically using a demuxing library) and then decode the encoded chunks. + +![Demuxing](decoder-demuxer.png) + +Likewise, to write a video file with WebCodecs it is necessary to also follow the container spec, writing metadata and placing the encoded chunks at the correct position in the output file stream. This is called muxing, and is not handled natively by the WebCodecs API, instead requiring a third-party library like [Mediabunny](https://mediabunny.dev/). + +See the [Muxing and Demuxing](/en-US/docs/Web/API/WebCodecs_API#muxing_and_demuxing) section on the WebCodecs API overview page for library options for demuxing and muxing. + +## See also + +- [Video Codec Guide](/en-US/docs/Web/Media/Guides/Formats/Video_codecs) +- [WebCodecs API](/en-US/docs/Web/API/WebCodecs_API) +- [Using the WebCodecs API](/en-US/docs/Web/API/WebCodecs_API/Using_the_WebCodecs_API) +- [Codec selection](/en-US/docs/Web/API/WebCodecs_API/Codec_selection) diff --git a/files/en-us/web/api/webcodecs_api/video_processing_concepts/key-frames.png b/files/en-us/web/api/webcodecs_api/video_processing_concepts/key-frames.png new file mode 100644 index 000000000000000..5856b7e47f66438 Binary files /dev/null and b/files/en-us/web/api/webcodecs_api/video_processing_concepts/key-frames.png differ diff --git a/files/en-us/web/api/webcodecs_api/video_processing_concepts/video-frames.png b/files/en-us/web/api/webcodecs_api/video_processing_concepts/video-frames.png new file mode 100644 index 000000000000000..cc86599b1bbe78d Binary files /dev/null and b/files/en-us/web/api/webcodecs_api/video_processing_concepts/video-frames.png differ diff --git a/files/jsondata/GroupData.json b/files/jsondata/GroupData.json index c5fef3b50c16471..1c33e220e15c427 100644 --- a/files/jsondata/GroupData.json +++ b/files/jsondata/GroupData.json @@ -2088,6 +2088,11 @@ }, "WebCodecs API": { "overview": ["WebCodecs API"], + "guides": [ + "/docs/Web/API/WebCodecs_API/Video_processing_concepts", + "/docs/Web/API/WebCodecs_API/Using_the_WebCodecs_API", + "/docs/Web/API/WebCodecs_API/Codec_selection" + ], "interfaces": [ "AudioData", "AudioDecoder",