- **Want to support only one format?** WebVTT is used by Apple Podcasts for ingest and is also natively supported by web browsers. Because WebVTT is the most flexible format, it's an ideal choice if you can only support one.
The examples given below are just for convenience. In production, make sure you conform to the actual specification for each format, as defined in its own documentation.
The [Web Video Text Tracks Format (WebVTT)](https://www.w3.org/TR/webvtt1/) is designed for use in HTML on the web. You can use the [`<track>` element](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/track) in your own web-based players to make closed captions appear on a web page.
A VTT file contains medium-fidelity timestamps. It differs from the SRT format (below) because you can optionally add speaker names, including them in a voice span tag `<v>` at the beginning of each caption when they change, as in the snippet below. Apple Podcasts supports these speaker names, and will ingest them into its transcript tool.
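As a rough illustration of the speaker-name behaviour described above, the following Python sketch builds a WebVTT document and emits a `<v>` voice span only on the first cue after the speaker changes. The `Cue` class and the `vtt_timestamp` and `to_webvtt` helpers are hypothetical names invented for this example, not part of any specification or library. The sketch also assumes the cue text contains no `&`, `<`, or `>` characters, which would need escaping in a real file.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One caption cue (hypothetical structure for this sketch)."""
    start: float   # start time in seconds
    end: float     # end time in seconds
    speaker: str
    text: str


def vtt_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: list[Cue]) -> str:
    """Render cues as WebVTT, adding a voice span when the speaker changes."""
    lines = ["WEBVTT", ""]  # required header, then a blank line
    previous_speaker = None
    for cue in cues:
        lines.append(f"{vtt_timestamp(cue.start)} --> {vtt_timestamp(cue.end)}")
        if cue.speaker != previous_speaker:
            # Speaker changed: prefix the caption with a <v> voice span.
            lines.append(f"<v {cue.speaker}>{cue.text}")
            previous_speaker = cue.speaker
        else:
            lines.append(cue.text)
        lines.append("")  # a blank line separates cues
    return "\n".join(lines)
```

Calling `to_webvtt` with two consecutive cues from the same speaker yields a voice span on the first cue only, which keeps the file compact while still marking each change of speaker.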
WebVTT defines no maximum line length, but a maximum of 65 characters per line is recommended so that the file displays well as closed captions. If you're using whisper-cpp or equivalent, `--split-on-word --max-len 65` is one way to achieve this.
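If your transcription tool has no equivalent option, the 65-character recommendation can be approximated in post-processing. The sketch below uses Python's standard `textwrap` module to break caption text at whitespace only. The `wrap_caption` helper and `MAX_LINE_LENGTH` constant are hypothetical names for this example. Note that this only wraps text: unlike splitting at transcription time, it does not re-time cues, so it is best treated as a display aid rather than a replacement for the whisper-cpp flags.

```python
import textwrap

MAX_LINE_LENGTH = 65  # recommended maximum characters per caption line


def wrap_caption(text: str, width: int = MAX_LINE_LENGTH) -> list[str]:
    """Split caption text into lines of at most `width` characters.

    Breaks happen only at whitespace. A single word longer than `width`
    is kept whole on its own line, so that line may exceed the limit.
    """
    return textwrap.wrap(
        text,
        width=width,
        break_long_words=False,  # never cut a word in half
        break_on_hyphens=False,  # keep hyphenated words together
    )
```

Each returned string can then be written as one line of a cue's text.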
The full specification includes formatting features; these are typically not used in podcasting applications.
This example code adds an audio player to a web page and displays the accompanying WebVTT file as the audio plays. (Note that this basic code will not show speaker names.)
The SRT format was designed for video captions but provides a suitable solution for podcast transcripts. The SRT format contains medium-fidelity timestamps and are a
The JSON representation is a flexible format that accommodates various degrees of fidelity in a concise way. At its most precise, it enables word-by-word highlighting. The JSON format used for podcast transcripts should adhere to the following specifications.
The HTML transcript format provides a solution when a transcript is available but has little or no timecode data. HTML transcript files are considered low-fidelity and are designed to serve as an accessibility aid and to provide searchable episode content. The HTML format used for podcast transcripts should adhere to the following specifications.