Transcripts and Captions for Digital Video

Joshua Morin | Access, Diversity, Equity, Accessibility, and Inclusion, Featured

A screenshot of a video with closed captions

Here at the Alliance we are committed to finding ways to improve the accessibility and reach of our content (hey, it's in our strategic plan!). One area of particular interest for us has been video. We had previously settled for YouTube's built-in accessibility features, but decided to dive in and experiment with other tools that could improve access.

Why Transcripts and Captions Matter

Transcripts and captions are similar but differ in purpose and context. A transcript is a text version of a multimedia file and is not necessarily attached to the video itself: you can read the transcript without playing the video. Captions are built into the video file or player and appear while the video plays, providing a synced text version of the audio. There are two types of captions, open and closed. Open captions cannot be turned off, while closed captions can be toggled on and off. Closed captions are the most common type for digital video.

Video transcripts and captions are vital to making your video content accessible. WebAIM, a non-profit organization based at the Center for Persons with Disabilities at Utah State University, explains:

In order to be fully accessible to the maximum number of users, web multimedia should include both synchronized captions AND a descriptive transcript.

The need to caption video is pretty straightforward: captions allow anyone who cannot hear the audio to follow along by reading the synced text. YouTube can auto-caption videos, but a quick Google search uncovers the limitations of YouTube's ASR (automatic speech recognition) technology. To improve accessibility, we needed a way to create and upload our own caption file.
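A caption file is just structured text. As an illustration, here is a minimal Python sketch that writes caption cues in the SubRip (.srt) format, one of the caption formats YouTube accepts for upload: a cue number, a `start --> end` time range with a comma before the milliseconds, the caption text, and a blank line. The `Cue` class, the function names, and the placeholder caption text are hypothetical and not part of any captioning service's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    """One caption cue: start and end times in seconds, plus the on-screen text."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm (comma before milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: List[Cue]) -> str:
    """Serialize cues as SubRip text: number, time range, caption text, blank line."""
    blocks = []
    for number, cue in enumerate(cues, start=1):
        blocks.append(
            f"{number}\n"
            f"{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}\n"
            f"{cue.text}\n"
        )
    # Joining with "\n" leaves exactly one blank line between cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Placeholder cues; real timings would come from a transcript or a captioning service.
    cues = [
        Cue(0.0, 2.5, "Hello, and welcome."),
        Cue(2.5, 6.0, "This is the second caption."),
    ]
    print(to_srt(cues), end="")
```

The hard part in practice is not the file format but getting accurate text and timings for every cue, which is what the services below provide.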

Providing a transcript, on the other hand, is a bit more open-ended, in that it also offers access to people without the bandwidth for video playback (from WebAIM):

Transcripts also provide an important part of making web multimedia content accessible. Transcripts allow anyone that cannot access content from web audio or video to read a text transcript instead.

In addition to helping with accessibility, transcripts extend a video's reach by improving its search engine optimization (SEO), since search engines index both the video and all the text included in the transcript. On top of that, you can add links, notes, and references to the transcript, which also help improve SEO.

Experimenting in the lab.

Our experiment

For this experiment we selected two videos that were released with the Building Cultural Audiences website.

The first step was to select our transcription and captioning services. A Google search for "video transcriptions" turned up five services. Needless to say, these are not the only services out there, but I believe the ones we selected are a good representation of what is commonly available*.

*After the experiment ended, Katie Gilligan commented on Rob’s Accessibility blog post about Cielo24. I can’t wait to check them out for our next round of videos.

We decided to compare four services:

  1. Scribie
  2. Speechpad
  3. Rev
  4. CastingWords

The requirements were pretty simple: we wanted to upload an MP4 video file and receive files in a usable format with embedded timecode.

What We Learned

In terms of comparing services, all four met our basic requirements, with very similar turnaround times and costs. To see how the text provided by each vendor differed, I ran a couple of online text diff (short for difference) checks on the transcript files. The differences centered on punctuation and on how numbers were displayed: some vendors used numerals and others spelled out the word.

Two pictures, side by side, showing the differences in the text
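A text diff like this can also be run locally. The sketch below uses Python's standard-library `difflib.ndiff` to compare two transcripts word by word and keep only the words that differ. The function name `word_diff` and the sample sentences are illustrative assumptions; the online tools we used may compare line by line instead.

```python
import difflib
from typing import List


def word_diff(a: str, b: str) -> List[str]:
    """Compare two transcripts word by word.

    Returns ndiff lines for changed words only: "- word" appears only in `a`,
    "+ word" appears only in `b`. Unchanged words and "? " hint lines are dropped.
    """
    return [
        line
        for line in difflib.ndiff(a.split(), b.split())
        if line.startswith(("- ", "+ "))
    ]


if __name__ == "__main__":
    # Illustrative vendor outputs, not real transcripts.
    vendor_a = "We surveyed 3 museums, and the results were clear."
    vendor_b = "We surveyed three museums and the results were clear."
    for line in word_diff(vendor_a, vendor_b):
        print(line)
```

Splitting on whitespace means a change such as "3" versus "three" shows up as one removed word and one added word, which makes number and punctuation differences easy to spot.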

Each service handled transcript timecode differently, and for us this became the deciding factor in settling on Scribie. Its transcripts looked the best and required the least amount of styling (you can see the transcript below). Most, if not all, of the companies provide sample transcript files that give you an idea of what you'll get back.


One thing we learned is that each service processes timecoded transcripts and captions separately, so you need to place an order for each. At the beginning I understood that captions and transcripts serve different purposes, but I assumed one process could yield both files (they are both just written text with timecode, right?). While it is indeed possible to manipulate one to create the other, it isn't feasible for us. There are walk-throughs and tools for creating captions manually, but it's a lot of work! Some of the services offer captions in a "transcript format" (essentially a plain text file version) that can be edited with any text editor.
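The easier direction of that manipulation is going from captions to a transcript, because the caption file already holds all the text. A minimal sketch, assuming a well-formed .srt file: drop each cue's number and timing line, then join the remaining text into one paragraph. The function name is hypothetical, and the sketch leaves inline formatting tags (such as `<i>`) untouched. Going the other way is the harder part, since every caption needs its own start and end time.

```python
import re

# An SRT timing line, e.g. "00:00:02,500 --> 00:00:06,000".
TIMING_LINE = re.compile(
    r"^\d{1,2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{1,2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_transcript(srt_text: str) -> str:
    """Turn SRT caption text into a plain transcript paragraph.

    Each cue is a block separated by a blank line: an optional cue number,
    a timing line, then one or more lines of caption text. Only the text is kept.
    """
    text_lines = []
    # Remove a leading byte-order mark, if any, then split into cue blocks.
    blocks = re.split(r"\n\s*\n", srt_text.lstrip("\ufeff").strip())
    for block in blocks:
        lines = [line.strip() for line in block.splitlines()]
        i = 0
        if i < len(lines) and lines[i].isdigit():
            i += 1  # skip the cue number
        if i < len(lines) and TIMING_LINE.match(lines[i]):
            i += 1  # skip the timing line
        text_lines.extend(line for line in lines[i:] if line)
    return " ".join(text_lines)
```

Because only the first line of a block is treated as a cue number, a caption that happens to be all digits (a year, say) on a later line is kept as text.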

We ended up using a different service for the captioning. It offers a YouTube integration: once you grant access, you can quickly choose any number of videos to be captioned, and the service even delivers the finished captions directly to YouTube, so you don't need to upload them yourself. You can also download the .SRT file.

Next Steps (The Fun Stuff!)

Now that we have captions and transcripts in place, we are pleased to meet accessibility standards while also improving the video's SEO. Our next steps are to enhance the transcript with links and references and to add functionality. We are currently experimenting with the YouTube API to make the transcript's timecodes clickable, so that each one jumps to that point in the video. Here is what we are working on (also available directly on JSFiddle).
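On an embedded player, one way to do the jumping is the YouTube IFrame Player API's `seekTo(seconds, allowSeekAhead)` method. As a simpler, swapped-in alternative that needs no JavaScript, the Python sketch below turns each timecode in a transcript into a link to the YouTube watch page with a `t` (start time) parameter. The `[HH:MM:SS]` line format, the function names, and `VIDEO_ID` are hypothetical assumptions, not our actual transcript markup.

```python
import html
import re
from typing import Optional

# Hypothetical transcript line format: "[HH:MM:SS] spoken text" (hours optional).
TIMECODED_LINE = re.compile(r"^(\[(?:(\d+):)?(\d{1,2}):(\d{2})\])\s*(.*)$")


def to_seconds(hours: Optional[str], minutes: str, seconds: str) -> int:
    """Convert captured timecode fields to a whole number of seconds."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)


def link_transcript(transcript: str, video_id: str) -> str:
    """Render transcript lines as HTML paragraphs, linking each timecode to
    that moment on the video's YouTube watch page via the `t` parameter."""
    paragraphs = []
    for raw in transcript.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TIMECODED_LINE.match(line)
        if match is None:
            # Lines without a timecode are kept as plain paragraphs.
            paragraphs.append(f"<p>{html.escape(line)}</p>")
            continue
        label, hours, minutes, seconds, text = match.groups()
        offset = to_seconds(hours, minutes, seconds)
        url = f"https://www.youtube.com/watch?v={video_id}&t={offset}s"
        # html.escape turns "&" into "&amp;", which is correct inside an href attribute.
        paragraphs.append(
            f'<p><a href="{html.escape(url)}">{html.escape(label)}</a> {html.escape(text)}</p>'
        )
    return "\n".join(paragraphs)


if __name__ == "__main__":
    sample = "[00:00:05] Welcome.\n[01:23] Second point."
    print(link_transcript(sample, "VIDEO_ID"))
```

Links like these open the video on YouTube rather than seeking an embedded player in place; keeping viewers on the page alongside the transcript is what the API approach adds.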

What is your museum doing with video? What tools and resources are you using to make your digital video accessible?


One Comment on “Transcripts and Captions for Digital Video”

  1. The Foundation of the American Institute for Conservation recently added closed captioning to all the webinar and course videos from Connecting to Collections Care. Over 120 videos in the C2CC playlist on our YouTube channel will have captions by the end of November. Not only do captions help those with hearing issues, but they can also help clarify unusual terms or passages that presenters may not have articulated clearly. Now that I've read your post, we will look at transcripts as well!
