Subtitled Content is on the Rise
According to a report by the European Federation of Hard of Hearing People, most countries are improving in their provision of subtitles on television. The rise of “On Demand” services such as Netflix, Hulu, and Disney+ has also led to an increase in subtitled content.
Services such as BBC iPlayer and Netflix do a good job of providing subtitles. However, progress could be made in the number of languages in which such captioning is provided.
While progress has been made in the amount of content being subtitled, some still believe that subtitling is an expensive, slow, and laborious process. However, subtitles cost very little compared to the overall production costs of programming.
What’s more, Automatic Speech Recognition (ASR) technology has improved dramatically over the years, making it much easier to create accurate subtitles. Nowadays, voice services can understand accents better and make more informed decisions to accurately distinguish between similar-sounding words and phrases.
On the other hand, speech recognition software still has a lot of room for improvement.
While it has certainly grown by leaps and bounds over the decades, reliably recognizing speech in real-world acoustic environments is still a challenge for most systems. Also, most ASR technologies require well-trained language models as well as input from experts to keep the Word Error Rate (WER) to a minimum.
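To make the Word Error Rate concrete: WER is conventionally computed as the minimum number of word substitutions, deletions, and insertions needed to turn the system's output into the reference transcript, divided by the number of words in the reference. A minimal sketch in Python (the edit-distance approach is standard; the function name and inputs are illustrative):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference length.

    Assumes a non-empty reference transcript; words are split on whitespace.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words (word-level Levenshtein distance).
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,       # deletion
                dp[i][j - 1] + 1,       # insertion
                dp[i - 1][j - 1] + cost # substitution (or match)
            )
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, if the reference is “the cat sat on the mat” and the system drops one word, the WER is 1/6, or roughly 17% — which is why even seemingly small error rates translate into many corrections over an hour of programming.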
ASR needs Human Intervention
Accuracy is a vital component of subtitling as it’s key to ensuring that individuals who rely on captions get an accurate representation of the original spoken content. Word errors often result from less-than-ideal conditions, including poor audio quality, background noise, overlapping speech, and multiple speakers.
Formatting errors are also common with ASR technologies. Some of these include incorrect punctuation, misleading speaker labels, bad grammar, and a lack of inaudible tags and other notations.
According to a report on the state of AI-driven subtitling, demand for subtitling solutions grew rapidly in 2020. The study revealed that several advancements in ASR technologies, as well as training, have been made in recent years. However, the report concluded that current ASR technologies are still inadequate for use as standalone systems without human input.
Thus, it’s essential that media production companies work closely with experts to consistently improve the language model and to reduce word and formatting errors. This will result in less time needed to review transcriptions, lower correction costs, and ultimately, lower subtitling costs.
Experts can Help Lower AI-Driven Subtitling Costs
To reduce word error rates, custom dictionaries can be used to help the Automatic Speech Recognition system recognize domain-specific words. For example, a talk show about Formula 1 racing may contain the names of F1 drivers such as Hamilton or Verstappen as well as specific terms like DRS (Drag Reduction System) or Tankslapper.
Since these terms aren’t frequently used in regular day-to-day speech, an ASR system might fail to correctly transcribe them, leading to an increased word error rate. However, if these terms are uploaded into the ASR system beforehand, they’re more likely to be recognized. This will lead to fewer errors and reduce the need for a human operator to ensure accuracy, making AI-driven subtitling even cheaper.
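One simple way to apply such a dictionary, sketched below, is as a post-processing pass that snaps near-miss transcriptions to the closest known term. This is an illustrative approach using Python's standard-library fuzzy matching, not how any particular ASR vendor implements vocabulary boosting; the term list and function name are hypothetical:

```python
import difflib

# Hypothetical custom dictionary for a Formula 1 talk show.
CUSTOM_TERMS = ["Hamilton", "Verstappen", "DRS", "Tankslapper"]

def apply_custom_dictionary(transcript: str, terms: list[str],
                            cutoff: float = 0.8) -> str:
    """Replace words that closely resemble a known term with that term.

    `cutoff` is the minimum similarity ratio (0..1) required to snap a
    word to a dictionary entry; ordinary words fall below it and pass
    through unchanged.
    """
    corrected = []
    for word in transcript.split():
        # e.g. a misrecognized "Verstapen" snaps to "Verstappen".
        match = difflib.get_close_matches(word, terms, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else word)
    return " ".join(corrected)
```

In production, biasing is usually done inside the recognizer rather than after it, but a post-pass like this illustrates why pre-loading domain terms cuts the word error rate and the human review time that goes with it.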
Experts are also needed to ensure proper speaker diarization. Speaker diarization is the process of splitting an audio stream into different segments according to the identity of the speakers. Some ASR systems do not support this process and only produce large blocks of text, thus requiring a human operator to manually perform the diarization.
However, the best ASR systems handle this process well and can accurately indicate the switches between speakers in an audio stream. As a result, the work of a human operator is greatly reduced, making the process much faster and cheaper. Automatic speaker diarization also makes it easier to sync subtitles with speech.
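Once an ASR system emits diarized, time-stamped segments, turning them into subtitle blocks is largely mechanical. A minimal sketch, assuming a hypothetical segment format of speaker, start/end times in seconds, and text, rendered as SubRip (SRT):

```python
def to_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments: list[dict]) -> str:
    """Render diarized segments as numbered SRT blocks with speaker labels."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{to_timestamp(seg['start'])} --> {to_timestamp(seg['end'])}\n"
            f"[{seg['speaker']}] {seg['text']}\n"
        )
    return "\n".join(blocks)
```

Because each block is derived directly from the segment timings, subtitles stay in sync with the audio without a human operator having to spot the speaker changes by ear.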
When companies work with experts to improve the language model, implement custom dictionaries, and provide human operators with a custom-designed user interface that incorporates lessons learned from studying human operator workflows, the cost of subtitles per program is reduced even further.
Dronyc.nl helps content distributors with AI-driven subtitling and translations.