Presentation describing the automatic generation of subtitles from a video input.
AUTOMATIC SUBTITLE GENERATOR
By Lohith Kumar Menchu, Manikanta Thumu, Ravinder Putta
Video has become one of the most popular multimedia artefacts used on PCs and the internet. In the majority of videos, the sound holds an important place. For people with hearing impairments, or with gaps in their knowledge of the spoken language, the most natural aid is subtitles.
Therefore, it is necessary to find solutions that make these media artefacts accessible to most people. Here, we confine our research to videos that have a single speaker. If we try to employ SR technology on conversations or meetings where people frequently interrupt each other, we are likely to get extremely poor results.
The current thesis work principally aims to address this problem by presenting a potential system.
Three distinct modules have been defined, namely audio extraction, speech recognition, and subtitle generation (with time synchronization).
The system should take a video file as input and generate a subtitle file as output. The extracted subtitles must also be synchronized with the video content. The speaker-independent model achieves an accuracy greater than 90%, with peaks reaching 98% under optimal conditions (quiet room, high-quality microphone).
In the existing system, whether the media has a single speaker or multiple speakers, the subtitles are generated manually by a linguist. However, manual subtitle creation is a long and tedious activity and requires the constant presence of the user.
Moreover, the user needs to know the language of the video content in order to generate subtitles.
At present, subtitles cannot be generated for all languages, and software that generates subtitles automatically through speech recognition, without the intervention of an individual, is not available.
In the proposed system, SR technology allows a computer to handle sound input from either a microphone or a media file, to be transcribed or used to interact with the machine. The analog signal is converted into digital form and then divided into small segments, which are matched against known phonemes of the appropriate language.
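The digitisation-and-segmentation step can be sketched as follows. This is an illustrative Java fragment, not part of the described system: the class name, the 400-sample frame size (25 ms at an assumed 16 kHz sampling rate), and the energy measure are all assumptions chosen for clarity.

```java
// Sketch: splitting digitised 16-bit PCM speech into short, fixed-size
// frames -- the step that precedes phoneme matching. Frame size and
// sampling rate are illustrative assumptions.
public class FrameSplitter {

    // Split samples into non-overlapping frames of frameSize samples.
    public static short[][] split(short[] samples, int frameSize) {
        int frames = samples.length / frameSize;
        short[][] out = new short[frames][frameSize];
        for (int f = 0; f < frames; f++)
            System.arraycopy(samples, f * frameSize, out[f], 0, frameSize);
        return out;
    }

    // Average energy of one frame; very low energy indicates silence,
    // which is later used to delimit utterances.
    public static double energy(short[] frame) {
        double sum = 0;
        for (short s : frame) sum += (double) s * s;
        return sum / frame.length;
    }
}
```

In a real recogniser the frames would overlap and be converted to spectral features; this sketch only shows the framing idea.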
Speech recognition can be used to handle either a single speaker or an arbitrary number of speakers. The first case, which is our area of interest, presents an accuracy greater than 90%, with peaks reaching 98% under optimal conditions. Various models exist, but modern SR engines are based on Hidden Markov Models (HMMs).
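The core of HMM-based decoding is the Viterbi algorithm: given transition and emission probabilities, it finds the most likely sequence of hidden states for an observed sequence. The following is a minimal sketch with discrete observations (real SR engines decode continuous acoustic features, but the principle is the same); all probabilities in the example are illustrative.

```java
// Minimal Viterbi decoder for a discrete HMM (illustrative sketch).
public class Viterbi {
    // start[s]: initial probability of state s
    // trans[p][s]: probability of moving from state p to state s
    // emit[s][o]: probability of state s emitting observation o
    public static int[] decode(double[] start, double[][] trans,
                               double[][] emit, int[] obs) {
        int n = start.length, t = obs.length;
        double[][] v = new double[t][n];   // best path probability so far
        int[][] back = new int[t][n];      // backpointers for reconstruction
        for (int s = 0; s < n; s++) v[0][s] = start[s] * emit[s][obs[0]];
        for (int i = 1; i < t; i++)
            for (int s = 0; s < n; s++)
                for (int p = 0; p < n; p++) {
                    double prob = v[i - 1][p] * trans[p][s] * emit[s][obs[i]];
                    if (prob > v[i][s]) { v[i][s] = prob; back[i][s] = p; }
                }
        int[] path = new int[t];
        for (int s = 1; s < n; s++)
            if (v[t - 1][s] > v[t - 1][path[t - 1]]) path[t - 1] = s;
        for (int i = t - 1; i > 0; i--) path[i - 1] = back[i][path[i]];
        return path;
    }
}
```

In a recogniser, the hidden states correspond to phonemes (or sub-phoneme states) and the observations to acoustic feature vectors.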
HOW SPEECH RECOGNITION WORKS?
Rule-Based Approach
Early speech recognition systems tried to apply a set of grammatical and syntactical rules to speech.
If the words spoken fit into a certain set of rules, the program could determine what the words were. Accents, dialects, and mannerisms can vastly change the way certain words or phrases are spoken, so this model has limited usage.
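A rule-based recogniser in this sense only accepts utterances that fit a fixed grammar. The sketch below uses a tiny hypothetical grammar (the verbs and objects are invented for illustration) to show why the approach is limited: anything outside the rules, however natural, is simply rejected.

```java
import java.util.Set;

// Sketch of a rule-based check: an utterance is accepted only if it
// matches the fixed grammar <command> ::= <verb> <object>.
// The word lists are illustrative assumptions.
public class RuleBasedRecognizer {
    static final Set<String> VERBS = Set.of("open", "close", "play");
    static final Set<String> OBJECTS = Set.of("door", "window", "video");

    public static boolean accepts(String utterance) {
        String[] w = utterance.toLowerCase().split("\\s+");
        return w.length == 2 && VERBS.contains(w[0]) && OBJECTS.contains(w[1]);
    }
}
```

Note that even the perfectly natural phrase "open the door" is rejected because the grammar has no rule for the article.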
HOW SPEECH RECOGNITION WORKS?
Statistical-Modelling Approach
The statistical approach uses a model with three fundamental components, each modelling a different aspect of the speech signal: the Acoustic Model, the Lexicon, and the Language Model.
Acoustic models require engineers to collect all the sounds made by speakers of a particular language.
We differentiate two acoustic models: speaker-dependent and speaker-independent.
The next part of the model is the lexicon, or dictionary: it defines, for every word in the language, how that word is pronounced.
The third piece is the language model, which describes how words are put together into phrases and sentences in the language.
For example, if the recognizer thinks it has just recognized "the dog" and is trying to figure out the next word, the model may know that "ran" is more likely than "pan" or "can", simply because of what we know about the usage of English.
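The "the dog ran" example corresponds to a bigram language model, which counts how often each word follows another in training text. A minimal sketch (class and method names are illustrative, and the tiny training corpus below is invented):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Sketch of a bigram language model: count word-successor frequencies
// and predict the most likely next word.
public class BigramModel {
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Record every adjacent word pair in the sentence.
    public void train(String sentence) {
        String[] w = sentence.toLowerCase().split("\\s+");
        for (int i = 0; i + 1 < w.length; i++)
            counts.computeIfAbsent(w[i], k -> new HashMap<>())
                  .merge(w[i + 1], 1, Integer::sum);
    }

    // Most frequent successor of `word`, or null if unseen.
    public String predictNext(String word) {
        Map<String, Integer> next = counts.get(word.toLowerCase());
        if (next == null) return null;
        return Collections.max(next.entrySet(),
                               Map.Entry.comparingByValue()).getKey();
    }
}
```

Real engines use smoothed n-gram probabilities rather than raw counts, and combine them with the acoustic scores during decoding.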
HOW SPEECH RECOGNITION WORKS?
Our Automatic Subtitle Generator contains three important modules: Audio Extraction, Speech Recognition, and Subtitle Generation.
The audio extraction routine is expected to return a suitable audio format that can be used by the speech recognition module as pertinent material. To facilitate the extraction of audio we use the Java Media Framework (JMF) API, which provides many useful features for dealing with media objects.
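The slide does not show the JMF pipeline itself, so the following is only a sketch of one normalisation step such a module performs: downmixing interleaved 16-bit stereo PCM to mono so the recogniser receives a single channel. The class name is an invented illustration, not JMF code.

```java
// Sketch (not JMF): downmix interleaved 16-bit stereo PCM to mono by
// averaging the left and right samples of each frame.
public class Downmix {
    public static short[] stereoToMono(short[] interleaved) {
        short[] mono = new short[interleaved.length / 2];
        for (int i = 0; i < mono.length; i++)
            mono[i] = (short) ((interleaved[2 * i] + interleaved[2 * i + 1]) / 2);
        return mono;
    }
}
```

A full extraction module would additionally demultiplex the audio track from the video container and resample it to the rate the recogniser expects.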
The speech recognition routine is the key part of the system. Indeed, it directly affects performance and the evaluation of results.
First, it should determine the type of the input file (film, music, news, home-made, etc.) whenever possible. Then, if the type is known, an appropriate processing method is chosen.
The subtitle generation routine aims to create a file containing chunks of text corresponding to utterances delimited by silences, together with their respective start and end times. The module receives a list of words and their speech times from the speech recognition module and produces an SRT subtitle file.
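The SRT format pairs a numbered text chunk with `HH:MM:SS,mmm` start and end timestamps. A minimal sketch of the formatting step (class and method names are illustrative, not the thesis code):

```java
// Sketch: formatting recognised text chunks as SRT entries.
public class SrtWriter {

    // Format a time in milliseconds as an SRT timestamp: HH:MM:SS,mmm
    public static String timestamp(long ms) {
        return String.format("%02d:%02d:%02d,%03d",
                ms / 3600000, (ms / 60000) % 60, (ms / 1000) % 60, ms % 1000);
    }

    // Build one numbered SRT entry from a chunk of text and its times.
    public static String entry(int index, long startMs, long endMs, String text) {
        return index + "\n" + timestamp(startMs) + " --> " + timestamp(endMs)
                + "\n" + text + "\n";
    }
}
```

The generation module would append one such entry per silence-delimited utterance, separated by blank lines, to produce the final `.srt` file.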
In a cyber world where accessibility remains insufficient, it is essential to give each individual the means to understand any media content. In recent years, the internet has seen a proliferation of video-based websites, most of them amateur, for which transcripts are rarely available.
This thesis work was mostly oriented towards video media and suggested a way to produce a transcript of the audio from a video, with the ultimate purpose of making the content accessible to deaf people. Although the current system does not present enough stability to be widely used, it proposes one interesting approach that can certainly be improved.
Tutorial: Getting started with the Java Media Framework. URL http://www.ee.iitm.ac.in/~tgvenky/JMFBook/Tutorial.pdf
How Stuff Works: How speech recognition works. URL http://electronics.howstuffworks.com/gadgets/high-tech-gadgets/speech-recognition.htm
Engineered Station. How speech recognition works. 2001. URL http://project.uet.itgo.com/speech.htm