"Transcription and Translation of Video Lectures" (transLectures) is a Seventh Framework Programme (FP7) project aimed to develop new, automatic and innovative methods for translations and transcriptions of video lectures. Background Current state of the art in ASR and Machine translation is dominated by statistical decision systems combining adequate probabilistic models learnt from training data. Given an acoustic signal, ASR systems decide its most likely transcription by combining an acoustic model and a language model. Similarly, given a text sentence to be translated from a source language into a target language, statistical MT (SMT) systems decide its most likely translation by combining language and translation models. Acoustic, language and translation models are usually implemented in terms of hidden Markov model (HMMs), n-grams and phrase tables, respectively. They are learnt by maximum likelihood estimation, though discriminative training is being increasingly used. On the other hand, system comparison is often carried out on the basis of automatic assessment metrics, such as WER for ASR systems, and BLEU or TER for MT systems. Translation of video lectures can be also seen as a task of subtitling, though in this case it is a more difficult task of interlingual subtitling. As with captions (intralingual subtitles), YouTube video owners can upload subtitles in different languages. Other users can view videos with owner-uploaded subtitles, or they can request real-time auto-translation by using the well-known SMT-based Google Translate service. It goes without saying that translation of unusable captions produces also unusable interlingual subtitles. Approach Intelligent interaction for transcription the project explores different user interaction models for transcription, from simple models requiring minimal user supervision, to manual-intensive schemes aimed at supervising full transcriptions. A simple model may just consist in eliciting quality marks for transcription segments, e.g. for interaction with a student or a casual user. In contrast, the authors of the lectures may be willing to fully supervise their own lectures by following a monotone, prefix-based interaction scheme, in which every word supervision means that the corresponding prefix is correct as it is, and that its suffix has to be updated in light of this information. Possible interaction models between these two options could let the system play an active role in deciding in which order non-validated parts are supervised. The idea is to develop active learning techniques to implement these interaction models, on the basis of confidence measures computed from word graphs. The project also tries to explore different user interaction models for translation. These models are to support two kinds of users: from students or casual users who are only consumers of the translations to users who are willing to help produce high-quality translations of the lectures. Partners Three of the partners (UPVLC, RWTH and EML) have large research experience in automatic speech recognition, and three of them (UPVLC, XRCE and RWTH) have large research experience in machine translation. Two of the partners (JSI and UPVLC) have also the adequate videolecture infrastructure to apply the developments of transLectures. The transcription experience will be provided by the partner DDS. *Universitat Politècnica de València *Xerox *Institut Jozef Stefan *Knowledge for All Foundation Ltd. 
Approach

Intelligent interaction for transcription

The project explores different user interaction models for transcription, from simple models requiring minimal user supervision to manual-intensive schemes aimed at supervising full transcriptions. A simple model may just consist of eliciting quality marks for transcription segments, e.g. in interaction with a student or casual user. In contrast, the authors of the lectures may be willing to fully supervise their own lectures by following a monotone, prefix-based interaction scheme, in which each word supervision means that the corresponding prefix is correct as it is and that its suffix has to be updated in light of this information. Interaction models between these two extremes could let the system play an active role in deciding the order in which non-validated parts are supervised. The idea is to develop active learning techniques to implement these interaction models, on the basis of confidence measures computed from word graphs (a sketch of such confidence-based selection is given below).

The project also explores different user interaction models for translation. These models are to support two kinds of users: from students or casual users who only consume the translations, to users who are willing to help produce high-quality translations of the lectures.

Partners

Three of the partners (UPVLC, RWTH and EML) have extensive research experience in automatic speech recognition, and three of them (UPVLC, XRCE and RWTH) have extensive research experience in machine translation. Two of the partners (JSI and UPVLC) also have the video-lecture infrastructure needed to apply the developments of transLectures. Transcription experience is provided by the partner DDS.

*Universitat Politècnica de València (UPVLC)
*Xerox (XRCE)
*Institut Jozef Stefan (JSI)
*Knowledge for All Foundation Ltd.
*RWTH
*European Media Laboratory GmbH (EML)
*Deluxe Digital Studios Ltd (DDS)
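The following minimal Python sketch illustrates the kind of confidence-based selection mentioned under Approach above: segments of an automatic transcription carry confidence scores (for example, word posteriors computed from word graphs), and the least confident segments are offered to the user for supervision first. All names and data here are hypothetical and are not taken from the transLectures tools.

```python
# Illustrative confidence-based selection for interactive transcription:
# ask the user to supervise the least confident segments first, up to a
# fixed supervision budget. Hypothetical sketch, not project code.
from dataclasses import dataclass

@dataclass
class Segment:
    start: float        # segment start time (seconds)
    end: float          # segment end time (seconds)
    text: str           # automatic transcription of the segment
    confidence: float   # e.g. average word posterior in [0, 1]

def select_for_supervision(segments: list[Segment], budget: int) -> list[Segment]:
    """Return the `budget` least confident segments, in playback order."""
    worst = sorted(segments, key=lambda s: s.confidence)[:budget]
    return sorted(worst, key=lambda s: s.start)

segments = [
    Segment(0.0, 4.2, "welcome to the lecture", 0.93),
    Segment(4.2, 9.8, "today we discuss hidden markov models", 0.61),
    Segment(9.8, 14.0, "and n gram language models", 0.74),
]
for seg in select_for_supervision(segments, budget=1):
    print(f"please check [{seg.start:.1f}-{seg.end:.1f}]: {seg.text}")
```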