Research in EU-BRIDGE

Research in EU-BRIDGE is directed at advancing the state of the art in speech translation in order to create a service infrastructure that makes it easy to build applications that use speech translation.

The goal of EU-BRIDGE is the development of a speech translation service infrastructure upon which several use cases will be built. The services will not be a one-size-fits-all solution for generic multilingual needs, but will be explored within the specific markets and businesses of the four use cases of this project:

Captioning and Translation of subtitles for TV programs
Simultaneous Translation of academic lectures
Speech translation services for the European Parliament
Unified Communication Translation

In order to achieve this goal, research in EU-BRIDGE has four main objectives:

Develop better state-of-the-art speech and MT capabilities in view of new and more challenging business use cases
Improve language portability and apply the technology to languages of interest to Europe
Reduce the dependency on data
Explore/direct/facilitate rapid market insertion and deployment

Performance: We will advance spoken language technologies so that they process and transmit human information content from one language to another in situations that automatic techniques could not handle so far. This includes specialized but varied topics (lectures, seminars, presentations) and highly disfluent, conversational, accented and noisy speech (meetings, telephone calls). We will perform research in the areas of robustness, rapid adaptation in speech and translation, semantic modeling, and content summarization. We will also develop personalization schemes that adapt systems to individual users and groups of users for more specific and targeted high-performance operation that will address business needs better than a web-based one-size-fits-all solution.

Language Portability for Europe: Provide speech and translation capability for languages of main interest to Europe. Building on key efforts such as Euromatrix, Gale, TC-STAR, Quaero, and others, our team of partners is uniquely positioned and motivated to build one of the largest combined repertoires of languages available both in speech recognition and translation, and will stretch to do so robustly for all communication channels. We will include core European languages, under-resourced European languages, and reach out to languages of the BRIC economies. We will achieve this not just by a gargantuan engineering exercise, but by focused research efforts to improve portability itself. These efforts will lower the cost of moving capabilities effectively from one language to another.

Reduce the dependency on, and the cost of, data: If data is the “crude oil” of information processing, then solutions must make production cheaper and reduce our dependence on it. First, by making speech and MT components adaptive and independent of language and style, and by streamlining the process, we will significantly reduce data needs. Second, by involving the users themselves in correcting and building the systems implicitly, i.e., by crowd-sourcing, we will reduce the cost of data acquisition and thus of building and improving the systems. Third, by taking better advantage of data that is available but poorly prepared, we can reduce the cost of data preparation and increase the amount of effectively usable data. This includes comparable data, monolingual data, spoken and textual data, noisy data, and automatic methods for judging the quality of the data.

Rapid technology transition and market insertion: Our program will strive to transition research and development into commercial deployment more rapidly. This will be done by building distributed services instead of transferring software, and by making deployment part of the project. The systems will be applied to real-world data, and we will carry out pilot experiments around four business opportunities.