The increasing availability of knowledge bases, created by researchers in academia and industry, reflects a growing interest in extending NLP applications to make use of the vast amount of background knowledge available in open formats. However, most of the information found in knowledge bases, such as knowledge graphs, ontologies, taxonomies, thesauri, dictionaries or terminological datasets, is described in natural language, i.e. through labels or terms, in mostly one language, often English, which severely limits its exploitation. Applications that use these resources are consequently limited to the language in which the labels or terms are stored. To take advantage of the semantic knowledge captured in those resources and to make it accessible beyond language borders, we need to translate this knowledge into different languages. Since the multilingual enhancement of knowledge bases is a very time-consuming and expensive process, machine translation (MT) can be applied to translate the terminological expressions automatically, but not without challenges. Domain ontologies, terminologies or dictionaries usually contain highly specialized information from a particular area of knowledge, and this specialized vocabulary appears only infrequently in the training data required for MT, which makes the translation task rather challenging.
At the same time, growing attention has been paid to the integration of domain-specific knowledge into MT systems or computer-assisted translation (CAT) tools in the context of document translation. An important open issue for this task is how to support translators with domain-specific information when dealing with highly specific texts, e.g. manuals from different domains (information technology, medicine, law, etc.). MT systems such as Google Translate or open source MT systems trained on generic data are the most common solutions, but they often produce unsatisfactory translations of domain-specific vocabulary. To reduce the post-editing effort involved in the translation process, a valuable alternative is to enhance the systems with existing multilingual domain-specific knowledge, e.g. IATE, AgroVoc or ETB, which are agreed-upon, curated resources that contain the specific terms professionals use in their expert-to-expert communication. In this sense, the provision of multilingual knowledge bases that can be better integrated into the MT pipeline is a crucial step towards increasing the translation quality of highly specific texts, since unknown words are among the most common sources of translation errors.
We believe that, just as domain ontologies and terminologies can benefit from NLP tools and MT systems that help to overcome the language barrier, NLP tools and MT systems would in turn profit from the specialized semantic knowledge and the terms captured in multilingual ontologies or terminologies. The workshop therefore aims to build a bridge between the knowledge base and machine translation communities.
Mihael Arcan, National University of Ireland, Galway, Ireland
Elena Montiel-Ponsoda, Universidad Politécnica de Madrid, Spain
Darja Fišer, University of Ljubljana, Jožef Stefan Institute, CLARIN ERIC
Tatjana Gornostaja, Tilde, Latvia, ELRA, BDVA
The workshop, co-located with LREC 2018, is supported by the CLARIN research infrastructure.