Multilingual Web Workshop, Luxembourg 2012




W3C Workshop: The MultilingualWeb – The Way Ahead

The MultilingualWeb workshop was organized by the W3C on 15-16 March 2012 in Luxembourg. It was the fourth of four workshops that survey and share information about currently available best practices and standards that can help content creators and localizers address the needs of the multilingual Web, including the Semantic Web. It was also the closing event of the MultilingualWeb project: http://www.multilingualweb.eu/

About 130 people attended, split almost equally between technological and linguistic backgrounds.

You can read more about the workshop and the Multilingual Web project at http://www.multilingualweb.eu/documents/luxembourg-workshop

The program page points to speakers’ slides and to the relevant parts of the IRC logs. Links to video recordings will follow shortly. Moreover, a couple of participants have blogged about the event:


On behalf of Agro-Know, Tasos attended the event, as an associate of the Organic.Lingua project. The following are Tasos' notes with a focus on interesting information for the ongoing work of the Organic.Lingua project.


Some interesting questions and related updates:

  • How is the Semantic Web related to the multilingual web?
    • Linked data is also important because it can give information about the language
    • Linking alternative language versions of information across different web sites
  • SPARQL, the query language for RDF, is updated to version 1.1
    • SPARQL now has insert and update operations (not only query)
    • nested queries (subqueries) reduce the need for many separate queries over the web
  • R2RML (http://www.w3.org/TR/r2rml/), a new W3C technical report for mapping relational databases to RDF datasets.
    • Useful also for application graphs.
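The two SPARQL 1.1 additions noted above can be illustrated with a minimal sketch. The queries are illustrative only: the document URI, the use of dc:language and the helper name form_body are assumptions for this example, not something shown at the workshop. The helper only builds the URL-encoded request body described by the SPARQL 1.1 Protocol ("update" parameter for updates, "query" for queries); it does not contact any endpoint.

```python
from urllib.parse import urlencode, parse_qs

# SPARQL 1.1 Update: record the language of a (hypothetical) document.
INSERT_LANGUAGE = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
INSERT DATA {
  <http://example.org/doc/1> dc:language "el" .
}
"""

# SPARQL 1.1 subquery: the inner SELECT finds the most common language,
# the outer pattern lists the documents in that language, in one request.
DOCS_IN_TOP_LANGUAGE = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?doc ?lang WHERE {
  ?doc dc:language ?lang .
  {
    SELECT ?lang (COUNT(?d) AS ?n) WHERE {
      ?d dc:language ?lang .
    }
    GROUP BY ?lang
    ORDER BY DESC(?n)
    LIMIT 1
  }
}
"""


def form_body(kind, text):
    """Build a URL-encoded POST body for a SPARQL 1.1 Protocol request.

    kind is 'query' or 'update'; it becomes the form parameter name.
    """
    if kind not in ("query", "update"):
        raise ValueError("kind must be 'query' or 'update'")
    return urlencode({kind: text.strip()})


# Usage: encode the update, then decode it again to inspect what would be sent.
body = form_body("update", INSERT_LANGUAGE)
decoded = parse_qs(body)["update"][0]
```

Without subqueries, the second request would need two round trips: one to find the most common language and one to list its documents.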


Notes from presentations


W3C activity related to ML

  • As stated by Richard Ishida (W3C), the new MultilingualWeb LT project will carry on the work of MultilingualWeb
  • Some new m17n related features in HTML5:
    • Interesting to research: Internationalization Tag Set (ITS)
    • translate attribute (yes/no) in HTML5, already supported by Google Translate and Microsoft's translation service
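As a rough illustration of how a tool might honour the translate attribute, the sketch below collects only the text that is marked as translatable. It assumes the inheritance behaviour of the HTML attribute (a child inherits its parent's setting unless it sets its own) and uses only the Python standard library; the class and function names are made up for this example, and script/style content is not treated specially.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not touch the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TranslatableText(HTMLParser):
    """Collect text segments whose effective translate state is 'yes'."""

    def __init__(self):
        super().__init__()
        self.flags = [True]   # stack of inherited states; the default is 'yes'
        self.segments = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        state = self.flags[-1]            # inherit from the parent element
        attributes = dict(attrs)
        if "translate" in attributes:
            value = (attributes["translate"] or "").strip().lower()
            if value in ("", "yes"):
                state = True
            elif value == "no":
                state = False
            # any other value is invalid and keeps the inherited state
        self.flags.append(state)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if len(self.flags) > 1:
            self.flags.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.flags[-1]:
            self.segments.append(text)


def extract_translatable(html):
    """Return the list of text segments that a translator should see."""
    parser = TranslatableText()
    parser.feed(html)
    parser.close()
    return parser.segments


sample = ('<p>Click <code translate="no">git push</code> to '
          '<span>publish</span>.</p>'
          '<div translate="no">Agro-Know <em translate="yes">rocks</em></div>')
segments = extract_translatable(sample)
```

In the sample, the code snippet and the brand name are withheld from translation, while the emphasised word inside the protected block is released again by its own translate="yes".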

Joomla

  • latest version supports 57 languages
  • they try to help with the translation process for the web site; three options:
    • machine translation (widgets for Google Translate, Bing, etc.)
      • problem: the translated content is NOT INDEXED BY SEARCH ENGINES
    • parallel translation
      • joomfish
      • falang
      • but should we translate or provide local content? It depends (on your site)!
    • sites-within-a-site
      • supporting alternative versions, rather than just translated ones (Pippi Longstocking)


Wikimedia Foundation

  • (by Gerard Meijssen)
  • Wikipedia is supported in 283 languages
  • requests for 129 more languages
  • many languages have Wikisource, Wikibooks and Wiktionary projects as well
  • Wikisource is very important, e.g. for India
  • 12 million freely usable images
  • they are planning to link with Europeana
  • The value of localisation is HIGHER than the cost of developers!
  • Standards:
    • CLDR - the information is not accurate even for English
      • the developers have a hard time entering data! → it's very very hard!
      • it uses time boxes for data entry
      • not all languages are supported
    • translation memories from Wikimedia


Presentation of 'Directorate General for Translation'

  • (by Spyridon Pilos)
  • interest in MLWeb project for DG Translation, because
  • Systran used for 15 years, up to 2011; 10 languages with Systran → 60 M€!
  • Project started October 2010
  • Deployment expected summer 2013
  • Built around the open-source MOSES system
  • 23 language pairs
  • they will collect post-editing feedback for translation memory


Mozilla Pontoon

  • (Matjaz Horvat)
  • Pontoon — Live website localisation
  • problem #1: localisers don't know the content
  • problem #2: they don't know if their translations will fit
  • solution: content-editable HTML
  • in situ translation
  • http://pontoon-dev.mozillalabs.com


The MLWeb-LT project

  • (Dave Lewis)
  • LT = Language Technology
  • A new W3C working group, with support from EC
  • http://www.w3.org/International/multilingualweb/lt/
  • AIMS: define metadata for web content that facilitates its interaction with LT and localization processes.
  • Already have 28 participants from 20 organisations
  • Chairs: Felix Sasaki, David Filip, Dave Lewis
  • Feature freeze Nov 2012
  • Recommendation complete Dec 2013
  • Requirements workshop in Dublin on 11-12 June
  • Nice description of stakeholders and some use cases with very nice figures in the presentation

OPERA browser

  • nice screenshots, it supports multilingual alternative language versions!

MONNET project

  • (Paul Buitelaar, DERI)
  • Cross-Lingual Information Access
  • complicated query, e.g.: EU wind energy companies; 2005-2010
  • business information in EN, DE, NL, ES, EL, etc.
  • they build a QUERY analysis tool
  • backend is an ontology with financial concepts
  • domain training
  • use-case: immigrants!
  • put a lexical layer
  • ODF-LITERAL, SKOS community / ontolex


Cross-lingual named entity disambiguation


The Multilingual Language Library

  • (Nicoletta Calzolari CNR-ILC)
  • make use of the sharing trend
  • Collaborative iResources
    • that are really built in collaborative way
    • new methodology of work
    • sense of value
    • Context:
      • the LT field is a very DATA-INTENSIVE field
      • annotation is at the core of various activities


Repurposing LR for ML websites

  • (Fernando Servan FAO)
  • they participate in MLWEB-LT
  • English, French, Spanish, ... (6 official UN languages)
  • project last year
  • trying to use MT, specifically MOSES
  • reuse of legacy translations for non-translated content
  • 5 language pairs (6 languages...)
  • ARABIC (very important) + SPANISH (had expertise)
  • terminology database
    • translation memories
    • HOWEVER:
      • the best format that would fit was TXT
      • generally additional work is needed
      • the standards that are needed are available, but they are difficult to integrate so that they work together
      • they abandoned the Arabic translation: too difficult!
      • acronyms are a very big problem: they are used in the UN but not in Arabic!
      • not happy with the quality, will not open it to the public yet
      • metadata: controlled vocabularies, conventions for URIs; however, lots of metadata still needs to be translated



Interesting projects

Monnet

Although knowledge processing on the Semantic Web is inherently language-independent, human interaction with semantically structured and linked data will be text or speech based – in multiple languages. Semantic Web development is therefore increasingly concerned with issues in multilingual querying and rendering of web knowledge and linked data. The Monnet project on 'Multilingual Ontologies for Networked Knowledge' provides solutions for this by offering methods for lexicalising and translating knowledge structures, such as ontologies and linked data vocabularies. The talk will discuss challenges and solutions in ontology lexicalisation and translation (localisation) by way of several use cases that are under development in the context of the Monnet project.


The ML language library

The Language Library is a quite new initiative – started with LREC 2012 – conceived as a facility for gathering and making available, through simple functionalities, all the linguistic knowledge the field is able to produce, putting in place new ways of collaboration within the Language Technology community. Its main characteristic is to be collaboratively built, with the entire community providing/enriching language resources by annotating/processing language data and freely using them. We exploit today's trend towards sharing for initiating a collective movement that works also towards creating synergies and harmonisation among different annotation efforts that are now dispersed. The Language Library could be considered as the beginning of a big Genome project for languages, where the community will collectively deposit/create increasingly rich and multi-layered linguistic resources, enabling a deeper understanding of the complex relations between different annotation layers.


CoSyne - ML Content Synchronisation with Wikis

FBK is among the partners; synchronisation for wikis!


Accurat

The aim of the ACCURAT project is to research methods and techniques to overcome one of the central problems of machine translation (MT) – the lack of linguistic resources for under-resourced areas of machine translation.

The main goal is to find, analyze and evaluate novel methods that exploit comparable corpora in order to compensate for the shortage of linguistic resources, and ultimately to significantly improve MT quality for under-resourced languages and narrow domains.

