So far, we have limited our dataset to Sicilian-English parallel text. But there's no reason to do so. With a small modification, we can train a single model to translate between multiple languages, including some for which there is little or no parallel text.
For example, even if we did not have any Sicilian-Italian parallel text at all, we could still develop a model that translates between Sicilian and Italian ("zero-shot" translation) by adding Italian-English parallel text to our dataset.
And if we have some Sicilian-Italian parallel text, then it's even possible to achieve high translation quality between Sicilian and Italian.
The small modification is to add a directional token to the beginning of the source sequence. Johnson et al. (2016) show that that single addition enables multilingual translation in an otherwise conventional model.
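As a minimal sketch of that modification, we can prepend a token naming the target language to each source sentence before training. (The token format `<2xx>` and the example sentences below are illustrative, not taken from Johnson et al.'s data.)

```python
# Prepend a directional token to each source sentence so a single model
# learns which language to translate into.  Token names like "<2en>" are
# an illustrative convention, not a fixed standard.

def add_direction_token(source_sentence, target_lang):
    """Prefix the source with a token telling the model which language to produce."""
    return f"<2{target_lang}> {source_sentence}"

# (source language, target language, source text) -- hypothetical examples
pairs = [
    ("scn", "en", "Bon jornu a tutti."),
    ("it",  "en", "Buongiorno a tutti."),
    ("en",  "it", "Good morning, everyone."),
]

tagged = [add_direction_token(text, tgt) for src, tgt, text in pairs]
print(tagged[0])   # "<2en> Bon jornu a tutti."
```

The rest of the model stays unchanged: the directional token is just another vocabulary item, and the model learns to condition its output language on it.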
It's an example of transfer learning. In our case, as the model learns to translate from Italian to English, it would also learn to translate from Sicilian to English. And as the model learns to translate from English to Italian, it would also learn how to translate from English to Sicilian.
More parallel text is available for some languages than others, however, so Johnson et al. also studied the effect on translation quality and found that oversampling low-resource language pairs improves their translation quality, but at the expense of quality among high-resource pairs.
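One common way to implement that oversampling is temperature-based sampling over the per-pair data sizes (the scheme used by Arivazhagan et al., 2019). The sketch below uses made-up corpus sizes, not real counts: with temperature 1 we sample pairs in proportion to their data size, while a higher temperature flattens the distribution and boosts low-resource pairs.

```python
# Hedged sketch: temperature-based sampling of language pairs.
# Higher temperature oversamples low-resource pairs (at some cost to
# high-resource ones).  Corpus sizes are hypothetical.

def sampling_probs(sizes, temperature):
    """Convert per-pair sentence counts into sampling probabilities."""
    weights = {pair: n ** (1.0 / temperature) for pair, n in sizes.items()}
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}

sizes = {"it-en": 1_000_000, "es-en": 800_000, "scn-en": 20_000}

print(sampling_probs(sizes, temperature=1))  # proportional to data size
print(sampling_probs(sizes, temperature=5))  # flatter: scn-en oversampled
```

At temperature 1 the low-resource pair is sampled about 1% of the time; at temperature 5 its share rises to roughly a fifth, which is exactly the trade-off Johnson et al. observed.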
Importantly, however, that comparison with bilingual translators holds constant the number of parameters in the model. Arivazhagan et al. (2019) show that training a larger model can improve translation quality across the board.
Given such potential to expand the directions in which languages can be translated and to improve the quality with which they can be translated, an important question is what the model learns. Does it learn to represent similar sentences in similar ways regardless of language? Or does it represent similar languages in similar ways?
Johnson et al. examined two trained trilingual models. In one, they observed similar representations of translated sentences, while in the second they noticed that the representations of zero-shot translations were very different.
Kudugunta et al. (2019) examined the question in a model trained on 103 languages. They found that the representations depend on both the source and target languages, and that the encoder learns a representation in which linguistically similar languages cluster together.
In other words, because similar languages learn similar representations, our model would learn Sicilian-English better from Italian-English data than from Polish-English data. And other Romance languages, like Spanish, would also be good languages to consider.
We can collect some of that parallel text from the resources at OPUS, an open repository of parallel corpora.
And we will. But we still have several issues of Arba Sicula from which to assemble Sicilian-English parallel text. So we'll finish collecting that, collect some Sicilian-Italian text, and then obtain further improvements in translation quality by adding Italian and Spanish to the model.
Copyright © 2002-2020 Eryk Wdowiak