13 April 2020

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization




One of the key challenges in natural language processing (NLP) is building systems that not only work in English but in all of the world’s ~6,900 languages. Luckily, while most of the world’s languages are data sparse and do not have enough data available to train robust models on their own, many languages do share a considerable amount of underlying structure. On the vocabulary level, languages often have words that stem from the same origin — for instance, “desk” in English and “Tisch” in German both come from the Latin “discus”. Similarly, many languages also mark semantic roles in similar ways, such as the use of postpositions to mark temporal and spatial relations in both Chinese and Turkish.

In NLP, there are a number of methods that leverage the shared structure of multiple languages in training in order to overcome the data sparsity problem. Historically, most of these methods focused on performing a specific task in multiple languages. Over the last few years, driven by advances in deep learning, there has been an increase in the number of approaches that attempt to learn general-purpose multilingual representations (e.g., mBERT, XLM, XLM-R), which aim to capture knowledge that is shared across languages and that is useful for many tasks. In practice, however, the evaluation of such methods has mostly focused on a small set of tasks and for linguistically similar languages.

To encourage more research on multilingual learning, we introduce “XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization”, which covers 40 typologically diverse languages (spanning 12 language families) and includes nine tasks that collectively require reasoning about different levels of syntax or semantics. The languages in XTREME are selected to maximize language diversity, coverage in existing tasks, and availability of training data. Among these are many under-studied languages, such as the Dravidian languages Tamil (spoken in southern India, Sri Lanka, and Singapore), Telugu and Malayalam (spoken mainly in southern India), and the Niger-Congo languages Swahili and Yoruba, spoken in Africa. The code and data, including examples for running various baselines, are available here.

XTREME Tasks and Languages
The tasks included in XTREME cover a range of paradigms, including sentence classification, structured prediction, sentence retrieval and question answering. Consequently, in order for models to be successful on the XTREME benchmark, they must learn representations that generalize to many standard cross-lingual transfer settings.

Tasks supported in the XTREME benchmark.
Each of the tasks covers a subset of the 40 languages. To obtain additional data in the low-resource languages used for analyses in XTREME, the test sets of two representative tasks, natural language inference (XNLI) and question answering (XQuAD), were automatically translated from English to the remaining languages. We show that models using the translated test sets for these tasks exhibited performance comparable to that achieved using human-labelled test sets.

Zero-shot Evaluation
To evaluate performance using XTREME, models must first be pre-trained on multilingual text using objectives that encourage cross-lingual learning. Then, they are fine-tuned on task-specific English data, since English is the most likely language where labelled data is available. XTREME then evaluates these models on zero-shot cross-lingual transfer performance, i.e., on other languages for which no task-specific data was seen. The three-step process, from pre-training to fine-tuning to zero-shot transfer, is shown in the figure below.
The cross-lingual transfer learning process for a given model: pre-training on multilingual text, followed by fine-tuning in English on downstream tasks, and finally zero-shot evaluation with XTREME.
In practice, one of the benefits of this zero-shot setting is computational efficiency — a pre-trained model only needs to be fine-tuned on English data for each task and can then be evaluated directly on other languages. Nevertheless, for tasks where labelled data is available in other languages, we also compare against fine-tuning on in-language data. Finally, we provide a combined score by averaging the zero-shot scores across all nine XTREME tasks.
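The protocol can be sketched in a few lines. The model and task objects below are hypothetical stand-ins, not the official XTREME code: the point is the shape of the loop — fine-tune once on English per task, evaluate zero-shot on every other language, and average.

```python
def evaluate_xtreme(model, tasks):
    """Zero-shot cross-lingual evaluation sketch.

    Each task is fine-tuned on English data only; the fine-tuned model is
    then scored directly on every non-English test set, and per-task scores
    are averaged over languages. The combined benchmark score is the mean
    over all tasks.
    """
    results = {}
    for task in tasks:
        model.fine_tune(task.english_train_data)  # English-only fine-tuning
        lang_scores = [
            task.score(model.predict(task.test_data[lang]))
            for lang in task.languages
            if lang != "en"  # zero-shot: no task-specific data in these languages
        ]
        results[task.name] = sum(lang_scores) / len(lang_scores)
    combined = sum(results.values()) / len(results)
    return results, combined
```

The expensive step (fine-tuning) happens once per task rather than once per task-language pair, which is what makes the zero-shot setting computationally attractive.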

A Testbed for Transfer Learning
We conduct experiments with several state-of-the-art pre-trained multilingual models, including: multilingual BERT, a multilingual extension of the popular BERT model; XLM and XLM-R, two larger versions of multilingual BERT that have been trained on even more data; and a massively multilingual machine translation model, M4. A common feature of these models is that they have been pre-trained on large amounts of data from multiple languages. For our experiments, we choose variants of these models that are pre-trained on around 100 languages, including the 40 languages of our benchmark.

We find that while models achieve close to human performance on most existing tasks in English, performance is significantly lower for many of the other languages. Across all models, the gap between English performance and performance for the remaining languages is largest for the structured prediction and question answering tasks, while the spread of results across languages is largest for the structured prediction and sentence retrieval tasks.

For illustration, in the figure below we show the performance of the best-performing model in the zero-shot setting, XLM-R, by task and language, across all language families. The scores across tasks are not comparable, so the main focus should be the relative ranking of languages across tasks. As we can see, many high-resource languages, particularly from the Indo-European language family, are consistently ranked higher. In contrast, the model achieves lower performance on many languages from other language families such as Sino-Tibetan, Japonic, Koreanic, and Niger-Congo languages.
Performance of the best-performing model (XLM-R) across all tasks and languages in XTREME in the zero-shot setting. The reported scores are percentages based on task-specific metrics and are not directly comparable across tasks. Human performance (if available) is represented by a red star. Specific examples from each language family are represented with their ISO 639-1 codes.
In general, we made a number of interesting observations:
  • In the zero-shot setting, M4 and mBERT are competitive with XLM-R for most tasks, while the latter outperforms them in the particularly challenging question answering tasks. For example, on XQuAD, XLM-R scored 76.6 compared to 64.5 for mBERT and 64.6 for M4, with similar spreads on MLQA and TyDi QA.
  • We find that baselines utilizing machine translation, which translate either the training data or test data, are very competitive. On the XNLI task, mBERT scored 65.4 in the zero-shot transfer setting and 74.0 when using translated training data.
  • We observe that the few-shot setting (i.e., using limited amounts of in-language labelled data, when available) is particularly competitive for simpler tasks, such as NER, but less useful for the more complex question answering tasks. This can be seen in the performance of mBERT, which improves by 42% on the NER task from 62.2 to 88.3 in the few-shot setting, but for the question answering task (TyDi QA), only improves by 25% (59.7 to 74.5).
  • Overall, a large gap between performance in English and other languages remains across all models and settings, which indicates that there is much potential for research on cross-lingual transfer.
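The relative improvements quoted in the few-shot observation can be checked with a quick calculation:

```python
def relative_improvement(zero_shot, few_shot):
    """Percentage improvement of the few-shot score over the zero-shot score."""
    return 100 * (few_shot - zero_shot) / zero_shot

# mBERT on NER: 62.2 -> 88.3 in the few-shot setting
print(round(relative_improvement(62.2, 88.3)))  # 42

# mBERT on TyDi QA: 59.7 -> 74.5
print(round(relative_improvement(59.7, 74.5)))  # 25
```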
Cross-lingual Transfer Analysis
Similar to previous observations regarding the generalisation ability of deep models, we observe that results improve when more pre-training data is available for a language: XLM-R, for example, is pre-trained on more data than mBERT and performs better. However, we find that this correlation does not hold for the structured prediction tasks, part-of-speech tagging (POS) and named entity recognition (NER), which indicates that current deep pre-trained models are not able to fully exploit the pre-training data to transfer to such syntactic tasks. We also find that models have difficulties transferring to non-Latin scripts. This is evident on the POS task, where mBERT achieves a zero-shot accuracy of 86.9 on Spanish compared to just 49.2 on Japanese.

For the natural language inference task, XNLI, we find that a model makes the same prediction on a test example in English and on the same example in another language about 70% of the time. Semi-supervised methods might be helpful in encouraging improved consistency between the predictions on examples and their translations in different languages. We also find that models struggle to predict POS tag sequences that were not seen in the English training data on which they were fine-tuned, highlighting that these models struggle to learn the syntax of other languages from the large amounts of unlabelled data used for pre-training. For named entity recognition, models have the most difficulty predicting entities that were not seen in the English training data for distant languages — accuracies on Indonesian and Swahili are 58.0 and 66.6, respectively, compared to 82.3 and 80.1 for Portuguese and French.
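The consistency figure above is simply the fraction of aligned test examples on which the model predicts the same label for the English sentence and its translation. A minimal sketch (the data format is a hypothetical pair of aligned label lists):

```python
def prediction_consistency(preds_en, preds_other):
    """Fraction of examples where the model predicts the same label
    on the English sentence and on its translation.

    preds_en and preds_other are aligned lists of predicted labels.
    """
    assert len(preds_en) == len(preds_other), "test sets must be aligned"
    agree = sum(p == q for p, q in zip(preds_en, preds_other))
    return agree / len(preds_en)

# e.g. 3 of 4 predictions agree across languages -> 0.75
print(prediction_consistency(
    ["entail", "neutral", "contra", "entail"],
    ["entail", "neutral", "neutral", "entail"],
))
```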

Making Progress on Multilingual Transfer Learning
English has been the focal point of most recent advances in NLP despite being spoken by only around 15% of the world’s population. We believe that building on deep contextual representations, we now have the tools to make substantial progress on systems that serve the remainder of the world’s languages. We hope that XTREME will catalyze research in multilingual transfer learning, similar to how benchmarks such as GLUE and SuperGLUE have spurred the development of deep monolingual models, including BERT, RoBERTa, XLNet, ALBERT, and others. Stay tuned to our Twitter account for information on our upcoming website launch with a submission portal and leaderboard.

Acknowledgements:
This effort has been successful thanks to the hard work of a lot of people including, but not limited to, the following (in alphabetical order of last name): Jon Clark, Orhan Firat, Junjie Hu, Graham Neubig, and Aditya Siddhant.

RIP John Conway



Daily Crunch: Apple and Google announce contact tracing initiative


Apple and Google reveal a joint effort to track the spread of COVID-19, a new study shows how fringe coronavirus theories are making their way to the mainstream and — in happier news — we get some hints on Apple’s hardware plans for the fall. Here’s your Daily Crunch for April 13, 2020.

1. Apple and Google are launching a joint COVID-19 tracing tool for iOS and Android

Apple and Google’s engineering teams have banded together to create a decentralized contact tracing tool that will help individuals determine whether they have been exposed to someone with COVID-19.

The first phase of the project is an API that public health agencies can integrate into their own apps. The next phase is a system-level contact tracing system that will work across iOS and Android devices on an opt-in basis.

2. Coronavirus conspiracies like that bogus 5G claim are racing across the internet

According to Yonder, an AI company that monitors online conversations including disinformation, conspiracies that would normally remain in fringe groups are traveling to the mainstream faster during the epidemic. The company estimates that it would normally take six to eight months for a “fringe narrative” to make its way from the edges of the internet into the mainstream, while that interval looks like three to 14 days in the midst of COVID-19.

3. Apple said to be planning fall iPhone refresh with iPad Pro-like design

Apple is readying a new iPhone to replace the iPhone 11 Pro this fall, Bloomberg reports, as well as follow-ups to the iPhone 11, a smaller HomePod and a locator tag accessory.

4. Amazon to hire 75,000 more to address increased demand due to coronavirus crisis

The company said its hiring efforts can help mitigate some of the job loss and furloughing that has resulted from the economic crisis that is also occurring as part of the COVID-19 pandemic. In fact, Amazon positioned its openings as an option for anyone looking to seek work “until things return to normal and their past employer is able to bring them back.”

5. Checking on Utah’s startup scene as the economy slips

TechCrunch is taking a closer look at a few startup markets as the world changes. Following our dive into Boston late last week, we’re widening our scope and taking a peek at the state of Utah. (Extra Crunch membership required.)

6. Tesla resurrects long-range RWD Model 3 for the Chinese market

Tesla is now producing and selling the long-range, rear-wheel-drive version of its Model 3 electric vehicle at its Shanghai factory. The move is notable because Tesla discontinued production of the long-range RWD Model 3 in the U.S. This also marks a shift from Tesla’s initial plan to sell a more basic version of the Model 3 in China.

7. This week’s TechCrunch podcasts

The latest full-length episode of Equity rounds up a bunch of different fintech stories (including SoFi’s $1.2 billion purchase of Galileo), while the Monday news roundup looks at some of SoftBank’s latest financial numbers. And over on Original Content, we had some pretty strong feelings about the initial content lineup at Quibi.

The Daily Crunch is TechCrunch’s roundup of our biggest and most important stories. If you’d like to get this delivered to your inbox every day at around 9am Pacific, you can subscribe here.



The wonders of the molecular world, animated | Janet Iwasa


Some biological structures are so small that scientists can't see them with even the most powerful microscopes. That's where molecular animator and TED Fellow Janet Iwasa gets creative. Explore vast, unseen molecular worlds as she shares mesmerizing animations that imagine how they might work.


Apple said to be planning fall iPhone refresh with iPad Pro-like design


Apple is readying a new iPhone to replace the iPhone 11 Pro this fall, Bloomberg reports, as well as follow-ups to the iPhone 11, a new smaller HomePod, and a locator tag accessory. The top-end iPhone 11 Pro successors at least will have a new industrial design that more closely resembles the iPad Pro, with flat screens and sides instead of the current rounded edge design, and they’ll also include the 3D LIDAR sensing system that Apple introduced with the most recent iPad Pro refresh in March.

The new high-end iPhone design will look more like the iPhone 5, Bloomberg says, with “flat stainless steel edges,” and the screen on the larger version will be slightly bigger than the 6.5-inch display found on the current iPhone 11 Pro Max. It could also feature a smaller version of the current ‘notch’ camera cutout at the top of the display, the report claims.

Meanwhile, the LIDAR tracking system added to the rear camera array will be combined with processor speed and performance improvements, which should add up to significant improvements in augmented reality (AR) performance. The processor improvements are also designed to help boost on-device AI performance, the report notes.

These phones are still planned for a fall launch and release, though some of them could be available “multiple weeks later than normal,” Bloomberg claims, owing to disruptions caused by the ongoing coronavirus pandemic.

Other updates to the company’s product line on the horizon include a new smaller HomePod that’s around 50 percent smaller than the current version, with a planned launch sometime later this year. It’ll offer a price advantage versus the current model, and the report claims it’ll also come alongside Siri improvements and expansion of music streaming service support beyond Apple’s own. There’s also Apple Tags, which Apple itself has accidentally tipped as coming – a Tile-like Bluetooth location tracking accessory. Bloomberg says that could come out this year.

Finally, the report says there are updates to the MacBook Pro, Apple TV, lower-end iPads and iMac on the way, which is not surprising given Apple’s usual hardware update cadence. There’s no timeline for release on any of those, and it remains to be seen how the COVID-19 situation impacts these plans.



Google unveils Maps, Search and YouTube features in India to help people combat coronavirus


Google has launched a website dedicated to coronavirus updates in India and tweaked its search engine and YouTube to prominently display authoritative information and locally relevant details about the pandemic from the nation’s Ministry of Health and Family Welfare, the company said on Monday.

Additionally, Google is also showing more than 1,500 food and night shelters in about three dozen cities in India on Google Maps and Search to guide people in need, it said. Millions of migrant daily-wage workers in India recently started to head to their home towns as their work disappeared after New Delhi ordered a 21-day lockdown across the nation last month to fight the spread of the infectious disease.

People can also find these locations by asking Google Assistant about “food shelters,” for instance, in English and Hindi. Assistant is available to users on smartphones, KaiOS-powered feature phones and through a Vodafone-Idea phone line. (The company said it is working on supporting additional Indian languages.)

The Mountain View-headquartered giant, which counts India among its key overseas markets, said it has published COVID-19 Community Mobility Reports to help health officials in the country in their decision-making. The reports capture how traffic and movement across public places such as parks, transit stations and grocery stores have changed in the country in recent weeks.

Google has also introduced Nearby Spot on Maps to help people in the nation find local stores that are providing essential items such as groceries.

YouTube and Search are showing consolidated information including the top news stories, links to MoHFW resources, and other authoritative content on symptoms, prevention, and treatments, Google said. YouTube has also launched a ‘Coronavirus News Shelf,’ featured atop the homepage, that provides the latest news from authoritative media outlets regarding the outbreak.

In recent weeks, Google’s Pay service, as well as Walmart’s PhonePe and Paytm, introduced simplified ways to donate to Indian Prime Minister Narendra Modi’s fund to fight the coronavirus. Google said people have used its payments service to donate north of $13 million to date.

These steps should help contain another outbreak that India has been grappling with for several years: false information. Messaging services are filled with speculative and false information about why the government is doing what it is doing, who is spreading the infectious disease, and supposed age-old Indian cures for COVID-19. And more often than not, these hoaxes are presented as facts by select TV news networks that reach hundreds of millions of users.

WhatsApp, the most popular app in India, has also stepped up to inform users about the infectious disease.



Is 5G Safe or Dangerous? Here’s Everything You Need to Know


Smartphones have changed the way we interact with the internet. Our cell networks have evolved over the years to keep up with our increasing demand, and 5G is the latest iteration of mobile internet.

Now that 5G networks are coming online, there has been speculation about the safety of the technology. You may have even heard some of these claims. So, it’s time to find out, is 5G safe?

What Is 5G?

In most homes, connections to the internet are usually made via Wi-Fi. This is common across offices, and even coffee shops and public spaces like shopping malls. Outside of those areas, the cellular networks operated by AT&T, Verizon, and similar providers connect us to the internet.

There have been technological developments to improve mobile internet speeds, reliability, and coverage to support the increase of internet-connected devices. One of the most significant developments came in the form of 4G and LTE, which allowed us to use our smartphones to stream music, video call, and even watch Netflix while on the go.

5G represents the evolution of cellular networks to handle the coming influx of devices. The technology promises broadband-like speeds while out and about, as well as support for future Internet of Things (IoT) devices. These are just the headlines, though; there are many ways that 5G will make mobile internet faster.

Is 5G Dangerous?


In 2019, there was a public debate about whether Huawei should be able to operate 5G networks. This may have made you concerned whether 5G is a security risk. It’s also possible that you’ve heard about some of the health concerns raised about the network, too. During the COVID-19 pandemic, 5G was wrongly implicated in spreading the disease, for example.

There are some extreme claims about the health impact of 5G networks and technologies. However, despite media coverage of such claims, there is no evidence to suggest that 5G is dangerous to your health. As the New York Times noted, many of the same issues were raised about 4G, too.

Similarly to those older technologies, there is no evidence that 5G networks are dangerous. Initial studies have shown that the amounts of radiation generated by 5G cell towers and 5G smartphones are well below official safety limits.

In March 2020, The Guardian recounted statements by the International Commission on Non-Ionizing Radiation Protection (ICNIRP), noting that 5G is safe and that the risk it poses is no different from that of other wireless networks.

It is reasonable, though, to still be cautious—after all, absence of evidence is not evidence of absence. That’s why governments around the world keep the situation under review.

One complication is that scientific studies produce different results from one another. Study A may show no negative impact, while Study B shows a small possible impact. In this example, Study A would not be widely reported—it’s not very interesting to say nothing happened—but Study B would likely receive a fair amount of media coverage.

The Scientific Method


The Scientific Method exists to deal with inconsistent results. This is a method of inquiry used by researchers, where data is observed without bias or assumption, as much as possible. In this scenario, someone has a question and sets about trying to answer it by developing a test and generating data.

The researchers then draw conclusions from the data. Once written up, the study is peer-reviewed, and, if found to be without error, will be accepted into a scientific journal for publication. This allows for public scrutiny of the study and the conclusions that were drawn from it.

Other scientists may ask the same or similar question, but have a different method of testing. This will likely give different results, too. Because of this, there could be a situation where two studies, broadly examining the same topic, give different results.

To combat this, scientists and governments look for consensus on a given issue. However, an agreement can be challenging to achieve. For example, two opinion articles in Scientific American exposed differing views on the safety of 5G.

Published first, an article by Joel M. Moskowitz argues that 5G is unsafe. At the same time, a follow-up piece by David Robert Grimes contends that personal ideology and low-quality studies guide Moskowitz’s argument.

Is 5G Safe?

The British telecoms regulator, Ofcom, performed one of the country’s first studies into 5G networks. They took measurements at 16 locations in 10 UK cities. As reported by the BBC, their results showed that the maximum radiation output was 0.039 percent of official safety limits.

Scientific consensus is based on the data we currently have. So, of course, this may change in the future.

Studies into 5G are presently limited as the technology is still being rolled out. There are low numbers of users and compatible phones, too. As 5G becomes widely available, there will be more opportunities for studies and, importantly, long-term research into wireless technologies.

Based on our current understanding, though, 5G does not pose a risk to human health.

Wireless Networks and Cancer


One of the long-standing claims about the new network is that 5G may cause cancer, so it’s worth taking a look at this particular assertion.

Cancer is the uncontrollable growth of our body’s cells. Our DNA contains instructions on how each cell should behave, including how it should grow. If there is a change, or mutation, to these structures, then the instructions become incorrect, leading to abnormal growth and multiplication of cells.

Radiation can damage cells, leading to these mutations. There are multiple types and strengths of radiation. If the radiation has enough energy, it can interact with atoms and detach electrons. This is ionizing radiation, and it is considered the most dangerous to humans. Despite the damage it can inflict, ionizing radiation is also used in cancer radiotherapy treatment.

Low-energy, non-ionizing radiation does not carry enough energy to detach electrons from atoms, and so cannot damage our cells in this way. Wireless technologies, like Wi-Fi, radio, and LTE, fall into this category. That is true of 5G, as well. However, since the introduction of mobile phones in the 1990s, there have been suggestions that the non-ionizing radiation emitted by these wireless devices can harm us.
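The ionizing/non-ionizing distinction can be made concrete with a back-of-the-envelope calculation. A photon’s energy is E = h × f, where h is Planck’s constant and f the frequency. Ionizing an atom takes on the order of electronvolts (hydrogen, for instance, requires about 13.6 eV), while even the highest 5G millimeter-wave bands, around 28 GHz, fall roughly five orders of magnitude short:

```python
PLANCK_EV = 4.135667696e-15  # Planck constant in eV*s

def photon_energy_ev(frequency_hz):
    """Energy of a single photon (E = h * f), in electronvolts."""
    return PLANCK_EV * frequency_hz

HYDROGEN_IONIZATION_EV = 13.6  # energy needed to ionize a hydrogen atom

e_5g = photon_energy_ev(28e9)  # 28 GHz mmWave band: ~1.16e-4 eV per photon
print(e_5g)
print(HYDROGEN_IONIZATION_EV / e_5g)  # shortfall factor: roughly 10^5
```

Intensity (how many photons arrive) matters for heating effects, but no number of sub-threshold photons can ionize an atom the way a single X-ray photon can, which is why radio-frequency emissions are classed as non-ionizing.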

Does 5G Cause Cancer?

The World Health Organization’s (WHO) 2014 guidance states that a “…large number of studies have been performed over the last two decades to assess whether mobile phones pose a potential health risk. To date, no adverse health effects have been established as being caused by mobile phone use.”

While non-ionizing radiation may not directly cause mutations, there have also been studies into other effects of wireless radiofrequency radiation. For example, this low-energy radiofrequency radiation can cause increases in temperature. However, investigations into this effect have shown that it has no impact on your health.

Such was the case in Australia. As reported by ZDNet, network operators found that 5G networks were no more harmful than other household items like baby monitors and microwaves.

Is 5G the Future?

While it’s worth being cautious around new technologies, there’s no evidence to suggest that 5G is any more dangerous than 4G, Wi-Fi, or any other existing wireless systems. Even then, the impact of such networks is debatable, with most studies concluding there is insufficient evidence to report them as unsafe.

If you’re ready to dive into the mobile network of the future, then you’ll need a phone that supports it. So, be sure to check out the best 5G smartphones before you make your next upgrade.


