03 October 2020

Google research lets sign language switch ‘active speaker’ in video calls


An aspect of video calls that many of us take for granted is the way they can switch between feeds to highlight whoever’s speaking. Great — if speaking is how you communicate. Silent speech like sign language doesn’t trigger those algorithms, unfortunately, but this research from Google might change that.

It’s a real-time sign language detection engine that can tell when someone is signing (as opposed to just moving around) and when they’re done. Of course it’s trivial for humans to tell this sort of thing, but it’s harder for a video call system that’s used to just pushing pixels.

A new paper from Google researchers, presented (virtually, of course) at ECCV, shows how it can be done efficiently and with very little latency. It would defeat the point if the sign language detection worked but resulted in delayed or degraded video, so their goal was to make sure the model was both lightweight and reliable.

The system first runs the video through a model called PoseNet, which estimates the positions of the body and limbs in each frame. This simplified visual information (essentially a stick figure) is sent to a model trained on pose data from video of people using German Sign Language, and it compares the live image to what it thinks signing looks like.
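
For intuition, here is a rough sketch of that pipeline in Python. This is not Google’s code: it takes per-frame keypoints from a pose model such as PoseNet, computes frame-to-frame motion normalized by shoulder width, and applies a simple threshold as a stand-in for the trained classifier. The keypoint indices and threshold value are assumptions.

```python
# Rough sketch: decide "signing vs. not signing" from how much the keypoints move
# between frames, normalized by shoulder width. Keypoint indices follow PoseNet's
# 17-point layout (5 = left shoulder, 6 = right shoulder); the threshold is a
# stand-in for the trained classifier, not Google's model.
import numpy as np

def motion_features(keypoints: np.ndarray) -> np.ndarray:
    """keypoints: (num_frames, 17, 2) array of (x, y) positions from a pose model."""
    shoulder_width = np.linalg.norm(keypoints[:, 5] - keypoints[:, 6], axis=-1)
    frame_deltas = np.linalg.norm(np.diff(keypoints, axis=0), axis=-1)  # (frames-1, 17)
    return frame_deltas / np.maximum(shoulder_width[1:, None], 1e-6)

def is_signing(keypoints: np.ndarray, threshold: float = 0.05) -> bool:
    """Crude stand-in for the trained classifier: mean keypoint motion above a threshold."""
    return float(motion_features(keypoints).mean()) > threshold
```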

Image showing automatic detection of a person signing.

Image Credits: Google

This simple process already produces 80 percent accuracy in predicting whether a person is signing or not, and with some additional optimizing gets up to 91.5 percent accuracy. Considering how the “active speaker” detection on most calls is only so-so at telling whether a person is talking or coughing, those numbers are pretty respectable.

In order to work without adding some new “a person is signing” signal to existing calls, the system pulls a clever little trick. It uses a virtual audio source to generate a 20 kHz tone, which is outside the range of human hearing but noticed by computer audio systems. This signal is generated whenever the person is signing, making the speech detection algorithms think that they are speaking out loud.
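
Conceptually, the trick looks something like the following sketch. It assumes the `sounddevice` Python package and a virtual audio input device; it illustrates the idea only and is not the code behind Google’s browser demo.

```python
# Illustrative only: play a 20 kHz tone (outside the range of human hearing) whenever
# signing is detected, so speech-activity detection treats the signer as an active
# speaker. Assumes playback is routed to a virtual microphone; not Google's demo code.
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 48_000  # Hz; high enough to represent a 20 kHz tone
TONE_HZ = 20_000

def ultrasonic_burst(duration_s: float = 0.5, amplitude: float = 0.1) -> np.ndarray:
    t = np.arange(int(SAMPLE_RATE * duration_s)) / SAMPLE_RATE
    return (amplitude * np.sin(2 * np.pi * TONE_HZ * t)).astype(np.float32)

def on_signing_detected() -> None:
    # Route this playback to a virtual audio input so the call client "hears" it.
    sd.play(ultrasonic_burst(), samplerate=SAMPLE_RATE, blocking=False)
```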

Right now it’s just a demo, which you can try here, but there doesn’t seem to be any reason why it couldn’t be built right into existing video call systems or even as an app that piggybacks on them. You can read the full paper here.


Read Full Article

Twitter will make users remove tweets hoping Trump dies of COVID-19


President Donald Trump’s positive COVID-19 result has made Twitter a busy place in the past 24 hours, including some tweets that have publicly wished — some subtly and others more directly — that he die from the disease caused by coronavirus.

Twitter put out a reminder to folks that it doesn’t allow tweets that wish or hope for death or serious bodily harm or fatal disease against anyone. Tweets that violate this policy will need to be removed, Twitter said Friday. However, it also clarified that this does not automatically mean suspension. Several news outlets misreported that users would be suspended automatically. Of course, that doesn’t mean users won’t be suspended.

On Thursday evening, Trump tweeted that he and his wife, First Lady Melania Trump, had tested positive for COVID-19. White House physician Sean Conley issued a memo Friday confirming the positive tests for the SARS-CoV-2 virus, which causes the disease commonly known as COVID-19. Trump was seen boarding a helicopter Friday evening bound for Walter Reed Medical Center for several days of treatment.

The diagnosis sent shares tumbling Friday on the key exchanges, including Nasdaq. The news put downward pressure on all major American indices, but heaviest on tech shares.


Read Full Article

Daily Crunch: Twitter confronts image-cropping concerns


Twitter addresses questions of bias in its image-cropping algorithms, we take a look at Mario Kart Live and the stock market takes a hit after President Trump’s COVID-19 diagnosis. This is your Daily Crunch for October 2, 2020.

The big story: Twitter confronts image-cropping concerns

Last month, (white) PhD student Colin Madland highlighted potential algorithmic bias on Twitter and Zoom — in Twitter’s case, because its automatic image cropping seemed to consistently highlight Madland’s face over that of a Black colleague.

Today, Twitter said it has been looking into the issue: “While our analyses to date haven’t shown racial or gender bias, we recognize that the way we automatically crop photos means there is a potential for harm.”

Does that mean it will stop automatically cropping images? The company said it’s “exploring different options” and added, “We hope that giving people more choices for image cropping and previewing what they’ll look like in the tweet composer may help reduce the risk of harm.”

The tech giants

Nintendo’s new RC Mario Kart looks terrific — Mario Kart Live (with a real-world race car) makes for one hell of an impressive demo.

Tesla delivers 139,300 vehicles in Q3, beating expectations — Tesla’s numbers in the third quarter marked a 43% improvement from the same period last year.

Zynga completes its acquisition of hyper-casual game maker Rollic — CEO Frank Gibeau told me that this represents Zynga’s first move into the world of hyper-casual games.

Startups, funding and venture capital

Elon Musk says an update for SpaceX’s Starship spacecraft development program is coming in 3 weeks —  Starship is a next-generation, fully reusable spacecraft that the company is developing with the aim of replacing all of its launch vehicles.

Paired picks up $1M funding and launches its relationship app for couples — Paired combines audio tips from experts with “fun daily questions and quizzes” that partners answer together.

With $2.7M in fresh funding, Sora hopes to bring virtual high school to the mainstream — Long before the coronavirus, Sora was toying with the idea of live, virtual high school.

Advice and analysis from Extra Crunch

Spain’s startup ecosystem: 9 investors on remote work, green shoots and 2020 trends — While main hubs Madrid and Barcelona bump heads politically, tech ecosystems in each city have been developing with local support.

Which neobanks will rise or fall? — Neobanks have led the $3.6 billion in venture capital funding for consumer fintech startups this year.

Asana’s strong direct listing lights alternative path to public market for SaaS startups — Despite rising cash burn and losses, Wall Street welcomed the productivity company.

Everything else

American stocks drop in wake of president’s COVID-19 diagnosis — The news is weighing heavily on all major American indices, but heaviest on tech shares.

Digital vote-by-mail applications in most states are inaccessible to people with disabilities — According to an audit by Deque, most states don’t actually have an accessible digital application.

The Daily Crunch is TechCrunch’s roundup of our biggest and most important stories. If you’d like to get this delivered to your inbox every day at around 3pm Pacific, you can subscribe here.


Read Full Article

Twitter is building ‘Birdwatch,’ a system to fight misinformation by adding more context to tweets


Twitter is developing a new product called “Birdwatch,” which the company confirms is an attempt at addressing misinformation across its platform by providing more context for tweets, in the form of notes. Tweets can be added to “Birdwatch” — meaning flagged for moderation — from the tweet’s drop-down menu, where other blocking and reporting tools are found today. A small binoculars icon will also appear on tweets published to the Twitter Timeline. When the button is clicked, users are directed to a screen where they can view the tweet’s history of notes.

Based on screenshots of Birdwatch unearthed through reverse engineering techniques, a new tab called “Birdwatch Notes” will be added to Twitter’s sidebar navigation, alongside other existing features like Lists, Topics, Bookmarks and Moments.

This section will allow you to keep track of your own contributions, aka your “Birdwatch Notes.”

The feature was first uncovered this summer in early stages of development by reverse engineer Jane Manchun Wong, who found the system through Twitter’s website. At the time, Birdwatch didn’t have a name, but it clearly showed an interface for flagging tweets, voting on whether or not the tweet was misleading, and adding a note with further explanations.

Twitter updated its web app a few days after her discovery, limiting further investigation.

This week, however, a very similar interface was again discovered in Twitter’s code, this time on iOS.

According to social media consultant Matt Navarra, who tweeted several more screenshots of the feature on mobile, Birdwatch allows users to attach notes to a tweet. These notes can be viewed when clicking on the binoculars button on the tweet itself.

In other words, additional context about the statements made in the tweet would be open to the public.

What’s less clear is whether everyone on Twitter will be given access to annotate tweets with additional context, or whether this permission will require approval, or only be open to select users or fact checkers.

Twitter early adopter and hashtag inventor Chris Messina openly wondered if Birdwatch could be some sort of “citizen’s watch” system for policing disinformation on Twitter. It turns out, he was right.

According to line items he found within Twitter’s code, these annotations — the “Birdwatch Notes” — are referred to as “contributions,” which does seem to imply a crowdsourced system. (After all, a user would contribute to a shared system, not to a note they were writing for only themselves to see.)

Image Credits: Chris Messina

Crowdsourcing moderation wouldn’t be new to Twitter. For several years, Twitter’s live-streaming app Periscope has relied on crowdsourcing techniques to moderate comments on its real-time streams in order to clamp down on abuse.

There is still much we don’t know about how Birdwatch will work from a non-technical perspective, however. We don’t know if everyone will have the same abilities to annotate tweets, how attempts to troll this system will be handled, or what would happen to a tweet if it got too many negative dings, for example.

In more recent months, Twitter has tried to take a harder stance on tweets that contain misleading, false or incendiary statements. It has even gone so far as to apply fact-check labels to some of Trump’s tweets and has hidden others behind a notice warning users that the tweet has violated Twitter’s rules. But scaling moderation across all of Twitter is a task the company has not been well-prepared for, as it built for scale first, then tried to figure out policies and procedures around harmful content after the fact.

Reached for comment, Twitter declined to offer details regarding its plans for Birdwatch, but did confirm the feature was designed to combat the spread of misinformation.

“We’re exploring a number of ways to address misinformation and provide more context for tweets on Twitter,” a Twitter spokesperson told TechCrunch. “Misinformation is a critical issue and we will be testing many different ways to address it,” they added.

 


Read Full Article

Google wakes up from its VR daydream


Daydream, Google’s mobile-focused virtual reality platform, is losing official support from Google, Android Police reports. The company confirmed that it will no longer be updating the Daydream software, with the publication noting that “Daydream may not even work on Android 11” as a result.

This isn’t surprising to anyone who has been tracking the company’s moves in the space. After aggressive product rollouts in 2016 and 2017, Google quickly abandoned its VR efforts. Daydream, much like the Samsung Gear VR, let users drop a compatible phone into a headset holster and use the phone’s display and compute to power VR experiences. After Apple’s announcement of ARKit, the company did a hard pivot away from VR, turning its specialty AR platform Tango into ARCore, an AR developer platform that has also seen little attention from Google in recent months.

Google’s withdrawal of official support for Daydream comes after years without product updates to its own Daydream View headset and very little investment in the content ecosystem, which wrecked the chances of Lenovo’s third-party effort, the standalone Mirage Solo.

What went wrong? Once it became clear that Daydream wasn’t going to be an easy win, Google simply abandoned the effort. The company’s hardware business is already peanuts compared to its search and ads business, so it probably wasn’t clear what the point was, and virtual reality quickly went from being the “it” technology to work on to clearly being a labor of love for a select few. Google decided it wasn’t worth the effort while Facebook continued to double down. It’s hard to fault them: in 2020, even with some very good hardware on the way from Oculus, it still isn’t clear what VR’s future looks like.

It is clear, however, that Daydream won’t be part of it.


Read Full Article

Macrometa, an edge computing service for app developers, lands $7M seed round led by DNX


As people continue to work and study from home because of the COVID-19 pandemic, interest in edge computing has increased. Macrometa, a Palo Alto-based startup that provides edge computing infrastructure for app developers, announced today it has closed a $7 million seed round.

The funding was led by DNX Ventures, an investment fund that focuses on early-stage B2B startups. Other participants included returning investors Benhamou Global Ventures, Partech Partners, Fusion Fund, Sway Ventures, Velar Capital and Shasta Ventures.

While cloud computing relies on servers and data centers owned by providers like Amazon, IBM, Microsoft and Google, edge computing is geographically distributed, with computing done closer to data sources, allowing for faster performance.

Founded in 2018 by chief executive Chetan Venkatesh and chief architect Durga Gokina, Macrometa’s globally distributed data service, called Global Data Network, combines a distributed NoSQL database and a low-latency stream data processing engine. It allows developers to run their cloud apps and APIs across 175 edge regions around the world. To reduce delays, app requests are sent to the region closest to the user. Macrometa claims that requests can be processed in less than 50 milliseconds globally, making it 50 to 100 times faster than cloud platforms like DynamoDB, MongoDB or Firebase. One of the ways Macrometa differentiates itself from competitors is that it enables developers to work with data stored across a global network of cloud providers, like Google Cloud and Amazon Web Services, instead of a single provider.
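
As a rough illustration of the nearest-region idea (generic code, not Macrometa’s API), a client could measure round-trip latency to each edge region and route requests to the fastest one; the region names and URLs below are hypothetical.

```python
# Illustrative sketch of latency-based routing to the nearest edge region.
# Region names and endpoints are hypothetical; this is not Macrometa's API.
import time
import requests

REGIONS = {
    "us-west": "https://us-west.example-edge.com/health",
    "eu-central": "https://eu-central.example-edge.com/health",
    "ap-south": "https://ap-south.example-edge.com/health",
}

def nearest_region(timeout_s: float = 1.0) -> str:
    """Ping each region once and return the one with the lowest round-trip time."""
    latencies = {}
    for name, url in REGIONS.items():
        start = time.monotonic()
        try:
            requests.get(url, timeout=timeout_s)
            latencies[name] = time.monotonic() - start
        except requests.RequestException:
            continue  # Skip unreachable regions.
    return min(latencies, key=latencies.get)
```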

As more telecoms roll out 5G networks, demand for globally distributed, serverless data computing services like Macrometa is expected to increase, especially to support enterprise software. Other edge computing-related startups that have recently raised funding include Latent AI, SiMa.ai and Pensando.

A spokesperson for Macrometa said the seed round was oversubscribed because the pandemic has increased investor interest in cloud and edge companies like Snowflake, which recently held its initial public offering.

Macrometa also announced today that it has added to its board of directors DNX managing partner Q Motiwala, former Auth0 and xnor.ai chief executive Jon Gelsey and Armorblox chief technology officer Rob Fry.

In a statement about the funding, Motiwala said, “As we look at the next five to ten years of cloud evolution, it’s clear to us that enterprise developers need a platform like Macrometa to go beyond the constraints, scaling limitations and high-cost economics that current cloud architectures impose. What Macrometa is doing for edge computing is what Amazon Web Services did for the cloud a decade ago.”


Read Full Article

Massively Large-Scale Distributed Reinforcement Learning with Menger


In the last decade, reinforcement learning (RL) has become one of the most promising research areas in machine learning and has demonstrated great potential for solving sophisticated real-world problems, such as chip placement and resource management, and solving challenging games (e.g., Go, Dota 2, and hide-and-seek). In simplest terms, an RL infrastructure is a loop of data collection and training, where actors explore the environment and collect samples, which are then sent to the learners to train and update the model. Most current RL techniques require many iterations over batches of millions of samples from the environment to learn a target task (e.g., Dota 2 learns from batches of 2 million frames every 2 seconds). As such, an RL infrastructure should not only scale efficiently (e.g., increase the number of actors) and collect an immense number of samples, but also be able to swiftly iterate over these extensive amounts of samples during training.

Overview of an RL system in which an actor sends trajectories (e.g., multiple samples) to a learner. The learner trains a model using the sampled data and pushes the updated model back to the actor (e.g. TF-Agents, IMPALA).
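
As a minimal sketch of that loop (placeholder environment, policy and learner objects, not Menger components), the actor collects a trajectory with the current model, the learner trains on it, and the updated weights are pushed back to the actor:

```python
# Minimal actor/learner loop. `env`, `policy`, and `learner` are placeholder
# objects with the methods used below; this is a sketch, not Menger's code.

def run(env, policy, learner, num_iterations: int, steps_per_trajectory: int) -> None:
    for _ in range(num_iterations):
        # Actor: roll out the current policy to collect a trajectory of samples.
        trajectory = []
        obs = env.reset()
        for _ in range(steps_per_trajectory):
            action = policy.act(obs)
            next_obs, reward, done = env.step(action)
            trajectory.append((obs, action, reward))
            obs = env.reset() if done else next_obs

        # Learner: train on the collected samples, then push the updated model
        # back to the actor for the next round of data collection.
        new_weights = learner.train(trajectory)
        policy.set_weights(new_weights)
```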

Today we introduce Menger¹, a massively large-scale distributed RL infrastructure with localized inference that scales up to several thousand actors across multiple processing clusters (e.g., Borg cells), reducing the overall training time in the task of chip placement. In this post we describe how we implement Menger using Google TPU accelerators for fast training iterations, and present its performance and scalability on the challenging task of chip placement. Menger reduces the training time by up to 8.6x compared to a baseline implementation.

Menger System Design
There are various distributed RL systems, such as Acme and SEED RL, each of which focuses on optimizing a particular design point in the space of distributed reinforcement learning systems. For example, while Acme uses local inference on each actor with frequent model retrieval from the learner, SEED RL benefits from a centralized inference design by allocating a portion of TPU cores for performing batched calls. The tradeoffs between these design points are (1) paying the communication cost of sending/receiving observations and actions to/from a centralized inference server, or paying the communication cost of model retrieval from a learner, and (2) the cost of inference on actors (e.g., CPUs) compared to accelerators (e.g., TPUs/GPUs). Because of the requirements of our target application (e.g., size of observations, actions, and model size), Menger uses local inference in a manner similar to Acme, but pushes the scalability of actors to a virtually unbounded limit. The main challenges to achieving massive scalability and fast training on accelerators include:

  1. Servicing a large number of read requests from actors to a learner for model retrieval can easily throttle the learner and quickly become a major bottleneck (e.g., significantly increasing the convergence time) as the number of actors increases.
  2. The TPU performance is often limited by the efficiency of the input pipeline in feeding the training data to the TPU compute cores. As the number of TPU compute cores increases (e.g., TPU Pod), the performance of the input pipeline becomes even more critical for the overall training runtime.

Efficient Model Retrieval
To address the first challenge, we introduce transparent and distributed caching components between the learner and the actors optimized in TensorFlow and backed by Reverb (similar approach used in Dota). The main responsibility of the caching components is to strike a balance between the large number of requests from actors and the learner job. Adding these caching components not only significantly reduces the pressure on the learner to service the read requests, but also further distributes the actors across multiple Borg cells with a marginal communication overhead. In our study, we show that for a 16 MB model with 512 actors, the introduced caching components reduce the average read latency by a factor of ~4.0x leading to faster training iterations, especially for on-policy algorithms such as PPO.

Overview of a distributed RL system with multiple actors placed in different Borg cells. Servicing the frequent model update requests from a massive number of actors across different Borg cells throttles the learner and the communication network between learner and actors, which leads to a significant increase in the overall convergence time. The dashed lines represent gRPC communication between different machines.
Overview of a distributed RL system with multiple actors placed in different Borg cells, with the introduced transparent and distributed caching service. The learner only sends the updated model to the distributed caching services. Each caching service handles model update requests from nearby actors (i.e., actors placed in the same Borg cell). The caching service not only reduces the load on the learner for servicing model update requests, but also reduces the average read latency for the actors.
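
A rough sketch of the idea, not Menger’s implementation: each cell runs a small cache that periodically pulls the latest weights from the learner, and local actors read from that cache, so the learner services one reader per cache rather than one per actor. The class and method names below are illustrative.

```python
# Illustrative per-cell model cache: the learner is polled once per refresh
# interval, and all local actors read from the cache, so the learner services
# one reader per cache instead of one per actor.
import threading
import time

class ModelCache:
    """Per-cell cache of the learner's latest weights, refreshed in the background."""

    def __init__(self, fetch_from_learner, refresh_interval_s: float = 1.0):
        self._fetch = fetch_from_learner        # callable returning the latest weights
        self._interval = refresh_interval_s
        self._lock = threading.Lock()
        self._weights = self._fetch()
        threading.Thread(target=self._refresh_loop, daemon=True).start()

    def _refresh_loop(self) -> None:
        while True:
            time.sleep(self._interval)
            weights = self._fetch()             # one request to the learner per interval
            with self._lock:
                self._weights = weights

    def get_weights(self):
        # Served to many local actors without any additional load on the learner.
        with self._lock:
            return self._weights
```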

High Throughput Input Pipeline
To deliver a high throughput input data pipeline, Menger uses Reverb, a recently open-sourced data storage system designed for machine learning applications that provides an efficient and flexible platform to implement experience replay in a variety of on-policy/off-policy algorithms. However, using a single Reverb replay buffer service does not currently scale well in a distributed RL setting with thousands of actors, and simply becomes inefficient in terms of write throughput from actors.

A distributed RL system with a single replay buffer. Servicing a massive number of write requests from actors throttles the replay buffer and reduces its overall throughput. In addition, as we scale the learner to a setting with multiple compute engines (e.g., TPU Pod), feeding the data to these engines from a single replay buffer service becomes inefficient, which negatively impacts the overall convergence time.
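
For reference, a minimal single-table replay buffer using the open-source Reverb API looks roughly like the following; the table name, capacity and sample contents are arbitrary illustrations, not Menger’s configuration.

```python
# Minimal single replay buffer backed by Reverb (https://github.com/deepmind/reverb).
# Table name, capacity, and sample contents here are arbitrary illustrations.
import numpy as np
import reverb

server = reverb.Server(
    tables=[
        reverb.Table(
            name="experience",
            sampler=reverb.selectors.Uniform(),
            remover=reverb.selectors.Fifo(),
            max_size=1_000_000,
            rate_limiter=reverb.rate_limiters.MinSize(1),
        )
    ],
    port=8000,
)

client = reverb.Client("localhost:8000")

# An actor writes one (observation, action, reward) sample.
client.insert([np.zeros(4), np.int64(1), np.float32(0.5)], priorities={"experience": 1.0})

# The learner's input pipeline samples from the same table.
for sample in client.sample("experience", num_samples=2):
    print(sample)
```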

To better understand the efficiency of the replay buffer in a distributed setting, we evaluate the average write latency for payload sizes ranging from 16 MB to 512 MB and for numbers of actors ranging from 16 to 2048, with the replay buffer and actors placed on the same Borg cell. As the number of actors grows, the average write latency increases significantly: expanding the number of actors from 16 to 2048 increases the average write latency by a factor of ~6.2x for a 16 MB payload and ~18.9x for a 512 MB payload. This increase in write latency negatively impacts data collection time and leads to inefficiency in the overall training time.

The average write latency to a single Reverb replay buffer for various payload sizes (16 MB - 512 MB) and various number of actors (16 to 2048) when the actors and replay buffer are placed on the same Borg cells.

To mitigate this, we use the sharding capability provided by Reverb to increase the throughput between actors, learner, and replay buffer services. Sharding balances the write load from the large number of actors across multiple replay buffer servers, instead of throttling a single replay buffer server, and also minimizes the average write latency for each replay buffer server (as fewer actors share the same server). This enables Menger to scale efficiently to thousands of actors across multiple Borg cells.

A distributed RL system with sharded replay buffers. Each replay buffer service is a dedicated data storage for a collection of actors, generally located on the same Borg cells. In addition, the sharded replay buffer configuration provides a higher throughput input pipeline to the accelerator cores.
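
A rough sketch of the sharding idea (illustrative shard assignment, not Menger’s code): run several independent Reverb servers and have each actor write to the shard assigned to it, for example by hashing its actor ID, so no single server absorbs all of the write traffic.

```python
# Illustrative sharded replay buffer: several independent Reverb servers, with
# each actor writing to the shard chosen by a simple hash of its ID.
import reverb

NUM_SHARDS = 4
BASE_PORT = 9000

shards = [
    reverb.Server(
        tables=[
            reverb.Table(
                name="experience",
                sampler=reverb.selectors.Uniform(),
                remover=reverb.selectors.Fifo(),
                max_size=250_000,
                rate_limiter=reverb.rate_limiters.MinSize(1),
            )
        ],
        port=BASE_PORT + i,
    )
    for i in range(NUM_SHARDS)
]

def client_for_actor(actor_id: int) -> reverb.Client:
    """Spread write load by assigning each actor to one shard."""
    return reverb.Client(f"localhost:{BASE_PORT + actor_id % NUM_SHARDS}")

# Actor 7 writes to its shard; the learner's input pipeline reads from all shards.
client_for_actor(7).insert([1.0, 2.0, 3.0], priorities={"experience": 1.0})
```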

Case Study: Chip Placement
We studied the benefits of Menger in the complex task of chip placement for a large netlist. Using 512 TPU cores, Menger achieves significant improvements in the training time (up to ~8.6x, reducing the training time from ~8.6 hours down to merely one hour in the fastest configuration) compared to a strong baseline. While Menger was optimized for TPUs, we believe the key factor for this performance gain is the architecture, and we would expect to see similar gains when it is tailored for GPUs.

The improvement in training time using Menger with variable number of TPU cores compared to a baseline in the task of chip placement.

We believe that the Menger infrastructure and its promising results on the intricate task of chip placement demonstrate an innovative path forward for further shortening the chip design cycle, and that it has the potential to enable innovations not only in the chip design process but in other challenging real-world tasks as well.

Acknowledgments
Most of the work was done by Amir Yazdanbakhsh, Junchaeo Chen, and Yu Zheng. We would like to also thank Robert Ormandi, Ebrahim Songhori, Shen Wang, TF-Agents team, Albin Cassirer, Aviral Kumar, James Laudon, John Wilkes, Joe Jiang, Milad Hashemi, Sat Chatterjee, Piotr Stanczyk, Sabela Ramos, Lasse Espeholt, Marcin Michalski, Sam Fishman, Ruoxin Sang, Azalia Mirhosseini, Anna Goldie, and Eric Johnson for their help and support.


¹ A Menger cube is a three-dimensional fractal curve, and the inspiration for the name of this system, given that the proposed infrastructure can virtually scale ad infinitum.