Posted by André Araujo and Tobias Weyand, Software Engineers, Google Research
Image classification technology has shown remarkable improvement over the past few years, exemplified in part by the ImageNet classification challenge, where error rates continue to drop substantially every year. In order to continue advancing the state of the art in computer vision, many researchers are now putting more focus on fine-grained and instance-level recognition problems: instead of recognizing general entities such as buildings, mountains and (of course) cats, many are designing machine learning algorithms capable of identifying the Eiffel Tower, Mount Fuji or Persian cats. However, a significant obstacle for research in this area has been the lack of large annotated datasets.
Today, we are excited to advance instance-level recognition by releasing Google-Landmarks, the largest worldwide dataset for recognition of human-made and natural landmarks. Google-Landmarks is being released as part of the Landmark Recognition and Landmark Retrieval Kaggle challenges, which will be the focus of the CVPR’18 Landmarks workshop. The dataset contains more than 2 million images depicting 30 thousand unique landmarks from across the world (their geographic distribution is presented below), a number of classes that is ~30x larger than what is available in commonly used datasets. Additionally, to spur research in this field, we are open-sourcing Deep Local Features (DELF), an attentive local feature that we believe is especially suited for this kind of task.
Figure: Geographic distribution of landmarks in our dataset.
Landmark recognition presents some noteworthy differences from other problems. For example, even within a large annotated dataset, there might not be much training data available for some of the less popular landmarks. Additionally, since landmarks are generally rigid objects which do not move, the intra-class variation is very small (in other words, a landmark’s appearance does not change that much across different images of it). As a result, variations only arise due to image capture conditions, such as occlusions, different viewpoints, weather and illumination, making this distinct from other image recognition datasets where images of a particular class (such as a dog) can vary much more. These characteristics are also shared with other instance-level recognition problems, such as artwork recognition, so we hope the new dataset can benefit research for other image recognition problems as well.
The two Kaggle challenges provide access to annotated data to help researchers address these problems. The recognition track challenges participants to build models that recognize the correct landmark in a dataset of challenging test images, while the retrieval track challenges participants to retrieve images containing the same landmark as a given query image.
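To make the retrieval task concrete, here is a minimal sketch of ranking index images against a query by cosine similarity. It assumes each image has already been reduced to a single fixed-length descriptor vector; the random vectors, the function names `l2_normalize` and `retrieve`, and the descriptor size are all hypothetical stand-ins. This is not DELF and not the challenges' evaluation code, just one simple way to frame the problem.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit length (eps guards against zero vectors)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def retrieve(query_desc, index_descs, top_k=3):
    """Return the indices and cosine similarities of the top_k index images.

    query_desc:  1-D array, one descriptor for the query image (assumed given).
    index_descs: 2-D array, one descriptor per row for each index image.
    """
    q = l2_normalize(query_desc[None, :])[0]
    db = l2_normalize(index_descs)
    sims = db @ q                      # cosine similarity to every index image
    order = np.argsort(-sims)[:top_k]  # highest similarity first
    return order, sims[order]


# Hypothetical data: 5 index images with 8-dimensional descriptors.
rng = np.random.default_rng(0)
index_descs = rng.normal(size=(5, 8))

# A query that is a slightly perturbed copy of index image 2, standing in for
# a second photo of the same landmark under different capture conditions.
query_desc = index_descs[2] + 0.01 * rng.normal(size=8)

ranked_ids, ranked_sims = retrieve(query_desc, index_descs, top_k=3)
print(ranked_ids, ranked_sims)
```

In practice the descriptors would come from a learned model, and with millions of index images an exhaustive dot product like this would likely be replaced by an approximate nearest-neighbor search.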
If you plan to be at CVPR this year, we hope you’ll attend the CVPR’18 Landmarks workshop. However, everyone is able to participate in the challenges, and access to the new dataset is available via the Kaggle website. We hope this resource is valuable to your research and we can’t wait to see the ideas you will come up with for recognizing landmarks!
Acknowledgments
Jack Sim, Will Cukierski, Maggie Demkin, Hartwig Adam, Bohyung Han, Shih-Fu Chang, Ondrej Chum, Torsten Sattler, Giorgos Tolias, Xu Zhang, Fernando Brucher, Marco Andreetto, Gursheesh Kour.