Open-Vocabulary Object Retrieval

Published at Robotics Science and Systems (RSS 2014).

Presented also at AAAI 2015.

Open-Vocabulary Object Retrieval.

Sergio Guadarrama, Erik Rodner, Kate Saenko, Ning Zhang,
Ryan Farrell, Jeff Donahue and Trevor Darrell.

We address the problem of retrieving objects based on open-vocabulary natural language queries: given a phrase describing a specific object, e.g., "the corn flakes box", the task is to find the best match in a set of images containing candidate objects. When naming objects, humans tend to use natural language with rich semantics, including basic-level categories, fine-grained categories, and instance-level concepts such as brand names. Existing approaches to large-scale object recognition fail in this scenario, as they expect queries that map directly to a fixed set of pre-trained visual categories, e.g., ImageNet synset tags. We address this limitation by introducing a novel object retrieval method. We also propose a method for handling open vocabularies, i.e., words not contained in the training data. Our method can combine category- and instance-level semantics in a common representation. Our approach can accurately retrieve objects based on extremely varied open-vocabulary queries.
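To make the task setup concrete, here is a minimal Python sketch of retrieval as "pick the candidate with the highest query score". It is not the method from the paper: the `Candidate` class, the per-word score dictionaries, and the weighted average controlled by `weight` are all assumptions made for illustration. The only idea taken from the abstract is that category-level and instance-level evidence are combined into one score per candidate object.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A candidate object with hypothetical per-word scores."""
    name: str
    # word -> score from a category-level recognizer (assumed given)
    category_scores: dict = field(default_factory=dict)
    # word -> score from an instance-level matcher (assumed given)
    instance_scores: dict = field(default_factory=dict)


def score(query, candidate, weight=0.5):
    """Average a weighted mix of category and instance scores over query words.

    Words unknown to both sources contribute 0, which is a crude stand-in
    for open-vocabulary handling, not the approach described in the paper.
    """
    words = query.lower().split()
    if not words:
        return 0.0
    total = 0.0
    for word in words:
        cat = candidate.category_scores.get(word, 0.0)
        inst = candidate.instance_scores.get(word, 0.0)
        total += weight * cat + (1.0 - weight) * inst
    return total / len(words)


def retrieve(query, candidates, weight=0.5):
    """Return the candidate that best matches the query."""
    return max(candidates, key=lambda c: score(query, c, weight))


candidates = [
    Candidate("cereal", {"box": 0.8}, {"corn": 0.9, "flakes": 0.9}),
    Candidate("shoes", {"box": 0.9}, {}),
]
best = retrieve("the corn flakes box", candidates)
print(best.name)
```

In this toy example both candidates are boxes, so the category-level evidence alone cannot separate them; the instance-level words "corn" and "flakes" decide the match.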

We are currently preparing an extended version of the paper and code will be released afterwards.



Presentation slides

A very brief talk about the project was given by Erik at AAAI 2015. The slides can be found here.

Pre-trained models

Pre-trained ImageNet models can be found on the Caffe webpage.

Related open-source projects

  • Caffe - for category recognition with deep convolutional networks
  • GISS - Google image-by-image search (scraper)
  • Google Freebase - for query expansion
  • For the experiments in the paper, we also used the IQ Engines API for instance-level matching; IQ Engines now belongs to Yahoo


If you use the software provided on this webpage, please cite the following paper:
@inproceedings{guadarrama2014openvocabulary,
  author = {Sergio Guadarrama and Erik Rodner and Kate Saenko and Ning Zhang and Ryan Farrell and Jeff Donahue and Trevor Darrell},
  booktitle = {Robotics Science and Systems (RSS)},
  title = {Open-vocabulary Object Retrieval},
  year = {2014}
}