100M CoPhIR Visual Search

The MUFIN Image Search is a web system for finding visually similar images in a collection of 100 million images. A query image can be chosen at random from the collection, found through a text search, or supplied as an external image (a Firefox plugin is available for this). The system then returns the most similar images from the collection in real time, ranked by their similarity distance.

Image dataset and similarity

The image database is a collection of 100 million images from the CoPhIR dataset. The original images come from Flickr.

The similarity of images is expressed by a weighted combination of five MPEG-7 global image content descriptors:

  • ColorStructure (weight 3)
  • ColorLayout (weight 2)
  • ScalableColor (weight 2)
  • EdgeHistogram (weight 4)
  • HomogeneousTexture (weight 0.5)

The global descriptors capture the overall characteristics of the images. With this approach, the similarity measure compares images as a whole; it does not locate individual objects from the query image in other images.
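The weighted combination of descriptors and the ranking by similarity distance can be illustrated with a short Python sketch. It rests on several assumptions that this page does not confirm: the descriptors are combined as a weighted sum, every descriptor is a plain numeric vector, and each pair of descriptors is compared with an L1 distance. The real MPEG-7 descriptors may use different per-descriptor distance functions and normalization. The names `combined_distance`, `rank_by_similarity`, and `l1_distance` are hypothetical, and the linear scan stands in for the actual index only to show the ranking.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Descriptor = Sequence[float]
Image = Dict[str, Descriptor]  # descriptor name -> feature vector

# Descriptor weights as listed on this page.
WEIGHTS: Dict[str, float] = {
    "ColorStructure": 3.0,
    "ColorLayout": 2.0,
    "ScalableColor": 2.0,
    "EdgeHistogram": 4.0,
    "HomogeneousTexture": 0.5,
}


def l1_distance(a: Descriptor, b: Descriptor) -> float:
    """Placeholder per-descriptor distance (an assumption, not the MPEG-7 definition)."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def combined_distance(
    a: Image,
    b: Image,
    weights: Dict[str, float] = WEIGHTS,
    dist: Callable[[Descriptor, Descriptor], float] = l1_distance,
) -> float:
    """Weighted sum of per-descriptor distances (assumed form of the combination)."""
    return sum(w * dist(a[name], b[name]) for name, w in weights.items())


def rank_by_similarity(
    query: Image, collection: Dict[str, Image], k: int = 10
) -> List[Tuple[str, float]]:
    """Return the k images closest to the query, smallest distance first.

    This is a brute-force scan for illustration; it is not how the
    indexing engine evaluates queries.
    """
    scored = [
        (combined_distance(query, image), image_id)
        for image_id, image in collection.items()
    ]
    scored.sort()
    return [(image_id, distance) for distance, image_id in scored[:k]]
```

Because every descriptor covers the whole image, two pictures score as similar when their overall color, edge, and texture statistics agree, whether or not they show the same objects.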

Indexing and searching

The underlying indexing engine uses the distributed M-Index technique. The index was prepared on hardware infrastructure provided by MetaCentrum, and the online demo runs on IBM machines provided through an IBM Shared University Research Award. The system currently uses 16 CPU cores, 32 GB of RAM, and about 1 TB of disk space.

Download Firefox plugin

Installing the plugin adds a small icon to the Firefox browser that appears when the mouse pointer hovers over any web image. Clicking the icon runs the MUFIN image search with that image as the query and opens a new window with a page of the most similar images from the collection.

MUFIN Find Similar Plugin, version 1.1

Related publications

  • D. Novák, M. Batko, P. Zezula: Generic similarity search engine demonstrated by an image retrieval application. In Proceedings of SIGIR ’09 (p. 840). ACM Press, 2009. (description of the demo)
  • M. Batko, F. Falchi, C. Lucchese, D. Novák, R. Perego, F. Rabitti, J. Sedmidubský, P. Zezula: Building a Web-scale Image Similarity Search System. Multimedia Tools and Applications vol. 47(3), 2010. (origin of the CoPhIR data collection)
  • M. Batko, P. Kohoutková, D. Novák: CoPhIR Image Collection under the Microscope. In Proceedings of SISAP 2009 (pp. 47–54). IEEE Computer Society, 2009. (analysis of the CoPhIR data collection)
  • D. Novák, M. Batko, P. Zezula: Metric index: an efficient and scalable solution for precise and approximate similarity search. Information Systems vol. 36(4), 2011. (the similarity search index)