The Image Similarity Search Demo combines deep convolutional neural networks with our latest metric index to provide a unique user experience on a very large image collection.
Image collection and preprocessing
The demo is built on 20 million high-quality images from the Profiset collection.
Each image has been preprocessed by a deep convolutional neural network provided by the Caffe framework. The network model was trained on different data, and we do not train it any further; instead, we use the output of the last hidden layer of the network as a powerful image descriptor. Each descriptor is a 4096-dimensional float vector; altogether, the descriptors amount to over 320GB of uncompressed data. See the following publication for details on this approach:
- J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition. In Proceedings of the International Conference on Machine Learning, pages 647–655, 2014.
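The storage figure quoted above can be checked with simple arithmetic. The following is a minimal sketch, assuming each vector component is stored as a 4-byte (32-bit) float, which the description does not state explicitly; the function and constant names are hypothetical.

```python
# Back-of-the-envelope check of the uncompressed descriptor volume.
NUM_IMAGES = 20_000_000   # images in the collection (from the text)
DIMENSIONS = 4096         # components per descriptor (from the text)
BYTES_PER_FLOAT = 4       # ASSUMPTION: 32-bit floats


def descriptor_storage_bytes(num_images: int, dimensions: int, bytes_per_value: int) -> int:
    """Return the raw size in bytes of all descriptors stored as dense vectors."""
    return num_images * dimensions * bytes_per_value


total_bytes = descriptor_storage_bytes(NUM_IMAGES, DIMENSIONS, BYTES_PER_FLOAT)
print(f"{total_bytes / 10**9:.2f} GB")  # decimal gigabytes
```

Under this assumption the total comes to roughly 328 GB (decimal), which is consistent with "over 320GB".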
Indexing and searching
The neural network descriptors are organized by the PPP-Codes similarity index, which stores the descriptor data in a disk key-value store (in our case, it's about 124GB of compressed descriptor data). Given a query object (image), the memory part of the index is exploited to generate a candidate set of image IDs; this relatively small candidate set of descriptors is then loaded from the disk and refined to obtain the final result of the similarity query.
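The two-stage query flow described above (an in-memory step that proposes candidate IDs, followed by refinement on full descriptors fetched from the store) can be illustrated with a toy example. This sketch is not the PPP-Codes algorithm: as a stand-in for the memory part it uses a plain pivot-distance filter based on the triangle inequality, and a Python dict plays the role of the disk key-value store. The class and method names are hypothetical.

```python
import heapq
import math
import random


def euclidean(a, b):
    """Exact Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class FilterRefineIndex:
    """Toy filter-and-refine index (illustrative only, NOT PPP-Codes)."""

    def __init__(self, descriptors, num_pivots=4, seed=0):
        # ASSUMPTION: a dict stands in for the disk key-value store.
        self.store = dict(descriptors)
        rng = random.Random(seed)
        pivot_ids = rng.sample(sorted(self.store), num_pivots)
        self.pivots = [self.store[i] for i in pivot_ids]
        # "Memory part": only the distances to a few pivots per image.
        self.sketches = {
            image_id: [euclidean(vec, p) for p in self.pivots]
            for image_id, vec in self.store.items()
        }

    def candidates(self, query, size):
        """Rank image IDs by a lower bound on their distance to the query."""
        q = [euclidean(query, p) for p in self.pivots]

        def lower_bound(image_id):
            # Triangle inequality: |d(q,p) - d(o,p)| <= d(q,o) for every pivot p.
            return max(abs(a - b) for a, b in zip(q, self.sketches[image_id]))

        return heapq.nsmallest(size, self.sketches, key=lower_bound)

    def search(self, query, k, candidate_size):
        """Generate candidates in memory, then refine with exact distances."""
        ids = self.candidates(query, candidate_size)
        # Refinement: fetch the full descriptors of the candidates only.
        scored = [(euclidean(query, self.store[i]), i) for i in ids]
        return heapq.nsmallest(k, scored)


# Usage on small random data (real descriptors would be 4096-dimensional).
rng = random.Random(1)
data = {i: [rng.random() for _ in range(16)] for i in range(200)}
index = FilterRefineIndex(data)
top5 = index.search(data[42], k=5, candidate_size=20)
```

The candidate set size trades speed for accuracy: a smaller set means fewer descriptors to fetch and compare, while a set covering the whole collection reduces to an exact brute-force search.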
Related publications
- Novak, D., Batko, M., & Zezula, P. (2015). Large-scale Image Retrieval using Neural Net Descriptors. In Proceedings of SIGIR ’15. (description of the demonstration)
- Novak, D., & Zezula, P. (2014). Rank Aggregation of Candidate Sets for Efficient Similarity Search. In Database and Expert Systems Applications: 25th International Conference, DEXA 2014. LNCS Vol. 8645, pp. 42-58. Springer. (Best paper of DEXA 2014, description of the index)
- Novak, D., Cech, J., & Zezula, P. (2015). Efficient Image Search with Neural Net Features. In Proceedings of SISAP ’15. (performance analysis of the index on this data collection)