Profiset Image Collection

Profiset is a collection of 20M high-quality images with rich and systematic annotations, which were obtained from Profimedia, a web-site selling stock images produced by photographers from all over the world. For each image, we have extracted visual descriptors that can be used to search the images by content. Each entry in the dataset consists of the following information:

  • a thumbnail image;
  • a link to the corresponding page on the Profimedia web-site;
  • two types of image annotation: a title (typically 3 to 10 words) and keywords (about 20 keywords per image on average), mostly in English (about 95%);
  • a DeCAF7 descriptor extracted from the original image content using a deep neural network;
  • five MPEG-7 visual descriptors extracted from the original image content: Scalable Color, Color Structure, Color Layout, Edge Histogram and Region Shape.
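
As a rough illustration of how such entries could support searching by content, the following Python sketch models one entry and runs a brute-force nearest-neighbour search over the DeCAF7 descriptors. The class name, field names, toy three-dimensional vectors and the choice of Euclidean distance are all assumptions made for this example. They do not reflect the actual file format of the collection or the distance functions used by the public demo.

```python
from dataclasses import dataclass, field
from typing import Dict, List
import heapq
import math


@dataclass
class ProfisetEntry:
    """Hypothetical in-memory view of one entry; field names are assumptions."""
    thumbnail_path: str          # local path of the thumbnail image
    page_url: str                # link to the image's page on the Profimedia site
    title: str                   # short title annotation
    keywords: List[str]          # keyword annotations
    decaf7: List[float]          # DeCAF7 descriptor (toy-sized in this example)
    mpeg7: Dict[str, List[float]] = field(default_factory=dict)  # MPEG-7 descriptors by name


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two descriptors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(query: List[float], entries: List[ProfisetEntry], k: int = 3) -> List[ProfisetEntry]:
    """Return the k entries whose DeCAF7 descriptor is closest to the query (linear scan)."""
    return heapq.nsmallest(k, entries, key=lambda e: euclidean(query, e.decaf7))


# Toy data with made-up values, only to show the call pattern.
entries = [
    ProfisetEntry("a.jpg", "https://example.org/a", "Red apple", ["apple", "fruit"], [0.9, 0.1, 0.0]),
    ProfisetEntry("b.jpg", "https://example.org/b", "Green field", ["field", "grass"], [0.0, 1.0, 0.0]),
    ProfisetEntry("c.jpg", "https://example.org/c", "Red car", ["car", "vehicle"], [0.8, 0.0, 0.3]),
]

nearest = knn([1.0, 0.0, 0.0], entries, k=2)
print([e.title for e in nearest])
```

A linear scan like this is only practical for small samples. At the scale of the full collection, an index structure or approximate search would be needed.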

To take a look at the collection, you can use the public demo and try our content-based image search. If you download and use the Profiset collection for research purposes, please reference the following paper:

Budikova, P., Batko, M., and Zezula, P. (2011). Evaluation Platform for Content-based Image Retrieval Systems. In Proceedings of International Conference on Theory and Practice of Digital Libraries 2011, LNCS 6966, pages 130-142, Berlin: Springer. ISBN 978-3-642-24468-1.

If you use the DeCAF descriptors for efficiency evaluation, you can reference the following paper:

Novak, D., Batko, M., and Zezula, P. (2015). Large-scale Image Retrieval using Neural Net Descriptors. In Proceedings of SIGIR ’15, pages 1039-1040, ACM New York, NY, USA. ISBN 978-1-4503-3621-5.

Profiset download

The Profiset collection, test topics and ground truth can be downloaded and freely used for research purposes. Interested parties are only required to accept the following usage agreement and fill in some basic information about themselves. This information will be used only for our records and will not be made available to any other party. After registration, the download access instructions will be mailed to the provided email address.

Profiset usage agreement

The Profiset collection has been created by the DISA research group from the Faculty of Informatics, Masaryk University, Brno, Czech Republic (the Holder), with the kind permission of the Profimedia company, the owner of the data. The resources contained in the Profiset collection (dataset, query topics and ground truth) are made accessible to the User by the Holder who reserves the right to permit such access upon acceptance by the User of the terms and conditions below, for the exclusive purpose of conducting scientific experiments (e.g. on new search methods, on automatic classification of contents, etc.), and excluding any other use. The copyright on the images and any other related rights belong to the Authors of the images and, therefore, reproduction, communication to the public, rendering publicly accessible, leasing, lending, public execution and diffusion without the prior authorisation of the Author is forbidden.

The terms and conditions of the Profiset collection use are the following:

  • The User is permitted to use the Profiset data to develop prototypes and demonstrators in the context of its scientific research, and therefore, may allow third parties to use such prototypes and demonstrators, thus viewing part of the Profiset contents. Any instrument or software developed by the User in the context of its scientific experimentation must have the same purpose as specified herein and be subject to the restrictions set forth under this agreement. The User will be personally liable to the Holder for any non-authorised use of the Profiset data. In no case may the Profiset data be used for commercial, advertising and/or promotional purposes.
  • The User undertakes to communicate the provisions of this agreement to any potential external user of the data and will be personally liable for any improper use.
  • As the User is allowed to access a reduced-size version of the photographic image, the so-called thumbnail, the User recognizes that such image is subject to copyright and that any moral or economic right belongs to the Author of the image, which can be visualized in its original size through a surface hyperlink to the corresponding web page.
  • Any application using the Profiset data and any research paper referring to experiments over the Profiset data has to acknowledge the Profimedia company. In case of applications, the Profimedia logo (available here) has to be part of the user interface.
  • The User is granted access to the Profiset collection using a user id and password. In the case of theft or loss of the user id and/or password, the User undertakes to promptly inform the Holder who will block access to the data. The User declares that the personal data made available to the Holder for the activation of the service (including the data relating to its representatives) are true, complete and updated, and undertakes for the entire duration of this agreement to communicate any modification of the same.

User registration form

Company or Institution (required):
Contact person (required):
Email (required):
Purpose (required):
I accept the Profiset usage agreement