Reproducible Experiments on Learned Metric Index

This page provides the software and data needed to reproduce the experiments conducted in the paper: Antol, Matej, Oľha, Jaroslav, Slanináková, Terézia and Dohnal, Vlastislav. Learned metric index – proposition of learned indexing for unstructured data. Information Systems. Elsevier, 2021, vol. 100, no. 1, pp. 101774-101786. ISSN 0306-4379. doi:10.1016/j.is.2021.101774.

Software

The Git repository with the software, including the necessary scripts, is available here. You may also download it as a ZIP archive here (commit SHA: bc9c1cd81ecaab9c25130bbe5504b858e44183b9).

Datasets

The gzipped tar file (9 GB) contains subsets of the CoPhIR, Profimedia, and HDM? datasets. Each subset contains either 100k or 1 million objects.
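All archives on this page are gzipped tar files, so they can be unpacked the same way. The following is a minimal sketch using only the Python standard library; the function name extract_archive and any file or directory names passed to it are hypothetical and not taken from the repository.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Unpack a gzipped tar archive into target_dir and return its member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as archive:
        names = archive.getnames()
        # Use the safer "data" extraction filter where this Python version supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(target, **kwargs)
    return names
```

For example, calling extract_archive("datasets.tar.gz", "data") would unpack a downloaded archive saved under the assumed name datasets.tar.gz into a local data directory. Note that the dataset archive is 9 GB compressed, so enough free disk space is needed for the unpacked files.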

Queries

The gzipped tar file (14 MB) contains sets of queries for each of the datasets. The queries were selected …

Ground Truths

The gzipped tar file (668 MB) contains answers to kNN queries (ground truth) for each of the query objects (see the previous section).

The format is JSON: each query object ID is associated with a JSON object whose entries pair an answer object ID with its distance from the query object.
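The ground-truth format described above can be read with a few lines of Python. This is a sketch under the assumption that each file has the shape {"query_id": {"answer_id": distance, ...}, ...}; the exact file names and layout inside the archive are not specified here, and the helper names load_ground_truth, nearest_neighbors, and recall are hypothetical.

```python
import json


def load_ground_truth(path):
    """Load a ground-truth file, assumed to map query ID -> {answer ID: distance}."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def nearest_neighbors(ground_truth, query_id, k):
    """Return the k closest (answer ID, distance) pairs for one query, nearest first."""
    answers = ground_truth[str(query_id)]
    ranked = sorted(answers.items(), key=lambda pair: pair[1])
    return ranked[:k]


def recall(ground_truth, query_id, candidate_ids, k):
    """Fraction of the true k nearest neighbors that appear among candidate_ids."""
    true_ids = {obj_id for obj_id, _ in nearest_neighbors(ground_truth, query_id, k)}
    if not true_ids:
        return 0.0
    found = true_ids & {str(candidate) for candidate in candidate_ids}
    return len(found) / len(true_ids)
```

JSON object keys are always strings, so the helpers convert IDs to strings before comparing them. The recall function shows one common way to compare an index's answer against the ground truth; it is not necessarily the evaluation procedure used by the scripts in the repository.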

Miscellaneous

Labels gzipped tar file (699 MB) contains…

Pivots gzipped tar file (2 MB) contains…

Test gzipped tar file (40 KB) contains…