D-index: Distance Index

D-index is an index structure for searching data modelled as a metric space. The structure is static with respect to the number of buckets and levels, so the split functions must be designed before the D-index is instantiated and loaded with data. Once this prerequisite is fulfilled, the D-index can store a practically unlimited number of data objects, because individual buckets are elastic: in theory, each bucket can hold any amount of data.
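To illustrate what designing a split function in advance might involve, here is a minimal sketch of a ball-partitioning split with an exclusion zone, the kind of function commonly described for the D-index. The names `make_bps`, `pivot`, `dm` (median distance) and `rho` (half-width of the exclusion zone) are assumptions made for this example and are not taken from the actual implementation.

```python
from typing import Callable, Optional, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance, used here only as an example metric."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def make_bps(
    pivot: Point,
    dm: float,
    rho: float,
    dist: Callable[[Point, Point], float] = euclidean,
) -> Callable[[Point], Optional[int]]:
    """Build a hypothetical ball-partitioning split function.

    Returns 0 for objects well inside the ball around the pivot,
    1 for objects well outside it, and None for objects that fall
    into the exclusion zone of width 2 * rho around the radius dm.
    """

    def bps(x: Point) -> Optional[int]:
        d = dist(x, pivot)
        if d <= dm - rho:
            return 0
        if d > dm + rho:
            return 1
        return None  # exclusion zone: not assigned to a separable set

    return bps
```

In this sketch the pivot, `dm` and `rho` are fixed when the function is built, which mirrors the requirement that split functions are chosen before any data is loaded.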

eD-index: Extended distance index for similarity joins

eD-index is an index structure based on the original D-index, extended to evaluate similarity self-join queries efficiently. A similarity self-join query returns all pairs of objects in a dataset whose distance is less than or equal to a threshold value mu (μ).
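The definition above can be written down formally as follows. The symbols X (the dataset), d (the metric distance function) and SJ (the query result) are notation assumed for this note, and excluding the trivial pairs of an object with itself is also an assumption.

```latex
\[
  SJ(\mu) = \{\, (x, y) \in X \times X \;:\; x \neq y,\ d(x, y) \le \mu \,\}
\]
```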

eD-index is efficient enough to be used even for a one-time run of a join query, e.g. for data cleansing purposes. The cost of building the index structure is amortized by the efficient evaluation of the join query.

The implementation of D-index includes eD-index as well; which of the two is used is only a matter of initialization parameters. In our papers, we compared eD-index with a naive approach that runs a range query with radius r=mu for each object stored within the D-index. This naive evaluation is also available in the code.
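The naive approach mentioned above can be sketched roughly as follows. This is an illustration only: a linear scan stands in for the D-index range search, and the function names `range_query` and `naive_self_join` are hypothetical, not part of the actual code.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def range_query(
    data: Sequence[T], q: T, r: float, dist: Callable[[T, T], float]
) -> List[int]:
    """Return indices of all objects within distance r of q.

    A linear scan is used here as a stand-in for an index-based search.
    """
    return [i for i, o in enumerate(data) if dist(o, q) <= r]


def naive_self_join(
    data: Sequence[T], mu: float, dist: Callable[[T, T], float]
) -> List[Tuple[int, int]]:
    """Similarity self-join via one range query with r = mu per object."""
    pairs = []
    for i, q in enumerate(data):
        for j in range_query(data, q, mu, dist):
            # Keep each unordered pair once and skip the object itself.
            if i < j:
                pairs.append((i, j))
    return pairs
```

For example, with one-dimensional data `[0.0, 1.0, 1.5, 5.0]`, the absolute difference as distance and `mu = 1.0`, the sketch reports the index pairs `(0, 1)` and `(1, 2)`.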

Starting Points

Contact Information

Please, do not hesitate to contact me.

Vlastislav Dohnal, Faculty of Informatics, Masaryk University, Brno, Czech Republic
dohnal(at)fi(dot)muni(dot)cz (in Czech only) or

Bug Reports / Feature Requests

Bug reports and feature requests are tracked and managed here.

If you run into an issue, create a new ticket by clicking the new ticket toolbar button.

Related Projects

Last modified on Sep 20, 2012, 2:42:50 PM