The broad scope of DISA is to explore theories and technologies for next-generation similarity search. Such search should deliver relevant information effectively and efficiently to interested parties despite the exponentially growing volume, variety, and velocity of digital data. The DISA objectives are foundational in nature: they address the theoretical limits of similarity search in the context of the Big Data Problem, with multimedia data as the primary proof-of-concept platform. The laboratory not only brings together researchers working on active projects but also cooperates closely with students and disseminates its results into practice. The topics and objectives of the DISA lab can be summarized as follows:
  • databases and data processing
  • similarity indexing and searching
  • content-based multimedia searching
  • the Big Data phenomenon
  • processing of biometric data (face recognition, movement characteristics)

Big Data Problem

The complexity of next-generation retrieval systems stems from the need to organize massive and ever-growing volumes of heterogeneous data and metadata, together with the need to provide distributed management based predominantly on similarity matching. The problem begins with the acquisition of weakly structured or completely unstructured data, such as images and video, which require innovative techniques for information extraction and classification to increase their findability. We consider search and object findability to be two principal, synergistic aspects of retrieval. Both pose effectiveness and efficiency challenges that call for innovative theories and technologies, and they must be studied together to converge on qualitatively new retrieval tools. Fundamental to our approach is the development of scalable solutions.

Multimedia Retrieval

In general, multimedia retrieval is a research discipline of computer science that aims to extract semantic information from multimedia data sources. These sources include directly perceivable media such as audio, images, and video; indirectly perceivable sources such as text and bio-signals; and non-perceivable sources such as bio-information and stock prices. We concentrate mainly on applying similarity search to collections of information extracted from multimodal data, specializing in images and video as well as face and motion biometrics. We are also interested in cross-media search and image annotation.