Intelligent Systems for Complex Data Research Group
We find patterns in data and mine information from complexity
Research & Research Areas
We are a team of researchers at Masaryk University in Brno, Czech Republic, specializing in complex data analysis. As part of the DISA laboratory and in close collaboration with the CERIT-SC centre, we aim to discover patterns within vast amounts of complex data.
We tackle unique challenges ranging from exploring patterns in images to the intricate analysis of complex biological structures such as proteins. Our ambition is to redefine the boundaries of effective and efficient processing of large datasets by leveraging our proficiency in machine learning, data mining, and clustering techniques.
We are always looking for new members, including postdocs and students writing bachelor's and master's theses, and more!
We've introduced a Learned Metric Index – an index for complex, unstructured or high-dimensional data built as a structure of machine-learning models.
We develop AlphaFind, our own application for searching proteins by their structural similarity.
Adhering to Open Science principles, we publish our work without restrictions, and the Learned Metric Index Framework is available on GitHub.
Best Student Paper at SISAP
Miriama Jánošová and David Procházka received a "Best Student Paper" award at the Similarity Search and Applications 2021 conference for their work on Organizing Similarity Spaces using Metric Hulls.
Interview with Terézia
Terézia gave an interview to the EOSC CZ initiative about her experience during her Ph.D. studies, Open Science and Open Source, and, most importantly, research reproducibility in light of our LMI reproducibility publication.
Pre-print of AlphaFind
We have published a bioRxiv pre-print of AlphaFind, a search engine for protein structure similarity that indexes the whole AlphaFold DB (214M proteins). We are currently awaiting the second round of reviews in the journal Nucleic Acids Research.
Our team actively cooperates with other research teams, both in Czechia and abroad. Our most prominent partners are:
Information Systems and Data Mining Research Group, Kiel University (CAU), Germany. Together with Prof. Dr. Peer Kröger and his group, we investigate modern techniques for indexing, searching, and (reverse) kNN data retrieval.
Biological Data Management and Analysis Core Facility of the Central European Institute of Technology. Together, we work on applying learned indexing to searching protein structures produced by AlphaFold.
2023
- Reproducible experiments with Learned Metric Index Framework
  Information Systems, year: 2023, volume: 118, edition: 1
- SISAP 2023 Indexing Challenge – Learned Metric Index
  Similarity Search and Applications (SISAP 2023), Lecture Notes in Computer Science, vol. 14289, year: 2023
2022
- Learned Indexing in Proteins: Extended Work on Substituting Complex Distance Calculations with Embedding and Clustering Techniques
  year: 2022
- Learned Indexing in Proteins: Substituting Complex Distance Calculations with Embedding and Clustering Techniques
  Similarity Search and Applications, 15th International Conference, SISAP 2022, Bologna, Italy, October 5–7, 2022, Proceedings, year: 2022
2021
- Data-driven Learned Metric Index: an Unsupervised Approach
  14th International Conference on Similarity Search and Applications (SISAP 2021), year: 2021
- Learned metric index – proposition of learned indexing for unstructured data
  Information Systems, year: 2021, volume: 100, edition: 101774
- Metric hull as similarity-aware operator for representing unstructured data
  Pattern Recognition Letters, year: 2021, volume: 149, edition: September 2021
- Organizing Similarity Spaces using Metric Hulls
  14th International Conference on Similarity Search and Applications (SISAP 2021), year: 2021
doc. RNDr. Vlastislav Dohnal, Ph.D.
Group co-lead
Vlasta is an associate professor at the Faculty of Informatics, Masaryk University, and one of the founding members of the Laboratory of Data Intensive Systems and Applications (DISA). He has a long research record in managing unstructured data for content-based retrieval and in similarity analytics. He is the co-author of about 40 research publications and of a seminal book on similarity searching using metric spaces.
RNDr. Matej Antol, Ph.D.
Group co-lead
Matej's research spans from core topics, such as optimizing index structures and query processing in metric similarity search, to applied research on processing and managing research data, handling sensitive data, and working with life-science data and tools, such as genomic or protein data. As executive director of the national large research infrastructure CERIT-SC, he is responsible for integrating the national e-infrastructure e-INFRA CZ. He is also one of the national leaders of Czech efforts towards adopting Open Science principles and implementing EOSC in the Czech Republic.
Outside of science, he is interested in finance and investing, sings and plays sax and guitar in a small garage band, and, in winter, enjoys playing squash and skiing.
RNDr. Terézia Slanináková | Ph.D. Student
Terka's Ph.D. research explores how best to apply machine learning, and learned indexing in particular, to complex data for the purpose of fast search.
She is also a researcher in Tom Rebok's research group at CERIT-SC, where she uses machine learning to solve real-world problems and co-leads a project focused on creating a national platform for analyzing geospatial data. She is passionate about reproducible research, open-source software, and supervising bachelor's and master's students.
When she's not researching, she likes spending time outdoors doing virtually any sport, meeting friends, or putting together a good dish.
Mgr. et Mgr. Jaroslav Oľha | Ph.D. Student
Jaroslav holds two master's degrees, in computer science and in biochemistry, and he has published research in high-performance computing, similarity searching, and computational chemistry. He is currently finishing his dissertation on HPC kernel autotuning, but his research focus is shifting towards the organization and analysis of complex data, particularly data generated in the life sciences.
He likes to spend his spare time with his mischievous toddler, as well as playing the guitar and the bass in a few bands, painting miniatures, playing board games and running the occasional marathon.
Mgr. David Procházka | Ph.D. Student
David focuses on producing high-quality research at the intersection of data indexing, similarity search, machine learning, and motion processing while integrating good software engineering practices. He began his academic journey with an award-winning undergraduate thesis on indexing metric spaces using metric hulls and has since made several contributions to the field of human motion classification. A long and successful cooperation with assoc. prof. Vlastislav Dohnal culminated in David's enrollment in a Ph.D. program under his supervision.
Fueled by collaboration with bright minds, David strives to elevate those around him and develop scalable solutions for searching complex data. When not pushing the boundaries of current knowledge, he enjoys good movies and has a deep appreciation for simple yet elegant design.
Mgr. Miriama Jánošová | Ph.D. Student
Miriama's Ph.D. research focuses on analytics for unstructured data, especially on finding convenient representations of metric regions and exploring their applications. She is also involved in motion capture (mocap) data research for rehabilitation therapy. In recent years, she has been responsible for designing exercises for the students' Data Warehousing project.
Apart from her studies, she works as a software engineer at Ataccama, where she develops parsers for SQL-like languages and extracts data lineage from various BI reporting tools.
In her free time, she trains her dog Brutus, enjoys Pilates, and spends time with her close friends. She is also enthusiastic about preparing exotic dishes.
A young IT professional with broad experience in the technical aspects of research infrastructures: data storage, processing, AAI, cloud computing, and more. He is also a Ph.D. student focused on the management of scientific data.
Katarína is a Ph.D. student in genomics and deep learning with a background in computer science and bioinformatics, specializing in using machine learning to model small RNA binding rules.
At the start of his career in computer science, Jakub draws on his extroverted nature to uncover nuanced stories hidden in data and turn them into tangible, practical solutions.
Bc. Lucie Novotná
Lucie is finishing her master's degree in Artificial Intelligence and Data Processing, with her thesis focused on similarity searching of protein structures. She works as a software engineer at Red Hat.
Bc. Jakub Žovák
Jakub is a data scientist specializing in vector databases and is writing a thesis on their role in similarity searching and related applications.
Interested in Our Research? Join Us!
We are looking for new colleagues for various positions to work with us on exciting projects, develop unique software, and solve unconventional problems. If you are interested, contact us at dohnal(at)fi.muni.cz or antol(at)muni.cz.