Data Integration / Analytic Discovery / Intelligent Systems

We build data systems that turn complex information into actionable insight.

The DANAIS (Data Nexus for Analytics & Intelligent Systems) Lab develops data and AI systems that turn large, complex datasets into reliable insight. Our research spans data management, analytics, and intelligent systems, with a focus on scalable and trustworthy technologies that support data exploration and decision-making. By bridging theory and real-world applications, DANAIS enables data-driven intelligence across scientific, industrial, and societal domains.

6 Active projects
2017 Founded

News

1. One paper has been accepted to MLSys 2026. Jan 30, 2026
2. One paper has been accepted to AAAI 2026. Nov 10, 2025
3. Two papers have been accepted to ADC 2025. Oct 28, 2025
4. Two papers have been accepted to ICDE 2026 (Round 1). Oct 20, 2025

Projects

Selected projects from our research program.

Automated Database Tuning

This project focuses on automatically optimizing database performance, reducing reliance on costly expert intervention. We investigate both physical design tuning (e.g., indexes and materialized views) and configuration tuning, leveraging techniques such as multi-armed bandits and large language models.

[ ICDE'21 , ICDM'21 , VLDB'22 , TKDE'23 , KAIS'23 , ICDM'24 , SIGMOD'26 , MLSys'26 ]
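
To make the bandit framing concrete, here is a minimal sketch of how a multi-armed bandit could choose among candidate index configurations. It uses the textbook UCB1 rule, not the algorithms from the papers listed above. The candidate names, the reward values, and the `measure_reward` callback are hypothetical stand-ins for running a workload and observing a normalized speed-up in [0, 1].

```python
import math


def ucb1_select(counts, totals, t):
    """Pick the arm with the highest UCB1 score at round t (1-based)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # play every arm once before using the bound
    return max(
        range(len(counts)),
        key=lambda a: totals[a] / counts[a]
        + math.sqrt(2 * math.log(t) / counts[a]),
    )


def tune(candidates, measure_reward, rounds):
    """Run UCB1 for `rounds` steps and return how often each arm was chosen."""
    counts = [0] * len(candidates)
    totals = [0.0] * len(candidates)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, totals, t)
        reward = measure_reward(candidates[arm])  # assumed to lie in [0, 1]
        counts[arm] += 1
        totals[arm] += reward
    return counts


# Hypothetical candidates and fixed rewards, for illustration only.
CANDIDATES = ["no_index", "index_on_a", "index_on_a_b"]
FAKE_REWARD = {"no_index": 0.2, "index_on_a": 0.5, "index_on_a_b": 0.9}

pulls = tune(CANDIDATES, FAKE_REWARD.get, rounds=500)
best = CANDIDATES[pulls.index(max(pulls))]
```

In this toy setting the configuration with the highest reward ends up chosen most often, while the others are still sampled occasionally. A real tuner would replace `FAKE_REWARD.get` with measurements taken from the database.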

(Machine) Learned Indexes

In this line of work, we exploit data distributions to construct compact, ML-based indexes for faster and more efficient data retrieval. We study the efficiency and practicality of existing learned index models and propose new algorithms suitable for on-disk deployment, a critical requirement for industrial adoption. This line of work also includes learned spatial indexes and spatial query processing, including semantic search.

[ ADC'20 , SIGMOD'23 , ICDE'24 , VLDB'25 , ADC'25 x 2 , ICDE'26 ]
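
As a rough illustration of exploiting the data distribution for lookups, the sketch below fits a single linear model from key to position in a sorted array and then corrects the prediction with a bounded binary search. It is a toy, in-memory example under simplifying assumptions (numeric keys, non-empty input, no updates) and does not reproduce the on-disk designs from the papers above. The class name `LinearLearnedIndex` is made up for this example.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index: one linear model plus a bounded local search."""

    def __init__(self, keys):
        self.keys = sorted(keys)  # assumes at least one numeric key
        n = len(self.keys)
        # Least-squares fit of position ~ slope * key + intercept.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var_k if var_k else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # The worst prediction error over stored keys bounds the search window.
        self.max_err = max(
            abs(self._predict(k) - i) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        pos = round(self.slope * key + self.intercept)
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return a position holding `key`, or None if the key is absent."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < hi and self.keys[i] == key else None


index = LinearLearnedIndex([3, 8, 15, 16, 23, 42, 99])
position = index.lookup(23)
missing = index.lookup(50)
```

The model replaces the upper levels of a tree with arithmetic, and the recorded error bound keeps the final search window small whenever the keys are close to linearly distributed.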

Semantics-Driven Prefetching

This project investigates how to anticipate and preload relevant data to reduce query latency and improve exploration efficiency. We leverage data and query semantics to predict complex access patterns that arise during exploratory data analysis, particularly in scientific domains.

[ VLDB'24 ]

Cardinality Estimation

This project aims to predict query result sizes in order to improve query planning and execution performance. Our approach leverages copulas from statistical machine learning theory to model complex data correlations more accurately.

[ VLDB'25 , ICDE'26 ]
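
The sketch below does not implement a copula. It only illustrates, on synthetic data, the estimation error that correlation-aware models aim to remove: when two columns are perfectly correlated, multiplying per-column selectivities (the attribute-independence assumption) underestimates the result size. The table contents and function names are assumptions made for this example.

```python
def independence_estimate(rows, pred_a, pred_b):
    """Estimate result size by multiplying per-column selectivities."""
    n = len(rows)
    sel_a = sum(1 for a, _ in rows if pred_a(a)) / n
    sel_b = sum(1 for _, b in rows if pred_b(b)) / n
    return n * sel_a * sel_b


def true_cardinality(rows, pred_a, pred_b):
    """Count the rows that satisfy both predicates."""
    return sum(1 for a, b in rows if pred_a(a) and pred_b(b))


# Synthetic table with two perfectly correlated columns (b == a).
rows = [(i, i) for i in range(1000)]


def below_500(value):
    return value < 500


estimate = independence_estimate(rows, below_500, below_500)
actual = true_cardinality(rows, below_500, below_500)
```

Here the independence-based estimate is 250 rows while 500 rows actually qualify. A model of the joint distribution is meant to close this kind of gap.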

Predicting Next Actions in Data Exploration

This project focuses on learning and recommending relevant next steps during interactive data exploration, guiding users through complex analysis workflows and reducing the cognitive and technical burden of exploratory analysis.

[ CIKM'25 ]

Learning-Based Traffic Optimisation

This line of work develops learning-based methods for traffic modelling, prediction, and optimisation in intelligent transportation systems. We leverage data-driven and machine learning techniques to support traffic management, routing, and decision making under dynamic and uncertain conditions.

[ SIGSPATIAL'20 , ECML/PKDD'20 , SIGSPATIAL'22 x 3 , IV'22 , TIST'23 , PAKDD'24 , SIGSPATIAL'25 ]

People

Current Members

Guanli Liu

Postdoc (2024-)

I am a Postdoctoral Research Fellow at the University of Melbourne, working on AI for databases, including spatial indexing, reinforcement learning-based query optimization, and LLM applications. My recent research explores the use of space-filling curves for efficient spatial data organization and cost modeling. Previously, I was a Senior Software Engineer at Baidu, and I received my M.S. and B.Eng. degrees from Northeastern University in China.

Nadeeshan Dissanayake

PhD Student (2024-)

Co-supervised with S. Karunasekera and E. Tanin.

Intelligent Intersection Management for Improved Traffic Safety. Unpredictable pedestrian behaviour, such as jaywalking, poses a major challenge for safe and efficient traffic signal control at urban intersections. This project aims to fully automate adaptive traffic signal management using deep reinforcement learning with an explicit probabilistic model of pedestrian uncertainty. Our approach moves beyond fixed, vehicle-centric strategies and continuously adapts to evolving pedestrian behaviours that exploit signal timings to minimise their own travel time. By modelling interactions between pedestrians, traffic signals, and connected autonomous vehicles, the proposed framework learns robust control policies that improve both intersection efficiency and safety. This research establishes a new foundation for intelligent, pedestrian-aware traffic management in next-generation smart cities.

Dinuka de Zoysa

PhD Student (2024-)

Co-supervised with J. Bailey.

David Adams

PhD Student (2024-)

Co-supervised with N. Lipovetzky.

Automated Scientific Insight Discovery with Goal-Directed AI. Exploratory Data Analysis (EDA) is essential for uncovering insights in large-scale datasets, yet current systems disrupt analytical workflows by requiring explicit feedback through tuple rankings and example validation. This project introduces implicit recommendation systems that understand analyst intent through natural exploration behavior, observing queries, interactions, and exploration sequences without interrupting cognitive flow. Our approach employs a novel multi-dimensional interestingness contribution scheme combining association rule mining, diversity-based summarization, and temporal behavioral modeling to predict analytical interest. Unlike similarity-based systems, our framework simultaneously captures structural patterns, distributional anomalies, temporal trends, and compositional relationships, adapting in real time as goals evolve. Validation on SDSS astronomical query logs and SIMBA benchmarks demonstrates consistent improvements over single-objective approaches, particularly for complex analytical workloads. This vision addresses fundamental challenges in real-time interest adaptation, interpretability, and cross-domain transfer for database exploration.

Lankadinee Rathuwadu

PhD Student (2023-)

Co-supervised with C. Leckie.

Wentao Gao

PhD Student (2023-)

Co-supervised with T. Pham and M. Fu.

Fuzzing and Human-in-the-loop fuzzing. My research addresses the coverage-plateau problem in grey-box fuzzing by leveraging query-based code databases to introspect fuzzing results. By identifying and bypassing fuzz blockers that impede discovery, I aim to make vulnerability detection more effective. Furthermore, I integrate Large Language Models (LLMs) to automate blocker analysis, reducing manual overhead and advancing the efficiency of human-in-the-loop fuzzing workflows.

Dimuthu Kariyawasan

PhD Student (2022-)

Co-supervised with S. Karunasekera.

Farzaneh Zirak

PhD Student (2022-)

Co-supervised with F. Choudhury.

Semantics-Based Learning Approaches for Efficient and Interactive Data Exploration. My research focuses on making database systems faster and more robust by using data and query semantics (information about what queries touch and why they tend to access certain data together) to improve prefetching and caching decisions. Across my PhD projects, I developed SeLeP to learn semantic signals from the data accessed by queries for more accurate prefetching, and GrASP to generalize semantic prefetching to more realistic mixed read/write workloads where access patterns and data evolve over time. Building on this foundation, I continue to bring the same semantic awareness into cache eviction and to refine semantics-based prefetching and caching techniques that adapt as workloads change. Overall, my work turns semantic signals from queries into actionable storage decisions that reduce I/O stalls and improve end-to-end query latency.

Yiyan Li

Visiting PhD Student

Research on Intelligent Database Performance Tuning. My research focuses on intelligent and automated database performance tuning, aiming to reduce the reliance on manual expertise in configuring complex database systems. I study how machine learning and large language models can be used to understand workloads, system states, and configuration interactions, and to make adaptive tuning decisions in a principled manner. By modeling and automating the iterative reasoning process traditionally performed by human DBAs, my work seeks to improve database performance across diverse workloads while lowering tuning cost and improving robustness. Ultimately, this research aims to build practical, learning-based tuning systems that can generalize across benchmarks and real-world environments.

Alumni

Publications

2026

MLSys Practical Adversarial Multi-Armed Bandits with Sublinear Runtime. K. Overgaard Mortensen, A. B. Bainson, M. R. Tversted, K. S. Graem, R. Borovica-Gajic, A. Paudice, D. Mottin, and P. Karras. (to appear)
AAAI Scalable Semi-supervised Community Search via Graph Transformer on Attributed Heterogeneous Information Networks. L. Ding, Z. Zhao, M. Li, Y. Pan, X. Wang, R. Borovica-Gajic. PDF (to appear)
SIGMOD AgentTune: An Agent-Based Large Language Model Framework for Database Knob Tuning. Y. Li, H. Li, J. Zhang, R. Borovica-Gajic, S. Wang, J. Chen, R. Shi, C. Li, H. Chen. PDF (to appear)
ICDE CoLSE: A Lightweight and Robust Hybrid Learned Model for Single-Table Cardinality Estimation using Joint CDF. L. Rathuwadu, G. Liu, C. Leckie, and R. Borovica-Gajic. PDF (to appear)
ICDE Benchmarking RL-Enhanced Spatial Indices Against Traditional, Advanced, and Learned Counterparts. G. Liu, R. Borovica-Gajic, H. Lan, and Z. Bao. PDF (to appear)

2025

KBS Adaptive anchor-based attention networks for large-scale sparse bipartite graph embedding. L. Ding, Y. Han, M. Li, N. Cui, X. Wang, and R. Borovica-Gajic. Link BibTex
CIKM ExplorAct: Context-Aware Next Action Recommendations for Interactive Data Exploration. D. De Zoysa, J. Bailey and R. Borovica-Gajic. PDF Link
ADC Advancing Spatial Keyword Queries: From Filters to Unified Vector Embeddings. K. Gocmen, G. Liu and R. Borovica-Gajic. PDF (to appear)
ADC LLM-Enhanced Processing of Complex Spatial Queries. R. Hao, G. Liu and R. Borovica-Gajic. PDF (to appear)
SIGSPATIAL TrajNS: Numerical and Semantic Modeling Framework for Realistic and Controllable Trajectory Generation. D. Lakmal, R. Borovica-Gajic and S. Karunasekera. PDF (to appear)
Encyclopedia of GIS Traffic Simulators. N. Dissanayake, R. Borovica-Gajic, E. Tanin and S. Karunasekera. PDF (to appear)
VLDB Cardinality Estimation for Similarity Search on High-Dimensional Data Objects: The Impact of Reference Objects. H. Lan, S. Huang, Z. Bao, and R. Borovica-Gajic. PDF Link BibTex
VLDB Efficient Cost Modeling of Space-filling Curves. G. Liu, L. Kulik, C. Jensen, T. Li, R. Borovica-Gajic, and J. Qi. PDF Link BibTex

2024

ICDM Warm-Starting Contextual Bandits under Latent Reward Scaling. B. Oetomo, M. Perera, R. Borovica-Gajic, and B. I. P. Rubinstein. PDF Link BibTex
PAKDD Spatial-Temporal Bipartite Graph Attention Network for Traffic Forecasting. D. Lakmal, K. Perera, R. Borovica-Gajic, and S. Karunasekera. PDF Slides Link BibTex Code
VLDB SeLeP: Learning Based Semantic Prefetching for Exploratory Database Workloads. F. Zirak, F. Choudhury, and R. Borovica-Gajic. PDF Slides Link BibTex Code
ICDE A Fully On-disk Updatable Learned Index. H. Lan, Z. Bao, S. Culpepper, R. Borovica-Gajic, and Y. Dong. PDF Slides Link BibTex Code
aiDM Seventh International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (aiDM). R. Bordawekar, O. Shmueli, Y. Amsterdamer, R. Borovica-Gajic, and D. Firmani. PDF Link BibTex
KBS Maximal size constraint community search over bipartite graphs. M. Li, R. Borovica-Gajic, F. Choudhury, N. Cui, L. Ding. Link BibTex

2023

SIGMOD Record Reminiscences on Influential Papers. R. Borovica-Gajic. PDF Link BibTex
TKDE No DBA? No regret! Multi-armed bandits for index tuning of analytical and HTAP workloads with provable guarantees. M. Perera, B. Oetomo, B. I. P. Rubinstein, and R. Borovica-Gajic. PDF Link BibTex
TIST Real-time Road Network Optimization with Coordinated Reinforcement Learning. U. Gunarathna, H. Xie, E. Tanin, S. Karunasekera, and R. Borovica-Gajic. PDF Link BibTex
SIGMOD Updatable Learned Indexes Meet Disk-Resident DBMS - From Evaluations to Design Choices. H. Lan, Z. Bao, S. Culpepper, and R. Borovica-Gajic. PDF Slides Link BibTex Code
DBML@ICDE Efficient Index Learning via Model Reuse and Fine-tuning. G. Liu, J. Qi, L. Kulik, K. Soga, R. Borovica-Gajic, and B. I. P. Rubinstein. PDF Slides Link BibTex
KAIS Cutting to the Chase with Warm-Start Contextual Bandits. B. Oetomo, M. Perera, R. Borovica-Gajic and B. I. P. Rubinstein. PDF Link BibTex

2022

VLDB HMAB: Self-Driving Hierarchy of Bandits for Integrated Physical Database Design Tuning. M. Perera, B. Oetomo, B. I. P. Rubinstein, and R. Borovica-Gajic. PDF Slides Link Video BibTex Code
Dagstuhl Reports 12(3) Database Indexing and Query Processing (Dagstuhl Seminar 22111). R. Borovica-Gajic, G. Graefe, A. Lee, C. Sauer, and P. Tozun. PDF Link BibTex
CSUR Energy Efficient Computing Systems: Architectures, Abstractions and Modeling to Techniques and Standards. R. Muralidhar, R. Borovica-Gajic, and R. Buyya. PDF Link BibTex
DEBS A Multi-level Caching Architecture for Stateful Stream Computation. T. Islam, R. Borovica-Gajic, and S. Karunasekera. PDF Slides Link Video BibTex (Best Student Paper)
SIGSPATIAL Dynamic Graph Combinatorial Optimization with Multi-Attention Deep Reinforcement Learning. U. Gunarathna, R. Borovica-Gajic, S. Karunasekera, and E. Tanin. PDF Slides Link Video BibTex
SIGSPATIAL e-SMARTS: A System to Simulate Intelligent Traffic Management Solutions (Demo Paper). U. Gunarathna, R. Borovica-Gajic, S. Karunasekera, and E. Tanin. PDF Slides Link Video BibTex
SIGSPATIAL A Simulation Study on Prioritizing Connected Freight Vehicles at Intersections for Traffic Flow Optimization (Industrial Paper). H. Xie, R. Borovica-Gajic, E. Tanin, S. Karunasekera, U. Gunarathna, G. Oppy, and M. Sarvi. PDF Slides Link Video BibTex
IV Real-Time Intelligent Autonomous Intersection Management Using Reinforcement Learning. U. Gunarathna, S. Karunasekera, R. Borovica-Gajic, and E. Tanin. PDF Slides Link BibTex
IJGIS Can you fixme? An intrinsic classification of contributor-identified spatial data issues using topic models. R. C. Sundaram, E. Naghizade, R. Borovica-Gajic, and M. Tomko. PDF Link BibTex

2021

ICDM Cutting to the Chase with Warm-Start Contextual Bandits. B. Oetomo, M. Perera, R. Borovica-Gajic and B. I. P. Rubinstein. PDF Slides Link Video BibTex
ICDE DBA bandits: Self-driving index tuning under ad-hoc, analytical workloads with safety guarantees. M. Perera, B. Oetomo, B. I. P. Rubinstein, and R. Borovica-Gajic. PDF Slides Link Video BibTex Code

2020

SIGSPATIAL Highly Efficient and Scalable Multi-hop Ride-sharing. Y. Xu, L. Kulik, R. Borovica-Gajic, A. Aldwyish, and J. Qi. PDF Slides Link BibTex
ECML/PKDD Real-time Lane Configuration with Coordinated Reinforcement Learning. U. Gunarathna, H. Xie, E. Tanin, S. Karunasekera, and R. Borovica-Gajic. PDF Slides Link BibTex
SSDBM GeoPrune: Efficiently Matching Trips in Ride-sharing Through Geometric Properties. Y. Xu, J. Qi, R. Borovica-Gajic, and L. Kulik. PDF Slides Link BibTex
ICDE CrashSim: An Efficient Algorithm for Computing SimRank over Static and Temporal Graphs. M. Li, F. M. Choudhury, R. Borovica-Gajic, Z. Wang, J. Xin, and J. Li. PDF Slides Link BibTex
TGIS Harnessing spatio-temporal patterns in data for nominal attribute imputation. R. C. Sundaram, E. Naghizade, R. Borovica-Gajic, and M. Tomko. PDF Link BibTex
ADC Function Interpolation for Learned Index Structures. N. F. Setiawan, B. I. P. Rubinstein, and R. Borovica-Gajic. PDF Slides Link BibTex (Best Student Paper Honorable Mention)

2019

(Book chapter) Database System Concepts, 7th edition, by Silberschatz, Korth and Sudarshan, Chapter 32 on PostgreSQL. R. Borovica-Gajic and I. Alagiannis. PDF BibTex
CACM The five minute rule thirty years later and its impact on the storage hierarchy. R. Appuswamy, R. Borovica-Gajic, G. Graefe, and A. Ailamaki. PDF Link BibTex
arXiv:1902.07500 A Note on Bounding Regret of the C2UCB Contextual Combinatorial Bandit. B. Oetomo, M. Perera, R. Borovica-Gajic, and B. I. P. Rubinstein. PDF Link BibTex

2018

VLDB Journal Smooth Scan: Robust Access Path Selection without Cardinality Estimation. R. Borovica-Gajic, S. Idreos, A. Ailamaki, M. Zukowski and C. Fraser. PDF* Link BibTex
DASFAA Finding All Nearest Neighbors with a Single Graph Traversal. Y. Xu, J. Qi, R. Borovica-Gajic, and L. Kulik. PDF Slides Link BibTex

Contact

School of Computing and Information Systems, Faculty of Engineering and Information Technology, The University of Melbourne.

Email

renata [dot] borovica [at] unimelb.edu.au

Join Us

Open positions and how to apply.

Location

700 Swanston Street, Carlton, Melbourne.