AvantGraph: the Next-Generation Graph Analytics Engine


SCILAKE TECHNICAL COMPONENTS

By Stefania Amodeo

In a webinar for SciLake partners, Nick Yakovets, Assistant Professor at the Department of Mathematics and Computer Science, Information Systems WSK&I at Eindhoven University of Technology (TU/e), introduced AvantGraph, a next-generation knowledge graph analytics engine. Yuanjin Wu and Daan de Graaf, graduate students in Nick's research group, presented a demo of the tool.

Developed by TU/e researchers, AvantGraph aims to provide a unified execution platform for graph queries, supporting everything from simple questions to complex algorithms. In this blog post, we will delve into the philosophy behind AvantGraph, its query processing pipeline, and its impact on graph analytics.

 

The Philosophy: Questions over Graphs

The fundamental purpose of a database is to answer questions about data. For a graph database like AvantGraph, the focus is on asking questions over graphs. We can categorize these questions based on their expressiveness and the degree to which databases can optimize their execution. Expressiveness refers to the richness and difficulty of the questions being asked, while optimization refers to how easy or difficult it is for databases to answer these questions. Based on this categorization, the range of questions that can be asked over graphs varies in complexity, as shown in the graphic below, from simple local look-ups to general algorithms that introduce iterations:

  • Local look-ups (e.g., the properties of data associated with a full text)
  • Neighborhood look-ups
  • Subgraph isomorphism (matching specific patterns of the graph)
  • Recursive path queries (introducing connectivity)
  • General algorithms (introducing iterations, e.g., PageRank)

Optimization level as a function of questions’ complexity.

AvantGraph aims to cover this full spectrum of questions, allowing users to optimize the execution of their queries and explore the richness of their data. It utilizes cutting-edge technologies to enable efficient processing of very large graphs on personal laptops.
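To make the spectrum of questions above more concrete, here is a minimal Python sketch of the first four categories on a toy in-memory graph. The graph, its property names, and the function names are hypothetical and chosen only for illustration; this is not how AvantGraph stores or queries data.

```python
from collections import deque

# Hypothetical toy graph: node -> properties, and node -> outgoing neighbors.
PROPS = {
    "a": {"title": "Paper A"},
    "b": {"title": "Paper B"},
    "c": {"title": "Paper C"},
    "d": {"title": "Paper D"},
}
EDGES = {"a": ["b"], "b": ["c"], "c": [], "d": ["a"]}


def local_lookup(node):
    """Local look-up: read the properties stored on one node."""
    return PROPS[node]


def neighbors(node):
    """Neighborhood look-up: follow one hop of outgoing edges."""
    return list(EDGES[node])


def two_hop_paths():
    """Pattern matching: find every (x)->(y)->(z) path in the graph."""
    return [(x, y, z) for x in EDGES for y in EDGES[x] for z in EDGES[y]]


def reachable(start):
    """Recursive path query: every node reachable over one or more edges."""
    seen, queue = set(), deque(EDGES[start])
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(EDGES[node])
    return seen
```

Each step up the list needs more machinery: a dictionary access, one hop, a fixed-size nested loop, and finally a traversal whose length is not known in advance. General algorithms such as PageRank go one step further by iterating over the whole graph until a result stabilizes.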

AvantGraph Query Processing Pipeline

AvantGraph query processing pipeline, adapted from DOI:10.14778/3554821.3554878

AvantGraph employs a standard database pipeline. It supports query languages like Cypher and SPARQL, and it features three additional main components to enable the execution of complex questions like algorithms:

  • the QuickSilver execution engine, a multi-threaded execution system allowing for efficient query parallelization and hardware utilization;
  • the Magellan Planner, a query optimizer that returns efficient execution plans tailored to each query, taking into account the recursive and iterative nature of graph queries;
  • the BallPark cardinality estimator, a cost model that determines the best execution plan for different circumstances, optimizing query performance.

In addition, AvantGraph supports secondary storage, utilizing both memory and disk effectively. This allows it to process very large graphs on laptops without requiring excessive amounts of RAM.
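The interplay between a planner and a cardinality estimator can be illustrated with a small, generic sketch of cost-based plan selection. It assumes a star-shaped pattern (every edge shares one central node, so any join order is valid), made-up statistics, and a simple uniformity assumption. It is not a description of how Magellan or BallPark work internally.

```python
from itertools import permutations

# Hypothetical statistics: number of edges per label and number of nodes.
EDGE_COUNTS = {"cites": 1000, "authored_by": 300, "published_in": 50}
NUM_NODES = 200


def estimate_join(left_card, label):
    """Estimate the result size after joining one more edge label,
    assuming edge endpoints are spread uniformly over all nodes."""
    return left_card * EDGE_COUNTS[label] / NUM_NODES


def plan_cost(order):
    """Cost of a left-deep plan: the sum of estimated intermediate sizes."""
    card = EDGE_COUNTS[order[0]]
    cost = card
    for label in order[1:]:
        card = estimate_join(card, label)
        cost += card
    return cost


def best_plan(labels):
    """Enumerate all join orders and keep the cheapest one."""
    return min(permutations(labels), key=plan_cost)
```

Under these assumptions, `best_plan(["cites", "authored_by", "published_in"])` starts with the most selective label, because doing so keeps the intermediate results small. The quality of the chosen plan depends directly on how accurate the estimates are.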

Preparations for SciLake Pilots

As part of the SciLake project, AvantGraph is being extended with powerful data analytics capabilities and novel technologies to support research communities in defining graph algorithms.

Why do we need it?

Graph query languages such as Cypher or SPARQL are specifically designed for "subgraph matching". This makes them highly effective when you need to retrieve information such as "get me the neighbors of a specific node" or "find the shortest path between two nodes in the graph". However, these query languages are too limited for complex graph analytics such as PageRank.

Traditional solutions to this issue involve the database vendor providing a library of built-in algorithms that can be applied to the graph. While this works well if the library includes the algorithm needed to solve the problem, it cannot accommodate simple variations or fully custom algorithms.

What AvantGraph offers

AvantGraph introduces Graphalg, a programming language designed specifically for writing graph algorithms. Graphalg is fully integrated into AvantGraph, meaning, for example, that it can be embedded into Cypher queries.

The language used in Graphalg is based on linear algebra, which makes the syntax and operations easy to learn. The goal for Graphalg is to be a high-level language that is both user-friendly and efficiently executed by a database. This is achieved by transforming queries and Graphalg programs into a unified representation that can be optimized effectively. This enables optimizations that cross the boundary between query and algorithm, which would not otherwise be possible.
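To show what "based on linear algebra" means for a graph algorithm, the sketch below writes PageRank as repeated matrix-vector products in plain Python. This is not Graphalg syntax, which is not shown in this post. The function name, the damping factor of 0.85, and the fixed iteration count are assumptions made for illustration.

```python
def pagerank(adjacency, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dense 0/1 adjacency matrix.

    adjacency[i][j] == 1 means there is an edge from node i to node j.
    """
    n = len(adjacency)
    out_degree = [sum(row) for row in adjacency]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread over all nodes.
        dangling = sum(rank[i] for i in range(n) if out_degree[i] == 0)
        new_rank = []
        for j in range(n):
            # One entry of the matrix-vector product: rank flowing into node j.
            incoming = sum(
                rank[i] * adjacency[i][j] / out_degree[i]
                for i in range(n)
                if out_degree[i] > 0
            )
            new_rank.append((1 - damping) / n + damping * (incoming + dangling / n))
        rank = new_rank
    return rank


# Example graph: 0 -> 2, 1 -> 2, 2 -> 0
ranks = pagerank([[0, 0, 1], [0, 0, 1], [1, 0, 0]])
```

The whole algorithm is a loop around one matrix-vector product, which is exactly the kind of iteration that pattern-matching query languages cannot express on their own.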

AvantGraph supports the client-server model, which is commonly used by most modern database engines, including Postgres, MySQL, Neo4j, Amazon Neptune, Memgraph, and more. This allows AvantGraph databases to be queried through more than just a Command Line Interface.

As of now, AvantGraph databases can be queried from most major programming languages, including Python via a dedicated API. AvantGraph will be expanded in the future with more algorithms and functionalities.

Conclusion

AvantGraph represents a significant advancement in knowledge graph analytics. By addressing the limitations of traditional graph query languages and introducing Graphalg, AvantGraph empowers users to perform complex graph analytics with ease. Its unified execution of simple questions to general algorithms, coupled with its efficient query processing pipeline, makes it a valuable tool for researchers and data scientists. As AvantGraph continues to evolve and gain traction within the research community, we can expect to see exciting advancements in graph analytics and a deeper understanding of complex data relationships.

Learn more

AvantGraph is presented in:

Leeuwen, W.V., Mulder, T., Wall, B.V., Fletcher, G., & Yakovets, N. (2022). AvantGraph Query Processing Engine. Proc. VLDB Endow., 15, 3698-3701.

DOI:10.14778/3554821.3554878

For more information about AvantGraph and its publications, visit https://avantgraph.io/

AvantGraph will be released under an open license soon. To test its functionalities and perform graph queries, check out the docker container available on GitHub at https://github.com/avantlab/avantgraph/.

GraphAlg Presented at Dutch-Belgian Database Day 2025


Workshop

By Stefania Amodeo

SciLake partner Daan de Graaf from TU Eindhoven presented his work on graph algorithm support in AvantGraph at the Dutch-Belgian Database Day (DBDBD) 2025, held on December 12, 2025, at the University of Antwerp.

His paper, titled Graph Algorithms for Everyone, Everywhere, introduced GraphAlg, an open-source project developed within SciLake, designed to make graph algorithms accessible to researchers, developers, and students. During the conference, Daan presented a poster and conducted live demos, allowing participants to interact with the GraphAlg tutorial and explore its capabilities firsthand.

About GraphAlg

GraphAlg is a domain-specific language for graph algorithms that compiles to relational algebra, making advanced graph analysis accessible directly within database systems.

GraphAlg is now open source, making it easier than ever to work with graph algorithms. We built it for researchers, developers, and students who want to dive into graph theory without the usual steep learning curve. The project includes hands-on tools and tutorials to help you get started quickly.

Key Resources

Congratulations to the TU/e team on this achievement! We are proud to see our research making an impact in the database and graph algorithm communities.

SciLake at GRADES-NDA ’24


Workshop

By Stefania Amodeo, Daan de Graaf

SciLake recently participated in the 7th Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), held on June 14, 2024, in Santiago, Chile. This prestigious event unites researchers from academia, industry, and government sectors worldwide to discuss and share the latest breakthroughs in large-scale graph data management and graph analytics systems. It also provides a platform to discuss novel methods and techniques to address domain-specific challenges in real-world graphs.

Daan de Graaf (TU/e) at GRADES-NDA ’24


Our SciLake partner, Daan de Graaf, had the opportunity to present an accepted article on behalf of authors Wilco van Leeuwen, George Fletcher, and Nikolay Yakovets, all from Eindhoven University of Technology (TU/e). The team showcased "HomeRun", a tool specifically designed for comparing different cardinality estimation techniques in graph databases.

For those new to the topic, cardinality in a graph database refers to the number of elements in a set, such as the number of edges connected to a node, the total number of nodes in the database, or the number of results a query returns. Accurate cardinality estimation is crucial for optimising the performance of queries, as the query planner relies on these estimates to choose the most efficient way to retrieve data.

One of HomeRun's key features is its ability to evaluate the performance of different cardinality estimation techniques in given usage scenarios. The tool generates visualisations automatically, helping users understand the trade-offs between various techniques. This tool is particularly useful for database developers when they face performance issues, like long-running queries, with specific query and dataset combinations.
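As a rough illustration of what comparing cardinality estimation techniques involves, the sketch below scores two hypothetical estimators against known true cardinalities using the q-error, a metric commonly used in the cardinality estimation literature. The estimator names and numbers are invented, and the source does not state which metrics HomeRun itself reports.

```python
def q_error(estimate, actual):
    """Q-error: the factor by which an estimate is off, always >= 1."""
    # Clamp to 1 to avoid division by zero for empty results.
    estimate, actual = max(estimate, 1), max(actual, 1)
    return max(estimate / actual, actual / estimate)


def compare(estimators, true_cards):
    """Return the worst-case q-error of each estimator over a workload.

    estimators: name -> list of estimated cardinalities, one per query
    true_cards: list of true cardinalities, in the same query order
    """
    return {
        name: max(q_error(e, a) for e, a in zip(estimates, true_cards))
        for name, estimates in estimators.items()
    }


# Hypothetical workload of three queries and two estimation techniques.
true_cards = [100, 2000, 10]
estimators = {
    "uniform": [50, 2000, 40],
    "histogram": [100, 1000, 10],
}
worst_case = compare(estimators, true_cards)
```

A q-error of 1 means a perfect estimate. Because the metric is symmetric, underestimating by a factor of four is penalized the same as overestimating by a factor of four.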

In SciLake, HomeRun is being used to optimise database system performance in the context of WP2 (Data Lake Search and Navigation).

For more information about HomeRun, you can refer to the paper:

  • Wilco van Leeuwen, George Fletcher, and Nikolay Yakovets. 2024. HomeRun: A Cardinality Estimation Advisor for Graph Databases. In Proceedings of the 7th Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA) (GRADES-NDA '24). Association for Computing Machinery, New York, NY, USA, Article 6, 1–9. https://doi.org/10.1145/3661304.3661902 

SciLake at VLDB 2025 PhD Workshop


Workshop

By Stefania Amodeo

Daan de Graaf from Eindhoven University of Technology recently presented his doctoral research at the VLDB 2025 PhD Workshop, held as part of the 51st International Conference on Very Large Data Bases in London, United Kingdom, from September 1–5, 2025. His work, conducted within the SciLake project, addresses the challenge of integrating graph algorithms into database systems.

VLDB is one of the premier international forums for database researchers, developers, and users. Being selected to present at the PhD Workshop is a significant accomplishment that recognizes both the quality and potential impact of Daan's research. This achievement highlights the innovative work being conducted within the SciLake project and demonstrates how our research advances the state of the art in managing and analyzing large-scale scientific knowledge graphs.

GraphAlg: A New Language for Graph Algorithms

Daan's presentation focused on GraphAlg, a new language that makes it possible to run graph algorithms directly inside database systems. This work has important implications for the SciLake project and the broader scientific community. As graph databases become more popular for complex data analysis, current tools often lack flexibility, speed, and user-friendliness. GraphAlg solves these problems by building on well-established mathematical principles from linear algebra.

The language is designed to be easily analyzed and optimized, and it can be converted into a format that databases already understand (relational algebra). This combination makes GraphAlg both powerful and practical for real-world use.
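To give an intuition for how a linear algebra operation can map onto relational algebra, the following sketch computes a matrix-vector product using only a join and a grouped aggregation. The table layouts and the function name are assumptions for illustration. The source does not describe the actual translation performed by the GraphAlg compiler.

```python
from collections import defaultdict


def matvec_relational(matrix_rows, vector_rows):
    """Compute y = M x using only relational-style operations.

    matrix_rows: (row, col, value) tuples, i.e. a table M(row, col, value)
    vector_rows: (idx, value) tuples, i.e. a table x(idx, value)

    Roughly equivalent SQL:
        SELECT M.row, SUM(M.value * x.value)
        FROM M JOIN x ON M.col = x.idx
        GROUP BY M.row
    """
    vector = dict(vector_rows)  # plays the role of a hash index on x.idx
    # Join M and x on M.col = x.idx, projecting (row, M.value * x.value).
    joined = (
        (row, m_val * vector[col])
        for row, col, m_val in matrix_rows
        if col in vector
    )
    # GROUP BY row, aggregating with SUM.
    result = defaultdict(int)
    for row, product in joined:
        result[row] += product
    return dict(result)


# Sparse matrix with three non-zero entries, and a dense vector (1, 2, 3).
M = [(0, 1, 1), (1, 0, 1), (1, 2, 1)]
x = [(0, 1), (1, 2), (2, 3)]
y = matvec_relational(M, x)
```

Because joins and aggregations are operations that database optimizers already know how to plan, expressing an algorithm this way lets it benefit from the same optimization machinery as an ordinary query.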

Why This Matters for SciLake

GraphAlg is being developed specifically in the context of the SciLake project. As part of this project, the graph query engine AvantGraph will host the OpenAIRE Graph, a large scientific knowledge graph containing hundreds of millions of publications. The OpenAIRE Graph currently integrates the BIP! Ranker tool to enrich publication data with research impact indicators based on the citation graph, using algorithms typically derived from PageRank or simple citation counts.

With GraphAlg, these indicators can be computed directly within AvantGraph, replacing a complex pipeline running on a large cluster with a simpler and more efficient query with an embedded algorithm. This means:

  • Significantly improved performance and reduced infrastructure requirements
  • Greater flexibility for project partners to experiment with custom algorithms

This work directly supports SciLake's mission to provide advanced analytics capabilities for the scientific community.

Key Achievements and Future Directions

During his PhD research, supervised by Dr. N. Yakovets, Daan has accomplished several important milestones:

  • Created the language structure and rules for GraphAlg, building on MATLANG (a mathematical framework for working with matrices)
  • Built a compiler that translates GraphAlg into executable code
  • Integrated GraphAlg into AvantGraph, a state-of-the-art graph query engine

In the future, Daan will work on making GraphAlg faster and more efficient, and on adding support for other database systems.

About the Presentation

The workshop paper, titled "Algorithm Support in a Graph Database, Done Right," was presented as part of the VLDB 2025 PhD Workshop program. The full paper is available at: https://www.vldb.org/2025/Workshops/VLDB-Workshops-2025/PhD/PhD25_5.pdf

We congratulate Daan on this achievement and look forward to following the continued development of GraphAlg as it enhances our capabilities for scientific knowledge graph analysis.