#DeCoding Federated Analytics: unlocking knowledge across borders and datasets

How can researchers study complex diseases across Europe when sensitive health data cannot leave hospitals or research centres and comes in several languages?

This is one of the key challenges HEREDITARY is addressing, and part of the answer lies in a powerful combination of a federated learning infrastructure, semantic integration and federated analytics.

In this new #DeCoding article, we take a closer look at how Work Package 3 (WP3) is building the foundations that make this possible: the Hereditary Ontology (HERO) and the Hereditary Data Network (HDN).

 

From federated learning to federated analytics

In a previous #DeCoding article, we explored federated learning, a method that allows AI models to be trained across multiple institutions without centralising raw data. But before researchers can analyse data across institutions, they need to ensure they are actually talking about the same things. In healthcare, the same clinical concept can be recorded differently depending on the hospital, the specialty, or even the country. This makes it difficult to combine or compare data.

To solve this, HEREDITARY has developed the Hereditary Ontology (HERO): a shared semantic layer that provides a common language for all partners, so that different datasets can be understood in a consistent way. By integrating clinical, genomic and imaging data into a unified conceptual model, it enables researchers to formulate questions without needing to know local database structures. HERO currently covers key neurological disease domains, such as amyotrophic lateral sclerosis (ALS) and multiple sclerosis (MS), and is designed to expand to others, like Parkinson’s and Alzheimer’s.
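
As a rough illustration of what a shared semantic layer does, the sketch below renames the local fields of two differently structured records to one common concept. The concept identifier `hero:ageAtOnset`, the site names and the column names are all hypothetical and are not taken from HERO itself.

```python
# Hypothetical per-site mappings from local column names to a shared concept.
# The identifier "hero:ageAtOnset" is invented for illustration only.
SITE_MAPPINGS = {
    "site_a": {"onset_age_years": "hero:ageAtOnset"},
    "site_b": {"eta_esordio": "hero:ageAtOnset"},
}


def to_shared_model(site: str, record: dict) -> dict:
    """Rename the local fields of one record to shared concept identifiers.

    Fields without a mapping are dropped, so only harmonised concepts remain.
    """
    mapping = SITE_MAPPINGS[site]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


record_a = to_shared_model("site_a", {"onset_age_years": 58, "ward": "N2"})
record_b = to_shared_model("site_b", {"eta_esordio": 61})
print(record_a)
print(record_b)
```

Once both records use the same concept name, a single question about age at onset can be asked of either site without knowing its local schema.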

This semantic integration is essential: without it, federated analytics wouldn’t be possible.

 

From data silos to a connected network

Building on this ontology, HEREDITARY has developed the Hereditary Data Network (HDN), a federated infrastructure that allows data to be analysed across institutions while remaining locally stored. Data stays where it is, but knowledge can travel. Instead of moving individual patient data to a central repository, HDN enables researchers to send queries to different institutions and receive aggregated results. It is based on three elements: a central component that coordinates queries, local endpoints at each institution that execute them on their own data, and results that are returned and combined without exposing sensitive information.

This approach represents a fully federated and privacy-by-design architecture. Privacy controls are integrated in the query processing layer of HDN:

  • Each query is assessed before running and is automatically assigned a privacy risk score.
  • Each institution decides which risk thresholds it can safely handle.
  • If a query exceeds those thresholds, no data is returned or privacy mitigation measures are applied.

This ensures that data owners remain in full control, while still enabling meaningful research across institutions.
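
The threshold logic described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not HDN's implementation: the scoring rule (risk grows as the smallest patient group shrinks), the class names and the three outcome labels are invented here, since the article does not specify how risk scores are computed.

```python
from dataclasses import dataclass


@dataclass
class Query:
    text: str
    min_group_size: int  # smallest patient group the result could describe


@dataclass
class SitePolicy:
    max_risk: float  # highest risk score this institution accepts
    mitigate: bool   # True: apply mitigation when exceeded; False: return nothing


def privacy_risk_score(query: Query) -> float:
    """Toy score in (0, 1]: the smaller the group, the higher the risk.

    This rule is an assumption for illustration, not HDN's scoring method.
    """
    return 1.0 / max(query.min_group_size, 1)


def handle_query(query: Query, policy: SitePolicy) -> str:
    """Decide locally, per institution, how a query is treated."""
    score = privacy_risk_score(query)
    if score <= policy.max_risk:
        return "answered"
    return "mitigated" if policy.mitigate else "refused"


broad = Query("average age at onset of ALS patients", min_group_size=50)
narrow = Query("age at onset for one rare subtype", min_group_size=2)
strict_site = SitePolicy(max_risk=0.1, mitigate=False)
lenient_site = SitePolicy(max_risk=0.1, mitigate=True)

print(handle_query(broad, strict_site))
print(handle_query(narrow, strict_site))
print(handle_query(narrow, lenient_site))
```

Because the policy object belongs to each site, two institutions can treat the same query differently, which is the sense in which data owners keep control.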

 

How federated analytics works in practice

A researcher might ask a question like: “What is the average age at onset of ALS patients?”

Instead of accessing a central database, the system: 

  1. Translates the question into a standardised query using HERO.
  2. Sends it to multiple institutions.
  3. Executes it locally at each site.
  4. Returns aggregated results.
  5. Combines them into a single answer, obtaining a response that incorporates insights across different datasets while respecting privacy and institutional autonomy.
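
For the average-age example, the steps above can be sketched as follows. This is a simplified illustration, assuming each site returns only a sum and a count; the site names, the ages and the function names are made up, and the real HDN query engine is not shown here.

```python
from typing import NamedTuple


class SiteAggregate(NamedTuple):
    total: float  # sum of ages at onset at this site
    count: int    # number of patients contributing to the sum


def run_locally(ages: list[float]) -> SiteAggregate:
    """Executed at each site: only the sum and the count leave the institution."""
    return SiteAggregate(total=sum(ages), count=len(ages))


def combine(aggregates: list[SiteAggregate]) -> float:
    """Executed by the coordinator: pool sums and counts into a global mean."""
    total = sum(a.total for a in aggregates)
    count = sum(a.count for a in aggregates)
    if count == 0:
        raise ValueError("no patients matched the query at any site")
    return total / count


# Made-up ages at onset held by three hypothetical sites.
site_data = {
    "site_a": [52.0, 60.0, 68.0],
    "site_b": [55.0, 65.0],
    "site_c": [70.0],
}
aggregates = [run_locally(ages) for ages in site_data.values()]
overall_mean = combine(aggregates)
print(round(overall_mean, 2))
```

Pooling sums and counts, rather than averaging the per-site means, keeps the result weighted by the number of patients at each site, so a small site does not count as much as a large one.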

 

Progress so far and what comes next 

HEREDITARY has already made significant progress. The project has delivered the first version of its federated workflow execution methods (D3.2) and demonstrated how semantic integration and federated querying can work together. In addition, the HDN prototype, which integrates privacy-aware query mechanisms, has shown that distributed queries can be executed across heterogeneous datasets. For those with a technical interest, various resources relating to these developments can be found on the project’s Open Hub.

Looking ahead, the project is focusing on scaling and real-world deployment. Over the first half of 2026, HDN endpoints are being installed at several partners (University of Turin, Radboud University Medical Centre and University of Colorado), enabling future live queries on real datasets. The goal is to have a fully operational federated query system running at consortium level by the end of 2026, along with a shared catalogue of queries and a clear maintenance plan.

Ultimately, what HEREDITARY is building goes beyond technology. It is a new way of doing research in several fields: one where data does not need to move to generate knowledge, where institutions can collaborate without losing control and privacy, and where complexity is managed through shared understanding. The federated analytics layer, powered by HERO and the HDN, is a key step in that direction.

 

Learn more about federated analytics in the following videos, where our coordinator, Gianmaria Silvello (University of Padova), and Daniele Dell’Aglio (Aalborg University) share their insights and perspectives on the topic:

HEREDITARY reaches midterm with strong scientific progress and successful review

The Horizon Europe project HEREDITARY has successfully reached Month 24 of its execution, marking the halfway point of its four-year duration. This milestone confirms the project’s strong progress and consolidates the solid foundations laid during its first two years of activity, with major deliverables completed.

The end of 2025 closed with particularly positive news for the consortium. HEREDITARY successfully passed its first periodic review at Month 18, with all deliverables approved. Both the external reviewers and the Project Officer praised the high quality of the work, the coherence of the technical developments, and the overall advancement of the project in line with its ambitious objectives.

In December, the consortium reached another remarkable achievement: 14 deliverables were submitted in a single day, representing the highest delivery peak foreseen throughout the entire project. These deliverables span all core scientific and technical work packages, covering clinical use cases, federated and privacy-preserving data infrastructures, semantic integration, advanced analytics, visualisation tools, citizen engagement, project management, and exploitation and intellectual property planning. Altogether, they account for more than 400 pages of technical and scientific results, reflecting an extraordinary collective effort by all partners. At the end of the article, you can review the complete list of all the reports submitted. Check them all out in the Deliverables section of our website.

The key achievements at this midpoint also include two important milestones. The first is the first operational version of the federated workflow execution engine, which enables secure and distributed analysis across institutions on top of the federated data management infrastructure. The second is the progress in data FAIRification, which strengthens the discoverability of HEREDITARY data resources and their alignment with European initiatives and standards. These are documented in Deliverables 3.2 and 3.6, respectively.

Reaching Month 24 represents not only a quantitative success in terms of deliverables and milestones, but also a qualitative one. The results produced so far demonstrate that HEREDITARY is effectively advancing towards its vision of building a federated, interoperable and privacy-preserving ecosystem for the integration and analysis of multimodal health data, with a particular focus on neurodegenerative and gut–brain related disorders.

Looking ahead, the consortium enters the second half of the project with a clear roadmap. The coming period will focus on maturing core scientific contributions, integrating results across work packages, and consolidating HEREDITARY into a coherent and impactful ecosystem.

14 Deliverables Submitted at M24 (December 2025)

| Deliverable | Title | Brief description | Dissemination level |
| --- | --- | --- | --- |
| D1.5 | Risk Management Plan, 2nd report | Updated analysis of project risks identified after the second year of implementation, including mitigation and contingency measures. | EU Classified |
| D2.4 | Linkage and feature extraction from gut–brain, intermediate evaluation | Integrated brain–gut linkage and behavioural phenotyping to extract features for federated learning, including an intermediate evaluation at M24. | Public (PU) |
| D2.22 | UCD clinical studies documentation | Regulatory, ethical and data access documentation required for the UCD-led clinical studies, including approvals and MTAs where applicable. | Public (PU) |
| D3.2 | Federated workflow execution methods: first release | First release of the federated query execution engine, including intermediate implementations, optimisations, documentation and testing. | Public (PU) |
| D3.6 | FAIRification of participating data resources | Report on improvements in FAIRness of HEREDITARY data sources, with emphasis on discoverability and alignment with EU initiatives. | Public (PU) |
| D3.11 | Pilot of the genomics data science ontology interconversion | Pilot demonstrator of a clinical ontology conversion tool enabling interoperability with genomic and other biomedical data. | Public (PU) |
| D4.1 | KDE datasets and methods: first release | Open dataset including newly predicted links from the HEREDITARY knowledge graph using several knowledge graph embedding methods. | Public (PU) |
| D4.3 | Learning models and spatio-temporal harmonization | Design and first implementation of multimodal learning algorithms, self-supervised methods, and initial harmonisation libraries. | Public (PU) |
| D5.2 | Demonstrator of visualization components for sequences, networks, text, and high dimensional data | Software libraries implementing visualisation components for heterogeneous data types, including sequences, networks and text. | Public (PU) |
| D5.4 | Prototype of the visualization components for spatial, image, and simulation data | Prototype visualisation libraries addressing spatial data, biomedical images and simulation-based datasets. | Public (PU) |
| D5.7 | Requirement analysis and user studies: Initial results | Initial requirements analysis and early evaluation results derived from user studies of WP5 visual analytics tools. | Public (PU) |
| D5.10 | First evaluation challenge: report on the data, results, and integration with EOSC | Report on the first evaluation challenge, including datasets, results, open lab proceedings and integration within EOSC. | Public (PU) |
| D6.7 | World café outcome: Priorities and gaps | Synthesis of stakeholder perspectives collected during the World Café, identifying priorities and gaps relevant to HEREDITARY. | Public (PU) |
| D8.5 | Mid Term IPR plan | Mid-term Intellectual Property Rights plan outlining preliminary protection and exploitation strategies for project results. | Sensitive (SEN) |

Check them all out in the Deliverables section of the website.