HEREDITARY is receiving EOSC EU Node funding to advance Federated Learning experiments

The HEREDITARY project has been granted 40,000 credits from the European Open Science Cloud (EOSC EU Node) to support, among other things, its federated learning activities within a secure European research infrastructure.

About the EOSC EU Node

The EOSC EU Node is the operational platform of the European Open Science Cloud (EOSC) Federation, designed to facilitate open, collaborative and data-driven research in Europe. It supports multidisciplinary scientific work by giving researchers access, through their institutional credentials, to digital research services such as computing and storage resources, containerized environments, and collaborative tools. The platform promotes the sharing and reuse of research outputs in a secure, GDPR-compliant cloud ecosystem based on FAIR data principles and a credit-based access model.

The awarded credits will be used to deploy and maintain the central server required for federated learning experiments on EOSC virtual machines. Federated learning allows multiple institutions to collaboratively train machine learning models without sharing sensitive data. By using EOSC infrastructure, HEREDITARY can efficiently manage firewall configurations and incoming connections, overcoming common technical barriers associated with institutional IT restrictions.
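To make the role of the central server concrete, here is a minimal sketch of one aggregation round using federated averaging, a common aggregation rule. It is an illustration only: this article does not state which aggregation algorithm HEREDITARY runs, and the function name, the weight vectors and the sample counts below are all made up for the example.

```python
from typing import List, Sequence, Tuple


def federated_average(updates: Sequence[Tuple[List[float], int]]) -> List[float]:
    """Average client weight vectors, weighting each by its local sample count.

    `updates` holds one (weights, n_samples) pair per participating centre.
    Only these model weights reach the server, never the training records.
    """
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("need at least one training sample across clients")
    dim = len(updates[0][0])
    averaged = [0.0] * dim
    for weights, n in updates:
        if len(weights) != dim:
            raise ValueError("all clients must send vectors of the same length")
        for i, w in enumerate(weights):
            averaged[i] += w * n / total
    return averaged


# Hypothetical round with two centres (values invented for illustration).
client_updates = [
    ([1.0, 2.0], 100),  # centre A: locally trained weights, 100 local samples
    ([3.0, 4.0], 300),  # centre B: locally trained weights, 300 local samples
]
global_weights = federated_average(client_updates)
print(global_weights)
```

In a real deployment the server would send the averaged weights back to the centres for the next round of local training, and the exchange would run over an authenticated, encrypted channel.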

The EOSC EU Node credits will support several weeks of experimentation, with server configurations adapted to different model sizes and algorithmic requirements. In parallel, HEREDITARY is exploring the development of an API-based solution that would allow researchers to deploy experiment-specific software containers through the EOSC Cloud Container platform. This approach aims to streamline workflows, facilitate testing and deployment, and potentially deliver open-source tools that could benefit the broader research community.

#DeCoding Federated Learning: from Infrastructure design to secure collaboration across Europe

How can hospitals collaborate on sensitive medical data without ever sharing the data itself? This is the core question behind Federated Learning (FL), and one of the key technological pillars of the HEREDITARY project.

Over the past two years, HEREDITARY has progressively designed, deployed and tested a federated learning infrastructure capable of connecting medical centres across Europe while ensuring that raw patient data never leaves its original location. What began as a technical design challenge has now evolved into a secure network supporting distributed machine learning experiments across heterogeneous datasets.

Building the Foundations: Computing Infrastructures

Federated Learning only works if each participating centre has the technical capacity to train models locally and communicate securely with the rest of the network. The first step was ensuring this. Under Deliverable D2.14, delivered in Month 9 and led by SURF, partners established secure computing infrastructures capable of handling sensitive clinical and genomic data, equipping centres with appropriate storage, processing power and secure communication channels. Thanks to this, data owners can process data locally, train models without centralising records and exchange model updates securely within the federation.

With local infrastructures in place, the next step was to design and validate the full federated learning architecture. Deliverable D2.11 in Month 18 presents a federated infrastructure that is secure, flexible and deployable across heterogeneous environments, including high-performance computing systems and cloud platforms. Encrypted communication via gRPC/TLS was implemented to protect model exchanges, while Secure Aggregation mechanisms (SecAgg/SecAgg+) were integrated to prevent the central server from accessing individual model updates.

The system was engineered to support both horizontal federated learning (same data types across centres) and vertical federated learning (different data modalities distributed across centres). Dedicated project workshops demonstrated that both approaches could run successfully across geographically distributed nodes, even when accounting for network latency between countries. By Month 18, HEREDITARY had a federated network capable of running both horizontal and vertical learning experiments on ALS data, without moving any raw records.
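The difference between the two settings is easiest to see on a toy table. The sketch below follows the usual textbook reading: in the horizontal case each centre holds different patients described by the same features, while in the vertical case each centre holds a different group of features for the same patients. The records, field names and helper functions are invented for illustration and are not HEREDITARY data or code.

```python
from typing import Dict, List

# Made-up toy records for illustration only.
RECORDS: List[Dict] = [
    {"patient_id": "p1", "age": 61, "biomarker": 0.8},
    {"patient_id": "p2", "age": 54, "biomarker": 0.3},
    {"patient_id": "p3", "age": 70, "biomarker": 0.5},
    {"patient_id": "p4", "age": 66, "biomarker": 0.9},
]


def horizontal_split(records: List[Dict], n_centres: int) -> List[List[Dict]]:
    """Each centre gets different patients, all with the same features."""
    return [records[i::n_centres] for i in range(n_centres)]


def vertical_split(records: List[Dict], feature_groups: List[List[str]]) -> List[List[Dict]]:
    """Each centre gets every patient, but only its own group of features."""
    return [
        [{"patient_id": r["patient_id"], **{f: r[f] for f in group}} for r in records]
        for group in feature_groups
    ]


horizontal = horizontal_split(RECORDS, 2)
vertical = vertical_split(RECORDS, [["age"], ["biomarker"]])

print([r["patient_id"] for r in horizontal[0]])  # patients held by centre 0
print(vertical[1][0])                            # centre 1's view of patient p1
```

In the vertical case the shared patient identifier is what lets the centres line up their partial views during training, which is one reason that setting tends to need more coordination between sites.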

Securing the Communication: Communication Protocols

Security does not stop at this point. Deliverable D2.15 in Month 22 dives deeper into how model updates are protected during training. SURF analysed and validated advanced communication protocols within the federated learning framework, examining three key mechanisms:

  • Secure Aggregation ensures that the server can combine model updates without seeing any individual contribution. Clients (Medical Centres) mask their updates using cryptographic techniques so that when all updates are aggregated, the masks cancel out, but no single update can be inspected independently. Tests showed no significant decrease in model performance, with only a modest increase in runtime due to additional communication steps.
  • Differential Privacy was also evaluated, introducing controlled noise to model updates to further reduce the risk of information leakage, again with minimal performance degradation.
  • Trusted Execution Environments were explored as an additional layer of security, though their hardware requirements make them less practical in heterogeneous clinical environments.
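The first two mechanisms can be sketched in a few lines. The example below shows the core idea of pairwise masking behind secure aggregation (each pair of clients shares a seed, one adds the derived mask and the other subtracts it, so the masks cancel in the sum) and a basic clip-and-add-noise step of the kind used for differential privacy. It is a simplified illustration, not the SecAgg/SecAgg+ protocol itself: real protocols derive the seeds through key agreement, use a cryptographic generator and handle clients that drop out. All names, seeds, constants and update values here are assumptions made for the example.

```python
import math
import random
from typing import Dict, List

MODULUS = 2 ** 32   # masks and sums live in the integers modulo 2^32
SCALE = 10_000      # fixed-point scale for turning floats into integers (assumption)


def quantize(update: List[float]) -> List[int]:
    """Encode a float update as fixed-point integers modulo MODULUS."""
    return [round(x * SCALE) % MODULUS for x in update]


def pair_mask(seed: int, dim: int) -> List[int]:
    """Expand a shared seed into a mask (a real system would use a cryptographic PRG)."""
    rng = random.Random(seed)
    return [rng.randrange(MODULUS) for _ in range(dim)]


def mask_update(client_id: int, update: List[float], pair_seeds: Dict[int, int]) -> List[int]:
    """Mask an update; pair_seeds maps every other client id to the shared seed."""
    masked = quantize(update)
    for other_id, seed in pair_seeds.items():
        sign = 1 if client_id < other_id else -1  # one side adds, the other subtracts
        mask = pair_mask(seed, len(update))
        masked = [(m + sign * k) % MODULUS for m, k in zip(masked, mask)]
    return masked


def aggregate(masked_updates: List[List[int]]) -> List[float]:
    """Sum masked updates; the pairwise masks cancel, leaving only the true sum."""
    dim = len(masked_updates[0])
    totals = [sum(u[i] for u in masked_updates) % MODULUS for i in range(dim)]
    # Values in the upper half of the range decode as negative numbers.
    return [(t - MODULUS if t >= MODULUS // 2 else t) / SCALE for t in totals]


def clip_and_noise(update: List[float], clip_norm: float, noise_std: float,
                   rng: random.Random) -> List[float]:
    """Clip the update to an L2 norm bound, then add Gaussian noise to each entry."""
    norm = math.sqrt(sum(x * x for x in update))
    factor = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * factor + rng.gauss(0.0, noise_std) for x in update]


# Hypothetical updates from three centres and hard-coded pairwise seeds.
updates = {0: [0.5, -1.0], 1: [1.5, 2.0], 2: [-0.25, 0.75]}
seeds = {(0, 1): 11, (0, 2): 22, (1, 2): 33}

masked = []
for cid, upd in updates.items():
    shared = {o: s for pair, s in seeds.items() if cid in pair for o in pair if o != cid}
    masked.append(mask_update(cid, upd, shared))

aggregated = aggregate(masked)  # equals the element-wise sum of the raw updates
noisy = clip_and_noise(updates[1], clip_norm=1.0, noise_std=0.1, rng=random.Random(0))
print(aggregated)
```

Working in fixed-point integers modulo a constant is what makes the masks cancel exactly; with plain floating-point masks the sum would carry rounding error. The extra round of seed exchange between clients is also a simple way to see where the additional communication steps mentioned above come from.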

Beyond Simulation: paving the way for actual implementation

One key lesson emerging from this work is that federated learning is relatively straightforward in simulation, but deploying it across real institutions introduces new challenges: hardware variability, network latency across countries, IT coordination and regulatory compliance. Through interactive workshops and live experiments, HEREDITARY has moved beyond theoretical experimentation to operational deployment.

Today, the project operates a federated network linking multimodal clinical data without centralising any raw records. Advanced AI models can be trained across distributed datasets and privacy-enhancing technologies can be implemented with limited performance trade-offs. The infrastructure is reliable, secure and resilient. This “data stays at source” approach aligns closely with the principles of the European Health Data Space, demonstrating that privacy-preserving, cross-border health data collaboration is technically feasible.

The next step will arrive in June 2026, when the project moves from validated design to consolidated implementation. Deliverable D2.12 will formalise the full implementation of the federated infrastructure, while Deliverable D2.17 will demonstrate the clinical relevance of Federated Learning by presenting intermediate results from the neurodegenerative use cases. Together, these upcoming milestones will mark a transition from infrastructure validation to scientific and clinical impact.

Aligning strategy, technology and clinical value: HEREDITARY’s 5th Plenary Meeting in Lisbon

On 5–6 February 2026, the HEREDITARY consortium gathered at Universidade Nova de Lisboa (UNL), Portugal, for its 5th Plenary Meeting and the first in-person meeting of the project’s third year. Over two intensive days, partners reviewed progress, aligned on strategic priorities, and advanced key technical developments that will shape the next phase of the project. 

The meeting followed directly after the Federated Learning Workshop (3–4 February), creating strong momentum around HEREDITARY’s core mission: enabling privacy-preserving, multimodal data analysis across European medical centres. 

Opening the meeting, Project Coordinator Gianmaria Silvello (UNIPD) provided a comprehensive overview of the project’s current status. With the first review completed and 41 Deliverables successfully delivered, the consortium is now fully focused on addressing reviewers’ recommendations and consolidating technical achievements into high-impact results. 

Throughout the first day, each Work Package presented its latest developments and next steps, demonstrating strong cross-WP integration and alignment with the project’s strategic objectives. The review of ongoing activities confirmed steady technical progress across data infrastructure, semantic integration, analytics, visualization, legal framework and citizen science, which reinforces the coordination between clinical, technical, social and legal dimensions.

A central highlight of the meeting was the pair of sessions on Federated Learning and Federated Analytics. On the second day, SURF reported on the Federated Learning workshop and the evolution of infrastructure leadership. Discussions explored the idea of creating a living document to guide institutions in setting up secure federated learning environments. On the Federated Analytics side, UNIPD presented the Hereditary Data Network (HDN) architecture and deployment roadmap, which foresees a real HDN query system running by December 2026, backed by a clear maintenance plan, and a demonstrator for reviewers in early 2027. These developments mark a decisive step towards operational federated workflow execution across heterogeneous clinical and genomic datasets.

After this, the five HEREDITARY use cases were reviewed in detail, with particular emphasis on clarifying data storage and sources, strengthening the causal interpretation of results and ensuring robust legal alignment. The consortium reaffirmed that clinical relevance and methodological rigour must remain central to the project.

Looking Ahead 

With federated learning infrastructure maturing, HDN endpoints being installed, FAIRification progressing, and use cases consolidating clinical relevance, the consortium is moving decisively towards delivering a scalable, privacy-preserving framework for multimodal health data analysis in Europe. 

The meeting concluded with a clear set of next action points: 

  • Online Plenary Meeting planned for June 2026. 
  • Steering Committee meeting planned for April 2026. 
  • Federated Learning Workshop at AAU (May 2026). 

The next two years will be key to the project’s results and impact, and HEREDITARY is aligned, coordinated and ready.

Decoding HEREDITARY: Discover the project through its WP Leaders

Over the past few months, HEREDITARY has released a new series of 10 interviews on its YouTube channel, offering an inside look at the ongoing work and structure within the project. Recorded during the HEREDITARY consortium meeting held in Barcelona in February 2025, the series brings together project partners who share insights into their expertise and their role as Work Package Leaders, helping viewers explore and understand the scope of HEREDITARY’s activities.

The series includes seven interviews dedicated to the HEREDITARY Work Packages, in which consortium members explain their objectives, challenges and the main tasks currently under development. In addition, the series features three complementary interviews focusing on key topics: Self-Supervised Learning, Visualization Techniques, and the European Health Data Space and the AI Act in the European context. Together, these videos provide a broader perspective on the methodological, technological, health-related and regulatory aspects surrounding HEREDITARY’s research.

Below you can find the full list of interviews included in this series.

 

WP1 – Project Management

Giorgio Maria Di Nunzio (Università di Padova) explains how WP1 coordinates the HEREDITARY project, overseeing general, technical, ethics & risks, and data management to ensure the project progresses efficiently and ethically.

WP2 – Clinical Use Cases and Federated Networking Infrastructure

Umberto Manera (University of Turin) discusses WP2’s federated learning approach to analyze sensitive medical data and its five use cases covering ALS, Parkinson’s disease, and the gut-brain axis.

WP3 – Multimodal Semantic Integration Platform

Daniele Dell’Aglio (Aalborg University) presents WP3, which enables privacy-preserving data sharing across multiple data owners (such as hospitals and clinics) using knowledge graphs, ontologies, and federated methods to support analytics workflows.

WP4 – Multimodal Analytics & Learning Platform

Henning Müller and Manfredo Atzori (HES-SO Valais) describe WP4’s multimodal platform for integrating heterogeneous biomedical data, using self-supervised learning and spatio-temporal analytics to uncover new relationships.

WP5 – Visual Analytics and Interaction

Tobias Schreck (TU Graz) introduces WP5’s visual analytics platform, which combines machine learning and interactive visualizations to explore complex multimodal datasets and support decision-making.

WP6 – Citizen Science and Public Engagement

Chiara Lovati and Giuseppe Pellegrini (Observa) explain WP6’s Health Social Laboratories, which engage citizens and stakeholders to align research with real-life needs through collaborative dialogue.

WP7 – Legal, Ethical, and Regulatory Frameworks

Elisabetta Biasin (KU Leuven) highlights WP7’s role in ensuring HEREDITARY complies with privacy, data protection, AI, and security regulations throughout the project.

The Science Behind Self-Supervised Learning

Manfredo Atzori (Università di Padova) explains how self-supervised learning models extract patterns from raw multimodal biomedical data, helping identify subgroups and improve prognosis in neurodegenerative diseases.

Visualization Techniques for Multimodal Data

Tobias Schreck (TU Graz) presents HEREDITARY’s visual analytics methods, showing how integrating multiple data types into interactive platforms reveals hidden patterns and supports hypothesis generation.

European Health Data Space (EHDS) and AI Act in the European context

Lotte Cools (KU Leuven) discusses how EHDS and the AI Act shape access to health data and the use of AI in HEREDITARY, ensuring research remains innovative while legally and ethically compliant.

DETECH 2026: Pushing the boundaries of Medical Terminology

DETECH 2026, the DEfinition and Term Extraction Challenge, organized as part of the HEREDITARY project, will take place on June 24, 2026, at the University of Zadar, Croatia, as a hybrid satellite event of MDTT 2026, Multilingual Digital Terminology Today: Design, representation formats and management systems.

The training data for DETECH 2026 is now available on GitHub, and the challenge is officially open to participation. We welcome anyone interested in automatic term extraction, definition generation, biomedical NLP, and medical terminology to take part in the event. Teams and individual researchers can register until March 13, 2026, and start experimenting with the dataset ahead of the evaluation phase.

What is DETECH?

DETECH focuses on automatic extraction of domain-specific terms and the generation of natural language definitions for medical concepts. The 2026 edition will focus on the gut–brain interplay, offering a real-world testbed for NLP methods in gastroenterology, neuroscience, and genetics.

Challenge Tasks

The challenge features two main tasks:

  • Task A – Term Extraction: Identify relevant single-word and multi-word terms from English texts on the gut–brain axis.
  • Task B – Definition Generation: Create natural language definitions for the extracted concepts, using corpus-based evidence or automatic text generation techniques.

Key dates for DETECH 2026

  • January 22: Training data release
  • March 13: Registration deadline for participation
  • March 20: Test data release
  • March 27: Submission of runs
  • April 7: Submission of reports
  • April 15: Results announced
  • April 21: Review feedback
  • May 15: Camera-ready report submission
  • June 15: Registration deadline for the event
  • June 24: Day of the challenge

Who Can Participate?

The challenge is open to researchers, academics, and industry teams working in NLP, biomedical informatics, terminology, or lexicography. Each team can submit up to five runs per subtask. External resources such as pre-trained models, lexicons, or ontologies are allowed but must be properly documented. Manual runs are also accepted but will not be ranked.

Submissions & Evaluation

All submissions must include a technical report detailing the approach, experiments, and results. Reports will be peer-reviewed and published on CEUR-WS, an online open-access platform indexed in Scopus. Accepted papers can later be extended for submission to journals or edited volumes, providing further visibility for participants’ research.

What is MDTT 2026?

The “Multilingual Digital Terminology Today: Design, Representation Formats and Management Systems” (MDTT 2026) is the fifth international conference dedicated to the design, representation, and management of digital terminology resources. This event focuses on methods for analyzing user needs, designing and validating terminological resources, and developing effective representation formats and management systems.

Stay tuned for registration details and submission instructions, which will soon be available. We look forward to seeing you at DETECH 2026, where innovation in explainable, data-driven medical terminology meets cutting-edge NLP research!

Programme (Wednesday 24 June, 2026)

14:30 – 16:00 | Session 1

14:30 – 15:00 | Opening

15:00 – 15:30 | “Adapting TBXTools to automatic terminology extraction with BERT”. Gonzalo López-Sánchez, Patricia Morales-Hurtado, Mercè Vàzquez, Albert Morales-Moreno, and Silvia Rodríguez Vázquez.

15:30 – 16:00 | “A QTT-Informed System for Biomedical Term Extraction and Definition Generation”. Diego A. Burgos, Antonio Tamayo Herrera, Giovanni Díaz, and Carlos Mario Pérez-Pérez.

16:00 – 16:30 | Coffee break

16:30 – 18:00 | Session 2

16:30 – 17:00 | “GutBrainTerm_Extractor: Generative AI for Medical Terminology Extraction in the Gut–Brain Domain”. Helena Ortiz Garduño and Esther Castillo Pérez.

17:00 – 17:30 | “TermHunter: Neural Biomedical Term Extraction and Ontology-Enhanced Definition Generation with Structured Prompting”. Nina Hosseini-Kivanani and Rossella Resi.

17:30 – 18:00 | “Description of the LISN system for extracting terms”. Thierry Hamon.

18:00 – 18:15 | Closing