How can hospitals collaborate on sensitive medical data without ever sharing the data itself? This is the core question behind Federated Learning (FL), and one of the key technological pillars of the HEREDITARY project.
Over the past two years, HEREDITARY has progressively designed, deployed and tested a federated learning infrastructure capable of connecting medical centres across Europe while ensuring that raw patient data never leaves its original location. What began as a technical design challenge has now evolved into a secure network supporting distributed machine learning experiments across heterogeneous datasets.
Building the Foundations: Computing Infrastructures
Federated Learning only works if each participating centre has the technical capacity to train models locally and communicate securely with the rest of the network. The first step was ensuring this. Under Deliverable D2.14 in Month 9, led by SURF, partners established secure computing infrastructures capable of handling sensitive clinical and genomic data, equipping centres with appropriate storage, processing power and secure communication channels. As a result, data owners can process data locally, train models without centralising records and exchange model updates securely within the federation.
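To make "exchanging model updates" concrete, here is a minimal sketch of how a server might combine locally trained parameters. It assumes federated averaging (weighting each centre by its number of local samples). The deliverables summarised here do not name the aggregation rule, so the function name, the data layout and the numbers are purely illustrative.

```python
from typing import List, Sequence, Tuple


def federated_average(updates: Sequence[Tuple[List[float], int]]) -> List[float]:
    """Combine client parameter vectors, weighting each by its local sample count.

    Each entry is (parameters, number_of_local_samples). Only these parameter
    vectors travel to the server; the underlying records stay at the centre.
    """
    if not updates:
        raise ValueError("no client updates received")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("clients must contribute at least one sample in total")
    dim = len(updates[0][0])
    averaged = [0.0] * dim
    for params, n in updates:
        if len(params) != dim:
            raise ValueError("all clients must send vectors of the same length")
        for i, value in enumerate(params):
            averaged[i] += value * n / total
    return averaged


# Hypothetical example: two centres with 100 and 300 local samples.
client_updates = [([1.0, 2.0], 100), ([3.0, 4.0], 300)]
global_params = federated_average(client_updates)
```

The centre with more samples pulls the global model further towards its own parameters, which is the usual motivation for weighting by sample count.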
With local infrastructures in place, the next step was to design and validate the full federated learning architecture. Deliverable D2.11 in Month 18 presents a federated infrastructure that is secure, flexible and deployable across heterogeneous environments, including high-performance computing systems and cloud platforms. Encrypted communication via gRPC/TLS was implemented to protect model exchanges, while Secure Aggregation mechanisms (SecAgg/SecAgg+) were integrated to prevent the central server from accessing individual model updates.
The system was engineered to support both horizontal federated learning (same data types across centres) and vertical federated learning (different data modalities distributed across centres). Dedicated project workshops demonstrated that both approaches could run successfully across geographically distributed nodes, even when accounting for network latency between countries. By Month 18, HEREDITARY had a federated network capable of running both horizontal and vertical learning experiments on ALS data, without moving any raw records.
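The difference between the two settings is easiest to see on a toy table. The sketch below uses invented records and hypothetical helper names, not project data or project code: a horizontal split gives each site different patients with the same columns, while a vertical split gives each site the same patients with different columns, linked by a shared identifier.

```python
from typing import Dict, List, Sequence

Row = Dict[str, float]

# Invented toy records; field names are assumptions for illustration only.
records: List[Row] = [
    {"id": 1, "age": 61, "gene_score": 0.8},
    {"id": 2, "age": 54, "gene_score": 0.3},
    {"id": 3, "age": 70, "gene_score": 0.5},
    {"id": 4, "age": 66, "gene_score": 0.9},
]


def horizontal_split(rows: Sequence[Row], n_sites: int) -> List[List[Row]]:
    """Each site holds different patients but the same set of columns."""
    return [list(rows[i::n_sites]) for i in range(n_sites)]


def vertical_split(
    rows: Sequence[Row], feature_groups: Sequence[Sequence[str]], key: str = "id"
) -> List[List[Row]]:
    """Each site holds every patient but only its own group of columns,
    plus the shared key needed to align records across sites."""
    return [
        [{key: row[key], **{f: row[f] for f in features}} for row in rows]
        for features in feature_groups
    ]


horizontal_sites = horizontal_split(records, n_sites=2)
vertical_sites = vertical_split(records, feature_groups=[["age"], ["gene_score"]])
```

In the vertical case the sites must agree on which records refer to the same patient before training can start, which is one reason vertical federated learning tends to be harder to deploy.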
Securing the Communication: Communication Protocols
Security does not stop there. Deliverable D2.15 in Month 22 dives deeper into how model updates are protected during training. SURF analysed and validated advanced communication protocols within the federated learning framework, focusing on three key mechanisms:
- Secure Aggregation ensures that the server can combine model updates without seeing any individual contribution. Clients (the medical centres) mask their updates using cryptographic techniques so that the masks cancel out when all updates are aggregated, while no single update can be inspected on its own. Tests showed no significant decrease in model performance, with only a modest increase in runtime due to additional communication steps.
- Differential Privacy was also evaluated, introducing controlled noise to model updates to further reduce the risk of information leakage, again with minimal performance degradation.
- Trusted Execution Environments were explored as an additional layer of security, though their hardware requirements make them less practical in heterogeneous clinical environments.
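The first two mechanisms can be illustrated with a deliberately simplified sketch. It is not the SecAgg/SecAgg+ protocol and not the project's implementation: real secure aggregation derives pairwise secrets through key agreement and handles client dropouts, whereas here the shared seeds are simply given. Real differential privacy also calibrates the noise to a privacy budget, which this sketch leaves out. All names and values are assumptions.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

MODULUS = 2**32  # masked values are integers modulo this number


def pairwise_mask(
    client_id: int, all_ids: Sequence[int], dim: int,
    pair_seeds: Dict[Tuple[int, int], int],
) -> List[int]:
    """Build one client's mask from the seeds it shares with every other client.

    For each pair, the lower id adds the shared random vector and the higher id
    subtracts it, so the masks cancel once all clients are summed.
    """
    mask = [0] * dim
    for other in all_ids:
        if other == client_id:
            continue
        lo, hi = sorted((client_id, other))
        rng = random.Random(pair_seeds[(lo, hi)])  # assumed pre-shared seed
        shared = [rng.randrange(MODULUS) for _ in range(dim)]
        sign = 1 if client_id == lo else -1
        mask = [(m + sign * s) % MODULUS for m, s in zip(mask, shared)]
    return mask


def mask_update(update: Sequence[int], mask: Sequence[int]) -> List[int]:
    """What a client actually sends: its (integer-quantised) update plus its mask."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]


def aggregate(masked_updates: Sequence[Sequence[int]]) -> List[int]:
    """Server-side sum; it only ever sees masked vectors."""
    dim = len(masked_updates[0])
    return [sum(vec[i] for vec in masked_updates) % MODULUS for i in range(dim)]


def clip_and_noise(
    update: Sequence[float], clip_norm: float, noise_std: float, rng: random.Random
) -> List[float]:
    """Differential-privacy style step: bound the L2 norm, then add Gaussian noise."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale + rng.gauss(0.0, noise_std) for x in update]


# Hypothetical run with three centres and integer-quantised updates.
ids = [0, 1, 2]
pair_seeds = {(0, 1): 11, (0, 2): 22, (1, 2): 33}
updates = {0: [5, 10], 1: [7, 20], 2: [1, 30]}
masked = [mask_update(updates[c], pairwise_mask(c, ids, 2, pair_seeds)) for c in ids]
total = aggregate(masked)  # equals the element-wise sum of the unmasked updates
```

Because the masks cancel exactly, the aggregate is identical to the unmasked sum, which is consistent with the observation above that secure aggregation costs runtime rather than model accuracy. Added noise, by contrast, does perturb the result, so its scale has to be chosen with care.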
Beyond Simulation: Paving the Way for Actual Implementation
One key lesson emerging from this work is that federated learning is relatively straightforward in simulation, but deploying it across real institutions introduces new challenges: hardware variability, network latency across countries, IT coordination and regulatory compliance. Through interactive workshops and live experiments, HEREDITARY has moved beyond theoretical experimentation to operational deployment.
Today, the project operates a federated network linking multimodal clinical data without centralising any raw records. Advanced AI models can be trained across distributed datasets and privacy-enhancing technologies can be implemented with limited performance trade-offs. The infrastructure is reliable, secure and resilient. This “data stays at source” approach aligns closely with the principles of the European Health Data Space, demonstrating that privacy-preserving, cross-border health data collaboration is technically feasible.
The next step will arrive in June 2026, when the project moves from validated design to consolidated implementation. Deliverable D2.12 will formalise the full implementation of the federated infrastructure, while Deliverable D2.17 will demonstrate the clinical relevance of Federated Learning by presenting intermediate results from the neurodegenerative use cases. Together, these upcoming milestones will mark a transition from infrastructure validation to scientific and clinical impact.


