TimberStrike: Dataset Reconstruction Attack Revealing Privacy Leakage in Federated Tree-Based Systems

Authors: Marco Di Gennaro (Politecnico di Milano), Giovanni De Lucia (Politecnico di Milano), Stefano Longari (Politecnico di Milano), Stefano Zanero (Politecnico di Milano), Michele Carminati (Politecnico di Milano)

Volume: 2025
Issue: 4
Pages: 566–584
DOI: https://doi.org/10.56553/popets-2025-0145


Abstract: Federated Learning has emerged as a privacy-oriented alternative to centralized Machine Learning, enabling collaborative model training without direct data sharing. While extensively studied for neural networks, the security and privacy implications of tree-based models remain underexplored. This work introduces TimberStrike, an optimization-based dataset reconstruction attack targeting horizontally federated tree-based models. Our attack, carried out by a single client, exploits the discrete nature of decision trees, using split values and decision paths to infer sensitive training data from other clients. We evaluate TimberStrike on state-of-the-art federated gradient boosting implementations across multiple frameworks, including Flower, NVFlare, and FedTree, demonstrating their vulnerability to privacy breaches. On a publicly available stroke prediction dataset, TimberStrike consistently reconstructs between 73.05% and 95.63% of the target dataset across all implementations. We further analyze Differential Privacy, showing that while it partially mitigates the attack, it also significantly degrades model performance. Our findings highlight the need for privacy-preserving mechanisms specifically designed for tree-based Federated Learning systems, and we provide preliminary insights into their design.
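The core signal the abstract alludes to can be illustrated with a minimal sketch (this is not the paper's implementation, and all names below are hypothetical): each root-to-leaf decision path is a conjunction of threshold tests, so intersecting the tests along a path yields per-feature intervals that any training sample routed to that leaf must satisfy.

```python
# Hedged illustration: deriving per-feature value bounds from one decision
# path of a tree. A reconstruction attack can use such bounds, leaked via
# shared split values, to constrain other clients' training samples.

def path_constraints(path, n_features):
    """Intersect the split conditions along a root-to-leaf path.

    path: list of (feature_index, threshold, went_left) tuples, where
          went_left=True means the sample satisfied x[feature] < threshold.
    Returns a list of per-feature (low, high) bounds, open ends as +/-inf.
    """
    bounds = [[float("-inf"), float("inf")] for _ in range(n_features)]
    for feat, thr, went_left in path:
        if went_left:   # x[feat] < thr tightens the upper bound
            bounds[feat][1] = min(bounds[feat][1], thr)
        else:           # x[feat] >= thr tightens the lower bound
            bounds[feat][0] = max(bounds[feat][0], thr)
    return [tuple(b) for b in bounds]

# Example: a path testing feature 0 twice and feature 1 once.
path = [(0, 5.0, True), (1, 2.5, False), (0, 3.0, False)]
print(path_constraints(path, 2))  # -> [(3.0, 5.0), (2.5, inf)]
```

In a boosted ensemble, many trees impose such constraints on the same sample, so the intersected intervals can become narrow enough to recover feature values, which is why the discrete structure of trees leaks more than it might appear to.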

Keywords: Federated Learning, Privacy Attacks, Dataset Reconstruction Attack, Gradient Boosting Decision Trees

Copyright in PoPETs articles is held by their authors. This article is published under a Creative Commons Attribution 4.0 license.