Private Evaluation of Decision Trees using Sublinear Cost

Authors: Anselme Tueno (SAP SE), Florian Kerschbaum (University of Waterloo), Stefan Katzenbeisser (TU Darmstadt)

Volume: 2019
Issue: 1
Pages: 266–286
DOI: https://doi.org/10.2478/popets-2019-0015

Download PDF

Abstract: Decision trees are widespread machine learning models used for data classification and have many applications in areas such as healthcare, remote diagnostics, spam filtering, etc. In this paper, we address the problem of privately evaluating a decision tree on private data. In this scenario, the server holds a private decision tree model and the client wants to classify its private attribute vector using the server’s private model. The goal is to obtain the classification while preserving the privacy of both – the decision tree and the client input. After the computation, only the classification result is revealed to the client, while nothing is revealed to the server. Many existing protocols require a constant number of rounds. However, some of these protocols perform as many comparisons as there are decision nodes in the entire tree and others transform the whole plaintext decision tree into an oblivious program, resulting in higher communication costs. The main idea of our novel solution is to represent the tree as an array. Then we execute only d – the depth of the tree – comparisons. Each comparison is performed using a small garbled circuit, which output secret-shares of the index of the next node. We get the inputs to the comparison by obliviously indexing the tree and the attribute vector. We implement oblivious array indexing using either garbled circuits, Oblivious Transfer or Oblivious RAM (ORAM). Using ORAM, this results in the first protocol with sub-linear cost in the size of the tree. We implemented and evaluated our solution using the different array indexing procedures mentioned above. As a result, we are not only able to provide the first protocol with sublinear cost for large trees, but also reduce the communication cost for the large real-world data set “Spambase” from 18 MB to 1.2 MB and the computation time from 17 seconds to less than 1 second in a LAN setting, compared to the best related work.

Keywords: Private Decision Tree Evaluation, Garbled Circuits, ORAM, Oblivious Transfer

Copyright in PoPETs articles are held by their authors. This article is published under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 license.