MIT researchers have made significant progress on the challenge of protecting sensitive data encoded within machine-learning models. Consider a team of scientists that has developed a machine-learning model that can accurately predict whether a patient has cancer from lung scan images. Sharing that model with hospitals worldwide would be valuable, but it also risks data extraction by malicious agents. To address this issue, the researchers introduced a new privacy metric called Probably Approximately Correct (PAC) Privacy, along with a framework that determines the minimal amount of noise required to protect sensitive data.
Conventional privacy approaches, such as Differential Privacy, focus on preventing an adversary from distinguishing whether specific data were used, which can require adding large amounts of noise and reduces the model’s accuracy. PAC Privacy takes a different perspective: it evaluates how hard it is for an adversary to reconstruct parts of the sensitive data even after noise has been added. For instance, if the sensitive data are human faces, differential privacy would prevent the adversary from determining whether a specific individual’s face was in the dataset. PAC Privacy, in contrast, asks whether an adversary could extract an approximate silhouette that could still be recognized as a particular individual’s face.
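For contrast with what follows, here is a minimal sketch of the standard (ε, δ)-differentially-private Gaussian mechanism: the noise scale is calibrated to the query’s worst-case sensitivity and the chosen privacy budget, regardless of how stable the actual data happen to be. The function name and parameters are illustrative, not part of the paper.

```python
import numpy as np

def gaussian_mechanism(value, sensitivity, epsilon, delta, rng=None):
    """Classic (epsilon, delta)-DP Gaussian mechanism.

    Noise is calibrated to the worst-case sensitivity of the query,
    independent of the data actually observed -- this is the source of
    the accuracy loss the article mentions.
    """
    rng = np.random.default_rng(rng)
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return value + rng.normal(0.0, sigma, size=np.shape(value))
```

Note that a tighter privacy budget (smaller `epsilon` or `delta`) directly inflates `sigma`, which is exactly the accuracy trade-off PAC Privacy tries to soften.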
To implement PAC Privacy, the researchers developed an algorithm that determines the optimal amount of noise to add to a model, guaranteeing privacy even against adversaries with infinite computing power. The algorithm relies on the uncertainty, or entropy, of the original data from the adversary’s perspective. By subsampling the data and running the machine-learning training algorithm many times, it measures the variance across the resulting outputs to determine how much noise is necessary: the smaller the variance, the less noise is required.
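The subsample-train-measure loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors’ exact algorithm: `train_fn`, `noise_multiplier`, and the per-coordinate standard-deviation calibration are all hypothetical stand-ins for the paper’s formal procedure.

```python
import numpy as np

def estimate_noise_scale(data, train_fn, n_trials=50, subsample_frac=0.5, rng=None):
    """Estimate output variability across subsampled training runs.

    `train_fn` maps a data subset (np.ndarray) to a parameter vector.
    Low variance across runs means the training procedure is stable,
    so less noise is needed to mask any single run's output.
    """
    rng = np.random.default_rng(rng)
    n = len(data)
    outputs = []
    for _ in range(n_trials):
        idx = rng.choice(n, size=int(n * subsample_frac), replace=False)
        outputs.append(train_fn(data[idx]))
    return np.stack(outputs).std(axis=0)

def release_privately(data, train_fn, noise_multiplier=1.0, rng=None):
    """Train on the full data, then add noise scaled to the measured variance."""
    rng = np.random.default_rng(rng)
    params = train_fn(data)
    sigma = estimate_noise_scale(data, train_fn, rng=rng)
    return params + rng.normal(0.0, noise_multiplier * sigma, size=params.shape)
```

In this sketch `train_fn` is treated as a black box, which mirrors the point made below: the calibration never inspects the model’s inner workings, only the spread of its outputs.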
One key advantage of the PAC Privacy algorithm is that it requires no knowledge of the model’s inner workings or the training process. Users specify their desired confidence level regarding the adversary’s ability to reconstruct the sensitive data, and the algorithm returns the amount of noise needed to achieve that goal. However, the algorithm does not estimate the accuracy lost by adding that noise. Furthermore, implementing PAC Privacy can be computationally expensive, since it requires repeatedly training machine-learning models on many subsampled datasets.
To enhance PAC Privacy, researchers suggest modifying the machine-learning training process to increase stability, which reduces the variance between subsample outputs. This approach would reduce the algorithm’s computational burden and minimize the amount of noise needed. Additionally, more stable models often exhibit lower generalization errors, leading to more accurate predictions on new data.
While the researchers acknowledge the need for further exploration of the relationship between stability, privacy, and generalization error, their work presents a promising step forward in protecting sensitive data in machine-learning models. By leveraging PAC Privacy, engineers can develop models that safeguard training data while maintaining accuracy in real-world applications. With the potential for significantly reducing the amount of noise required, this technique opens up new possibilities for secure data sharing in the healthcare domain and beyond.
Check out the Paper. All Credit For This Research Goes To the Researchers on This Project.
The post MIT Researchers Achieve a Breakthrough in Privacy Protection for Machine Learning Models with Probably Approximately Correct (PAC) Privacy appeared first on MarkTechPost.