Balancing Utility and Fairness against Privacy in Medical Data

Andrew Chester, Yun Sing Koh, Quan Sun, Jörg Wicker, Junjae Lee: Balancing Utility and Fairness against Privacy in Medical Data. In: IEEE Symposium Series on Computational Intelligence, IEEE, Forthcoming.

Abstract

There are numerous challenges when designing algorithms that interact with sensitive data, such as medical or financial records. One of these challenges is privacy. However, there is a tension between privacy, utility (model accuracy), and fairness. While de-identification techniques, such as generalisation and suppression, have been proposed to enable privacy protection, they come at a cost, specifically to fairness and utility. Recent work on fairness in algorithm design defines fairness as a guarantee of similar outputs for "similar" input data; this notion is discussed in connection with de-identification. This research investigates the trade-off between privacy, fairness, and utility, in contrast to other work that investigates only the trade-off between privacy and the utility of the data or the overall accuracy of the model. We investigate the effects of two standard de-identification techniques, k-anonymity and differential privacy, on both utility and fairness. We propose two measures to calculate the privacy-utility and privacy-fairness trade-offs. Although other research has provided privacy guarantees regarding utility, this research focuses on the trade-offs at set de-identification levels and relies on the guarantees provided by the privacy preservation methods. We discuss the effects of de-identification on data with different characteristics: class imbalance and outcome imbalance. We evaluated these effects on synthetic datasets and standard real-world datasets. As a case study, we analysed the Medical Expenditure Panel Survey dataset.
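For readers unfamiliar with the two de-identification techniques the paper compares, the sketch below shows their textbook forms in Python: a k-anonymity check over quasi-identifiers and the Laplace mechanism for differential privacy. The helper names, the toy records, and the parameter values are illustrative assumptions, not the paper's implementation.

    import numpy as np
    from collections import Counter

    def is_k_anonymous(records, quasi_identifiers, k):
        """True iff every combination of quasi-identifier values occurs at least k times."""
        groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
        return all(count >= k for count in groups.values())

    def laplace_mechanism(true_value, sensitivity, epsilon):
        """Release a numeric answer with epsilon-differential privacy by adding Laplace(sensitivity/epsilon) noise."""
        return true_value + np.random.laplace(0.0, sensitivity / epsilon)

    # Toy records with generalised ages and suppressed postcode digits.
    records = [
        {"age": "30-39", "postcode": "10**", "condition": "A"},
        {"age": "30-39", "postcode": "10**", "condition": "B"},
        {"age": "40-49", "postcode": "20**", "condition": "A"},
        {"age": "40-49", "postcode": "20**", "condition": "A"},
    ]
    print(is_k_anonymous(records, ["age", "postcode"], k=2))  # True

    # A counting query has sensitivity 1; release it under epsilon = 0.5.
    true_count = sum(r["condition"] == "A" for r in records)
    print(laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5))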
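The two trade-off measures the abstract mentions are not defined here, so the following is only a hypothetical stand-in for what such measures could compare at a fixed de-identification level (a given k or epsilon): the utility lost and the fairness-gap change relative to a model trained on the raw data. The function names and the example numbers are invented for illustration.

    def utility_tradeoff(acc_baseline, acc_private):
        # Fraction of baseline accuracy lost through de-identification.
        return (acc_baseline - acc_private) / acc_baseline

    def fairness_tradeoff(gap_baseline, gap_private):
        # Change in a group-fairness gap (e.g. a demographic-parity difference);
        # a positive value means de-identification worsened fairness.
        return gap_private - gap_baseline

    # Hypothetical numbers: accuracy 0.90 -> 0.84, fairness gap 0.05 -> 0.12.
    print(utility_tradeoff(0.90, 0.84))   # about 0.067
    print(fairness_tradeoff(0.05, 0.12))  # 0.07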

    BibTeX (Download)

    @inproceedings{chester2020balancing,
    title = {Balancing Utility and Fairness against Privacy in Medical Data},
    author = {Andrew Chester and Yun Sing Koh and Quan Sun and J\"{o}rg Wicker and Junjae Lee},
    year = {2020},
    date = {2020-12-01},
    booktitle = {IEEE Symposium Series on Computational Intelligence},
    publisher = {IEEE},
    abstract = {There are numerous challenges when designing algorithms that interact with sensitive data, such as medical or financial records. One of these challenges is privacy. However, there is a tension between privacy, utility (model accuracy), and fairness. While de-identification techniques, such as generalisation and suppression, have been proposed to enable privacy protection, they come at a cost, specifically to fairness and utility. Recent work on fairness in algorithm design defines fairness as a guarantee of similar outputs for "similar" input data; this notion is discussed in connection with de-identification. This research investigates the trade-off between privacy, fairness, and utility, in contrast to other work that investigates only the trade-off between privacy and the utility of the data or the overall accuracy of the model. We investigate the effects of two standard de-identification techniques, k-anonymity and differential privacy, on both utility and fairness. We propose two measures to calculate the privacy-utility and privacy-fairness trade-offs. Although other research has provided privacy guarantees regarding utility, this research focuses on the trade-offs at set de-identification levels and relies on the guarantees provided by the privacy preservation methods. We discuss the effects of de-identification on data with different characteristics: class imbalance and outcome imbalance. We evaluated these effects on synthetic datasets and standard real-world datasets. As a case study, we analysed the Medical Expenditure Panel Survey dataset.},
    keywords = {accuracy, computational sustainability, data mining, fairness, imbalance, machine learning, medicine, privacy},
    pubstate = {forthcoming},
    tppubtype = {inproceedings}
    }