Machine Learning Can be Fair and Accurate - News - Carnegie Mellon University

Accepted submission by upstart at 2021-10-25 23:34:00
News

████ # This file was generated bot-o-matically! Edit at your own risk. ████

Machine Learning Can Be Fair and Accurate - News - Carnegie Mellon University [cmu.edu]:

Machine Learning Can Be Fair and Accurate: CMU Researchers Dispel Theoretical Assumption About ML Trade-Offs in Policy Decisions

By Aaron Aupperlee, aaupperlee(through)cmu.edu

Media Inquiries: Aaron Aupperlee

  • School of Computer Science
  • aaupperlee(through)cmu.edu

Carnegie Mellon University researchers are challenging a long-held assumption that there is a trade-off between accuracy and fairness when using machine learning to make public policy decisions.

As the use of machine learning has increased in areas such as criminal justice, hiring, health care delivery and social service interventions, concerns have grown over whether such applications introduce new or amplify existing inequities, especially among racial minorities and people with economic disadvantages. To guard against this bias, adjustments are made to the data, labels, model training, scoring systems and other aspects of the machine learning system. The underlying theoretical assumption is that these adjustments make the system less accurate.
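To make the kind of adjustment this paragraph describes concrete, here is a minimal sketch of one common pre-processing tweak: reweighting training examples so that no group/outcome combination dominates an otherwise off-the-shelf classifier. This is an illustration under assumed names and data, not the researchers' method; the only assumed dependencies are NumPy and scikit-learn.

    # Illustrative pre-processing adjustment (not the authors' method):
    # reweight training examples so that each (group, outcome) combination
    # contributes equally to an otherwise off-the-shelf classifier.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def balanced_sample_weights(groups, labels):
        """Weight each example inversely to the size of its (group, label) cell."""
        groups, labels = np.asarray(groups), np.asarray(labels)
        weights = np.ones(len(labels), dtype=float)
        for g in np.unique(groups):
            for y in np.unique(labels):
                cell = (groups == g) & (labels == y)
                if cell.any():
                    weights[cell] = len(labels) / cell.sum()
        return weights / weights.mean()  # normalize so the average weight is 1

    # Hypothetical usage, with feature matrix X, outcomes y and protected attribute group:
    # w = balanced_sample_weights(group, y)
    # model = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=w)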

A CMU team aims to dispel that assumption in a new study, recently published in Nature Machine Intelligence [nature.com]. Rayid Ghani [rayidghani.com], a professor in the School of Computer Science's Machine Learning Department [cmu.edu] and the Heinz College of Information Systems and Public Policy [cmu.edu]; Kit Rodolfa [cmu.edu], a research scientist in ML; and Hemank Lamba, a post-doctoral researcher in SCS, tested that assumption in real-world applications and found the trade-off was negligible in practice across a range of policy domains.

"You actually can get both. You don't have to sacrifice accuracy to build systems that are fair and equitable," Ghani said. "But it does require you to deliberately design systems to be fair and equitable. Off-the-shelf systems won't work."

Ghani and Rodolfa focused on situations where in-demand resources are limited, and machine learning systems are used to help allocate those resources. The researchers looked at systems in four areas: prioritizing limited mental health care outreach based on a person's risk of returning to jail to reduce reincarceration; predicting serious safety violations to better deploy a city's limited housing inspectors; modeling the risk of students not graduating from high school in time to identify those most in need of additional support; and helping teachers reach crowdfunding goals for classroom needs.
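All four settings share the same structure: a predicted risk score for each individual and a fixed intervention budget, with the highest-risk individuals prioritized. The sketch below illustrates that shared top-k allocation step under those assumptions; the function and variable names are hypothetical, not taken from the authors' code.

    # Shared structure of the four applications (names are hypothetical):
    # score everyone, then spend a fixed intervention budget k on the
    # highest-risk individuals.
    import numpy as np

    def top_k_allocation(scores, k):
        """Return a boolean mask selecting the k highest-scoring individuals."""
        scores = np.asarray(scores)
        selected = np.zeros(len(scores), dtype=bool)
        selected[np.argsort(-scores)[:k]] = True
        return selected

    # Example: a budget of 3 outreach slots among 6 people
    risk = np.array([0.91, 0.15, 0.77, 0.60, 0.88, 0.42])
    print(top_k_allocation(risk, k=3))  # [ True False  True False  True False]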

In each context, the researchers found that models optimized for accuracy — standard practice for machine learning — could effectively predict the outcomes of interest but exhibited considerable disparities in recommendations for interventions. However, when the researchers applied adjustments to the outputs of the models that targeted improving their fairness, they discovered that disparities based on race, age or income — depending on the situation — could be removed without a loss of accuracy.
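The adjustments the researchers describe are post-hoc: they change how the ranked list is turned into an intervention roster rather than how the model is trained. A simplified way to picture this is group-specific selection within the same overall budget, which is equivalent to applying a different score threshold per group. The sketch below is a hedged illustration of that idea, not the paper's exact mitigation procedure; the budget shares would in practice be estimated from historical or validation data.

    # Hedged sketch of a post-hoc adjustment in the spirit described above:
    # keep the same overall budget k, but give each group a slice of it and
    # take the top-scored individuals within each group. This is equivalent
    # to using a different score threshold per group. Everything here is
    # illustrative, not the authors' exact procedure.
    import numpy as np

    def group_threshold_selection(scores, groups, k, budget_share):
        """Select about k individuals, splitting the budget across groups
        according to budget_share (a dict of group -> fraction summing to 1)."""
        scores, groups = np.asarray(scores), np.asarray(groups)
        selected = np.zeros(len(scores), dtype=bool)
        for g, share in budget_share.items():
            idx = np.where(groups == g)[0]
            k_g = min(int(round(k * share)), len(idx))
            selected[idx[np.argsort(-scores[idx])[:k_g]]] = True
        return selected

    # Hypothetical usage: shares chosen so each group's recall comes out similar.
    # chosen = group_threshold_selection(scores, race, k=500,
    #                                    budget_share={"A": 0.6, "B": 0.4})

Because the total number of people selected stays at k, the overall precision of the intervention list need not fall, which is consistent with the negligible trade-off the researchers report.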

Ghani and Rodolfa hope this research will start to change the minds of fellow researchers and policymakers as they consider the use of machine learning in decision making.

"We want the artificial intelligence, computer science and machine learning communities to stop accepting this assumption of a trade-off between accuracy and fairness and to start intentionally designing systems that maximize both," Rodolfa said. "We hope policymakers will embrace machine learning as a tool in their decision making to help them achieve equitable outcomes."

Empirical observation of negligible fairness–accuracy trade-offs in machine learning for public policy [nature.com]:

The growing use of machine learning in policy and social impact settings has raised concerns over fairness implications, especially for racial minorities. These concerns have generated considerable interest among machine learning and artificial intelligence researchers, who have developed new methods and established theoretical bounds for improving fairness, focusing on the source data, regularization and model training, or post-hoc adjustments to model scores. However, few studies have examined the practical trade-offs between fairness and accuracy in real-world settings to understand how these bounds and methods translate into policy choices and impact on society. Our empirical study fills this gap by investigating the impact of mitigating disparities on accuracy, focusing on the common context of using machine learning to inform benefit allocation in resource-constrained programmes across education, mental health, criminal justice and housing safety. Here we describe applied work in which we find fairness–accuracy trade-offs to be negligible in practice. In each setting studied, explicitly focusing on achieving equity and using our proposed post-hoc disparity mitigation methods, fairness was substantially improved without sacrificing accuracy. This observation was robust across policy contexts studied, scale of resources available for intervention, time and the relative size of the protected groups. These empirical results challenge a commonly held assumption that reducing disparities requires either accepting an appreciable drop in accuracy or the development of novel, complex methods, making reducing disparities in these applications more practical.
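To make "fairness was substantially improved without sacrificing accuracy" measurable, each side of the trade-off needs a metric. The sketch below shows one plausible pairing, assumed for illustration rather than taken from the paper: precision among those selected as the accuracy measure, and the ratio of the lowest to highest per-group recall as the disparity measure.

    # One plausible way to quantify both sides of the trade-off (metric choices
    # are assumptions for illustration, not necessarily the paper's).
    import numpy as np

    def precision_of_selection(selected, labels):
        """Fraction of selected individuals who actually had the outcome."""
        selected, labels = np.asarray(selected, dtype=bool), np.asarray(labels)
        return labels[selected].mean() if selected.any() else 0.0

    def recall_disparity_ratio(selected, labels, groups):
        """1.0 means per-group recall is equal; smaller values mean more disparity."""
        selected = np.asarray(selected, dtype=bool)
        labels, groups = np.asarray(labels), np.asarray(groups)
        recalls = []
        for g in np.unique(groups):
            positives = (groups == g) & (labels == 1)
            if positives.any():
                recalls.append((selected & positives).sum() / positives.sum())
        return min(recalls) / max(recalls) if recalls and max(recalls) > 0 else 0.0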

Data availability

Data from the inmate mental health context were shared through a partnership and data use agreement with the county government of Johnson County, KS (which collected and made available data from the county- and city-level agencies in their jurisdiction as described in the Methods). Data from the housing safety context were shared through a partnership and data use agreement with the Code Enforcement Division in the city of San Jose, CA. Data from the student outcomes setting were shared through a partnership and data use agreement with the Ministry of Education in El Salvador. Although the sensitive nature of the data for these three contexts required that the work was performed under strict data use agreements and the data cannot be made publicly available, researchers or practitioners interested in collaborating on these projects or with the agencies involved should contact the corresponding author for more information and introductions. The education crowdfunding dataset is publicly available at https://www.kaggle.com/c/kdd-cup-2014-predicting-excitement-at-donors-choose/data [kaggle.com]. A database extract with model outputs and disparity mitigation results using this dataset is available for download (see replication instructions in the GitHub repository linked in the code availability statement).

Code availability

The code used here for modelling, disparity mitigation and analysis for all four projects is available at https://github.com/dssg/peeps-chili [github.com] (ref. 44 [nature.com]). Complete instructions for replication of the education crowdfunding results reported here can be found in the README of this repository, along with a step-by-step Jupyter notebook for performing the analysis.

References

  1. Chouldechova, A. Fair prediction with disparate impact: a study of bias in recidivism prediction instruments. Big Data 5, 153–163 (2017).
  2. Skeem, J. L. & Lowenkamp, C. T. Risk, race, and recidivism: predictive bias and disparate impact. Criminology 54, 680–712 (2016).
  3. Angwin, J., Larson, J., Mattu, S. & Kirchner, L. Machine bias. ProPublica (23 May 2016); www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing [propublica.org]
  4. Raghavan, M., Barocas, S., Kleinberg, J. & Levy, K. Mitigating bias in algorithmic hiring: evaluating claims and practices. In Proc. 2020 Conference on Fairness, Accountability, and Transparency (eds Hildebrandt, M. & Castillo, C.) 469–481 (ACM, 2020).
  5. Obermeyer, Z., Powers, B., Vogeli, C. & Mullainathan, S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 447–453 (2019).
  6. Ramachandran, A. et al. Predictive analytics for retention in care in an urban HIV clinic. Sci. Rep. https://doi.org/10.1038/s41598-020-62729-x [doi.org] (2020).
  7. Bauman, M. J. et al. Reducing incarceration through prioritized interventions. In Proc. 1st Conference on Computing and Sustainable Societies (COMPASS) (ed. Zegura, E.) 1–8 (ACM, 2018).
  8. Chouldechova, A. et al. A case study of algorithm-assisted decision making in child maltreatment hotline screening decisions. Proc. Mach. Learn. Res. 81, 134–148 (2018).
  9. Potash, E. et al. Predictive modeling for public health: preventing childhood lead poisoning. In Proc. 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (eds Cao, L. & Zhang, C.) 2039–2047 (ACM, 2015).
  10. Chen, I. Y., Johansson, F. D. & Sontag, D. Why is my classifier discriminatory? In Proc. 32nd International Conference on Neural Information Processing Systems (eds Bengio, S., Wallach, H. M., Larochelle, H., Grauman, K. & Cesa-Bianchi, N.) 3539–3550 (NIPS, 2018).
  11. Celis, L. E., Huang, L., Keswani, V. & Vishnoi, N. K. Classification with fairness constraints: a meta-algorithm with provable guarantees. In Proc. 2019 Conference on Fairness, Accountability, and Transparency (eds Boyd, D. & Morgenstern, J.) 319–328 (ACM, 2019).
  12. Zafar, M. B., Valera, I., Rodriguez, M. G. & Gummadi, K. P. Fairness beyond disparate treatment and disparate impact: learning classification without disparate mistreatment. In 26th International World Wide Web Conference (eds Barrett, R. & Cummings, R.) 1171–1180 (WWW, 2017).
  13. Dwork, C., Immorlica, N., Kalai, A. T. & Leiserson, M. Decoupled classifiers for group-fair and efficient machine learning. Proc. Mach. Learn. Res. 81, 119–133 (2018).
  14. Hardt, M., Price, E. & Srebro, N. Equality of opportunity in supervised learning. In Proc. 30th International Conference on Neural Information Processing Systems (eds Lee, D. D., von Luxburg, U., Garnett, R., Sugiyama, M. & Guyon, I.) 3315–3323 (NIPS, 2016).
  15. Rodolfa, K. T. et al. Case study: predictive fairness to reduce misdemeanor recidivism through social service interventions. In Proc. 2020 Conference on Fairness, Accountability, and Transparency (eds Hildebrandt, M. & Castillo, C.) 142–153 (ACM, 2020).
  16. Heidari, H., Gummadi, K. P., Ferrari, C. & Krause, A. Fairness behind a veil of ignorance: a welfare analysis for automated decision making. In Proc. 32nd International Conference on Neural Information Processing Systems (eds Bengio, S., Wallach, H. M., Larochelle, H., Grauman, K. & Cesa-Bianchi, N.) 1265–1276 (NIPS, 2018).
  17. Friedler, S. A. et al. A comparative study of fairness-enhancing interventions in machine learning. In Proc. 2019 Conference on Fairness, Accountability, and Transparency (eds Boyd, D. & Morgenstern, J.) 329–338 (ACM, 2019).
  18. Kearns, M., Roth, A., Neel, S. & Wu, Z. S. An empirical study of rich subgroup fairness for machine learning. In Proc. 2019 Conference on Fairness, Accountability, and Transparency (eds Boyd, D. & Morgenstern, J.) 100–109 (ACM, 2019).
  19. Zafar, M. B., Valera, I., Rodriguez, M. G. & Gummadi, K. P. Fairness constraints: mechanisms for fair classification. Proc. 20th International Conference on Artificial Intelligence and Statistics (eds Singh, A. & Zhu, J.) 962–970 (PMLR, 2017).
  20. Ghani, R., Walsh, J. & Wang, J. Top 10 ways your Machine Learning models may have leakage (Data Science for Social Good Blog, 2020); http://www.rayidghani.com/2020/01/24/top-10-ways-your-machine-learning-models-may-have-leakage [rayidghani.com]
  21. Verma, S. & Rubin, J. Fairness definitions explained. In Proc. 2018 International Workshop on Software Fairness (eds Brun, Y., Johnson, B. & Meliou, A.) 1–7 (IEEE/ACM, 2018).
  22. Gajane, P. & Pechenizkiy, M. On formalizing fairness in prediction with machine learning. Preprint at https://arxiv.org/abs/1710.03184 [arxiv.org] (2018).
  23. Kleinberg, J. M., Mullainathan, S. & Raghavan, M. Inherent trade-offs in the fair determination of risk scores. In Proc. 8th Innovations in Theoretical Computer Science Conference (ed. Psounis, K.) 1–43 (ITCS, 2017).
  24. Krishna Menon, A. & Williamson, R. C. The cost of fairness in binary classification. In Proc. 1st Conference on Fairness, Accountability, and Transparency (eds Friedler, S. & Wilson, C.) 107–118 (PMLR, 2018).
  25. Huq, A. Racial equity in algorithmic criminal justice. Duke Law J. 68, 1043–1134 (2019).
  26. Hamilton, M. People with complex needs and the criminal justice system. Curr. Iss. Crim. Justice 22, 307–324 (2010).
  27. James, D. J. & Glaze, L. E. Mental Health Problems of Prison and Jail Inmates (Department of Justice, Bureau of Justice Statistics, 2006); https://www.bjs.gov/content/pub/pdf/mhppji.pdf [bjs.gov]
  28. Fuller Torrey, E., Kennard, A. D., Eslinger, D., Lamb, R. & Pavle, J. More Mentally Ill Persons Are in Jails and Prisons Than Hospitals: A Survey of the States (Treatment Advocacy Center and National Sheriffs’ Association, 2010); http://tulare.networkofcare.org/library/final_jails_v_hospitals_study1.pdf [networkofcare.org]
  29. Holtzen, H., Klein, E. G., Keller, B. & Hood, N. Perceptions of physical inspections as a tool to protect housing quality and promote health equity. J. Health Care Poor Underserv. 27, 549–559 (2016).
  30. Klein, E., Keller, B., Hood, N. & Holtzen, H. Affordable housing and health: a health impact assessment on physical inspection frequency. J. Public Health Manage. Practice 21, 368–374 (2015).
  31. Athey, S. Beyond prediction: using big data for policy problems. Science 355, 483–485 (2017).
  32. Glaeser, E. L., Hillis, A., Kominers, S. D. & Luca, M. Crowdsourcing city government: using tournaments to improve inspection accuracy. Am. Econ. Rev. 106, 114–118 (2016).
  33. Levin, H. M. & Belfield, C. The Price We Pay: Economic and Social Consequences of Inadequate Education (Brookings Institution, 2007).
  34. Atwell, M. N., Balfanz, R., Bridgeland, J. & Ingram, E. Building a Grad Nation (America’s Promise Alliance, 2019); https://www.americaspromise.org/2019-building-grad-nation-report [americaspromise.org]
  35. Lakkaraju, H. et al. A machine learning framework to identify students at risk of adverse academic outcomes. In Proc. 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (eds Cao, L. & Zhang, C.) 1909–1918 (ACM, 2015).
  36. Aguiar, E. et al. Who, when, and why: a machine learning approach to prioritizing students at risk of not graduating high school on time. In Proc. Fifth International Conference on Learning Analytics and Knowledge (eds Baron, J., Lynch, G. & Maziarz, N.) 93–102 (ACM, 2015).
  37. Bowers, A. J., Sprott, R. & Taff, S. A. Do we know who will drop out? A review of the predictors of dropping out of high school: precision, sensitivity, and specificity. High School J. 96, 77–100 (2012).
  38. Morgan, I. & Amerikaner, A. Funding Gaps 2018 (The Education Trust, 2018); https://edtrust.org/wp-content/uploads/2014/09/FundingGapReport_2018_FINAL.pdf [edtrust.org]
  39. Hurza, M. What Do Teachers Spend on Supplies (Adopt a Classroom, 2015); https://www.adoptaclassroom.org/2015/09/15/infographic-recent-aac-survey-results-on-teacher-spending/ [adoptaclassroom.org]
  40. Ghani, R. Triage (Center for Data Science and Public Policy, 2016); http://www.datasciencepublicpolicy.org/projects/triage/ [datasciencepublicpolicy.org]
  41. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
  42. Roberts, D. R. et al. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 40, 913–929 (2017).
  43. Ye, T. et al. Using machine learning to help vulnerable tenants in New York City. In Proc. 2nd Conference on Computing and Sustainable Societies (COMPASS) (eds Chen, J., Mankoff, J. & Gomes C.) 248–258 (ACM, 2019).
  44. Rodolfa, K. T. & Lamba, H. dssg/peeps-chili: release for trade-offs submission. Zenodo https://doi.org/10.5281/zenodo.5173254 [doi.org] (2021).

Acknowledgements

We thank the Data Science for Social Good Fellowship fellows, project partners and funders, as well as our colleagues at the Center for Data Science and Public Policy at the University of Chicago for the initial work on projects that were extended and used in this study. We also thank K. Amarasinghe for helpful discussions on the study and drafts of this paper. Parts of this work were funded by the National Science Foundation under grant number IIS-2040929 (to K.T.R. and R.G.) and by a grant (unnumbered) from the C3.ai Digital Transformation Institute (to K.T.R., H.L. and R.G.).

Contributions

K.T.R. and R.G. conceptualized the study. K.T.R. designed the methodology, contributed to the software and investigation and wrote the original draft. H.L. contributed to the software and investigation, and reviewed and edited the manuscript. R.G. supervised the study, acquired funding and edited and reviewed the manuscript.

Corresponding author

Correspondence to Rayid Ghani.

Competing interests

The authors declare no competing interests.

Peer review information: Nature Machine Intelligence thanks Nikhil Garg, Kristian Kersting and Allison Koenecke for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information [springer.com]

Supplementary Discussion, Figs. 1–7 and Tables 1–4.

Cite this article

Rodolfa, K. T., Lamba, H. & Ghani, R. Empirical observation of negligible fairness–accuracy trade-offs in machine learning for public policy. Nat. Mach. Intell. 3, 896–904 (2021). https://doi.org/10.1038/s42256-021-00396-x

Journal Reference:
Chouldechova, Alexandra. Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments, Big Data (DOI: 10.1089/big.2016.0047 [doi.org])
Skeem, Jennifer L., Lowenkamp, Christopher T. Risk, Race, and Recidivism: Predictive Bias and Disparate Impact, Criminology (DOI: 10.1111/1745-9125.12123 [doi.org])
Dissecting racial bias in an algorithm used to manage the health of populations, Science (DOI: 10.1126/science.aax2342 [doi.org])
Ramachandran, Arthi, Kumar, Avishek, Koenig, Hannes, et al. Predictive Analytics for Retention in Care in an Urban HIV Clinic [open], Scientific Reports (DOI: 10.1038/s41598-020-62729-x [doi.org])
Hamilton, Margaret. People with Complex Needs and the Criminal Justice System, Current Issues in Criminal Justice (DOI: 10.1080/10345329.2010.12035888 [doi.org])
Holtzen, Holly, Klein, Elizabeth G., Keller, Brittney, et al. Perceptions of Physical Inspections as a Tool to Protect Housing Quality and Promote Health Equity, Journal of Health Care for the Poor and Underserved (DOI: 10.1353/hpu.2016.0082 [doi.org])
Journal of Public Health Management and Practice (DOI: 10.1097/PHH.0000000000000138 [doi.org])
Beyond prediction: Using big data for policy problems, Science (DOI: 10.1126/science.aal4321 [doi.org])
Glaeser, Edward L., Hillis, Andrew, Kominers, Scott Duke, et al. Crowdsourcing City Government: Using Tournaments to Improve Inspection Accuracy, American Economic Review (DOI: 10.1257/aer.p20161027 [doi.org])
Bowers, Alex J., Sprott, Ryan, Taff, Sherry A. Do We Know Who Will Drop Out? A Review of the Predictors of Dropping out of High School: Precision, Sensitivity, and Specificity, The High School Journal (DOI: 10.1353/hsj.2013.0000 [doi.org])
Roberts, David R., Bahn, Volker, Ciuti, Simone, et al. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure [open], Ecography (DOI: 10.1111/ecog.02881 [doi.org])
Rodolfa, Kit, Lamba, Hemank. dssg/peeps-chili: Release for trade-offs submission, Zenodo (DOI: 10.5281/zenodo.5173254 [doi.org])
Rodolfa, Kit T., Lamba, Hemank, Ghani, Rayid. Empirical observation of negligible fairness–accuracy trade-offs in machine learning for public policy, Nature Machine Intelligence (DOI: 10.1038/s42256-021-00396-x [doi.org])


Original Submission