From the latest weather updates on Alexa to movie recommendations on Netflix to helping doctors diagnose cancer, AI (artificial intelligence) now impacts all aspects of our lives. But with great power comes great responsibility. It is becoming increasingly important to ensure that decisions made by AI are free from bias and prejudice.
We are seeing a growing need for building responsible AI systems that focus on the tenets of reproducibility, transparency, accountability, privacy and security. In this article we will talk about fairness as a major enabler for transparency.
Fairness is a generic concept not limited to AI. Any decision-making system can exhibit bias towards certain factors and thus ought to be evaluated for fairness. Fairness here is tested by verifying if a bias is ethically valid.
So, what is bias?
Bias is the influence of preconceived notions and prejudices on decision-making. The human brain exhibits cognitive biases, which are the mental shortcuts it takes to make decisions quickly. Examples include confirmation bias (favoring information that supports preconceived notions), the bandwagon effect (being influenced by decisions many others have made) and the gambler’s fallacy (believing that past events change the probability of independent future events).
Machines or computers making decisions can also exhibit bias. Selection bias is a data bias that occurs when we subjectively collect training data, so it does not provide a true representation of the population. Confirmation bias is when we collect data specifically with a predetermined assumption in mind.
Not all bias comes from how data is collected, though. Correlation fallacy is an algorithmic bias where we see correlation in data and assume causation. Errors in how data is labeled can also lead to bias.
How does bias in AI cause harm?
Let’s look at some examples of biased AI systems that have made headlines. The classic case is the ProPublica analysis of the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) software. COMPAS has been used extensively in the US criminal justice system to predict the risk of recidivism, based on 137 questions that are either answered by defendants or pulled from criminal records. COMPAS is a proprietary black-box system developed by Northpointe, and we don’t really know how it arrives at decisions that could cause defendants to be jailed.
The ProPublica study reported that the algorithm was biased: African American defendants who did not go on to re-offend were almost twice as likely as white defendants to be labelled higher risk. The study was widely discussed in the news and clearly highlighted the need for systems that are explainable and fair.
Another example is the AI-based recruiting tool used by Amazon that rated male candidates more favorably than female candidates. Here Amazon did not use any protected attributes (race, gender, age) in the decision-making process. However, because the bias existed historically in the data that was collected, it was reflected in the algorithm. This illustrates a major point: fairness through unawareness doesn’t work. You cannot ignore protected attributes and assume your algorithm will not show bias.
So how do we define and measure fairness?
There are many definitions of fairness, and they often conflict with each other. The legal definitions are disparate treatment (intentional) and disparate impact (unintentional). The most common measure of disparate impact is the Four-Fifths Rule, which states that if the selection rate for a certain group is less than 80 percent of that of the group with the highest selection rate, there is adverse impact on that group.
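As a rough illustration of the Four-Fifths Rule, the Python sketch below computes each group’s selection rate and flags any group whose rate falls below 80 percent of the highest rate. The function names and the applicant data are hypothetical and exist only for this example; they are not taken from any legal guidance or library.

```python
from collections import Counter


def selection_rates(groups, selected):
    """Return the fraction of selected individuals in each group."""
    totals = Counter(groups)
    chosen = Counter(g for g, s in zip(groups, selected) if s)
    return {g: chosen[g] / totals[g] for g in totals}


def four_fifths_check(groups, selected, threshold=0.8):
    """Map each group to (ratio to the highest selection rate, flagged?)."""
    rates = selection_rates(groups, selected)
    best = max(rates.values())
    if best == 0:
        # Nobody was selected in any group, so no ratio can be computed.
        raise ValueError("no group has a non-zero selection rate")
    return {g: (r / best, r / best < threshold) for g, r in rates.items()}


# Hypothetical data: 10 applicants in group "A" and 10 in group "B".
groups = ["A"] * 10 + ["B"] * 10
selected = [1] * 6 + [0] * 4 + [1] * 3 + [0] * 7

result = four_fifths_check(groups, selected)
for group, (ratio, flagged) in sorted(result.items()):
    print(group, round(ratio, 2), "adverse impact" if flagged else "ok")
```

With this made-up data, group A is selected at 60 percent and group B at 30 percent, so B’s ratio to the highest rate is 0.5 and it is flagged for adverse impact.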
The definition more familiar to software folks is fairness through unawareness. Here you omit the protected attributes from your calculations and assume the model will be fair. As we saw in the Amazon case study, this doesn’t work.
Many other features are intrinsically related to protected attributes like gender and race, so the bias creeps back into your model. A better metric is statistical parity difference, which measures the gap in selection rate between the unprivileged and privileged groups; the goal is to bring that gap close to zero. An alternative metric is equal opportunity, where the focus is on achieving the same true positive rate across groups. Rather than looking only at selection rates, it checks that qualified individuals from the privileged and unprivileged groups get the same treatment.
Sometimes along with true positive rate an effort is made to equalize false positive rate across groups. This is measured through a metric called equalized odds.
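To make these group metrics concrete, here is a minimal Python sketch that computes statistical parity difference, the true positive rate gap used by equal opportunity, and a simple equalized odds gap. The labels, predictions, group names and function names are all hypothetical. Summarizing equalized odds as the larger of the two absolute rate gaps is an assumption made for this example, and fairness toolkits may define the summary differently.

```python
def rate(values):
    """Mean of a list of 0/1 values; fail loudly on an empty list."""
    if not values:
        raise ValueError("no samples available to compute a rate")
    return sum(values) / len(values)


def group_rates(y_true, y_pred, groups, group):
    """Selection rate, true positive rate and false positive rate for one group."""
    rows = [(t, p) for t, p, g in zip(y_true, y_pred, groups) if g == group]
    return {
        "selection": rate([p for _, p in rows]),
        "tpr": rate([p for t, p in rows if t == 1]),
        "fpr": rate([p for t, p in rows if t == 0]),
    }


def fairness_report(y_true, y_pred, groups, unprivileged, privileged):
    """Differences are unprivileged minus privileged; 0 means parity."""
    u = group_rates(y_true, y_pred, groups, unprivileged)
    p = group_rates(y_true, y_pred, groups, privileged)
    tpr_gap = u["tpr"] - p["tpr"]
    fpr_gap = u["fpr"] - p["fpr"]
    return {
        "statistical_parity_difference": u["selection"] - p["selection"],
        "equal_opportunity_difference": tpr_gap,
        # Assumed summary: the worse of the TPR and FPR gaps.
        "equalized_odds_gap": max(abs(tpr_gap), abs(fpr_gap)),
    }


# Hypothetical data: 8 people in privileged group "P", 8 in unprivileged "U".
groups = ["P"] * 8 + ["U"] * 8
y_true = [1, 1, 1, 1, 0, 0, 0, 0] * 2          # 1 = actually qualified
y_pred = [1, 1, 1, 0, 1, 0, 0, 0,              # model decisions for "P"
          1, 1, 0, 0, 0, 0, 0, 0]              # model decisions for "U"

report = fairness_report(y_true, y_pred, groups, unprivileged="U", privileged="P")
for name, value in report.items():
    print(name, value)
```

In this made-up data the unprivileged group is selected less often (0.25 versus 0.5) and its qualified members are approved less often (0.5 versus 0.75), so both differences come out at -0.25 and the equalized odds gap is 0.25. Which of these gaps matters most depends on the context of the decision being made.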
It is important to understand the context of the problem you are solving and choose an acceptable fairness metric in line with the ethical principles and guidelines of your organization. Fairness through unawareness does not work. It’s time data scientists clearly defined fairness and included appropriate metrics in their ML (machine learning) pipelines.
Dattaraj Rao is chief architect, research and innovation, Persistent Systems. The views in this article are his own.