Peter Varhol, Vacasa
We would like to think that AI-based machine learning systems always produce the right answer within their problem domain. However, in reality, their performance is a direct result of the data used to train them. The answers in production are only as good as that training data.
But data collected by human means, such as surveys, observations, or estimates, can carry built-in human biases, such as confirmation bias or representativeness bias. Even seemingly objective measurements can measure the wrong things or miss essential information about the problem domain.
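The sketch below is a minimal, hypothetical illustration of this effect in Python: the training labels for one group are systematically under-recorded (a stand-in for biased human data collection), and a simple logistic regression model learns and reproduces that bias. The data, group names, and bias rate are invented for illustration and are not part of the presentation.

```python
# Hypothetical illustration: biased label collection propagates into a model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

group = rng.integers(0, 2, n)          # 0 = group A, 1 = group B (illustrative)
signal = rng.normal(size=n)            # a feature that genuinely predicts the outcome
true_label = (signal > 0).astype(int)  # ground truth, identical for both groups

# Simulated human bias: 40% of group B's true positives are recorded as negatives.
recorded = true_label.copy()
misrecorded = (group == 1) & (true_label == 1) & (rng.random(n) < 0.4)
recorded[misrecorded] = 0

X = np.column_stack([signal, group])
model = LogisticRegression().fit(X, recorded)
pred = model.predict(X)

for g, name in ((0, "A"), (1, "B")):
    print(f"group {name}: predicted-positive rate = {pred[group == g].mean():.2f}")
# Although the underlying signal is identical for both groups, the model
# predicts far fewer positives for group B; it has learned the labeling bias.
```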
The effects of biased data can be even more insidious. AI systems often function as black boxes, meaning technologists cannot see how the system reached its conclusion, which makes it particularly hard to identify any inequality, bias, or discrimination feeding into a particular decision.
This presentation explains how AI systems can suffer from the same biases as human experts, and how that could lead to biased results. It examines how testers, data scientists, and other stakeholders can develop test cases to recognize biases, both in data and the resulting system, and how to address those biases.
Key takeaways include:
- How data influences how machine learning systems make decisions.
- How selecting the wrong data, or ambiguous data, can bias machine learning results.
- Why we don’t have insight into how machine learning systems make decisions.
- How we can identify and correct bias in machine learning systems.
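As one concrete example of a bias-focused test case, the sketch below checks whether a model's favorable decisions are distributed roughly evenly across two groups, using the ratio of positive-prediction rates (the "four-fifths rule" heuristic). The function name, the sample data, and the 0.8 threshold are illustrative assumptions, not something prescribed by the presentation.

```python
# A minimal, illustrative test-style check for one common bias signal:
# the ratio of positive-prediction rates between two groups.
import numpy as np

def disparate_impact_ratio(predictions, groups, protected, reference):
    """Positive-prediction rate of the protected group divided by that of the reference group."""
    predictions = np.asarray(predictions)
    groups = np.asarray(groups)
    return predictions[groups == protected].mean() / predictions[groups == reference].mean()

def test_predictions_meet_four_fifths_rule():
    # Hypothetical model outputs (1 = favorable decision) and group labels.
    preds  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
    groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
    ratio = disparate_impact_ratio(preds, groups, protected="B", reference="A")
    assert ratio >= 0.8, f"Possible bias: disparate impact ratio {ratio:.2f} is below 0.8"
```

A check like this does not prove a system is unbiased; it simply gives testers a repeatable signal that a decision threshold or a training set deserves a closer look.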
Peter Varhol, 2019 Technical Presentation