Quantifying ChatGPT’s gender bias

#bias   #link  

This is a great analysis of gender bias in ChatGPT, but not just winging it or vibes-checking: it's all based on WinoBias, a dataset of nearly 3200 sentences that can be used to detect gender bias. You're free to reproduce this sort of analysis with your own home-grown systems or alternative LLMs!

How'd ChatGPT do?

We found that both GPT-3.5 and GPT-4 are strongly biased, even though GPT-4 has a slightly higher accuracy for both types of questions. GPT-3.5 is 2.8 times more likely to answer anti-stereotypical questions incorrectly than stereotypical ones (34% incorrect vs. 12%), and GPT-4 is 3.2 times more likely (26% incorrect vs 8%).

The failures of after-the-fact adjustments are attributed to the difference between explict and implicit bias:

Why are these models so biased? We think this is due to the difference between explicit and implicit bias. OpenAI mitigates biases using reinforcement learning and instruction fine-tuning. But these methods can only correct the model’s explicit biases, that is, what it actually outputs. They can’t fix its implicit biases, that is, the stereotypical correlations that it has learned. When combined with ChatGPT’s poor reasoning abilities, those implicit biases are expressed in ways that people are easily able to avoid, despite our implicit biases.