ChatGPT hates LA Chargers fans
Briefly

ChatGPT shows higher refusal rates for Los Angeles Chargers fans compared with followers of other teams and for women compared with men when prompted for information likely to be censored by safety mechanisms. Model guardrails respond to contextual user cues, including seemingly innocuous signals like sports fandom, producing different safety sensitivity. These guardrail-driven inferences link refusals to demographics or personal identity elements. The unequal refusal behavior can affect usefulness and fairness, potentially creating advantages or disadvantages in contexts like obtaining restricted information or learning, and risk revealing protected characteristics.
OpenAI's ChatGPT appears to be more likely to refuse to respond to questions posed by fans of the Los Angeles Chargers football team than to followers of other teams. And it's more likely to refuse requests from women than men when prompted to produce information likely to be censored by AI safety mechanisms. The reason, according to researchers affiliated with Harvard University, is that the model's guardrails incorporate biases that shape its responses based on contextual information about the user.
"We find that certain identity groups and seemingly innocuous information, e.g., sports fandom, can elicit changes in guardrail sensitivity similar to direct statements of political ideology," the authors state in their paper. The problem of bias in AI models is well known. Here, the researchers find similar issues in model guardrails - the mechanism by which AI models attempt to implement safety policies.
Read at Theregister
[
|
]