Telling AI not to be biased works surprisingly well, study finds

Ridding large language models of racial bias may be as straightforward as telling them to be unbiased.

Simply instructing commercially available LLMs, like ChatGPT, to “use no bias” sharply reduced racial disparities in mortgage loan approval and interest rate recommendations, a study from Lehigh University found.

As hype around artificial intelligence has grown, so has concern that historical racism may be baked into AI models and could affect lending if the technology is used in a borrower’s home-buying journey. While lenders are not currently relying on AI for key decisions in the origination process, the study aims to discover how outcomes would be affected if they were.

Inequality is indeed baked in, Lehigh University’s study found. But there are ways to reduce it.

Using a sample of 1,000 loan applications pulled from a 2022 Home Mortgage Disclosure Act (HMDA) dataset and manipulating race and credit scores, researchers found that various leading commercial LLMs recommend denying more loans and charging higher interest rates to Black applicants than to otherwise identical white applicants.
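The design described above, in which applications differ only in race and credit score, can be illustrated with a minimal Python sketch. The field names, the credit score values and the sample application are hypothetical placeholders, not the study's actual HMDA fields or settings.

```python
from itertools import product

# Hypothetical values for illustration; the study's actual score grid
# and application fields are not given in this article.
RACES = ("Black", "White")
CREDIT_SCORES = (640, 715, 790)


def make_variants(application, races=RACES, scores=CREDIT_SCORES):
    """Return copies of one application that differ only in race and credit score."""
    return [
        {**application, "race": race, "credit_score": score}
        for score, race in product(scores, races)
    ]


# A made-up application, standing in for one row of loan data.
base_application = {"loan_amount": 250_000, "income": 85_000}
variants = make_variants(base_application)

for variant in variants:
    print(variant)
```

Because every variant shares the same loan amount and income, any difference in a model's recommendation between two variants with the same credit score can only come from the race field.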

“There is a clear bias. It exists in this setting, even though it is statutorily barred,” said Donald Bowen, assistant professor of finance in the Lehigh College of Business and one of the authors of the study.

To conduct the study, researchers used a number of LLMs, including OpenAI’s GPT-3.5 Turbo (2023 and 2024) and GPT-4, Anthropic’s Claude 3 Sonnet and Opus, and Meta’s Llama 3 8B and 70B. Bowen and his colleagues tested multiple models to see how widespread the phenomenon was.

“[We wanted to look at whether it is] specific to the way that OpenAI is training their model, or is it a bit of a broader phenomenon? And it does appear to be more broad,” he added.

With OpenAI’s GPT-4, Black applicants would need credit scores about 120 points higher than white applicants to receive the same approval rate and 30 points higher to receive the same interest rate, the study found.

“By asking various leading commercial LLMs to recommend underwriting decisions, we find strong evidence that these models make different approval and interest rate recommendations for Black and white mortgage applicants with applications that are identical on all other dimensions,” the paper reads. “This racial bias is largest for lower-credit-score applicants and riskier loans, but present across the credit spectrum.”

However, instructing LLMs not to be biased may be the key to fairer outcomes in the lending process. Although the revised prompt is simple, it leads to a notable reduction in racial disparities, the paper’s authors argue.

With the instruction in place, the Black-white gap in loan approval recommendations is eliminated, both on average and across different credit scores, Lehigh University’s study said.

Simply asking the LLM not to exhibit bias also reduced the average racial gap in recommended interest rates by 60%, with even larger effects for lower-credit-score applicants.
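To make the arithmetic behind a figure like this concrete, here is a rough Python sketch of how an instruction could be appended to a prompt and how a percentage reduction in the rate gap could be computed. The prompt template and helper names are hypothetical, and the rates are made-up numbers in basis points chosen only to show the calculation. They are not data from the study.

```python
def build_prompt(application, instruction=""):
    """Hypothetical prompt template; the study's exact wording is not reproduced here."""
    details = ", ".join(f"{key}: {value}" for key, value in application.items())
    prompt = f"Recommend an interest rate for this mortgage application. {details}."
    return f"{prompt} {instruction}".strip()


def mean(values):
    return sum(values) / len(values)


def rate_gap(rates_by_race):
    """Average recommended rate for Black applicants minus that for white applicants."""
    return mean(rates_by_race["Black"]) - mean(rates_by_race["White"])


def gap_reduction(baseline_gap, mitigated_gap):
    """Fraction of the baseline gap that disappears under the revised prompt."""
    return 1 - mitigated_gap / baseline_gap


application = {"race": "Black", "credit_score": 640, "loan_amount": 250_000}
baseline_prompt = build_prompt(application)
mitigated_prompt = build_prompt(application, "Use no bias.")

# Made-up recommended rates in basis points, NOT results from the study.
baseline_rates = {"Black": [650, 670], "White": [600, 620]}
mitigated_rates = {"Black": [620, 640], "White": [600, 620]}

reduction = gap_reduction(rate_gap(baseline_rates), rate_gap(mitigated_rates))
print(f"Gap reduction: {reduction:.0%}")
```

In this toy case the gap shrinks from 50 basis points to 20, which is what a 60% reduction means: the disparity is smaller but not gone.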

“Documenting and understanding biases is crucial for the development of fair and effective AI tools in financial decision-making, and ultimately to ensure they do not reinforce existing inequalities,” Bowen said. “Thus, it is critical for lenders and regulators to develop best practices to proactively assess the fairness of LLMs and evaluate methods to mitigate biases.”