Assessing Political Bias in Language Models | Stanford HAI


Date
May 22, 2023
Topics
Natural Language Processing
Machine Learning
DALL-E

Researchers develop a new tool that measures how well popular large language models align with public opinion, a way to evaluate bias in chatbots.

The language models behind ChatGPT and other generative AI are trained on written words that have been culled from libraries, scraped from websites and social media, and pulled from news reports and speech transcripts from across the world. GPT-3.5, the model fueling ChatGPT, was trained on some 250 billion such words, for instance, and GPT-4 has since arrived.

Now new research from Stanford University has quantified exactly how well (or, actually, how poorly) these models align with opinions of U.S. demographic groups, showing that language models have a decided bias on hot-button topics that may be out of step with general popular sentiment.

“Certain language models fail to capture the subtleties of human opinion and often simply express the dominant viewpoint of certain groups, while underrepresenting those of other demographic subgroups,” says Shibani Santurkar, a former postdoctoral scholar at Stanford and first author of the study. “They should be more closely aligned.”

In the paper, a research team including Stanford postdoctoral scholar Esin Durmus, Columbia PhD student Faisal Ladhak, Stanford PhD student Cinoo Lee, and Stanford computer science professors Percy Liang and Tatsunori Hashimoto introduces OpinionQA, a tool for evaluating bias in language models. OpinionQA compares the leanings of language models against public opinion polling.

Read the full study, Whose Opinions Do Language Models Reflect?

As one might expect, language models that form sentences by predicting word sequences based on what others have written should automatically reflect popular opinion in the broadest sense. But, Santurkar says, there are two other explanations for the bias. Most newer models have been fine-tuned on human feedback data collected by companies that hire annotators to note which model completions are “good” or “bad.” Annotators’ opinions and even those of the companies themselves can percolate into the models.

For instance, the study shows how newer models have a greater-than-99 percent approval for President Joe Biden, even though public opinion polls show a much more mixed picture. In their work, the researchers also found some populations are underrepresented in the data — those age 65 or older, Mormons, and widows and widowers, just to name a few. The authors assert that to improve credibility, language models should do a better job of reflecting the nuances, the complexities, and the narrow divisions of public opinion.

Aligning to Public Opinion

The team turned to Pew Research’s American Trends Panels (ATP), a benchmark survey of public opinion, to evaluate nine leading language models. The ATP has nearly 1,500 questions on a broad range of topics, stretching from science and politics to personal relationships. OpinionQA compares language model opinion distribution on each question with that of the general U.S. populace as well as the opinions of no fewer than 60 demographic subgroups, as charted by the ATP.

“These surveys are really helpful in that they are designed by experts who identify topics of public interest and carefully design questions to capture the nuances of a given topic,” Santurkar says. “They also use multiple-choice questions, which avoid certain problems measuring opinion with open-ended questions.” 
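The multiple-choice format also makes it straightforward to read an "opinion distribution" off a language model: score each answer option, then renormalize. A minimal sketch of the idea (hypothetical log-probabilities, not the study's actual code):

```python
import math

def opinion_distribution(choice_logprobs):
    """Turn per-choice log-probabilities from a language model into a
    normalized opinion distribution over the listed answer options."""
    # Exponentiate and renormalize so probabilities over the listed
    # choices sum to 1 (the model also puts mass on unrelated tokens).
    weights = [math.exp(lp) for lp in choice_logprobs]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical log-probs for answer choices A-D on one survey question.
dist = opinion_distribution([-1.2, -0.7, -2.5, -3.0])
print([round(p, 3) for p in dist])
```

Repeating this over many questions yields a model-side distribution that can be compared against the human responses the ATP reports.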

From those comparisons, OpinionQA calculates three metrics of opinion alignment. First, representativeness assesses how closely a language model aligns with the general population, as well as with each of the 60 demographic cross sections the ATP uses. Second, steerability tabulates how well the model can reflect the opinion of a given subgroup when prompted to do so. And third, consistency measures how steady a model’s opinions are across topics and across time.
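For ordered answer choices, the representativeness comparison can be scored with a Wasserstein-style distance between the model's and the public's answer distributions. A simplified sketch of that calculation, not the authors' implementation:

```python
def representativeness(model_dist, human_dist):
    """Score alignment as 1 minus the normalized 1-Wasserstein distance
    between two distributions over ordered answer choices."""
    assert len(model_dist) == len(human_dist)
    n = len(model_dist)
    # For ordinal choices, the 1-Wasserstein distance equals the sum of
    # absolute differences between the two cumulative distributions.
    cum_m = cum_h = dist = 0.0
    for pm, ph in zip(model_dist, human_dist):
        cum_m += pm
        cum_h += ph
        dist += abs(cum_m - cum_h)
    # n - 1 is the largest possible distance, so the score lies in [0, 1].
    return 1.0 - dist / (n - 1)

# A model that heavily favors option A vs. a more evenly split public.
print(representativeness([0.7, 0.2, 0.1], [0.3, 0.4, 0.3]))
```

A score near 1 means the model's answer distribution closely tracks the survey responses; averaging the score over all ATP questions gives an overall alignment figure for a model or subgroup.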

Wide Variation

High-level findings? All models show wide variation in political and other leanings by income, age, education, etc. For the most part, Santurkar says, models trained on the internet alone tend to be biased toward less educated, lower income, or conservative points of view. Newer models, on the other hand, further refined through curated human feedback tend to be biased toward more liberal, higher educated, and higher income audiences.

“We’re not saying whether either is good or bad here,” Santurkar says. “But it is important to provide visibility to both developers and users that such biases exist.”

Acknowledging that exactly matching the opinions of the general public could represent a problematic goal in itself, the developers of OpinionQA caution that their approach is a tool to help developers assess political biases in their models, not a benchmark of optimal outcomes.

“The OpinionQA dataset is not a benchmark that should be optimized. It is helpful in identifying and quantifying where and how language models are misaligned with human opinion and how models often don’t adequately represent certain subgroups,” Santurkar says. “More broadly, we hope it can spark a conversation in the field about the importance and the value of bringing language models into better alignment with public opinion.”

Stanford HAI’s mission is to advance AI research, education, policy and practice to improve the human condition. Learn more.  

 

Contributor(s)
Andrew Myers

Related News

New Large Language Model Helps Patients Understand Their Radiology Reports
Vignesh Ramachandran
Jun 23, 2025
News

‘RadGPT’ cuts through medical jargon to answer common patient questions.


Digital Twins Offer Insights into Brains Struggling with Math — and Hope for Students
Andrew Myers
Jun 06, 2025
News

Researchers used artificial intelligence to analyze the brain scans of students solving math problems, offering the first-ever peek into the neuroscience of math disabilities.


Better Benchmarks for Safety-Critical AI Applications
Nikki Goth Itoi
May 27, 2025
News

Stanford researchers investigate why models often fail in edge-case scenarios.

