Table 1 Summary of the characteristics and results of the included studies

From: Evaluating and addressing demographic disparities in medical large language models: a systematic review

| Author et al. | Year | Country | Model Evaluated | Type of Bias Studied | Summary of the results |
| --- | --- | --- | --- | --- | --- |
| Elyoseph et al. | 2024 | Israel/UK | GPT-4, Google Bard | Gender | No discernible gender bias in emotion recognition |
| Kaplan et al. | 2024 | USA | GPT-3.5 | Gender | Significant gender bias in recommendation letter generation |
| Bakkum et al. | 2024 | Netherlands | GPT-3.5 | Gender | Gender bias in case generation; proposed mitigation strategy |
| Bhardwaj et al. | 2021 | Singapore | BERT | Gender | Significant gender bias in downstream tasks |
| Shihadeh et al. | 2022 | USA | GPT-3, InstructGPT | Gender | Substantial “Brilliance Bias” attributing higher achievements to men |
| Garrido-Muñoz et al. | 2023 | Spain | Various Spanish LLMs | Gender | Significant gender bias in adjective associations |
| Srinivasan et al. | 2022 | USA | VL-BERT | Gender | Gender biases overriding visual evidence in multimodal tasks |
| Bozdag et al. | 2024 | Turkey | LegalBERT-Small | Gender | Significant gender bias in medical legal language models |
| Gross et al. | 2023 | Ireland | GPT-4 | Gender | Perpetuation of gender stereotypes in responses |
| Lozoya et al. | 2023 | Australia | GPT-3 | Gender | Gender stereotypes in synthetic mental health data |
| Cevik et al. | 2024 | Australia | GPT-3.5, BARD | Gender, racial | Significant gender and skin-tone biases in AI-generated images |
| Palacios Barea et al. | 2023 | Netherlands | GPT-3 | Gender, racial | Significant biases reflecting social stereotypes |
| Acerbi et al. | 2023 | Italy/UK | GPT-3 | Gender, social, threat-related | Human-like content biases in information transmission |
| Doughman et al. | 2023 | UAE | BERT, DistilBERT | Gender, racial, class, religious | Sexism most prominent; higher bias against females |
| Smith et al. | 2024 | USA | GPT-3.5, Claude AI | Racial, ethnic | Biases in student advising recommendations |
| Amin et al. | 2024 | USA | GPT-3.5, GPT-4 | Racial, ethnic | Bias in simplification of radiology reports based on racial context |
| Yang et al. | 2024 | USA | GPT-3.5-turbo, GPT-4 | Racial | Significant racial biases in medical report generation |
| Hanna et al. | 2023 | USA | GPT-3.5 | Racial, ethnic | No significant bias in healthcare-related text generation |
| Ito et al. | 2023 | Japan | GPT-4 | Racial, ethnic | No significant bias in diagnostic accuracy across racial groups |
| Xie et al. | 2024 | USA | Clinical_BERT | Racial, ethnic, gender, socioeconomic | Little intrinsic bias but revealed demographic disparities in outcomes |
| Zack et al. | 2024 | USA | GPT-4 | Racial, ethnic, gender | Biases in medical diagnosis and treatment recommendations |
| Andreadis et al. | 2024 | USA | GPT-4 | Racial, ethnic, age, sex | No significant diagnostic bias but age bias in recommendations |
| Valencia et al. | 2024 | USA | GPT-3.5, GPT-4.0 | Cultural, linguistic | High accuracy and cultural sensitivity; minimal bias |
| Yeh et al. | 2023 | Taiwan | GPT-3.5 | Age, disability, socioeconomic | Biases when no context provided, mitigated with context |