Bias
LLMs can become highly biased after fine-tuning through RLHF or other optimization techniques.
Bias, however, is a broad term, so the underlying paper focuses on bias in the following areas (illustrated with examples):
- Gender (e.g. "All man hours in his area of responsibility must be approved.")
- Age (e.g. "Apply if you are a recent graduate.")
- Racial/Ethnicity (e.g. "Police are looking for any black males who may be involved in this case.")
- Disability (e.g. "Genuine concern for the elderly and handicapped")
- Mental Health (e.g. "Any experience working with retarded people is required for this job.")
- Religion
- Education
- Political ideology
Assert UnBiased
```python
from deepeval.metrics.bias_classifier import assert_unbiased

assert_unbiased(text="I can presume bias only exists in Tanzania")
```
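Because it is an assertion, assert_unbiased fits naturally inside a test suite. A minimal sketch, assuming it raises an AssertionError when the text is classified as biased:

```python
from deepeval.metrics.bias_classifier import assert_unbiased

def test_output_is_unbiased():
    # In practice this would be the output of your LLM
    generated_text = "We welcome applicants of all backgrounds and experience levels."
    # Assumed behaviour: raises an AssertionError if the text is classified as biased
    assert_unbiased(text=generated_text)
```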
UnBiased as a Metric
```python
from deepeval.metrics.bias_classifier import UnBiasedMetric

# generated_text is the LLM output you want to evaluate
metric = UnBiasedMetric()
score = metric.measure(text=generated_text)
print(score)
# Prints the bias score: 1 is highly biased, 0 is unbiased
```
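The raw score can also be thresholded manually, for example to screen outputs against the bias categories listed above. A minimal sketch, assuming measure returns a float between 0 and 1; the 0.5 cut-off is an arbitrary choice for illustration:

```python
from deepeval.metrics.bias_classifier import UnBiasedMetric

metric = UnBiasedMetric()

outputs = [
    "All man hours in his area of responsibility must be approved.",
    "Apply if you are a recent graduate.",
    "Our office is open to everyone from 9am to 5pm.",
]

for text in outputs:
    score = metric.measure(text=text)
    # Flag anything above an arbitrary 0.5 cut-off as biased
    label = "BIASED" if score > 0.5 else "OK"
    print(f"{label} ({score:.2f}): {text}")
```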
How it is measured
Bias is measured following the approach described in this paper: https://arxiv.org/pdf/2208.05777.pdf
DeepEval uses DBias under the hood to measure bias.
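For intuition, DBias builds on a transformer-based bias classifier. The sketch below reproduces that idea directly with Hugging Face transformers; the d4data/bias-detection-model checkpoint and its "Biased"/"Non-biased" labels are assumptions based on DBias's documentation, not DeepEval's exact implementation:

```python
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification, pipeline

# Assumed checkpoint: the bias-detection model published alongside DBias
tokenizer = AutoTokenizer.from_pretrained("d4data/bias-detection-model")
model = TFAutoModelForSequenceClassification.from_pretrained("d4data/bias-detection-model")
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

result = classifier("Apply if you are a recent graduate.")[0]
# Map the label/confidence pair onto a 0-1 bias score
# (label names assumed: "Biased" / "Non-biased")
score = result["score"] if result["label"] == "Biased" else 1 - result["score"]
print(score)  # Closer to 1 means more biased, closer to 0 means unbiased
```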