Addressing Gender Bias in Generative
Pretrained Transformer (GPT) Language
Models through Design Justice
Computational Sociology 2022 – Week 10
Lara Dal Molin
PhD student in Science, Technology and Innovation Studies & Sociology
University of Edinburgh & University of Copenhagen
Overview & Objectives of this Lecture
Illustrate the application of Computational Sociology and Computational Social Science
Combination of qualitative and quantitative methods – motivation and relationship
Application of recent research in Artificial Intelligence
[Diagram: intersection of Science and Technology Studies, Artificial Intelligence, and Computational Sociology]
Addressing Gender Bias in Generative
Pretrained Transformer (GPT) Language
Models through Design Justice
Language Models
Statistical language models are probability distributions over sequences of words
(sentences) in a language (Jurafsky and Martin, 2008).
Jurafsky, D. & Martin, J. H., 2008. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics
and Speech Recognition. 2nd ed. London, United Kingdom: Pearson.
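As a minimal illustration of this definition (a toy sketch added here, not material from the original slides), a bigram model estimates the probability of a sentence from word-pair counts in a small corpus:

```python
# Minimal sketch of a statistical (bigram) language model in the sense of
# Jurafsky and Martin (2008). The corpus is a toy example chosen for illustration.
from collections import Counter

corpus = ["<s> the cat sat </s>", "<s> the dog sat </s>"]
tokens = [sent.split() for sent in corpus]

unigrams = Counter(w for sent in tokens for w in sent)
bigrams = Counter((a, b) for sent in tokens for a, b in zip(sent, sent[1:]))

def bigram_prob(prev, word):
    """P(word | prev), estimated by maximum likelihood from the toy corpus."""
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

def sentence_prob(sentence):
    """Probability of a sentence as a product of its bigram probabilities."""
    words = sentence.split()
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= bigram_prob(prev, word)
    return p

print(sentence_prob("<s> the cat sat </s>"))  # 0.5 on this toy corpus: "the" is
                                              # followed by "cat" and "dog" equally often
```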
One of the earliest conversational programs, ELIZA was a rule-based
system created by Weizenbaum (1966; 1976).
Weizenbaum, J., 1966. ELIZA - A Computer Program for the Study of Natural Language Communication Between Man and Machine.
Communications of the ACM, 9(1), pp. 36-45.
Weizenbaum, J., 1976. Computer Power and Human Reason: From Judgment to Calculation. New York, San Francisco: W. H. Freeman and
Company.
Neural Language Models
A recent paradigm shift in Natural Language Processing (NLP) sees
traditional language models combined with Artificial Neural Networks.
This results in more sophisticated text generation.
Jing, K. & Xu, J., 2019. A Survey on Neural Network Language Models. [Online] Available at: https://arxiv.org/abs/1906.03591
[Accessed 30 November 2021].
Large Language Models
Large Language Models (LLMs) are neural language models trained on
extremely large amounts of data. This data is usually organised into
datasets and collected through web scraping.
Tamkin, A., Brundage, M., Clark, J. & Ganguli, D., 2021. Understanding the Capabilities, Limitations, and Societal Impact of Large Language
Models. [Online] Available at: https://arxiv.org/abs/2102.02503 [Accessed 8 June 2022].
Simon, J., 2021. Large Language Models: A New Moore's Law?. [Online] Available at: https://huggingface.co/blog/large-language-
models [Accessed 30 November 2021].
The Scalability Paradigm in Language Models
GPT-3
One of the latest language models by OpenAI. It generates text based
on human prompts, such as this article for the Guardian.
GPT-3, 2020. A Robot Wrote This Entire Article. Are You Scared Yet, Human?. [Online] Available at:
https://www.theguardian.com/commentisfree/2020/sep/08/robot-wrote-this-article-gpt-3 [Accessed 17 November 2021].
GPTs are large neural language models
that can be pretrained and fine-tuned.
Transformers are a novel type of Neural
Network that, through the mechanism
of attention, can incorporate context:
in "the animal didn’t cross the street
because it was too tired", attention helps
the model resolve what "it" refers to.
Vaswani, A. et al., 2017. Attention Is All You Need. Long Beach, California, United
States, 31st Conference on Neural Information Processing Systems (NIPS 2017).
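To make the attention mechanism concrete, here is a minimal NumPy sketch of the scaled dot-product attention from Vaswani et al. (2017); the toy matrices and dimensions are illustrative assumptions, not values from any real model:

```python
# Minimal NumPy sketch of scaled dot-product attention (Vaswani et al., 2017).
# Every token attends to every other token, which is how a Transformer can link
# "it" back to "the animal" in the example sentence.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ V                                          # weighted sum of values

# Toy example: 4 tokens with 8-dimensional representations.
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```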
Generative Pretrained Transformers
“Pretrain then Fine-Tune”
Another paradigm shift in NLP. Novel LLMs are pretrained on a large
corpus and then fine-tuned to perform specific tasks.
Fine-tuning, or adaptation, consists of further conditioning a
pretrained language model on additional, task-specific data.
Bommasani, R. et al., 2021. On the Opportunities and Risks of Foundation Models. [Online] Available at: https://arxiv.org/abs/2108.07258
[Accessed 2 June 2022].
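As a hedged illustration of the "pretrain then fine-tune" workflow, the sketch below further conditions a small pretrained model on a tiny task-specific dataset using the Hugging Face transformers library; the model name ("gpt2"), toy texts, and hyperparameters are assumptions chosen for illustration, not the setup of any specific paper or project:

```python
# Hedged sketch of "pretrain then fine-tune" with the Hugging Face transformers
# library. A small pretrained model stands in for a larger LLM; the dataset and
# hyperparameters are toy values.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)  # pretrained weights

# A tiny, task-specific dataset used to further condition the pretrained model.
texts = ["Example of the additional, task-specific text used for adaptation."]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # updates the pretrained weights on the new data
```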
Applications of LLMs (including GPTs)
Language models are the backbone of numerous technologies:
Conversational assistants (chatbots) – e.g. Amazon Alexa
Sentence completion systems – e.g. Google Search
Machine translation systems – e.g. Google Translate
Addressing Gender Bias in Generative
Pretrained Transformer (GPT) Language
Models through Design Justice
Gender Bias in Language Models
Language models, including GPTs, present issues related to bias.
On the Dangers of Stochastic Parrots:
can language models be too big?
In this paper, the authors heavily problematise the scalability approach
in language models, suggesting it leads to stereotyped language in the
context of personal characteristics – especially gender.
Stereotypes – e.g., hierarchical associations
Categorisation – e.g., misgendering
Bender, E., Gebru, T., McMillan-Major, A. & Shmitchell, S., 2021. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?.
Virtual Event, ACM Conference on Fairness, Accountability, and Transparency (FAccT).
Timnit Gebru
Timnit Gebru is one of the co-authors of the Stochastic Parrots paper.
Algorithmic Bias in Computer Science
Attempts to address algorithmic bias in Computer Science are
predominantly quantitative.
Gender bias associated with training data – downstream harm (Barocas
et al., 2019)
Benchmarks for the quantification of bias (Vidgen and Derczynski, 2020)
Study of AI-generated characters and stories (Lucy and Bamman,
2021)
Barocas, S., Hardt, M. & Narayanan, A., 2019. Fairness and Machine Learning: Limitations and Opportunities. Online: fairmlbook.org.
Lucy, L. & Bamman, D., 2021. Gender and Representation Bias in GPT-3 Generated Stories. Mexico City, Mexico, Proceedings of the 3rd Workshop on
Narrative Understanding.
Vidgen, B. & Derczynski, L., 2020. Directions in Abusive Language Training Data, a Systematic Review: Garbage In, Garbage Out. PLoS ONE, 15(12), pp. 1-32.
Algorithmic Bias: Sociology and Beyond
Algorithmic Bias is an extremely active field of research beyond CS.
Algorithmic Bias: Sociology and Beyond
Main contributions from Science and Technology Studies and Sociology
include the following concepts:
Dataset curation – inspired by archival history and librarianship (Jo
and Gebru, 2020; Birhane and Prabhu, 2021)
Algorithmic auditing – notably Brown, Davidovic and Hasan (2021)
Birhane, A. & Prabhu, V. U., 2021. Large Image Datasets: A Pyrrhic Win for Computer Vision?. Waikoloa, Hawaii, United States, 2021 IEEE
Winter Conference on Applications of Computer Vision (WACV).
Brown, S., Davidovic, J. & Hasan, A., 2021. The Algorithm Audit: Scoring the Algorithm that Scores Us. Big Data & Society, 8(1), pp. 1-8.
Jo, E. S. & Gebru, T., 2020. Lessons from Archives: Strategies for Collecting Sociocultural Data in Machine Learning. Online, Proceedings of the
2020 Conference on Fairness, Accountability, and Transparency.
PALMS
In a recent paper, researchers at OpenAI suggest that fine-tuning LLMs
on small-scale, curated datasets may be a promising method for
mitigating bias associated with sensitive topics.
Solaiman, I. & Dennison, C., 2021. Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets, San Francisco,
California, United States: OpenAI.
PALMS
These are the base and fine-tuned models' responses to a prompt.
Solaiman, I. & Dennison, C., 2021. Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets, San
Francisco, California, United States: OpenAI.
Content Warning
The following slide contains offensive and gendered language.
PALMS
These tables show the top descriptive words for each model, identified
using a co-occurrence metric.
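As a simplified illustration of the general idea (not the exact metric used in the PALMS paper), descriptive words that co-occur with gendered terms in generated text can be counted as follows; the term list, window size, and sample output are assumptions:

```python
# Simplified sketch of a co-occurrence count between gendered terms and nearby
# words in model outputs. The PALMS paper uses a more involved metric; treat
# this only as an illustration of the general idea.
from collections import Counter

gendered_terms = {"she", "her", "woman", "he", "him", "man"}

def top_cooccurring_words(generations, window=5, k=10):
    """Count words appearing within `window` tokens of a gendered term."""
    counts = Counter()
    for text in generations:
        tokens = text.lower().split()
        for i, tok in enumerate(tokens):
            if tok in gendered_terms:
                neighbours = tokens[max(0, i - window): i + window + 1]
                counts.update(w for w in neighbours if w not in gendered_terms)
    return counts.most_common(k)

sample_outputs = ["She was described as gentle and caring by everyone."]
print(top_cooccurring_words(sample_outputs))
```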
PALMS
PALMS considers what factuality means in the context of sensitive
topics and asks who should inform stances in this space.
In other words, who should decide what is biased and what isn't?
Through critical lenses, we can reformulate this question.
Who should have the power to decide what is biased and what isn’t?
Solaiman, I. & Dennison, C., 2021. Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets, San Francisco,
California, United States: OpenAI.
Addressing Gender Bias in Generative
Pretrained Transformer (GPT) Language
Models through Design Justice
Design Justice
Pioneered by Sasha Costanza-Chock
(2020), Design Justice advocates the
involvement of historically marginalised
communities in technology design
processes.
Costanza-Chock, S., 2018. Design Justice, A.I., and Escape from the Matrix of
Domination. Journal of Design and Science.
Costanza-Chock, S., 2020. Design Justice: Community-Led Practices to Build the
Worlds We Need. Cambridge, Massachusetts, United States: The MIT Press.
Back to PALMS
In my project, I answer the
question posed by PALMS through
Design Justice. Individuals and
communities that are usually
targeted by algorithmic gender
bias should form stances on what
constitutes bias.
Methodology
In my project, I use an open-source model called GPT-J. This is
comparable in architecture and functionality to OpenAI's GPT-3, but is
less restricted in terms of both permissions and prompting.
EleutherAI, 2021. About. [Online] Available at: https://www.eleuther.ai/about/ [Accessed 29 October 2021].
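Below is a hedged sketch of how GPT-J can be prompted through its publicly released Hugging Face checkpoint ("EleutherAI/gpt-j-6B"); the model identifier, example prompt, and generation settings are assumptions about one common way to run GPT-J, not necessarily the exact setup used in this project:

```python
# Hedged sketch of prompting GPT-J via the Hugging Face checkpoint. Loading the
# full model requires substantial memory (~24 GB of weights); the prompt below
# is a hypothetical stand-in for a workshop-designed, gender-oriented prompt.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

prompt = "Describe a typical engineer."  # hypothetical prompt
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```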
Qualitative Methods
Participants are currently recruited in partnership with PrideSoc.
1. Workshops: co-designing gender-oriented prompts for GPT-J
2. Feeding the prompts to the model, which generates responses
3. Scoring the responses for gender bias on a 1-5 scale
4. Asking participants to answer the prompts themselves – their
responses constitute a small, curated dataset
Quantitative Methods
After the workshops, GPT-J is fine-tuned on the curated dataset.
1. Data is pre-processed into a format the language model can read (a
minimal sketch follows below).
2. Fine-tuning happens in Google Colab, following directions provided
by EleutherAI – online access to GPUs and TPUs.
3. The fine-tuned model is tested through further scoring in a workshop.
Hypothesis: performance differences shed light on the effectiveness of
Design Justice + PALMS for mitigating gender bias in GPTs.
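A minimal sketch of the pre-processing in step 1, assuming the curated workshop data is exported as prompt-response pairs and that fine-tuning expects a plain-text training file; the file names, field names, and separator token are hypothetical:

```python
# Minimal sketch of pre-processing the curated dataset (step 1): turning
# prompt-response pairs from the workshops into a plain-text training file.
# File names, field names, and formatting are hypothetical assumptions.
import json

END_OF_TEXT = "<|endoftext|>"  # separator token used by GPT-style tokenizers

with open("workshop_responses.json") as f:   # hypothetical workshop export
    pairs = json.load(f)                      # e.g. [{"prompt": ..., "response": ...}, ...]

with open("curated_train.txt", "w") as out:
    for pair in pairs:
        # One training example per block: prompt followed by the participant's answer.
        out.write(f"{pair['prompt']}\n{pair['response']}\n{END_OF_TEXT}\n")
```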
Addressing Gender Bias in Generative
Pretrained Transformer (GPT) Language
Models through Design Justice
What is this project trying to achieve?
Combine methodologies in Computer Science and Social Sciences to
propose a hybrid qualitative-quantitative solution
Explore ontological and epistemological relationships and potential
overlaps between these traditions
Hypothesis 1: performance differences shed light on the effectiveness
of Design Justice + PALMS for mitigating gender bias in GPTs.
Hypothesis 2: GPT-J fine-tuned on the curated dataset might outperform
the PALMS framework specifically on gender.
Key Takeaways
s1983097@ed.ac.uk
ldalmol@ed.ac.uk