Hey Siri: Why does artificial intelligence work better for White men?

How language in AI reinforces bias and good practices for inclusive AI

By Genevieve Smith and Julia Nee

You are creating an artificial intelligence (AI)-powered tool called HireMagic to make the hiring process more efficient. Using machine learning (ML), the tool transcribes and analyzes candidates’ responses to several questions via phone interviews to determine job and organizational fit. In testing the product, you collect samples from men and women reading identical scripted responses to the interview questions. You expect the candidates to be scored equally, but are surprised to find that — despite giving the same responses — the female candidates are consistently ranked below the male candidates. How did the AI system develop a bias against female applicants?

Language runs through AI — in data, data labels, and language-specific applications like natural language processing (NLP). These systems are susceptible to the same harms that occur in human communication, including reflecting and reinforcing harmful biases. But AI can also be leveraged to advance inclusion, equity, and justice.

Beyond being good for individuals and society, advancing language that supports equity and inclusion within AI and ML can also lead to a more inclusive product experience, while better reflecting a company’s mission, ethical principles, responsible innovation commitments, and stated product goals. This can enhance user trust and brand reputation, while mitigating risk — both reputational and regulatory.

So what are the strategies that leaders within organizations and businesses can take to leverage language for inclusion and equity in and through AI systems?

Some brief context

Before diving into solutions, it’s important to understand the ways that language can embed bias and discrimination, which can then be replicated and advanced through AI systems. We highlight five key ways here:

Words we choose matter — our language choices can reinforce stereotypes or help drive equity and inclusion. For example, based on unfounded stereotypes, Black women are often described as “angry.” This harmful and baseless stereotype can appear in the words annotators use to label images of Black women. Harmful biases like this are picked up and reinforced by AI systems learning from that data. In Algorithms of Oppression, Safiya Umoja Noble illustrated this issue in Google search. When typing: “Why are Black women so…” in the search box, top autocomplete suggestions included: “loud” and “lazy”.

AI systems learn from natural human language data that contains such subtle biases in word choice. Natural human language is not an objective reflection of reality, but reflects dominant biases and stereotypes. Large language models today learn from huge amounts of digital information from sources like Wikipedia, Reddit and Twitter. But whose voices are represented? 67% of Reddit users in the US are men and 70% are White. Meanwhile, marginalized individuals experience harassment on platforms like Twitter — including pervasive online abuse against women, particularly Black women. If only some groups are consistently represented in language datasets, the views and biases held by members of those groups may be reproduced by AI systems, whether or not they are true.

Machines pick up on subtle associations between certain words and specific social groups. These description-to-group associations are called “indexes”. Even if AI systems do not explicitly take sensitive characteristics like race, religion, or gender into account, they can still pick up on indexes in language use. Machines can then make associations that result in discrimination and amplify stereotypes. For example, Amazon’s resume-screening algorithm showed bias against women: it favored candidates who used words like “executed” or “captured”, which were more common in male engineers’ resumes. Because some forms of bias remained unchecked, the company eventually scrapped the project altogether.
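To see how a model can learn an index without ever seeing a sensitive attribute, here is a minimal sketch on an invented toy corpus. The resume texts, labels, and word scores are all hypothetical, not Amazon’s actual data or method; the point is only that a word appearing mostly in one labeled group becomes a proxy for that group.

```python
from collections import Counter

# Hypothetical toy "resume" corpus labeled by historical outcome.
# The model never sees gender directly -- but if the outcome labels
# correlate with gender, word choice becomes a proxy (an "index").
resumes = [
    ("executed project roadmap and captured market share", "hired"),
    ("executed migration plan for core services", "hired"),
    ("collaborated with the team to deliver the release", "rejected"),
    ("organized community outreach for the product launch", "rejected"),
]

def word_label_scores(corpus):
    """Score each word by how strongly it indexes the 'hired' label."""
    counts = {"hired": Counter(), "rejected": Counter()}
    for text, label in corpus:
        counts[label].update(text.split())
    scores = {}
    for word in set(counts["hired"]) | set(counts["rejected"]):
        scores[word] = counts["hired"][word] - counts["rejected"][word]
    return scores

scores = word_label_scores(resumes)
# "executed" appears only in 'hired' resumes, so any model trained on
# this data will reward it, regardless of who wrote it or why.
print(scores["executed"])   # positive: indexes "hired"
print(scores["organized"])  # negative: indexes "rejected"
```

A real system would use a trained classifier rather than raw counts, but the failure mode is the same: the learned weights encode whatever demographic skew the labels carry.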

Language is contextual and changes over time. But language datasets may not adequately include or reflect updated language conventions. Relatedly, NLP systems may be able to pick up patterns, but don’t necessarily understand context. This can be a problem. Slurs used to oppress and harm certain marginalized groups may be reclaimed by those groups. Content-filtering and hate-speech-detection tools often block these terms wholesale, inadvertently silencing the voices of marginalized people.

AI systems perform better for those who speak the language varieties that are well represented in the data they learn from. So, if the language data an AI system learns from largely represents speakers of “standardized” varieties of English, it will perform best for them. And no surprise: that mostly means White men.
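One practical way to surface this kind of gap is to disaggregate evaluation metrics by speaker group instead of reporting a single overall score. The sketch below uses made-up group names and results purely for illustration; no real benchmark data is implied.

```python
# Hypothetical transcription outcomes: (speaker_group, was_correct) pairs.
# Group labels and numbers are illustrative only.
results = [
    ("standardized_en", True), ("standardized_en", True),
    ("standardized_en", True), ("standardized_en", False),
    ("aave", True), ("aave", False),
    ("aave", False), ("aave", False),
]

def accuracy_by_group(results):
    """Report accuracy per speaker group, not just one aggregate number."""
    groups = {}
    for group, correct in results:
        groups.setdefault(group, []).append(correct)
    return {g: sum(v) / len(v) for g, v in groups.items()}

print(accuracy_by_group(results))
# Overall accuracy here is 50%, which hides a large gap between groups.
```

The aggregate metric looks mediocre but unremarkable; only the per-group breakdown shows that the system works three times better for one group than the other.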

Solutions

Where do we go from here to advance language for equity and inclusion in and through AI systems?

Our Responsible Language in AI & ML Guide outlines nine practices for current and future business leaders across the product lifecycle (See Figure 1). These practices start with thinking about the purpose of an AI system and continue through risk mitigation; development; launch and go-to-market; and ongoing management and refinement.

We won’t delve into all nine, but highlight three here:

  1. Embed equity & inclusion in the product purpose from the get-go. The purpose for which a product is developed matters — it informs how teams design, develop, and manage the product. Business priorities and values are communicated within the organization early on and deeply influence product development. How might the product differ if the primary product purpose is to deliver communication in the most equitable and inclusive way (vs. the most human-like)?
  2. To mitigate risk prior to developing a product, help your team build critical thinking skills for developing AI products using responsible language practices. Support team members in practicing problem solving on issues that might arise (check out this case study we developed to help teams reflect on and discuss potential issues with an AI financial chatbot). As language changes over time — and these challenges are hard — support and cultivate growth mindsets among team members.
  3. When developing AI systems and using off-the-shelf language models or tools, demand transparency around how the data used to build the tool was collected and labeled. Just because a tool is used by other companies, don’t assume that it will work equitably in your specific context. Instead, recognize that you have a responsibility to ensure that the tool is used equitably in your specific application.
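The third practice, demanding transparency from vendors, can even be made concrete as a checklist your team runs before adopting an off-the-shelf model. The disclosure fields and vendor documentation below are hypothetical examples, not a standard schema.

```python
# A hypothetical due-diligence checklist for an off-the-shelf language
# model, encoded as data so it can gate a procurement decision.
# Field names are illustrative assumptions, not an industry standard.
required_disclosures = {
    "data_sources",         # where the training text came from
    "collection_method",    # scraped, licensed, crowd-sourced, etc.
    "annotation_process",   # who labeled the data and with what guidelines
    "demographic_coverage", # which speaker groups are represented
}

def missing_disclosures(vendor_docs: dict) -> set:
    """Return required fields the vendor's documentation does not answer."""
    return {f for f in required_disclosures if not vendor_docs.get(f)}

# Example: a vendor that documents its sources but not its labeling.
vendor_docs = {"data_sources": "public web crawl",
               "collection_method": "scraped"}
print(missing_disclosures(vendor_docs))
```

Any non-empty result is a prompt for follow-up questions before the tool is adopted, rather than assuming that widespread use elsewhere implies equitable behavior in your context.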

Check out all nine practices in the Responsible Language in AI & ML Guide and make a plan to incorporate the ones relevant for you and your team.

Call to action

The HireMagic team scrapped the original tool and started over. This time they established up front that the primary purpose of the tool is to enable the most equitable, inclusive hiring process. Centering this purpose, the firm asked whose voices may be missing in language models or datasets they were interested in using to train the tool. The firm identified and used a training dataset with equitable representation of diverse speakers who were ranked high in job and organizational fit. They were transparent about content and decisions related to the dataset and AI model. They then examined the tool’s performance for different groups. Recognizing this is a learning journey, the firm conducts regular learning activities for the team to build critical thinking skills for developing responsible AI products.

Ultimately, responsible AI comes back to good leadership. We must ask: What is the purpose for which we are developing this AI system? Who is participating in design and development? Who benefits? Who may be harmed?

We must also look at ourselves and our teams. Creating an inclusive culture is a prerequisite for creating AI systems that advance equitable language. We must build teams that foster empathy and understanding, and that are able to have tough conversations about the challenges and limitations of tools being developed. It also means making sure that people from marginalized communities are not only heard, but hold positions of decision making and power. Leveraging language for equitable AI is one step, but an important step, to advancing equity in society.

This article is part of the ongoing Inclusive AI blog series in which the Women4AI Daring Circle calls upon expert members of our global Women’s Forum community to share their perspectives on the importance and implementation of Inclusive AI.

Originally published at https://www.linkedin.com.

At the heart of UC Berkeley's Business School, the Center for Equity, Gender, and Leadership educates equity-fluent leaders to ignite and accelerate change.