Living in a World of Biased Data

Anja Thiem
Limebit GmbH
Jul 21, 2020

Intersectionality in Data Science

  • This article was written by my former colleague Stella Heine (♥︎)

Let’s do a little thought experiment: think back to your history classes in school and try to come up with a single woman who was presented as highly influential or significant over a longer historical period. Do you remember any? Did you have any role models, or did you feel represented by any historical figures in your classes? If not, then you are most probably a woman.

In her 2019 book “Invisible Women: Exposing Data Bias in a World Designed for Men”, Caroline Criado Perez demonstrates most strikingly that the history of humankind is a big data gap, more specifically a gender data gap. History writing is almost exclusively male and does not sufficiently account for women’s role in the world. Reading history books, one could get the impression that almost no women contributed to any historically significant events, which simply does not reflect reality. Citing a US study, Perez illustrates the male bias in history: on average, only 10.8% of the pages in introductory political-science textbooks referenced women.

Women make up approximately half of the world’s population, yet their presence is hardly acknowledged in history. There are several practical examples showing that women’s existence receives too little attention: phones are often too big for women’s hands, and several drugs were never tested on female bodies, to name just two. But what is the reason for this? Is it coincidence or a deliberate decision?

Perez points out that Simone de Beauvoir’s analysis of the woman as ‘the other’ still holds true today. She argues that the male bias is not necessarily a deliberate decision but rather originates in a so-called ‘male default’. Disappointingly, the ‘male default’ is often confused with the actual truth, which makes it even harder to escape the gender data gap.

When we talk about gender, we mean the socially constructed ideas we ascribe to a human body. Perez explains precisely why gender, and not sex, is the reason women are excluded from data. According to her, the gender data gap is ‘cause and consequence of the unthinking that perceives humanity as almost exclusively male’. Thus, the male bias is not a deliberate exclusion of women; it happens because male-biased data is considered ‘objective’. The male default in human history can therefore be both cause and consequence of the gender data gap.

Nowadays, human data is used to train algorithms, so the gender data gap becomes increasingly significant as well as potentially harmful. Artificial Intelligence (AI) is helping doctors with diagnoses, but what happens if the data it is trained on is collected almost exclusively from men? AI systems are helping to conduct interviews with job applicants, but what happens if they are trained on data that structurally discriminates against people because of their skin color, age or gender?

As AI systems only reproduce patterns in their input, the ‘male default’ can be a cause of unintended discrimination. Joy Buolamwini and Timnit Gebru explore the bias in human data in their 2018 article ‘Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification’. In their study, they categorise people according to their gender and skin color, then compare three commercial facial analysis algorithms and their datasets with respect to those categories. Most strikingly, women and people with darker skin color had the highest error rates, with dark-skinned women faring worst of all. This is evidence of the structural discrimination deeply rooted in human data. Here, looking at intersectionality is especially important: forms of discrimination do not simply accumulate but are intermeshed with each other, revealing new forms of discrimination.
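The kind of disaggregated evaluation Buolamwini and Gebru describe can be illustrated with a small sketch: group predictions by intersectional subgroup and compare error rates per group. The function, field names and sample records below are invented for illustration; they are not taken from their study.

```python
# Hypothetical sketch: per-subgroup error rates for a classifier,
# in the spirit of an intersectional accuracy audit.
from collections import defaultdict

def error_rates_by_subgroup(records):
    """records: dicts with 'gender', 'skin_type', 'predicted', 'actual'."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for r in records:
        key = (r["gender"], r["skin_type"])  # the intersectional subgroup
        totals[key] += 1
        if r["predicted"] != r["actual"]:
            errors[key] += 1
    # Error rate = misclassified / total, computed separately per subgroup
    return {key: errors[key] / totals[key] for key in totals}

sample = [
    {"gender": "female", "skin_type": "darker",
     "predicted": "male", "actual": "female"},
    {"gender": "female", "skin_type": "darker",
     "predicted": "female", "actual": "female"},
    {"gender": "male", "skin_type": "lighter",
     "predicted": "male", "actual": "male"},
]
print(error_rates_by_subgroup(sample))
# {('female', 'darker'): 0.5, ('male', 'lighter'): 0.0}
```

An aggregate accuracy score over all three records would hide exactly the disparity this breakdown makes visible, which is the point of auditing by subgroup rather than overall.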

The fact that women in general, and people with darker skin colors in particular, are systematically discriminated against is not a coincidence. Kimberlé Crenshaw coined the term ‘intersectionality’ in the 1980s, meaning that different forms of discrimination can overlap in one person. Recognising the importance of intersectionality when looking at human data is thus highly necessary in order to remove possible biases from the outcome.

Closing the gender data gap entirely might not be possible because we lack gender-specific data; removing bias from AI systems, however, could be. As long as structural discrimination is inseparable from human data, AI systems need to ignore those variables that are potentially discriminatory. Moreover, we need to pay more attention to women when collecting data in order to raise awareness for female issues and to potentially close the gender data gap in the future.
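As a minimal sketch of what ‘ignoring potentially discriminatory variables’ could mean in practice, one might strip sensitive fields from a record before it ever reaches a model. All field names below are invented for illustration. Note the caveat: correlated proxy variables can still leak the same information, so dropping columns alone does not guarantee fairness.

```python
# Hypothetical sketch: excluding potentially discriminatory variables
# from a record before it is used for training or prediction.
SENSITIVE = {"gender", "skin_color", "age"}

def strip_sensitive(record):
    """Return a copy of the record without sensitive attributes."""
    return {k: v for k, v in record.items() if k not in SENSITIVE}

applicant = {"gender": "female", "age": 34,
             "years_experience": 8, "degree": "MSc"}
print(strip_sensitive(applicant))
# {'years_experience': 8, 'degree': 'MSc'}
```

Because features like name, address or career gaps can act as proxies for the removed attributes, this kind of filtering is at best a starting point, not a complete debiasing strategy.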

  • Caroline Criado Perez, Invisible Women: Exposing Data Bias in a World Designed for Men (London: Penguin Random House, 2019)
  • Joy Buolamwini and Timnit Gebru, ‘Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification’, Proceedings of Machine Learning Research 81 (2018)

Thank you, Stella, for this beautiful article and your drawings!


Organisational Development Manager at Limebit GmbH