Research suggests that women are significantly less likely than men to make the news. In the most recent report by the Global Media Monitoring Project (GMMP), the largest and longest-running research project on gender in the world’s news media, women made up just 24% of the news subjects and sources reported. According to the report, this figure has not changed since 2010.
Headlines introduce, frame and contextualize a news story. Research in educational and experimental psychology has also shown that news headlines can have a disproportionate influence on readers, and that misleading headlines can bias readers toward a particular interpretation.
So, if women are underrepresented in the news to begin with, what does it look like when
women do make headlines? And how have headlines about women changed over time?
To explore these questions, we visualized the language used in women-centered headlines and how this language has (or has not) changed over time. Using keywords associated with the word “woman” (such as girl, mother and lady), we collected and analyzed 382,139 headlines published between 2005 and 2021 by the top English-language news publications and news agencies in four countries: the United States of America (US), India, South Africa, and the United Kingdom (UK). In total, we considered 186 publications: 24 in South Africa, 51 in India, 57 in the UK, and 54 in the US.
Whether they cover empowerment or crime and violence, headlines are designed to grab the reader’s attention. They can inspire readers to care about something, inform them about important world events, or confront them with shocking imagery. Below, you can explore some headlines and see for yourself.
Using data from SimilarWeb, we then related each publication’s monthly viewership to the average polarity score of its women-centered headlines. While all outlets sensationalize their news to some extent, the outlets at the left end of the spectrum (i.e. the least sensational) tend to focus either on financial news, like Bloomberg in the United States and LiveMint in India, or on tech news, like TechRadar and CNET. Nature, a predominantly scientific publication, is the least sensational of all, but it also has a more limited reach.
The BBC and The New York Times are the largest publications with the least sensational headlines, in contrast to the Daily Mail, Huffington Post, Fox News and Aaj Tak, which publish more shock-value headlines.
Chart: news outlets arranged by polarity score, from less polarizing (left) to more polarizing (right).
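We measure polarity by running sentiment analysis on each headline with the VADER Python package (vaderSentiment), which gives each headline a sentiment score from -1 (most negative) to 1 (most positive). Because we are interested in polarity rather than in whether a headline is positive or negative, we take the absolute value of each headline’s score, so that strongly negative and strongly positive headlines both count as polarizing.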
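A minimal Python sketch of this polarity step follows. It assumes the score in question is VADER’s `compound` value (the field of `polarity_scores()` that ranges from -1 to 1) and that an outlet’s score is the plain mean of its headline scores; the helper names and the `(outlet, headline)` input format are hypothetical. The scorer is passed in as a parameter so the sketch can run without vaderSentiment installed.

```python
from collections import defaultdict
from functools import lru_cache
from statistics import mean
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical helper names; a sketch, not the essay's actual pipeline.


@lru_cache(maxsize=1)
def _analyzer():
    # Imported lazily so the rest of the sketch runs without vaderSentiment.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer()


def vader_compound(headline: str) -> float:
    """VADER compound score in [-1, 1], from most negative to most positive."""
    return _analyzer().polarity_scores(headline)["compound"]


def headline_polarity(headline: str,
                      score: Callable[[str], float] = vader_compound) -> float:
    """Polarity ignores direction: 0 is neutral, 1 is maximally polarizing."""
    return abs(score(headline))


def outlet_polarity(rows: Iterable[Tuple[str, str]],
                    score: Callable[[str], float] = vader_compound) -> Dict[str, float]:
    """Assumed aggregation: mean headline polarity per outlet."""
    per_outlet = defaultdict(list)
    for outlet, headline in rows:
        per_outlet[outlet].append(headline_polarity(headline, score))
    return {outlet: mean(values) for outlet, values in per_outlet.items()}
```

With vaderSentiment installed, `outlet_polarity(rows)` scores headlines with VADER directly; offline, any function that maps a headline to a number in [-1, 1] can be passed as `score`.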
While the theme of crime and violence led us to examine how sensational women-centered headlines are, the theme of gendered language led us to measure bias.
Explicit use of gendered language in English (words like “actress,” “congresswoman” or “landlady”) emphasizes the subject’s gender even when it is irrelevant. Research from Yasmeen Hitti et al. suggests that both gendered language and words that reinforce societal and behavioral stereotypes, such as “beautiful,” “emotional,” “supportive” or “dramatic,” add to the bias of a sentence. Following their methodology, we assigned a bias score to each headline.
For example, the headline “Daughter in emotional meeting with woman given life back by selfless courage of her dead mother” gets a higher bias score than the headline “Hillary Clinton speaks out for the same American values upheld in retracted embassy statement,” because it pairs gendered words such as “daughter” and “her” with the stereotype “emotional.” In the chart below, we visualize this bias index for each publication. In contrast to our polarity results, bias scores vary much more across publications: the Daily Mail scores the highest, while the BBC and ESPN are among the lowest.
Chart: news outlets arranged by bias index, from less biased (left) to more biased (right).
We measure gender bias by tracking the combined occurrence of gendered language and social stereotypes usually associated with women. We do this in two steps:
1) We check whether a headline contains gendered language (e.g. “spokeswoman,” “chairwoman,” “she,” “her,” “bride,” “daughter,” “daughters,” “female,” “fiancee,” “girl,” “girlfriend”).
2) If it does, we count the words in it that are considered social stereotypes about women (e.g. “weak,” “modest,” “virgin,” “slut,” “whore,” “sexy,” “feminine,” “sensitive,” “emotional,” “gentle,” “soft,” “pretty,” “bitch,” “sexual”).
Finally, we normalize this count to a score between 0 and 1 for every headline within each outlet, and we average these scores for each outlet.
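The two steps and the final averaging can be sketched in Python as follows. The word sets are small illustrative subsets of the lists quoted above (plus “mother” from the Methods section), and the tokenizer is a simple regular expression. Because the exact normalization is not spelled out, the sketch assumes the stereotype count is divided by the headline’s word count, which keeps each score between 0 and 1. All helper names are hypothetical.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Illustrative subsets only; the curated dictionaries are much larger.
GENDERED = {"spokeswoman", "chairwoman", "she", "her", "bride", "daughter",
            "daughters", "female", "fiancee", "girl", "girlfriend", "mother"}
STEREOTYPES = {"weak", "modest", "sexy", "feminine", "sensitive", "emotional",
               "gentle", "soft", "pretty", "sexual"}

WORD = re.compile(r"[a-z]+")


def tokenize(headline: str) -> List[str]:
    """Lowercase a headline and split it into alphabetic words."""
    return WORD.findall(headline.lower())


def headline_bias(headline: str) -> float:
    """Bias score in [0, 1] for one headline (hypothetical helper)."""
    words = tokenize(headline)
    # Step 1: headlines without gendered language score 0.
    if not any(w in GENDERED for w in words):
        return 0.0
    # Step 2: count stereotype words in the gendered headline.
    hits = sum(w in STEREOTYPES for w in words)
    # Assumed normalization: share of the headline's words that are stereotypes.
    return hits / len(words)


def outlet_bias(rows: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Average headline bias per outlet; rows are (outlet, headline) pairs."""
    per_outlet = defaultdict(list)
    for outlet, headline in rows:
        per_outlet[outlet].append(headline_bias(headline))
    return {outlet: mean(scores) for outlet, scores in per_outlet.items()}
```

Under this assumption, the “Daughter in emotional meeting…” headline quoted earlier has 16 words, gendered terms (“daughter,” “her,” “mother”) and one stereotype (“emotional”), so it scores 1/16 = 0.0625, while the Hillary Clinton headline contains no gendered words and scores 0.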
In this final chart, we have visualized how the words
used in headlines about women have changed over time.
Among other trends visible in this chart, we found that while the use of many gendered and stereotyped words (e.g. “sexy,” “fat,” “housewife” or “gossip”) has faded, the use of empowering words (e.g. “founder,” “activist,” “leader” or “appoint”) has increased. Other words (e.g. “death,” “hurt” and “drama”) have stood the test of time: their use has remained consistent since 2005.
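The essay does not say exactly how the chart’s yearly values are computed. One plausible sketch, assuming the measure is the share of each year’s headlines that contain a given word, is shown below; the function name and the `(year, headline)` input format are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

WORD = re.compile(r"[a-z]+")


def yearly_word_share(rows: Iterable[Tuple[int, str]],
                      word: str) -> Dict[int, float]:
    """Fraction of each year's headlines containing `word` at least once.

    Hypothetical helper. Dividing by the number of headlines per year keeps
    years with more scraped headlines from looking artificially inflated.
    """
    totals: Counter = Counter()
    hits: Counter = Counter()
    target = word.lower()
    for year, headline in rows:
        totals[year] += 1
        if target in set(WORD.findall(headline.lower())):
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}
```

A word that “fades out” would show a declining series of shares, while a word that “stood the test of time” would stay roughly flat.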
For each word’s ebb or flow, we tried to find a “remember when” memory to explain it. Remember when Caitlyn Jenner came out as transgender? That was part of a wave of increased trans visibility that helps explain why “transgender” shot up in 2015. Remember when the #MeToo movement took off? That adds context to the sharp rise of “harassment” in 2017, and to the growing use of “equality” in recent years. These world events are arranged as bubbles in the timeline above the chart.
If you see an interesting rise, hover over a nearby bubble to see whether a world event can explain it. If you think we’ve missed an important event in your part of the world, let us know.
The story of when women make headlines is, like most stories about people, full of contradictions. It is violent, sensational, biased, hopeful and empowering, though not all in equal measure. This visual essay suggests that headlines used to report women-centered news can be biased and can reinforce existing stereotypes. These headlines also tend to be more sensational than headlines on other news topics, and they tend to represent women in situations of crime and violence. As a growing body of research has already indicated, this could mean that women are not only underrepresented in the news but also misrepresented.
Nonetheless, this visual essay also suggests that some progress has been made. Over time, the use of many empowering words has risen sharply while the use of some gender stereotypes has plummeted. Let’s hope this trend continues and, in the meantime, take our news with a grain of salt. After all, when women make headlines, no words, sensational or not, biased or not, can fully capture the nuance behind the event, because words can only approximate.
Methods
To build the dataset of headlines, we used RapidAPI to scrape Google News for headlines from the most visited publications and news agencies for readers in the US, the UK, India and South Africa, according to SimilarWeb (as of 2021-06-06). We queried RapidAPI for headlines containing one or more of the following keywords: women, woman, girl, female, lady, ladies, she, her, herself, aunt, grandmother, mother, sister, daughter, wife, mom, mum, girlfriend, mrs, niece. As a result, our analysis covers 24 publications in South Africa (18,594 headlines), 51 publications in India (138,590 headlines), 57 publications in the United Kingdom (109,286 headlines), and 54 publications in the United States (115,669 headlines), for a total of 382,139 headlines.
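The keyword query itself ran through RapidAPI, and its matching rules are not described here. As a hedged illustration, the sketch below re-applies the same keyword list locally with whole-word, case-insensitive matching, so that “her” does not match “here” and “mum” does not match “Mumbai.” It also checks that the per-country counts add up to 382,139. The function name is hypothetical.

```python
import re

# Keyword list as given in the Methods section.
KEYWORDS = ["women", "woman", "girl", "female", "lady", "ladies", "she", "her",
            "herself", "aunt", "grandmother", "mother", "sister", "daughter",
            "wife", "mom", "mum", "girlfriend", "mrs", "niece"]

# Whole-word, case-insensitive match on any keyword (\b marks word boundaries).
KEYWORD_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, KEYWORDS)) + r")\b",
                        re.IGNORECASE)


def is_women_centered(headline: str) -> bool:
    """Hypothetical local filter: True if any keyword appears as a whole word."""
    return KEYWORD_RE.search(headline) is not None


# Headline counts per country, as reported above.
COUNTS = {"South Africa": 18_594, "India": 138_590,
          "United Kingdom": 109_286, "United States": 115_669}
TOTAL = sum(COUNTS.values())  # 382,139
```

Substring matching would be simpler, but for a dataset that includes Indian outlets it would wrongly flag every headline mentioning Mumbai.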
Gendered language and bias calculation: To categorize words used in headlines as gendered, we manually curated two dictionaries: gendered words about women (words that are explicitly gendered in the English language, such as “actress,” “waitress,” “congresswoman,” “landlady” or “mother”) and words that denote societal and behavioral stereotypes about women (words like “beautiful,” “sexy,” “pregnant” or “emotional”). These were curated using existing research from Huimin Xu and team, published under the title “The Cinderella Complex: Word embeddings reveal gender stereotypes in movies and books,” and the incredible research done by The Swaddle team. These dictionaries can be found here. The methodology used to calculate bias was borrowed from the research by Yasmeen Hitti and team, published under the title “Proposed Taxonomy for Gender Bias in Text.”
Theme dictionaries: To categorize words used in headlines by theme (crime and violence; empowerment; race, ethnicity and identity; people and places), we manually curated four dictionaries. These dictionaries can be found here. When a word had more than one contextual usage (like “head” or “chair”), we only assigned it to a theme if it belonged to that theme in at least 90% of cases. To analyze words and other textual elements in headlines, we used existing natural language processing packages for Python (i.e. spacy, gensim, word2number, pycontractions, bs4, unidecode, textblob, nltk).
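The 90% rule for ambiguous words such as “head” or “chair” amounts to a simple threshold. The sketch assumes each word comes with a tally of how many sampled usages fell under each theme (for example, from manual review) and the total number of sampled usages; that input format and the function name are hypothetical, since the essay does not say how usages were counted.

```python
from typing import Dict, Optional

THRESHOLD = 0.9  # a word joins a theme only if >= 90% of its usages fit it


def assign_theme(theme_counts: Dict[str, int], total_usages: int,
                 threshold: float = THRESHOLD) -> Optional[str]:
    """Return the theme a word belongs to, or None if no theme is dominant.

    theme_counts maps a theme name to the number of sampled usages of the
    word that belonged to that theme; total_usages counts every sampled
    usage, including ones that fit no theme. With a threshold above 0.5,
    at most one theme can qualify.
    """
    if total_usages <= 0:
        return None
    for theme, count in theme_counts.items():
        if count / total_usages >= threshold:
            return theme
    return None
```

Words that fall below the threshold are simply left out of every theme rather than being assigned to their most common one.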
Polarity analysis: To analyze the polarity of each headline, we used vaderSentiment. To compare the polarity of women-centered headlines with that of all other headlines, we also scraped headlines without any keyword filter from the most visited publications and news agencies for readers in the US, the UK, India and South Africa, according to SimilarWeb (as of 2021-06-06). This data let us calculate baseline polarity scores for each news publication and news agency. Although it is a representative sample, the set of headlines used for this baseline is roughly one third the size of the set used to calculate polarity for women-centered headlines.
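Comparing the two sets comes down to averaging each outlet’s headline polarity separately for women-centered headlines and for the baseline sample, then taking the difference. The sketch below assumes per-headline polarity values (absolute VADER scores) have already been computed; the function name and input shape are hypothetical. Because the baseline sample is roughly a third the size, it compares means rather than totals and skips outlets missing from either set.

```python
from statistics import mean
from typing import Dict, List


def polarity_gap(women: Dict[str, List[float]],
                 baseline: Dict[str, List[float]]) -> Dict[str, float]:
    """Mean women-centered polarity minus mean baseline polarity, per outlet.

    Hypothetical helper. Both inputs map an outlet to its headlines'
    polarity values in [0, 1]. A positive gap means the outlet's
    women-centered headlines are more polarizing than its baseline headlines.
    """
    shared = women.keys() & baseline.keys()
    return {
        outlet: mean(women[outlet]) - mean(baseline[outlet])
        for outlet in sorted(shared)
        if women[outlet] and baseline[outlet]
    }
```

Comparing means keeps the unequal sample sizes from skewing the comparison, though a smaller baseline does make each outlet’s baseline mean noisier.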
With regard to the stacked bar chart (in the scrollytelling section), the original dataset contained far more than 1,231 unique words. For visual clarity and readability, however, we only retained the 1,231 most frequent words that were common to all four countries studied.
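The vocabulary cut can be sketched as follows: keep only words that appear in every country’s headlines, rank them by total frequency, and retain the top 1,231. The sketch assumes per-country word counts are already available as `collections.Counter` objects; the essay does not specify how ties were broken, so ties here fall back to alphabetical order, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def shared_top_words(counts_by_country: Dict[str, Counter],
                     n: int = 1231) -> List[str]:
    """Most frequent words that occur in every country's headlines.

    Hypothetical helper. counts_by_country maps a country to a Counter of
    word frequencies. Words are ranked by their total count across all
    countries; ties are broken alphabetically so the output is deterministic.
    """
    if not counts_by_country:
        return []
    # Words present in all countries' vocabularies.
    shared = set.intersection(*(set(c) for c in counts_by_country.values()))
    # Total frequency across countries.
    total: Counter = Counter()
    for counts in counts_by_country.values():
        total.update(counts)
    ranked = sorted(shared, key=lambda w: (-total[w], w))
    return ranked[:n]
```

Filtering for shared words first means a term that is very frequent in only one country (a local place name, for instance) cannot crowd out terms that are comparable across all four.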
All of the data used for this essay is available in this GitHub repo.
We collaborated with Jan Diehm, Rob Smith, Russell Samora, and Michelle McGhee on this piece, and we’re very grateful and happy with how it turned out!