March 30, 2025

ikayaniaamirshahzad@gmail.com

The Millennial Question


METHOD

We used the Event Registry API to scrape news articles about Millennials. The query filtered on news
articles with the word “Millennials”, “millennials”, “Millennial”, or “millennial” in the headline,
published between June 15, 2015 and June 15, 2019. This query yielded nearly 38,000 articles. We obtained
article metadata, including the URL, title, body, and publishing date from the query. Sometimes, multiple
news outlets in the same media family publish the same article; removing these duplicates yielded a total
of 26,565 articles.
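
As a rough illustration of the filtering and de-duplication described above, here is a minimal sketch in plain Python. It does not call the Event Registry API. The dictionary keys (`title`, `body`, `date`, `url`), the function names, and the rule that two articles are duplicates when their whitespace- and case-normalized title and body match are all assumptions made for illustration, not the exact procedure used.

```python
import re
from datetime import date

# Matches exactly the four headline variants named in the query.
MILLENNIAL = re.compile(r"\b[Mm]illennials?\b")

def in_scope(article, start=date(2015, 6, 15), end=date(2019, 6, 15)):
    """True if the headline names Millennials and the date falls in the window."""
    return bool(MILLENNIAL.search(article["title"])) and start <= article["date"] <= end

def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not hide duplicates."""
    return " ".join(text.lower().split())

def drop_duplicates(articles):
    """Keep the first copy of each (title, body) pair; the matching rule is an assumption."""
    seen = set()
    unique = []
    for article in articles:
        key = (normalize(article["title"]), normalize(article["body"]))
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique
```

Keeping the first copy preserves the earliest-seen URL for each story; a stricter or looser duplicate rule (for example, matching on body text only) would change the final count.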

We used the spaCy Python package to part-of-speech tag the headline text. Part-of-speech tagging
identifies each word’s part of speech in the sentence (e.g., a noun versus a verb versus an adverb). We
kept only article headlines in which Millennials perform an action (“Millennials are killing the napkin
industry”, for instance). Narrowing our scope made it easier to identify the target of their love and/or
destruction. Using the newly tagged headlines, we subsetted the main dataset on headlines where
“millennials” is the subject noun of the sentence, yielding 12,500 articles. From these, we also removed
articles with fewer than five sentences in the body.
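
The subject filter and the sentence-count filter could be sketched roughly as follows. This is an assumption-laden illustration, not the original code: it assumes a spaCy pipeline with a dependency parser (the model name `en_core_web_sm` is a guess), that the subject is found through the `nsubj` dependency label, and that sentences are counted with spaCy's own sentence segmentation. The function names are hypothetical.

```python
def load_pipeline(model="en_core_web_sm"):
    """Load a spaCy pipeline; the model name is an assumption."""
    import spacy  # imported lazily so the helpers below do not require spaCy to be installed
    return spacy.load(model)

def millennial_subject_verb(doc):
    """Return the verb token whose nominal subject is 'Millennial(s)', or None."""
    for token in doc:
        if token.dep_ == "nsubj" and token.text.lower() in {"millennial", "millennials"}:
            return token.head  # the head of a subject token is the verb it attaches to
    return None

def has_enough_sentences(doc, minimum=5):
    """True if the parsed article body contains at least `minimum` sentences."""
    return sum(1 for _ in doc.sents) >= minimum
```

With a pipeline from `load_pipeline()`, a headline would be kept when `millennial_subject_verb(nlp(title))` is not `None` and `has_enough_sentences(nlp(body))` is true. The exact parse, and so the result, depends on the model used.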

The objects you can explore are the noun chunks spaCy identified as the first direct object in the
headline. We opted to look at noun chunks instead of just nouns to get a complete picture of the items
Millennials are interacting with. Noun chunks include adjectives plus nouns, such as “second home”
instead of “home”. This method left us with about 4,000 unique nouns and 2,000 unique verbs.
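
A minimal sketch of pulling the first direct-object noun chunk from a parsed headline is shown below. It assumes `doc` is a spaCy `Doc` from an English pipeline that labels direct objects as `dobj`. The function names and the lowercasing of chunks before counting are illustrative assumptions.

```python
from collections import Counter

def first_direct_object(doc):
    """Return the text of the first noun chunk acting as a direct object, or None."""
    for chunk in doc.noun_chunks:  # chunks are yielded in document order
        if chunk.root.dep_ == "dobj":
            return chunk.text
    return None

def count_objects(docs):
    """Tally first direct objects across parsed headlines (case-folded)."""
    counts = Counter()
    for doc in docs:
        obj = first_direct_object(doc)
        if obj is not None:
            counts[obj.lower()] += 1
    return counts
```

The number of keys in the returned `Counter` gives the count of unique objects. Using the chunk text rather than the root noun alone is what keeps “second home” distinct from “home”.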


