ComputationlStoryLab 17 Oct 18
1/5 Op-ed in uses Google n-gram data to claim “most religious and spiritual words have been declining in the English-speaking world since the early 20th century.”
ComputationlStoryLab 17 Oct 18
Replying to @compstorylab
2/5 While this statement could certainly be true, raw n-gram data cannot support the claim because the underlying corpus is non-stationary. The author is likely referring to trends like figure 5h in the original Culturomics paper, where “God” is decreasing.
ComputationlStoryLab 17 Oct 18
Replying to @compstorylab
3/5 However, we’ve shown that the English n-gram data is corrupted by an increase in scientific language from textbooks and academic publications during the 20th century. The trend disappears when looking at English Fiction alone.
ComputationlStoryLab 17 Oct 18
Replying to @compstorylab
4/5 If Google insists on including textbooks & scientific studies, their n-gram viewer should default to display “English Fiction”, the least troublesome version, rather than “English”.
ComputationlStoryLab 17 Oct 18
Replying to @compstorylab
5/5 Otherwise, unsuspecting cultural scholars will continue to be misled by decreasing 20th-century relative word frequencies.
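To make the non-stationarity argument in this thread concrete, here is a minimal sketch. All numbers are invented for illustration and are not taken from Google Books or from the paper; the names `relative_frequency`, `RATE_FICTION`, `RATE_SCIENCE`, and `CORPUS_BY_YEAR` are hypothetical. It assumes a word is used at a constant rate within each genre and only lets the share of scientific text in the corpus grow.

```python
def relative_frequency(fiction_tokens, science_tokens, rate_fiction, rate_science):
    """Relative frequency of a word in a corpus that mixes two genres."""
    word_count = fiction_tokens * rate_fiction + science_tokens * rate_science
    return word_count / (fiction_tokens + science_tokens)


# Assumed per-genre usage rates, held constant over time (made-up values).
RATE_FICTION = 5e-4  # occurrences per token in fiction
RATE_SCIENCE = 1e-5  # occurrences per token in scientific text

# Assumed corpus composition per year: (fiction tokens, science tokens).
CORPUS_BY_YEAR = {
    1900: (90_000_000, 10_000_000),
    1950: (60_000_000, 40_000_000),
    2000: (30_000_000, 70_000_000),
}

# Frequency in the mixed corpus versus the fiction-only subset.
mixed_trend = {
    year: relative_frequency(fiction, science, RATE_FICTION, RATE_SCIENCE)
    for year, (fiction, science) in CORPUS_BY_YEAR.items()
}
fiction_trend = {
    year: relative_frequency(fiction, 0, RATE_FICTION, RATE_SCIENCE)
    for year, (fiction, _science) in CORPUS_BY_YEAR.items()
}

for year in sorted(CORPUS_BY_YEAR):
    print(f"{year}: mixed={mixed_trend[year]:.6f} fiction_only={fiction_trend[year]:.6f}")
```

Under these assumptions the mixed-corpus frequency falls decade over decade while the fiction-only frequency stays flat, even though usage within each genre never changed.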
ComputationlStoryLab 17 Oct 18
Replying to @compstorylab
6/5 Links. Op-Ed: Evidence of non-stationarity of the corpus: “English” counts for “God”: “English Fiction” counts for “God”:
ComputationlStoryLab 17 Oct 18
Replying to @JonathanMerritt
Here's an earlier piece by the same author, appearing in The Week and claiming the same thing as the NYT piece.
ComputationlStoryLab 17 Oct 18
Replying to @JonathanMerritt
Our work is acknowledged but sailed past with an effective "whatever": Language Log, as we would expect/hope, has things sorted:
ComputationlStoryLab 17 Oct 18
Replying to @JonathanMerritt
Google Books is a fiasco. How many papers have been written using Google Books as some true representation of culture? How many offhand observations have been made?
ComputationlStoryLab
Our paper is here: "Characterizing the Google Books Corpus: Strong Limits to Inferences of Socio-Cultural and Linguistic Evolution" Just read the introduction. Please.
It is tempting to treat frequency trends from the Google Books data sets as indicators of the “true” popularity of various words and phrases. Doing so allows us to draw quantitatively strong conclu...
PLOS ONE @PLOSONE
ComputationlStoryLab 18 Oct 18
Replying to @compstorylab
More from the excellent Liberman
ComputationlStoryLab Nov 21
Replying to @compstorylab
We wrote up a blog post summarizing the thread above
Geoff Bower 17 Oct 18
If your scholarship is punching some words into a website you didn't build and don't understand, then maybe you are not a scholar.
Emily Marsh 18 Oct 18
Thank you for this essential analysis and criticism.