Tag Archives: data journalism

Lessons in technology for the journalist crowd

The Columbia Journalism Review has some helpful tips for sniffing out hoaxes:

The audience received a thorough primer on forensic image analysis techniques. Recalling doctored images from “superstorm” Sandy and a hoax video of an eagle grabbing a baby that went viral in December, Farid and Clinch walked through methods and tips for quickly identifying phony images. Reflections and shadows in photographs have to line up according to basic optical principles, for instance. Another rule of thumb: Any picture with an unusually placed shark is fake.

Twitter truth-telling

Slate reports on new research that raises the possibility of machine-aided Twitter reading, that is, an initial vetting of tweets for veracity based on certain features:

A 2010 paper from Yahoo Research analyzed tweets from that year’s 8.8 Chile earthquake and found that legitimate news—such as word that the Santiago airport had closed, that a supermarket in Concepcion was being looted, and that a tsunami had hit the coastal town of Iloca—propagated on Twitter differently than falsehoods, like the rumor that singer Ricardo Arjona had died or that a tsunami warning had been issued for Valparaiso. One key difference might sound obvious but is still quite useful: The false rumors were far more likely to be tweeted along with a question mark or some other indication of doubt or denial.

Building on that work, the authors of the 2010 study developed a machine-learning classifier that uses 16 features to assess the credibility of newsworthy tweets. Among the features that make information more credible:

– Tweets about it tend to be longer and include URLs.

– People tweeting it have higher follower counts.

– Tweets about it are negative rather than positive in tone.

– Tweets about it do not include question marks, exclamation marks, or first- or third-person pronouns.

Several of those findings were echoed in another recent study from researchers at India’s Institute of Information Technology who also found that credible tweets are less likely to contain swear words and significantly more likely to contain frowny emoticons than smiley faces.

But won’t many chronic Twitter liars simply absorb these lessons and tailor their tweets to trick the new algorithm? (Say that last part five times fast.)
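To make the list above a little more concrete, here is a rough Python sketch of how a few of those signals could be pulled out of a single tweet. It is not the researchers' classifier: the function name, the pronoun list, and the emoticon patterns are all my own assumptions, and it skips the tone (sentiment) feature entirely because that would need a separate model. It only extracts features; turning them into a credibility score is the part the studies actually trained on data.

```python
import re

# Assumed patterns and word lists for illustration only; the studies'
# actual lexicons and all 16 features are not given in the excerpt above.
URL_RE = re.compile(r"https?://\S+")
PRONOUNS = {
    "i", "me", "my", "mine", "we", "us", "our", "ours",          # first person
    "he", "him", "his", "she", "her", "hers",                      # third person
    "they", "them", "their", "theirs",
}
FROWNY = (":(", ":-(")
SMILEY = (":)", ":-)")


def tweet_features(text, follower_count):
    """Return a dict of simple surface features for one tweet."""
    # Remove URLs first so a '?' inside a link is not counted as doubt.
    body = URL_RE.sub("", text)
    words = re.findall(r"[a-z']+", body.lower())
    return {
        "length": len(text),
        "has_url": bool(URL_RE.search(text)),
        "follower_count": follower_count,
        "has_question_mark": "?" in body,
        "has_exclamation_mark": "!" in body,
        "has_pronoun": any(w in PRONOUNS for w in words),
        "has_frowny": any(e in body for e in FROWNY),
        "has_smiley": any(e in body for e in SMILEY),
    }


if __name__ == "__main__":
    # Made-up example tweets, not taken from either study.
    print(tweet_features("Santiago airport closed http://example.com/news?id=1", 5000))
    print(tweet_features("Is he really dead?! :(", 10))
```

A real system would feed a table of these features, one row per tweet, into a trained classifier rather than reading any single flag as proof of anything.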

What does Top Chef have in common with the 2012 presidential election?

[Screenshot: side-by-side maps of Obama’s 2012 state-level vote share and state-level search frequency for “top chef”]

Google Trends has existed for some time, but I only started playing around with it today and discovered some very useful tools. One is called Correlate, and it allows users to enter time-series or state-level data and then query Google’s aggregated search history.

Just for fun, I entered Obama’s 2012 popular vote percentages by state and then queried the database. The term whose search frequency is most strongly correlated with Obama’s state-level vote percentages (as measured by the Pearson correlation coefficient) is… “top chef,” with a coefficient (r) of 0.8702. In the screenshot above, the map on the left shows Obama’s vote percentage by state (greener means a higher percentage of people in that state voted for him in the 2012 presidential election), and the map on the right shows state-level frequencies of the search term “top chef.” They’re really quite similar.

And here are the rest of the top 10, with their corresponding r-values:

2. december 24 (0.8642)

3. july 18 (0.8593)

4. alia (0.8569)

5. december 18 (0.8547)

6. slum (0.8545)

7. lede (0.8495)

8. august 20 (0.8462)

9. july 11 (0.8418)

10. july 22 (0.8414)
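For anyone curious what those r-values measure, here is a minimal Python sketch of the Pearson correlation coefficient. I don't know how Correlate computes it internally; this is just the textbook formula (covariance divided by the product of the standard deviations), and the function name and the toy numbers in the example are made up for illustration, not real vote or search data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical values for five imaginary states (not real data).
    vote_share = [38.0, 45.5, 51.0, 58.2, 62.7]
    search_index = [0.4, 0.9, 1.1, 1.6, 1.5]
    print(round(pearson_r(vote_share, search_index), 4))
```

An r near 1 means the two series rise and fall together across states, near -1 means they move in opposite directions, and near 0 means no linear relationship. With enough candidate search terms, a few high correlations like the ones above will turn up by chance alone.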

In other news, procrastination from final exams is a very real phenomenon.