City Lab

By Aarian Marshall

March 24, 2016

Using social media in scientific studies is fraught. Only a certain (younger, more minority-heavy) demographic uses Twitter, for example, and social media posts only capture the behaviors of people who, well, report their behaviors on social media. Still, social media data has shown up in a number of studies since the mid-2000s, including inquiries into wide-ranging public health issues: depression, the spread of influenza, food poisoning. So why not drinking behaviors?

 

A team of computer science researchers from the University of Rochester has done just that, aggregating geotagged tweets about alcohol consumption using machine learning techniques: basically, algorithms that identify patterns in data, like repeated tweets of “wOoHoO #beer!!1!” The researchers also had an assist from the real humans behind Amazon’s Mechanical Turk, which pays people to do simple, repetitive, and easily crowdsourced tasks (like identifying whether a social media post is really about drinking). Together, the machines and people flagged posts with mentions of “wine,” “drank,” “vodka,” “pong,” and “hammered,” among other delectable phrases.
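For the curious, a keyword pass like the one described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' classifier: the word list holds just the terms quoted in this article plus “beer” from the sample tweet, and the function name `mentions_drinking` is made up for the example.

```python
import re

# Terms quoted in the article, plus "beer" from the sample tweet.
# The study's real vocabulary is not reproduced here (assumption).
DRINKING_KEYWORDS = ("wine", "drank", "vodka", "pong", "hammered", "beer")

# Match any keyword as a whole word, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in DRINKING_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def mentions_drinking(text: str) -> bool:
    """Return True if the text contains one of the drinking keywords."""
    return _PATTERN.search(text) is not None


if __name__ == "__main__":
    for tweet in ["wOoHoO #beer!!1!", "ping pong at the rec center", "stuck in traffic"]:
        print(tweet, "->", mentions_drinking(tweet))
```

The ping-pong tweet gets flagged too, which is roughly why a human check on whether a post is “really about drinking” is worth paying for.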

 

Studying drinking in this way is not unprecedented. Social scientists have noted that data gleaned from social media can help reveal the drinking behaviors of teens, who might be more likely to tweet—or, like, Peach—about alcohol use than disclose it in a survey or to a doctor. Social media might help researchers answer a whole host of behavioral questions: where are people most likely to drink, or talk about drinking? Do people who drink regularly move to places with lots of bars and liquor stores, or do lots of bars and liquor stores make the drinkers? Are urban, suburban, and rural drinking behaviors different from each other? The best part of social media data, the University of Rochester researchers note, is that it’s much cheaper to obtain than the stuff from intensive surveys.

In all, the researchers parsed about 2 million geotagged tweets from the urban greater New York City area and 1.5 million geotagged tweets from suburban Monroe County in upstate New York. In a paper recently accepted to the International AAAI Conference on Web and Social Media, they report a positive association between the “rate of alcohol consumption reported among a community’s Twitter users and the density of alcohol outlets.” So, there were higher shares of people tweeting about drinking in the urban, booze-soaked New York neighborhoods than in the less-dense suburbs of Monroe County.

 

The researchers also tried to figure out where people were doing their drinking—at an official drinking establishment, or in their homes? Determining where social media posters live is easier said than done. In the past, other scientists have assumed that people live wherever they tweet between the hours of 1 a.m. and 6 a.m. But the researchers got a little more specific for this study: their people-and-algorithm combination identified geotagged tweets with words like “home,” “bath,” and “sleep.” If a social media user consistently tweeted those sorts of things from one place, the scientists concluded that was their home.
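A rough Python sketch of that idea, under loud assumptions: the grid size, the thresholds `min_tweets` and `min_share`, and the function names are all invented for illustration, and the study itself paired human labels with trained algorithms rather than a plain word match.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Home-related words quoted in the article; the full list is an unknown.
HOME_WORDS = ("home", "bath", "sleep")

Cell = Tuple[float, float]


def to_cell(lat: float, lon: float, precision: int = 3) -> Cell:
    """Snap a coordinate to a coarse grid cell (3 decimals is roughly 100 m)."""
    return (round(lat, precision), round(lon, precision))


def guess_home(
    tweets: Iterable[Tuple[float, float, str]],
    min_tweets: int = 3,      # hypothetical threshold
    min_share: float = 0.5,   # hypothetical threshold
) -> Optional[Cell]:
    """Guess one user's home cell from (lat, lon, text) tweets, or return None."""
    counts = Counter(
        to_cell(lat, lon)
        for lat, lon, text in tweets
        if any(word in text.lower() for word in HOME_WORDS)
    )
    if not counts:
        return None
    cell, n = counts.most_common(1)[0]
    # Require the user to tweet home-like words from this cell consistently.
    if n >= min_tweets and n / sum(counts.values()) >= min_share:
        return cell
    return None
```

The substring match is deliberately crude (it would count “homework”), which is another spot where human labeling helps.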

The result? People are actually more likely to drink at home in New York City, even with its profusion of bars, music venues, and clubs. The New Yorkers who traveled to drink didn’t go far: they stayed within about 125 meters of HQ. Folks based around Monroe, however, were more likely to tweet about drinking within driving distance of their homes.
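Checking whether a drinking tweet falls near an inferred home comes down to a distance calculation between two coordinates. Here is a minimal sketch using the standard haversine formula; the function names are hypothetical, and the 125-meter default simply borrows the figure quoted above as an example radius, not as the study's own cutoff.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def near_home(home, tweet_location, radius_m: float = 125.0) -> bool:
    """True if a tweet's (lat, lon) lies within radius_m of the home (lat, lon)."""
    return haversine_m(*home, *tweet_location) <= radius_m
```

A tweet sent a block or two away would pass this check, while one sent from a bar a few miles down the road would not.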

 

University of Rochester researchers used machine learning techniques to determine where tweeters were posting about drinking. (Hossain et al.)

 

None of this is too confusing: Why pay a bazillion dollars to live in walkable New York if you’re not going to make use of your tiny apartment, or at least the local watering hole?

The University of Rochester computer scientists, however, have big plans for this research technique. They write:

Such analyses can teach us who is and isn’t referencing alcohol on Twitter, and in what settings, to evaluate the degree of self-reporting biases, and also help to create a tool for improving a community’s health, given social networks can become a resource to spread positive health behavior. For instance, the peer social network “Alcoholics Anonymous” is designed to develop social network connections to encourage abstinence among the members and establish helpful ties.

It appears there’s scientific virtue in stalking one’s neighbors via social media, if only in aggregated form.