My name is danah boyd and I'm a Principal Researcher at Microsoft Research and the founder/president of Data & Society. Buzzwords in my world include: privacy, context, youth culture, social media, big data. I use this blog to express random thoughts about whatever I'm thinking.

Six Provocations for Big Data

The era of “Big Data” has begun. Computer scientists, physicists, economists, mathematicians, political scientists, bio-informaticists, sociologists, and many others are clamoring for access to the massive quantities of information produced by and about people, things, and their interactions. Diverse groups argue about the potential benefits and costs of analyzing information from Twitter, Google, Verizon, 23andMe, Facebook, Wikipedia, and every space where large groups of people leave digital traces and deposit data. Significant questions emerge. Will large-scale analysis of DNA help cure diseases? Or will it usher in a new wave of medical inequality? Will data analytics help make people’s access to information more efficient and effective? Or will it be used to track protesters in the streets of major cities? Will it transform how we study human communication and culture, or narrow the palette of research options and alter what ‘research’ means? Some or all of the above?

Kate Crawford and I decided to sit down and interrogate some of the assumptions and biases embedded in the rhetoric surrounding “Big Data.” The resulting piece – “Six Provocations for Big Data” – offers a multi-disciplinary social analysis of the phenomenon with the goal of sparking a conversation. This paper is intended to be presented as a keynote address at the Oxford Internet Institute’s 10th Anniversary “A Decade in Internet Time” Symposium.

Feedback is more than welcome!

6 comments to Six Provocations for Big Data

  • Interesting article, danah. I wrote about another possible impact — the use of these data sources in urban policy and planning. Although ‘big data’ is a bit far off, we’re already seeing much more “alternative” data sources from crowdsourcing and private vendors.

  • Leo

    Danah, I made a note already on G+, but in terms of feedback – this paper is actually now circulating around our project. Briefly, here in VA we are working on developing a Longitudinal Data System which will share info between a variety of agencies, like the Employment Commission and colleges. So, obviously, some of the concerns you raise are important to us as well. I may have some other questions later, as we digest some of the content.

    Of less importance, there is a typo on p. 3, 3rd para. It says “Bit Data”, and I am guessing it should be “Big Data”? It is actually funny, since there probably should be Bit Data out there, no?

    Tks

  • Derek

    Whooooo! bring on the age of Big Data! Finally my data ninja skills (SAS, R, Java) will be in demand by people with lots of funding!

    Now if only I could find a use for my minor in beekeeping…

  • anonmouse

    How about using ‘Justify’ as the formatting next time? This is brutal to read.

  • A very interesting paper, danah. I translated an excerpt into Spanish to use in a post on my blog about how representative Twitter trending topics are. Right now in Chile there is a kind of battle going on between those defending the students’ demands and those defending the government’s response. Usually, when a given hashtag doesn’t reach the Chilean TT, users claim that Twitter is manipulating the trends. To clarify what really happens, I find your quote about the representativeness of Twitter extremely useful.

    You can access the post here:
    http://www.cadaunadas.net/2011/09/twitter-no-nos-representa.html
    Thanks.

  • Shalini

    Danah,
    Nice ideas, but given the fact that (especially in the social sciences) using theories has not been the most effective solution, using evidence-based (in this case big data) conclusions may be worth a try.
    Also, if you have read Koza’s work on genetic algorithms, then big data may help with solutions for today while we are still trying to find the WHY of it.
    Also, inferential statistics may not be relevant any more and hopefully researchers with poor skills in that area will not be penalized or jump to erroneous conclusions.
    Keep up the thinking.
    Shalini