Tag Archives: data

Put an End to Reporting on Election Polls

We now know that the US election polls were wrong. Just like they were with Brexit. Over the last few months, I’ve told numerous reporters and people in the media industry that they should be wary of the polling data they’re seeing, but I was generally ignored and dismissed. I wasn’t alone — two computer scientists whom I deeply respect, Jenn Wortman Vaughan and Hanna Wallach, were trying to get an op-ed on prediction and uncertainty into major newspapers, but were repeatedly told that the outcome was obvious. It was not. And election polls will be increasingly problematic if we continue to approach them the way we currently do.

It’s now time for the media to put a moratorium on reporting on election polls and fancy visualizations of statistical data. And for data scientists and pollsters to stop feeding the media hype cycle with statistics that they know have flaws or will be misinterpreted as fact.

Why Political Polling Will Never Be Right Again

Polling and survey research have a beautiful history, one that most people who obsess over the numbers don’t know. In The Averaged American, Sarah Igo documents three survey projects that unfolded in the mid-20th century and set the stage for contemporary polling: the Middletown studies, Gallup, and Kinsey. As a researcher, I find it mindblowing to see just how naive folks were about statistics and data collection in the early days of this field, and how much the field has learned and developed since. But there’s another striking message in this book: Americans were willing to contribute to these kinds of studies at unparalleled levels compared to their peers worldwide because they saw themselves as contributing to the making of public life. They were willing to reveal their thoughts, beliefs, and ideas because they saw doing so as productive for them individually and collectively.

As folks unpack the inaccuracies of contemporary polling data, they’re going to focus on technical limitations. Some of these are real. Cell phones have changed polling — many people don’t pick up calls from unknown numbers. The FCC’s ruling that limited robocalls to protect consumers in late 2015 meant that this year’s sampling process got skewed, that polling became more expensive, and that pollsters took shortcuts. We’ve heard about how efforts to extrapolate representativeness from small samples mess with the data — such as the NYTimes report on a single person distorting national polling averages.
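To make that weighting problem concrete, here’s a minimal sketch (in Python, with invented numbers) of how a single heavily weighted respondent can drag a weighted poll estimate around:

```python
# Toy illustration with made-up numbers: a weighted poll estimate,
# where one respondent in a rare demographic cell carries a huge weight.

def weighted_support(weights, votes):
    """Weighted share of respondents supporting a candidate (votes are 0 or 1)."""
    return sum(w * v for w, v in zip(weights, votes)) / sum(weights)

# 500 ordinary respondents, each with weight 1.0, splitting 48% / 52%.
weights = [1.0] * 500
votes = [1] * 240 + [0] * 260
print(f"Without the outlier: {weighted_support(weights, votes):.1%}")  # 48.0%

# Add one respondent from an under-sampled group who gets weighted up 30x.
weights.append(30.0)
votes.append(1)
print(f"With the outlier:    {weighted_support(weights, votes):.1%}")  # ~50.9%
```

The numbers are invented, but the mechanism is the one the Times described: when the sample is small and a demographic cell is rare, the weight handed to one person can move the headline figure by whole percentage points, not fractions of a point.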

But there’s a more insidious problem with the polling data, one that often goes unacknowledged. Everyone and their mother wants to collect data from the public. And the public is tired of being asked; the constant requests feel like nagging. In swing states, registered voters were overwhelmed with calls from real pollsters, fake pollsters, political campaigns, fundraising groups, special interest groups, and their neighbors. We know that people often lie to pollsters (social desirability bias), but when people don’t trust the information collection process, normal respondent bias becomes downright deceptive. You cannot collect reasonable data when the public doesn’t believe in the data collection project. And political pollsters have pretty much killed off their ability to do reasonable polling because they’ve undermined trust. It’s like what happens when you plant the same crop over and over again until the land can no longer sustain it.

Election polling is dead, and we need to accept that.

Why Reporting on Election Polling Is Dangerous

To most people, even those who know better, statistics look like facts. And polling results look like truth serum, even when pollsters responsibly report margin-of-error information. Stark numbers are reassuring and motivating: you feel like you can do something about them, and when the numbers move, you feel good. This plays into basic human psychology. And this is why we use numbers as an incentive in both education and the workplace.
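For what it’s worth, the margin of error buried in the fine print is not a small caveat. Here’s a quick back-of-the-envelope sketch, using the textbook formula and assuming a simple random sample of 1,000 respondents (which real polls rarely are):

```python
import math

# 95% margin of error for a simple random sample (the textbook formula).
def margin_of_error(p, n, z=1.96):
    return z * math.sqrt(p * (1 - p) / n)

p, n = 0.48, 1000  # a candidate at 48% in a poll of 1,000 respondents
print(f"48% +/- {margin_of_error(p, n):.1%}")  # roughly +/- 3.1 percentage points
```

Under those assumptions, a four-point lead in a single poll of this size is well within the noise, and that’s before the sampling and trust problems described above come into play.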

Political campaigns use numbers to drive action within their teams. They push people to go to particular geographies, and they use numbers to galvanize supporters. This is important, and it’s why campaigns invest in pollsters and polling processes.

Unfortunately, this psychology and logic get messed up when you’re talking about reporting on election polls to the public. When the numbers look like your team is winning, you relax and stop fretting, often sliding into complacency. When the numbers look like your team is losing, you feel more motivated to take steps and do something. This is part of why the media likes the horse race: reporting on the numbers pushes different groups to take action, and the media likes the attention it gets as the mood swings across the country in a hotly contested race.

But there is number burnout and exhaustion. As people feel pushed and swayed, as the horse race goes on and on, they get more and more disenchanted. Rather than galvanizing people to act, reporting on political polling over a long period of time with flashy visuals and constantly shifting needles prompts people to disengage from the process. In short, when it comes to the election, this prompts people to not show up to vote. Or to be so disgusted that voting practices become emotionally negative actions rather than productively informed ones.

This is a terrible outcome. The media’s responsibility is to inform the public and contribute to a productive democratic process. By obsessively covering political polls as though they are facts, the media is not only being statistically irresponsible, but it is also being psychologically irresponsible.

The news media are trying to create an addictive product through their news coverage, and, in doing so, they are pushing people into a state of overdose.

Yesterday, I wrote about how the media is being gamed and not taking moral responsibility for its participation in the spectacle of this year’s election. One of its major flaws is how it covers data and engages in polling coverage. This is, in many ways, the easiest part of the process to fix. So I call on the news media to put a moratorium on political polling coverage, to radically reduce the frequency with which they reference polls during an election season, and to be super critical of the data that they receive. If they want to be a check on power, they need to have the structures in place to be a check on math.

(This was first posted on Points.)

Data & Civil Rights: What do we know? What don’t we know?

From algorithmic sentencing to workplace analytics, data is increasingly being used in areas of society that have had longstanding civil rights issues.  This prompts a very real and challenging set of questions: What does the intersection of data and civil rights look like? When can technology be used to enable civil rights? And when are technologies being used in ways that undermine them? For the last 50 years, civil rights has been a legal battle.  But with new technologies shaping society in new ways, perhaps we need to start wondering what the technological battle over civil rights will look like.

To get our heads around what is emerging and where the hard questions lie, the Data & Society Research Institute, The Leadership Conference on Civil and Human Rights, and New America’s Open Technology Institute teamed up to host the first “Data & Civil Rights” conference.  For this event, we brought together diverse constituencies (civil rights leaders, corporate allies, government agencies, philanthropists, and technology researchers) to explore how data and civil rights are increasingly colliding in complicated ways.

In preparation for the conversation, we dove into the literature to see what is known and unknown about the intersection of data and civil rights in six domains: criminal justice, education, employment, finance, health, and housing.  We produced a series of primers that contextualize where we’re at and what questions we need to consider.  And, for the conference, we used these materials to spark a series of small-group moderated conversations.

The conference itself was an invite-only event, with small groups brought together to dive into hard issues around these domains in a workshop-style format.  Still, we felt it was important to make our findings and questions available to everyone.  Today, we’re releasing all of the write-ups from the workshops and breakouts we held, the videos from the level-setting opening, and an executive summary of what we learned.  This event was designed to elicit tensions and push deeper into hard questions.  Much is needed for us to move forward in these domains, including empirical evidence, innovation, community organizing, and strategic thinking.  We learned a lot during this process, but we don’t have clear answers about what the future of data and civil rights will or should look like.  Instead, what we learned is how important it is for diverse constituencies to come together to address the challenges and opportunities that face us.

Moving forward, we need your help.  We need to go beyond hype and fear, hope and anxiety, and deepen our collective understanding of technology, civil rights, and social justice. We need to work across sectors to imagine how we can create a more robust society, free of the cancerous nature of inequity. We need to imagine how technology can be used to empower all of us as a society, not just the most privileged individuals.  This means that computer scientists, software engineers, and entrepreneurs must take seriously the costs and consequences of inequity in our society. It means that those working to create a more fair and just society need to understand how technology works.  And it means that all of us need to come together and get creative about building the society that we want to live in.

The material we are releasing today is a baby step, an attempt to scope out the landscape as best we know it so that we can all work together to go further and deeper.  Please help us imagine how we should move forward.  If you have any ideas or feedback, don’t hesitate to contact us at nextsteps at datacivilrights.org

Data & Society: Call for Fellows

Over the last six months, I’ve been working to create the Data & Society Research Institute to address the social, technical, ethical, legal, and policy issues that are emerging because of data-centric technological development.  We’re still a few months away from launching the Institute, but we’re looking to identify the inaugural class of fellows. If you know innovative thinkers and creators who have a brilliant idea that needs a good home and are excited by the possibility of helping shape a new Institute, can you let them know about this opportunity?

The Data & Society Research Institute is a new think/do tank in New York City dedicated to addressing social, technical, ethical, legal, and policy issues that are emerging because of data-centric technological development.

Data & Society is currently looking to assemble its inaugural class of fellows. The fellowship program is intended to bring together an eclectic network of researchers, entrepreneurs, activists, policy creators, journalists, geeks, and public intellectuals who are interested in engaging one another on the key issues introduced by the increasing availability of data in society. We are looking for a diverse group of people who can see both the opportunities and challenges presented by access to data and who have a vision for a project that can inform the public or shape the future of society.

Applications for fellowships are due January 24, 2014. To learn more about this opportunity, please see our call for fellows.

On a separate, but related note, I lurve my employer; my ability to create this Institute is only possible because of a generous gift from Microsoft.

where “nothing to hide” fails as logic

Every April, I try to wade through mounds of paperwork to file my taxes. Like most Americans, I’m trying to follow the law and pay all of the taxes that I owe without getting screwed in the process. I try to make sure that every donation I make is backed by proof, and that every deduction is backed by logic and documentation that I’ll be able to make sense of three to seven years later. Because, like many Americans, I completely and utterly dread the idea of being audited. Not because I’ve done anything wrong, but the exact opposite. I know that I’m filing my taxes to the best of my ability, and yet I also know that if I became a target of interest to the IRS, they’d inevitably find some checkbox I forgot to check or some subtle miscalculation that I didn’t see. And so what makes an audit intimidating and scary is not that I have something to hide but that proving oneself innocent takes time, money, effort, and emotional grit.

Sadly, I’m getting to experience this right now as Massachusetts refuses to believe that I moved to New York mid-last-year. It’s mindblowing how hard it is to summon up the paperwork that “proves” to them that I’m telling the truth. When it was discovered that Verizon (and presumably other carriers) was giving metadata to government officials, my first thought was: wouldn’t it be nice if the government would use that metadata to actually confirm that I was in NYC, not Massachusetts? But that’s the funny thing about how data is used by our current government. It’s used to create suspicion, not to confirm innocence.

The frameworks of “innocent until proven guilty” and “guilty beyond a reasonable doubt” are really really important to civil liberties, even if they mean that some criminals get away. These frameworks put the burden on the powerful entity to prove that someone has done something wrong. Because it’s actually pretty easy to generate suspicion, even when someone is wholly innocent. And still, even with this protection, innocent people are sentenced to jail and even given the death penalty. Because if someone has a vested interest in you being guilty, it’s often viable to paint that portrait, especially if you have enough data. Just watch as the media pulls up random quotes from social media sites whenever someone hits the news to frame them in a particular light.

It’s disturbing to me how often I watch as someone’s likeness is constructed in ways that contort the image of who they are. This doesn’t require a high-stakes political issue. This is playground stuff. In the world of bullying, I’m astonished at how often schools misinterpret situations and activities to construct narratives of perpetrators and victims. Teens get really frustrated when they’re positioned as perpetrators, especially when they feel as though they’ve done nothing wrong. Once the stakes get higher, all hell breaks loose. In “Sticks and Stones”, Emily Bazelon details how media and legal involvement in bullying cases means that those cases often spin out of control, as they did in South Hadley. I’m still bothered by the conviction of Dharun Ravi in the highly publicized death of Tyler Clementi. What happens when people are tarred and feathered as symbols for being imperfect?

Of course, it’s not just one’s own actions that can be used against one’s likeness. Guilt-through-association is a popular American pastime. Remember how the media used Billy Carter to embarrass Jimmy Carter? Of course, it doesn’t take the media or require an election cycle for these connections to be made. Throughout school, my little brother had to bear the brunt of teachers who despised me because I was a rather rebellious student. So when the Boston marathon bombing occurred, it didn’t surprise me that the media went hogwild looking for any connection to the suspects. Over and over again, I watched as the media took friendships and song lyrics out of context to try to cast the suspects as devils. By all accounts, it looks as though the brothers are guilty of what they are accused of, but that doesn’t make their friends and other siblings evil or justify the media’s decision to portray the whole lot in such a negative light.

So where does this get us? People often feel immune from state surveillance because they’ve done nothing wrong. This rhetoric is perpetuated on American TV. And yet the same media that tells them they have nothing to fear will turn on them if they happen to be in close contact with someone who is of interest to the state – or if they themselves become the subject of state interest. And it’s not just about now; it’s about always.

And here’s where the implications are particularly devastating when we think about how inequality, racism, and religious intolerance play out. As a society, we generate suspicion of others who aren’t like us, particularly when we believe that we’re always under threat from some outside force. And so the more that we live in doubt of other people’s innocence, the more that we will self-segregate. And if we’re likely to believe that people who aren’t like us are inherently suspect, we won’t try to bridge those gaps. This creates societal ruptures and undermines any ability to create a meaningful republic. And it reinforces any desire to spy on the “other” in the hopes of finding something that justifies such an approach. But, like I said, it doesn’t take much to make someone appear suspect.

In many ways, the NSA situation that’s unfolding in front of our eyes is raising a question that is critical to the construction of our society. These issues cannot be washed away by declaring personal innocence. A surveillance state will produce more suspect individuals. What’s at stake has to do with how power is employed, by whom, and in what circumstances. It’s about questioning whether or not we still believe in checks and balances on power. And it’s about questioning whether or not we’re OK with continuing to move towards a system that presumes entire classes and networks of people to be suspect. Regardless of whether or not you’re in one of those classes or networks, are you OK with that being standard fare? Because what is implied in that question is a much uglier one: Is your perception of your safety worth the marginalization of other people who don’t have your privilege?

Regulating the Use of Social Media Data

If you were to walk into my office, I’d have a pretty decent sense of your gender, your age, your race, and other identity markers. My knowledge wouldn’t be perfect, but it would give me plenty of information that I could use to discriminate against you if I felt like it. The law doesn’t prohibit me from “collecting” this information in a job interview, nor does it say that discrimination is acceptable if you “shared” this information with me. That’s good news given that faking what’s written on your body is bloody hard. What the law does is regulate how this information can be used by me, the theoretical employer. This doesn’t put an end to all discrimination – plenty of people are discriminated against based on what’s written on their bodies – but it does provide you with legal rights if you think you were discriminated against and it forces the employer to think twice about hiring practices.

The Internet has made it possible for you to create digital bodies that reflect a whole lot more than your demographics. Your online profiles convey a lot about you, but that content is produced in a context. And, more often than not, that context has nothing to do with employment. This creates an interesting conundrum. Should employers have the right to discriminate against you because of your Facebook profile? One might argue that they should because such a profile reflects your “character” or your priorities or your public presence. Personally, I think that’s just code for discriminating against you because you’re not like me, the theoretical employer.

Of course, it’s a tough call. Hiring is hard. We’re always looking for better ways to judge someone and goddess knows that an interview plus resume is rarely the best way to assess whether or not there’s a “good fit.” It’s far too tempting to jump on the Internet and try to figure out who someone is based on what we can dredge up online. This might be reasonable if only we were reasonable judges of people’s signaling or remotely good at assessing them in context. Cuz it’s a whole lot harder to assess someone’s professional sensibilities by their social activities if they come from a world different than our own.

Given this, I was fascinated to learn that the German government is proposing legislation that would put restrictions on what Internet content employers could use when recruiting.

A decade ago, all of our legal approaches to the Internet focused on what data online companies could collect. This makes sense if you think of the Internet as a broadcast medium. But then along came the mainstreamification of social media and user-generated content. People are sharing content left, right, and center as part of their daily sociable practices. They’re sharing as if the Internet is a social place, not a professional place. More accurately, they’re sharing in a setting where there’s no clear delineation of social and professional spheres. Since social media became popular, folks have continuously talked about how we need to teach people not to share what might cause them professional consternation. Those warnings haven’t worked. And for good reason. What’s professionally questionable to one may be perfectly appropriate to another. Or the social gain one sees might outweigh the professional risks. Or, more simply, people may just be naive.

I’m sick of hearing about how the onus should be entirely on the person doing the sharing. There are darn good reasons why people share information, and just because you can dig it up doesn’t mean that it’s ethical to use it. So I’m delighted by the German move, if for no other reason than that it highlights the need to rethink our regulatory approaches. I strongly believe that we need to spend more time talking about how information is being used and less time talking about how stupid people are for sharing it in the first place.

“Transparency is Not Enough”

At Gov2.0 this week, I gave a talk on the importance of information literacy when addressing transparency of government data:

“Transparency is Not Enough”

I address everything from registered sex offenders to what happens when politicians don’t like data to the complexities of interpretation.  In doing so, I make three key points:

  1. Information is power, but interpretation is more powerful
  2. Data taken out of context can have unintended consequences
  3. Transparency alone is not the great equalizer

My talk is also available on YouTube if you prefer to listen to a different version of the same message (since my crib is what I intended to say and the video is what actually came out of my mouth).