April 1, 2014

Big Data Power

Posted by Tom Leinster

Guest post by Nils Carqueville and Daniel Murfet

(My university probably isn’t alone in encouraging mathematicians and computer scientists to embrace the idea of “big data”, or in more sober terminology, “data science”. Here, Nils Carqueville and Daniel Murfet introduce their really excellent article on big data in whole-population surveillance. —TL)

In recent years Big Data has become an increasingly relevant topic in the economic sector, for intelligence agencies, and for the sciences.

A particularly far-reaching development made possible by Big Data is that of unprecedented mass surveillance. As Alexander Beilinson, Stefan Forcey, Tom Leinster and others have pointed out, the role that mathematicians and computer scientists play here is central. With this in mind last January we wrote an essay on

stressing some of those aspects of the matter that we think deserve more attention or additional elaboration. We hope this to be a useful contribution to the necessary discussion on modern mass surveillance, and we thank Tom for his efforts in this direction, and for allowing us to post here.

Posted at April 1, 2014 1:38 PM UTC

Re: Big Data Power

Excellent essay: covers all the important points. I’ve just been checking in on the latest developments, so here are some links that might be worth looking into and discussing.

First, President Obama’s new plan, published over the weekend at whitehouse.gov. From other sources, in summary: it ends bulk collection by the U.S. government and requires court approval for government access to communication records (which will be stored by the companies/providers for 18 months). (I’m not sure about email records.)

The ACLU released Edward Snowden’s response. Evidently he is pleased–this is only of interest since he probably knows the situation better than or as well as anyone, and so if he thinks we have successfully reached a turning point that is a good sign.

Of course a turning point is not the destination. In the U.S., Congress is now up to bat, and so I can only hope for lots of wisdom to prevail (more wisdom than I have, anyhow!).

Posted by: stefan on April 1, 2014 6:07 PM

Re: Big Data Power

… congress … more wisdom than I have …

Not so sure about that, Stefan :-)

Let me also take this opportunity to gently steer the conversation in a mathematical direction. Much as I’m interested in discussing just about every aspect of this, I don’t want to do that here on this blog. In other words, if too many comments go by without anyone discussing the involvement of mathematicians, computer scientists, etc., then I’ll start getting grumpy and deleting things.

Posted by: Tom Leinster on April 1, 2014 6:58 PM

Re: Big Data Power

Ah yes, math. Actually I had been meaning to bring up what I see as a point that needs mathematical clarification. One of the frequent arguments against mass surveillance is that it is ineffective at preventing violent attacks. Mr. Snowden’s statement linked above brings this up. Another argument, as focused upon in the essay of this post, is that the mass surveillance conveys a great deal of power to those that have access to the data. Is there a contradiction here?

My instinct was to admit that mining the large data set could very well be used to make us safer, but that the side effects might be unbearable. Recall the immune system and its problems: the fever can be more dangerous than the virus (I am not an MD, so skip that analogy if it’s old-fashioned). Note that by “safer” I am not trying to say statistically safer as an individual, but rather as a group. We are vulnerable in more ways than physically, since we are bound by empathy. That’s part of what we mean by “terror” attacks.

Perhaps though, there is a way to make precise the difference between using a large data set to find specific information (mining for diamonds), vs a broader search (mining for “interesting rocks”). There might be a nice mathematical question about graph invariants lurking here: searching for subgraphs with a precise magnitude, vs a magnitude in a given range. The latter case would be more applicable to the danger, brought up in this essay, of a black market in personal information. It might be that the government has little interest in trawling for salacious facts, but an amoral individual with access could see things otherwise.

I think it might be worthwhile to quantify this kind of distinction, if at all possible, so as to avoid contradictory arguments. Is mass surveillance simultaneously bad at catching attackers and good at conveying power, or potentially good at both?

Posted by: stefan on April 1, 2014 9:36 PM

Re: Big Data Power

One of the frequent arguments against mass surveillance is that it is ineffective at preventing violent attacks. […] Another argument […] is that the mass surveillance conveys a great deal of power to those that have access to the data. Is there a contradiction here?

I don’t think so. It conveys a great deal of power, it just seems to be ineffective at preventing terrorism.

One can imagine a future in which surveilling the whole population is an effective tool for preventing terrorism, and then there’s a discussion to be had. Eugene Lerman and I were talking about that last week. But it seems difficult to have the discussion until some specifics are on the table.

Mathematicians don’t usually have to submit ethical approval forms, that being mostly the domain of experimental psychologists, medics, etc. But perhaps it will become a routine thing in the era of big data. In a way, I hope it will be, because at least it would suggest that the issues are being recognized.

It might be that the government has little interest in trawling for salacious facts, but an amoral individual with access could see things otherwise.

Certainly the latter. Who knows how many NSA employees or contractors use their access for selfish, nefarious purposes? There’s stuff like LOVEINT, but you can easily imagine individuals using their access in really destructive ways.

But it’s also clear that the government (or rather the intelligence agencies) are highly interested in trawling for salacious facts. It’s a history that goes at least as far back as the FBI recording Martin Luther King having extramarital sex and sending the tape to his home (accompanied by a letter trying to persuade him to commit suicide), and continues today with the NSA recording the porn habits of those it wants to be able to discredit, right down to “individuals who do not yet hold extremist views but who are susceptible to the extremist message” (their words).

Posted by: Tom Leinster on April 1, 2014 11:02 PM

Re: Big Data Power

The EFF (30c3 video) makes a similar claim: J. Edgar Hoover used this method in a structural way. Moreover, Watergate is not that long ago. History tells us that it is unlikely that such uncontrolled power will not be abused.

Posted by: Bas Spitters on April 2, 2014 7:57 AM

Re: Big Data Power

Is there a reason we are not using https?

Posted by: Bas Spitters on April 2, 2014 8:17 AM

Re: Big Data Power

Is there a reason we are not using https?

None whatsoever.

Posted by: Jacques Distler on April 2, 2014 1:41 PM

Re: Big Data Power

Thanks, I should have tried that myself. I sort of expected HTTPS Everywhere and Force-TLS to take care of this, but apparently they don’t.

The page you link to still gives me mixed content: http://golem.ph.utexas.edu/~distler/blog/images/MathML.png

Posted by: Bas Spitters on April 2, 2014 3:54 PM

Re: Big Data Power

The page you link to still gives me mixed content: http://golem.ph.utexas.edu/~distler/blog/images/MathML.png

Fixed now. Though it seems to me that the MathML logo is a vestige of an older, simpler time, and could now be dispensed with.

Posted by: Jacques Distler on April 2, 2014 11:18 PM

Re: Big Data Power

I’d certainly be fine with dispensing with it.

Posted by: Tom Leinster on April 2, 2014 11:26 PM

Re: Big Data Power

Maybe it would be good if someone actually removed the MathML logo.

Posted by: Bas on September 1, 2014 8:51 PM

Re: Big Data Power

For what it’s worth, I also don’t think it needs to be on (almost) every single comment; at most at the top of the main post.

Posted by: David Roberts on April 3, 2014 6:25 AM

Re: Big Data Power

Thanks for the nice paper. Two quick comments:

• I am not sure that mass surveillance makes the majority of people behave like shadows of themselves. In fact, by the dictator dilemma fallacy, for mass surveillance to succeed it is important that most people do not feel restricted.

• “Mass surveillance is a weapon of mass destruction waiting to be picked up.” In fact, there was a recent DARPA grant to investigate what a terrorist group could do with (semi-)publicly available data. This also suggests that the NSA should focus not on corrupting security standards (Snowden: ‘our data is more valuable than theirs’), but on improving them, as they did with e.g. SELinux, the basis for our current Android systems.

Posted by: Bas Spitters on April 2, 2014 9:02 PM

Re: Big Data Power

There’s also the question of what an organised crime syndicate could do with this kind of technology; there’s unlikely (optimistically) to be more than a handful of people that a government may want to discredit, but a crime syndicate could target the general populace using the same methods. I’ve seen spam threatening police action on the basis of accessing porn; imagine what happens if that is more focused, say by trawling the metadata of who you’re regularly in contact with and threatening disclosure.

Posted by: Mozibur Ullah on April 4, 2014 11:54 PM

