

December 4, 2021

Surveillance Publishing

Posted by John Baez

Björn Brembs recently explained how

“massive over-payment of academic publishers has enabled them to buy surveillance technology covering the entire workflow that can be used not only to be combined with our private data and sold, but also to make algorithmic (aka ‘evidence-led’) employment decisions.”

Reading about this led me to this article:

• Jeff Pooley, Surveillance Publishing.

It’s all about what publishers are doing to make money by collecting data on the habits of their readers. Let me quote a bunch!

After a general introduction to surveillance capitalism, Pooley turns to “surveillance publishing”. Their prime example: Elsevier. I’ll delete the scholarly footnotes here:

Consider Elsevier. The Dutch publishing house was founded in the late nineteenth century, but it wasn’t until the 1970s that the firm began to launch and acquire journal titles at a frenzied pace. Elsevier’s model was Pergamon, the postwar science-publishing venture established by the brash Czech-born Robert Maxwell. By 1965, around the time that Garfield’s Science Citation Index first appeared, Pergamon was publishing 150 journals. Elsevier followed Maxwell’s lead, growing at a rate of 35 titles a year by the late 1970s. Both firms hiked their subscription prices aggressively, making huge profits off the prestige signaling of Garfield’s Journal Impact Factor. Maxwell sold Pergamon to Elsevier in 1991, months before his lurid death.

Elsevier was just getting started. The firm acquired The Lancet the same year, when the company piloted what would become ScienceDirect, its Web-based journal delivery platform. In 1993 the Dutch publisher merged with Reed International, a UK paper-maker turned media conglomerate. In 2015, the firm changed its name to RELX Group, after two decades of acquisitions, divestitures, and product launches—including Scopus in 2004, Elsevier’s answer to ISI’s Web of Science. The “shorter, more modern name,” RELX explained, is a nod to the company’s “transformation” from publisher to a “technology, content and analytics driven business.” RELX’s strategy? The “organic development of increasingly sophisticated information-based analytics and decision tools”. Elsevier, in other words, was to become a surveillance publisher. Since then, by acquisition and product launch, Elsevier has moved to make good on its self-description. By moving up and down the research lifecycle, the company has positioned itself to harvest behavioral surplus at every stage. Tracking lab results? Elsevier has Hivebench, acquired in 2016. Citation and data-sharing software? Mendeley, purchased in 2013. Posting your working paper or preprint? SSRN and Bepress, 2016 and 2017, respectively. Elsevier’s “solutions” for the post-publication phase of the scholarly workflow are anchored by Scopus and its 81 million records.

Curious about impact? Plum Analytics, an altmetrics company, acquired in 2017. Want to track your university’s researchers and their work? There’s the Pure “research information management system,” acquired in 2012. Measure researcher performance? SciVal, spun off from Scopus in 2009, which incorporates media monitoring service Newsflo, acquired in 2015.

Elsevier, to repurpose a computer science phrase, is now a full-stack publisher. Its products span the research lifecycle, from the lab bench through to impact scoring, and even—by way of Pure’s grant-searching tools—back to the bench, to begin anew. Some of its products are, you might say, services with benefits: Mendeley, for example, or even the ScienceDirect journal-delivery platform, provide reference management or journal access for customers and give off behavioral data to Elsevier. Products like SciVal and Pure, up the data chain, sell the processed data back to researchers and their employers, in the form of “research intelligence.”

It’s a good business for Elsevier. Facebook, Google, and ByteDance have to give away their consumer-facing services to attract data-producing users. If you’re not paying for it, the Silicon Valley adage has it, then you’re the product. For Elsevier and its peers, we’re the product and we’re paying (a lot) for it. Indeed, it’s likely that windfall subscription-and-APC profits in Elsevier’s “legacy” publishing business have financed its decade-long acquisition binge in analytics.

As Björn Brembs recently tweeted:

“massive over-payment of academic publishers has enabled them to buy surveillance technology covering the entire workflow that can be used not only to be combined with our private data and sold, but also to make algorithmic (aka ‘evidence-led’) employment decisions.”

This is insult piled on injury: Fleece us once only to fleece us all over again, first in the library and then in the assessment office. Elsevier’s prediction products sort and process mined data in a variety of ways. The company touts what it calls its Fingerprint® Engine, which applies machine learning techniques to an ocean’s worth of scholarly texts—article abstracts, yes, but also patents, funding announcements, and proposals. Presumably trained on human-coded examples (scholar-designated article keywords?), the model assigns keywords (e.g., “Drug Resistance”) to documents, together with what amounts to a weighted score (e.g., 73%). The list of terms and scores is, the company says, a “Fingerprint.” The Engine is used in a variety of products, including Expert Lookup (to find reviewers), the company’s Journal Finder, and its Pure university-level research-management software. In the latter case, it’s scholars who get Fingerprinted:

“Pure applies semantic technology and 10 different research-specific keyword vocabularies to analyze a researcher’s publications and grant awards and transform them into a unique Fingerprint—a distinct visual index of concepts and a weighted list of structured terms.”
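To make the mechanics concrete: stripped of the branding, a “Fingerprint” is just a weighted list of terms extracted from a body of text. Here is a minimal sketch of that data structure using off-the-shelf TF-IDF from scikit-learn, over an invented three-abstract corpus; Elsevier’s actual model is proprietary and certainly more elaborate than this.

```python
# Toy "fingerprint": rank a document's terms by TF-IDF weight against a
# small corpus. This is NOT Elsevier's Fingerprint Engine, which is
# proprietary; it only illustrates the shape of the output (term, weight).
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [  # invented abstracts, for illustration only
    "Bacterial drug resistance and antibiotic treatment outcomes.",
    "Machine learning methods for protein structure prediction.",
    "Drug resistance mechanisms in cancer chemotherapy.",
]

vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
tfidf = vectorizer.fit_transform(corpus)

def fingerprint(doc_index: int, top_k: int = 5) -> list[tuple[str, float]]:
    """Return the top_k (term, weight) pairs for one document."""
    weights = tfidf[doc_index].toarray().ravel()
    terms = vectorizer.get_feature_names_out()
    ranked = sorted(zip(terms, weights), key=lambda tw: tw[1], reverse=True)
    return [(t, round(w, 2)) for t, w in ranked[:top_k] if w > 0]

print(fingerprint(0))  # e.g. [('drug resistance', 0.38), ('antibiotic', 0.32), ...]
```

The point is only that “a distinct visual index of concepts and a weighted list of structured terms” bottoms out in something like this list of (term, score) pairs, computed over everything a researcher has written.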

But it’s not just Elsevier:

The machine learning techniques that Elsevier is using are of a piece with RELX’s other predictive-analytics businesses aimed at corporate and legal customers, including LexisNexis Risk Solutions. Though RELX doesn’t provide specific revenue figures for its academic prediction products, the company’s 2020 SEC disclosures indicate that over a third of Elsevier’s revenue comes from databases and electronic reference products—a business, the company states, in which “we continued to drive good growth through content development and enhanced machine learning and natural language processing based functionality”.

Many of Elsevier’s rivals appear to be rushing into the analytics market, too, with a similar full research-stack data harvesting strategy. Taylor & Francis, for example, is a unit of Informa, a UK-based conglomerate whose roots can be traced to Lloyd’s List, the eighteenth-century maritime-intelligence journal. In its 2020 annual report, the company wrote that it intends to “more deeply use and analyze the first party data” sitting in Taylor & Francis and other divisions, to “develop new services based on hard data and behavioral data insights.” Last year Informa acquired the Faculty of 1000, together with its OA F1000Research publishing platform. Not to be outdone, Wiley bought Hindawi, a large independent OA publisher, along with its Phenom platform. The Hindawi purchase followed Wiley’s 2016 acquisition of Atypon, a researcher-facing software firm whose online platform, Literatum, Wiley recently adopted across its journal portfolio. “Know thy reader,” Atypon writes of Literatum. “Construct reports on the fly and get visualization of content usage and users’ site behavior in real time.” Springer Nature, to cite a third example, sits under the same Holtzbrinck corporate umbrella as Digital Science, which incubates startups and launches products across the research lifecycle, including the Web of Science/Scopus competitor Dimensions, data repository Figshare, impact tracker Altmetric, and many others.

So, the definition of ‘diamond open access’ should include: no surveillance.

Posted at December 4, 2021 11:49 PM UTC


14 Comments & 0 Trackbacks

Re: Surveillance Publishing

I’ve also heard that Elsevier’s “Enhanced PDF viewer” tracks where you click and view.

Posted by: Blake Stacey on December 5, 2021 7:37 PM

Re: Surveillance Publishing

Actually that’s what really got me started! Jonny Saunders wrote:

Of course Elsevier’s “enhanced pdf viewer” tracks where you click, view, if you hide the page, etc. and then transmits a big base64 blob of events along with ID from University proxy when you leave. I’m sure straight to SciVal for sale.

Is this the way we want science to work?

along with screenshots showing a bit of how it works.
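To unpack what “a big base64 blob of events” means: such payloads are typically a base64-encoded JSON list of timestamped interaction events. The following sketch is a hypothetical reconstruction with invented field names, not Elsevier’s actual format; it just shows how such a blob is built and decoded.

```python
# Hypothetical sketch of a base64-encoded JSON event payload of the kind a
# tracking script might transmit. All field names and values are invented.
import base64
import json

events = [  # invented interaction events
    {"t": 1638700000123, "type": "click", "x": 412, "y": 90},
    {"t": 1638700002456, "type": "scroll", "page": 3},
    {"t": 1638700009789, "type": "visibility", "hidden": True},
]

# What the client would send home: an opaque-looking base64 string.
blob = base64.b64encode(json.dumps(events).encode("utf-8"))

# What anyone inspecting the network traffic (or the server) can recover:
for event in json.loads(base64.b64decode(blob)):
    print(event["type"], event)
```

Base64 is an encoding, not encryption, which is why screenshots of the decoded traffic are possible at all.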

Posted by: John Baez on December 5, 2021 8:10 PM

Re: Surveillance Publishing

Enhanced interrogation, enhanced PDF viewer… It’s enough to make you suspicious of the word.

It hadn’t occurred to me that Elsevier was tracking my behaviour on its PDF viewer, though it’s not even slightly surprising. Nevertheless, what I invariably do when I do end up in their “enhanced PDF viewer” is immediately download the file onto my own computer. Partly that’s for convenience: I can operate the PDF viewer on my computer far more smoothly and deftly than any bespoke in-browser viewer.

But I also think that in the back of my mind, there’s a generalized suspicion that’s borne out by this kind of revelation. My gut instinct is to have as little contact as possible with Elsevier’s products (or Microsoft’s, or Google’s, or Amazon’s, …). So I download the paper and get out. Given the choice, I want to use only the tools I trust.

Of course, constant compromise is inevitable. For instance, my university uses Pure, so I have little choice but to use it myself.

Incidentally, Elsevier’s sneaky tracking is yet another incentive to get your papers from other places, such as the arXiv or Sci-Hub.

Posted by: Tom Leinster on December 5, 2021 9:39 PM

Re: Surveillance Publishing

What does your university do with Pure? I’d never heard of it before reading this article about surveillance publishing. The name ‘Pure’ instantly makes me nervous.

Posted by: John Baez on December 6, 2021 5:12 PM

Re: Surveillance Publishing

Pure?

Current research information system

A current research information system (CRIS) is a database or other information system to store, manage and exchange contextual metadata for the research activity funded by a research funder or conducted at a research-performing organisation (or aggregation thereof).

CRIS systems are also known as Research Information Management or RIM Systems (RIMS).

The data model underpinning a CRIS relies on a set of basic entities as defined by the Common European Research Information Format (CERIF) model maintained by the non-profit organisation euroCRIS.

euroCRIS as a standards organization may be non-profit, but its general sponsors, such as Elsevier and Clarivate, aren’t.

euroCRIS sponsorship
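For a concrete sense of what those “basic entities” are, here is a toy sketch of CERIF-style linked records as Python dataclasses. The real CERIF model is far richer, with many entity and link types and time-stamped roles; the fields below are simplified inventions for illustration.

```python
# Toy sketch of CERIF-style basic entities: people, organisations, projects
# and publications, joined by links. The real CERIF model is much richer.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Person:
    name: str
    orcid: Optional[str] = None  # persistent researcher identifier

@dataclass
class OrgUnit:
    name: str

@dataclass
class Publication:
    title: str
    year: int
    authors: list[Person] = field(default_factory=list)

@dataclass
class Project:
    title: str
    funder: OrgUnit
    outputs: list[Publication] = field(default_factory=list)

# Invented example records:
alice = Person("Alice Example", orcid="0000-0000-0000-0000")
paper = Publication("On Widgets", 2021, authors=[alice])
grant = Project("Widget Studies", funder=OrgUnit("Example Research Council"),
                outputs=[paper])
print(grant.outputs[0].authors[0].name)  # Alice Example
```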

Posted by: RodMcGuire on December 6, 2021 7:31 PM

Re: Surveillance Publishing

For various reasons, the university likes to keep a list of our publications. Pure is the system used.

I don’t find it at all sinister that the university wants a list of what we’re publishing, and that it wants it all in one place and formatted systematically. That seems reasonable enough. And in my department, we’re lucky enough to have very good administrative support, so I hardly ever interact with Pure directly. All I have to do when I get something published is send an email with the details to the relevant person, who does the rest.

But I had no idea Pure was an Elsevier/RELX product, or indeed that it wasn’t an Edinburgh in-house thing. The web address that we use for it is www.pure.ed.ac.uk (don’t bother trying that out — you need a login), and when I visit, no mention of Elsevier or RELX is readily apparent.

Posted by: Tom Leinster on December 6, 2021 11:43 PM

Re: Surveillance Publishing

Aarhus University goes a bit further: it also takes on the task of keeping Pure in sync with Clarivate’s Web of Science, but not with other sources. The database provided by DBLP is much better (for CS), but the university does not include it.

A bit of background: DBLP, one of the main sources for computer science publications, is working towards an open database with:

  • ORCID: https://dblp.org/faq/How+are+ORCIDs+integrated+in+dblp.html

  • i4oc (plan S): https://i4oc.org/

  • OpenCitations

  • Crossref: https://blog.dblp.org/feed/?pdf=148

Given that plan S is currently mandated by many of our funders, one can hope that this network (in particular i4oc) will be mandated in the coming years.

As an aside, there is a related initiative for open abstracts (https://i4oa.org/). A small example of querying DBLP’s open search API is sketched below.
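For anyone who wants to try DBLP’s open data directly, here is a small sketch using DBLP’s public publication-search API. The endpoint is DBLP’s documented one; the query string and output handling are just an example.

```python
# Query DBLP's public publication-search API and print year and title.
# Endpoint: https://dblp.org/search/publ/api (documented by DBLP).
import json
import urllib.parse
import urllib.request

def dblp_search(query: str, max_hits: int = 5) -> list[dict]:
    """Return the 'info' record for each matching publication."""
    params = urllib.parse.urlencode(
        {"q": query, "format": "json", "h": max_hits})
    url = f"https://dblp.org/search/publ/api?{params}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = json.load(resp)
    hits = data["result"]["hits"].get("hit", [])  # absent when no matches
    return [h["info"] for h in hits]

for info in dblp_search("open citations"):
    print(info.get("year"), "-", info.get("title"))
```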

I’m curious about experiences at other universities.

Posted by: Bas Spitters on December 9, 2021 10:38 AM

Re: Surveillance Publishing

PS - yes, now that you mention it, “pure” is at least as sinister as “enhanced”. It’s got a Handmaid’s Tale feel about it.

But “pure” is also a classic bit of slang in Glasgow, where it’s used as an intensifier, like “very” or “totally”. E.g. see pure dead brilliant.

Posted by: Tom Leinster on December 6, 2021 11:50 PM

Richard

In the context of “pure mathematics”, are you taking the Glasgow meaning or the sinister one?

Posted by: Richard Pinch on December 9, 2021 9:00 AM

Re: Surveillance Publishing

Thanks for drawing attention to this.

I’ve been using a browser plugin called “unpaywall”, and I’ve found it to be pretty good. Whenever you land on a publisher’s page, it provides a little green button that takes you to any free version, such as one on the arXiv. If you use DBLP, there’s a setting in DBLP to show unpaywall links there too.

Clearly there are different notions of “free”, and that’s part of the problem. But maybe a plugin like unpaywall is part of the solution, in that we can more easily choose where we read things.
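Unpaywall also exposes the same lookup as a free public REST API: you pass a DOI and a contact email and get back JSON describing the best open-access location, if any. A minimal sketch, with placeholder DOI and email:

```python
# Look up a free, legal copy of a paper by DOI via Unpaywall's public REST
# API (https://api.unpaywall.org/v2/DOI?email=...). DOI/email are placeholders.
import json
import urllib.request
from typing import Optional

def best_oa_url(doi: str, email: str) -> Optional[str]:
    """Return the URL of the best open-access copy, or None."""
    url = f"https://api.unpaywall.org/v2/{doi}?email={email}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        record = json.load(resp)
    location = record.get("best_oa_location")  # null when no free copy known
    return location["url"] if location else None

print(best_oa_url("10.1234/example.doi", "you@example.org"))
```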

Posted by: Sam Staton on December 6, 2021 10:09 AM

Re: Surveillance Publishing

Thanks for the warning.

I would like to draw your attention to the publication https://www.unite.ai/ai-generated-language-is-beginning-to-pollute-scientific-literature/, which also gives examples of questionable activity by Elsevier.

Posted by: Tevikyan Ashot on December 6, 2021 6:12 PM

Mozibur

The world’s turned into a giant laboratory and we’re the laboratory rats.

What I don’t understand is how universities and research institutes have ended up in this position.

I thought the point of university publishing houses like OUP and CUP was to keep publishing costs low and access affordable for universities. How come vast tracts of research have ended up in private hands?

Maybe there is a place for a law that research funded by public money should remain permanently in the public domain?

Posted by: Mozibur Ullah on December 18, 2021 11:33 AM

Re: Surveillance Publishing

Elsevier is now claiming that it fingerprints PDFs to protect against (spin the wheel of rationalizations) … ransomware!

Posted by: Blake Stacey on January 31, 2022 11:27 PM

Re: Surveillance Publishing

Thanks, Elsevier!

Posted by: John Baez on February 1, 2022 12:35 AM
