Surveillance Publishing
Posted by John Baez
Björn Brembs recently explained how
“massive over-payment of academic publishers has enabled them to buy surveillance technology covering the entire workflow that can be used not only to be combined with our private data and sold, but also to make algorithmic (aka ‘evidence-led’) employment decisions.”
Reading about this led me to this article:
- Jefferson D. Pooley, Surveillance publishing.
It’s all about what publishers are doing to make money by collecting data on the habits of their readers. Let me quote a bunch!
After a general introduction to surveillance capitalism, Pooley turns to “surveillance publishing”. The prime example: Elsevier. I’ll delete the scholarly footnotes here:
Consider Elsevier. The Dutch publishing house was founded in the late nineteenth century, but it wasn’t until the 1970s that the firm began to launch and acquire journal titles at a frenzied pace. Elsevier’s model was Pergamon, the postwar science-publishing venture established by the brash Czech-born Robert Maxwell. By 1965, around the time that Garfield’s Science Citation Index first appeared, Pergamon was publishing 150 journals. Elsevier followed Maxwell’s lead, growing at a rate of 35 titles a year by the late 1970s. Both firms hiked their subscription prices aggressively, making huge profits off the prestige signaling of Garfield’s Journal Impact Factor. Maxwell sold Pergamon to Elsevier in 1991, months before his lurid death.
Elsevier was just getting started. The firm acquired The Lancet the same year, when the company piloted what would become ScienceDirect, its Web-based journal delivery platform. In 1993 the Dutch publisher merged with Reed International, a UK paper-maker turned media conglomerate. In 2015, the firm changed its name to RELX Group, after two decades of acquisitions, divestitures, and product launches—including Scopus in 2004, Elsevier’s answer to ISI’s Web of Science. The “shorter, more modern name,” RELX explained, is a nod to the company’s “transformation” from publisher to a “technology, content and analytics driven business.” RELX’s strategy? The “organic development of increasingly sophisticated information-based analytics and decision tools”. Elsevier, in other words, was to become a surveillance publisher.

Since then, by acquisition and product launch, Elsevier has moved to make good on its self-description. By moving up and down the research lifecycle, the company has positioned itself to harvest behavioral surplus at every stage. Tracking lab results? Elsevier has Hivebench, acquired in 2016. Citation and data-sharing software? Mendeley, purchased in 2013. Posting your working paper or preprint? SSRN and Bepress, 2016 and 2017, respectively. Elsevier’s “solutions” for the post-publication phase of the scholarly workflow are anchored by Scopus and its 81 million records.
Curious about impact? Plum Analytics, an altmetrics company, acquired in 2017. Want to track your university’s researchers and their work? There’s the Pure “research information management system,” acquired in 2012. Measure researcher performance? SciVal, spun off from Scopus in 2009, which incorporates media monitoring service Newsflo, acquired in 2015.
Elsevier, to repurpose a computer science phrase, is now a full-stack publisher. Its products span the research lifecycle, from the lab bench through to impact scoring, and even—by way of Pure’s grant-searching tools—back to the bench, to begin anew. Some of its products are, you might say, services with benefits: Mendeley, for example, or even the ScienceDirect journal-delivery platform, provide reference management or journal access for customers and give off behavioral data to Elsevier. Products like SciVal and Pure, up the data chain, sell the processed data back to researchers and their employers, in the form of “research intelligence.”
It’s a good business for Elsevier. Facebook, Google, and ByteDance have to give away their consumer-facing services to attract data-producing users. If you’re not paying for it, the Silicon Valley adage has it, then you’re the product. For Elsevier and its peers, we’re the product and we’re paying (a lot) for it. Indeed, it’s likely that windfall subscription-and-APC profits in Elsevier’s “legacy” publishing business have financed its decade-long acquisition binge in analytics.
As Björn Brembs recently tweeted:
“massive over-payment of academic publishers has enabled them to buy surveillance technology covering the entire workflow that can be used not only to be combined with our private data and sold, but also to make algorithmic (aka ‘evidence-led’) employment decisions.”
This is insult piled on injury: Fleece us once only to fleece us all over again, first in the library and then in the assessment office. Elsevier’s prediction products sort and process mined data in a variety of ways. The company touts what it calls its Fingerprint® Engine, which applies machine learning techniques to an ocean’s worth of scholarly texts—article abstracts, yes, but also patents, funding announcements, and proposals. Presumably trained on human-coded examples (scholar-designated article keywords?), the model assigns keywords (e.g., “Drug Resistance”) to documents, together with what amounts to a weighted score (e.g., 73%). The list of terms and scores is, the company says, a “Fingerprint.” The Engine is used in a variety of products, including Expert Lookup (to find reviewers), the company’s Journal Finder, and its Pure university-level research-management software. In the latter case, it’s scholars who get Fingerprinted:
“Pure applies semantic technology and 10 different research-specific keyword vocabularies to analyze a researcher’s publications and grant awards and transform them into a unique Fingerprint—a distinct visual index of concepts and a weighted list of structured terms.”
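To make that concrete, here’s a minimal sketch of the general idea behind this kind of keyword “fingerprinting”: score each document against a fixed controlled vocabulary and keep the top-weighted terms. The toy vocabulary, the TF-IDF weighting, and the percentage scaling below are illustrative assumptions of mine; Elsevier’s actual Fingerprint Engine is proprietary and presumably far more sophisticated.

```python
# Illustrative sketch only: score documents against a controlled vocabulary
# and report the top-weighted terms, loosely mimicking a "Fingerprint"
# (a weighted list of structured terms). Not Elsevier's actual system.
from sklearn.feature_extraction.text import TfidfVectorizer

# A toy controlled vocabulary; real systems use large curated thesauri.
VOCABULARY = ["drug resistance", "machine learning", "gene expression",
              "clinical trial", "neural network"]

def fingerprint(docs, top_k=3):
    # Restrict TF-IDF scoring to the controlled vocabulary;
    # ngram_range=(1, 2) lets two-word terms like "drug resistance" match.
    vec = TfidfVectorizer(vocabulary=VOCABULARY, ngram_range=(1, 2))
    weights = vec.fit_transform(docs).toarray()
    terms = vec.get_feature_names_out()
    results = []
    for row in weights:
        ranked = sorted(zip(terms, row), key=lambda pair: -pair[1])
        # Express weights as percentages, as in "Drug Resistance, 73%".
        results.append([(t, round(100 * float(w)))
                        for t, w in ranked[:top_k] if w > 0])
    return results

abstracts = [
    "Machine learning models predict drug resistance from clinical trial data.",
    "Gene expression profiling with a neural network classifier.",
]
for fp in fingerprint(abstracts):
    print(fp)
```

Each document comes out as a short weighted term list, which is roughly what a “Fingerprint” amounts to: a vector of controlled-vocabulary concepts with scores that can then be matched against reviewers, journals, or researchers.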
But it’s not just Elsevier:
The machine learning techniques that Elsevier is using are of a piece with RELX’s other predictive-analytics businesses aimed at corporate and legal customers, including LexisNexis Risk Solutions. Though RELX doesn’t provide specific revenue figures for its academic prediction products, the company’s 2020 SEC disclosures indicate that over a third of Elsevier’s revenue comes from databases and electronic reference products—a business, the company states, in which “we continued to drive good growth through content development and enhanced machine learning and natural language processing based functionality”.
Many of Elsevier’s rivals appear to be rushing into the analytics market, too, with a similar full research-stack data harvesting strategy. Taylor & Francis, for example, is a unit of Informa, a UK-based conglomerate whose roots can be traced to Lloyd’s List, the eighteenth-century maritime-intelligence journal. In its 2020 annual report, the company wrote that it intends to “more deeply use and analyze the first party data” sitting in Taylor & Francis and other divisions, to “develop new services based on hard data and behavioral data insights.” Last year Informa acquired the Faculty of 1000, together with its OA F1000Research publishing platform. Not to be outdone, Wiley bought Hindawi, a large independent OA publisher, along with its Phenom platform. The Hindawi purchase followed Wiley’s 2016 acquisition of Atypon, a researcher-facing software firm whose online platform, Literatum, Wiley recently adopted across its journal portfolio. “Know thy reader,” Atypon writes of Literatum. “Construct reports on the fly and get visualization of content usage and users’ site behavior in real time.” Springer Nature, to cite a third example, sits under the same Holtzbrinck corporate umbrella as Digital Science, which incubates startups and launches products across the research lifecycle, including the Web of Science/Scopus competitor Dimensions, data repository Figshare, impact tracker Altmetric, and many others.
So, the definition of ‘diamond open access’ should include: no surveillance.
Re: Surveillance Publishing
I’ve also heard that Elsevier’s “Enhanced PDF viewer” tracks where you click and what you view.
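For what it’s worth, here’s a hypothetical sketch of what the collection side of that kind of viewer telemetry could look like: a tiny endpoint that records click and page-view events posted by a reader’s browser. The endpoint name and payload fields are invented for illustration; I have no idea how Elsevier’s viewer actually reports its data.

```python
# Hypothetical sketch of a telemetry endpoint for a web-based PDF viewer.
# All names and fields are invented; this is not Elsevier's actual system.
from flask import Flask, request, jsonify

app = Flask(__name__)
events = []  # in-memory store; a real system would persist to a database

@app.route("/telemetry", methods=["POST"])
def record_event():
    event = request.get_json(force=True)
    # e.g. {"user": "u123", "doc": "doi:10.1016/...", "action": "click",
    #       "page": 4, "x": 312, "y": 540, "timestamp": 1700000000}
    events.append(event)
    return jsonify(status="ok")

if __name__ == "__main__":
    app.run(port=8080)
```

Even a handful of fields like these, accumulated across sessions, is enough to reconstruct who read what, for how long, and which passages drew attention.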