Malamud Building Gigantic Journal Database for Data Analysis

The journal Nature has an interesting piece on public domain advocate Carl Malamud’s project to “build a gigantic store of text and images extracted from 73 million journal articles” for data analysis.

No one will be allowed to read or download work from the repository, because that would breach publishers’ copyright. Instead, Malamud envisages, researchers could crawl over its text and data with computer software, scanning through the world’s scientific literature to pull out insights without actually reading the text.

The legal status of the database, which is a cooperative project with Jawaharlal Nehru University (JNU), is uncertain.

For the moment, [Malamud] is proceeding with caution: the JNU data depot is air-gapped, meaning that no one can access it from the Internet. Users have to physically visit the facility, and only researchers who want to mine for non-commercial purposes are currently allowed in.