Category Archives: Digitization

Making case law accessible to all

There have been some very exciting advances in the fight to make court documents freely accessible to everyone. As many legal researchers and law librarians know, legal materials can be relatively rare or hidden behind a paywall. Movements are afoot to change this, at least in part, and there has been real progress over the past several months.

Harvard’s Caselaw Access Project, which involves scanning Harvard’s entire collection of case law books, recently scanned its last volume. That may not sound like much, but it means that nearly 44,000 volumes containing roughly 40 million pages of case law have been digitized. This case law will be made freely available to anyone who needs it.

In addition to finishing its scanning, Harvard also recommended that bulk digital data of future case law be provided, making it easier to add new decisions to the existing scanned collection. The director of Cornell’s Legal Information Institute (LII) and a law professor from Indiana University also testified in support of keeping case law digitally accessible.

Lastly, and potentially most exciting, was the Internet Archive’s announcement of its desire to store PACER records from the federal courts and make them freely available. While it remains to be seen whether this proposal will come to fruition, it is another sign that legal material is becoming more easily available to anyone. For now, many PACER documents can be found via RECAP, a free website co-run by the Internet Archive and Princeton’s Center for Information Technology Policy.

As you search for case law and other legal materials, imagine how that process may become easier as it all migrates to the open web!

The University of Wisconsin Law School announces the Bhopal Digital Repository

Last week, the UW Law School hosted a symposium on the Bhopal Disaster, which killed thousands of people in the Bhopal region of India, left a long legal trail, and is still controversial to this day.

As part of that symposium, the UW Law Library, in conjunction with faculty members Mitra Sharafi, Sumudu Atapattu and Marc Galanter, launched “Bhopal: Law, Accidents and Disasters in India: A Digital Archive initiated by Marc Galanter.” This digital archive, which houses nearly 3,500 scanned items related to Bhopal, is freely available for anyone to use. The materials range from court documents and newspaper clippings to embedded video and other secondary sources. The court documents can be downloaded as full-text PDFs from anywhere in the world, while the newspaper clippings can be downloaded only at the Law School.

Professor Marc Galanter, who was involved in the Bhopal litigation in the United States, provides background and context for new researchers. His collection both inspired the digital archive and forms its foundation.

Researchers can quickly run a full-text search across the entire collection or limit the search to newspaper clippings or court documents. A bibliography of related Bhopal resources is also included.

Perhaps the most exciting part of the Bhopal archive is that it will continue to grow. As other Bhopal scholars contribute their own unique materials, those items will be reviewed and added, making the collection even more useful.

The Bhopal collection is the first special collection of the UW Law School Digital Repository. If you have any questions about the Bhopal collection or the repository itself, please contact Kris Turner, or find more information on the UW Law School Library website.

Digitize Your Old Photos, Home Movies, Etc. at Madison Public Library

Do you have a collection of analog media (like home movies, videotapes, or audio cassettes) or paper materials (photographs, documents, etc.) that you’d like to digitize but don’t have the equipment for? Then check out the new Personal Archiving Lab at the Madison Public Library’s Central branch.

The Personal Archiving Lab supports the following formats:

  • VHS tapes
  • VHS-C tapes
  • DVDs (not Blu-ray)
  • Audio cassettes
  • MiniDV tapes
  • Hi-8 tapes
  • Photographs / negatives / slides
  • Paper-based documents

Article: High Court Won’t Hear Copyright Challenge to Google Books

According to the Wall Street Journal Law Blog, the Supreme Court has denied certiorari in Authors Guild, et al. v. Google, Inc., in which the Authors Guild and individual writers argued that Google engaged in copyright infringement “on an epic scale” by digitizing, indexing, and displaying snippets of print books in internet search results.

From the article:

The last major development came in October when a federal appeals court in New York ruled for Google….

The dispute involves the boundaries of “fair use,” the legal doctrine that permits unauthorized copying in certain, limited circumstances. The Second U.S. Circuit Court of Appeals concluded in October that Google’s scanning millions of copyrighted books wasn’t infringement because what the company makes viewable online is so limited.

Google wins digitization case

Today, Judge Denny Chin ruled in favor of Google in what may be a landmark case for Fair Use of digital works. Google argued that scanning books and publishing ‘snippets’ of them online (over 20 million books and counting) fell within the realm of Fair Use, and the Court agreed. Judge Chin explicitly noted the benefit of having the books digitized, stating, “Indeed, all society benefits.”
The case, which began in New York in 2005, has been a veritable roller coaster. The ruling, which the Authors Guild has said it will appeal, is a victory not only for Google but also for the libraries and researchers who use these scanned books as research aids. Google puts only certain portions of each scanned book online. Google estimated that, had it lost, it could have owed the Authors Guild over three billion dollars, at roughly $750 per book.
Judge Chin drew on a previous case that also saw the Authors Guild’s claims dismissed. In October 2012, Judge Harold Baer dismissed a case against HathiTrust, a digital library partnership of research universities, and five of its member universities (including the University of Wisconsin), on very similar Fair Use grounds.
The Authors Guild plans to appeal the decisions in both the HathiTrust and Google cases, arguing that both have violated copyright and far exceeded the bounds of a Fair Use defense by engaging in mass scanning. Judge Chin’s ruling, however, found that the scanning is not only beneficial to the public as a whole but also transformative, so it does not infringe copyright, and that it is likely to boost book sales rather than impede them.
To read more about this decision, check out the write-ups from Reuters, BBC News or the New York Times.

The University of Wisconsin Digital Collections preserves slices of history

Want to jump into a time machine? The UW Digital Collections (UWDC) is the place to do it. Over the past twelve years, the UWDC has digitized thousands of images and other media from Wisconsin and around the world. Preservation is one element of librarianship, and it is always exciting to see such wonderful and unique images find a home in an increasingly digital world.
Check out the UW Law School Cane Toss from 1955, or view German propaganda about Nazi ambitions in a 1938 poster about the Anschluss. These are only a few of the images I found by browsing various collections. Be warned: finding out what images are on the next page is highly addictive!
Browsing the UWDC is not unlike walking through a gigantic museum or archive. History buffs, casual or serious, will enjoy spending time in these digital ‘halls’. It is a fascinating (and free) way to discover the past. Are there any eras of history or specific events that you feel haven’t been preserved as well as they should be?

CAPTCHAs Being Used to Help Digitize Books with Poor OCR Accuracy

CAPTCHAs are those distorted letters that you have to enter after some internet transactions to verify that you’re actually a human.
I recently learned that some CAPTCHAs are being used to help digitize old printed material by asking users to decipher scanned words from books that computerized optical character recognition failed to recognize. That is very cool.
Science Magazine reports that:

Whereas standard CAPTCHAs display images of random characters rendered by a computer, reCAPTCHA [from Google] displays words taken from scanned texts. The solutions entered by humans are used to improve the digitization process. To increase efficiency and security, only the words that automated OCR programs cannot recognize are sent to humans.

This illustration from the Science article helps demonstrate how it works:
[Image: reCAPTCHA example illustration from the Science article]
The article explains:

In this example, the word “morning” was unrecognizable by OCR. reCAPTCHA isolated the word, distorted it using random transformations including adding a line through it, and then presented it as a challenge to a user.
Because the original word (“morning”) was not recognized by OCR, another word for which the answer was known (“overlooks”) was also presented to determine if the user entered the correct answer.
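The control-word idea in the excerpt can be sketched in a few lines of Python. This is a minimal illustration, not reCAPTCHA’s actual implementation: the function names, the in-memory vote store, the case-insensitive comparison, and the rule of accepting a reading after three matching answers are all hypothetical assumptions. It shows the core logic: only the known word (“overlooks”) can be graded, and a correct answer on it is what lets the user’s reading of the unknown word (“morning”) count as a human vote.

```python
from collections import Counter, defaultdict
from typing import Optional

# Hypothetical in-memory store: word_id -> Counter of human readings
# for scanned words that OCR could not recognize.
votes = defaultdict(Counter)


def check_challenge(word_id: str, control_expected: str,
                    control_typed: str, unknown_typed: str) -> bool:
    """Grade a two-word challenge and record the reading of the unknown word.

    Only the control word can be graded, because its answer is already known.
    If the user gets it right, we assume they are human and count their
    answer for the unknown word as one vote.
    """
    # Assumed normalization: ignore case and surrounding whitespace.
    if control_typed.strip().lower() != control_expected.strip().lower():
        return False  # failed the known word: reject and discard both answers
    votes[word_id][unknown_typed.strip().lower()] += 1
    return True


def transcription(word_id: str, min_votes: int = 3) -> Optional[str]:
    """Return the most common reading once it has at least min_votes.

    The default threshold of 3 is an assumed value for illustration only.
    """
    readings = votes[word_id]
    if not readings:
        return None
    reading, count = readings.most_common(1)[0]
    return reading if count >= min_votes else None


if __name__ == "__main__":
    # Three users pass the control word and read the unknown word as "morning".
    for typed in ["morning", "Morning", "morning "]:
        check_challenge("scan-42", "overlooks", "overlooks", typed)
    # This user misses the control word, so their reading is discarded.
    check_challenge("scan-42", "overlooks", "overlook", "mourning")
    print(transcription("scan-42"))  # prints morning
```

Requiring several agreeing votes before accepting a transcription is one simple way to guard against a single careless or malicious answer; the excerpt itself does not say how many answers reCAPTCHA collects or how it compares them.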

For more information, see the reCAPTCHA page and the Science Magazine article.

WiLS Offers Digitization on Demand of Public Domain Materials

WiLS (Wisconsin Library Services) has recently announced a new Digitization on Demand service, which will provide complete digital copies of public domain works from UW-Madison Memorial Library’s Special Collections and Mills Music Library.
Works will be scanned in their entirety for a library patron to use at the point of need, and the digital copy will also be sent to the Digital Collection Center. Once the Digital Collection Center has processed a work, it will be linked from the local OPAC and hosted by Google and HathiTrust to ensure future access. The requester pays the full cost of the service.
For more information or to request that an item be digitized, see the WiLS website.

Google Books & its Implications for UW Madison

The Daily Cardinal has a very thorough article on the Google Books initiative and its implications for the UW Madison campus.
The article discusses:

  • staffing concerns at campus libraries
  • copyright issues and the status of the settlement
  • how digitizing materials will increase access to important scholarly and historical works

Look for quotes from Law School professors Shubha Ghosh and Anuj Desai.
Thanks to my colleague, Jenny Zook, for pointing me to the article.

UWDCC Real Estate Collection Offers Consulting Reports from 1960s-90s

The Real Estate Collection is a new resource from the UW Digital Collections Center. It contains materials and examples of commercial work in real estate done by celebrated University of Wisconsin professor James A. Graaskamp and others.
James Graaskamp taught real estate at UW-Madison from 1964 to 1988 and chaired the Real Estate Department from 1968 until his untimely death in 1988. This digital collection contains over 165 of Landmark Research’s consulting reports completed between the late 1960s and the early 1990s, including appraisals, market and feasibility studies, and other types of research and analysis.