About the Project:

Over the course of the next three years (January 2009 – December 2011), Penn Libraries and the Daily Pennsylvanian will partner to produce a fully digitized version of the DP, which will provide users not only with access to digital facsimiles but also with full text that can be easily searched. Such searching is not possible today. Published continuously since 1885, the paper comprises more than 125,000 pages. The paper exists in three formats – paper, microfilm and PDF. A preliminary survey of the DP indicates that the print copies are the most fragile and, in some cases, in need of serious repair and conservation work.

Given the state of the print copies, the project team will work with the best copy available, which may be paper, microfilm or PDF. Ideally, we would like to create a digital master of our print holdings. One of the key innovations in this project is the use of Olive text recognition software (http://www.olivesoftware.com/), which has revolutionized the production of fully searchable text.

 

The digitization process will consist of five steps:

– Scan content

– Generate archival master

– Convert master into a PDF file

– Run Olive full-text recognition and XML encoding

– Upload and index content

 

Of course, the project will involve additional components, including copy editing, quality control, programming and web design.

As part of the long-term digital curation of the DP project, the Library will be required not only to update the DP regularly with new issues, but also to ensure that the site is maintained and that the archival masters are preserved over time. This requires not only disk storage, but also staff to perform the annual production of Olive-generated content.

Given the size of the project, the DP and Library staff are determined to make it a success and to ensure that Penn students, past, present and future, will have access to the Daily Pennsylvanian – one of the most complete records of student life at Penn.