Documents and Scanning…
Published by wynn on February 8, 2011
An old article, written Summer 2010 for an in-house newsletter. –wns
In the Digitization program, we digitize documents for archival and web access purposes.
One type of document that we process is the historical document, such as the Verendrye Tablet publication (http://e.library.sd.gov/SodakLIVE-Docs/history/VerendryePlate/SDHistoryVerendryeTablet.pdf). This document, which has a library date stamp of October 2, 1942, is a detailed analysis of the Verendrye Plate, including full-sized images of the tablet, front and back; a history of the plate and the Verendrye brothers; and a full translation of the French inscriptions that are found on the plate.
We also scan other media, such as articles and photographs. One current project involving images is the Regional Reference Program – Huron 1965. This scrapbook is composed of documents, photographs and articles, and has been added to over the years. The covers of the book are wooden boards, bound together by leather strips. This project has been challenging because some newspaper articles include the full page of the newspaper, and because the book has been added to over the years, the page numbers and sections no longer match up properly.
When a document is scanned, each page is essentially an image to the computer. We take that image and run it through OCR (Optical Character Recognition) software. This software “reads” the image and picks up any characters such as letters or numbers. These characters must be reviewed so that the image now reads as a document with words and paragraphs. One can imagine that this is a time-consuming process that requires an eye for detail.
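Part of that review can be triaged by software before a person reads the page. The Python below is a minimal sketch of the idea, not the workflow our OCR package uses: the function name `flag_suspect_tokens` and both rules (a token that mixes letters with digits, or one that contains a stray symbol) are assumptions made up for illustration.

```python
# Symbols that rarely appear in ordinary prose; an assumed list for illustration.
STRAY_SYMBOLS = set("|~^`_{}[]<>\\")


def flag_suspect_tokens(text):
    """Return (position, token) pairs that a human reviewer should check first."""
    suspects = []
    for position, token in enumerate(text.split()):
        has_letter = any(ch.isalpha() for ch in token)
        has_digit = any(ch.isdigit() for ch in token)
        has_stray = any(ch in STRAY_SYMBOLS for ch in token)
        # A word like "P1ate" mixes letters and digits, a common OCR misread.
        if (has_letter and has_digit) or has_stray:
            suspects.append((position, token))
    return suspects


sample = "The Verendrye P1ate was found in F0rt Pierre."
for position, token in flag_suspect_tokens(sample):
    print(position, token)
```

A heuristic like this will also flag legitimate tokens such as "2nd", so it only orders the reviewer's work; it does not replace the careful read.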
We then compile the scanned pages and rebind them digitally in an Adobe Acrobat PDF (Portable Document Format) document, which is an open standard file format. PDF files are viewable and printable on virtually any platform, and sharable with anyone who has, at the very least, a free version of Adobe Acrobat Reader. PDF files can retain the look of the original document (including graphics and layout), and can be text searchable.
Another process we perform is to run these PDF documents through accessibility tests; once they have passed, we post or link them through our e.library website.
Accessibility tests, you ask? Yes, we test the PDF files so that they can be read by those who rely on special software, such as text-to-speech screen readers. This means that every time there is an image in the document, we select the item and give it a description called “alternate text” so that screen readers can describe the image to the person accessing the document.
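The alternate-text check boils down to "does every image have a description?" As a rough sketch in Python, assuming the images in a document have already been listed by hand or by some other tool, the check might look like this. The `Figure` record and the `figures_missing_alt_text` function are hypothetical names for illustration; they are not part of Acrobat or any PDF library.

```python
from dataclasses import dataclass


@dataclass
class Figure:
    """One image in a document. A hypothetical record, not a PDF library type."""
    page: int
    alt_text: str = ""


def figures_missing_alt_text(figures):
    """Return the figures whose alternate text is empty or only whitespace."""
    return [fig for fig in figures if not fig.alt_text.strip()]


figures = [
    Figure(page=1, alt_text="Front of the Verendrye Plate"),
    Figure(page=2),                   # no description entered yet
    Figure(page=3, alt_text="   "),   # blank description counts as missing
]
for fig in figures_missing_alt_text(figures):
    print(f"Page {fig.page}: image needs alternate text")
```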
An ideal completed project will include:
- All pages scanned at 600 DPI and saved as images, in either grayscale or full color depending on the document
- A working copy of those images downsampled to 300 DPI, used for the OCR and PDF process
- An accessible PDF replication of the document
- A full text version of the document (stripped of all imagery but includes alternate text of the stripped images)
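To give a feel for why the working copy is kept at 300 DPI, here is a small Python sketch of the arithmetic behind the two resolutions in the list above. The page size (US Letter, 8.5 x 11 inches) and the uncompressed storage figures (1 byte per pixel for grayscale, 3 for full color) are assumptions for illustration; real scans vary in size and are usually stored compressed.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Pixel width and height of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_megabytes(width_in, height_in, dpi, bytes_per_pixel):
    """Raw image size in megabytes (10**6 bytes), before any compression."""
    width_px, height_px = pixel_dimensions(width_in, height_in, dpi)
    return width_px * height_px * bytes_per_pixel / 1_000_000


# Assumed page size: US Letter, 8.5 x 11 inches.
for dpi in (600, 300):
    w, h = pixel_dimensions(8.5, 11, dpi)
    gray = uncompressed_megabytes(8.5, 11, dpi, 1)   # 8-bit grayscale
    color = uncompressed_megabytes(8.5, 11, dpi, 3)  # 24-bit color
    print(f"{dpi} DPI: {w} x {h} px, {gray:.1f} MB gray, {color:.1f} MB color")
```

Halving the resolution cuts the pixel count to a quarter, which is why the 300 DPI copies are so much lighter to run through the OCR and PDF steps.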
Another kind of document that we are currently processing is the born-digital document, meaning one that was created in Microsoft Word (or similar) and is simply being converted to PDF and posted to the servers (for example, the agency’s web server). We typically do not examine these documents in as much depth as the historical documents, but we still review each document for accessibility.
So, why all the hassle? It’s all for you. Now you (and anyone else who cares to) can learn the names of the kids who discovered the metal plate in Fort Pierre and how much they almost sold it for. You can read the original article as it was written, and as if it were in your hands. But the magic is that it’s on your computer, and available to the world.
