Writers web watch

Scanning or how to turn typescripts into computer files

A remarkable technology exists to identify the letters on a page and convert them back into a digital file. This is called Optical Character Recognition (OCR). There are two parts to the magic that will transform a dusty old script into a computer file.

Our Scanning service can do this for you.

Part one is a scanner. 

This desk-hogging plastic box scans your document and turns it into a series of dots. You normally have to do a bit of fiddling around to achieve the correct setting but there are a few guidelines. 

You only need a black and white, or litho, image. 

There is no need to capture that very special shade of yellow of old paper. With old paper, you can often enhance the image by using the red, blue or green filter.
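To see why a colour filter can help with yellowed paper, here is a rough sketch in Python. It assumes that a scanner's colour filter amounts to keeping just one colour channel, and the pixel values for paper and ink are made up for illustration, not measured from a real scan. The channel with the biggest gap between paper and ink is the one to try first.

```python
# Illustrative 8-bit RGB values (assumptions, not measurements).
PAPER = (230, 210, 140)  # yellowed paper: strong red and green, weak blue
INK = (60, 55, 50)       # faded black type

def channel_contrast(paper, ink):
    """Brightness gap between paper and ink in each colour channel."""
    names = ("red", "green", "blue")
    return {name: p - i for name, p, i in zip(names, paper, ink)}

contrast = channel_contrast(PAPER, INK)
best = max(contrast, key=contrast.get)
print(contrast)
print("Try the", best, "filter first")
```

With these assumed values the blue channel gives the least contrast, because yellow paper reflects little blue light. Real pages vary, so the preview is still the final judge.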

Do not set the resolution too high. Although scanners boast about the number of dots per inch (dpi) they can achieve, your OCR software probably likes an image at around 300 dpi. More is not better: a higher setting can exceed the capacity of your computer's memory, so the system will work very slowly or might not work at all. Reserve high resolutions for small photographic slides that you want to blow up and print.
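The arithmetic behind this advice can be sketched in a few lines of Python. The sketch assumes an A4 page and an uncompressed image held in memory; the function name is just for illustration. Doubling the dpi quadruples the number of dots, and 24-bit colour needs 24 times the memory of a 1-bit black and white image.

```python
def scan_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size of a scanned image in bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

A4 = (8.27, 11.69)  # page size in inches (assumed)

for dpi in (300, 600, 1200):
    mono = scan_size_bytes(*A4, dpi, 1)     # black and white
    colour = scan_size_bytes(*A4, dpi, 24)  # full colour
    print(f"{dpi:>5} dpi: black and white {mono / 1e6:6.1f} MB, "
          f"colour {colour / 1e6:6.1f} MB")
```

On these assumptions a black and white page at 300 dpi is about 1 MB, while the same page in full colour at 1200 dpi is over 400 MB.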

All scanner software offers a chance to preview. You need to do this to set the area that you want scanned and to adjust the brightness. Most scanners offer some filters you can apply to monochrome work. Try them on the previewed image. 

The image needs to be very flat. For old books or very faded scripts on delicate paper, a trip to the photocopier will produce a sheet that you can feed into the machine with much greater confidence of success. 

There is a degree of luck as well as judgement involved in setting things up. It is worth interrupting yourself after a few pages, just to make sure everything is working. It might take an hour to set everything up and check that all is working well, before you set about feeding your old opus into the scanner. If this seems like a bit of an investment of time, just think how long it will take you to peck away at the keyboard to retype the work.

Of course it helps enormously if you read the scanner manual. I know they are not written to confuse, it just seems that way. Often manuals make little sense until you have had a go. The more you read the manual, the more you will achieve with your scanner.

Part two is the OCR software. 

Every scanner seems to come with a package and they are outstanding. The ones I have tried work out the typeface used, as well as the font size, and convert the image to letters in seconds. Some packages allow you to help them learn if they find a jumble of dots they cannot resolve, but for the last decade I have left it to the software to guess and it has managed very well without my help.

I should mention a third part of the OCR team, which is the computer. Converting text is something that requires raw computing power. The basic Pentium with 16 MB of memory will trudge through the script, page by page. You will have plenty of time to tidy the office or do some other penance while the computer does your work. A computer made in this millennium will convert pages just as fast as you can load them. The change in performance over the last five years is remarkable. It used to take 10 minutes a page. Now it takes 10 seconds.
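To put those two speeds in perspective, here is a small Python sketch. The 300-page typescript is an assumed length, chosen only for illustration; the per-page times are the ones quoted above.

```python
def total_hours(pages, seconds_per_page):
    """Total conversion time in hours for a whole script."""
    return pages * seconds_per_page / 3600

PAGES = 300  # assumed length of the typescript

old_machine = total_hours(PAGES, 10 * 60)  # 10 minutes a page
new_machine = total_hours(PAGES, 10)       # 10 seconds a page

print(f"Old machine: {old_machine:.0f} hours")
print(f"New machine: {new_machine * 60:.0f} minutes")
```

For a script of that length the difference is between about 50 hours and about 50 minutes of conversion time.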

My first set-up cost £70 including the software (a Black Widow scanner and TextBridge Classic software) and it gives me a 99% accurate recognition factor with a good copy. Now I use an HP 'all-in-one' which does a fine job. After the output has been passed through a spell checker and the various attempts to translate coffee stains to text have been removed, the result is 99.9% accurate.
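The gap between 99% and 99.9% sounds small, but it is easier to judge as mistakes per page. The Python sketch below assumes the accuracy figure is counted per character and that a typed page holds about 1,800 characters; both are assumptions for illustration.

```python
CHARS_PER_PAGE = 1800  # assumed size of a typed page

def expected_errors(chars, accuracy):
    """Expected number of wrongly recognised characters."""
    return chars * (1 - accuracy)

for accuracy in (0.99, 0.999):
    errors = expected_errors(CHARS_PER_PAGE, accuracy)
    print(f"{accuracy:.1%} accurate: about {errors:.0f} errors a page")
```

On those assumptions, the raw output has about 18 errors a page and the tidied output about 2.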

The scanner is over 3 years old, which is definitely middle-aged in computer years. But it works so well I am reluctant to change it. The latest software is frighteningly clever and offers to translate the output into Japanese, which seems to rather defeat the whole object of the exercise.

Sadly, OCR is not the complete answer. I tried it on my grandfather's trench diary from the Ypres Salient, written during 1915, but it didn’t work. It also failed when fed the wonderful copperplate handwritten history of the Stewarts of Comrie. But it has coped with many faded typescripts on yellowing paper and faded dot matrix printed scripts, so give it a go.

Scansoft  Public-domain FineReader products


© Charles Jones 2001-6  


 ©WritersServices.com 2000-2008