Why scan micro fiches?
When working with vintage computer systems, you absolutely need a lot of ancient documentation. Luckily people are scanning these old docs like crazy, and the big archive site "bitsavers.org" is well filled ... especially with DEC stuff.
DEC distributed end user documentation primarily on paper. Even most of the micro fiche documents are print-outs which have been transferred to film. So if you find a micro fiche, its content is likely already digitized somewhere. But there are exceptions.
When you repair PDP-11's you need to run the "XXDP" diagnostic programs. These programs come with almost no user interface, and their error printout is cryptic. Documentation is only found in the MACRO-11 assembler listings of those diagnostics. And as far as I know, these listings were distributed only as micro fiches.
XXDP!
After I fixed a pile of PDP-11/34's without proper XXDP documentation, I asked around for XXDP program listings. They already were in different collections around me, but only as micro fiches. I could've read them on a classic micro fiche reader, but I decided to start a digitizing project (to give something back to bitsavers.org).
I planned for a volume of 400 fiches, containing about 50.000 document pages ("frames"). Scanning these proved to be impossible at first:
- You can not scan fiches on a flat bed scanner. Even at 4800dpi, resolution is not sufficient.
- You can not give the fiches to a commercial scan service: the costs will kill you.
- Interestingly, the nearby University of Göttingen operates public micro fiche scanners! But you have to manually adjust every single document frame on a fiche. And scanning one frame takes almost a minute.
- You can not buy an automatic micro fiche scanner: they are very very expensive.
Because of all these difficulties, I decided to build my own scanner. Participation in the "DEC micro fiche underground" forum showed that this would fill a gap. And I got much support from the guys of my computer club C-C-G.
Conclusion
I did no tests with a flatbed scanner, as I rated image quality to be most important.
My scanning rig is slow, but sharp, and generates separated pages in unattended mode.
Video of the current scanner
This is the 2nd scanner I built, in 2021. In contrast to the first one it's much faster (6 seconds per frame) and much smaller ... only the size of a 3D printer.
Its function and the full workflow are shown here in detail:
My first scanner
This one was built in 2014. See it here at work:
The design
To be usable, the micro fiche scanner should fulfill these criteria:
- image quality: scan resolution should be higher than the resolution of the film itself, else there may be loss of information
- fully automated movement of the fiche, so all document page frames can be scanned unattended. The only manual operation should be the changing of the fiches.
- The resulting document should have the format and the quality to be recognizable by OCR.
Components of the scanning rig are:
- A modified AGFA GEVAERT COPEX LP4 micro fiche reader. Not all readers show a good picture, but this one is fine.
- The screen of the Agfa reader is photographed with a good digital camera. I used a Canon EOS 500D DSLR with 16MPixel resolution.
- As DSLR optic a 100mm lens with fixed focal length is used ("CANON MACRO LENS EF 100mm 1:2,8 USM"). Distance to screen is about 2 meters. Use a tele-range lens, else the screen images may get warped. And don't use a zoom! These have too many glass elements inside, impacting picture quality.
- The fiche carrier of the Agfa reader is moved by an "ISEL" industrial CNC x/y positioner with stepper motors (Thanks, Thomas!). The positioner is controlled over RS232 with a proprietary protocol.
- A PC controls the positioner, triggers the DSLR, reads back the images and archives them. Any model with one RS232 and two USB 2.0 ports is usable.
- Pictures from the DSLR are transferred to the PC over a USB cable with CANON's "EOSUtility" software.
- The DSLR is triggered with a USB relay connected to the remote-trigger-cable-input.
- The DSLR has an external power supply.
- Central component is the specialized control program, which calibrates and moves the CNC positioner and operates the DSLR.
- The raw photographed images must be processed by a chain of filters to yield OCRable black&white pages in a PDF document.
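The unattended scan cycle that such a control program has to run can be sketched roughly as follows. This is not the actual control program, only a minimal dry-runnable outline: `ScanJob`, `serpentine`, `move_to` and `trigger` are hypothetical names, the row-by-row zig-zag order is an assumption, and the real ISEL protocol and relay handling are not shown. The two default delays are the settle time and shutter time reported for this rig.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Tuple

Frame = Tuple[int, int]  # (column, row) on the logical fiche grid


def serpentine(columns: int, rows: int) -> Iterator[Frame]:
    """Yield every frame row by row, reversing direction on each new row."""
    for row in range(rows):
        cols = range(columns) if row % 2 == 0 else range(columns - 1, -1, -1)
        for col in cols:
            yield (col, row)


@dataclass
class ScanJob:
    move_to: Callable[[Frame], None]   # hypothetical: drives the x/y positioner
    trigger: Callable[[], None]        # hypothetical: pulses the USB relay
    sleep: Callable[[float], None]     # injected, so the loop can be dry-run
    settle_s: float = 5.0              # wait for vibrations to die down
    exposure_s: float = 3.0            # shutter time of the DSLR
    log: List[Frame] = field(default_factory=list)

    def run(self, columns: int, rows: int) -> int:
        """Scan a columns x rows grid and return the number of frames taken."""
        for frame in serpentine(columns, rows):
            self.move_to(frame)
            self.sleep(self.settle_s)
            self.trigger()
            self.sleep(self.exposure_s)
            self.log.append(frame)
        return len(self.log)


# Dry run without any hardware: record the waits instead of sleeping.
waited: List[float] = []
job = ScanJob(move_to=lambda f: None, trigger=lambda: None, sleep=waited.append)
frames = job.run(columns=4, rows=3)
print(frames, "frames,", sum(waited), "seconds of waiting")
```

On real hardware `sleep` would be `time.sleep`, and `move_to` would translate the grid position into stepper coordinates before talking to the positioner.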
What I learned
I learned much while getting the assembly to work.
* The carrier mover can not be built with tight enough mechanical tolerances. So an overshoot is built into each carrier movement.
* For automatic location of the frames on a fiche, manual calibration is necessary: a translation between the stepper coordinate system and the logical fiche grid system must be calculated. The software lets you move the fiche carrier with the cursor keys. For calibration, 4 frames on the fiche must be positioned exactly, then the grid position of those reference frames on the fiche must be entered.
* The scanning room must be dark, else the image contrast is too low.
* On the images, the screen area must be surrounded by a uniform black border, which can be cropped off automatically. Therefore a bezel must be attached to the reader to let it appear wider, and the visible parts of the reader must be painted black.
* The sharpness of scanned images is limited by the grain of the film and the reader's diffusing screen. Sharpness results from 4 sources:
- The projected fiche image on the reader screen must be checked and readjusted after each fiche.
- After the focus of the DSLR camera is adjusted to the screen, disable the auto-focus. The smallest ISO value must be used (ISO 100), else color noise will appear. The 100mm lens has its best quality at a middle aperture of f/10.
- The fiche must be placed absolutely flat into the carrier. I used tape strips as adjustment marks; the fiche projection gets blurry if even one side of the fiche lies on a strip and not between the strips.
- The settings of the DSLR result in a shutter speed of 3 seconds. Moving the heavy CNC positioner causes vibrations in the whole assembly, so after each carrier movement a delay of 5 seconds is used to let things come to rest.
* The controller software must also organize the filing of the resulting images. Directories must be created, meta-information must be gathered. The data on the title strip are to be saved for each fiche.
* A lot of final speed and quality depends on the user interface of the controller software. Especially typing in the info from the fiche title strips was more difficult than expected: The codes are cryptic, the room is dark, and the font size may be very small.
* The "Isel" CNC positioner makes a very loud and annoying noise. It must be isolated from the floor, else other people in the same house will complain.
* Using a DSLR as the scanning element puts quite some stress on the camera. An EOS 500D is rated for 70.000 exposures, and in fact mine died after 40.000 scans ... just within specs. A used 500D may cost 200€ and may have 50.000 exposures left, so for 1 € you get 250 exposures ... about the size of a fully occupied fiche.
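The calibration step described above, a translation from the logical fiche grid to stepper coordinates derived from 4 reference frames, can be illustrated with a least-squares affine fit. This is only a sketch under assumptions: the actual software may use a different model, the function names `fit_affine` and `grid_to_steps` are made up for this example, and the step values in the usage part are invented test data. An affine model covers offset, scale, and a slightly rotated or skewed fiche.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for i in range(3):
        p = max(range(i, 3), key=lambda k: abs(a[k][i]))
        if abs(a[p][i]) < 1e-12:
            raise ValueError("reference frames must not lie on one line")
        a[i], a[p] = a[p], a[i]
        for k in range(3):
            if k != i:
                f = a[k][i] / a[i][i]
                a[k] = [x - f * y for x, y in zip(a[k], a[i])]
    return [a[i][3] / a[i][i] for i in range(3)]


def fit_affine(grid: Sequence[Point], steps: Sequence[Point]) -> List[List[float]]:
    """Least-squares fit of steps = A * (col, row) + offset, one axis at a time.

    Returns [[ax, bx, tx], [ay, by, ty]].
    """
    rows = [(c, r, 1.0) for c, r in grid]
    ata = [[sum(u[i] * u[j] for u in rows) for j in range(3)] for i in range(3)]
    coeffs = []
    for axis in range(2):
        atb = [sum(u[i] * s[axis] for u, s in zip(rows, steps)) for i in range(3)]
        coeffs.append(solve3(ata, atb))
    return coeffs


def grid_to_steps(coeffs: List[List[float]], col: float, row: float) -> Tuple[int, int]:
    """Map a grid position to whole stepper steps."""
    (ax, bx, tx), (ay, by, ty) = coeffs
    return round(ax * col + bx * row + tx), round(ay * col + by * row + ty)


# Invented example: four reference frames near the corners of the grid,
# with the stepper positions found by moving the carrier by hand.
reference_grid = [(0, 0), (15, 0), (0, 12), (15, 12)]
reference_steps = [(1000, 2000), (7000, 1970), (1036, 8240), (7036, 8210)]
calibration = fit_affine(reference_grid, reference_steps)
print(grid_to_steps(calibration, 7, 5))
```

Three non-collinear frames already determine an affine map; a fourth one makes the fit overdetermined, so a small positioning error on one reference frame is averaged out instead of being taken at face value.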
Why not use a regular flatbed scanner?
The web is full of discussions about scanning micro fiches with a flat bed scanner.
Some people claim it works perfectly, others laugh at the idea (me included).
Resolution:
It all depends on the scanning resolution. What is needed?
Some calculation: An XXDP program listing page on a DEC fiche is about 6mm wide.
DEC imaged 132 column fanfold printouts here, so one character is 6/132 mm = 45µm wide.
If they printed it with a dot matrix printer, a char had 6 printer pixels (quality is usually much better). Let's say we need 12 scan dots to image the character, so we need to scan at
45µm/12 = about 4µm. The Nyquist sampling theorem requires double the scanning frequency, so we need to scan with at least 2µm resolution.
With 25.4 mm per inch, that's 25.4 / 0.002 = 12700 bpi.
So a flatbed scanner with 9600 bpi TRUE optical resolution in both directions should ALMOST do it. But this gives blurry letter shapes.
In 2020, for example, the Epson Perfection V600 is rated to have true 9600 bpi, but tests just mention 6400. And the true optical resolution of a CanoScan 9000F with "9600 bpi" was tested to be only 1200 dpi.
And we know: if something almost works in an ideal world, it will never work in the real world. We really need extra resolution to compensate for mechanical tolerances, sharpness problems, or marketing hype.
It's clear that a flatbed scanner with - say - 2400 bpi can be used on fiches with bigger characters. That explains the positive results of some web reports: these guys had bigger text on their fiches.
In contrast:
Taking a photographic picture of a 6mm microfiche page with a DSLR with - say - 4000 pixels horizontally gives 6mm/4000 = 1.5µm per sensor pixel, with 30 pixels per letter. This is about 17000 bpi.
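The resolution estimate above can be replayed with a few lines of Python. The helper names are made up for this sketch; the inputs (6 mm page width, 132 columns, 12 scan dots per character, a factor of 2 for Nyquist, 4000 sensor pixels across the page) are the ones used in the text. Note that the text rounds the dot pitch generously, so the unrounded requirement comes out a bit higher than the rounded figure.

```python
MM_PER_INCH = 25.4


def char_width_um(page_width_mm: float, columns: int) -> float:
    """Width of one printed character on the film, in micrometers."""
    return page_width_mm * 1000.0 / columns


def required_dpi(page_width_mm: float, columns: int,
                 dots_per_char: int, nyquist_factor: float = 2.0) -> float:
    """Scan resolution needed to put dots_per_char samples on each character,
    then multiplied by nyquist_factor as in the text."""
    pitch_um = char_width_um(page_width_mm, columns) / dots_per_char / nyquist_factor
    return MM_PER_INCH * 1000.0 / pitch_um


def camera_dpi(page_width_mm: float, sensor_pixels: int) -> float:
    """Effective resolution when one page fills sensor_pixels horizontally."""
    pitch_um = page_width_mm * 1000.0 / sensor_pixels
    return MM_PER_INCH * 1000.0 / pitch_um


print("character width [um]:", round(char_width_um(6.0, 132), 1))
print("required resolution :", round(required_dpi(6.0, 132, 12)))
print("DSLR, 4000 px wide  :", round(camera_dpi(6.0, 4000)))
print("DSLR pixels per char:", round(4000 / 132, 1))
```

Changing `dots_per_char` or the page width shows quickly why fiches with bigger text can be readable at far lower scanner resolutions.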
Speed
Flatbed scanners are slow, but they can take an image of the whole fiche. The said Epson Perfection V600 was tested to need about 6 minutes at 6400 bpi (which is far too low for DEC fiches). Let's kindly assume this was for A4, so we have about 2 minutes for a whole fiche. Compared to my solution, this is lightning fast!
Post processing
While a flatbed scanner quickly takes a picture of the whole fiche, you still need to separate the fiche image into pages. This needs extra manual or automatic processing steps.
The problem here is the size of the image to be processed: a fiche is about 145 x 105 mm. Scanned with 9600 bpi this is about 54000 * 39000 pixels! Processing an image with > 2GPixel in size WILL need specialized software and, even in 2020, lots of processing time. This cancels out the advantage in raw scanning speed.
In contrast, my scanning rig generates pictures of separated document pages at a rate of about 15 seconds per page, so no separation is needed.
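The size of such a whole-fiche scan is easy to check. The sketch below uses the fiche dimensions and resolution from the text; the bytes-per-pixel figures (1 for 8-bit grayscale, 3 for 24-bit color) are assumptions about how the scan would be stored uncompressed, and the function name is made up.

```python
MM_PER_INCH = 25.4


def scan_pixels(width_mm: float, height_mm: float, dpi: float):
    """Pixel dimensions of a flatbed scan of the given area."""
    return round(width_mm / MM_PER_INCH * dpi), round(height_mm / MM_PER_INCH * dpi)


width_px, height_px = scan_pixels(145, 105, 9600)
total_px = width_px * height_px

print("image size :", width_px, "x", height_px, "pixels")
print("total      :", round(total_px / 1e9, 2), "GPixel")
print("8-bit gray :", round(total_px * 1 / 1e9, 1), "GB uncompressed")
print("24-bit RGB :", round(total_px * 3 / 1e9, 1), "GB uncompressed")
```

An image of this size does not fit comfortably into the memory of many image tools in one piece, which is why cutting it into pages is not a trivial step.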
Project summary:
All in all I operated the 2014 scanner for 8 weeks. I scanned 428 fiches with 53545 pages, so the typical fiche has 125 frames and is filled to 60%. The sum of all raw image sizes is 227 GB. Scan speed is 15 seconds per frame. I work in a home office and could digitize about 10 fiches per day in parallel with my regular work.
With the new 2021 scanner I have scanned 900 fiches with 96000 frames in 5 weeks so far. Still working!
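For anyone planning a similar project, the numbers above can be turned into per-fiche and per-frame figures. This is only a back-of-the-envelope sketch: the function name is made up, 1 GB is taken as 10^9 bytes, and the scanning hours count pure machine time at the stated seconds per frame, without fiche changes or calibration.

```python
def summarize(fiches: int, frames: int, seconds_per_frame: float) -> dict:
    """Derive average frames per fiche and pure scanning time in hours."""
    return {
        "frames_per_fiche": frames / fiches,
        "scan_hours": frames * seconds_per_frame / 3600.0,
    }


scanner_2014 = summarize(fiches=428, frames=53545, seconds_per_frame=15)
scanner_2021 = summarize(fiches=900, frames=96000, seconds_per_frame=6)

# Average raw image size of the 2014 run, from the 227 GB total.
avg_image_mb = 227e9 / 53545 / 1e6

print("2014: %.0f frames/fiche, %.0f h scanning" %
      (scanner_2014["frames_per_fiche"], scanner_2014["scan_hours"]))
print("2021: %.0f frames/fiche, %.0f h scanning" %
      (scanner_2021["frames_per_fiche"], scanner_2021["scan_hours"]))
print("2014: %.1f MB per raw image" % avg_image_mb)
```

At about 10 fiches per day, the 2014 machine time works out to several hours of unattended scanning per day, which fits a rig running alongside regular work.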