Best file format for scanned documents


jdurston
Feb 11th, 2006, 12:23 PM
I'm scanning in a bunch of my school notes. What kind of settings should I use to balance readability with a decent file size? I'm saving in grayscale JPEG right now.

teeterboy3
Feb 11th, 2006, 01:15 PM
Does your scanner have the PDF button to instantly scan into PDF format?
That would be the best combination of legibility and file size compression.

gozer
Feb 11th, 2006, 02:30 PM
Use PDF. You can turn any GIF or JPG into a PDF using Preview.

MacMaster
Feb 11th, 2006, 02:57 PM
PDF will be your best format of choice.

You will be able to open the files on either a Mac or Windows system, and the PDF files will take up a lot less space.

Mac OS X will allow you to save any document as a PDF.

harrytse
Feb 11th, 2006, 07:01 PM
The scanning software I use considers 200 dpi at 100% scale suitable for text. Saved at a JPEG quality in the 40s, an 8 1/2 x 11 page comes out to under 100 KB. PDFs are very well suited to vector renderings, but viewing PDFs made from rasterised images can be excessively memory intensive and slow.
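
As a rough sanity check on those numbers, here is a minimal Python sketch of the arithmetic. The helper names are made up for illustration, and the baseline assumes an uncompressed 8-bit greyscale scan of a letter-size page, which is my assumption rather than anything the scanning software reports.

```python
# Rough arithmetic only; function names are hypothetical, not from any scanning software.

def page_pixels(width_in, height_in, dpi):
    """Pixel dimensions of a page scanned at the given dpi (100% scale)."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_grayscale_bytes(width_in, height_in, dpi, bits_per_pixel=8):
    """Uncompressed size in bytes, assuming one 8-bit sample per pixel."""
    w, h = page_pixels(width_in, height_in, dpi)
    return w * h * bits_per_pixel // 8


if __name__ == "__main__":
    raw = raw_grayscale_bytes(8.5, 11, 200)  # letter page at 200 dpi
    target = 100 * 1024                      # the "under 100 KB" figure
    print(page_pixels(8.5, 11, 200))         # (1700, 2200)
    print(raw)                               # 3740000 bytes before compression
    print(round(raw / target, 1))            # compression ratio needed to hit the target
```

So getting under 100 KB means the JPEG is squeezing the raw scan by something like 36:1 or more, which is why the quality setting matters as much as the dpi.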

teeterboy3
Feb 11th, 2006, 07:37 PM
Sure, vector formats will be dramatically smaller… but since these have to be scanned, the files don't reside on the computer electronically, so that size comparison is irrelevant. And the DPI you scan or save at is independent of the format too.

PDFs can use JPEG compression. Regardless of how the files are scanned in, they are going to be raster (or dot for dot) image files. Therefore filesize is going to be relatively the same issue in any format.

The file format and compression the files end up saved with affect their usability afterwards, and that really hinges on the intended end use. Assuming they're mostly for reference and legibility is a concern, then whether they are saved as PDFs, JPEGs or whatever, the dpi and compression will largely determine the usability. And you won't be able to be as aggressive with the dpi or compression, regardless of PDF or JPG.

Since PDF, JPG, GIF, etc. all use compression to shrink file size, the end-use needs will largely dictate the end format.

Still, PDF is the most universal and useful format.

But there is no real need to have a lengthy discussion over which format I think is the best, because largely it has nothing to do with my preference. It solely comes down to where and how the user wants to use these files.

Edit:
And since no one has really answered your question…
To print the files, you usually need to scan at a resolution of about 2.5 times the lpi of the printer. For printing on a normal laser printer, assume around 85-105 lpi; a nice resolution to scan at would be 250 dpi in greyscale. Take the file that you have scanned in and save it as a PDF. When you set the compression (I'm assuming you're using a version of Photoshop to do this), go no lower than 9 or 10. I think the scale is 0-12… or at least 75%. That should give you about as small a file size as you can afford while still being able to print the files well. If you only need to view them on a computer, then scan at only 72-100 dpi and leave the compression at the full setting.
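
If it helps, the rule of thumb above works out like this in a few lines of Python. This is only a sketch of the arithmetic: the 2.5x factor, the 85-105 lpi range and the 0-12 scale are the figures from this post, and the function names are invented for the example.

```python
# Rule-of-thumb arithmetic from the post; function names are hypothetical.

def scan_dpi_for_print(lpi, factor=2.5):
    """Suggested scan resolution for a printer with the given line screen (lpi)."""
    return lpi * factor


def quality_fraction(level, scale_max=12):
    """Express a quality level on an assumed 0-to-scale_max slider as a fraction."""
    return level / scale_max


if __name__ == "__main__":
    for lpi in (85, 105):
        print(lpi, "lpi ->", scan_dpi_for_print(lpi), "dpi")  # 212.5 and 262.5
    print(quality_fraction(9))  # 0.75, i.e. the "at least 75%" figure
```

The 250 dpi suggestion sits inside the 212.5-262.5 range that the 2.5x rule gives for 85-105 lpi, and a setting of 9 on a 0-12 scale lines up with 75%.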

HowEver
Feb 11th, 2006, 09:21 PM
Whatever happened to memorization?

Jet_Star
Feb 13th, 2006, 03:59 AM
For notes I would definitely go with saving them as PDFs, that way you can group multiple pages into a single file.

As for the technical details, you wouldn't need anything higher than 200 dpi. If you want to save more space, 150 dpi with a compression setting of at least Medium is a more than acceptable resolution, unless you have very detailed sketches with very fine/small drawings or writing in your notes. If you can spare the MBs, then a compression setting of High would be better.
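
To put a number on the saving from dropping the resolution, here is a small Python sketch. It only compares pixel counts; the actual file size after compression will not scale exactly the same way, so treat it as a rough guide. The function name is made up for the example.

```python
# Pixel count scales with the square of the dpi; function name is hypothetical.

def relative_pixel_count(dpi, reference_dpi):
    """Fraction of pixels at `dpi` compared with the same page at `reference_dpi`."""
    return (dpi / reference_dpi) ** 2


if __name__ == "__main__":
    print(relative_pixel_count(150, 200))  # 0.5625
```

In other words, a 150 dpi scan has a bit over half the pixels of a 200 dpi scan of the same page.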

But it all boils down to what you are going to do with these notes after you scan them.

Myrddin Emrys
Feb 13th, 2006, 05:54 PM
Doesn't OCR count for anything anymore?

jdurston
Feb 13th, 2006, 06:46 PM
Doesn't OCR count for anything anymore?

My handwriting is not even close to being OCR compatible. I have a pretty nasty scrawl.

teeterboy3
Feb 13th, 2006, 06:54 PM
Doesn't OCR count for anything anymore?
Are we talking quality OCR like the Newton?