How to scan books and other tricky items

rlvesco7

One of the most indispensable parts of GTD is setting up a good filing cabinet. Personally, I try to scan as much as I can, and whatever I can't scan goes into my filing cabinet: pamphlets, manuals, and other unscannable items. Besides those, I also have lots of books in my world. Many of them are heavy, so they are not with me when I need or want them. They also create tension because they take up so much room. Because of these problems, I set out to find an easy, efficient way to scan all my books and other difficult-to-scan items.

Hope this helps someone out there!

-----------------------------------

SPECIFIC GOALS

* Create searchable PDFs
* Output relatively small documents (no more than 50 MB each)
* Fast process (I didn’t want to turn each page)
* Rebindable pages (in case I wanted to keep my books or sell them)

SOME BACKGROUND

To achieve the goals above, I needed a decent ADF scanner, software that could create PDF+text files, and a method for unbinding and rebinding the pages.

I found several great scanners on eBay, but I could never seem to win any of them. The scanners I had originally considered were Fujitsu's ScanSnap and Kodak's ScanMate. Both of these families of scanners are easy to use, scan quickly, and produce good-quality scans. However, I really wanted a combination ADF (Automatic Document Feeder) and flatbed scanner. I've had too many experiences where an ADF scanner wouldn't scan a receipt, credit card, driver's license, or some other odd document that was mission-critical to have scanned. Combination ADF/flatbed scanners tend to have some quirks and generally produce slightly lower-quality scans, but I felt that was a small price to pay for the added convenience.

After much searching, I finally decided on the Brother MFC-8870DW. There were several reasons for this:

* Brother multifunction printer/scanners are much better than the competition's. I've had the opportunity to work with HP, Canon, Xerox, and several others, and the only brand that rarely gave me scanning issues was Brother.
* Wireless networking. Since my apartment has a weird layout and my roommate was splitting the bill, this was by far the best solution: the cost and pain of laying cable outweighed the premium for wireless. Note that the wireless networking also covers the scanning function. Not all scanners (not even all wireless ones) allow networked scanning.
* Duplex scanning and printing.

The software I chose for this project was Paperport and Omnipage. Paperport is decent scanning software that works with a wide variety of scanners. Omnipage is OCR (Optical Character Recognition) software: you feed it image-based PDFs or pictures, and it extracts the text. Omnipage and Paperport work seamlessly together to create PDF+text documents.

Just to clarify, there are three kinds of PDFs you can create:

* Image-only PDFs. These consist of just images and cannot be searched or highlighted.
* PDF+text. These look exactly like image PDFs, except that text is embedded in them so they can be searched.
* PDF Normal. This type has true text, much like a Microsoft Word document, with images embedded in it. While PDF Normal would be ideal, it doesn't work well with math books because it relies on the OCR software to translate math formulas, which it can't do. So instead of nice formulas, you get gibberish.

Finally, whatever you do, don't buy the software from the manufacturer's website. You can find Omnipage online for as little as $99, bundled with Paperport! Buying directly from Nuance, the manufacturer, could cost you upwards of $500!

Many years ago I had tried to remove the binding from a book in order to scan it, and I remembered what a pain it was. So while searching for a scanner on eBay, I came across an industrial paper cutter capable of cutting 400 pages at a time. If you're serious about digitizing your library, then this is worth the investment. It will save you countless hours and possibly several limbs… You can find them for about $99. Another benefit of an industrial paper cutter is that it makes it easy to scan things that would normally be a pain to scan, like those amorphous manuals that come with your newfangled printer or DVD player…

THE PROCESS


Unbinding the book

With the help of an X-acto knife, I carefully removed the book from its binding.

[Photo: 016.jpg]


Once the book is removed from its binding, the pages should still be glued together.

[Photo: 020.jpg]


If the book is small enough, you can proceed directly to the next step of cutting off the glue with the industrial paper cutter. If the book is larger than 300 pages, I suggest splitting it up first.

While most industrial paper cutters can cut through about 400 pages, I recommend cutting no more than 200 pages at a time, depending on the book. If you try to cut too many pages at once, you may end up with an uneven cut, which will make rebinding the book later more troublesome.

So once again using the X-acto knife, you can divide a larger book into smaller chunks. This can be tricky since it is easy to slice into adjoining pages, but with a few minutes of practice, you should be good to go.
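To make the chunk sizes concrete, here is a small Python sketch. The function name is hypothetical and the 200-page default simply mirrors the recommendation above. It splits a book into the fewest chunks that stay at or under the limit and keeps the chunks roughly equal, so no single cut is much thicker than the others. It counts "pages" the same way the text does; whether your own cutter's limit means pages or sheets is worth checking.

```python
import math


def split_into_chunks(total_pages, max_pages=200):
    """Return chunk sizes that add up to total_pages.

    Uses the fewest chunks that keep every chunk at or below max_pages,
    and spreads the pages as evenly as possible across those chunks.
    """
    if total_pages <= 0:
        return []
    n_chunks = math.ceil(total_pages / max_pages)
    base, extra = divmod(total_pages, n_chunks)
    # The first `extra` chunks get one extra page so the sizes sum correctly.
    return [base + 1 if i < extra else base for i in range(n_chunks)]


print(split_into_chunks(500))  # [167, 167, 166]
```

Even chunks matter more than hitting the maximum: three cuts of about 167 pages are easier to keep square than two of 200 plus a thin one of 100.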

[Photo: 021.jpg]


Cutting off the Glue

Once you have your book or chunks of it, the next step is cutting off the glue. There's really not much to it; just make sure you understand where the blade comes down. Ideally, you can cut off all the glue without taking much paper off in the process. Again, depending on the type of book, you may want to use smaller chunks, since they can give a more even cut.

[Photo: 022.jpg]


Prepping Paperport & Scanning

Since the quality you need may differ from what I need, it's best to play around with the settings to see what gives the best results for the effort expended. Here are some of my notes from scanning 30 double-sided pages of a small math textbook (I didn't try every possible configuration!):
http://www.vesco.us/?p=203
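Part of the trade-off is simple arithmetic: the scanning mode determines how much image data each page produces before any compression. The Python sketch below uses a hypothetical helper and assumes a US-letter (8.5 x 11 inch) page. It computes the uncompressed bitmap size for a given DPI and bit depth (1 bit per pixel for black and white, 8 for grayscale, 24 for color).

```python
def raw_page_bytes(dpi, bits_per_pixel, width_in=8.5, height_in=11.0):
    """Uncompressed bitmap size of one scanned page, in bytes.

    Assumes a US-letter page by default. Real PDF files are much smaller
    because the scanning/OCR software compresses the images.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


for mode, bits in [("black and white", 1), ("grayscale", 8), ("color", 24)]:
    megabytes = raw_page_bytes(300, bits) / 1_000_000
    print(f"300 DPI {mode}: {megabytes:.1f} MB per page (uncompressed)")
```

At the same DPI, a black-and-white page carries 1/24 of the raw data of a color page, which is one reason black and white is the easiest way to keep file sizes down. Actual PDF sizes depend on the compression your software applies, so treat these numbers as relative, not as predicted file sizes.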

The sweet spot for me was 300 DPI black-and-white scanning. Also, it was actually a lot faster for me not to use duplex scanning, even though that meant a little more work on my part. Paperport has a cool feature where you can scan one side of a stack of pages, flip the stack over to scan the other side, and Paperport will then collate the document correctly. No need for a duplex scanner! That said, when quality and speed are not an issue, I love using my scanner's duplex feature.
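The collation step can be pictured as an interleave. This is not Paperport's actual implementation, only an illustration under one assumption: depending on how you flip the stack, the second pass often starts with the back of the last sheet, so the even pages arrive in reverse order. The function and its `backs_reversed` flag are hypothetical; the flag covers the case where your flip leaves the backs in forward order.

```python
def collate_single_sided(fronts, backs, backs_reversed=True):
    """Interleave two single-sided scanning passes into reading order.

    fronts: odd pages in order (front of sheet 1, front of sheet 2, ...).
    backs: even pages in the order the second pass produced them.
    backs_reversed: True if the second pass started with the last
    sheet's back (an assumption that depends on how the stack is flipped).
    """
    if len(fronts) != len(backs):
        raise ValueError("each sheet needs exactly one front and one back scan")
    if backs_reversed:
        backs = list(reversed(backs))
    pages = []
    for front, back in zip(fronts, backs):
        pages.extend([front, back])
    return pages


print(collate_single_sided(["p1", "p3", "p5"], ["p6", "p4", "p2"]))
# ['p1', 'p2', 'p3', 'p4', 'p5', 'p6']
```

The length check matters in practice: if the feeder skips or double-feeds a sheet on one pass, every later page would be paired with the wrong partner, so it is better to fail loudly than to produce a silently scrambled book.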

When scanning, as when cutting, it's sometimes better to work in smaller chunks even if your scanner can take more. If you feed too many pages at once, you risk a paper jam. Even without a jam, ADF scanners can sometimes mishandle pages, resulting in crooked text.

In addition to the time it takes to actually scan the pages, there is a post-processing step in Paperport during which Omnipage OCRs the PDF in the background. For 60 pages, this took about 2 1/2 minutes on 64-bit Windows Vista with 4 GB of RAM.
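To budget time for a whole book, you can scale that figure up. The sketch below assumes OCR time grows linearly with page count, which is an assumption rather than something measured here. The 60-page / 2.5-minute sample is the run described above, and the helper name is hypothetical.

```python
def estimate_ocr_minutes(pages, sample_pages=60, sample_minutes=2.5):
    """Estimate background OCR time by scaling a measured sample linearly."""
    seconds_per_page = sample_minutes * 60 / sample_pages  # 2.5 s/page for the sample above
    return pages * seconds_per_page / 60


print(estimate_ocr_minutes(300))  # 12.5
```

Dense pages (math, small type) may well take longer per page than the sample, so a run on a few pages of the actual book is a better guide than this estimate.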

Once you are done scanning a book, you may have several batches of scans. Paperport has several methods for dealing with this; the easiest is “stacking”. You can either drag and drop one batch onto another, or Ctrl-click to select several batches, right-click, and choose Stack. I prefer the latter method, as drag and drop gave inconsistent results.

Once you perform the steps above, you should have a good-looking, searchable PDF file.

Rebinding the Book

The following sources have excellent tutorials on building a bookbinding jig and on using one:

http://www.persistenceunlimited.com/2006/10/how-to-build-your-own-bookbinding-jig/

http://www.persistenceunlimited.com/2006/03/fun-and-easy-how-to-guide-to-binding-your-own-paperback-books-at-homefast/

http://www.diybookbinding.com/do-it-yourself-book-binding/
 

Todd V

Book Scanning Hacks

Wow, what a post! But the thought of ruining some great books just to make them digital -- yikes!

The expensive but geek-crazy way to go is to fork out some serious cash for a Kirtas book scanner that can scan 2400 pages per hour!!! That's one book scanned every 8 minutes!! Or perhaps you can find a large university library with one. Or wait for a "used" one to show up on eBay ;-)
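The "one book every 8 minutes" figure works out for a book of roughly 320 pages, since 2,400 pages per hour is 40 pages per minute. A tiny sketch of that arithmetic follows; the book length is an assumption, the rate is the one quoted above, and the helper name is hypothetical.

```python
def minutes_per_book(book_pages, pages_per_hour=2400):
    """Minutes to scan one book at a constant scanning rate."""
    pages_per_minute = pages_per_hour / 60  # 2400 pages/hour = 40 pages/minute
    return book_pages / pages_per_minute


print(minutes_per_book(320))  # 8.0
```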

Another inexpensive way to go -- without ruining your books -- is to try this hack for automating your book scanner using Legos!! Pretty amazing hack!

Hope that helps.
 

rlvesco7

Rebinding the books

Todd,

The lego hack is awesome!

Re: Ruining your books

If you rebind the books according to the last step (which I didn't cover here), in most cases, you won't even know the book was taken apart!

While I wouldn't use this process for collectibles, I would certainly recommend it for most books. After some practice, your book will look as good as before; you won't ever know it was taken apart and scanned!

Cheers!


QuestorTheElf

Gosh, the last thing I'd ever want to do is discourage somebody from engaging in an endeavor they've given a lot of thought to. (I hate rejection, as sender or receiver, though some have told me rejection is just a matter of opinion.)

Nevertheless, I can remember a time in my life when I wanted to scan every single book in my collection too. I never tried anything as elaborate as this; I merely decided to write book notes about every page I found interesting, with at most a one-sentence description, e.g., "Pg. 240 -- difference between mono and stereo plugs." Then I figured, "Scan my notes!" (Books without indexes gall me.)

Then I heard about simplicity. I also came across handling clutter. Okay, I'll admit, I found out about these in yet more books. However, the books I read on simplicity and clutter said that one of our biggest time and space hogs is books!

I'm amazed to see some books that I thought 10 years ago I couldn't live without I haven't actually missed. I put some away once in a storage unit thinking when I one day have the space for my library, they'll be there. I've donated them, cautiously avoiding hernias.

I'd say not many books give me a feeling like David Allen's original GTD, where every time I read it or look something up I get something extra from it. Many other books were relevant only for the time they were printed; technical books are notorious for this. Some management tomes also age badly, as book-of-the-month turns out to mean fad-of-the-month.

Since so much of GTD to me is about knowing what NOT to do as well as what to do, might this be stuff to Delegate, e.g., to the Google Book project?

I'm just thinking about how much time I've spent before on books I thought I'd really need, learning more in life to rely on inner wisdom. (Or externals like the Web -- "book"marks?)

Okay, maybe there's a happy medium here. I think this project could be practical if you could limit it using that "If you were stuck on a deserted island, what 10 books would you have with you?" GTD is 1, so that leaves you 9.

I bring this up because you may engage endlessly if you have a ton of books you'd like to do this with, only to find little value in the end. And what caused me to say that? You got me, I confess. It's from another life-changing book, Barry Schwartz's _The Paradox of Choice: Why More Is Less_.

Good luck!
 

rlvesco7

Less is more

You bring up an excellent point.

The whole process has actually been a lot better than I thought it would be, and it forced me to make tough decisions. For me, and for a lot of people, I think our libraries are like our intellectual garages. We often have grand dreams for how we'd like them to be, but we never seem to get around to dealing with them.

At this point in the process, I've thrown out (donated) a ton of books. I just will never read them and they aren't worth saving. Others, I have scanned, but I have also donated. And the last group, I scanned and rebound because I want to keep them.

So now, the vast majority of my library has been emptied and the clutter is gone. It feels great. But I will be the first to say: This is definitely a project!

Another benefit worth mentioning is that I can now scan a lot of annoying things that are worth keeping for reference (like manuals, pamphlets, and other extraneous paper items) but that would otherwise have to be filed in some awkward way.

Cheers!


Oogiem

QuestorTheElf;65325 said:
I'm amazed to see some books that I thought 10 years ago I couldn't live without I haven't actually missed.

I guess I'm the odd one here. My only concern about scanning is the horror of cutting apart wonderful old books!

We have a huge library, something like 10K volumes by well over 2,500 separate authors. Our local library has almost nothing beyond current stuff. Most of what I have is out of print but not old enough to be out of copyright, and therefore not available on the web through Feedbooks or Project Gutenberg. The data they contain is not available on the web either.

I would love to have a machine-readable, cross-referenced, searchable index of the contents. I recently had a case where I knew I had a book containing a piece of information I needed, but I couldn't find it in an emergency. However, I am not willing to cut and then rebind my books for that.

I might actually look into the cost of using or renting time on one of the expensive book scanners. Or build the Lego version myself.
 

Day Owl

QuestorTheElf;65325 said:
(Books without indexes gall me.)

Yes, yes. That's why I'm an indexer. We still exist, you know. And our work will never be replaced by computer-generated indexes because human judgment is required to create a truly useful relational index. Just ask Do Mi Stauber, also on this forum.

(Disclaimer -- my contact info is not posted here, so this is not an ad, just a comment on the OP...)
 