One of the most indispensable parts of GTD is setting up a good filing cabinet. Personally, I try to scan as much as I can, and whatever I can't scan goes into my filing cabinet: pamphlets, manuals, and other unscannable items. Besides those, I also have lots of books. Many of them are heavy, so they are not with me when I need or want them. They also create tension because they take up so much room. Because of these problems, I set out to find an easy, efficient way to scan all my books and other difficult-to-scan items.
Hope this helps someone out there!
-----------------------------------
SPECIFIC GOALS
* Create searchable PDFs
* Output relatively small files (nothing more than 50 MB)
* Fast process (I didn’t want to turn each page)
* Rebindable pages (in case I wanted to keep my books or sell them)
SOME BACKGROUND
In order to achieve the goals above I needed a decent ADF scanner, software that would allow me to create PDF+TEXT, and a method that would allow me to unbind and rebind the pages.
I found several great scanners on eBay, but I could never seem to win any of them. I had originally thought of getting either Fujitsu’s ScanSnap or Kodak’s ScanMate. Both of these families of scanners are easy to use, scan quickly, and output good quality scans. However, I really wanted a dual ADF (Automatic Document Feeder) and flatbed scanner. I’ve just had too many experiences where an ADF scanner wouldn’t scan the receipt, credit card, driver’s license, or other odd document that was mission-critical to have scanned. Dual ADF/flatbed scanners tend to have some quirks and generally output slightly lower quality scans, but I felt it was a small price to pay for the added convenience.
After much searching, I finally decided on the Brother MFC-8870DW. There were several reasons for this:
* Brother multi-function printer/scanners are so much better than the competition’s. I’ve had the opportunity to work with HPs, Canons, Xeroxes, and several others. The only brand that rarely gave me scanning issues was Brother.
* Wireless networking. Since my apartment has a weird layout and my roommate was splitting the bill, this was by far the best solution. The cost and pain of laying cable were more than the premium for wireless. Note that the wireless networking also covers the scanning function. Not all scanners (not even all wireless ones) allow networked scanning.
* Duplex scanning and printing.
The software I chose for this project was Paperport and Omnipage. Paperport is a decent piece of scanning software that works with a large variety of scanners. Omnipage is OCR software. Optical Character Recognition (OCR) software can be fed image-based PDF documents or pictures and it will extract the text. Omnipage and Paperport work seamlessly together to create PDF+text documents. Just to clarify, there are three kinds of PDFs that can be created. One kind consists of just images. These cannot be searched or highlighted. The second kind is called PDF+text. These are exactly the same as image PDFs except that they have text embedded within them so that they can be searched. Lastly, there is PDF Normal. This type of PDF has true text, almost like Microsoft Word, and images can be embedded in it. While PDF Normal would be ideal, it doesn’t work well with math books because it depends on the OCR software being able to translate the math formulas, which it can’t. So instead of nice formulas, you get gibberish. One more thing: whatever you do, don’t buy the software from the manufacturer’s website. You can find Omnipage for as low as $99 online and it will come bundled with Paperport! Buying directly from Nuance, the manufacturer, could cost you upwards of $500!
Many years ago I had tried to remove the binding from a book in order to scan it and I remembered what a pain it was. So while searching for a scanner on eBay, I came across an industrial paper cutter. It’s capable of cutting 400 pages at a time. If you’re serious about digitizing your library, then this is worth the investment. It will save you countless hours and possibly several limbs…. You can find them for about $99. Another benefit of an industrial paper cutter is that it helps with things that would normally be a pain to scan, like those amorphous manuals that come with your newfangled printer or DVD player….
THE PROCESS
Unbinding the book
With the help of an X-acto knife, I carefully removed the book from its binding.
Once the book is removed from its binding, the pages should still be glued together.
If the book is small enough, you can proceed to the next step of cutting off the glue with the industrial paper cutter. If the book is larger than 300 pages, I suggest some further processing.
While most industrial paper cutters can cut through about 400 pages, I recommend not cutting more than 200 pages at a time depending on the book. If you try to cut too many pages at a time, you may end up with an uneven cut which will make rebinding the book later on more troublesome.
So once again using the X-acto knife, you can divide a larger book into smaller chunks. This can be tricky since it is easy to slice into adjoining pages, but with a few minutes of practice, you should be good to go.
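The chunking arithmetic can be sketched in a few lines of Python. This is only an illustration: the 200-page limit comes from the recommendation above, while the function name `plan_chunks` and the idea of keeping the chunks roughly equal in size are assumptions made for the example.

```python
import math

def plan_chunks(total_pages, max_per_cut=200):
    """Split a book into roughly equal chunks, none larger than max_per_cut.

    max_per_cut=200 mirrors the recommendation in the text; adjust it for
    your own cutter and paper stock.
    """
    if total_pages <= 0:
        return []
    # Smallest number of chunks that keeps every chunk within the limit.
    n_chunks = math.ceil(total_pages / max_per_cut)
    base, extra = divmod(total_pages, n_chunks)
    # The first `extra` chunks get one more page, so sizes differ by at most one.
    return [base + 1 if i < extra else base for i in range(n_chunks)]

if __name__ == "__main__":
    print(plan_chunks(650))
```

Evenly sized chunks are a design choice here: they avoid ending up with one thin leftover chunk, which tends to be harder to hold square under the blade.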
Cutting off the Glue
Once you have your book or chunks of it, the next step involves cutting off the glue. There’s really not much to it. Just make sure you understand where the blade comes down. Ideally you can cut off all the glue without taking much paper off in the process. Again, depending on the type of book, you may want to use smaller chunks as it could result in a more even cut.
Prepping Paperport & Scanning
Since the quality you need may differ from what I need, it’s best to play around with the settings to see what gives the best performance for the effort expended. Here are some of my notes from scanning 30 double-sided pages of a small math textbook (I didn't try all possible configurations!):
http://www.vesco.us/?p=203
The sweet spot for me was 300 DPI using black and white scanning. Also, it was actually a lot faster for me not to use duplex scanning even though that meant a little more work on my part. Paperport has a cool feature where you can scan one side of a bunch of papers, then flip them around to scan the other side, and then Paperport will correctly collate the document. No need for a duplex scanner! That said, when quality and speed are not an issue, I love using the duplex scanning feature of my scanner.
When scanning, like when cutting the pages, it’s sometimes better to do smaller chunks even if your scanner can take more. If you try to do too many pages at the same time, you run the risk of causing a paper jam. Also, even if a paper jam is not caused, ADF scanners can sometimes mishandle pages, resulting in crooked text.
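For anyone curious about what that collation involves, here is a minimal Python sketch of the idea. It is not Paperport's actual implementation. It assumes that flipping the whole stack over means the backs come out in reverse order (last sheet first); the function name `collate` and the `backs_reversed` flag are made up for the illustration.

```python
def collate(fronts, backs, backs_reversed=True):
    """Interleave front-side and back-side scans into reading order.

    fronts: scans of the front sides, first sheet first.
    backs:  scans of the back sides, in the order they were scanned.
    backs_reversed: True if the stack was flipped as a whole, so the
                    last sheet's back was scanned first (an assumption).
    """
    if len(fronts) != len(backs):
        raise ValueError("front and back batches must have the same length")
    if backs_reversed:
        backs = backs[::-1]
    pages = []
    for front, back in zip(fronts, backs):
        pages.extend([front, back])
    return pages

if __name__ == "__main__":
    fronts = ["p1", "p3", "p5"]
    backs = ["p6", "p4", "p2"]  # order after flipping the whole stack
    print(collate(fronts, backs))
```

The length check matters in practice: if the feeder pulls two sheets at once on one pass, the batches no longer match and every page after the misfeed would be paired with the wrong back.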
In addition to the time it takes to actually scan the pages, there is also a post-processing step in Paperport where the PDF is OCR’d by Omnipage in the background. For 60 pages, this took about 2 1/2 minutes on 64-bit Vista with 4 GB of RAM.
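To get a ballpark figure for a whole book, that measurement can be scaled up. The sketch below assumes OCR time grows linearly with page count, which is a simplification (dense pages and different hardware will change the rate); the function name `estimate_ocr_minutes` is hypothetical.

```python
def estimate_ocr_minutes(pages, ref_pages=60, ref_minutes=2.5):
    """Estimate OCR time by scaling a reference measurement linearly.

    The defaults come from the measurement in the text: 60 pages in
    about 2.5 minutes. Linear scaling is an assumption.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * ref_minutes / ref_pages

if __name__ == "__main__":
    for n in (60, 200, 400):
        print(f"{n} pages: about {estimate_ocr_minutes(n):.1f} minutes")
```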
Once you are done scanning a book, you may have several batches of scans. Paperport has several methods for dealing with this. The easiest involves “stacking”. You can either drag and drop one batch onto another or you can use the control key and select several batches, then click the right button on the mouse, and select stack. I prefer the latter method as the drag-and-drop method gave inconsistent results.
Once you perform the steps above, you should have a good-looking, searchable PDF file.
Rebinding the Book
The following sources gave excellent tutorials on constructing a book binding jig and how to use one:
http://www.persistenceunlimited.com/2006/10/how-to-build-your-own-bookbinding-jig/
http://www.persistenceunlimited.com/2006/03/fun-and-easy-how-to-guide-to-binding-your-own-paperback-books-at-homefast/
http://www.diybookbinding.com/do-it-yourself-book-binding/