Scanning/OCR/Translation updates 2017
Post by rhew on Oct 31, 2017 0:12:35 GMT -5
I initially thought that, because I do not speak or read Chinese, I would be severely handicapped in my family tree research, or at least dependent on the charity of others or on expensive translators. I have been lucky enough to find several people who have kindly translated material for me for free, but I am always afraid of imposing on their generosity. However, I have found some ways to efficiently understand many Chinese-language source documents on my own.
This is intended as a summary of the software/techniques/procedures I use – I will include links to more detailed “how-to’s” if possible, including other threads on this forum. I will also try to include any recent updates since those threads were created. If anyone has any further questions, I am happy to go into more detail. And if you have alternate or better methods, I am always willing to learn.
SCANNING: Almost all home flat-bed multi-function printer/scanner/faxes can produce excellent scans, and most come with software to create multi-page PDFs. For cumbersome or fragile books that can’t easily fit on the glass, I have successfully used a good hand-held digital SLR camera to photograph each page of an unwieldy 500-page book. You can probably use a point-and-shoot or a good smartphone camera, but you will PROBABLY need a tripod to reduce camera shake, and a camera app with manual settings so that you can force a fast shutter speed (e.g. Camera FV-5 play.google.com/store/apps/developer?id=FGAE).
The essentials for producing archival-quality images that can later be OCR’ed, or that stay legible when zoomed in, are: good lighting (direct or indirect sunlight may be necessary, mainly to allow a shutter speed fast enough to minimize blurring from camera shake), high resolution (preferably at least 2,000 x 3,000 pixels per page, and higher if possible), colour (to make faint handwriting easier to distinguish), and pages kept relatively flat. If you are doing a LOT, search the web for images of “homemade book scanner”.
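To sanity-check photographed pages against that resolution guideline, a minimal Python sketch could look like the following. The function names and the 6 x 9 inch page size are illustrative assumptions, not part of any app mentioned here; you would still read the pixel dimensions from your camera or image viewer.

```python
# Hypothetical helpers: check a page image against the rough minimum
# suggested above (about 2,000 x 3,000 pixels) and estimate the
# effective resolution for a page of known physical size.

def meets_minimum(width_px: int, height_px: int,
                  min_short: int = 2000, min_long: int = 3000) -> bool:
    """True if the image is at least min_short x min_long in either orientation."""
    short_side, long_side = sorted((width_px, height_px))
    return short_side >= min_short and long_side >= min_long


def effective_ppi(pixels: int, page_inches: float) -> float:
    """Pixels per inch along one side of the physical page."""
    return pixels / page_inches


if __name__ == "__main__":
    # Example: a 6 x 9 inch book page photographed at 2000 x 3000 pixels.
    print(meets_minimum(2000, 3000))          # True
    print(meets_minimum(3000, 2000))          # True (landscape shot)
    print(meets_minimum(1800, 3000))          # False
    print(round(effective_ppi(2000, 6), 1))   # 333.3
```

Checking the short and long sides separately means a page shot in landscape orientation is judged the same way as one shot in portrait.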
However, there are also free smartphone apps that produce a perfectly acceptable image or PDF for human reading, e.g. Genius Scan play.google.com/store/apps/details?id=com.thegrizzlylabs.geniusscan.free&hl=en. As Doug has said, this has been a game changer when visiting museums or archives. See also Camscanner with Pleco: siyigenealogy.proboards.com/thread/1937/document-scanner-ocr
OCR: I use Adobe Acrobat (expensive), which can process a whole multi-page PDF at once and can be configured to recognize traditional or simplified Chinese, English, or several other languages. It automatically handles text in vertical columns or horizontal rows. It does not recognize handwriting, and it has difficulty with third- or fourth-generation photocopies. It can also easily merge several JPGs into one multi-page PDF. But the OCR is never 100% accurate: whenever I copy the OCR’d text into a document, I always have to compare the Chinese characters with the original image.
I also use Pleco (paid version) play.google.com/store/apps/details?id=com.pleco.chinesesystem&hl=en and MS/Bing Translator play.google.com/store/apps/details?id=com.microsoft.translator&hl=en on my Android smartphone. I find Pleco extremely accurate, especially since you can compare the image and the character it recognizes in real time. I frequently point it at my computer screen when Acrobat has failed to recognize a character or phrase. It takes a few steps: crop the characters, pause, copy and paste, then transfer the text to my computer. I recently discovered that you can load the “ARC Welder” extension into the Google Chrome browser on a desktop computer and run Pleco, or almost any Android app, there (arcwelder.proweb.info/download-ARC-Welder-on-pc.html). I haven’t yet installed and configured a webcam on my desktop computer, but I think that will save even more time.
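Since neither tool is 100% accurate, one way to narrow down which characters need checking against the image is to run the same passage through two OCR tools (say, Acrobat and Pleco) and list only the places where their readings disagree. Below is a minimal Python sketch using the standard-library difflib module, assuming both readings have already been pasted in as plain text; the function name and sample strings are hypothetical. Characters on which both tools agree can still be wrong, so this only prioritises the checking; it does not replace it.

```python
import difflib


def ocr_disagreements(reading_a: str, reading_b: str):
    """List spans where two OCR readings of the same passage differ.

    Returns (offset_in_a, text_in_a, text_in_b) tuples: the characters
    most worth checking by eye against the original image.
    """
    # autojunk=False stops SequenceMatcher from treating very common
    # characters as junk in long passages
    matcher = difflib.SequenceMatcher(None, reading_a, reading_b, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete" or "insert"
            spans.append((i1, reading_a[i1:i2], reading_b[j1:j2]))
    return spans


if __name__ == "__main__":
    acrobat = "台山縣李氏族譜"  # hypothetical reading from one OCR tool
    pleco = "台山縣季氏族譜"    # hypothetical reading from another tool
    for offset, a, b in ocr_disagreements(acrobat, pleco):
        print(f"position {offset}: {a!r} vs {b!r}")
```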
TRANSLATION: None of the software or web translators seems to be very good, and none of them gives you good idiomatic English. I use Google Translate, Bing Translator (which doesn’t seem to work well in Chrome, so I run it in Edge) and www.freetranslation.com/en/translate-english-chinese, and compare the results. Pleco seems to have a more extensive dictionary than any of them.
The first two (Google Translate and MS/Bing) can translate whole web pages, but the formatting frequently gets messed up, and any text that is actually an image (buttons, frequently) is not translated.
I use iTranslate Voice play.google.com/store/apps/dev?id=9134966397852256976 on my Android phone. It accurately translates any English sentence I speak into written and spoken Chinese, and vice versa.
COLLABORATION: As others have mentioned, WeChat is essential if you are trying to correspond with anyone in China, whether by text, images, voice or video calls. Its built-in translator is pretty good, and it even lets you transfer money to individuals there (think PayPal, but even more widely used). You can get authenticated with a US credit card, but you may need a friend with a Chinese bank card to load money onto it for you: siyigenealogy.proboards.com/thread/2545/wechat