Will OCR help with this? - ehMac.ca
Jun 2nd, 2020, 03:43 PM   #1
New Neighbour
 
Join Date: Jan 2011
Posts: 16
Will OCR help with this?

Seeking some technical guidance regarding a book my 94-year-old mum-in-law is writing.

The issues are numerous. I’ll do my best to sum up what I’m up against.

The book has been written entirely as a PDF using MS “Wordpad”.

The process to spell check the PDFs and make other corrections is remarkably painful, to the point that I offered some assistance, only to discover that what I thought might be a somewhat easy fix is anything but…

I’ve used Google Docs to convert the PDFs to LibreOffice .odt, which allows spell checking, BUT none of the images carry over. This means all images would have to be imported and inserted back in the appropriate spots. This would be a more daunting process than I am willing to undertake. Why, you might ask…

Well, the book is now up to approximately 500 pages, with more to come, and on those 500 pages there are more than SEVEN HUNDRED AND THIRTY NINE images!!! With more to come.

I’m aware of the limitations PDFs present, but Wordpad is the only app Mum feels comfortable with, and at this point, 500 pages in… what do you do?

I’m wondering if an OCR program of some description can be employed to convert the complete document with all images/photos intact and in place.

Hopefully I’ve explained my dilemma. Thanks for what comes back.

CliveK
macOS 10.15.4
Posted by Slackjaw

Jun 3rd, 2020, 10:15 AM   #2
Honourable Citizen
 
Join Date: Jun 2003
Location: Waterloo
Posts: 5,245
Sounds like you need a page layout program. I would take a look at Affinity Publisher, which is an InDesign-like application for a fraction of the cost and no subscription.

https://affinity.serif.com/en-gb/publisher/

You should be able to import a Word file with images like you can in InDesign. I think there is a 90-day full-feature demo available, so you could give it a shot. If it works, the price is around $60, I believe, and it may be on sale for 50% off at the moment. Either way, it is really cheap for software that can hopefully make this easier to set up.
__________________
“If you are neutral in situations of injustice, you have chosen the side of the oppressor. If an elephant has its foot on the tail of a mouse and you say that you are neutral, the mouse will not appreciate your neutrality.” Bishop Desmond Tutu
Posted by wonderings

Jun 3rd, 2020, 12:49 PM   #3
Honourable Citizen
 
Join Date: Nov 2006
Posts: 17,526
Quote:
Originally Posted by wonderings View Post
Sounds like you need a page layout program. I would take a look at Affinity Publisher, which is an InDesign-like application for a fraction of the cost and no subscription.

https://affinity.serif.com/en-gb/publisher/

You should be able to import a Word file with images like you can in InDesign. I think there is a 90-day full-feature demo available, so you could give it a shot. If it works, the price is around $60, I believe, and it may be on sale for 50% off at the moment. Either way, it is really cheap for software that can hopefully make this easier to set up.
Does it import correctly from the PDF format the OP referred to? I would think it should, but nothing is ever certain in the digital world.

I suspect if he was working with a Word file there would have been no problem in the first place.
__________________
Ad links appearing in my posts were not placed there by me. I do not endorse any products which may be linked to my posts. Do not click on those links.

I retain all rights to photo-images I have posted on ehMac. They were posted that other members of the community could enjoy them. They may not be used or sold in any other way without my written consent.

Social Distancing is an Oxymoron. The correct term is Social Demonization or Social Repression
Posted by eMacMan

Jun 3rd, 2020, 02:03 PM   #4
Honourable Citizen
 
Join Date: Jun 2003
Location: Waterloo
Posts: 5,245
Quote:
Originally Posted by eMacMan View Post
Does it import correctly from the pdf format the OP referred to? I would think it should but nothing is ever certain in the digital world.

I suspect if he was working with a Word file there would have been no problem in the first place.
Oh, missed that bit.

This might be the one place where Publisher's PDF handling could be a good thing. Affinity Publisher likes to make a PDF editable. It is a nightmare as it cannot use embedded fonts, so if you are working with client-supplied PDFs and place them in Publisher they can get seriously messed up. But in this case it might be a good thing, as it "should" make your PDF editable in Publisher with all your images and formatting. The trial is free so there's no harm in trying, and if I am understanding correctly this might just work.
Posted by wonderings

Jun 3rd, 2020, 02:40 PM   #5
Honourable Citizen
 
Join Date: May 2009
Location: Brentwood Bay BC
Posts: 14,999
I think I would want to contact the potential publisher of the book and ask which applications they suggest and how to use them.

No one will want to have to redo a 500+ page book for spelling, formatting, file type, etc.

I would have thought that writing a lengthy book as a PDF would be the last choice, and I can't see any advantage in using any OCR. That's doubling up the amount of work involved, at least I would think.



- Patrick
======
Posted by pm-r

Jun 4th, 2020, 07:26 AM   #6
Full Citizen
 
Join Date: Jul 2004
Location: Oakville, ON
Posts: 614
Quote:
Originally Posted by Slackjaw View Post
...
The book has been written entirely as a PDF using MS “Wordpad”.

 ...
Wordpad's native file format is NOT pdf. Get access to the original file on Windows. MS Word can read native Wordpad files. Or from Windows, save the file in rtf format and then many word and document processors will be able to read it.

Craig
(Let me guess...the original file is corrupt and there are no backups.)
Posted by WCraig

Jun 4th, 2020, 11:01 AM   #7
New Neighbour
 
Join Date: Jan 2011
Posts: 16
Thanks for the suggestions. I'll take a look at the programs suggested.

@WCraig. I'm aware that .pdf is not the native file format for Wordpad, but Mum saved everything as a .pdf. There are no rtf files to be found.

CliveK
Posted by Slackjaw

Jun 4th, 2020, 12:05 PM   #8
Honourable Citizen
 
Join Date: May 2009
Location: Brentwood Bay BC
Posts: 14,999
Quote:
Craig
(Let me guess...the original file is corrupt and there are no backups.)

Isn't it amazing how often that seems to be so true!!!

Multiple years of work and typing so often will disappear into uselessness.




- Patrick
======
Posted by pm-r

Jun 4th, 2020, 12:42 PM   #9
Full Citizen
 
Join Date: Jul 2004
Location: Oakville, ON
Posts: 614
Quote:
Originally Posted by Slackjaw View Post
... @WCraig. I'm aware that .pdf is not the native file format for Wordpad, but Mum saved everything as a .pdf. There are no rtf files to be found.

CliveK
This makes no sense. How can your mother continue to write and edit the book? Wordpad cannot read a pdf file; it can only output them. Thus there must be a native file somewhere. I'm not at a Windows machine but I don't think rtf is the native format for Wordpad files, either.
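
If the original does turn up under an odd name or extension, the first few bytes of a file will tell you what it really is: a PDF starts with `%PDF-`, an RTF file starts with `{\rtf`, and .docx and .odt files are ZIP containers starting with `PK`. Here is a minimal Python sketch of that check, assuming Python 3 is available on the machine that holds the files. The function names `sniff_format` and `find_documents` are made up for illustration and are not part of any tool mentioned in this thread.

```python
from pathlib import Path

# Leading bytes ("magic numbers") for the formats discussed in this thread.
# .docx and .odt are both ZIP containers, so they share one signature.
SIGNATURES = {
    b"%PDF-": "pdf",
    b"{\\rtf": "rtf",
    b"PK\x03\x04": "zip container (docx/odt)",
}


def sniff_format(path):
    """Return a format label based on the file's first bytes, or None."""
    try:
        with open(path, "rb") as handle:
            head = handle.read(8)
    except OSError:
        # Unreadable file (permissions, locked, etc.): skip it quietly.
        return None
    for magic, label in SIGNATURES.items():
        if head.startswith(magic):
            return label
    return None


def find_documents(folder):
    """Yield (path, label) for every recognised file under folder."""
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            label = sniff_format(path)
            if label is not None:
                yield path, label
```

Something like `for path, label in find_documents("some/folder"): print(label, path)` would then list every candidate. The folder name there is a placeholder. Anything reported as rtf or as a zip container is worth opening in a word processor before resorting to PDF conversion.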

Craig
Posted by WCraig

Jun 4th, 2020, 12:55 PM   #10
Honourable Citizen
 
Join Date: May 2009
Location: Brentwood Bay BC
Posts: 14,999
Quote:
Originally Posted by WCraig View Post
This makes no sense. How can your mother continue to write and edit the book? Wordpad cannot read a pdf file; it can only output them. Thus there must be a native file somewhere. I'm not at a Windows machine but I don't think rtf is the native format for Wordpad files, either.

Craig

I gather she must be using a Windows computer as well???

Quote:
RTF was created by the Microsoft Word team back in the 1980’s. It was intended as a universal format that could be used by most word processors, making it easier for people to share Word documents with people who don’t use Word. It was also incorporated as the default format used by Windows’ built-in WordPad app—a lightweight word processor.




- Patrick
======
Posted by pm-r