Spacemaster Imperium OCR and Text Recovery

Started by NicholasHMCaldwell, February 06, 2014, 03:22:14 PM


Quote from: Terry K. Amthor on March 03, 2014, 03:41:28 AM
Quote from: Terry K. Amthor on February 27, 2014, 05:56:25 AM
Quote from: Guillaume on February 26, 2014, 09:22:43 AM
I checked, as long as it's a pdf file I can get the text out...
But it's going to be plain text, no formatting, no flourish.

That should be fine (assuming it is going to me), as long as it's something that I could pour into MS Word; then I'd do a rough format/style pass while editing it.

If I wasn't clear, if you have the files, send 'em! I'd love to at least have a look and review what Kevin and I wrote so long ago, while Nicholas decides exactly how to handle it all. Let me know if you need my email.

I don't have any files... since I have the books, I never bothered to scan them. ::)

As Mando pointed out, maybe it would be best to get a list of who is working on which book...
514 to see, 416 to lock, 614 to shot...Target downed...Ask the marines to pick up the pieces.

RM, RM2, RMSS, RMFRP, HARP,  MERP, Cyberspace, SM, SM2, SM:P, Star Strike, Armored Assault, SD , SD : The Next Millenium, Bladestorm, Battle of the Five Armies .... Collecting ICE production since the epoch...

Terry K. Amthor

Quote from: Mando on March 03, 2014, 04:30:22 AM
Could you give us a list of the concerned books, let people already on it tell what books they are on, so that I can pick one or two not currently being scanned?

I can OCR the books and send text files for someone to check (as a non-native English speaker, my writing isn't good enough to proofread them).

I'd like the 2nd edition GM Book, Player Book and Tech book in that order. Let's be sure to coordinate so we don't duplicate efforts.

Terry K. Amthor
Shadow World Author, Rolemaster & SpaceMaster Co-Designer, ICE co-founder.
Eidolon Studio Art Director

"Any sufficiently advanced technology is indistinguishable from magic."
-- Clarke's Third Law.


For the (many and huge) tables, do you want a plain text markup (tabs, markdown?) or separate Excel Sheets?

Speaking of markdown, it could be an easy way to apply clean, basic formatting to the scanned text for proofreading. It's easy for anyone to use, and it imports well into InDesign. It's available on PC, Mac and tablets.
.:| Fred, aka Mando |:.

French-speaking community of ICE role-playing game players: Iceland


Of the three books,

- 2nd edition GM Book,
- 2nd edition Player Book,
- 2nd edition Tech Book.

I can take the Tech Book.

PS: The scans I have on my drives are very low quality but should work. If someone has better ones, let me know.
.:| Fred, aka Mando |:.


Terry K. Amthor

Quote from: Mando on March 04, 2014, 12:20:19 AM
For the (many and huge) tables, do you want a plain text markup (tabs, markdown?) or separate Excel Sheets?

Speaking of markdown, it could be an easy way to apply clean, basic formatting to the scanned text for proofreading. It's easy for anyone to use, and it imports well into InDesign. It's available on PC, Mac and tablets.

Tab-delimited or Excel (ugh) works fine for me; if you can get those critical tables properly into an excel sheet, that would be awesome, but I am sure they will need work. I'm not familiar with markdown, but a quick search says that it is an HTML converter?
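
For anyone preparing tables this way, here is a minimal Python sketch of what a tab-delimited file looks like and how it round-trips. The column names and row values are placeholders, not taken from any SpaceMaster table, and the helper names `rows_to_tsv` and `tsv_to_rows` are hypothetical.

```python
import csv
import io

def rows_to_tsv(header, rows):
    """Serialize a header and rows as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()

def tsv_to_rows(text):
    """Parse tab-delimited text back into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))

# Placeholder data only: not real table content.
header = ["Roll", "Result A", "Result B"]
rows = [
    ["01-05", "placeholder text", "placeholder text"],
    ["06-10", "placeholder text", "placeholder text"],
]

tsv = rows_to_tsv(header, rows)
print(tsv)
```

Text in this form can be pasted into Word and converted to a table, or opened directly in Excel, so either route should stay open.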


I'll grab the books and scan them at work, but it will have to wait until next week; right now it's hectic due to several emergencies at the same time. (I work in telco, for those who didn't know.)

Terry K. Amthor

There's no big rush. At some point, I'd like to have the pages scanned at 300 ppi for the artwork as well. Some of the art and many of the layouts and maps can be re-used.


Quote from: Terry K. Amthor on March 05, 2014, 08:44:07 PM
Tab-delimited or Excel (ugh) works fine for me; if you can get those critical tables properly into an excel sheet, that would be awesome, but I am sure they will need work. I'm not familiar with markdown, but a quick search says that it is an HTML converter?

Not really, Terry, it's a basic text markup (à la HTML, but much simpler) that can then be processed to output HTML, PDF or RTF. For example, to indicate bold you type: this word is **bold**. Header level 1 is: #Chapter 1 and header level 2 is: ##Chapter 1.1. Simple, no? And it can be imported into InDesign. So it's an easy way to share formatted files and review or modify them without having to care whether the other person has Word or another word processor.

The markup is standard and the file is plain text, so it's easy to convert. Markdown is now often used to replace HTML or BBCode markup for blogs or mailing. Some people have also started using it to create "no database" sites, or just to take notes.
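
To make the idea concrete, here is a toy Python sketch that turns just the two constructs mentioned above (`#` headers and `**bold**`) into HTML. It is not a real Markdown processor, and the function name `tiny_markdown_to_html` is made up for illustration. Note that many current Markdown tools expect a space after the `#`; this sketch accepts both forms.

```python
import re

def tiny_markdown_to_html(text):
    """Convert a very small Markdown subset (# headers, **bold**) to HTML."""
    html_lines = []
    for line in text.splitlines():
        # Bold: **word** becomes <strong>word</strong>.
        line = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", line)
        # Headers: one to six leading '#' characters set the level.
        match = re.match(r"(#{1,6})\s*(.+)", line)
        if match:
            level = len(match.group(1))
            html_lines.append(f"<h{level}>{match.group(2)}</h{level}>")
        elif line.strip():
            # Any other non-empty line becomes a paragraph.
            html_lines.append(f"<p>{line}</p>")
    return "\n".join(html_lines)

sample = "#Chapter 1\n##Chapter 1.1\nthis word is **bold**"
print(tiny_markdown_to_html(sample))
```

The source file stays readable as plain text, which is the point: the same file can be proofread in any editor and converted later.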
.:| Fred, aka Mando |:.



"Root" site with all needed info:

Mac and iPad apps:
.:| Fred, aka Mando |:.



In the interest of not duplicating effort, I am at page 35 of a pass through the 2nd Edition Player Book and have clean text versions of several of the adventures, although it appears that the initial focus is more toward the core books.



- 2nd edition GM Book:
- 2nd edition Player Book: Takeyabue (book scanned, ocr'ed, text review)
- 2nd edition Tech Book: Mando (waiting for Guillaume's scan, will ocr and make basic text review)
.:| Fred, aka Mando |:.


Terry K. Amthor

Quote from: Mando on March 06, 2014, 09:00:51 AM
Quote from: Terry K. Amthor on March 05, 2014, 08:44:07 PM
Tab-delimited or Excel (ugh) works fine for me; if you can get those critical tables properly into an excel sheet, that would be awesome, but I am sure they will need work. I'm not familiar with markdown, but a quick search says that it is an HTML converter?

Not really, Terry, it's a basic text markup (à la HTML, but much simpler) that can then be processed to output HTML, PDF or RTF. For example, to indicate bold you type: this word is **bold**. Header level 1 is: #Chapter 1 and header level 2 is: ##Chapter 1.1. Simple, no? And it can be imported into InDesign. So it's an easy way to share formatted files and review or modify them without having to care whether the other person has Word or another word processor.

The markup is standard and the file is plain text, so it's easy to convert. Markdown is now often used to replace HTML or BBCode markup for blogs or mailing. Some people have also started using it to create "no database" sites, or just to take notes.

Hmm, reminds me of the old days of putting in typesetting codes. I don't think we'll need it for this though, as very few people will need to see the drafts, and it's easier for me to have the plain text, create styles in Word (that will import into IDD) and apply them.

Terry K. Amthor

I was not here, I did not share this...   8)

Poor Brad Dourif goes from giant eyebrows in DUNE to having them shaved off to play Wormtongue.

Terry K. Amthor

Emer III is available in hardcopy now as well as PDF!

(sorry wrong thread, got excited)


Quote from: Mando on March 06, 2014, 07:58:13 PM
- 2nd edition GM Book:
- 2nd edition Player Book: Takeyabue (book scanned, ocr'ed, text review)
- 2nd edition Tech Book: Mando (waiting for Guillaume's scan, will ocr and make basic text review)

Got the Tech Book scanned and OCRed (OCR is an option I have with Acrobat Pro, so the text just needs to be copied and pasted out of the PDF).
The file is 27 MB (600 dpi and all; no cover, though, to make sure it doesn't end up somewhere unintended).


Quote from: Terry K. Amthor on March 05, 2014, 08:44:07 PM
Tab-delimited or Excel (ugh) works fine for me; if you can get those critical tables properly into an excel sheet, that would be awesome, but I am sure they will need work. I'm not familiar with markdown, but a quick search says that it is an HTML converter?
I have the SM2 attack and critical tables in Excel. They are not completely proofread, but I used them on a regular basis. (I also have a version of the critical tables where I have re-worded the critical text in order to make it easier to extract details [rounds of stun, bleed, etc.] automatically.)
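
As a rough illustration of that kind of automatic extraction, here is a Python sketch. The sentence wording it matches ("stunned for N rounds", "bleeds N hits per round") is an assumption made up for the example, not the actual re-worded table text, and `extract_critical_details` is a hypothetical name.

```python
import re

# Assumed phrasings: the real re-worded text may differ.
STUN_PATTERN = re.compile(r"stunned for (\d+) rounds?", re.IGNORECASE)
BLEED_PATTERN = re.compile(r"bleeds? (\d+) hits? per round", re.IGNORECASE)

def extract_critical_details(text):
    """Pull stun rounds and bleed rate out of one critical result string."""
    stun = STUN_PATTERN.search(text)
    bleed = BLEED_PATTERN.search(text)
    return {
        # Missing details default to 0 rather than raising an error.
        "stun_rounds": int(stun.group(1)) if stun else 0,
        "bleed_per_round": int(bleed.group(1)) if bleed else 0,
    }

# Invented example sentence, not a quote from the critical tables.
example = "Foe is stunned for 3 rounds and bleeds 2 hits per round."
print(extract_critical_details(example))
```

The more consistent the wording across table entries, the fewer patterns a script like this needs.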


Quote from: Takeyabue on March 06, 2014, 02:23:15 PM
In the interest of not duplicating effort, I am at page 35 of a pass through the 2nd Edition Player Book and have clean text versions of several of the adventures, although it appears that the initial focus is more toward the core books.

Duplication has already occurred ;-)
I started creating a Microsoft Word version of the SM2 Player Book several years ago. I have the whole book in a Word document, and have cleaned it up from page 1 to 68, but have not done anything with it for a few years. I'd be happy to give this to the appropriate person in the ICE organization.


Duplication has already occurred ;-)
I started creating a Microsoft Word version of the SM2 Player Book several years ago. I have the whole book in a Word document, and have cleaned it up from page 1 to 68, but have not done anything with it for a few years. I'd be happy to give this to the appropriate person in the ICE organization.

Well, at least I tried. I am nearly finished myself (page 110), so I will go ahead and finish. Mine, however, is probably less useful, as it has no formatting beyond heading levels and paragraph breaks.

Thanks for posting and letting me know.


Terry K. Amthor

I am assuming that Nicholas will be contacting you about your efforts, but let me thank you in advance. I'm excited that so many people are interested in the Imperium, and looking forward to working on its rebirth.


In the interest of sharing information, I have finished my initial pass through the Player Book and have a 682 KB text file.

I plan to spend a bit of time working on the modules I have already created text files for to bring them into a common standard with the limited layout of the Player Book. This will allow me to use any regular expression scripts that I create across all my files to bring them into any format established once Dr. Caldwell's time frees up and he organizes things.

Following that, if we are still waiting for an organized effort to start up, I will move on to pulling text from the Vessel Compendiums, as that is what my gaming group needs at the moment. I apologize in advance for any duplication, but I have decided to stick to my own needs until Dr. Caldwell's time frees up.
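
For readers wondering what regular expression scripts for this kind of cleanup might look like, here is a minimal Python sketch. The specific rules (rejoining hyphenated line breaks, unwrapping lines inside paragraphs, collapsing extra spaces) are assumptions about typical OCR cleanup, not the scripts described above, and `normalize_ocr_text` is a hypothetical name.

```python
import re

def normalize_ocr_text(text):
    """Apply a few common cleanup rules to raw OCR text."""
    # Rejoin words hyphenated across a line break: "equip-\nment" -> "equipment".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Join single line breaks inside a paragraph; keep blank-line paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs into one space.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim spaces left at line ends.
    text = re.sub(r" +\n", "\n", text)
    return text.strip()

# Invented sample text for illustration.
raw = "The ship's equip-\nment is   listed\nbelow.\n\nNext  paragraph."
print(normalize_ocr_text(raw))
```

One caveat: the hyphen rule will also merge genuine compounds that happen to break at a line end, so the output still needs a human proofreading pass.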
