[GEM Development] Crude file converter for Wordplus

Shane M. Coughlan shane at shaneland.co.uk
Fri Aug 24 01:25:52 PDT 2007


Hi all, I saw this on the GEM Announcement list and thought it might be
useful here:

================

Hello,

I haven't been able to find any conversion utilities for GEM Wordplus
(except GEM's own Convert utility, of course) anywhere on the
internet. So, wanting to read online documents on the 100LX with the
graphical formatting visible, I wrote a couple of batch files to
convert the formatting codes of RTF and HTML to Wordplus's and back
to RTF. At least I presume RTF is still widely used these days.

The GEM Convert utility does work, actually, if you first convert
your file to Wordstar version 3 or 4. (If you don't have a word
processor that can export to that there is an early WordPort program
written for Brother's old word processor, the PN8500 MDS, that
converts to and from a number of antique WP formats. It can be found
on "8bit-micro.com".) Unfortunately the Wordstar option doesn't seem
to let italics get through, so I wrote these.

The batch files call up an assortment of DOS utilities found on the
internet to do the actual work. They are:

Martha21   By Yves Sagnier(?). Converts RTF to HTML and vice
           versa. No documentation.
VH         By Kevin Solway. HTML viewer, can convert to
           plain text.
Reformat   By Timothy C. Barmann. Edits crlf paragraph
           endings in the lines of text files.
Change     By Bruce Guthrie. The important one. Can change
           any ASCII character(s) into another in a file.

Essentially the first batch file turns an RTF file into an HTML file,
and adds some extra formatting markers to the HTML code so that the
formatting won't be lost when VH turns it into plain text. The lines
of text (paragraphs) in the text file are then REFORMATted so each
line ends just before column 47. Then the extra formatting markers
and space characters are converted to GEM Wordplus formatting codes.
Then finally a Wordplus 'ruler' is inserted at the top of the file to
tell GEM that "this is a Wordplus file".
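
For anyone who would rather not track down REFORMAT, the re-wrapping
step can be approximated in Python. This is only a sketch: it assumes
paragraphs are separated by blank lines and that "ends just before
column 47" means at most 46 characters per line. The name rewrap is
made up for the example, and REFORMAT itself may behave differently.

```python
import textwrap


def rewrap(text, width=46):
    """Re-wrap each blank-line-separated paragraph to at most `width` columns."""
    paragraphs = text.replace("\r\n", "\n").split("\n\n")
    # Collapse existing line breaks inside a paragraph, then wrap it again.
    wrapped = [textwrap.fill(" ".join(p.split()), width=width) for p in paragraphs]
    # Use DOS-style CRLF line endings, with a blank line between paragraphs.
    return "\r\n\r\n".join(w.replace("\n", "\r\n") for w in wrapped)


if __name__ == "__main__":
    sample = "one two three four five six seven eight nine ten eleven twelve\n\nSecond paragraph."
    print(rewrap(sample))
```

A different margin would only need a different width value, matched
to the ruler in the header file.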

The second batch file simply converts Wordplus codes into RTF codes,
turns its strange "triangle" characters into proper space characters,
and adds an RTF header to the top of the file and an ender to the
bottom so modern word processors can recognise it. Much simpler, but
I was still too lazy to have it convert all possible code
combinations.

Here are a few quirks:

The "convert to Wordplus" routine sets the margins to a narrow forty
seven columns. This can be changed by using a different header file
in the "formats" folder and changing the number in the REFORMAT line
to match the header's ruler.

In the "convert to RTF" routine it's very helpful to have had your
paragraph endings in Wordplus delineated ahead of time by two hits on
the ENTER key instead of one, otherwise the resulting file will be
very irritating to edit.

The path names saying where CHANGE is to look for its instruction
files (filename.cng) can, of course, be changed ...

I use the extension "DOK" to tell Wordplus files from MS Word 5.5
files, just so the GEM file is easier to find in the batch file.

For some reason a pair of asterisks (used in the change file to help
identify former HTML markup) remain surrounding the italicized phrase
in the final document. Don't know why.

MARTHA21 has an annoying tendency to insert spaces randomly in the
resulting text (at least in RTF to HTML). Not too many but enough ...
This can be avoided by substituting another utility or just using an
HTML file to begin with by deleting those parts of the batch file
dealing with RTF and other details.

The conversions are partial. I don't think I even bothered
with "light color" in any of them.

The convert to Wordplus routine can't handle more than one format
code at a time - when italics is turned off in the middle of a
bold/italic/underlined sentence, ALL formatting reverts to plain.
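
One possible way round that last quirk, going by the codes in the
change files further down: the byte after ESC (27) looks like 128
plus one bit per attribute (1 bold, 2 light, 4 italic, 8 underline,
16 superscript, 32 subscript), since 141 = 128+1+4+8 is listed as
bold+italic+underline. That reading is inferred from the tables, not
taken from any Wordplus documentation. A converter that remembers
which attributes are switched on could then emit a combined code at
every tag instead of dropping back to plain. A Python sketch, with
made-up names (StyleTracker, BITS):

```python
ESC = 27
STYLE_BASE = 128  # ESC,128 is "plain" in HTM2DOK2.CNG

# Bit values inferred from the codes in the change files (an assumption).
BITS = {"b": 1, "i": 4, "u": 8, "sup": 16, "sub": 32}


class StyleTracker:
    """Remembers which attributes are on and returns the combined GEM code."""

    def __init__(self):
        self.active = 0

    def tag(self, name, closing=False):
        bit = BITS[name.lower()]
        if closing:
            self.active &= ~bit  # switch this one attribute off
        else:
            self.active |= bit   # switch this one attribute on
        return bytes([ESC, STYLE_BASE | self.active])


if __name__ == "__main__":
    t = StyleTracker()
    for name, closing in [("b", False), ("i", False), ("i", True)]:
        print(list(t.tag(name, closing)))
```

Closing the italic tag in the example leaves bold switched on, so the
code emitted is bold rather than plain.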

The batch files aren't elegant or even complete. I'm not even a
programmer, I was an art major so don't laugh too much at all this ...

Of course there is probably a real program out there that does what I
did much better. If so then what I have here represents a fair waste
of time ... If not, it's a klunky but usable tool, or a bare
beginning for a real programmer to take over ...

For those very few (likely none) of you in this group who don't know
what a batch file is, please respond for further instructions.

Here is the batch file that converts RTF to Wordplus.

copy %1 g-tmp1.rtf
martha21 g-tmp1.rtf
rem add text markup to html file
CHANGE g-tmp1.htm e:\dos\b\cng\htm2dok1.cng /binary
rem simplify html file to text, retaining markup
VH /b g-tmp1.htm
rem add crlf to each line at column 47
reformat g-tmp1.txt 47
rem convert temporary htm markup to gem format codes
change g-tmp1.txt e:\dos\b\cng\htm2dok2.cng /binary
rem add gem formatting to file header and rename
copy e:\Gemapps\formats\format.doc+g-tmp1.txt final.dok
rem get rid of temp files
del g-tmp1.rtf
del g-tmp1.htm
del g-tmp1.txt
echo:
echo: The new GemDoc file is called FINAL.DOK
pause

Here are the "change files" that CHANGE.EXE uses.

HTM2DOK1.CNG

-f<b
-t *:*bbbbb*:* <b
-f<i
-t *:*iiiii*:* <i
-f<sup
-t *:*scscscscsc*:* <sup
-f<sub
-t *:*sbsbsbsbsb*:* <sub
-f<u
-t *:*uuuuu*:* <u
-f</b
-t *:*ppppp*:* </b
-f</u
-t *:*ppppp*:* </u
-f</i
-t *:*ppppp*:* </i
-f</sup
-t *:*ppppp*:* </sup
-f</sub
-t *:*ppppp*:* </sub
-f<br>
-t<br> *:*prprprprpr*:*
-f<p>
-t<p> *:*prprprprpr*:*

-f<B
-t *:*bbbbb*:* <B
-f<I
-t *:*iiiii*:* <I
-f<SUP
-t *:*scscscscsc*:* <SUP
-f<SUB
-t *:*sbsbsbsbsb*:* <SUB
-f<U
-t *:*uuuuu*:* <U
-f</B
-t *:*ppppp*:* </B
-f</U
-t *:*ppppp*:* </U
-f</I
-t *:*ppppp*:* </I
-f</SUP
-t *:*ppppp*:* </SUP
-f</SUB
-t *:*ppppp*:* </SUB
-f<BR>
-t<BR> *:*prprprprpr*:*
-f<P>
-t<P> *:*prprprprpr*:*
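
The same marker pass could probably be done with one case-insensitive
pattern instead of two copies of every rule. The Python sketch below
is a variant, not a copy of what CHANGE does: it matches whole tag
names, so <body> and <ul> are left alone (whether CHANGE's plain
prefixes such as "<b" also catch those is not something I can confirm),
and it also marks <p> and <br> tags that carry attributes. The names
add_markers, OPEN and TAG are invented for the example.

```python
import re

# Marker text for each opening tag, copied from HTM2DOK1.CNG.
OPEN = {"b": "bbbbb", "i": "iiiii", "sup": "scscscscsc",
        "sub": "sbsbsbsbsb", "u": "uuuuu"}

# Whole tag names only; \b stops "<b" from matching "<br" or "<body".
TAG = re.compile(r"<(/?)(b|i|sup|sub|u|br|p)\b([^>]*)>", re.IGNORECASE)


def add_markers(html):
    def repl(m):
        closing, name = m.group(1), m.group(2).lower()
        if name in ("br", "p"):
            # Paragraph marker goes after the tag; closing tags are untouched.
            return m.group(0) if closing else m.group(0) + " *:*prprprprpr*:*"
        marker = "ppppp" if closing else OPEN[name]
        # Style markers go in front of the tag, which is kept for VH to strip.
        return " *:*%s*:* %s" % (marker, m.group(0))
    return TAG.sub(repl, html)


if __name__ == "__main__":
    print(add_markers("<P>Some <I>italic</I> text<BR>"))
```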

HTM2DOK2.CNG

-f*:*bbbbb*:*
-t\027\129
-f*:*iiiii*:*
-t\027\132
-f*:*scscscscsc*:*
-t\027\144
-f*:*sbsbsbsbsb*:*
-t\027\160
-f*:*uuuuu*:*
-t\027\136
-f*:*ppppp*:*
-t\027\128
-f*:*prprprprpr*:*
-t\013\010\013\010
-f\032
-t\030

Again, it's inelegant. I was under the impression that converting RTF
to HTML first would make replacing the control codes easier. Maybe ...

And here is the batch file to convert Wordplus to RTF.

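rem work on a copy of the wordplus file
copy %1 1tmp1
rem convert gem format codes to rtf control words
CHANGE 1tmp1 e:\dos\b\cng\dok-rtf.cng /binary
rem add the rtf header to the top of the file
copy e:\Gemapps\formats\header.rtf+1tmp1 g-tmp2
del 1tmp1
rem add the rtf ender to the bottom and name the result
copy g-tmp2+e:\Gemapps\formats\ender.rtf zfinal.rtf
del g-tmp2
echo:
echo: The new file is called ZFINAL.RTF
pause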

Here is the "change file" associated with the CHANGE command in the
last batch file.

DOK-RTF.CNG

;======== convert double paragraph markers to RTF
-F\013\010\013\010
-T\013\010\013\010\032 \par \032
;======== convert italics markers to RTF
-F\027\132
-T\032\092i \032
;======== convert underline markers to RTF
-F\027\136
-T\032\092ul \032
;======== convert boldface markers to RTF
-F\027\129
-T\032\092b \032
;======== convert plaintext markers to RTF
-F\027\128
-T\032\092plain \032
;======== convert GEM space markers to regular spaces
-F\030
-T\032
;======== Ital+Bld+Ul
-F\027\141
-T\032\092b \092i \092ul \032
;======== light color
-F\027\130
-T\032\092cf15 \032
;======== Superscript
-F\027\144
-T\032\092up12 \032
;======== Subscript
-F\027\160
-T\032\092dn12 \032
;======== Bld+Ul
-F\027\137
-T\032\092b \092ul \032
;======== Bld+Ital
-F\027\133
-T\032\092b \092i \032
;======== Ul+Ital
-F\027\140
-T\032\092ul \092i \032
;======== Ul+Superscript
-F\027\152
-T\032\092ul \092up12 \032
;======== end

Here are the RTF header and ender files that I used.

HEADER.RTF

{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0
Times New Roman;}}
\viewkind4\uc1\pard\f0\fs20 \par
Text goes here.

ENDER.RTF

\par }
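
The whole Wordplus-to-RTF pass could be sketched in Python as below.
Instead of one rule per combination it decodes the byte after ESC bit
by bit, which is an inference from the codes in DOK-RTF.CNG rather
than anything documented, and it writes \plain before every set of
attributes so each code replaces the previous formatting (the change
file only does that for the plain code). HEADER is condensed from
HEADER.RTF without the "Text goes here." line; dok_to_rtf, RTF_WORDS
and style_to_rtf are made-up names. Backslashes and braces in the
text itself are not escaped.

```python
import re

HEADER = (b"{\\rtf1\\ansi\\ansicpg1252\\deff0\\deflang1033"
          b"{\\fonttbl{\\f0\\fnil\\fcharset0 Times New Roman;}}\r\n"
          b"\\viewkind4\\uc1\\pard\\f0\\fs20 \\par\r\n")
ENDER = b"\\par }\r\n"

# (bit, RTF control word); bit values are inferred from DOK-RTF.CNG.
RTF_WORDS = [(1, b"\\b"), (4, b"\\i"), (8, b"\\ul"),
             (2, b"\\cf15"), (16, b"\\up12"), (32, b"\\dn12")]

STYLE = re.compile(rb"\x1b([\x80-\xff])")  # ESC followed by a style byte


def style_to_rtf(m):
    flags = m.group(1)[0] & 0x7F  # drop the 128 base value
    words = [w for bit, w in RTF_WORDS if flags & bit]
    return b" \\plain " + b"".join(w + b" " for w in words)


def dok_to_rtf(data):
    """Convert Wordplus bytes to an RTF document (a sketch, not complete)."""
    data = data.replace(b"\r\n\r\n", b"\r\n\r\n \\par ")  # double CRLF -> \par
    data = STYLE.sub(style_to_rtf, data)                   # style codes
    data = data.replace(b"\x1e", b" ")                     # GEM spaces
    return HEADER + data + ENDER


if __name__ == "__main__":
    print(dok_to_rtf(b"\x1b\x8dhi\x1ethere\x1b\x80").decode("latin-1"))
```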

Have fun.

Thomas

