From shane at shaneland.co.uk  Fri Aug 24 01:25:52 2007
From: shane at shaneland.co.uk (Shane M. Coughlan)
Date: Fri, 24 Aug 2007 10:25:52 +0200
Subject: [GEM Development] Crude file converter for Wordplus
Message-ID: <46CE9610.7050401@shaneland.co.uk>

Hi all,

I saw this on the GEM Announcement list and thought it might be useful here:

================

Hello,

I haven't been able to find any conversion utilities for GEM Wordplus (except GEM's own Convert utility, of course) anywhere on the internet. So, wanting to read online documents on the 100LX with the graphical formatting visible, I wrote a couple of batch files to convert the formatting codes of RTF and HTML to Wordplus's and back to RTF. At least I presume RTF is still widely used these days.

The GEM Convert utility does work, actually, if you first convert your file to Wordstar version 3 or 4. (If you don't have a word processor that can export to that, there is an early WordPort program written for Brother's old word processor, the PN8500 MDS, that converts to and from a number of antique WP formats. It can be found on "8bit-micro.com".) Unfortunately the Wordstar option doesn't seem to let italics get through, so I wrote these.

The batch files call up an assortment of DOS utilities found on the internet to do the actual work. They are:

Martha21   By Yves Sagnier(?). Converts RTF to HTML and vice versa. No documentation.
VH         By Kevin Solway. HTML viewer, can convert to plain text.
Reformat   By Timothy C. Barmann. Edits crlf paragraph endings in the lines of text files.
Change     By Bruce Guthrie. The important one. Can change any ASCII character(s) into another in a file.

Essentially the first batch file turns an RTF file into an HTML file, and adds some extra formatting markers to the HTML code so that the formatting won't be lost when VH turns it into plain text. The lines of text (paragraphs) in the text file are then REFORMATted so each line ends just before column 47.
Then the extra formatting markers and space characters are converted to GEM Wordplus formatting codes. Finally, a Wordplus 'ruler' is inserted at the top of the file to tell GEM that "this is a Wordplus file".

The second batch file simply converts Wordplus codes into RTF codes, turns its strange "triangle" characters into proper space characters, and adds an RTF header to the top of the file and an ender to the bottom so modern word processors can recognise it. Much simpler, but still I was too lazy to have it convert all possible code combinations.

Here are a few quirks:

The "convert to Wordplus" routine sets the margins to a narrow forty-seven columns. This can be changed by using a different header file in the "formats" folder and changing the number in the REFORMAT line to match the header's ruler.

In the "convert to RTF" routine it's very helpful to have had your paragraph endings in Wordplus delineated ahead of time by two hits on the ENTER key instead of one; otherwise the resulting file will be very irritating to edit.

The path names saying where CHANGE is to look for its instruction files (filename.cng) can, of course, be changed ... I use the extension "DOK" to tell Wordplus from MS Word 5.5 files - just so you can find the GEM file easier in the batch file.

For some reason a pair of asterisks (used in the change file to help identify former HTML markup) remains surrounding the italicized phrase in the final document. Don't know why.

MARTHA21 has an annoying tendency to insert spaces randomly in the resulting text (at least in RTF to HTML). Not too many, but enough ... This can be avoided by substituting another utility, or by starting from an HTML file and deleting those parts of the batch file that deal with RTF.

The conversions are partial. I don't think I even bothered with "light color" in any of them.
The "convert to Wordplus" routine can't handle more than one format code at a time - when italics is turned off in the middle of a bold/italic/underlined sentence, ALL formatting reverts to plain.

The batch files aren't elegant or even complete. I'm not even a programmer; I was an art major, so don't laugh too much at all this ... Of course there is probably a real program out there that does what I did much better. If so, then what I have here represents a fair waste of time ... If not, it's a klunky but usable tool, or a bare beginning for a real programmer to take over ...

For those very few (likely none) of you in this group who don't know what a batch file is, please respond for further instructions.

Here is the batch file that converts RTF to Wordplus.

copy %1 g-tmp1.rtf
martha21 g-tmp1.rtf
rem add text markup to html file
CHANGE g-tmp1.htm e:\dos\b\cng\htm2dok1.cng /binary
rem simplify html file to text, retaining markup
VH /b g-tmp1.htm
rem add crlf to each line at column 47
reformat g-tmp1.txt 47
rem convert temporary htm markup to gem format codes
change g-tmp1.txt e:\dos\b\cng\htm2dok2.cng /binary
rem add gem formatting to file header and rename
copy e:\Gemapps\formats\format.doc+g-tmp1.txt final.dok
rem get rid of temp files
del g-tmp1.rtf
del g-tmp1.htm
del g-tmp1.txt
echo:
echo: The new GemDoc file is called FINAL.DOK
pause

Here are the "change files" that CHANGE.EXE uses.

HTM2DOK1.CNG
-f -t
*:*prprprprpr*:* -f

-t

*:*prprprprpr*:* -f -t
*:*prprprprpr*:* -f

-t

*:*prprprprpr*:*

HTM2DOK2.CNG
-f*:*bbbbb*:*
-t\027\129
-f*:*iiiii*:*
-t\027\132
-f*:*scscscscsc*:*
-t\027\144
-f*:*sbsbsbsbsb*:*
-t\027\160
-f*:*uuuuu*:*
-t\027\136
-f*:*ppppp*:*
-t\027\128
-f*:*prprprprpr*:*
-t\013\010\013\010
-f\032
-t\030

Again, it's inelegant. I was under the impression that converting RTF to HTML first would make replacing the control codes easier. Maybe ...

And here is the batch file to convert Wordplus to RTF.

copy %1 1tmp1
CHANGE 1tmp1 e:\dos\b\cng\dok-rtf.cng /binary
copy e:\Gemapps\formats\header.rtf+1tmp1 g-tmp2
del 1tmp1
copy g-tmp2+e:\Gemapps\formats\ender.rtf zfinal.rtf
del g-tmp2
echo:
echo: The new file is called ZFINAL.RTF
pause

Here is the "change file" associated with the CHANGE command in the last batch file.

DOK-RTF.CNG
;======== convert double paragraph markers to RTF
-F\013\010\013\010
-T\013\010\013\010\032 \par \032
;======== convert italics markers to RTF
-F\027\132
-T\032\092i \032
;======== convert underline markers to RTF
-F\027\136
-T\032\092ul \032
;======== convert boldface markers to RTF
-F\027\129
-T\032\092b \032
;======== convert plaintext markers to RTF
-F\027\128
-T\032\092plain \032
;======== convert GEM space markers to regular spaces
-F\030
-T\032
;======== Ital+Bld+Ul
-F\027\141
-T\032\092b \092i \092ul \032
;======== light color
-F\027\130
-T\032\092cf15 \032
;======== Superscript
-F\027\144
-T\032\092up12 \032
;======== Subscript
-F\027\160
-T\032\092dn12 \032
;======== Bld+Ul
-F\027\137
-T\032\092b \092ul \032
;======== Bld+Ital
-F\027\133
-T\032\092b \092i \032
;======== Ul+Ital
-F\027\140
-T\032\092ul \092i \032
;======== Ul+Superscript
-F\027\152
-T\032\092ul \092up12 \032
;======== end

Here are the RTF header and ender files that I used.

HEADER.RTF
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Times New Roman;}}
\viewkind4\uc1\pard\f0\fs20 \par Text goes here.

ENDER.RTF
\par }

Have fun.

Thomas
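
For anyone who cannot track down CHANGE, the kind of byte-for-byte substitution that DOK-RTF.CNG describes can be approximated in a few lines of Python. This is only a rough sketch and not the CHANGE utility itself: the function name wordplus_to_rtf and the RULES table are made up for illustration, only a handful of the single-attribute rules are transcribed (with the literal spaces read exactly as they appear in the post), and it assumes the rules are applied in order over the whole file, which the post does not confirm.

```python
# Illustrative sketch only: a hypothetical stand-in for a few DOK-RTF.CNG rules.
# Assumption: each rule is applied in turn across the whole buffer.

ESC = b"\x1b"  # decimal 027, the lead byte of the Wordplus format codes above

# (Wordplus bytes, RTF bytes), transcribed from a subset of DOK-RTF.CNG.
RULES = [
    (ESC + bytes([132]), b" \\i  "),      # italics
    (ESC + bytes([136]), b" \\ul  "),     # underline
    (ESC + bytes([129]), b" \\b  "),      # boldface
    (ESC + bytes([128]), b" \\plain  "),  # back to plain text
    (bytes([30]), b" "),                  # GEM "triangle" space to a real space
]


def wordplus_to_rtf(data: bytes) -> bytes:
    """Apply each substitution rule in order and return the converted bytes."""
    for old, new in RULES:
        data = data.replace(old, new)
    return data


if __name__ == "__main__":
    sample = ESC + bytes([132]) + b"Hello" + bytes([30]) + b"world" + ESC + bytes([128])
    print(wordplus_to_rtf(sample))
```

The result would still need the HEADER.RTF and ENDER.RTF text wrapped around it, as the second batch file does with its two copy commands.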