Separating paragraphs in HTML file

General questions about using TextPad

sher07
Posts: 2
Joined: Mon Jun 30, 2003 10:10 am

Separating paragraphs in HTML file

Post by sher07 »

Dear friends:

My problem is quite simple: I downloaded a plain text file from Project Gutenberg. This text is published with 70 characters per line. I then opened it in a text editor (originally in NoteTab) and saved it as HTML. The HTML file shows the text with paragraph breaks at the end of each line and BETWEEN PARAGRAPHS. I opened this HTML file in Textpad to word-wrap it. No problem (Edit, Select All, Reformat). Now the text wraps all the way to the right edge of the screen (in fact, it wraps all the way to the right even without Edit/Reformat, i.e., when viewed in Internet Explorer). However, the line spacing between paragraphs also disappears. Everything looks great in IE except for this key issue: how can I keep the line spacing BETWEEN paragraphs while wrapping the text WITHIN each paragraph? Doing this by hand for a few paragraphs is no problem, but I've got tons of text files of this sort that need to be properly wrapped. Surely there must be some automatic way to do this without spending hours doing it manually.

Can someone point me in the right direction?

Thank you so much in advance.

Benjamin
Registered User
Textpad
sher07@mindspring.com
Bob Hansen
Posts: 1516
Joined: Sun Mar 02, 2003 8:15 pm
Location: Salem, NH

Post by Bob Hansen »

Hi Benjamin....

Can you temporarily replace the paragraph separations?

Example:
1. Search for "RTN RTN" and replace with "QBQ".
2. Do the reformatting that you want.
3. Replace the "QBQ" with "RTN RTN".

I don't know what you are looking at as paragraph separators so I have used "double returns" as an example.

You can replace RTN RTN with any character or string. I usually use "Qxx" because very few English words begin with "Q" followed by anything other than "U". But you could use a tilde "~" or a pipe "|" or any other character you don't think will appear anywhere in the text.
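For anyone who would rather script these three steps than run them by hand, here is a minimal Python sketch of the same placeholder idea. It is only an illustration, not a TextPad feature: the function name `unwrap_keep_paragraphs` and the `QBQ` marker are hypothetical, and it assumes the text uses "\n" line endings and that the marker never occurs in the book itself.

```python
import re

# Hypothetical marker; pick something that cannot occur in the text.
PLACEHOLDER = "QBQ"

def unwrap_keep_paragraphs(text: str) -> str:
    """Join hard-wrapped lines but keep blank-line paragraph breaks."""
    # 1. Protect paragraph separators (a newline followed by one or
    #    more blank or whitespace-only lines) with the placeholder.
    protected = re.sub(r"\n(?:[ \t]*\n)+", PLACEHOLDER, text.strip())
    # 2. "Reformat": turn every remaining line break into one space.
    joined = re.sub(r"[ \t]*\n[ \t]*", " ", protected)
    # 3. Put the paragraph separators back.
    return joined.replace(PLACEHOLDER, "\n\n")

sample = "Line one of\nparagraph one.\n\nLine one of\nparagraph two.\n"
result = unwrap_keep_paragraphs(sample)
print(result)
```

Each paragraph ends up on a single long line, with one blank line between paragraphs, which is the state the text needs to be in before any paragraph markup is added.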

Hope this helps....good luck,
Bob
sher07
Posts: 2
Joined: Mon Jun 30, 2003 10:10 am

Paragraph Separators -- A Response

Post by sher07 »

Thank you so much for trying to help. Unfortunately, Textpad cannot solve the problem, at least not easily or elegantly. But my Namo 5.5 can. Let me explain. If you download a Project Gutenberg file in plain text (txt) and you want to use it online, you naturally have to convert it to (save it as) HTML. However, the Project Gutenberg texts are typed without wrapping, and each line ends with a paragraph break (the old typewriter ENTER format) at 70 characters per line. In addition, there are two line breaks between paragraphs.

I first tried to change the characters per line from 70 to 120 or even to 200 or whatever using NoteTab. That had no effect on the formatting. I then opened the original text in Textpad.

When you Select All and then Edit, Reformat, Textpad will wrap the entire text properly to the edge of the screen, thus removing all of the paragraph endings at the end of each LINE. That solves a major part of the problem, and now it is easy to justify each paragraph or the entire text so that it looks right. HOWEVER, when viewed with the browser (IE 6), the text appears as one long wrapped text, i.e., you lose ALL of the paragraph codes, including the ones separating paragraphs. If you are working with a 400-page document, it is simply impossible to restore the breaks between paragraphs manually.

By good fortune, I also own Namo 5.5 and wrote to them about the same problem. Namo allows you to open a text file with the option of "keeping paragraph spacing". It's as simple as that. You select File, Open text. Then the next dialogue box offers you the options "text without paragraph breaks" and "text with paragraph breaks". You choose the second option and, bingo, the entire 400-page book opens with text wrapped to the edge but WITH paragraph breaks preserved. You then simply save it as HTML and put it online. Imagine the vast amount of work saved!

You might consider adding this as a feature to Textpad. Since Textpad is a text/HTML editor, this might prove not only useful but indeed quite essential.

Thanks again.

Benjamin
bbadmin
Site Admin
Posts: 854
Joined: Mon Feb 17, 2003 8:54 pm

Post by bbadmin »

Benjamin,

Unless your HTML file contains paragraph tags, all plain text will run together as a single block in IE, because browsers collapse white space, including line breaks. What you actually need is something like this:

<p>Stuff in paragraph 1.</p>
<p>Stuff in paragraph 2.</p>

The easiest way to achieve that is to start again with the original plain text file and reformat it. Then, select all, and use the Edit/Copy Other/As a HTML page command. Open a new document, paste in the contents of the clipboard, then save that as an HTML file.

You can achieve similar results by reformatting the plain text, to remove line breaks, then using the following POSIX regular expressions in the Replace command:

Find what: ^(.)
Replace with: <p>\1
Replace all.

Find what: (.)$
Replace with: \1</p>
Replace all.

If you are not using POSIX REs, you'll need to use "\(" and "\)" in place of "(" and ")".
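For batches of files it may be easier to run the same two replacements from a script. The following Python sketch mirrors them with the standard `re` module; it is an approximation rather than TextPad's own engine. The function name `tag_paragraphs` is hypothetical, and the sketch assumes each paragraph already sits on one line (that is, the text has been reformatted first).

```python
import re

def tag_paragraphs(text: str) -> str:
    """Wrap every non-empty line in <p>...</p>, leaving blank lines alone."""
    # Pass 1: Find "^(.)", replace with "<p>\1".
    # With re.MULTILINE, "^" matches at the start of every line; the
    # "(.)" requires a first character, so empty lines are skipped.
    opened = re.sub(r"^(.)", r"<p>\1", text, flags=re.MULTILINE)
    # Pass 2: Find "(.)$", replace with "\1</p>".
    # "$" matches at the end of every line; again "(.)" skips empty lines.
    return re.sub(r"(.)$", r"\1</p>", opened, flags=re.MULTILINE)

sample = "Stuff in paragraph 1.\n\nStuff in paragraph 2."
tagged = tag_paragraphs(sample)
print(tagged)
```

The captured character is what keeps blank lines untouched: a line with nothing on it has no character for "(.)" to match, so no tag is added there.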

Keith MacDonald
Helios Software Solutions
MudGuard
Posts: 1295
Joined: Sun Mar 02, 2003 10:15 pm
Location: Munich, Germany

Post by MudGuard »

It is not even necessary to work with a tagged expression here.

Simply replace
^
by
<p>

and
$
by
</p>

(of course, Regex has to be active...)

Or do it in one go:
Replace
.*
by
<p>&</p>

(again with Regex selected)
bbadmin
Site Admin
Posts: 854
Joined: Mon Feb 17, 2003 8:54 pm

Post by bbadmin »

The tagged expressions are used so that blank lines are not replaced with <p></p>. If you're desperate to simplify, it can be done in one go using:

Find what: .+
Replace with: <p>&</p>
Replace all.
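The one-pass version can be sketched in Python as well, for anyone scripting it outside TextPad. This is an illustration under assumptions: `wrap_nonblank_lines` is a hypothetical name, and Python writes the whole-match reference as `\g<0>` where TextPad uses `&`.

```python
import re

def wrap_nonblank_lines(text: str) -> str:
    """Single-pass equivalent of: Find ".+", Replace "<p>&</p>"."""
    # ".+" needs at least one character and "." does not cross a
    # newline, so each non-empty line is matched whole and blank
    # lines never match.
    return re.sub(r".+", r"<p>\g<0></p>", text)

sample = "One.\n\nTwo."
wrapped = wrap_nonblank_lines(sample)
print(wrapped)
```

Because "+" demands at least one character, the blank line between the two paragraphs is left as it is instead of becoming an empty <p></p> pair.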

Keith MacDonald
Helios Software Solutions