Replace is destroying line endings

geoffreykidd
Posts: 35
Joined: Thu Aug 02, 2007 8:50 pm

Post by geoffreykidd »

I've used the following for years:

[^“”] -> (null)

It still works in 7, but it's leaving me with ONE line that's all “” pairs.

This is the first stage of a process I use to find unbalanced quote marks in dialogue, and it kills the process, which is: 1. Kill anything not a quotemark. 2. Kill pairs of quotemarks. 3. Bookmark the lines with individual (unmatched) quotemarks. Then undo 2 and undo 1.

Manually check the bookmarked paragraphs. (weed out false positives)
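
For readers following along outside TextPad, the three stages above can be roughly sketched in Python. This is only an illustration of the idea, not the TextPad replace sequence itself: the function name and sample text are made up for the example, and the work is done line by line so the line endings are never touched.

```python
import re

def lines_with_unmatched_quotes(text):
    """Return 1-based numbers of lines that still hold a quote mark
    after non-quote characters and adjacent “” pairs are removed."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        quotes_only = re.sub(r'[^“”]', '', line)   # stage 1: kill anything not a quotemark
        leftovers = quotes_only.replace('“”', '')   # stage 2: kill pairs of quotemarks
        if leftovers:                               # stage 3: "bookmark" what remains
            flagged.append(number)
    return flagged

# Hypothetical sample text, one paragraph per line.
sample = "“Hello,” she said.\n“Wait, he said.\nNo dialogue here.\nStop,” she said.\n"
print(lines_with_unmatched_quotes(sample))  # [2, 4]
```

The flagged lines still need a manual check, exactly as in the original process.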

How do I avoid killing line endings? I don't DARE update my portable (working) copy until this works.

HELP!
geoffreykidd

Never mind. Sorry to have bothered anybody.

Post by geoffreykidd »

Proper expression turned out to be: [^“”\r\n]

and everything else worked perfectly.

G_d bless Regex Buddy and Textpad!
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Yes, in TextPad 7, [^...] matches any single character, including a newline character, that [...] doesn't match.
And \n now matches only linefeeds, not generic newlines, so you have to specify linefeeds and carriage returns separately.
(If it's not in a character set [...] you can use \R for a generic newline.)
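
A small sketch of the same effect in Python, whose `re` module also lets a negated character class match newline characters. It is not TextPad's engine, so treat it as an analogy; the sample text is invented for the example.

```python
import re

# Hypothetical two-line sample with Windows (CR LF) line endings.
text = "“Hi,” she said.\r\n“Bye, he said.\r\n"

# The negated class also matches \r and \n, so the lines collapse into one.
collapsed = re.sub(r'[^“”]', '', text)

# Listing \r and \n inside the class keeps the line endings intact.
preserved = re.sub(r'[^“”\r\n]', '', text)

print(repr(collapsed))  # '“”“'
print(repr(preserved))  # '“”\r\n“\r\n'
```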

You can do this in fewer steps:

To match all lines containing no unbalanced quotes:
^([^“”\r\n]|“[^“”\r\n]*”)*$

To match all lines containing an unbalanced quote:
(?!^([^“”\r\n]|“[^“”\r\n]*”)*$)^.+
(No doubt something simpler is possible.)
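
To try the two expressions outside TextPad, here is a rough Python sketch. The assumptions: Python's `re.MULTILINE` stands in for TextPad's line-anchored `^` and `$`, and the invented sample text uses bare `\n` line endings, because Python's `$` only matches before `\n` (not before `\r`). TextPad itself may behave differently on these points.

```python
import re

# The two expressions from above, compiled with line-anchored ^ and $.
BALANCED = re.compile(r'^([^“”\r\n]|“[^“”\r\n]*”)*$', re.MULTILINE)
UNBALANCED = re.compile(r'(?!^([^“”\r\n]|“[^“”\r\n]*”)*$)^.+', re.MULTILINE)

# Hypothetical sample text with "\n" line endings.
text = "“Hello,” she said.\n“Wait, he said.\nNo dialogue here.\nStop,” she said.\n"

# Skip empty matches, which the first expression also produces.
balanced_lines = [m.group(0) for m in BALANCED.finditer(text) if m.group(0)]
unbalanced_lines = [m.group(0) for m in UNBALANCED.finditer(text)]

print(balanced_lines)    # ['“Hello,” she said.', 'No dialogue here.']
print(unbalanced_lines)  # ['“Wait, he said.', 'Stop,” she said.']
```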

Edit: Removed redundant parentheses in second regex.
Last edited by ben_josephs on Thu Apr 11, 2013 8:17 pm, edited 2 times in total.
geoffreykidd

Post by geoffreykidd »

I tried both regexes on my current project, and the first one successfully bookmarked everything except the lines that did need checking. All I needed to do was invert all bookmarks and F2 my way through the text.

It was beautiful! The bookmarked lines were a match-for-match with the lines my old technique left bookmarked. This also means I can now create a "bookmark unbalanced quotes" macro which will do both steps in one pass.

I still need to check the results manually because there's a typesetting convention that says next-paragraph-same-speaker ends without a closing quote. FYI, the first time I tried the technique back in 2005, I got 73 hits of which 67 were false positives because the characters tended to be long-winded. :) But those six true results were worth their weight in platinum to me.
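
The same-speaker convention could in principle be pre-sorted before the manual pass. The following Python sketch is only a heuristic and an assumption about how one might do it: a paragraph whose only leftover quote is a single opening mark, followed by a paragraph that itself starts with an opening mark, is labelled a likely continuation. The function name, labels, and sample paragraphs are all hypothetical, and every hit would still deserve a look.

```python
import re

# Same "balanced line" expression as in the thread, applied to one paragraph at a time.
BALANCED = re.compile(r'^(?:[^“”\r\n]|“[^“”\r\n]*”)*$')

def classify(paragraphs):
    """Return (index, label) for each paragraph with unbalanced quotes."""
    results = []
    for i, para in enumerate(paragraphs):
        if BALANCED.match(para):
            continue
        # Reduce the paragraph to its quote marks, then drop adjacent pairs.
        leftovers = re.sub(r'[^“”]', '', para).replace('“”', '')
        following = paragraphs[i + 1] if i + 1 < len(paragraphs) else ''
        if leftovers == '“' and following.lstrip().startswith('“'):
            results.append((i, 'likely continuation'))
        else:
            results.append((i, 'check'))
    return results

# Hypothetical paragraphs, one per list item.
paragraphs = [
    '“I will go on,” he said. “And on.',
    '“And on still.”',
    '“Broken.',
    'Plain text.',
]
print(classify(paragraphs))  # [(0, 'likely continuation'), (2, 'check')]
```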

The second regex (used on the same file) didn't mark anything, including the lines that did indeed have unbalanced quotes. If I can find the time, I may stuff it into RegexBuddy for debugging.

Either way, I now have a fix for a vital tool in my proofreading workshop. I can't thank you enough for the help.
ben_josephs

Post by ben_josephs »

Please post an example of a line with unbalanced quotes that my second regex didn't match.
geoffreykidd

Post by geoffreykidd »

This is interesting. Regex2 worked on a couple of short files and one humungous one (6000-odd lines), going its merry way quickly and efficiently.

However, when I copied and pasted plain text from my current project and tried it, the expression seemed to freeze dead.

Possible character-encoding problem?
geoffreykidd

Post by geoffreykidd »

I ran the second regex past RegexBuddy and got the following:

(?!^([^“”\r\n]|“[^“”\r\n]*”)*$)^.+

Options: ^ and $ match at line breaks

A POSIX Extended RE does not support lookaround «(?!^([^“”\r\n]|“[^“”\r\n]*”)*$)»
Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
Match the regular expression below and capture its match into backreference number 1 «([^“”\r\n]|“[^“”\r\n]*”)*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Note: You repeated the capturing group itself. The group will capture only the last iteration. Put a capturing group around the repeated group to capture all iterations. «*»
Match either the regular expression below (attempting the next alternative only if this one fails) «[^“”\r\n]»
Match a single character NOT present in the list below «[^“”\r\n]»
One of the characters ““”” «“”»
A carriage return character «\r»
A line feed character «\n»
Or match regular expression number 2 below (the entire group fails if this one fails to match) «“[^“”\r\n]*”»
Match the character ““” literally «“»
Match a single character NOT present in the list below «[^“”\r\n]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
One of the characters ““”” «“”»
A carriage return character «\r»
A line feed character «\n»
Match the character “”” literally «”»
Assert position at the end of a line (at the end of the string or before a line break character) «$»
Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
Match any single character that is not a line break character «.+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
ben_josephs

Post by ben_josephs »

TextPad 7 implements Perl-style regular expressions, not POSIX extended regular expressions. Perl-style regular expressions are more powerful, and they do support look ahead and look behind. Look in TextPad's help under Reference Information | Regular Expressions.
geoffreykidd

Post by geoffreykidd »

Thank you. Will do.
geoffreykidd

Post by geoffreykidd »

Re-tested Regex2 in RegexBuddy against my test file and it located the unbalanced lines fine. It also worked in TextPad with a copy of one of Horatio Alger's novels I got from Project Gutenberg. However, when I load the test file into TextPad and run Regex2 with "find" or "find next", I get "search passed end of file."

I'm beginning to think there's something funky about the file that may be causing the recognizer to go crazy. Could I send you a copy of the file for examination? If so, to whom would I address it, please? Thank you.
ben_josephs

Post by ben_josephs »

You could put it somewhere on the web and post a link to it.
geoffreykidd

Post by geoffreykidd »

I've sent the file by a submit form on the main site, since its contents are somewhat confidential.
ben_josephs

Post by ben_josephs »

Do you mean you've sent it to Helios? I am nothing to do with Helios, so I won't see it.
geoffreykidd

Post by geoffreykidd »

I thought you were one of their support people, so...
ak47wong
Posts: 703
Joined: Tue Aug 12, 2003 9:37 am
Location: Sydney, Australia

Post by ak47wong »

The only accounts associated with Helios are bbadmin and helios. Everyone else is just a regular user.