
negation at end of line

Posted: Fri Aug 18, 2017 12:44 am
by ztodd
Figured out a solution before I posted my question :)
but since it's interesting, I'll still post it.

When you use a negated character set right before \n in a regex, the set can match part of the line ending itself!

For example, with this file:

abc
abcd
xyz

Search for the regex:

[^c]\n

and it will match the new line character on the first line!

Actually, since my file is a PC file, each new line is two characters: char code 13 (CR) and char code 10 (LF). I think that's what must be causing this: the negated set is matching the CR, and the \n is matching the LF. I assume that's what's happening because of the following:

When I search for the regex:

[^c]

it matches twice at the end of every line! I wondered whether TextPad thought it was a Unix file, but I verified that the document properties show File type = PC. So it should know to expect CR+LF for every new line...
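The CR/LF hypothesis above can be reproduced outside TextPad. Here is a minimal sketch in Python, assuming its re module is an acceptable stand-in. It is a different engine from TextPad's, and its \n matches only LF, so this illustrates the idea rather than proving what TextPad does. The sample text is the three-line file from the post with CR+LF endings.

```python
import re

# The example file from the post, written with PC (CR+LF) line endings.
text = "abc\r\nabcd\r\nxyz\r\n"

# In Python's re, \n matches only LF, so [^c] is free to match the CR.
matches = re.findall(r"[^c]\n", text)
print(matches)  # ['\r\n', '\r\n', '\r\n']

# A bare negated set matches CR and LF individually, so it hits
# twice at the end of every line.
line_end_hits = [m for m in re.findall(r"[^c]", text) if m in "\r\n"]
print(len(line_end_hits))  # 6 (two per line, three lines)
```

Note that even the first line, which ends in c, produces a match, because the set never looks at the c at all.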

So if I wanted to delete every line except those that end with a c, replacing the following regex with an empty string would NOT do it:

^.*?[^c]\n

(Whenever I do this type of 'line-deleting' search and replace, it skips every other matching line, so I have to run the search and replace multiple times. But that's a different issue.)

This is the regex I have to use to find every line except those that end with a c:

^.*?[^c]$\n

I guess the dollar sign makes the match stop before the CR+LF, so the [^c] is guaranteed to be looking at the character just before the CR.

Posted: Tue Aug 22, 2017 1:38 pm
by ben_josephs
Yes, in TextPad a negated character set matches any single character that is not in the listed set, including, individually, CR (carriage return) and LF (line feed).

Outside a character set, \n matches the newline sequence, whether it's LF, CR or CR,LF. But inside a character set, \n matches only LF.

As an alternative to using the end-of-line anchor $ you could exclude CR and LF from the character set:

.*[^c\r\n]\n

Note that these regexes do not match every line except those that end with a c. They match every non-empty line except those that end with a c, because a character set always matches exactly one character.
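The non-empty-line caveat is easy to see in a small experiment. This is a sketch in Python's re module, not in TextPad, so two things are assumptions of the sketch: the line ending is written as \r?\n (Python's \n matches only LF), and the sample text has an empty line added to the file from the original post.

```python
import re

# Hypothetical sample: the file from the post plus one empty line,
# all with CR+LF line endings.
text = "abc\r\nabcd\r\n\r\nxyz\r\n"

# CR and LF are excluded from the set, so the set has to match a
# real character of the line. The ending is spelled out as \r?\n.
pattern = re.compile(r"^.*[^c\r\n]\r?\n", re.MULTILINE)

matched = [m.group(0) for m in pattern.finditer(text)]
print(matched)  # ['abcd\r\n', 'xyz\r\n']

# Deleting the matches keeps the line ending in c, but the empty
# line survives too, because the set needs one character to match.
remaining = pattern.sub("", text)
print(repr(remaining))  # 'abc\r\n\r\n'
```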

To match all lines that don't end with c you might use a negative look-behind expression:

^.*(?<!c)\n
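For comparison, here is a sketch of the same look-behind idea in Python's re module. It is adapted rather than copied: in Python the dot also matches CR, so the sketch uses [^\r\n]* in place of .* and spells the line ending as \r?\n. Both changes are assumptions made for Python and are not needed in TextPad. The sample text with an empty line is hypothetical.

```python
import re

# Hypothetical sample: the file from the post plus one empty line,
# all with CR+LF line endings.
text = "abc\r\nabcd\r\n\r\nxyz\r\n"

# (?<!c) asserts that the character before the line ending is not c.
# It consumes nothing, so an empty line can match as well.
pattern = re.compile(r"^[^\r\n]*(?<!c)\r?\n", re.MULTILINE)

matched = [m.group(0) for m in pattern.finditer(text)]
print(matched)  # ['abcd\r\n', '\r\n', 'xyz\r\n']

# Deleting the matches leaves only the line that ends in c.
remaining = pattern.sub("", text)
print(repr(remaining))  # 'abc\r\n'
```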

The ? ("match minimally") in the subexpression .*? in your regexes has no effect: to succeed, .* must match up to the last character before the newline, regardless of whether it's matching maximally or minimally.
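The point about .*? can be checked by running the greedy and minimal forms side by side. This is a sketch in Python's re module rather than TextPad, using a variant of the pattern with CR and LF excluded from the set and the line ending written as \r?\n, both of which are assumptions made for Python.

```python
import re

# The example file from the post, with CR+LF line endings.
text = "abc\r\nabcd\r\nxyz\r\n"

greedy = re.compile(r"^.*[^c\r\n]\r?\n", re.MULTILINE)
minimal = re.compile(r"^.*?[^c\r\n]\r?\n", re.MULTILINE)

greedy_hits = [m.group(0) for m in greedy.finditer(text)]
minimal_hits = [m.group(0) for m in minimal.finditer(text)]

# The match is pinned between the start of the line and the line
# ending, so both forms must cover exactly the same text.
print(greedy_hits == minimal_hits)  # True
print(greedy_hits)  # ['abcd\r\n', 'xyz\r\n']
```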