RegEx matching [^\n]+ includes \n

Posted: Wed Dec 03, 2025 9:41 pm
by JohnFoelster
I'm having mild frustration trying to bludgeon my tab-delimited data files into shape.

As the title implies, the problem shows up when I do something like this in a two-column file:

Find what:

Code:

([^\t\n]+)\t([^\t\n]+)\n\1\t([^\t\n]+)\n
Replace:

Code:

\1\t\2,\3\n
It matches the first class, then the tab, but then the second class grabs the newline.

The intention is to take something like:

Code:

TEXT 1 HERE\t1\n
TEXT 1 HERE\t2\n
TEXT 2 HERE\t3\n
TEXT 2 HERE\t4\n
And turn it into:

Code:

TEXT 1 HERE\t1,2\n
TEXT 2 HERE\t3,4\n
But what I get is:

Code:

TEXT 1 HERE\t1\n
,2\n
TEXT 2 HERE\t3\n
,4\n
I was under the impression that [^\n] did not match newline characters. [^$] doesn't seem to help either; it matches to the end of the document rather than the end of the line.

Re: RegEx matching [^\n]+ includes \n

Posted: Thu Dec 04, 2025 4:47 am
by gurok
I think line endings might be at play here: the \r of each \r\n gets captured by the negated class, and it inserts a line break when the group is referenced.

I was able to get it to work in a new blank document with:

Find what:

Code:

([^\r\t\n]+)\t([^\r\t\n]+)\r\n\1\t([^\t\r\n]+)
Replace:

Code:

\1\t\2,\3
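
For anyone who wants to see the effect outside the editor, here is a minimal sketch in Python. It assumes the file uses \r\n line endings and that Python's re module treats these character classes the same way as the editor's engine for this case. The sample string and the variable names are made up for illustration.

```python
import re

# Hypothetical sample: two-column, tab-delimited, Windows (\r\n) line endings.
data = (
    "TEXT 1 HERE\t1\r\n"
    "TEXT 1 HERE\t2\r\n"
    "TEXT 2 HERE\t3\r\n"
    "TEXT 2 HERE\t4\r\n"
)

# Original pattern: [^\t\n] excludes tab and \n, but NOT \r,
# so groups 2 and 3 each swallow the trailing \r.
naive = re.compile(r"([^\t\n]+)\t([^\t\n]+)\n\1\t([^\t\n]+)\n")
naive_out = naive.sub(r"\1\t\2,\3\n", data)

# Adjusted pattern: \r is excluded from every class and matched explicitly.
fixed = re.compile(r"([^\r\t\n]+)\t([^\r\t\n]+)\r\n\1\t([^\t\r\n]+)")
fixed_out = fixed.sub(r"\1\t\2,\3", data)

print(repr(naive_out))  # a stray \r sits between "1" and ",2"
print(repr(fixed_out))  # values are joined on one line
```

In the first result the captured \r ends up in the middle of the replaced line, which an editor may display as a line break; that would explain the ",2" appearing on its own line.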

Re: RegEx matching [^\n]+ includes \n

Posted: Thu Dec 04, 2025 7:17 am
by AmigoJack
JohnFoelster wrote: Wed Dec 03, 2025 9:41 pm
[^$] doesn't seem to help
$ has no special meaning inside square brackets - it only matches a literal dollar sign. Likewise, a dot or an asterisk is not a metacharacter inside a character class.

Re: RegEx matching [^\n]+ includes \n

Posted: Thu Dec 04, 2025 1:52 pm
by bbadmin
I always find it easier to think about what to match, rather than not match, unless otherwise necessary. On that basis, this works for me:

Find what:

Code:

^(\w+) (\d) (\w+)\t(\d)\n\w+ \2 \w+\t(\d)$
Replace with:

Code:

$1 $2 $3\t$4,$5
Note that "\2" in the search string is a back reference to what matched in the second sub-expression, which is the digit after "TEXT ".

If numbers can have more than one digit, you'll need to add a repeat operator to each "\d" (for example, "\d+").
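
A rough Python sketch of the same idea, with \d+ in place of \d so multi-digit numbers work. Two things differ from the editor and are assumptions of this sketch: Python's re.sub writes replacement backreferences as \1 rather than $1, and it needs re.MULTILINE for ^ and $ to apply per line. The sample data (with plain \n line endings) is invented.

```python
import re

# Hypothetical sample with multi-digit values and \n line endings.
data = (
    "TEXT 1 HERE\t10\n"
    "TEXT 1 HERE\t20\n"
    "TEXT 2 HERE\t3\n"
    "TEXT 2 HERE\t4\n"
)

# Positive matching: describe what each field looks like.
# \2 in the pattern refers back to the number after the first word.
pattern = re.compile(
    r"^(\w+) (\d+) (\w+)\t(\d+)\n\w+ \2 \w+\t(\d+)$",
    re.MULTILINE,
)

# Python spells the replacement groups \1..\5 instead of $1..$5.
merged = pattern.sub(r"\1 \2 \3\t\4,\5", data)

print(repr(merged))
```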

Re: RegEx matching [^\n]+ includes \n

Posted: Sat Dec 06, 2025 5:02 am
by JohnFoelster
gurok wrote: Thu Dec 04, 2025 4:47 am
I think line endings might be at play here: the \r of each \r\n gets captured by the negated class, and it inserts a line break when the group is referenced.

I was able to get it to work in a new blank document with:

Find what:

Code:

([^\r\t\n]+)\t([^\r\t\n]+)\r\n\1\t([^\t\r\n]+)
Replace:

Code:

\1\t\2,\3
Hmmm... yup, that did it. Thank you.

Re: RegEx matching [^\n]+ includes \n

Posted: Sat Dec 06, 2025 5:04 am
by JohnFoelster
AmigoJack wrote: Thu Dec 04, 2025 7:17 am
JohnFoelster wrote: Wed Dec 03, 2025 9:41 pm
[^$] doesn't seem to help
$ has no special meaning inside square brackets - it only matches a literal dollar sign. Likewise, a dot or an asterisk is not a metacharacter inside a character class.
Huh... I'd swear that used to mean "until the end of the line" when I was first learning RegEx in TextPad 23 years ago.

Must be senility.

Re: RegEx matching [^\n]+ includes \n

Posted: Sat Dec 06, 2025 5:05 am
by JohnFoelster
bbadmin wrote: Thu Dec 04, 2025 1:52 pm
I always find it easier to think about what to match, rather than not match
I like to live dangerously... :D

Re: RegEx matching [^\n]+ includes \n

Posted: Sat Dec 06, 2025 8:41 am
by bbadmin
@JohnFoelster:

Outside of square brackets, $ does mean end of line, but inside, it's a literal match.
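
A small Python sketch of the difference, assuming Python's re module is close enough to the editor's engine on this point. The sample strings and variable names are made up.

```python
import re

# Inside brackets, $ is a literal dollar sign:
# [^$]+ means "one or more characters that are not '$'".
inside = re.match(r"[^$]+", "cost: $5\nmore").group()

# With no dollar sign in the text, nothing stops the class,
# so it runs straight across the newline to the end of the input.
runaway = re.match(r"[^$]+", "line one\nline two").group()

# Outside brackets, $ is an end-of-line anchor
# (per line in Python only when re.MULTILINE is set).
anchored = re.findall(r"^.+$", "line one\nline two", flags=re.MULTILINE)

print(repr(inside))    # 'cost: '
print(repr(runaway))   # 'line one\nline two'
print(anchored)        # ['line one', 'line two']
```

The second result lines up with the "matches to the end of the document" behavior reported in the first post.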