Combine lines with the same first 4 bytes

karipay
Posts: 12
Joined: Thu Feb 05, 2009 10:26 pm

Post by karipay »

Hi,

Is it possible to combine two lines that have the same first 4 bytes in TextPad?

EX.
0001ABCDEFG
0001HIJKLMNO
0002PQRSTUV

Output
0001ABCDEFGHIJKLMNO
0002PQRSTUV

TIA!
kengrubb
Posts: 324
Joined: Thu Dec 11, 2003 5:23 pm
Location: Olympia, WA, USA

Post by kengrubb »

Regex Replace
Find what: ^(0001.*)\n0001(.*)$
Replace with: \1\2
(2[Bb]|[^2].|.[^Bb])

That is the question.
Bob Hansen
Posts: 1516
Joined: Sun Mar 02, 2003 8:15 pm
Location: Salem, NH

Post by Bob Hansen »

The solution provided by kengrubb does work, but it is limited by having to hard-code the first four characters.

I wanted to overcome that, so I thought that this would work for any 4-character string:

Search for: ^(.{4})(.*)\n\1(.*)$
Replace with: \1\2\3

But surprisingly, I got an error message that this was an invalid RegEx.
The Help file does mention that TextPad supports using Tagged Expressions in the Search string.
For example \(tu\) \1 matches the string "tu tu".
This does work:
Search for: ^(.{4})(.*)\1
On the string 0001ABCDE0001FGHI, I get a match up to (but not including) "FGHI".
But my solution for the original problem does not work, apparently because it has a "\n" in the search string.

I know that TextPad's RegEx has some strange limitations and does not wrap lines, but searches using a hard-coded \n do work. Apparently, though, \1-\9 only work on the same line as the captured expression, before any \n, and cannot be used on a subsequent line. I'm not sure whether this is a bug or normal for TextPad.
Hope this was helpful.............good luck,
Bob
ben_josephs
Posts: 2464
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

The restriction is that TextPad's regular expression recogniser doesn't allow back-references (such as \1) to refer back over a newline. That is, you can't have a newline between a captured (parenthesised) subexpression and a reference back to it.
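For comparison, here is a minimal sketch of Bob Hansen's pattern in an engine that does allow a back-reference after a newline. It uses Python's re module, which is an assumption of this example and not something used in the thread; the function names merge_pairs and merge_all are made up for illustration.

```python
import re

# Bob Hansen's pattern, unchanged: capture a 4-character prefix and the rest
# of the line, then require the next line to start with the same prefix.
# re.MULTILINE makes ^ and $ match at line boundaries.
PATTERN = re.compile(r"^(.{4})(.*)\n\1(.*)$", re.MULTILINE)

def merge_pairs(text):
    """One replace pass: join a line with the next one if the prefixes match."""
    return PATTERN.sub(r"\1\2\3", text)

def merge_all(text):
    """Repeat the pass until nothing changes, so runs of 3+ lines collapse too."""
    previous = None
    while previous != text:
        previous, text = text, merge_pairs(text)
    return text

sample = "0001ABCDEFG\n0001HIJKLMNO\n0002PQRSTUV"
print(merge_all(sample))
```

A single pass only joins lines in pairs, which is why merge_all repeats the replacement; the same would apply to pressing Replace All more than once in an editor.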
Bob Hansen
Posts: 1516
Joined: Sun Mar 02, 2003 8:15 pm
Location: Salem, NH

Post by Bob Hansen »

Thanks ben.

You confirmed my suspicion. Another lesson learned from a great teacher.
MudGuard
Posts: 1295
Joined: Sun Mar 02, 2003 10:15 pm
Location: Munich, Germany

Post by MudGuard »

From that behaviour, and also from the fact that \n can't be quantified (as in \n+ or similar), I guessed a long time ago that TextPad splits the regular expression at each \n and applies the first part to one line. If that matches, it applies the second part to the next line; if that also matches, it applies the third part to the line after that, and so on ...

As each line seems to be treated by its own regex, that would explain why back references do not work - the regex for the second line doesn't know what to refer to if the () is in the regex for the first line ...
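MudGuard's guess can be modelled in a few lines. The sketch below is only an illustration of that hypothesis written with Python's re module; it is not TextPad's code, and the helper names compile_per_line and matches_at are invented here. It splits the pattern at each \n, compiles every piece separately, and matches the pieces against consecutive lines.

```python
import re

def compile_per_line(pattern):
    """Split a pattern at each backslash-n escape and compile every piece alone.

    Hypothetical model of MudGuard's guess, not TextPad's implementation.
    """
    return [re.compile(piece) for piece in pattern.split(r"\n")]

def matches_at(pieces, lines, start):
    """True if piece i matches line start+i for every piece."""
    return all(
        start + i < len(lines) and piece.search(lines[start + i]) is not None
        for i, piece in enumerate(pieces)
    )

lines = ["0001ABCDEFG", "0001HIJKLMNO", "0002PQRSTUV"]

# kengrubb's hard-coded pattern splits into two self-contained pieces.
hard_coded = compile_per_line(r"^(0001.*)\n0001(.*)$")
print(matches_at(hard_coded, lines, 0))

# The second piece of Bob Hansen's pattern starts with \1, but group 1 lives
# in the first piece, so compiling the second piece on its own fails.
try:
    compile_per_line(r"^(.{4})(.*)\n\1(.*)$")
except re.error as exc:
    print("rejected:", exc)
```

Under this model the back-reference pattern is rejected outright, which is at least consistent with the "invalid RegEx" message reported earlier in the thread.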
kengrubb
Posts: 324
Joined: Thu Dec 11, 2003 5:23 pm
Location: Olympia, WA, USA

Post by kengrubb »

Changing the focus slightly, I tried this in WildEdit (WE) using the Perl RE syntax.

Search for:
^(.{4})(.*)\n\1
Replace with:
\1\2

However, I'm not nearly as capable with Perl RE syntax as I am with Posix extended syntax.

It's not throwing an error, but it's not finding the RE. If there's someone here who's more Perl RE fluent ...
ben_josephs
Posts: 2464
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

WildEdit is unfriendly in its handling of line endings: you have to be explicit and precise about them. Your search expression is good for Linux line endings (LF). For Windows line endings (CR,LF) you need to search for
^(.{4})(.*)\r\n\1
To handle either Linux or Windows line endings you might use
^(.{4})(.*)\r?\n\1

For Perl replacement syntax you should really use
$1$2
but WildEdit seems to accept the backslashes.
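The line-ending point can be checked outside WildEdit as well. This is a rough sketch in Python's re module (an assumption of this example, not a tool from the thread), with a made-up function name join_same_prefix. One deliberate difference from the pattern above: in Python, . also matches \r, so (.*) would swallow the carriage return; the sketch uses [^\r\n] in place of . to avoid that. Python's replacement syntax uses the backslash form (\1\2), not $1$2.

```python
import re

# Same idea as ^(.{4})(.*)\r?\n\1, but with [^\r\n] instead of "." so that
# the second group cannot absorb a trailing \r on Windows-style lines.
PATTERN = re.compile(r"^([^\r\n]{4})([^\r\n]*)\r?\n\1", re.MULTILINE)

def join_same_prefix(text):
    """Drop the line break and the repeated prefix between two matching lines."""
    return PATTERN.sub(r"\1\2", text)

unix = "0001ABCDEFG\n0001HIJKLMNO\n0002PQRSTUV\n"
windows = unix.replace("\n", "\r\n")

print(repr(join_same_prefix(unix)))
print(repr(join_same_prefix(windows)))
```

With \r? in the pattern, both inputs are joined and the remaining line endings are left as they were.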
kengrubb
Posts: 324
Joined: Thu Dec 11, 2003 5:23 pm
Location: Olympia, WA, USA

Post by kengrubb »

ben_josephs,

Worked like a charm. Thanks!

karipay,

You might consider buying WildEdit. It seems to be a better tool for the job at hand.