Remove everything after 4th white space

srive99 · Post by **srive99** » Thu Nov 29, 2012 7:32 pm

Hello,
I have the strings

111 111 111 some text here
234 1 43 some more text here
1 0 213 some other text here 11

I need to run a replace that leaves the following:

111 111 111
234 1 43
1 0 213

Could someone help me out with a regular expression? Essentially, I need to remove everything after the 3rd white space. Thanks.

ben_josephs · Post by **ben_josephs** » Thu Nov 29, 2012 10:43 pm

Use "Posix" regular expression syntax:

Configure | Preferences | Editor

[X] Use POSIX regular expression syntax

Search | Replace... (<F8>):

Find what: ^([^ ]+ [^ ]+ [^ ]+) .*
Replace with: \1

[X] Regular expression

Replace All

Make sure when you copy and paste from here that a space hasn't been introduced at the end of the search or replacement expression.
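For anyone who wants to check the expression outside TextPad, here is a minimal sketch in Python. It assumes Python's `re` module rather than TextPad's POSIX engine; the pattern and the `\1` replacement are taken from the post above, while the function name `keep_first_three_fields` is made up for illustration. One difference to be aware of: in Python, `[^ ]` also matches a newline, so lines with fewer than three spaces may behave differently than in TextPad.

```python
import re

# Pattern from the post above. re.MULTILINE makes ^ match at the start of
# every line, which is roughly how a line-oriented editor treats it.
PATTERN = re.compile(r"^([^ ]+ [^ ]+ [^ ]+) .*", re.MULTILINE)


def keep_first_three_fields(text: str) -> str:
    """Keep the first three space-separated fields of each matching line."""
    return PATTERN.sub(r"\1", text)


sample = (
    "111 111 111 some text here\n"
    "234 1 43 some more text here\n"
    "1 0 213 some other text here 11"
)
result = keep_first_three_fields(sample)
print(result)
```

The sample lines are the three strings from the original question.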

jeffy · Post by **jeffy** » Thu Dec 13, 2012 4:16 am

To build on ben_josephs' idea, I would personally write it like this:

^(([^ ]+ ){3}).+$

That is, "3" repeats of at-least-one-non-space followed by one-space.

I'd also make it a dot-plus if you're SURE there's text after the third space.

Finally, I'd add that $, to avoid matching partial lines.

(This is my first POSIX regex.)
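One detail worth checking with the `{3}` form: the third space sits inside the first group, so replacing with `\1` keeps a trailing space. The sketch below uses Python's `re` module as a stand-in for TextPad's engine (an assumption), and the second pattern, `VARIANT`, is a hypothetical alternative that moves the last space outside the group; it is not from the thread.

```python
import re

# jeffy's pattern: group 1 is three repeats of "non-spaces + one space".
JEFFY = re.compile(r"^(([^ ]+ ){3}).+$", re.MULTILINE)

# Hypothetical variant: two repeats plus a third field, with the third
# space left outside group 1 so that \1 has no trailing space.
VARIANT = re.compile(r"^(([^ ]+ ){2}[^ ]+) .+$", re.MULTILINE)

line = "234 1 43 some more text here"

jeffy_result = JEFFY.sub(r"\1", line)      # keeps the space after "43"
variant_result = VARIANT.sub(r"\1", line)  # no trailing space

print(repr(jeffy_result))
print(repr(variant_result))
```

Whether the trailing space matters depends on what the cleaned lines are used for.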

ben_josephs · Post by **ben_josephs** » Thu Dec 13, 2012 8:13 am

The repetition operators * and + are greedy: they always match as much as possible. Therefore, if .* or .+ is at the right-hand end of a regex it will always match to the end of the line (at least it will in TextPad, in which . doesn't match a newline). So the $ in .*$ or .+$ is redundant.
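A quick way to see the redundancy is to compare the matched spans with and without the trailing `$`. This is a sketch in Python's `re` module, assumed here as a stand-in because, like TextPad, its `.` does not match a newline by default; it says nothing about TextPad's own search code.

```python
import re

text = (
    "111 111 111 some text here\n"
    "234 1 43 some more text here\n"
    "1 0 213 some other text here 11"
)

# Same regex, with and without the trailing $.
without_dollar = re.compile(r"^(([^ ]+ ){3}).+", re.MULTILINE)
with_dollar = re.compile(r"^(([^ ]+ ){3}).+$", re.MULTILINE)

# Greedy .+ already runs to the end of each line, so the spans are identical.
spans_without = [m.span() for m in without_dollar.finditer(text)]
spans_with = [m.span() for m in with_dollar.finditer(text)]

print(spans_without)
print(spans_with)
```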

jeffy · Post by **jeffy** » Thu Dec 13, 2012 2:20 pm

ben_josephs wrote: the $ in .*$ or .+$ is redundant.

As far as search and replace is concerned, that's true, it is redundant.

When just searching, however, it is not redundant. Using the non-$ version, go to the top of the document

111 111 111 some text here 
234 1 43 some more text here 
1 0 213 some other text here 11

and search down all the way. Three instances. However, it's different when you search up from the bottom.

Using the $ restricts the results to entire lines, regardless of the direction.

ben_josephs · Post by **ben_josephs** » Fri Dec 14, 2012 8:27 pm

When I wrote
the $ in .*$ or .+$ is redundant
I should have written
the $ in .*$ or .+$ should always be redundant

Matches (in left-to-right text) are always constructed from left to right. $ anchors the right-hand end of the match (or part of it) to the end of something; in TextPad, to the right-hand end of a line or the end of the file. .* at the end of a regex matches up to the first thing that . doesn't match (in TextPad, the right-hand end of a line) or the end of the file.

A single match is attempted from the current position and constructed towards the right. If that fails the position is advanced one place to the right, a new attempt to match is made, and so on. A repeated match is a repeated attempt to make a single match, and the same left-to-right rule for those single matches applies, regardless of the direction of the repetition. After each successful single match the current position is moved one place to the right or left, depending on the repetition direction, and a new single match is attempted. (If the direction is leftwards the engine will not search to the right as far as the position of the previous match, so that it doesn't loop.)

The important point is that it is the position of the left-hand end of the matches that moves, not the right-hand end. If a $ matches anywhere other than the end of a line or the end of the file, the regex engine (or the code that calls it) is behaving incorrectly.
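The scanning rule described above can be modelled in a few lines. This is only a toy model in Python, not TextPad's implementation: the helper names `scan_forward`, `scan_backward` and `all_backward` are invented for illustration, and the rule about not searching as far right as the previous match is left out because the `^` anchor already prevents overlap in this example. Each single match is attempted at one starting position and built rightwards; only the starting position moves.

```python
import re

TEXT = (
    "111 111 111 some text here\n"
    "234 1 43 some more text here\n"
    "1 0 213 some other text here 11"
)


def scan_forward(pattern, text, start=0):
    """Attempt a match at start; on failure, advance one place to the right."""
    for pos in range(start, len(text) + 1):
        m = pattern.match(text, pos)
        if m is not None:
            return m
    return None


def scan_backward(pattern, text, start):
    """Attempt a match at start; on failure, move one place to the left."""
    for pos in range(start, -1, -1):
        m = pattern.match(text, pos)  # still constructed left to right
        if m is not None:
            return m
    return None


def all_backward(pattern, text):
    """Repeat scan_backward from the end of the text, collecting spans."""
    spans = []
    pos = len(text)
    while pos >= 0:
        m = scan_backward(pattern, text, pos)
        if m is None:
            break
        spans.append(m.span())
        pos = m.start() - 1  # move the left-hand end, not the right-hand end
    return spans


WITHOUT_DOLLAR = re.compile(r"^(([^ ]+ ){3}).+", re.MULTILINE)
WITH_DOLLAR = re.compile(r"^(([^ ]+ ){3}).+$", re.MULTILINE)

print(all_backward(WITHOUT_DOLLAR, TEXT))
print(all_backward(WITH_DOLLAR, TEXT))
```

Under this model, the leftward search finds the same three whole-line matches with or without the `$`, which is the behaviour the explanation above says a correct engine should show.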

What I'm seeing is different from what you're seeing. Here (TextPad 6.1 on Windows 7) the behaviour of leftward repetitive searches with your regex and similar regexes does not depend on whether those regexes end with a $. This is as it should be. However, the behaviour does in some cases depend on whether Wrap Searches is selected. When not selected, the behaviour of leftward searches of these regexes is sometimes incorrect. When selected, the behaviour seems to be always incorrect. Try .+