Remove everything after 4th white space

srive99
Posts: 30
Joined: Fri Apr 16, 2010 1:45 pm

Post by srive99 »

Hello,
I have these strings:
111 111 111 some text here
234 1 43 some more text here
1 0 213 some other text here 11
I need to do a replace so that I end up with the following:
111 111 111
234 1 43
1 0 213
Could someone help me out with a regular expression? Essentially, I need to remove everything after the 3rd white space. Thanks.
ben_josephs
Posts: 2457
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Use "Posix" regular expression syntax:
Configure | Preferences | Editor

[X] Use POSIX regular expression syntax
Search | Replace... (<F8>):
Find what: ^([^ ]+ [^ ]+ [^ ]+) .*
Replace with: \1

[X] Regular expression

Replace All
Make sure when you copy and paste from here that a space hasn't been introduced at the end of the search or replacement expression.
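
For anyone who wants to check the expression outside TextPad, here is a minimal sketch using Python's re module. This is an assumption for illustration only: Python's engine is not TextPad's, and the helper name keep_first_three_fields is made up. For this particular pattern the two behave the same way on the sample lines.

```python
import re

# Sample lines from the original question.
lines = [
    "111 111 111 some text here",
    "234 1 43 some more text here",
    "1 0 213 some other text here 11",
]

# Same expression as above: three space-separated fields are captured,
# then the following space and the rest of the line are matched and dropped.
pattern = re.compile(r"^([^ ]+ [^ ]+ [^ ]+) .*")


def keep_first_three_fields(line):
    """Return the line cut off after its third field (hypothetical helper)."""
    return pattern.sub(r"\1", line)


result = [keep_first_three_fields(line) for line in lines]
for line in result:
    print(line)
```

A line that has only three fields contains no third space for the pattern to match, so it is left unchanged.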
jeffy
Posts: 323
Joined: Mon Mar 03, 2003 9:04 am
Location: Philadelphia

Post by jeffy »

To build on ben_josephs's idea, I would personally write it like this:


^(([^ ]+ ){3}).+$
That is, three repeats of "at least one non-space followed by one space".

I'd also make it a dot-plus if you're SURE there's text after the third space.

Finally, I'd add that $, to avoid matching partial lines.

(This is my first POSIX regex :)
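
One thing to watch with the {3} form: the first group now ends with the third space, so a replacement of \1 (assumed here, since the post does not give one) keeps a trailing space. The sketch below shows this in Python's re module, which is not TextPad's engine, together with a hypothetical variant that moves the repeat so the group ends on the third field.

```python
import re

line = "111 111 111 some text here"

# The pattern from the post above: group 1 is "111 111 111 " (with the space).
repeat_pattern = re.compile(r"^(([^ ]+ ){3}).+$")
with_trailing_space = repeat_pattern.sub(r"\1", line)

# Hypothetical variant: one field, then two repeats of space-plus-field,
# so the captured group stops before the third space.
variant_pattern = re.compile(r"^([^ ]+( [^ ]+){2}) .+$")
without_trailing_space = variant_pattern.sub(r"\1", line)

print(repr(with_trailing_space))
print(repr(without_trailing_space))
```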
ben_josephs
Posts: 2457
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

The repetition operators * and + are greedy: they always match as much as possible. Therefore, if .* or .+ is at the right-hand end of a regex it will always match to the end of the line (at least it will in TextPad, in which . doesn't match a newline). So the $ in .*$ or .+$ is redundant.
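
The redundancy can be checked with a short sketch in Python's re module (an assumption for illustration; it is not TextPad's engine, but . likewise does not match a newline by default). The two patterns are compared by the spans they match in a multi-line text.

```python
import re

text = "\n".join([
    "111 111 111 some text here",
    "234 1 43 some more text here",
    "1 0 213 some other text here 11",
])

# re.MULTILINE makes ^ and $ match at line boundaries, as in an editor.
without_dollar = re.compile(r"^([^ ]+ [^ ]+ [^ ]+) .*", re.MULTILINE)
with_dollar = re.compile(r"^([^ ]+ [^ ]+ [^ ]+) .*$", re.MULTILINE)

# Greedy .* already runs to the end of each line, so the trailing $
# does not change where any match starts or ends.
spans_without = [m.span() for m in without_dollar.finditer(text)]
spans_with = [m.span() for m in with_dollar.finditer(text)]

print(spans_without == spans_with)
```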
jeffy
Posts: 323
Joined: Mon Mar 03, 2003 9:04 am
Location: Philadelphia

Post by jeffy »

ben_josephs wrote: .*$ or .+$ is redundant.
As far as search and replace is concerned, that's true, it is redundant.

When just searching, however, it is not redundant. Using the non-$ version, go to the top of this document


111 111 111 some text here 
234 1 43 some more text here 
1 0 213 some other text here 11
and search down all the way. Three instances. However, it's different when you search up from the bottom.

Using the $ restricts the results to entire lines, regardless of the direction.
ben_josephs
Posts: 2457
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

When I wrote
the $ in .*$ or .+$ is redundant
I should have written
the $ in .*$ or .+$ should always be redundant

Matches (in left-to-right text) are always constructed from left to right. $ anchors the right-hand end of the match (or part of it) to the end of something: in TextPad, to the right-hand end of a line or the end of the file. .* at the end of a regex matches up to the first thing that . doesn't match (in TextPad, the right-hand end of a line) or the end of the file.

A single match is attempted from the current position and constructed towards the right. If that fails the position is advanced one place to the right, a new attempt to match is made, and so on.

A repeated match is a repeated attempt to make a single match, and the same left-to-right rule for those single matches applies, regardless of the direction of the repetition. After each successful single match the current position is moved one place to the right or left, depending on the repetition direction, and a new single match is attempted. (If the direction is leftwards the engine will not search to the right as far as the position of the previous match, so that it doesn't loop.) The important point is that it is the position of the left-hand end of the matches that moves, not the right-hand end.

If a $ matches anywhere other than the end of a line or the end of the file, the regex engine (or the code that calls it) is behaving incorrectly.
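
The leftward rule described above can be sketched in Python's re module. This is a simplified model under stated assumptions, not TextPad's actual implementation: the function find_all_leftward is hypothetical, it steps the start position one place to the left after every attempt, and it leaves out the guard that stops a match from running into the previous one.

```python
import re

text = "\n".join([
    "111 111 111 some text here",
    "234 1 43 some more text here",
    "1 0 213 some other text here 11",
])


def find_all_leftward(pattern, text):
    """Model a repeated leftward search (hypothetical helper).

    Each single match is still built left to right from its start
    position; only that start position moves leftward between attempts.
    """
    regex = re.compile(pattern, re.MULTILINE)
    found = []
    for pos in range(len(text), -1, -1):
        match = regex.match(text, pos)  # anchored attempt at pos
        if match:
            found.append(match.group(0))
    return found


base = r"^([^ ]+ [^ ]+ [^ ]+) .*"
whole_lines = find_all_leftward(base, text)
whole_lines_with_dollar = find_all_leftward(base + "$", text)

# With a bare .+ the left-hand end creeps leftward one character at a time.
creeping = find_all_leftward(r".+", text)[:2]

print(whole_lines == whole_lines_with_dollar)
print(creeping)
```

In this model the ^ confines match starts to the beginning of a line, so the $ makes no difference in either direction, while a bare .+ yields progressively longer matches that all end at the end of the line.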

What I'm seeing is different from what you're seeing. Here (TextPad 6.1 on Windows 7) the behaviour of leftward repetitive searches with your regex and similar regexes does not depend on whether those regexes end with a $. This is as it should be. However, the behaviour does in some cases depend on whether Wrap Searches is selected. When not selected, the behaviour of leftward searches of these regexes is sometimes incorrect. When selected, the behaviour seems to be always incorrect. Try .+