find words with the same beginning

General questions about using TextPad

pfistyle
Posts: 4
Joined: Tue May 17, 2011 9:16 am

Post by pfistyle »

hi there,

i don't know if i'm just too stupid or if i just didn't search as i should, but i didn't find any answer to my problem.

i have a text. in this text there are normally a lot of names, always beginning with the same 3 characters, e.g.

ab- (then there are some more characters, e.g. the whole name is ab-cdef-gh-1)

and i found out how to mark the lines where these words are, and i can cut the marked lines, but what i need is just all the names that begin with this ab- ...

can anyone help or give me a search - hint?

thx
ben_josephs
Posts: 2459
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

You haven't provided enough information.

If the names can be anywhere in the line, and
if there is only one matching name on any line, and
if the characters in the names are restricted to letters, digits and hyphens, and
if names embedding the prefix (such as cd-abef-gh-1) are not to be matched,
then
try this:

Use "Posix" regular expression syntax:
Configure | Preferences | Editor

[X] Use POSIX regular expression syntax

Search | Replace... (<F8>):
Find what: (^|.*[^a-z0-9-])(ab-[a-z0-9-]+).*
Replace with: \2

[X] Regular expression

Replace All
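
For anyone who wants to try the same expression outside TextPad, here is a minimal sketch in Python. It is not part of the TextPad procedure: the function name extract_name and the sample lines are made up for illustration, and Python's re module is not a POSIX engine, so edge cases may behave differently. It applies the pattern to one line at a time and returns the second group, which is what \2 keeps in the replacement.

```python
import re

# Same pattern as in "Find what" above; Python's re syntax accepts it unchanged.
PATTERN = re.compile(r"(^|.*[^a-z0-9-])(ab-[a-z0-9-]+).*")


def extract_name(line):
    """Return the name starting with 'ab-' on this line, or None if there is none."""
    m = PATTERN.fullmatch(line)
    return m.group(2) if m else None


if __name__ == "__main__":
    # Hypothetical sample lines: a normal hit, an embedded prefix, and no name at all.
    for line in ["some text ab-cdef-gh-1 more text", "cd-abef-gh-1", "no names here"]:
        print(repr(line), "->", extract_name(line))
```

Lines without a name give None here, whereas Replace All in TextPad simply leaves such lines unchanged.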
Last edited by ben_josephs on Tue May 17, 2011 11:24 am, edited 1 time in total.
SteveH
Posts: 327
Joined: Thu Apr 03, 2003 11:37 am
Location: Edinburgh, Scotland

Post by SteveH »

If you perform the following search and replace, it will delete all (nearly all, see below) lines containing the string 'ab-':
Find what: ^.*ab-.*\n

Replace with: [nothing]

[X] Regular expression

Replace All
In the preferences make sure you have enabled POSIX regular expressions.

This searches from the start of a line (^) for any characters (.) repeated zero or more times (*), followed by the 'ab-' string, then any further characters up to the line feed (\n) that ends the line.

Where this won't work is if the last line of the file contains one of the names, because there is no line feed there.
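
The last-line caveat can be reproduced outside TextPad. This is a rough sketch in Python, assuming that re.MULTILINE is a fair stand-in for TextPad's line-by-line matching; the sample text and the function name are invented for the example.

```python
import re

# Same expression as above; MULTILINE makes ^ match at the start of every line.
DELETE_LINE = re.compile(r"^.*ab-.*\n", re.MULTILINE)


def delete_matching_lines(text):
    """Remove every newline-terminated line that contains 'ab-'."""
    return DELETE_LINE.sub("", text)


if __name__ == "__main__":
    sample = "keep me\nbla ab-cdef-gh-01 bla\nkeep me too\nab-uvw-gh-01"
    # The final line has no trailing \n, so the expression cannot match it
    # and the line is left in place.
    print(repr(delete_matching_lines(sample)))
```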

It might be worth checking out TextPad's help file on regular expression searching as it's a good introduction.

Hope this helps.
Running TextPad 5.4 on Windows XP SP3 and on OS X 10.7 under VMWare or Crossover.
ben_josephs
Posts: 2459
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

The OP wants to keep the matching words, not delete the lines containing them.

And ^.*ab-.*\n matches lines containing words that embed ab- as well as lines containing words that begin with it.
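
To illustrate the second point, here is a small sketch in Python (not TextPad; the names LOOSE and STRICT and the sample lines are hypothetical). It compares the bare 'ab-' test with one that requires the prefix to start a word.

```python
import re

# Matches any line that contains 'ab-' anywhere, even inside a longer word.
LOOSE = re.compile(r"^.*ab-.*$")
# Requires 'ab-' at the start of the line or right after a character
# that cannot be part of a name (anything but a-z, 0-9 and hyphen).
STRICT = re.compile(r"(^|[^a-z0-9-])ab-[a-z0-9-]+")


def classify(line):
    """Return (loose_hit, strict_hit) for one line."""
    return (LOOSE.search(line) is not None, STRICT.search(line) is not None)


if __name__ == "__main__":
    for line in ["bla ab-cdef-gh-1 bla", "bla cd-efab-gh-1 bla"]:
        print(line, classify(line))
```

The second sample line embeds ab- in the middle of a word, so only the loose test reports it.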
Last edited by ben_josephs on Tue May 17, 2011 11:29 am, edited 1 time in total.
SteveH
Posts: 327
Joined: Thu Apr 03, 2003 11:37 am
Location: Edinburgh, Scotland

Post by SteveH »

I've just re-read your reply and realised that you probably don't want to cut all the lines that contain the 'ab-' strings as I first thought.

ben_josephs's answer is appropriate if you want to list them. I'll leave my incorrect answer up in case it's of interest.

As an aside, is there a way to delete the last line if it contains the name?
ben_josephs
Posts: 2459
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

A line not terminated with a line terminator is arguably not a line.

I don't believe that using a single TextPad regex you can match both terminated lines (with their terminators) and an unterminated last line, because TextPad's weak regex recogniser doesn't allow a \n to be in an alternation, to be quantified, or to be contained in a parenthesised expression. (It appears to allow \n?, but that doesn't work.)
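
For comparison, an engine that allows \n to be quantified does not have this restriction. The following is a hedged sketch in Python; it says nothing about what TextPad accepts, and the sample text and function name are invented. It makes the terminator optional so that an unterminated last line is removed as well.

```python
import re

# \n? makes the line terminator optional, so the last line matches even without one.
DELETE_LINE = re.compile(r"^.*ab-.*\n?", re.MULTILINE)


def delete_matching_lines(text):
    """Remove every line containing 'ab-', terminated or not."""
    return DELETE_LINE.sub("", text)


if __name__ == "__main__":
    sample = "keep me\nbla ab-cdef-gh-01 bla\nab-uvw-gh-01"
    print(repr(delete_matching_lines(sample)))
```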
pfistyle
Posts: 4
Joined: Tue May 17, 2011 9:16 am

Post by pfistyle »

hi there,

thx for your answers...

indeed i wanted to be able to copy these words... with ^.*ab-.* i can find the lines that have these expressions, but is there a way to just mark these words, and not the whole line?

what i am trying to do is to copy a whole text and just search for the names and copy them in a new textfile...

thx
ben_josephs
Posts: 2459
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

What are the answers to the questions I asked earlier?
SteveH
Posts: 327
Joined: Thu Apr 03, 2003 11:37 am
Location: Edinburgh, Scotland

Post by SteveH »

pfistyle wrote: what i am trying to do is to copy a whole text and just search for the names and copy them in a new textfile...

Can you provide a sample of the file so we can see what is required? Posting it as Code keeps the formatting, which will help.

Before anyone can provide a solution, you need to answer the questions from ben_josephs about the positions of the names and whether the 'ab-' string can occur elsewhere (such as cd-efab-gh-1).
pfistyle
Posts: 4
Joined: Tue May 17, 2011 9:16 am

Post by pfistyle »

hi there,

sorry for being absent for a while.

the text looks, for example, like this:


bla bla bla ab-cdef-gh-01 bla bla bla bla bla
ab-ijkl-gh-07 bla bla bla ab-ijkl-gh-08 bla
bla bla bla bla ab-mnop-gh-003 bla bla bla
bla bla ab-qrst-gh-05 bla bla:

ab-uvw-gh-01

bla bla bla...


hope this helps to make the structure clear... the expressions i am looking for always begin with the same two letters, then a "-", then 3 to 4 random letters, another "-", then the same two letters (gh in the sample), another "-", and at the end there is always a 2 or 3 digit number.

thanks!
ben_josephs
Posts: 2459
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Please explain precisely in what way my earlier suggestion isn't suitable.

Here's another, more restrictive, suggestion that should work with the style of text in your example:
Find what: (^|.* )(ab-[a-z]+-gh-[0-9]+).*
Replace with: \2

[X] Regular expression

Replace All

Or even:
Find what: .*\<(ab-[a-z]+-gh-[0-9]+).*
Replace with: \1

[X] Regular expression

Replace All

These assume you are using "Posix" regular expression syntax:
Configure | Preferences | Editor

[X] Use POSIX regular expression syntax
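
Note that each of these replacements keeps one name per line, while the sample text has a line with two names. If the goal is simply a list of all names for a new file, a script can collect every match instead. This is a minimal sketch in Python, not TextPad. The pattern follows the structure described in the post above (ab-, 3 to 4 letters, -gh-, 2 or 3 digits), and the function name find_names is made up for illustration.

```python
import re

# ab-, then 3 to 4 letters, then -gh-, then a 2 or 3 digit number.
# The lookarounds keep a match from starting or ending inside a longer token.
NAME = re.compile(r"(?<![A-Za-z0-9-])ab-[a-z]{3,4}-gh-[0-9]{2,3}(?![A-Za-z0-9-])")


def find_names(text):
    """Return every name in the text, in order of appearance."""
    return NAME.findall(text)


# The sample text from the post above.
SAMPLE = """\
bla bla bla ab-cdef-gh-01 bla bla bla bla bla
ab-ijkl-gh-07 bla bla bla ab-ijkl-gh-08 bla
bla bla bla bla ab-mnop-gh-003 bla bla bla
bla bla ab-qrst-gh-05 bla bla:

ab-uvw-gh-01

bla bla bla...
"""

if __name__ == "__main__":
    # One name per line, ready to be written to a new file.
    print("\n".join(find_names(SAMPLE)))
```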
pfistyle
Posts: 4
Joined: Tue May 17, 2011 9:16 am

Post by pfistyle »

hi ben_josephs,

i tried both replace - strings, they didn't work (expression not found)
ben_josephs
Posts: 2459
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

They do work, all three of them. Try again, following the instructions exactly.

Make sure a spurious space hasn't been introduced at the end of the regexes.
SteveH
Posts: 327
Joined: Thu Apr 03, 2003 11:37 am
Location: Edinburgh, Scotland

Post by SteveH »

The other thing to try is to make a TextPad file containing your example text and run the search and replace on that.

bla bla bla ab-cdef-gh-01 bla bla bla bla bla 
ab-ijkl-gh-07 bla bla bla ab-ijkl-gh-08 bla 
bla bla bla bla ab-mnop-gh-003 bla bla bla 
bla bla ab-qrst-gh-05 bla bla: 

ab-uvw-gh-01 

bla bla bla

If it works on that text (it should) and not on the 'real' file, then your example may not represent the real file.