Beginner Basics - editing lists of data

General questions about using TextPad

CJEMS
Posts: 7
Joined: Sat Apr 14, 2007 6:33 pm

Post by CJEMS »

How do I use TextPad to edit a list of data in which each line consists of numbers and words, with, say, 150 lines of data? The goal is to eliminate the spaces between the words on each line and delete any numbers appearing at the beginning of each line. There is no punctuation on any line.

More specifically, will the resident macros in preferences be sufficient for this task?

Thx for any help, a TextPad Newbie
Bob Hansen
Posts: 1516
Joined: Sun Mar 02, 2003 8:15 pm
Location: Salem, NH

Post by Bob Hansen »

Please provide examples.

Rather than describing with words, it is best to provide a sample of multiple lines showing before and after results wanted.
Hope this was helpful.............good luck,
Bob
CJEMS
Posts: 7
Joined: Sat Apr 14, 2007 6:33 pm

Post by CJEMS »

A few examples come to mind. A simple one might be using a search result from a keyword search that generates other phrases or expressions containing the original keyword.

3479 marine fish
3467 fish oil
3420 piranha fish
3415 angler fish
3382 fish finder
3183 fish restaurant
2989 game and fish
2891 fish net
2876 blue fish
2813 parrot fish
2758 gold fish
2716 fish face

The goal now is to eliminate the numbers, and the spaces.

Result:
marine fish
fish oil
piranha fish
angler fish
fish finder
fish restaurant
game and fish
fish net
blue fish
parrot fish
gold fish
fish face

or

marinefish
fishoil
piranhafish
anglerfish
fishfinder
fishrestaurant
gameandfish
fishnet
bluefish
parrotfish
goldfish
fishface

Both results will meet my needs at least for the very near future.


Thank you.
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

To remove the prefix numbers:
Find what: ^[0-9]+[_\t]* [Replace the underscore with a space]
Replace with: [nothing]

[X] Regular expression

To remove all spaces:
Find what: _ [Replace the underscore with a space]
Replace with: [nothing]

[X] Regular expression
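
For anyone who wants to check the two patterns outside TextPad, the same replacements can be sketched in Python. This is only an illustration and not something TextPad itself runs: the function names `strip_prefix_numbers` and `strip_spaces` are made up for this example, and Python's `re.MULTILINE` flag is assumed to stand in for the way TextPad applies `^` to every line.

```python
import re

# Pattern from the post: digits at the start of a line, then any spaces or tabs.
# re.MULTILINE makes ^ match at the start of every line, not just the first.
PREFIX_NUMBER = re.compile(r"^[0-9]+[ \t]*", re.MULTILINE)


def strip_prefix_numbers(text: str) -> str:
    """Remove the leading number (and the whitespace after it) from each line."""
    return PREFIX_NUMBER.sub("", text)


def strip_spaces(text: str) -> str:
    """Remove every space character; line breaks are left alone."""
    return text.replace(" ", "")


# Three lines taken from the sample list earlier in the thread.
sample = "3479 marine fish\n3467 fish oil\n2989 game and fish"

without_numbers = strip_prefix_numbers(sample)
print(without_numbers)                # first desired result (words keep their spaces)
print(strip_spaces(without_numbers))  # second desired result (spaces removed too)
```

Running the number pattern first and the space removal second mirrors the order of the two Replace All steps above.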
CJEMS
Posts: 7
Joined: Sat Apr 14, 2007 6:33 pm

Post by CJEMS »

Thanks. The only problem is that I really have no proficiency with this app beyond using it as a temporary holding place while moving data into and out of incompatible applications.
Where do I place the formula to generate the result?

Thank You
Bob Hansen
Posts: 1516
Joined: Sun Mar 02, 2003 8:15 pm
Location: Salem, NH

Post by Bob Hansen »

From the Main Menu:
Configure/Preferences/Editor
Check the option to use POSIX syntax
Apply, Close Preferences

From the Main Menu
Search/Replace
Enter first set of values provided by ben_josephs
Check the Regular expression box
Replace All
Enter second set of values provided
Replace All

If the result is not correct, you can undo it with Ctrl-Z.
Hope this was helpful.............good luck,
Bob
CJEMS
Posts: 7
Joined: Sat Apr 14, 2007 6:33 pm

Post by CJEMS »

Thanks Bob, but no go.
This procedure eliminates only the first set of numbers on the top line; no other lines are affected.
I tried highlighting the text and running the instructions again, still no go.

Any other suggestions?

Thank You
Bob Hansen
Posts: 1516
Joined: Sun Mar 02, 2003 8:15 pm
Location: Salem, NH

Post by Bob Hansen »

The Search/Replace strings were provided by ben_josephs. They work perfectly for me.

Using POSIX syntax.

Make sure no lines are highlighted/selected.
Press Ctrl-Home to move the cursor to the beginning of the first line.
Open the Replace window.

Other values in the Replace Window
Text/No Case Match / Regular Expressions / Active document

This is the first string that is searched for:
^[0-9]+[ \t]* //Note the space before the \t

Also be sure if you copy and paste that you do not have a trailing space at the end of the string.
Replaced with nothing. (Press backspace to clear any invisible spaces).
Replace All

And the second string searched for is just a single space character.
Replaced with nothing. (Press backspace to clear any invisible spaces).
Replace All

====================

If it is still not working: was the sample you provided a copy/paste of real data, or just an illustration of what it might look like? Again, it works fine with the twelve-line sample data you provided.
Hope this was helpful.............good luck,
Bob
CJEMS
Posts: 7
Joined: Sat Apr 14, 2007 6:33 pm

Post by CJEMS »

Thank You Bob Hansen for the helpful feedback (and ben_josephs for the effective search string).

I finally realize why I kept getting an error message: "Cannot find regular expression: '^[0-9]+[ \t]*'" .......

The data list I am using has a space in front of the number string on each line of data. When I delete this space or run the second string search FIRST (Replace All "_" { "_" being a spacebar keystroke}), then I can successfully run the string search ^[0-9]+[ \t]*
On the original example that space was absent.
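
The effect of that leading space can be reproduced in a few lines of Python, as a rough sketch only. It assumes exactly one space in front of the number, as described above, and uses Python's `re` module rather than TextPad's own engine; the variable names are invented for the example.

```python
import re

# The original pattern: ^ anchors the digits to the very start of the line.
pattern = re.compile(r"^[0-9]+[ \t]*")

# Line as in the first sample: the number starts in the first column.
clean_line = "3479 marine fish"
# Line as in the real data: one space (assumed) sits in front of the number.
real_line = " 57107 jack russell terrier dog"

print(pattern.search(clean_line) is not None)  # True: a digit follows ^ directly
print(pattern.search(real_line) is not None)   # False: a space follows ^, not a digit

# Removing all spaces first puts the digits back at the start of the line,
# which is why running the second replacement first made the pattern work.
squeezed = real_line.replace(" ", "")
print(pattern.sub("", squeezed))  # jackrussellterrierdog
```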

Bravo! and Thank You.

Simple and Elegant solution.

Now I am wondering how I can delete ONLY the first space (blank space) in front of the number string on each line. Then I would run Replace All with ^[0-9]+[ \t]* to delete the numbers plus the [ \t] part (the larger gaps, or whatever those spaces are, in my data lists; they are more than single spaces!) and produce a data list with the words separated by the proper spacing on each line.

Data list example:

 57107 jack russell terrier dog
 44462 boston terrier
 40209 yorkshire terrier
 34925 bull terrier
 28179 jack russell terrier
 20891 terrier
 17092 rat terrier
 12748 cairn terrier
 9750 west highland terrier
 9606 scottish terrier
 9536 staffordshire bull terrier
 9369 american pit bull terrier
 7196 wheaten terrier
 7063 american staffordshire terrier
 6871 border terrier
 6541 west highland white terrier
 6519 fox terrier
 6194 boston terrier puppy
 5948 silky terrier

Notice the single space in front of each number string per line. If I manually delete that single space on every other line of data then run the Replace All search string ^[0-9]+[ \t]*,
the result looks like this:

jack russell terrier dog
 44462 boston terrier
yorkshire terrier
 34925 bull terrier
jack russell terrier
 20891 terrier
rat terrier
 12748 cairn terrier
west highland terrier
 9606 scottish terrier
staffordshire bull terrier
 9369 american pit bull terrier
wheaten terrier
 7063 american staffordshire terrier
border terrier
 6541 west highland white terrier
fox terrier
 6194 boston terrier puppy
silky terrier

The only lines on which the original search string ^[0-9]+[ \t]* succeeds are the ones where I manually deleted the leading space.

Here is the result I am looking for:

jack russell terrier dog
boston terrier
yorkshire terrier
bull terrier
jack russell terrier
terrier
rat terrier
cairn terrier
west highland terrier
scottish terrier
staffordshire bull terrier
american pit bull terrier
wheaten terrier
american staffordshire terrier
border terrier
west highland white terrier
fox terrier
boston terrier puppy
silky terrier

Again, Bravo! and Thank You for your help and continuing efforts. I am beginning to understand the power of this app in handling raw data.
Bob Hansen
Posts: 1516
Joined: Sun Mar 02, 2003 8:15 pm
Location: Salem, NH

Post by Bob Hansen »

Try adding a space at the front of the Search string from ben_josephs

Change this:
^[0-9]+[ \t]*

To this:
^ [0-9]+[ \t]*

Note the extra space after the ^ character.

I cannot test right now, but that should do the same as the original while also removing the leading space character. If there is more than one space at the beginning, then add an * after the space, like this:
^ *[0-9]+[ \t]*
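
As a quick cross-check outside TextPad, the last pattern can be tried in Python. Treat this as a sketch under assumptions: the helper name `strip_leading_numbers` is invented here, `re.MULTILINE` is assumed to play the role of TextPad's line-by-line `^`, and the sample lines assume a single leading space as described in the thread.

```python
import re

# " *" allows zero or more spaces before the digits, so lines with and
# without a leading space are both handled by the same pattern.
LEADING_NUMBER = re.compile(r"^ *[0-9]+[ \t]*", re.MULTILINE)


def strip_leading_numbers(text: str) -> str:
    """Drop optional leading spaces, the number, and the gap after it."""
    return LEADING_NUMBER.sub("", text)


# Two lines with a leading space and one without.
sample = (
    " 57107 jack russell terrier dog\n"
    " 44462 boston terrier\n"
    "20891 terrier"
)

print(strip_leading_numbers(sample))
```

Because only the start of each line is matched, the single spaces between the words are left in place.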
Hope this was helpful.............good luck,
Bob
CJEMS
Posts: 7
Joined: Sat Apr 14, 2007 6:33 pm

Post by CJEMS »

Thanks again Bob.

This is perfect. I am wondering whether there are any general tutorials on this forum that give beginners a really good overview?


Thank You.
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Look in TextPad's help under
Reference Information | Regular Expressions,
Reference Information | Replacement Expressions and
How to... | Find and Replace Text | Use Regular Expressions.

There are many regular expression tutorials on the web, and you will find recommendations for some of them if you search this forum.

A standard reference for regular expressions is

Friedl, Jeffrey E F
Mastering Regular Expressions, 2nd ed
O'Reilly, 2002
ISBN: 0596002890
http://regex.info/

But be aware that the regular expression recogniser used by TextPad is rather weak by the standards of recent tools, so you may get frustrated if you discover a handy trick that doesn't work in TextPad.