Beginner Basics - editing lists of data
Moderators: AmigoJack, bbadmin, helios, Bob Hansen, MudGuard
How do I use TextPad to edit a list of data consisting of numbers and words on each line, with, say, 150 lines of data? The goal is to eliminate the spaces between the words on each line and delete any numbers appearing at the beginning of each line. There is no punctuation on any line.
More specifically, will the resident macros in preferences be sufficient for this task?
Thx for any help, a TextPad Newbie
- Bob Hansen
- Posts: 1516
- Joined: Sun Mar 02, 2003 8:15 pm
- Location: Salem, NH
- Contact:
A few examples come to mind. A simple one might be using a search result from a keyword search that generates other phrases or expressions containing the original keyword.
3479 marine fish
3467 fish oil
3420 piranha fish
3415 angler fish
3382 fish finder
3183 fish restaurant
2989 game and fish
2891 fish net
2876 blue fish
2813 parrot fish
2758 gold fish
2716 fish face
The goal now is to eliminate the numbers, and the spaces.
Result:
marine fish
fish oil
piranha fish
angler fish
fish finder
fish restaurant
game and fish
fish net
blue fish
parrot fish
gold fish
fish face
or
marinefish
fishoil
piranhafish
anglerfish
fishfinder
fishrestaurant
gameandfish
fishnet
bluefish
parrotfish
goldfish
fishface
Both results will meet my needs at least for the very near future.
Thank you.
-
- Posts: 2461
- Joined: Sun Mar 02, 2003 9:22 pm
- Bob Hansen
- Posts: 1516
- Joined: Sun Mar 02, 2003 8:15 pm
- Location: Salem, NH
- Contact:
From the Main Menu:
Configure/Preferences/Editor
Check the box to use POSIX syntax
Apply, Close Preferences
From the Main Menu
Search/Replace
Enter first set of values provided by ben_josephs
Check the box for Regular Expressions
Replace All
Enter second set of values provided
Replace All
If the result is not correct, you can Undo with CTL-Z.
Hope this was helpful.............good luck,
Bob
- Bob Hansen
- Posts: 1516
- Joined: Sun Mar 02, 2003 8:15 pm
- Location: Salem, NH
- Contact:
The Search/Replace strings were provided by ben_josephs. They work perfectly for me.
Using POSIX syntax.
Make sure no lines are highlighted/selected.
Press CTL-Home to move cursor to beginning of the first line.
Open the Replace window.
Other values in the Replace Window
Text/No Case Match / Regular Expressions / Active document
This is the first string that is searched for:
^[0-9]+[ \t]*
(Note the space before the \t inside the brackets. This note is not part of the search string.)
Also be sure if you copy and paste that you do not have a trailing space at the end of the string.
Replaced with nothing. (Press backspace to clear any invisible spaces).
Replace All
And the second string searched for is just a single space character.
Replaced with nothing. (Press backspace to clear any invisible spaces).
Replace All
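For anyone who wants to check the two replacements outside TextPad, here is a minimal sketch of the same two steps in Python. It assumes Python's `re` module treats this particular pattern the same way TextPad's POSIX syntax does (that seems to hold for this simple expression, but it is an assumption), and the variable names and three-line sample are only for illustration.

```python
import re

# Three lines taken from the sample data in this thread.
sample = "3479 marine fish\n3467 fish oil\n2989 game and fish"

# Step 1: on every line, delete the leading number and any spaces or tabs
# that follow it. re.MULTILINE makes ^ match at the start of each line.
no_numbers = re.sub(r"^[0-9]+[ \t]*", "", sample, flags=re.MULTILINE)

# Step 2: delete every remaining single space character.
no_spaces = no_numbers.replace(" ", "")

print(no_numbers)
print(no_spaces)
```

Stopping after step 1 gives the first form of the result (words still separated), and running step 2 as well gives the second form (words run together).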
====================
If it is still not working: was the sample you provided a copy/paste of real data, or just an illustration of what the data might look like? Again, it works fine with the twelve-line sample data you provided.
Hope this was helpful.............good luck,
Bob
Thank You Bob Hansen for the helpful feedback (and ben_josephs for the effective search string).
I finally realize why I kept getting an error message - "Cannot find regular expression: '^[0-9]+[ \y]*' .......
The data list I am using has a space in front of the number string on each line of data. When I delete this space or run the second string search FIRST (Replace All "_" { "_" being a spacebar keystroke}), then I can successfully run the string search ^[0-9]+[ \t]*
On the original example that space was absent.
Bravo! and Thank You.
Simple and Elegant solution.
Now I am wondering how I can delete ONLY the first space (blank space) in front of the number string on each line. Then I will run Replace All with ^[0-9]+[ \t]* to delete the numbers plus the [ \t] run that follows them (the wide gaps, or whatever those spaces are, in my data lists; they are more than single spaces!) and produce a data list with the words on each line separated by the proper spacing.
Data list example:
57107 jack russell terrier dog
44462 boston terrier
40209 yorkshire terrier
34925 bull terrier
28179 jack russell terrier
20891 terrier
17092 rat terrier
12748 cairn terrier
9750 west highland terrier
9606 scottish terrier
9536 staffordshire bull terrier
9369 american pit bull terrier
7196 wheaten terrier
7063 american staffordshire terrier
6871 border terrier
6541 west highland white terrier
6519 fox terrier
6194 boston terrier puppy
5948 silky terrier
Notice the single space in front of each number string per line. If I manually delete that single space on every other line of data then run the Replace All search string ^[0-9]+[ \t]*,
the result looks like this:
jack russell terrier dog
44462 boston terrier
yorkshire terrier
34925 bull terrier
jack russell terrier
20891 terrier
rat terrier
12748 cairn terrier
west highland terrier
9606 scottish terrier
staffordshire bull terrier
9369 american pit bull terrier
wheaten terrier
7063 american staffordshire terrier
border terrier
6541 west highland white terrier
fox terrier
6194 boston terrier puppy
silky terrier
The only lines that perform the original search string command ^[0-9]+[ \t]* successfully are the manually deleted ones.
Here is the result I am looking for:
jack russell terrier dog
boston terrier
yorkshire terrier
bull terrier
jack russell terrier
terrier
rat terrier
cairn terrier
west highland terrier
scottish terrier
staffordshire bull terrier
american pit bull terrier
wheaten terrier
american staffordshire terrier
border terrier
west highland white terrier
fox terrier
boston terrier puppy
silky terrier
Again, Bravo! and Thank You for your help and continuing efforts. I am beginning to understand the power of this app in handling raw data.
- Bob Hansen
- Posts: 1516
- Joined: Sun Mar 02, 2003 8:15 pm
- Location: Salem, NH
- Contact:
Try adding a space at the front of the Search string from ben_josephs
Change this:
^[0-9]+[ \t]*
To this:
^ [0-9]+[ \t]*
Note the extra space after the ^ character.
I cannot test right now, but that should do the same as the original while also removing the leading space character. If there is more than one space at the beginning, then add an * after the space, like this:
^ *[0-9]+[ \t]*
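Since the suggestion above was untested, here is a rough check of it in Python. It is a sketch only: it assumes Python's `re` module handles these simple patterns the way TextPad's POSIX syntax does, and it assumes the real data has one leading space per line as described in the previous post. The variable names are made up for the example.

```python
import re

# Three lines from the terrier list, each with the leading space described above.
data = " 57107 jack russell terrier dog\n 44462 boston terrier\n 20891 terrier"

# Original pattern: the number must start in the very first column.
original = re.compile(r"^[0-9]+[ \t]*", re.MULTILINE)
# Revised pattern: allow any number of spaces (including none) before the number.
revised = re.compile(r"^ *[0-9]+[ \t]*", re.MULTILINE)

unchanged = original.sub("", data)  # nothing matches, so the text is returned as-is
cleaned = revised.sub("", data)     # leading spaces, number and gap are removed

print(unchanged == data)
print(cleaned)
```

Because the * allows zero spaces as well, the revised pattern should also work on lines that have no leading space, so a list with mixed lines can be cleaned in one pass.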
Hope this was helpful.............good luck,
Bob
-
- Posts: 2461
- Joined: Sun Mar 02, 2003 9:22 pm
Look in TextPad's help under
Reference Information | Regular Expressions,
Reference Information | Replacement Expressions and
How to... | Find and Replace Text | Use Regular Expressions.
There are many regular expression tutorials on the web, and you will find recommendations for some of them if you search this forum.
A standard reference for regular expressions is
Friedl, Jeffrey E F
Mastering Regular Expressions, 2nd ed
O'Reilly, 2002
ISBN: 0596002890
http://regex.info/
But be aware that the regular expression recogniser used by TextPad is rather weak by the standards of recent tools, so you may get frustrated if you discover a handy trick that doesn't work in TextPad.