My source looks like this (a list of long-distance UK walks):
1066 Country Walk, East Sussex - 50 km (31 miles) Pevensey Castle to Rye
Abbeys Amble, North Yorkshire, 167 km (104 miles)
Abbott's Hike, 172 km (107 miles) Cumbria challenging moorland walking
Ainsty Bounds Walk, North Yorkshire, circular from Tadcaster, 71 km (44 miles)
Angles Way, 123 km (76 miles) from Great Yarmouth to Knettishall Heath, with much of the path following the Norfolk/Suffolk border. Additionally there is a link path from Knettishall Heath to Thetford
Avon Valley Path, 54 km (34 miles) Christchurch to Salisbury (Hampshire and Wiltshire)
Basingstoke Canal, 53 km (33 miles)
Bishop Bennet Way, 55 km (34 miles) Beeston to Wirswall (Cheshire, Staffordshire)
etc
I want a list like this:
1066 Country Walk, 31
Abbeys Amble, 104
Abbott's Hike, 107
Ainsty Bounds Walk, 44
etc
(So that I can import it into a spreadsheet and sort by distance.)
IOW I want the name before the first comma followed by a comma, a space and the mileage taken from inside the first pair of brackets.
I'm not clear why this doesn't work:
Find: (.*), (.*) \((.*) (.*)
Replace with: \1, \3
--
Terry, East Grinstead, UK
Extracting data, including some bracketed
-
ben_josephs
- Posts: 2464
- Joined: Sun Mar 02, 2003 9:22 pm
If you're tempted to use .*, check whether it's what you really mean.
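Here is a rough sketch of what greedy .* does with Terry's original pattern. It uses Python's re module rather than TextPad's engine, which is an assumption: parentheses are treated as grouping, \( as a literal left parenthesis, and \1, \3 as the replacement. Each .* grabs as much as it can and gives back only enough for the rest of the pattern to match. So group 1 stretches to the last ", " that can still be followed by " (", and group 3 stretches to the last space on the line. The helper name greedy_result is hypothetical.

```python
import re

# Terry's original Find/Replace, written in Python syntax (assumption:
# (...) groups, \( is a literal left parenthesis).
FIND = r"(.*), (.*) \((.*) (.*)"
REPLACE = r"\1, \3"

lines = [
    "1066 Country Walk, East Sussex - 50 km (31 miles) Pevensey Castle to Rye",
    "Abbeys Amble, North Yorkshire, 167 km (104 miles)",
    "Abbott's Hike, 172 km (107 miles) Cumbria challenging moorland walking",
    "Ainsty Bounds Walk, North Yorkshire, circular from Tadcaster, 71 km (44 miles)",
]


def greedy_result(line: str) -> str:
    """Apply the original pattern to one line, as a per-line Replace All would."""
    return re.sub(FIND, REPLACE, line)


if __name__ == "__main__":
    for line in lines:
        print(greedy_result(line))
```

The two lines that end in "(NN miles)" happen to come out as wanted ("Abbeys Amble, North Yorkshire, 104"), which makes the pattern look nearly right. The others show the overshoot, for example "1066 Country Walk, 31 miles) Pevensey Castle to".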
Try
Find what: ^([^,]+),[^(]*\((\d+).*
Replace with: $1, $2
Edit: Changed \ to new-style $ in replacement expression.
Last edited by ben_josephs on Thu Oct 24, 2013 9:13 pm, edited 1 time in total.
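One way to check the suggested pattern outside TextPad is a small Python sketch. This assumes Python's re module, which writes the replacement as \1, \2 where TextPad 7 uses $1, $2. [^,]+ stops at the first comma and [^(]* stops at the first left parenthesis, so the captured digits are always the first number inside the first pair of brackets. In this list that number is the mileage. The helper name name_and_miles is hypothetical.

```python
import re

# The suggested pattern; Python uses \1, \2 where TextPad 7 uses $1, $2.
FIND = r"^([^,]+),[^(]*\((\d+).*"
REPLACE = r"\1, \2"


def name_and_miles(line: str) -> str:
    # Applied one line at a time: in Python, [^,] and [^(] can also match a
    # newline, whereas in TextPad they cannot.
    return re.sub(FIND, REPLACE, line)


walks = [
    "1066 Country Walk, East Sussex - 50 km (31 miles) Pevensey Castle to Rye",
    "Abbeys Amble, North Yorkshire, 167 km (104 miles)",
    "Abbott's Hike, 172 km (107 miles) Cumbria challenging moorland walking",
    "Ainsty Bounds Walk, North Yorkshire, circular from Tadcaster, 71 km (44 miles)",
    "Angles Way, 123 km (76 miles) from Great Yarmouth to Knettishall Heath",
    "Avon Valley Path, 54 km (34 miles) Christchurch to Salisbury (Hampshire and Wiltshire)",
    "Basingstoke Canal, 53 km (33 miles)",
    "Bishop Bennet Way, 55 km (34 miles) Beeston to Wirswall (Cheshire, Staffordshire)",
]

if __name__ == "__main__":
    for walk in walks:
        print(name_and_miles(walk))
```

The Angles Way entry is shortened here. A later parenthesis, as in "(Hampshire and Wiltshire)", does not matter because [^(]* has already stopped at the first one.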
-
Thanks. I have 4.7.3, indeed ancient, but I'm so comfortable with it that I can't work up the enthusiasm to change. But I do appreciate the handicap that gives me when seeking help here.
Your revised version worked just fine, thank you.
I follow the last part, but could you briefly interpret the first section of this, ^([^,]+),[^(]*, for me, please?
^([^,]+),[^(]*\(([0-9]+).*
I'm going to try adapting it to a slight change of requirement, namely to get a result which includes all the original text except the km data. Like this:
1066 Country Walk, East Sussex - Pevensey Castle to Rye, 31
Abbeys Amble, North Yorkshire, 104
Abbott's Hike, Cumbria challenging moorland walking, 107
Ainsty Bounds Walk, North Yorkshire, circular from Tadcaster, 44
etc
Do you think infrequent RE users like me would be better advised to tackle tasks like this in separate stages? In this case for example:
1. Delete the bracketed km
2. Get bracketed miles to the end
3. Remove unwanted left and right brackets and 'miles'
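Those three stages can be sketched in Python. Each re.sub call stands in for one TextPad Replace All, which is an assumption, and the patterns are illustrative rather than taken from the thread. Stage 1 deletes the km figure, which in the source is not actually inside brackets. Stage 2 moves the bracketed miles to the end, and stage 3 strips the brackets and the word "miles". On the four sample lines a fourth tidy-up stage turns out to be needed. Where the km figure came straight after a comma and nothing followed the miles, stage 2 leaves a doubled comma. The function name staged is hypothetical.

```python
import re


def staged(line: str) -> str:
    # Stage 1: delete the km figure, e.g. " 50 km" (not bracketed in the source).
    line = re.sub(r" *\d+ km\b", "", line)
    # Stage 2: move the bracketed miles, e.g. "(31 miles)", to the end after ", ".
    line = re.sub(r" *(\(\d+ miles\))(.*)", r"\2, \1", line)
    # Stage 3: strip the brackets and the word "miles", leaving just the number.
    line = re.sub(r"\((\d+) miles\)", r"\1", line)
    # Stage 4 (extra tidy-up): collapse the ",," left when the miles ended the line.
    return line.replace(",,", ",")


samples = [
    "1066 Country Walk, East Sussex - 50 km (31 miles) Pevensey Castle to Rye",
    "Abbeys Amble, North Yorkshire, 167 km (104 miles)",
    "Abbott's Hike, 172 km (107 miles) Cumbria challenging moorland walking",
    "Ainsty Bounds Walk, North Yorkshire, circular from Tadcaster, 71 km (44 miles)",
]

if __name__ == "__main__":
    for s in samples:
        print(staged(s))
```

Each stage can be inspected before moving on, which is the main attraction of working in stages.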
Much appreciate your help.
--
Terry, East Grinstead, UK
-
ben_josephs
^([^,]+),[^(]* matches:
^        the beginning of a line
(        start of captured text number 1
[^,]+    any non-empty string within a line not containing a comma (see below)
)        end of captured text number 1
,        a comma
[^(]*    any (possibly empty) string within a line not containing a left parenthesis (see below)
where [^,]+ matches:
[^,]     any character except newline or comma
+        ... any non-zero number of times
and [^(]* matches:
[^(]     any character except newline or left parenthesis
*        ... any (possibly zero) number of times
For your new requirement try
Find what: ^(.*[^,]),? [0-9]+ km \(([0-9]+) miles\)(.*)
Replace with: \1\3, \2
But this doesn't properly handle the arbitrary inclusion or otherwise of commas in the way you indicate.
This is easier with TextPad 7's regex engine.
Yes, you might well find it easier to tackle such tasks in stages. With TextPad's old regex engine you often have no choice.
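The comma problem is easy to see by running the new-requirement pattern in Python. Python's re module stands in for TextPad's old engine here, which is an assumption, with the same \1\3, \2 replacement. Three of the four sample lines come out exactly as wanted. On Abbott's Hike, the optional ,? consumes the comma after the name, so the name runs straight into the description. The helper name drop_km is hypothetical.

```python
import re

# The new-requirement pattern from the reply above, in Python syntax.
FIND = r"^(.*[^,]),? [0-9]+ km \(([0-9]+) miles\)(.*)"
REPLACE = r"\1\3, \2"


def drop_km(line: str) -> str:
    # Group 1: text before the km figure (must not end in a comma);
    # group 2: the miles; group 3: anything after "(NN miles)".
    return re.sub(FIND, REPLACE, line)


samples = [
    "1066 Country Walk, East Sussex - 50 km (31 miles) Pevensey Castle to Rye",
    "Abbeys Amble, North Yorkshire, 167 km (104 miles)",
    "Abbott's Hike, 172 km (107 miles) Cumbria challenging moorland walking",
    "Ainsty Bounds Walk, North Yorkshire, circular from Tadcaster, 71 km (44 miles)",
]

if __name__ == "__main__":
    for s in samples:
        print(drop_km(s))
```

The third line comes out as "Abbott's Hike Cumbria challenging moorland walking, 107", without the comma after the name.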