Hi,
I am wondering whether TextPad/RegEx can do the following. I have a large number of files that I want to get certain pieces of information out of. The files look like this (this being only a small sample of the total contents):
<Paragrap>This is going to be a large piece of text<Paragraph/><HouseStyle>Detached<HouseStyle/><Price>£300,000<Price/>
<Beds>3<Beds/>
For each file in a directory I want a row that will be formatted like -:
Paragraph HouseStyle Price Beds
This is going.... Detached £300,000 3
Is this at all possible or is it just wishful thinking?
Thanks PD
RegEx
- s_reynisson
- Posts: 939
- Joined: Tue May 06, 2003 1:59 pm
The formatting can be done. Is the data on a single line or on multiple lines? Please post your sample using the Code tag. TP can handle single lines, but for multiple lines you need a regex engine that can match across lines, a tool like WildEdit to name but one.
Then I open up and see
the person fumbling here is me
a different way to be
-
- Posts: 11
- Joined: Wed Jul 02, 2003 6:48 am
REgEx
Hi,
Thanks for the reply with regards to my posting. The actual data is XML. I am afraid that I can't post the exact file for security reasons, but it is in the same format as my example:
There is Data above the TransformedXml Tag
<TransformedXml>
<Paragrap>This is going to be a large piece of text<Paragraph/><HouseStyle>Detached<HouseStyle/><Price>£300,000<Price/>
<Beds>3<Beds/><TransformedXml/>
There is Data below the TransformedXml Tag
So the pieces of data that I need stripping out are halfway through the XML file.
The data is on multiple lines, and all files are in one folder with the same format.
Sorry I can't be any more specific, hope you can help.
Regards PD 8)
- s_reynisson
- Posts: 939
- Joined: Tue May 06, 2003 1:59 pm
OK, WildEdit it is. To use it on files larger than 10KB you'll need to register.
First clean your files of newlines within the Paragraph tags.
Something like
Find
(<Paragrap>.*?)\r\n(.*?<Paragraph/>)
Replace
$1$2
Before you do that you need to tick "'.' does not match a newline
character" in the options. Narrow your search in the first step on the
Paragraph tag as needed, I'm just grabbing them all.
Repeat this until WE reports zero changes made, check the log tab.
Next clear the tick for "'.' does not match a newline character" in the
options.
Find - this is all on one line
<TransformedXml>.*?<Paragrap>(.*?)<Paragraph/>.*?<HouseStyle>(.*?)<HouseStyle/>.*?<Price>(.*?)<Price/>.*?<Beds>(.*?)<Beds/>.*?<TransformedXml/>
Replace - four lines
Code:
<TransformedXml>
Paragraph HouseStyle Price Beds
$1 $2 $3 $4
<TransformedXml/>
A word of warning to cover my royal beh*, I'm doing this on a very
small sample of data, take care, back up etc
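For readers without WildEdit, the same two passes can be sketched in Python. This is only an illustration of the steps above, not part of the original advice: the function names `join_paragraph_lines` and `convert` and the sample text are made up for the example, and the tags are kept exactly as posted (including the `<Paragrap>` spelling and the trailing-slash end tags).

```python
import re

# Hypothetical sample in the same shape as the posted data (CRLF line ends).
sample = (
    "There is Data above the TransformedXml Tag\r\n"
    "<TransformedXml>\r\n"
    "<Paragrap>This is going to be a large \r\n"
    "piece of text<Paragraph/><HouseStyle>Detached<HouseStyle/><Price>£300,000<Price/>\r\n"
    "<Beds>3<Beds/><TransformedXml/>\r\n"
    "There is Data below the TransformedXml Tag\r\n"
)

# Pass 1: '.' must NOT match a newline (Python's default behaviour).
JOIN = re.compile(r"(<Paragrap>.*?)\r\n(.*?<Paragraph/>)")

# Pass 2: '.' must match a newline, hence re.DOTALL.
EXTRACT = re.compile(
    r"<TransformedXml>.*?<Paragrap>(.*?)<Paragraph/>.*?<HouseStyle>(.*?)<HouseStyle/>"
    r".*?<Price>(.*?)<Price/>.*?<Beds>(.*?)<Beds/>.*?<TransformedXml/>",
    re.DOTALL,
)

# The four-line replacement; \1..\4 are Python's spelling of $1..$4.
REPLACEMENT = (
    "<TransformedXml>\r\n"
    "Paragraph HouseStyle Price Beds\r\n"
    r"\1 \2 \3 \4" "\r\n"
    "<TransformedXml/>"
)


def join_paragraph_lines(text):
    """Repeat pass 1 until it reports zero changes."""
    while True:
        text, changes = JOIN.subn(r"\1\2", text)
        if changes == 0:
            return text


def convert(text):
    """Run pass 1 to completion, then pass 2 once."""
    return EXTRACT.sub(REPLACEMENT, join_paragraph_lines(text))


result = convert(sample)
```

Note that pass 1 removes the line break without inserting a space, so words on either side of it are joined unless one of the lines already carries a space there.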
-
- Posts: 11
- Joined: Wed Jul 02, 2003 6:48 am
Reply!!
Hi there,
Thanks again for the assistance, and my apologies for not getting back to you sooner. I have tried what you suggested, and my new data layout is the same as my old one.
I might be doing something wrong, but these are the steps that I carried out:
I put a tick in the "'.' does not match a newline character" option
Then I ran the
Find
(<Paragrap>.*?)\r\n(.*?<Paragraph/>)
Replace
$1$2
Then I un-ticked the box that I previously ticked
Then I did a find
TransformedXml>.*?<Paragrap>(.*?)<Paragraph/>.*?<HouseStyle>(.*?)<HouseStyle/>.*?<Price>(.*?)<Price/>.*?<Beds>(.*?)<Beds/>.*?<TransformedXml/>
And Replaced it with
<TransformedXml>
Paragraph HouseStyle Price Beds
$1 $2 $3 $4
<TransformedXml/>
Have I misunderstood your answer, or am I being a bit dim? I have never used WildEdit before, so all replies have to be very, very, very simple.
-
- Posts: 2461
- Joined: Sun Mar 02, 2003 9:22 pm
Your examples are not XML. The forward slash in an end tag is in front of the tag name, not after it. I have corrected this in my example below. I have also corrected the misspelling of Paragraph.
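The point about end tags can be checked with any XML parser. As a small sketch, here is a test using Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` is made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as a single XML element."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


as_posted = "<Beds>3<Beds/>"   # slash after the tag name, as in the samples
corrected = "<Beds>3</Beds>"   # slash before the tag name

posted_ok = is_well_formed(as_posted)
corrected_ok = is_well_formed(corrected)
```

In the posted form, `<Beds/>` is read as a second, empty element nested inside the first, so the outer `<Beds>` is never closed and the parser rejects the fragment.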
You have not made your requirements clear and you have not explained what doesn't work.
Do you want the items laid out in columns? How do you want the large piece of text arranged?
You can't output fixed-width columns if the items in one column are of different widths. But you can approximate them with tabs.
I would try something like this as a starting point, with "'.' does not match a newline character" not selected:
Find what:
<Paragraph>(.*?)</Paragraph>\s*<HouseStyle>(.*?)</HouseStyle>\s*<Price>(.*?)</Price>\s*<Beds>(.*?)</Beds>
Replace with:
Paragraph\tHouseStyle\tPrice\t\tBeds
$1
\t\t$2\t$3\t$4
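If a script is acceptable instead of a search-and-replace tool, the original goal (one tab-separated row per record, for every file in a directory) could be sketched in Python along these lines. It assumes the corrected tag forms used in the expression above. The function names, the `*.xml` file pattern and the UTF-8 encoding are assumptions made for the example, not details taken from the thread.

```python
import re
from pathlib import Path

# Same expression as above; re.DOTALL plays the role of leaving
# "'.' does not match a newline character" unselected.
RECORD = re.compile(
    r"<Paragraph>(.*?)</Paragraph>\s*<HouseStyle>(.*?)</HouseStyle>\s*"
    r"<Price>(.*?)</Price>\s*<Beds>(.*?)</Beds>",
    re.DOTALL,
)


def rows_from_text(text):
    """Return one tab-separated row per record found in the text."""
    rows = []
    for match in RECORD.finditer(text):
        # Collapse line breaks and runs of spaces inside each field
        # so that a record always stays on a single row.
        fields = [" ".join(field.split()) for field in match.groups()]
        rows.append("\t".join(fields))
    return rows


def rows_from_directory(directory, pattern="*.xml", encoding="utf-8"):
    """Header row followed by the rows from every matching file."""
    rows = ["Paragraph\tHouseStyle\tPrice\tBeds"]
    for path in sorted(Path(directory).glob(pattern)):
        rows.extend(rows_from_text(path.read_text(encoding=encoding)))
    return rows


# Hypothetical sample with corrected tags.
example = (
    "There is Data above the TransformedXml Tag\n"
    "<TransformedXml>\n"
    "<Paragraph>This is going to be a large\n"
    "piece of text</Paragraph><HouseStyle>Detached</HouseStyle><Price>£300,000</Price>\n"
    "<Beds>3</Beds></TransformedXml>\n"
    "There is Data below the TransformedXml Tag\n"
)
example_rows = rows_from_text(example)
```

Writing `"\n".join(rows_from_directory("some_folder"))` to a file would then give a table that opens cleanly in a spreadsheet, which sidesteps the fixed-width column problem.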