RegEx

trespasser
Posts: 11
Joined: Wed Jul 02, 2003 6:48 am

RegEx

Post by trespasser »

Hi,

I am wondering whether TextPad/RegEx can do the following. I have a large number of files that I want to extract certain pieces of information from. The files look like this (only a small sample of the total contents):

<Paragrap>This is going to be a large piece of text<Paragraph/><HouseStyle>Detached<HouseStyle/><Price>£300,000<Price/>
<Beds>3<Beds/>

For each file in a directory I want a row formatted like this:

Paragraph HouseStyle Price Beds
This is going.... Detached £300,000 3

Is this at all possible or is it just wishful thinking?

Thanks PD
s_reynisson
Posts: 940
Joined: Tue May 06, 2003 1:59 pm

Post by s_reynisson »

The formatting can be done. Is the data on a single line or on multiple lines? Please post your sample using the Code tag. TextPad can handle single lines, but for multiple lines you need a regex engine that can match across lines, such as WildEdit, to name but one.
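
To illustrate the single-line versus multi-line distinction, here is a minimal sketch in Python. Python's `re` module is only a stand-in for the regex engine (it is neither TextPad nor WildEdit), and the sample string is taken from the first post with its tag spelling kept as posted. By default `.` stops at a newline, so the pattern cannot span the line break; with `re.DOTALL` it can.

```python
import re

# Two lines from the sample in the first post, tags kept as posted.
sample = "<Price>£300,000<Price/>\n<Beds>3<Beds/>"

pattern = r"<Price>(.*?)<Price/>.*?<Beds>(.*?)<Beds/>"

# Without DOTALL, '.' does not match '\n', so the match cannot cross the line break.
single_line = re.search(pattern, sample)

# With DOTALL, '.' also matches '\n', so the pattern spans both lines.
multi_line = re.search(pattern, sample, re.DOTALL)

print(single_line)           # None
print(multi_line.groups())   # ('£300,000', '3')
```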
Then I open up and see
the person fumbling here is me
a different way to be
trespasser
Posts: 11
Joined: Wed Jul 02, 2003 6:48 am

REgEx

Post by trespasser »

Hi,

Thanks for the reply to my posting. The actual data is XML. I am afraid that I can't post the exact file for security reasons, but it is in the same format as my example:

There is Data above the TransformedXml Tag

<TransformedXml>
<Paragrap>This is going to be a large piece of text<Paragraph/><HouseStyle>Detached<HouseStyle/><Price>£300,000<Price/>
<Beds>3<Beds/><TransformedXml/>

There is Data below the TransformedXml Tag

So the pieces of data that I need to strip out are halfway through the XML file.

The data is on multiple lines and all files are in one folder with the same format.

Sorry I can't be any more specific. I hope you can help.

Regards PD 8)
s_reynisson
Posts: 940
Joined: Tue May 06, 2003 1:59 pm

Post by s_reynisson »

Ok, WildEdit it is. To use it on files larger than 10KB you'll need to register.

First clean your files of newlines within the Paragraph tags.
Something like
Find
(<Paragrap>.*?)\r\n(.*?<Paragraph/>)
Replace
$1$2

Before you do that you need to tick "'.' does not match a newline character" in the options. Narrow your search in the first step on the Paragraph tag as needed; I'm just grabbing them all. Repeat this until WildEdit reports zero changes made (check the log tab).

Next clear the tick for "'.' does not match a newline character" in the options.
Find - this is all on one line
<TransformedXml>.*?<Paragrap>(.*?)<Paragraph/>.*?<HouseStyle>(.*?)<HouseStyle/>.*?<Price>(.*?)<Price/>.*?<Beds>(.*?)<Beds/>.*?<TransformedXml/>
Replace - four lines

<TransformedXml>
Paragraph HouseStyle Price Beds
$1 $2 $3 $4
<TransformedXml/>
A word of warning to cover my royal beh*: I'm doing this on a very small sample of data, so take care, back up, etc. ;)
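
For readers without WildEdit, the two passes above can be sketched in Python. This is only an approximation under stated assumptions: Python's `re` module replaces WildEdit, the "'.' does not match a newline character" option is modelled by leaving out or passing `re.DOTALL`, `$1` becomes `\1`, and the sample text (with a line break placed inside the paragraph) is hypothetical. The tag spellings are kept exactly as posted.

```python
import re

# Hypothetical sample in the thread's format, with Windows line endings.
lines = [
    "data above",
    "<TransformedXml>",
    "<Paragrap>This is going to be ",
    "a large piece of text<Paragraph/><HouseStyle>Detached<HouseStyle/>"
    "<Price>£300,000<Price/>",
    "<Beds>3<Beds/><TransformedXml/>",
    "data below",
]
text = "\r\n".join(lines)

# Pass 1: '.' does NOT match a newline (no DOTALL).
# Remove one line break inside the Paragraph tags per pass,
# repeating until no more changes are made.
join_pat = re.compile(r"(<Paragrap>.*?)\r\n(.*?<Paragraph/>)")
while True:
    text, changes = join_pat.subn(r"\1\2", text)
    if changes == 0:
        break

# Pass 2: '.' DOES match a newline (DOTALL), so one pattern spans the block.
extract_pat = re.compile(
    r"<TransformedXml>.*?<Paragrap>(.*?)<Paragraph/>"
    r".*?<HouseStyle>(.*?)<HouseStyle/>"
    r".*?<Price>(.*?)<Price/>"
    r".*?<Beds>(.*?)<Beds/>.*?<TransformedXml/>",
    re.DOTALL,
)
replacement = (
    "<TransformedXml>\r\n"
    "Paragraph HouseStyle Price Beds\r\n"
    r"\1 \2 \3 \4"
    "\r\n<TransformedXml/>"
)
result = extract_pat.sub(replacement, text)
print(result)
```

The data above and below the TransformedXml block is left untouched; only the block itself is replaced by the header line and the row.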
trespasser
Posts: 11
Joined: Wed Jul 02, 2003 6:48 am

Reply!!

Post by trespasser »

Hi there,

Thanks again for the assistance, and my apologies for not getting back to you sooner. I have tried what you suggested, but my new data layout is the same as my old one.

I might be doing something wrong, but these are the steps that I carried out:

I put a tick in the "'.' does not match a newline character" box.

Then I ran the

Find
(<Paragrap>.*?)\r\n(.*?<Paragraph/>)
Replace
$1$2

Then I un-ticked the box that I previously ticked.

Then I did a find

TransformedXml>.*?<Paragrap>(.*?)<Paragraph/>.*?<HouseStyle>(.*?)<HouseStyle/>.*?<Price>(.*?)<Price/>.*?<Beds>(.*?)<Beds/>.*?<TransformedXml/>

And Replaced it with

<TransformedXml>
Paragraph HouseStyle Price Beds
$1 $2 $3 $4
<TransformedXml/>

Have I misunderstood your answer, or am I just being a bit dim? I have never used WildEdit before, so all replies have to be very, very, very simple :lol:
ben_josephs
Posts: 2459
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Your examples are not XML. The forward slash in an end tag is in front of the tag name, not after it. I have corrected this in my example below. I have also corrected the misspelling of Paragraph.

You have not made your requirements clear and you have not explained what doesn't work.

Do you want the items laid out in columns? How do you want the large piece of text arranged?

You can't output fixed-width columns if the items in one column are of different widths. But you can approximate them with tabs.

I would try something like this as a starting point, with '.' does not match a newline character not selected:
Find what:
<Paragraph>(.*?)</Paragraph>\s*<HouseStyle>(.*?)</HouseStyle>\s*<Price>(.*?)</Price>\s*<Beds>(.*?)</Beds>

Replace with:
Paragraph\tHouseStyle\tPrice\t\tBeds
$1
\t\t$2\t$3\t$4
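
As a rough illustration of the original goal (one tab-separated row per file in a directory), the same pattern can be driven from a short Python script. This is a sketch under assumptions, not a TextPad or WildEdit feature: the `.xml` extension, the UTF-8 encoding, the function names `extract_row` and `extract_rows`, the demo file and the whitespace collapsing inside each field are all hypothetical choices. It uses the corrected end tags, and `re.DOTALL` plays the role of leaving "'.' does not match a newline character" unselected.

```python
import re
import tempfile
from pathlib import Path

PATTERN = re.compile(
    r"<Paragraph>(.*?)</Paragraph>\s*"
    r"<HouseStyle>(.*?)</HouseStyle>\s*"
    r"<Price>(.*?)</Price>\s*"
    r"<Beds>(.*?)</Beds>",
    re.DOTALL,  # let '.' match newlines inside the paragraph text
)

def extract_row(xml_text):
    """Return the four fields joined by tabs, or None if the pattern is absent."""
    match = PATTERN.search(xml_text)
    if match is None:
        return None
    # Collapse runs of whitespace (including line breaks) inside each field.
    fields = [" ".join(group.split()) for group in match.groups()]
    return "\t".join(fields)

def extract_rows(directory):
    """Build a header line plus one row for each matching *.xml file."""
    rows = ["Paragraph\tHouseStyle\tPrice\tBeds"]
    for path in sorted(Path(directory).glob("*.xml")):
        row = extract_row(path.read_text(encoding="utf-8"))
        if row is not None:
            rows.append(row)
    return rows

# Demo on a temporary directory holding one hypothetical file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "house1.xml").write_text(
        "data above\n<TransformedXml>\n"
        "<Paragraph>This is going to be\na large piece of text</Paragraph>"
        "<HouseStyle>Detached</HouseStyle><Price>£300,000</Price>\n"
        "<Beds>3</Beds></TransformedXml>\ndata below\n",
        encoding="utf-8",
    )
    rows = extract_rows(tmp)

print("\n".join(rows))
```

Files in which the pattern is not found are skipped rather than reported; whether that is acceptable depends on how uniform the real files are.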