RTF Search

Post by **tteeples** » Thu Aug 27, 2009 2:17 pm

I work extensively with RTF files and am brand new to this POSIX regular expression thing. Please forgive me if my choice of verbiage is not correct. I am looking for a search expression that finds a string within a string.

For example, I have a line in the font table that looks like this:

{\f40\fbidi \froman\fcharset238\fprq2 Times New Roman CE{\*\falt Times New Roman};}

I am trying to get an expression that will find:

{\*\falt Times New Roman}

(\{\\\*\\)
finds the first {\*\ part of the string and
(\{\\\*\\).*
finds everything from that point to the end of the line, but I can't figure out how to select everything from there to the first } in the line.
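For illustration only, here is the same behaviour reproduced in Python's re module. This is an assumption: the thread is about an editor's POSIX search, but these particular escapes mean the same thing in both. The greedy .* keeps consuming characters to the end of the line. A lazy .*? stops at the first } that lets the pattern finish. Lazy quantifiers are a Perl/Python extension, though, and standard POSIX extended regular expressions do not define them.

```python
import re

line = r"{\f40\fbidi \froman\fcharset238\fprq2 Times New Roman CE{\*\falt Times New Roman};}"

# The poster's second pattern: "{\*\" followed by a greedy .*,
# which keeps matching right up to the end of the line.
greedy = re.search(r"(\{\\\*\\).*", line)
print(greedy.group(0))  # {\*\falt Times New Roman};}

# A lazy .*? (Python/Perl syntax, not POSIX) stops at the first "}".
lazy = re.search(r"\{\\\*\\.*?\}", line)
print(lazy.group(0))    # {\*\falt Times New Roman}
```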

Can someone please help me out or at least point me in the right direction?

Thanks in advance!!!

Post by **Bob Hansen** » Thu Aug 27, 2009 7:45 pm

To find {\*\falt Times New Roman}, do this:

1. Find what: \{\\\*\\falt Times New Roman\}

Use the following settings:
-----------------------------------------
[X] Regular expression
"Find Next" or "Mark All"
-----------------------------------------

Configure | Preferences | Editor
[X] Use POSIX regular expression syntax
-----------------------------------------
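The expression above is just the target text with each regex metacharacter ({, \, * and }) escaped by a backslash. As a rough cross-check outside the editor, here is a sketch in Python's re module, used as a stand-in for the editor's search. It also shows that re.escape() can produce an equivalent pattern automatically. re.escape() may also escape spaces, and that is harmless.

```python
import re

target = r"{\*\falt Times New Roman}"

# The hand-escaped pattern from the post: every {, \, * and } gets a backslash.
hand_escaped = r"\{\\\*\\falt Times New Roman\}"

# re.escape() does the same escaping for you.
auto_escaped = re.escape(target)

line = r"{\f39\froman\fcharset238\fprq2 Times New Roman CE{\*\falt Times New Roman};}"
for pattern in (hand_escaped, auto_escaped):
    print(re.search(pattern, line).group(0))  # {\*\falt Times New Roman}
```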

Post by **tteeples** » Thu Aug 27, 2009 7:55 pm

Bob,
I appreciate your help! However, I was not clear in my first post. Your suggestion is too specific; the line I provided was only one sample of many. A better example would be:

{\f0\froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman{\*\falt Times New Roman};}
{\f37\froman\fcharset0\fprq2{\*\panose 00000000000000000000}Georgia;}
{\f38\fswiss\fcharset0\fprq2{\*\panose 00000000000000000000}Verdana;}
{\f39\froman\fcharset238\fprq2 Times New Roman CE{\*\falt Times New Roman};}
{\f40\froman\fcharset204\fprq2 Times New Roman Cyr{\*\falt Times New Roman};}
{\f42\froman\fcharset161\fprq2 Times New Roman Greek{\*\falt Times New Roman};}
{\f43\froman\fcharset162\fprq2 Times New Roman Tur{\*\falt Times New Roman};}
{\f44\froman\fcharset177\fprq2 Times New Roman (Hebrew){\*\falt Times New Roman};}
{\f45\froman\fcharset178\fprq2 Times New Roman (Arabic){\*\falt Times New Roman};}
{\f46\froman\fcharset186\fprq2 Times New Roman Baltic{\*\falt Times New Roman};}
{\f47\froman\fcharset163\fprq2 Times New Roman (Vietnamese){\*\falt Times New Roman};}

I need to find everything between the {\*\ and the corresponding or next } on each line.

Post by **Bob Hansen** » Fri Aug 28, 2009 1:24 am

Does this work for you? This will find everything inside and including the braces {....}.

Find what: \{\\\*\\.[^}]*}
============================

Add ( ) around the middle part to isolate just the text, without the leading {\*\ and the closing }:

Find what: \{\\\*\\(.[^}]*)}
The captured text can now be used in a replacement as \1.
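To show what the capturing version returns across several of the sample lines, including the first one with two {\*\...} groups, here is a small sketch. It again uses Python's re module as an assumed stand-in for the editor. findall() reports every match on a line, and \1 in a replacement refers to the captured text.

```python
import re

lines = [
    r"{\f0\froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman{\*\falt Times New Roman};}",
    r"{\f37\froman\fcharset0\fprq2{\*\panose 00000000000000000000}Georgia;}",
    r"{\f39\froman\fcharset238\fprq2 Times New Roman CE{\*\falt Times New Roman};}",
]

# "{\*\", then one character plus any run of non-"}" characters
# (captured as group 1), then the closing "}".
pattern = re.compile(r"\{\\\*\\(.[^}]*)}")

for line in lines:
    # One entry per {\*\...} group on the line.
    print(pattern.findall(line))

# \1 in the replacement keeps only the captured text.
print(pattern.sub(r"\1", lines[2]))
```

The first line yields two results, ['panose 02020603050405020304', 'falt Times New Roman'], so a line with more than one group is handled by the same expression.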

What are you trying to do, just highlight the string, or extract it and use it with something else? Trying to remove everything from the line? What about your example that has more than one on a line?

Post by **tteeples** » Fri Aug 28, 2009 3:46 am

Bob

Thank you so much for your help!! That was exactly what I asked for. However, what I asked for didn't quite do what I wanted it to.

The new MS Word likes to throw tons of additional data into a file when you save it, and I was looking for a way to delete that data. Unfortunately, RTF documents have fields within fields (nested groups), so matching from {\*\ to the first } doesn't work when the text looks like this:

{\*\data{sub data{additional data}}}
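Here is a quick sketch of why the earlier expression fails on nested groups, using Python's re module as an assumed stand-in for the editor. [^}]* has no notion of nesting. It stops at the first }, which closes the innermost group, and the outer braces are left behind.

```python
import re

nested = r"{\*\data{sub data{additional data}}}"

# The earlier expression stops at the FIRST "}", which belongs to the
# innermost group, not to the {\*\ group.
m = re.search(r"\{\\\*\\.[^}]*}", nested)
print(m.group(0))        # {\*\data{sub data{additional data}
print(nested[m.end():])  # }}  (left behind, unbalanced)
```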

Maybe I am just getting in way over my head.

I do appreciate your help though!! I am not totally giving up on this regular expression thing.

Post by **Bob Hansen** » Fri Aug 28, 2009 4:42 am

That does not sound like a problem.
You could probably write a different RegEx for each level of embedded data groups.

Assume three embedded data groups:
First run the group 3 (innermost) RegEx, write the matched data somewhere, and delete that group.
Then run the group 2 RegEx, append its data to the group 3 data, and delete that group.
Then run the group 1 RegEx, append its data to the earlier data, and delete everything except the data you wrote. Consider the RegEx that I gave you as the group 1 expression.
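One way to read the "different RegEx for each level" idea is to generate an expression that tolerates a fixed number of nested {...} levels inside a {\*\ group. This differs from the multi-pass delete-and-append procedure above. The sketch below does that in Python. The helper name destination_pattern is hypothetical, and the (?:...) non-capturing group is Python/Perl syntax; in a POSIX expression plain ( ) would be used instead. A pattern built for N levels will not match a group nested deeper than N.

```python
import re

def destination_pattern(levels: int) -> str:
    """Build a regex for a {\\*\\...} group that may contain up to
    `levels` nested {...} groups (hypothetical helper)."""
    # At the deepest level only non-brace characters are allowed.
    content = r"[^{}]"
    for _ in range(levels):
        # One level up, an item is either a non-brace character or a
        # complete {...} group built from the level below.
        content = r"(?:[^{}]|\{" + content + r"*\})"
    return r"\{\\\*\\" + content + r"*\}"

nested = r"{\*\data{sub data{additional data}}}"
text = r"{\rtf1" + nested + "Hello}"

print(destination_pattern(0))                           # \{\\\*\\[^{}]*\}
print(bool(re.search(destination_pattern(1), nested)))  # False: too shallow
print(re.sub(destination_pattern(2), "", text))         # {\rtf1Hello}
```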

That could all be put into a macro to do it all in one pass.
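If the nesting depth is not known in advance, counting braces avoids writing one expression per level. This is a swapped-in technique, not a regex. The Python sketch below assumes that a backslash always escapes the single character after it (so \{, \} and \\ are not counted as braces), and it does not handle RTF's \bin binary data. It deletes every {\*\...} group without discrimination, so it would be wise to try it on a copy of a file first.

```python
def strip_destinations(rtf: str) -> str:
    """Delete every {\\*\\...} group, together with any groups nested
    inside it, by counting braces instead of using a regex."""
    out = []
    i, n = 0, len(rtf)
    while i < n:
        if rtf.startswith("{\\*\\", i):
            depth = 0
            # Walk forward until the brace that closes this group.
            while i < n:
                ch = rtf[i]
                if ch == "\\":
                    i += 2          # skip an escaped char such as \{ \} \\ \*
                    continue
                if ch == "{":
                    depth += 1
                elif ch == "}":
                    depth -= 1
                    if depth == 0:
                        i += 1      # consume the closing brace and stop
                        break
                i += 1
        elif rtf[i] == "\\":
            out.append(rtf[i:i + 2])  # copy an escape pair unchanged
            i += 2
        else:
            out.append(rtf[i])
            i += 1
    return "".join(out)

line = r"{\f0\froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman{\*\falt Times New Roman};}"
print(strip_destinations(line))  # {\f0\froman\fcharset0\fprq2Times New Roman;}
print(strip_destinations(r"{\rtf1{\*\data{sub data{additional data}}}Hello}"))  # {\rtf1Hello}
```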