I am trying to process tab-delimited files output from Excel to prepare them for input into another program. The files have a prelude section, a repeated data section, and a postlude section. I want to get rid of the prelude/postlude entirely and transform each line of the repeated section.
My input file looks like this:
-------------------------------
Princeton University
Function Revenue by Category
Sorted by Function Date
Filtered by Site = TEST
"Tuesday, November 4, 2008"
Event # Event Name Account # Customer Name Function Date Function Type Category Charges
TEST200 Astronomy 9991111 Jane Doe 11/4/2008 Dinner Buffet Food "1,767.00"
TEST200 Astronomy 9991111 Jane Doe 11/4/2008 Dinner Buffet S & S 420.00
TEST200 Astronomy 9991111 Jane Doe 11/4/2008 Dinner Buffet Svc Chg 265.05
Total: "2,452.05"
"Printed: November 4, 2008 - 11:50 AM" Page 1
-------------------------------
I have tried the following seach and replace strings, but I can't seem to get this to work.
Search
Princeton University.*Charges$.*?\t{7}\r\n(?:(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t"?(.*?)"?$)*?\t{7}\r\n.*
Replace
"D"\t$1\t$2\t$3\t$4\t$5\t$6\t$7\t$8
Options:
* Regular expression
* Replacement format
(Above and below the repeating data lines are lines consisting solely of 7 tabs.) I would appreciate any suggestions you have to offer.
Thanks!
Dave
Having trouble with replace of repetitive lines
Moderators: AmigoJack, bbadmin, helios, Bob Hansen, MudGuard
Having trouble with replace of repetitive lines
David L. Herrington
Manager, Departmental Application Services
OIT Administrative Information Services
Princeton University
daveh@princeton.edu – http://oitdas.princeton.edu
609.258.2952 (Voice) – 609.258.2927 (Fax)
Manager, Departmental Application Services
OIT Administrative Information Services
Princeton University
daveh@princeton.edu – http://oitdas.princeton.edu
609.258.2952 (Voice) – 609.258.2927 (Fax)
-
- Posts: 2461
- Joined: Sun Mar 02, 2003 9:22 pm
The way you use $ indicates that you are not using the option '.' does not match a newline character (so that '.' does match a newline), while your use of \r\n indicates that you are using that option (so that '.' does not match a newline). You must be consistent.
If you do not use that option, this will match the entire text:
Princeton University.*Charges$.*?\t{7}\r\n(((.*?)\t){7}"?(.*?)"?$)*?\r\n\t{7}\r\n.*
(Note the extra \r\n near the end.)
To match the lines you want to keep, you can use
Princeton University.*Charges$.*?\t{7}\r\n((((.*?)\t){7}"?(.*?)"?$)*?)\r\n\t{7}\r\n.*
and use $1 in the replacement.
I don't believe you can trim the top and bottom and get rid of the quote symbols at the same time.
If you do not use that option, this will match the entire text:
Princeton University.*Charges$.*?\t{7}\r\n(((.*?)\t){7}"?(.*?)"?$)*?\r\n\t{7}\r\n.*
(Note the extra \r\n near the end.)
To match the lines you want to keep, you can use
Princeton University.*Charges$.*?\t{7}\r\n((((.*?)\t){7}"?(.*?)"?$)*?)\r\n\t{7}\r\n.*
and use $1 in the replacement.
I don't believe you can trim the top and bottom and get rid of the quote symbols at the same time.