
find each field label

Posted: Fri Apr 27, 2012 12:49 pm
by mdh
I have a large text file with thousands of entries like the following and need to identify each field label. I would like one regular expression that finds them all. Each label consists of a term followed by a colon, or two terms separated by a space followed by a colon, or two terms separated by a comma and space followed by a colon.

Note that multiple spaces exist at the beginning of lines, between label:value pairs, and after label:value pairs.

Lastname, Firstname
id:44486000002443 library:ABC
*Address Information-- Mailing Address:1
address1:
Daytime phone:nnn-nnn-nnnn
Line:Green
Street:P.O. Box nnn
City, state:City, UU.
Zip:nnnnn
Email:account@domain.com
Phone:nnn-nnn-nnnn
address2:
none

address3:
none

*Extended Information--
none
Profile:PUBLIC status:OK

bills:0 charges:0 holds:0
number of history charges:0
total bills:6 total charges:137
created:12/4/2001 last use:5/5/2007 priv granted:12/2/2009
priv expired:12/2/2011
cat1:FEMALE cat2: cat3:
cat4: cat5:
Profile:PUBLIC user access:PUBLIC environment:PUBLIC
dept:

Posted: Fri Apr 27, 2012 1:20 pm
by ben_josephs
You haven't provided enough information.

What do you want to do with the labels? Please give examples of the text before and after the change you require. Enclose them in [code]...[/code] tags.

What is the label in the line
*Address Information-- Mailing Address:1
?

possible output

Posted: Fri Apr 27, 2012 1:56 pm
by mdh
Ultimately I need to import this into Excel, so a \n before each label would be OK, plus some unique character before each label so that all lines which do not begin with a label can be stripped.

The asterisked line will be ignored; the address# lines will be used.

Thank you for your consideration.

Posted: Fri Apr 27, 2012 2:45 pm
by ben_josephs
Using # as the special character:

Use "Posix" regular expression syntax:
Configure | Preferences | Editor

[X] Use POSIX regular expression syntax

Search | Replace... (<F8>):
Find what: [^:]+:[^ ]* *
Replace with: \n#\0

[X] Regular expression

Replace All
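
For anyone who wants to try the same substitution outside the editor, here is a rough Python sketch of the replace step. It is only an approximation: the names PAIR and tag_pairs are made up for illustration, and the pattern excludes newlines from both character classes on the assumption that the editor's search does not match across line ends.

```python
import re

# Approximation of the "Find what" pattern above: a run of non-colon
# characters (the label), a colon, a run of non-space characters (the
# value), then any trailing spaces.
# Assumption: newlines are excluded so a match never spans two lines.
PAIR = re.compile(r"[^:\n]+:[^ \n]* *")


def tag_pairs(text: str) -> str:
    """Put a newline and '#' in front of every label:value pair."""
    # m.group(0) is the whole match, like \0 in the "Replace with" field.
    return PAIR.sub(lambda m: "\n#" + m.group(0), text)


sample = "id:44486000002443 library:ABC"
tagged = tag_pairs(sample)
print(tagged)
```

Note that a line such as "*Address Information-- Mailing Address:1" is tagged too and ends up starting with "#*", while a line with no colon (for example "none") is left untouched.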
Then mark all lines that begin with a # that's followed by anything other than a *:

Search | Find... (<F5>):
Find what: ^#[^*]

[X] Regular expression

Mark All
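
The marking step can be sketched in Python as well. This assumes the goal is to keep only the marked lines, which matches the request to strip lines that do not begin with a label; MARK and labelled_lines are hypothetical names, and the sample text is a hand-made excerpt of what the tagged file might look like.

```python
import re

# Same idea as the "Find what" pattern above: a line that starts with '#'
# followed by any character other than '*'.
MARK = re.compile(r"^#[^*\n]")


def labelled_lines(text: str) -> list[str]:
    """Return only the lines that would be marked by the search."""
    return [line for line in text.splitlines() if MARK.match(line)]


tagged_text = "\n".join([
    "Lastname, Firstname",
    "#*Address Information-- Mailing Address:1",
    "#Zip:nnnnn",
    "none",
    "#Email:account@domain.com",
])

kept = labelled_lines(tagged_text)
for line in kept:
    print(line)
```

The untagged lines and the "#*" line are dropped, so only lines that begin with a tagged label remain for the Excel import.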

Terrific!

Posted: Fri Apr 27, 2012 3:49 pm
by mdh
Thank you, very much.