find each field label

mdh · Post by **mdh** » Fri Apr 27, 2012 12:49 pm

I have a large text file with thousands of entries like the one below and need to identify each field label, ideally with a single regular expression. Each label is a term followed by a colon, two terms separated by a space followed by a colon, or two terms separated by a comma and a space followed by a colon.

Note that multiple spaces exist at the beginning of lines, between label:value pairs, and after label:value pairs.

Lastname, Firstname
id:44486000002443 library:ABC
*Address Information-- Mailing Address:1
address1:
Daytime phone:nnn-nnn-nnnn
Line:Green
Street:P.O. Box nnn
City, state:City, UU.
Zip:nnnnn
Email:account@domain.com
Phone:nnn-nnn-nnnn
address2:
none

address3:
none

*Extended Information--
none
Profile:PUBLIC status:OK

bills:0 charges:0 holds:0
number of history charges:0
total bills:6 total charges:137
created:12/4/2001 last use:5/5/2007 priv granted:12/2/2009
priv expired:12/2/2011
cat1:FEMALE cat2: cat3:
cat4: cat5:
Profile:PUBLIC user access:PUBLIC environment:PUBLIC
dept:

ben_josephs · Post by **ben_josephs** » Fri Apr 27, 2012 1:20 pm

You haven't provided enough information.

What do you want to do with the labels? Please give examples of the text before and after the change you require. Enclose them in [code]...[/code] tags.

What is the label in the line
*Address Information-- Mailing Address:1
?

mdh · Post by **mdh** » Fri Apr 27, 2012 1:56 pm

Ultimately I need to import this into Excel, so a \n before each label would be OK, plus some unique character before each label so that I can strip all lines that do not begin with a label.

The asterisked line will be ignored; the address# lines will be used.

Thank you for your consideration.

ben_josephs · Post by **ben_josephs** » Fri Apr 27, 2012 2:45 pm

Using # as the special character:

Use POSIX regular expression syntax:

Configure | Preferences | Editor

[X] Use POSIX regular expression syntax

Search | Replace... (<F8>):

Find what: [^:]+:[^ ]* *
Replace with: \n#\0

[X] Regular expression

Replace All

Then mark all lines that begin with a # that's followed by anything other than a *:

Search | Find... (<F5>):

Find what: ^#[^*]

[X] Regular expression

Mark All
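
For anyone who would rather script this outside the editor, the same two-step idea can be sketched in Python. This is only a rough equivalent, not the editor's own behaviour. The function names `mark_labels` and `labelled_lines` are made up for illustration. `\n` is excluded from the character classes because Python's negated classes would otherwise match across line ends. Leading spaces are dropped before the `#` is added, which is a deviation from the expression above, so that the `#[^*]` test sees the first real character of the label.

```python
import re

# Rough Python counterpart of "Find what: [^:]+:[^ ]* *".
# Assumptions: \n is excluded from both classes, and leading spaces are
# matched outside the group so they are not copied after the '#'.
LABEL_VALUE = re.compile(r" *([^:\n]+:[^ \n]* *)")


def mark_labels(text: str) -> str:
    """Put each label:value pair on its own line, prefixed with '#'."""
    return LABEL_VALUE.sub(lambda m: "\n#" + m.group(1), text)


def labelled_lines(text: str) -> list[str]:
    """Keep only lines that start with '#' followed by anything but '*'."""
    marked = mark_labels(text)
    return [
        line.rstrip()
        for line in marked.split("\n")
        if re.match(r"#[^*]", line)
    ]
```

Calling `labelled_lines()` on the contents of the file returns the marked lines in order, with the asterisked heading lines and the bare `none` lines left out.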

mdh · Post by **mdh** » Fri Apr 27, 2012 3:49 pm

Thank you, very much.
