I have a large text file with thousands of entries like the following. I need to identify each field label, and would like a single regular expression that finds them all. Each label is a term followed by a colon, or two terms separated by a space and followed by a colon, or two terms separated by a comma and a space and followed by a colon.
Note: multiple spaces occur at the beginning of lines, between label:value pairs, and after label:value pairs.
Lastname, Firstname
id:44486000002443 library:ABC
*Address Information-- Mailing Address:1
address1:
Daytime phone:nnn-nnn-nnnn
Line:Green
Street:P.O. Box nnn
City, state:City, UU.
Zip:nnnnn
Email:account@domain.com
Phone:nnn-nnn-nnnn
address2:
none
address3:
none
*Extended Information--
none
Profile:PUBLIC status:OK
bills:0 charges:0 holds:0
number of history charges:0
total bills:6 total charges:137
created:12/4/2001 last use:5/5/2007 priv granted:12/2/2009
priv expired:12/2/2011
cat1:FEMALE cat2: cat3:
cat4: cat5:
Profile:PUBLIC user access:PUBLIC environment:PUBLIC
dept:
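To check the rule stated in the question against the sample, here is a minimal Python sketch of it. It assumes a "term" is a run of letters and digits. That is an assumption, because the question does not define "term". A label must start at the beginning of a line or after whitespace. On the sample, the literal rule only partly matches a longer label such as "number of history charges:", where it returns just the last two words. The data therefore has labels longer than two terms.

```python
import re

# Literal reading of the rule in the question (assumption: a "term" is a run of
# letters/digits). A label is one term, or two terms joined by " " or ", ",
# immediately followed by a colon. (?<!\S) makes the label start at the
# beginning of the text/line or right after whitespace, not inside a value.
LABEL = re.compile(r"(?<!\S)[A-Za-z0-9]+(?:,? [A-Za-z0-9]+)?:")

sample = """\
  id:44486000002443   library:ABC
  Daytime phone:nnn-nnn-nnnn
  City, state:City, UU.
  created:12/4/2001   last use:5/5/2007   priv granted:12/2/2009
  number of history charges:0
  cat1:FEMALE   cat2:   cat3:
"""

labels = LABEL.findall(sample)
print(labels)
```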
find each field label
-
ben_josephs
- Posts: 2464
- Joined: Sun Mar 02, 2003 9:22 pm
possible output
Ultimately I need to import the data into Excel, so inserting a \n before each label would be fine, along with some unique character before each label so that I can strip every line that does not begin with a label.
The asterisked lines will be ignored; the address# lines will be used.
Thank you for your consideration.
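For the Excel step, one possible approach is to write each label and its value to a CSV file, which Excel can open. The sketch below assumes two things. First, each kept line starts with the unique marker character. It uses '#' here, which is an assumption. Second, the line looks like "#label:value". The layout of one label/value pair per row is also an assumption, since the question does not say how the columns should be arranged. The label ends at the first colon, and everything after it is the value.

```python
import csv
import io

def to_csv_rows(marked: list[str]) -> list[list[str]]:
    # Assumes each line is "#label:value" with '#' as the (hypothetical) marker.
    # partition() splits at the FIRST colon only, so values may contain spaces
    # or commas (e.g. "P.O. Box nnn", "City, UU.").
    rows = []
    for line in marked:
        label, _, value = line[1:].partition(":")
        rows.append([label.strip(), value.strip()])
    return rows

marked = ["#id:44486000002443   ", "#Street:P.O. Box nnn", "#City, state:City, UU."]

# Write to an in-memory buffer for the demo. csv.writer quotes fields that
# contain a comma, so "City, state" stays a single cell in Excel.
buffer = io.StringIO()
csv.writer(buffer).writerows(to_csv_rows(marked))
print(buffer.getvalue())
```

For a real file, use `open("fields.csv", "w", newline="")` in place of the `StringIO` buffer. The filename is only an example. The `newline=""` argument is what the `csv` module documentation recommends.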
ben_josephs
Using # as the special character:
Use "Posix" regular expression syntax:
Configure | Preferences | Editor
[X] Use POSIX regular expression syntax

Search | Replace... (<F8>):
Find what: [^:]+:[^ ]* *
Replace with: \n#\0
[X] Regular expression
Replace All

Then mark all lines that begin with a # that's followed by anything other than a *:
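Outside TextPad, the same replace can be sketched in Python with the identical pattern. Here `[^:]+` is the label, `[^ ]*` is the value up to the first space, and ` *` is the trailing spaces. Python spells TextPad's `\0` (the whole match) as `\g<0>`. The sketch applies the pattern one line at a time. It assumes that TextPad's search does not let `[^:]` run across a line break. In Python, `[^:]` does match `\n`.

```python
import re

# Same pattern as the TextPad "Find what" field above.
PAIR = re.compile(r"[^:]+:[^ ]* *")

def mark_labels(text: str) -> str:
    # Work line by line so [^:]+ cannot swallow a newline (Python's [^:] matches
    # "\n"). Each label:value match gets "\n#" in front of it; text after the
    # last match on a line (e.g. "Box nnn") stays on that marked line.
    return "\n".join(PAIR.sub(r"\n#\g<0>", line) for line in text.splitlines())

sample = "id:44486000002443   library:ABC\nStreet:P.O. Box nnn\nnone"
print(mark_labels(sample))
```

Lines with no colon, such as "none", pass through unchanged.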
Search | Find... (<F5>):
Find what: ^#[^*]
[X] Regular expression
Mark All
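The "Mark All" step keeps only lines that start with # followed by something other than *. The sketch below mirrors that step in Python on text already converted by the replace above. `re.match` anchors at the start of each line, so it plays the role of `^`. The function and variable names are only illustrative.

```python
import re

# Equivalent of "Find what: ^#[^*]": '#' at the start of the line, followed by
# any character except '*'. This drops the "#*Address Information--" style lines
# and every line that was not given a '#' marker.
KEEP = re.compile(r"#[^*]")

def marked_lines(text: str) -> list[str]:
    # re.match only tries position 0 of each line, acting as the ^ anchor.
    return [line for line in text.splitlines() if KEEP.match(line)]

converted = "\n#*Address Information-- Mailing Address:1\n#address1:\n#Line:Green\nnone\n#Profile:PUBLIC "
print(marked_lines(converted))
```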