find each field label

General questions about using TextPad

mdh
Posts: 3
Joined: Fri Apr 27, 2012 12:42 pm

Post by mdh »

I have a large text file with thousands of entries like the following, and I need to identify each field label. I would like a single regular expression that finds them all. Each label is a term followed by a colon, two terms separated by a space followed by a colon, or two terms separated by a comma and a space followed by a colon.

Note that multiple spaces exist at the beginning of lines, between label:value pairs, and after label:value pairs.

Lastname, Firstname
id:44486000002443 library:ABC
*Address Information-- Mailing Address:1
address1:
Daytime phone:nnn-nnn-nnnn
Line:Green
Street:P.O. Box nnn
City, state:City, UU.
Zip:nnnnn
Email:account@domain.com
Phone:nnn-nnn-nnnn
address2:
none

address3:
none

*Extended Information--
none
Profile:PUBLIC status:OK

bills:0 charges:0 holds:0
number of history charges:0
total bills:6 total charges:137
created:12/4/2001 last use:5/5/2007 priv granted:12/2/2009
priv expired:12/2/2011
cat1:FEMALE cat2: cat3:
cat4: cat5:
Profile:PUBLIC user access:PUBLIC environment:PUBLIC
dept:
ben_josephs
Posts: 2464
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

You haven't provided enough information.

What do you want to do with the labels? Please give examples of the text before and after the change you require. Enclose them in [code]...[/code] tags.

What is the label in the line
*Address Information-- Mailing Address:1
?
mdh
Posts: 3
Joined: Fri Apr 27, 2012 12:42 pm

possible output

Post by mdh »

Ultimately I need to import this into Excel, so a \n before each label would be OK, along with some unique character before each label so that I can strip all lines that do not begin with a label.

The line with the asterisk will be ignored; the address# lines will be used.

Thank you for your consideration.
ben_josephs
Posts: 2464
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Using # as the special character:

Use "Posix" regular expression syntax:

Configure | Preferences | Editor
[X] Use POSIX regular expression syntax

Search | Replace... (<F8>):
Find what: [^:]+:[^ ]* *
Replace with: \n#\0
[X] Regular expression
Replace All

Then mark all lines that begin with a # that's followed by anything other than a *:

Search | Find... (<F5>):
Find what: ^#[^*]
[X] Regular expression
Mark All
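
For anyone who wants to try the same two steps outside TextPad, here is a rough Python sketch. It is only an illustration under assumptions: the names LABEL_VALUE, split_labels, and labelled_lines are made up, the sample text is abbreviated from the first post, and newlines are excluded from the character classes explicitly because Python's [^:] would otherwise match across line breaks (this is assumed to mirror TextPad matching one line at a time).

```python
import re

# Same idea as "Find what: [^:]+:[^ ]* *", but with \n excluded so that a
# match never runs across a line break (an assumption about TextPad's
# line-by-line behaviour).
LABEL_VALUE = re.compile(r"[^:\n]+:[^ \n]* *")


def split_labels(text: str) -> str:
    """Step 1: put a newline and a '#' in front of every label:value match."""
    return LABEL_VALUE.sub(lambda m: "\n#" + m.group(0), text)


def labelled_lines(text: str) -> list[str]:
    """Step 2: keep only lines that start with '#' followed by anything but '*'."""
    return [
        line
        for line in split_labels(text).splitlines()
        if re.match(r"#[^*]", line)
    ]


# Abbreviated sample with extra spaces, as described in the first post.
sample = (
    "  id:44486000002443   library:ABC  \n"
    "*Address Information-- Mailing Address:1\n"
    "  City, state:City, UU.\n"
)

for line in labelled_lines(sample):
    print(line)
```

As in the TextPad version, leading spaces end up inside the match, so a kept line can start with "#" followed by spaces. A value that itself contains a colon (a time of day, for example) would be split as if it were another label, so check the data for that before relying on it.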
mdh
Posts: 3
Joined: Fri Apr 27, 2012 12:42 pm

Terrific!

Post by mdh »

Thank you, very much.