I need to extract email addresses from a large, messy text file. The addresses sit between various tags, but there is little consistency and there are no line breaks. Here is an example:
What I have:
---------------------------------------------------
aage.something@clp.noKommunal LandspensjonskasseInsurance Company<_tags/>Asomething@Summitsoemthing.comAngeloSomething Biz MgmtMulti-Dweller OfficeUSARoswellGA(622) 355-3116<_tags/>aa_se@macalusterinstitution.eduMacalester CollegeUniversity<_tags/>cd@bidart-reiman.com
What I need (comma or tab delimited):
---------------------------------------------------
aage.something@clp.no, Asomething@Summitsoemthing.com, aa_se@macalusterinstitution.edu, cd@bidart-reiman.com
Any suggestions for a regex that would get me there (or close to)?
Thanks.
Extract Email addresses
- Bob Hansen
Here is an approach based on the simple sample file you supplied.
Take two steps:
1. Insert a line break "\n" after each tag so that each email address starts at the beginning of a line.
Search for: <_tags/>
Replace with: \0\n
2. Extract the email address from the front.
Search for: ^(.*@.*\.[a-z]{1,3}[^[:upper:]]).*
Replace with: \1
You can then replace each line break (search for \n) with a comma and a space, or with a tab.
Use the following settings:
-----------------------------------------
[X] Match case
[X] Regular expression
Replace All
-----------------------------------------
Configure | Preferences | Editor
[X] Use POSIX regular expression syntax
-----------------------------------------
Email addresses can be more complex, and there are many more elaborate search strings to consider, but, as noted above, if yours match the format shown in your example, this will probably be OK for you.
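If you ever need to do the same thing outside TextPad, the two steps can be sketched in Python roughly as follows. This is only an approximation of the procedure above, not a tested replacement for it: the name extract_emails and the constants are made up for illustration, and [^[:upper:]] is written as [^A-Z], which only covers ASCII capitals. Like the TextPad version, it relies on the text right after each address starting with a capital letter (or the line ending).

```python
import re

TAG = "<_tags/>"

# Same idea as the TextPad search string: everything up to a dot, one to three
# lowercase letters, and one more character that is not an ASCII capital.
EMAIL_AT_LINE_START = re.compile(r"^(.*@.*\.[a-z]{1,3}[^A-Z]).*")


def extract_emails(text):
    """Return the address found at the start of each tag-delimited chunk."""
    emails = []
    # Step 1: put a line break after every tag so each address starts a line.
    for line in text.replace(TAG, TAG + "\n").splitlines():
        # Step 2: keep only the address at the front of the line.
        match = EMAIL_AT_LINE_START.match(line)
        if match:
            emails.append(match.group(1))
    return emails


sample = (
    "aage.something@clp.noKommunal LandspensjonskasseInsurance Company<_tags/>"
    "Asomething@Summitsoemthing.comAngeloSomething Biz MgmtMulti-Dweller "
    "OfficeUSARoswellGA(622) 355-3116<_tags/>"
    "aa_se@macalusterinstitution.eduMacalester CollegeUniversity<_tags/>"
    "cd@bidart-reiman.com"
)

# Join with ", " for comma-delimited output, or "\t" for tab-delimited.
print(", ".join(extract_emails(sample)))
```

On the sample from the question this prints the four addresses separated by commas. Chunks with no "@" are skipped, and an address followed directly by lowercase text, or with a dot later in the same chunk, would still come out wrong, just as in TextPad.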
Hope this was helpful.............good luck,
Bob