Multi Line Regex

Ideas for new features

RonStecher
Posts: 3
Joined: Mon Mar 28, 2011 1:13 am
Location: Australian

Multi Line Regex

Post by RonStecher »

TextPad claims to support multiline regular expressions, but it doesn't really, at least not when the match spans an undetermined number of lines.
A simple example is that you might want to remove long comments from your JavaScript. Comments like this:

/* Derive random numbers
If the array has 438 elements, we need to generate random numbers
from 0 to 437. The Math.random method does just that. */

First we would search for /* with the asterisk escaped: /\*
Then one or more characters that are not an asterisk: [^\*]{1,}
Then finally an asterisk and a slash: \*/
The full expression is: /\*[^\*]{1,}\*/
In UltraEdit this exact expression works instantly using UltraEdit's Perl regex syntax.
For years I've been trying to find a way to make it work in TextPad, but it just doesn't.
Almost exactly the same syntax works in Microsoft Word:
/\*[!\*]{1,}\*/ (the only difference is that ! replaces ^ as the negator)

It really should work in TextPad.
A few Google searches show that there are a few frustrated TextPad users who wish that it would do what I'm asking.
Please enhance TextPad so that multiline regexes are possible.
To be quite precise, it's not an enhancement request. It's a bug report.
If TextPad claims to be able to do multiline regex and it can't, then that's a bug. Please fix.
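
For readers who want to try the expression outside any editor, here is a minimal sketch in Python. Python's re module is only a stand-in for a Perl-style engine; it is not TextPad's or UltraEdit's engine, and the surrounding JavaScript lines and variable names are made up for the illustration.

```python
import re

# The pattern from the post: "/*", one or more non-asterisk characters, "*/".
# A negated character class such as [^\*] also matches newlines, which is
# what lets the match span several lines.
SIMPLE_COMMENT = re.compile(r"/\*[^\*]{1,}\*/")

# Hypothetical source text wrapped around the comment quoted in the post.
source = """var n = 438;
/* Derive random numbers
If the array has 438 elements, we need to generate random numbers
from 0 to 437. The Math.random method does just that. */
var r = Math.floor(Math.random() * n);
"""

# Remove every comment the pattern matches.
stripped = SIMPLE_COMMENT.sub("", source)
print(stripped)
```

In this engine the three-line comment is removed in a single substitution, which is the behaviour being asked for.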
ben_josephs
Posts: 2456
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

The deficiency of TextPad's old regex engine has been remarked on many times. As you point out, its handling of newlines is very weak. In particular, it doesn't allow a \n to be quantified with * or + or {m,n}, so it's incapable of matching an arbitrary number of newlines. Helios's own WildEdit (http://www.textpad.com/products/wildedit/) doesn't suffer from these deficiencies.

BTW, your regex is incorrect. It doesn't match, for example:
/** a common style of comment **/
This is better:
/\*([^*]|\*+[^/*])*\*+/
(There are other ways to do this.)
(See
Friedl, Jeffrey E F: Mastering Regular Expressions, 3rd ed.
O'Reilly, 2006. ISBN: 0-596-52812-4
http://regex.info/
)
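
The difference between the two expressions can be checked with a short sketch, again using Python's re module as an assumed stand-in for a Perl-style engine. The names SIMPLE and ROBUST and the third sample string are introduced here for illustration only.

```python
import re

# Original expression from the first post.
SIMPLE = re.compile(r"/\*[^\*]{1,}\*/")
# Expression suggested in the reply: the body is either a non-star character,
# or a run of stars followed by a character that is neither "/" nor "*".
ROBUST = re.compile(r"/\*([^*]|\*+[^/*])*\*+/")

samples = [
    "/** a common style of comment **/",
    "/* plain comment */",
    "/* a * b */",  # hypothetical comment with a star in its body
]

# For each sample, record whether each pattern matches the whole string.
results = [
    (s, SIMPLE.fullmatch(s) is not None, ROBUST.fullmatch(s) is not None)
    for s in samples
]

for text, simple_ok, robust_ok in results:
    print(f"{text!r}: simple={simple_ok} robust={robust_ok}")
```

The original expression only matches the plain comment, while the suggested one matches all three.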
RonStecher
Posts: 3
Joined: Mon Mar 28, 2011 1:13 am
Location: Australian

Post by RonStecher »

Ben,
I was aware that my expression was flawed. It won't work if there are instances of / or * within the body of the comment. At that point I had not yet determined a way to overcome this flaw.
I compliment you for your ingenious solution. If you came up with that yourself, then I'm really impressed. If your solution is essentially straight out of the book, then I still would give you praise for having studied the book and mastered the subtle and challenging concepts therein. Regular expressions are possibly an area that many in the software development community are somewhat afraid to embrace, so full credit to you.
In the light of your ingenuity, I'd offer that your solution can be simplified to:
/\*([^*]|\*)*\*/

The book sounds good. Thanks for the tip.

I don't want WildEdit because it seems to be a group file editor, not a text editor or code editor.

What I want is for TextPad to deliver the multiline regex capability that it claims to have. In other words, I think Helios should fix the bug.
ben_josephs
Posts: 2456
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Matching C-style comments is a standard problem, so I checked my solution against the standard reference, in which Friedl offers an extended discussion leading to a yet more complex but faster solution.

The parenthesised subexpression in your suggestion matches a character that is either not a star or a star, that is, any character. So the whole is equivalent to
/\*.*\*/
which fails on
/* a comment */ not_a_comment /* another comment */
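
The over-matching can be reproduced with a small sketch in Python (an assumed stand-in for a Perl-style engine; the names SIMPLIFIED and ROBUST are introduced here for illustration).

```python
import re

# The simplified expression: the group matches any character at all,
# so the greedy repetition runs on to the last "*/" in the text.
SIMPLIFIED = re.compile(r"/\*([^*]|\*)*\*/")
# The earlier suggestion, which cannot step over a closing "*/".
ROBUST = re.compile(r"/\*([^*]|\*+[^/*])*\*+/")

line = "/* a comment */ not_a_comment /* another comment */"

# The simplified pattern swallows the code between the two comments.
greedy_match = SIMPLIFIED.search(line).group(0)

# finditer is used rather than findall because the pattern contains a
# capturing group; group(0) is the full text of each match.
robust_matches = [m.group(0) for m in ROBUST.finditer(line)]

print(greedy_match)
print(robust_matches)
```

With the simplified expression a replace-all would delete not_a_comment along with both comments; the earlier expression finds the two comments separately.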
jeffy
Posts: 323
Joined: Mon Mar 03, 2003 9:04 am
Location: Philadelphia

Post by jeffy »

Version 7 now uses Perl regex, so this is now a non-issue.