Origin of number sequence generation in replacement format strings?


Moderators: AmigoJack, bbadmin, helios, Bob Hansen, MudGuard

Shepazu
Posts: 2
Joined: Wed Nov 27, 2024 8:00 pm

Post by Shepazu

I've always been a fan of TextPad's elegant extension of RegEx in generating sequential numbers, using the \i expression syntax.

I found more details elsewhere in this forum (viewtopic.php?p=43052#p43052):

\i Replace with numbers starting from 1, incrementing by 1.
\i{10} Replace with numbers starting from 10, incrementing by 1.
\i{0,10} Replace with numbers starting from 0, incrementing by 10.
\i{100,-10} Replace with numbers starting from 100, decrementing by 10.
\i{1,1,3,0} Replace with numbers starting from 1, incrementing by 1. The numbers will be right justified in a width of 3 characters, zero filled.
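Outside TextPad, the usual workaround for this is a replacement callback. Below is a minimal Python sketch, assuming the four parameters behave as the list above describes (start, step, field width, fill character). `sequence_replacer` is a hypothetical helper, not part of TextPad or of Python's `re` module, and it makes no attempt to reproduce how TextPad handles edge cases such as negative numbers with zero fill.

```python
import re
from typing import Callable


def sequence_replacer(start: int = 1, step: int = 1, width: int = 0,
                      fill: str = " ") -> Callable[[re.Match], str]:
    r"""Hypothetical stand-in for TextPad's \i{start,step,width,fill}.

    Returns a callback for re.sub that emits start, start+step, ...
    right-justified in `width` characters, padded on the left with `fill`.
    """
    counter = start

    def replace(_match: re.Match) -> str:
        nonlocal counter
        value = counter
        counter += step
        # str.rjust requires `fill` to be exactly one character.
        return str(value).rjust(width, fill)

    return replace


# Mimicking \i{1,1,3,0} at the start of each line: 001a, 002b, 003c
print(re.sub(r"^", sequence_replacer(1, 1, 3, "0"), "a\nb\nc", flags=re.M))
# Mimicking \i{100,-10}: "100 90 80"
print(re.sub(r"x", sequence_replacer(100, -10), "x x x"))
```

Because the counter lives in the closure, each call to `sequence_replacer` starts a fresh sequence; this sketch assumes that corresponds to one Replace All pass in TextPad.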


I've seen clumsy scripts that do something like this for other editors, but I've never seen another editor that supported this natively.

My question is: Where did this come from? Was it based on some previous implementation, or did Helios just invent this?

It doesn't seem to be in the IEEE RegEx standard, or in Perl Compatible Regular Expressions (PCRE).

Has it ever been formalized anywhere as a standard? Or is there an equivalent syntax that does the same thing elsewhere?

I'm asking because I'm working on a new standard specification for some accessibility features for people with disabilities, and it turns out I need something like this for expressing the incrementation of indexes.

I'm also just curious where it came from, and why it isn't more widely adopted. Would Helios be open to seeing this standardized?
bbadmin
Site Admin
Posts: 948
Joined: Mon Feb 17, 2003 8:54 pm

Re: Origin of number sequence generation in replacement format strings?

Post by bbadmin

It's good to hear that you like that feature to generate sequence numbers in TextPad. Although that replacement expression is our creation, we can't prevent anyone from re-implementing it in another context, but it would be nice to get the credit if they did.
AmigoJack
Posts: 568
Joined: Sun Oct 30, 2016 4:28 pm
Location: グリーン ヒル ゾーン

Re: Origin of number sequence generation in replacement format strings?

Post by AmigoJack

There are far more replacement expressions which are all explained in the help under Reference Information > Replacement Format Strings:
Replacement Format Strings
Replacement format strings are used to substitute text in conjunction with Regular Expressions, when using the Replace command.

These format strings treat all characters as literals except for '$', '\', '(', ')', '?', and ':'. To output any of those characters as a literal, precede it with a '\'.
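As a small illustration of that escaping rule, here is a Python sketch that prepares arbitrary text for use as a literal inside a replacement format string. `escape_format_literal` is a hypothetical helper; it covers only the six characters listed above and assumes nothing else needs escaping.

```python
# The six characters the help file lists as special in format strings.
SPECIAL_FORMAT_CHARS = frozenset("$\\()?:")


def escape_format_literal(text: str) -> str:
    """Hypothetical helper: prefix each special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_FORMAT_CHARS else ch for ch in text)


print(escape_format_literal("cost: $5 (approx)?"))  # cost\: \$5 \(approx\)\?
```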

Grouping
The characters '(' and ')' perform lexical grouping, so use \( and \) if you want to output literal parentheses.

Conditionals
The character '?' begins a conditional expression; the general form is:

?Ntrue-expression:false-expression
where N is a decimal digit.

If sub-expression N was matched, then true-expression is evaluated and sent to output, otherwise false-expression is evaluated and sent to output.

You will normally need to surround a conditional expression with parentheses in order to prevent ambiguities.

For example, the format string "(?1foo:bar)" will replace each match found with "foo" if the sub-expression $1 was matched, and with "bar" otherwise.

For sub-expressions with an index greater than 9, or for access to named sub-expressions use:

?{INDEX}true-expression:false-expression
or

?{NAME}true-expression:false-expression
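In a language without this syntax, the same choice can be made inside a replacement callback. The Python sketch below assumes that "matched" means the group took part in the match (so an empty capture still counts), which is how Python reports it: `match.group(n)` is `None` only when group n did not participate. `conditional` is a hypothetical helper; passing a group name instead of a number covers the ?{NAME} form.

```python
import re
from typing import Callable, Union


def conditional(group: Union[int, str], true_text: str,
                false_text: str) -> Callable[[re.Match], str]:
    """Hypothetical stand-in for (?Ntrue:false), (?{INDEX}...) and (?{NAME}...)."""
    def replace(match: re.Match) -> str:
        # None means the group did not take part in this match.
        return true_text if match.group(group) is not None else false_text
    return replace


# Similar in spirit to searching for (a)?b and replacing with (?1foo:bar)
print(re.sub(r"(a)?b", conditional(1, "foo", "bar"), "ab b"))  # foo bar
print(re.sub(r"(?P<sign>-)?\d+", conditional("sign", "neg", "pos"), "-3 4"))  # neg pos
```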

Placeholder Sequences
Placeholder sequences specify that some part of what matched the regular expression should be sent to output as follows:
  • $&
    Outputs what matched the whole expression.
  • $MATCH
    As $&
  • ${^MATCH}
    As $&
  • $`
    Outputs the text between the start of the text and the start of the current match.
  • $PREMATCH
    Same as $`
  • ${^PREMATCH}
    Same as $`
  • $'
    Outputs all the text following the end of the current match.
  • $POSTMATCH
    As $'
  • ${^POSTMATCH}
    As $'
  • $+
    Outputs what matched the last marked sub-expression in the regular expression.
  • $LAST_PAREN_MATCH
    As $+
  • $LAST_SUBMATCH_RESULT
    Outputs what matched the last sub-expression to be actually matched.
  • $^N
    As $LAST_SUBMATCH_RESULT
  • $$
    Outputs a literal '$'
  • $n
    Outputs what matched the n'th sub-expression.
  • ${n}
    Outputs what matched the n'th sub-expression.
  • $+{NAME}
    Outputs whatever matched the sub-expression named "NAME".
Any $-placeholder sequence not listed above results in '$' being treated as a literal.
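Python's `re` module does not understand these $-placeholders, but several of them correspond directly to parts of a match object. The sketch below maps a subset ($&, $`, $', $n and $+{NAME}) for a single match. `placeholders` is a hypothetical helper, and rendering an unmatched group as an empty string is an assumption here, not documented TextPad behavior.

```python
import re
from typing import Dict


def placeholders(match: re.Match) -> Dict[str, str]:
    """Hypothetical helper: what a few $-placeholders would output for `match`."""
    text = match.string
    values = {
        "$&": match.group(0),           # the whole match
        "$`": text[:match.start()],     # text before the match
        "$'": text[match.end():],       # text after the match
    }
    # $1, $2, ...: numbered groups; an unmatched group becomes "" here.
    for n in range(1, match.re.groups + 1):
        values[f"${n}"] = match.group(n) or ""
    # $+{NAME}: named groups.
    for name in match.re.groupindex:
        values[f"$+{{{name}}}"] = match.group(name) or ""
    return values


m = re.search(r"(?P<word>b+)(c)?", "aabbdd")
print(placeholders(m))
```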

Escape Sequences
An escape character followed by any character x, outputs that character unless x is one of the escape sequences shown below.
  • \a
    Outputs the bell character: '\a'.
  • \e
    Outputs the ANSI escape character (code point 27).
  • \f
    Outputs a form feed character: '\f'.
  • \i or
    \i{n,i,w,f}
    Outputs a sequence number, starting from n, incremented by i, in a field width of w, with leading spaces filled by the character f. Note that \i is equivalent to \i{1,1,0, }.
  • \n
    Outputs the document's line ending character sequence. (Not just \n)
  • \p
    Outputs the contents of the clipboard.
  • \r
    Outputs the document's line ending character sequence. (Not just \r)
  • \s{n,c}
    Replaces each character in sub-expression $n with character c. Use this where you need to replace a variable number of characters with the same number of copies of character c.
  • \t
    Outputs a tab character: '\t'.
  • \v
    Outputs a vertical tab character: '\v'.
  • \xDD
    Outputs the character whose hexadecimal code point is 0xDD.
  • \x{DDDD}
    Outputs the character whose hexadecimal code point is 0xDDDD.
  • \cX
    Outputs the ANSI escape sequence "escape-X".
  • \D
    If D is a decimal digit in the range 1-9, then outputs the text that matched sub-expression D.
  • \l
    Causes the next character from the format expression to be output in lower case.
  • \u
    Causes the next character from the format expression to be output in upper case.
  • \L
    Causes all subsequent characters to be output in lower case, until a \E is found.
  • \U
    Causes all subsequent characters to be output in upper case, until a \E is found.
  • \E
    Terminates a \L or \U sequence.
The help file page for TextPad 8.4.0 says at the bottom:
Copyright © 1998-2010 John Maddock. Parts Copyright © 2013 Helios Software Solutions Ltd
Most likely this is because TextPad uses (or used?) Boost/Regex++ as its engine.
Shepazu

Re: Origin of number sequence generation in replacement format strings?

Post by Shepazu

Thanks, AmigoJack, I appreciate the detailed answer.

I'm specifically interested in who came up with the `\i` syntax. I tracked down John Maddock (who's now a luthier!) and he thinks that was original to TextPad, not a feature of RegEx++.