Can someone explain this

Posted: Fri Feb 05, 2010 4:42 pm
by louiscar
Hi,

I've found something odd. The following expression matches multiple closing square brackets, and I can't seem to stop it by putting a ? after the \] (i.e. \]?):

<!\[CDATA\[(.+)\]>

For instance, it matches all of these:

<![CDATA[Eastborne Marina]>
<![CDATA[Eastborne Marina]]>
<![CDATA[Eastborne Marina]]]]]>

The regex setting is POSIX, btw.

I don't understand why this happens, since it doesn't seem to happen with other characters.

Can someone explain why \] appears to be a special case?

Posted: Fri Feb 05, 2010 7:53 pm
by ben_josephs
It isn't a special case, and \] doesn't match more than one bracket.
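One way to check this is to print the capture group for each of the three sample lines. This is a minimal sketch that uses Python's re module as a stand-in for the POSIX engine in the question (an assumption; the greedy behaviour of + is the same for this pattern):

```python
import re

# The original pattern as a Python raw string.
# Assumption: Python's re stands in for the POSIX engine used in the question.
ORIGINAL = re.compile(r"<!\[CDATA\[(.+)\]>")

samples = [
    "<![CDATA[Eastborne Marina]>",
    "<![CDATA[Eastborne Marina]]>",
    "<![CDATA[Eastborne Marina]]]]]>",
]

for s in samples:
    m = ORIGINAL.search(s)
    # Group 1 shows where the extra brackets end up: inside (.+),
    # while \] matches only the single bracket just before >.
    print(repr(m.group(1)))
```

This prints 'Eastborne Marina', 'Eastborne Marina]' and 'Eastborne Marina]]]]'. The surplus brackets belong to the capture, not to \].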

The repetition operators are greedy: they match as much as possible, while still allowing the rest of the regular expression to match. So the subexpression .+ matches as many characters as possible, as long as a ]> can still follow. That is, the .+ in your regex matches everything between CDATA[ and the last ].
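To compare a few variants on the last sample line, here is a small sketch, again assuming Python's re as a stand-in. The lazy quantifier +? is a Perl/Python extension and is not available in POSIX regexes. In these strings ]> occurs only once, at the very end, so even the lazy form has to absorb the extra brackets; they can enter the capture at all only because . also matches ]. The \]? variant merely makes the bracket optional, which lets .+ keep one more bracket:

```python
import re

text = "<![CDATA[Eastborne Marina]]]]]>"

patterns = {
    # Greedy: .+ takes as much as it can, then gives back just enough for \]>.
    "greedy": r"<!\[CDATA\[(.+)\]>",
    # Lazy (Perl/Python syntax, not POSIX): .+? takes as little as it can.
    "lazy": r"<!\[CDATA\[(.+?)\]>",
    # \]? makes the bracket optional; it does not make .+ lazy.
    "optional": r"<!\[CDATA\[(.+)\]?>",
}

for name, pat in patterns.items():
    print(name, repr(re.search(pat, text).group(1)))
```

Greedy and lazy both capture 'Eastborne Marina]]]]'. The optional-bracket version captures all five brackets.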

Try
<!\[CDATA\[([^]]+)\]>
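The suggested pattern can be tried on the three sample lines with this sketch, which assumes Python's re as a stand-in for POSIX. Like POSIX, Python treats a ] placed first in a bracket expression (right after ^) as a literal, so [^]] works as written. It matches any single character except ], so the capture can never run past the first ]:

```python
import re

# [^]] = any character except ]; the leading ] is literal in both POSIX and Python.
FIXED = re.compile(r"<!\[CDATA\[([^]]+)\]>")

samples = [
    "<![CDATA[Eastborne Marina]>",
    "<![CDATA[Eastborne Marina]]>",
    "<![CDATA[Eastborne Marina]]]]]>",
]

for s in samples:
    m = FIXED.search(s)
    # None means the line does not match at all.
    print(repr(m.group(1)) if m else None)
```

The first line yields 'Eastborne Marina'. The other two no longer match, because after the first ] the pattern requires > but finds another ].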