regex to find unmatched quote marks

Posted: Thu Jun 24, 2021 7:04 pm
by jazzastronomer
Just as an opening bracket "(" needs to be paired with a closing bracket ")", I need similar functionality for quote marks.

The file consists of lines like this:

question <tab> answer
question <tab> answer
question <tab> answer

For each line, highlight any quote mark that is not part of an open/close pair.

I am dealing with a text file format that is intolerant of unclosed quote marks. On each line, every opening quote mark must have a matching closing quote mark.

I am presently just highlighting all quote mark types and manually checking them.

Posted: Thu Jun 24, 2021 10:06 pm
by ben_josephs
If you search using this:

Code:

^[^"\n]*\K"(?=(([^"\n]*"){2})*[^"\n]*$)
TextPad will find the first double quote on the next line containing unmatched double quotes.

If you search using this:

Code:

^[^"\n]*(("[^"\n]*){2})*\K"(?=[^"\n]*$)
TextPad will find the last double quote on the next line containing unmatched double quotes.

With either of them, if you Mark All, TextPad will mark all lines containing unmatched double quotes.
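
To cross-check the result outside TextPad: both expressions above can only match on a line that holds an odd number of double quotes, so a plain count should flag the same lines that Mark All does. Here is a minimal Python sketch of that check. The function name and the sample text are made up for illustration, and it is a counting shortcut rather than a port of the regex itself.

```python
def lines_with_unmatched_quotes(text, quote='"'):
    """Return (line_number, line) pairs whose count of `quote` is odd."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        # An odd count means at least one quote has no partner on this line.
        if line.count(quote) % 2 == 1:
            flagged.append((number, line))
    return flagged


# Hypothetical sample in the "question <tab> answer" layout.
sample = 'He said "hi"\tok\nShe said "hi\tbad\nplain\tline\n'
print(lines_with_unmatched_quotes(sample))
```

Only the second sample line is reported, because it is the only one with an odd number of double quotes.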

thanks

Posted: Fri Jun 25, 2021 4:55 pm
by jazzastronomer
ben_josephs, you are a prolific poster!

Many thanks, it works well.

I put the regex into an online tester and am stepping through its operation, trying to figure out how it works.

I will try replacing " with ' and running that regex to pick up more errors.


I'm presently just saving my TextPad regex in a text file. Is there a better way to save and document these regex?

Re: thanks

Posted: Mon Jun 28, 2021 1:01 pm
by AmigoJack
an online tester
Don't be afraid to actually link the website you're using so others who read this get to know of it, too.
I will try replacing " with ' and running that regex to pick up more errors.
Keep in mind that this only works if just one kind of quote character is used in the whole line. If you also want to catch apostrophes and double quotation marks mixed in one line, where either kind doesn't add up to a multiple of 2, then things will get complicated.
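
To illustrate where the complication starts, here is a rough Python sketch that counts each kind of quote character separately on a line. The function name and sample lines are invented for this example. It assumes that an odd count is the only error of interest, so it knows nothing about nesting, and an ordinary apostrophe (as in "it's") is reported as if it were an unclosed quote.

```python
def odd_quote_kinds(line, kinds=('"', "'")):
    """Return the quote characters that occur an odd number of times in line."""
    return [q for q in kinds if line.count(q) % 2 == 1]


# Hypothetical sample lines; the first one contains an innocent apostrophe.
for line in ['say "it\'s" fine', "both 'ok' and \"ok\"", 'broken "here']:
    print(repr(line), odd_quote_kinds(line))
```

The first sample is flagged for its apostrophe even though nothing is wrong with it, which is exactly the kind of false positive that makes the mixed case harder than swapping one character for another.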
Is there a better way to save and document these regex?
Text files need discipline: if you want one expression per line, stick to it strictly and don't indent, because otherwise nobody knows whether leading/trailing whitespace is part of the expression or not. That being said: do not use file formats that come with their own syntax and require you to escape parts of your regex (e.g. XML, RTF...).
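
One other way to keep an expression and its documentation together, if the regex is also used from a script, is a commented "verbose" pattern. The sketch below uses Python's re.VERBOSE flag and is only an illustration: the constant name is made up, and nothing here says whether TextPad's search dialog accepts such a layout. Python's built-in re module has no \K, so the quote is captured in group 1 instead. The pattern is otherwise the "first double quote" expression from earlier in the thread.

```python
import re

# First double quote on a line that has an odd number of double quotes.
# In verbose mode, whitespace outside character classes and "#" comments
# are ignored, so the pattern can carry its own explanation.
FIRST_UNMATCHED = re.compile(r'''
    ^[^"\n]*                    # line start, then text with no double quote
    (")                         # group 1: the first double quote on the line
    (?=                         # lookahead: what follows must be
        (?:(?:[^"\n]*"){2})*    #   any number of further pairs of quotes,
        [^"\n]*$                #   then quote-free text up to the line end
    )
''', re.VERBOSE)

# Hypothetical sample line in the "question <tab> answer" layout.
match = FIRST_UNMATCHED.search('question "one\tanswer')
print(match.start(1) if match else None)
```

The comments live inside the pattern itself, so there is no question about which whitespace belongs to the expression.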

Posted: Mon Jun 28, 2021 2:12 pm
by MudGuard

Code:

^(?:(?:[^"\n]*"){2})*[^"\n]*\K"(?=[^"\n]*$)
would be my solution.

^ for the line start anchor
(?: ... )* for any number of occurrences of
(?: ...){2} two occurrences of
[^"\n]*" which is any number of non-quote/non-linebreak chars followed by a quote


all this followed by
[^"\n]* any number of non-quote/non-linebreak chars
\K to exclude everything we found so far from the match

" the quote we want to find
(?=[^"\n]*$) any number of non-quote/non-linebreak chars till the line end as a lookahead


Thus, on a line with an odd number of quotes, exactly the last quote (the one left without a partner) will be selected.
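
The breakdown above can be tried out in Python with one change: the built-in re module does not support \K, so this sketch swaps it for a capture group and reads the position of the quote from group 1. The constant and function names are invented for the example, and the script is meant as a way to experiment with the pattern, not as a replacement for the TextPad search.

```python
import re

# The pattern from the breakdown, with \K replaced by a capture group
# around the quote, because Python's re module has no \K.
LAST_UNMATCHED = re.compile(r'^(?:(?:[^"\n]*"){2})*[^"\n]*(")(?=[^"\n]*$)')


def last_unmatched_quote(line):
    """Return the index of the last double quote if the line holds an odd
    number of double quotes, otherwise None."""
    match = LAST_UNMATCHED.search(line)
    return match.start(1) if match else None


# Hypothetical sample lines: balanced, unbalanced, and quote-free.
for line in ['a "b" c', 'a "b" c "d', 'no quotes']:
    print(repr(line), last_unmatched_quote(line))
```

Only the second sample produces a position (index 8, the quote before "d"). The other two lines have an even number of quotes, so the pattern does not match them.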

thanks all

Posted: Tue Jun 29, 2021 6:27 pm
by jazzastronomer
@AmigoJack
That was the very website I used! Once I sort out how the regex works, I will try modifying it, as a learning experience. Thanks for the comments.

@MudGuard
Another example to learn from! Many thanks.

I am getting re-acquainted with TextPad after leaving (years ago) because of regex and Unicode weirdness. I will have more questions for you actual programmers later.

many thanks