regex to find unmatched quote marks

General questions about using TextPad

jazzastronomer
Posts: 34
Joined: Sat Nov 03, 2007 3:04 am

regex to find unmatched quote marks

Post by jazzastronomer »

Just as an opening bracket "(" needs to be paired with a closing bracket ")", I need similar functionality for quote marks.

For each line of:

question <tab> answer
question <tab> answer
question <tab> answer

For each line highlight any quote mark not part of an open/close pair.

I am dealing with a text file that is intolerant of unclosed quote marks. On each line, any opening quote mark must have a matching closing quote mark.

I am presently just highlighting all quote mark types and manually checking them.
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

If you search using this:

Code:

^[^"\n]*\K"(?=(([^"\n]*"){2})*[^"\n]*$)

TextPad will find the first double quote on the next line containing unmatched double quotes.

If you search using this:

Code:

^[^"\n]*(("[^"\n]*){2})*\K"(?=[^"\n]*$)

TextPad will find the last double quote on the next line containing unmatched double quotes.

With either of them, if you Mark All, TextPad will mark all lines containing unmatched double quotes.
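
For anyone who wants to run the same check outside TextPad, here is a minimal Python sketch of the two searches above. It is a port under assumptions, not TextPad's own behaviour: Python's built-in re module has no \K, so a capturing group marks the quote instead. The function name unmatched_lines and the sample text are made up for illustration.

```python
import re

# Port of the first search: group 1 is the first quote on a line
# that holds an odd number of double quotes.
FIRST_QUOTE = re.compile(r'^[^"\n]*(")(?=(?:(?:[^"\n]*"){2})*[^"\n]*$)')

# Port of the second search: group 1 is the last quote on such a line.
LAST_QUOTE = re.compile(r'^[^"\n]*(?:(?:"[^"\n]*){2})*(")(?=[^"\n]*$)')


def unmatched_lines(text):
    """Return (line number, first quote column, last quote column)
    for every line with an odd number of double quotes."""
    results = []
    for number, line in enumerate(text.split("\n"), start=1):
        first = FIRST_QUOTE.match(line)
        last = LAST_QUOTE.match(line)
        if first and last:
            results.append((number, first.start(1), last.start(1)))
    return results


# Hypothetical sample in the "question <tab> answer" layout.
sample = 'what is "x"?\tan answer\nwhat is "y?\tanother "answer"\nno quotes\tplain'
print(unmatched_lines(sample))  # prints [(2, 8, 27)]
```

Columns are zero-based offsets within the line. Only the second sample line is reported, because it is the only one with an odd number of double quotes.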
jazzastronomer
Posts: 34
Joined: Sat Nov 03, 2007 3:04 am

thanks

Post by jazzastronomer »

ben_josephs you are a prolific poster!

Many thanks, it works well.

I put the regex into an online tester and am stepping through its operation, trying to figure out how it works.

I will try replacing " with ' and running that regex to pick up more errors.


I'm presently just saving my TextPad regexes in a text file. Is there a better way to save and document them?
AmigoJack
Posts: 515
Joined: Sun Oct 30, 2016 4:28 pm
Location: グリーン ヒル ゾーン

Re: thanks

Post by AmigoJack »

> an online tester

Don't be afraid to actually link the website you're using, so others who read this get to know of it, too.

> I will try replacing " with ' and running that regex to pick up more errors.

Keep in mind that this only works if just one of the possible quote characters is used in the whole line. If you also want to detect single apostrophes and double quotation marks mixed in one line that don't amount to a multiple of 2, then things will get complicated.

> Is there a better way to save and document these regex?

Text files need discipline: if you want one expression per line, then strictly stick to it and don't indent it, as otherwise nobody knows whether leading/trailing whitespace is part of it or not. That being said: do not use text file formats that come with their own syntax and require you to escape parts of your regex (e.g. XML, RTF...).
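
On the mixed-quote case mentioned above: one way around the complication is to drop the regex and scan each line character by character. The following Python sketch is only an illustration under assumptions that are not from this thread: a quote character inside a span opened by the other quote character is treated as literal text, and the function name unclosed_quote is hypothetical. Note that it will also flag ordinary apostrophes, as in "don't", as unclosed.

```python
def unclosed_quote(line, quote_chars="\"'"):
    """Return the column of the quote still open at the end of the line,
    or None if every opening quote was closed.

    Assumption: while one kind of quote is open, the other kind is
    treated as ordinary text.
    """
    open_char = None  # the quote character currently open, if any
    open_col = None   # the column where it was opened
    for col, char in enumerate(line):
        if open_char is None:
            if char in quote_chars:
                open_char, open_col = char, col
        elif char == open_char:
            open_char = open_col = None
    return open_col


print(unclosed_quote('say "it\'s" now'))  # prints None
print(unclosed_quote("it's broken"))      # prints 2
```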
MudGuard
Posts: 1295
Joined: Sun Mar 02, 2003 10:15 pm
Location: Munich, Germany

Post by MudGuard »

Code:

^(?:(?:[^"\n]*"){2})*[^"\n]*\K"(?=[^"\n]*$)

would be my solution.

^ for the line start anchor
(?: ... )* for any number of occurrences of
(?: ... ){2} two occurrences of
[^"\n]*" which is any number of non-quote/non-linebreak chars followed by a quote

all this followed by
[^"\n]* any number of non-quote/non-linebreak chars
\K to exclude everything we found so far from the match
" the quote we want to find
(?=[^"\n]*$) any number of non-quote/non-linebreak chars till the line end as a lookahead


Thus, on a line with an odd number of quotes, exactly the last quote will be selected.
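
The breakdown above can also be kept next to the expression itself, which may help with the earlier question about documenting regexes. Below is a rough Python sketch using the re.VERBOSE flag, which allows whitespace and comments inside a pattern. It is an adaptation, not the TextPad expression: Python's re module has no \K, so a capturing group stands in for it, and the names LAST_UNPAIRED_QUOTE and find_last_unpaired are hypothetical.

```python
import re

LAST_UNPAIRED_QUOTE = re.compile(r'''
    ^                        # line start anchor
    (?:                      # any number of occurrences of
        (?: [^"\n]* " ){2}   #   two occurrences of: non-quote chars, then a quote
    )*
    [^"\n]*                  # any number of non-quote/non-linebreak chars
    (")                      # the quote we want to find (group 1)
    (?= [^"\n]* $ )          # lookahead: only non-quote chars till the line end
''', re.VERBOSE | re.MULTILINE)


def find_last_unpaired(text):
    """Return the offset of the last quote on each line that has an
    odd number of double quotes."""
    return [m.start(1) for m in LAST_UNPAIRED_QUOTE.finditer(text)]


# Hypothetical sample: the first line has three quotes, the second has two.
text = 'q1 "a" and "b\tanswer\nq2 "ok"\tanswer'
print(find_last_unpaired(text))  # prints [11]
```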
jazzastronomer
Posts: 34
Joined: Sat Nov 03, 2007 3:04 am

thanks all

Post by jazzastronomer »

@AmigoJack
That was the very website I used! Once I sort out how the regex works, I will try modifying it, as a learning experience. Thanks for the comments.

@MudGuard
Another example to learn from! Many thanks.

I am getting re-acquainted with TextPad after leaving (years ago) because of regex and Unicode weirdness. I will have more questions for you actual programmers later.

many thanks