How to find every nth word?

Posted: Wed Oct 16, 2013 12:48 pm
by Kelly
Could anybody please tell me how to write a regular expression to find every nth word?

Thank you very kindly,

Kelly

Posted: Wed Oct 16, 2013 9:22 pm
by ben_josephs
You can search for sequences of, say, 5 words with some variation of
(?:\w+\W+){4}(\w+)\W*

You can create a list of every 5th word using a replacement expression similar to
$1\n
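
For anyone trying this outside TextPad, here is a rough Python sketch of the same search and replace. It assumes Python's re module, which writes the replacement as \1\n rather than $1\n. The helper names (nth_word_pattern, list_every_nth_word) are made up for illustration.

```python
import re

def nth_word_pattern(n):
    # Hypothetical helper: the pattern above, generalised from 5 to n words.
    # (n - 1) words are skipped, then the nth word is captured in group 1.
    return re.compile(r"(?:\w+\W+){%d}(\w+)\W*" % (n - 1))

def list_every_nth_word(text, n=5):
    # Replace each run of n words with just the nth word and a newline.
    # Python's re uses \1 where TextPad's replacement uses $1.
    return nth_word_pattern(n).sub(r"\1\n", text)

sample = "one two three four five six seven eight nine ten eleven"

# Search and replace, as described in the post.
result = list_every_nth_word(sample)

# Alternative: collect the captured words directly instead of replacing.
fifth_words = nth_word_pattern(5).findall(sample)

print(repr(result))
print(fifth_words)
```

Note that the trailing word "eleven" is left in the replaced text, because it is not part of a complete group of 5.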

If you don't want to find sequences that straddle newlines, try something like
(?:\w+[^\n\w]+){4}(\w+)[^\n\w]*
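
A small Python comparison of the two patterns may make the difference clearer. This is only a sketch using Python's re module rather than TextPad, and the sample text and constant names are invented for illustration.

```python
import re

# First pattern: \W+ also matches newlines, so a group of 5 can span lines.
ACROSS_LINES = re.compile(r"(?:\w+\W+){4}(\w+)\W*")

# Second pattern: [^\n\w]+ excludes newlines, so each group of 5
# has to fit on a single line.
WITHIN_LINE = re.compile(r"(?:\w+[^\n\w]+){4}(\w+)[^\n\w]*")

# Four words on the first line, six on the second.
sample = "a b c d\ne f g h i j"

across = ACROSS_LINES.findall(sample)
within = WITHIN_LINE.findall(sample)

print(across)  # groups counted straight through the newline
print(within)  # counting restarts on the second line
```

With this sample the first pattern captures "e" and "j", while the second captures only "i", since the first line has too few words to form a group.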

Posted: Thu Oct 17, 2013 1:31 am
by Kelly
Thank you so much Ben!

I'm just starting to appreciate the power of regex - pretty amazing.

Kelly

Posted: Thu Oct 17, 2013 9:00 am
by ben_josephs
Here's an improvement. With this regex, the replacement also removes the residue of words left over after all the groups of 5 have been matched:
(?:(?:\w+\W+){4}(\w+)\W*|.+)
(This relies on the fact that in TextPad's regex recogniser, and in most others, alternation (...|...) is not greedy. The alternatives are tried one by one from the left. Once one of them has matched, all subsequent ones are ignored, even if they might have found a longer match.)
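
Here is a rough Python sketch of the improved regex and of the left-to-right alternation it relies on. It assumes Python 3.5 or later, where an unmatched group in the replacement is substituted as an empty string; TextPad itself may differ in details. The variable names are invented for illustration.

```python
import re

# The improved pattern: either a full group of 5 words (5th word captured),
# or, failing that, whatever is left on the line.
IMPROVED = re.compile(r"(?:(?:\w+\W+){4}(\w+)\W*|.+)")

sample = "one two three four five six seven eight nine ten eleven"

# Python writes the replacement as \1\n where TextPad uses $1\n.
# Assumption: Python 3.5+, so \1 is empty when the ".+" branch matched.
cleaned = IMPROVED.sub(r"\1\n", sample)

# Alternation takes the first alternative that matches, not the longest:
# "a" wins here even though "ab" would match more text.
first_wins = re.match(r"a|ab", "ab").group()

print(repr(cleaned))
print(first_wins)
```

In this sketch the residue "eleven" is matched by the second alternative and replaced by an empty group plus a newline, so the output ends with a blank line.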