Capture first word on line: using Unicode

General questions about using TextPad

Mike Olds
Posts: 226
Joined: Wed Sep 30, 2009 3:27 pm

Post by Mike Olds »

Hello,

Windows 7 (64-bit), TextPad 8.50

I am working on a .txt file of a dictionary which I am converting to .htm and would like to have the first word of each main entry put into boldface.

I am using the TP Search and Replace tool, not Wild-Edit.

Main entries begin a new line.

The problem is that this tool thinks that Unicode characters are word breaks.

Any solutions?
Thank you, and
Best Wishes,
Obo
http://buddhadust.net/
check out the What's New? Oblog:
http://buddhadust.net/dhammatalk/dhamma ... ts.new.htm
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

What regex are you using to match words?
Please provide examples of words that this regex doesn't recognise as single words.
MudGuard
Posts: 1295
Joined: Sun Mar 02, 2003 10:15 pm
Location: Munich, Germany

Re: Capture first word on line: using Unicode

Post by MudGuard »

Mike Olds wrote: The problem is that this tool thinks that Unicode characters are word breaks.
So every character is treated as a word break?
After all, every character is a Unicode character ...
Mike Olds
Posts: 226
Joined: Wed Sep 30, 2009 3:27 pm

Post by Mike Olds »

Hello,

Sorry for the delay in responding. I do not get notifications, although I have that option checked.

Sample lines:

<p>:Akāca (adjective) [a + kāca] pure, flawless, clear D II 244; Snp 476; Ja V 203.</p>

<p>:Paricāreti [causative of paricarati]</p>

I think I need to ask a different question, as the so-called regex I was using appears to capture only the first letter.

I was using <p>(:\w)

All the relevant lines begin with <p>:

I realize now that what I meant by Unicode characters was too vague. I mean characters with diacritics, including compound characters. And since my regex was unsuitable, there might not be any more of a problem than my ignorance.

So can you give me a regex that will capture the first word of a line?
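
The diagnosis above can be checked outside TextPad. The following is a minimal sketch in Python, whose re module is not TextPad's regex engine, so the details may differ there. It assumes the sample line uses the precomposed letter ā (U+0101); the decomposed form is included only to illustrate the "compound characters" case.

```python
import re
import unicodedata

# Sample line from the post, with the long a written as U+0101 (an assumption).
line = "<p>:Ak\u0101ca (adjective) [a + k\u0101ca] pure, flawless, clear"

# \w without a quantifier matches exactly one character,
# so the group holds only the colon and the first letter.
captured = re.search(r"<p>(:\w)", line).group(1)

def is_word_char(ch):
    # True if Python's \w matches this single character.
    return re.fullmatch(r"\w", ch) is not None

# The precomposed letter is one code point and counts as a word character.
precomposed_ok = is_word_char("\u0101")

# Decomposed form: "a" followed by U+0304 COMBINING MACRON.
decomposed = unicodedata.normalize("NFD", "\u0101")
decomposed_ok = [is_word_char(ch) for ch in decomposed]

print(captured)        # :A
print(precomposed_ok)  # True
print(decomposed_ok)   # [True, False]
```

In this engine the accented letter itself is not a word break; only a separately stored combining mark is. If a file did contain decomposed characters, normalizing it to NFC first (unicodedata.normalize("NFC", text)) would be one way around that.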
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Try
<p>(:\w+)

Let us know whether that works.
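
For anyone who wants to try the suggestion outside TextPad, here is a rough sketch in Python (again, not TextPad's engine). The sample lines come from the post above, with the diacritics assumed to be precomposed characters. The embolden helper and its <b>...</b> replacement are hypothetical and only illustrate the boldface step described in the first post.

```python
import re

# Sample lines from the thread; \u0101 is the letter a with macron.
lines = [
    "<p>:Ak\u0101ca (adjective) [a + k\u0101ca] pure, flawless, clear D II 244; Snp 476; Ja V 203.</p>",
    "<p>:Paric\u0101reti [causative of paricarati]</p>",
]

# The suggested regex: a colon followed by one or more word characters.
headword = re.compile(r"<p>(:\w+)")

def embolden(line):
    # Hypothetical replacement: wrap the captured text in <b>...</b>,
    # touching only the first match on the line.
    return headword.sub(r"<p><b>\1</b>", line, count=1)

captures = [headword.search(line).group(1) for line in lines]
bold_lines = [embolden(line) for line in lines]

print(captures)
print(bold_lines[1])
```

Because the colon sits inside the parentheses, it ends up inside the bold tags. Writing the pattern as <p>:(\w+) and putting the colon back in the replacement would keep it outside.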
Mike Olds
Posts: 226
Joined: Wed Sep 30, 2009 3:27 pm

Post by Mike Olds »

Hello Ben,

Thank you for your quick response. Yes that seems to work.

I had also just found another that appears to work:

^<p>(:\S+)
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

That matches sequences of any characters that aren't white space. Is that what you want?
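
To make the difference concrete, here is a small comparison in Python (not TextPad's engine, so treat it as a sketch). The three sample lines are made up for illustration: one with markup attached to the headword, one with a trailing comma, and one with no space before the closing tag.

```python
import re

# Hypothetical entries, not taken from the actual dictionary file.
samples = [
    "<p>:Paric\u0101reti<sup>1</sup> [causative of paricarati]</p>",  # markup attached
    "<p>:Ak\u0101ca, (adjective)</p>",                                 # trailing comma
    "<p>:Ak\u0101ca</p>",                                              # no space at all
]

def capture(pattern, line):
    # Return the first group of the first match, or None if nothing matches.
    m = re.search(pattern, line)
    return m.group(1) if m else None

word_results = [capture(r"^<p>(:\w+)", s) for s in samples]
nonspace_results = [capture(r"^<p>(:\S+)", s) for s in samples]

for s, w, n in zip(samples, word_results, nonspace_results):
    print(repr(w), "vs", repr(n))
```

With \w+ the capture stops at the first character that is not a letter, digit or underscore. With \S+ it runs on to the next white space, so it also takes in attached tags, punctuation, and even the closing </p> when no space follows the word.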
Mike Olds
Posts: 226
Joined: Wed Sep 30, 2009 3:27 pm

Post by Mike Olds »

Hello Ben,

Yes, sort of. There are some complications, but that seems to get what I need. (It captures <sup>1</sup>, which is good.)

Thanks again for this help and for tolerating my ignorance. I'm getting way too old for this sort of thing; my mind just can't keep up.
AmigoJack
Posts: 532
Joined: Sun Oct 30, 2016 4:28 pm
Location: グリーン ヒル ゾーン

Post by AmigoJack »

Mike Olds wrote: Sorry for the delay in responding. I do not get notices although I have it checked.
Just visit this board regularly (e.g. every weekend) and look out for the "unread post" icon to easily spot activity you have not seen yet.

If you need an overview of all your own posts (to see topics you've created or participated in), just use the "Find all posts by Mike Olds" link in your public profile. You could bookmark it.