Capture first word on line: using Unicode
Posted: Thu Jan 07, 2021 2:46 pm
by Mike Olds
Hello,
Windows 7 (64-bit), TP 8.50
I am working on a .txt file of a dictionary which I am converting to .htm and would like to have the first word of each main entry put into boldface.
I am using the TP Search and Replace tool, not Wild-Edit.
Main entries begin a new line.
The problem is that this tool thinks that Unicode characters are word breaks.
Any solutions?
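This symptom is typical of a regex engine whose `\w` matches only ASCII letters, digits and underscore. Whether TextPad was doing that here is an assumption. Below is a minimal sketch in Python's `re` module, not TextPad's engine, where the `re.ASCII` flag stands in for an ASCII-only `\w`. The sample line is modeled on the entries posted later in the thread and assumes the accented letter is ā.

```python
import re

# Hypothetical sample line; the accented letter is assumed to be a correctly
# decoded "ā", not mis-encoded bytes.
line = "<p>:Akāca (adjective) [a + kāca] pure, flawless, clear</p>"

# Python 3 str patterns are Unicode-aware by default: "ā" counts as a word character.
unicode_word = re.match(r"<p>:(\w+)", line).group(1)

# With re.ASCII, \w is limited to [A-Za-z0-9_], so "ā" behaves like a word break.
ascii_word = re.match(r"<p>:(\w+)", line, flags=re.ASCII).group(1)

print(unicode_word)  # Akāca
print(ascii_word)    # Ak
```

If a tool stops at the accented letter like the second result, the cause is its definition of `\w` or the file's encoding, not the pattern itself.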
Posted: Thu Jan 07, 2021 8:16 pm
by ben_josephs
What regex are you using to match words?
Please provide examples of words that this regex doesn't recognise as single words.
Re: Capture first word on line: using Unicode
Posted: Thu Jan 07, 2021 11:08 pm
by MudGuard
Mike Olds wrote: The problem is that this tool thinks that Unicode characters are word breaks.
So every character is treated as word break?
As every character is a Unicode character ...
Posted: Fri Jan 08, 2021 11:21 pm
by Mike Olds
Hello,
Sorry for the delay in responding. I do not get notices although I have it checked.
Sample lines:
<p>:Akāca (adjective) [a + kāca] pure, flawless, clear D II 244; Snp 476; Ja V 203.</p>
<p>:Paricāreti [causative of paricarati]</p>
I think I need to ask a different question, as the so-called regex I was using appears to be capturing only the first letter.
I was using <p>(:\w)
All the relevant lines begin with <p>:
I now realize that "Unicode characters" was too vague. I mean characters with diacritics, including compound characters. And since my regex was unsuitable, the only problem may be my own ignorance.
So can you give me a regex that will capture the first word of a line?
Posted: Fri Jan 08, 2021 11:28 pm
by ben_josephs
Try
<p>(:\w+)
Let us know whether that works.
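The difference between the two patterns is the quantifier: `\w` matches exactly one word character, while `\w+` matches a run of one or more. A minimal sketch using Python's `re` module (standing in for TextPad's engine) on the two sample entries, assuming the garbled character in them is ā:

```python
import re

# Sample entries from the thread; the garbled character is assumed to be "ā".
lines = [
    "<p>:Akāca (adjective) [a + kāca] pure, flawless, clear D II 244; Snp 476; Ja V 203.</p>",
    "<p>:Paricāreti [causative of paricarati]</p>",
]

results = []
for line in lines:
    one_char = re.match(r"<p>(:\w)", line).group(1)  # \w: exactly one character
    whole = re.match(r"<p>(:\w+)", line).group(1)    # \w+: one or more characters
    results.append((one_char, whole))
    print(one_char, whole)
# :A :Akāca
# :P :Paricāreti
```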
Posted: Fri Jan 08, 2021 11:36 pm
by Mike Olds
Hello Ben,
Thank you for your quick response. Yes, that seems to work.
I had also just found another that appears to work:
^<p>(:\S+)
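Applied to the original goal of bolding the headword, the search pattern still needs a replacement. A minimal sketch in Python's `re` module, where `re.MULTILINE` makes `^` match at the start of every line; the replacement string `<p><b>\1</b>` is a hypothetical choice for illustration, not something posted in the thread, and the sample text assumes the garbled character is ā:

```python
import re

text = (
    "<p>:Akāca (adjective) [a + kāca] pure, flawless, clear D II 244; Snp 476; Ja V 203.</p>\n"
    "<p>:Paricāreti [causative of paricarati]</p>\n"
)

# ^ anchors at each line start (re.MULTILINE); \S+ runs up to the first whitespace.
pattern = re.compile(r"^<p>(:\S+)", re.MULTILINE)

# Hypothetical replacement: keep <p> and wrap the captured group in <b>...</b>.
bolded = pattern.sub(r"<p><b>\1</b>", text)
print(bolded)
```

The colon sits inside the capture group, so it is bolded too. Moving it outside, as in `^<p>:(\S+)` replaced with `<p>:<b>\1</b>`, would leave the colon in plain text.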
Posted: Fri Jan 08, 2021 11:57 pm
by ben_josephs
That matches sequences of any characters that aren't white space. Is that what you want?
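The two patterns diverge when the headword is followed directly by markup rather than a space. A minimal sketch in Python's `re` module; both sample lines are hypothetical, made up for illustration:

```python
import re

# Hypothetical entries: one with a <sup> number, one with no space before </p>.
samples = [
    "<p>:Akāca<sup>1</sup> (adjective) pure</p>",
    "<p>:Paricāreti</p>",
]

results = []
for line in samples:
    word = re.match(r"^<p>(:\w+)", line).group(1)      # stops at the first non-word character
    nonspace = re.match(r"^<p>(:\S+)", line).group(1)  # stops only at whitespace or end of line
    results.append((word, nonspace))
    print(word, nonspace)
# :Akāca :Akāca<sup>1</sup>
# :Paricāreti :Paricāreti</p>
```

`\S+` keeps trailing markup such as `<sup>1</sup>`, which may be wanted, but it also swallows `</p>` on an entry that has no space after the headword.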
Posted: Sat Jan 09, 2021 12:34 am
by Mike Olds
Hello Ben,
Yes, sort of. There are some complications, but that seems to get what I need. (It captures <sup>1</sup>, which is good.)
Thanks again for this help and for tolerating my ignorance. I'm getting way too old for this sort of thing, my mind just can't keep up.
Posted: Sun Jan 10, 2021 3:21 pm
by AmigoJack
Mike Olds wrote: Sorry for the delay in responding. I do not get notices although I have it checked.
Just visit this board regularly (e.g. every weekend) and look out for the "unread post" icon to easily spot activity you haven't seen yet.
If you need an overview of all your own posts (to see topics you've created or participated in), just use the "Find all posts by Mike Olds" link in your public profile; you could bookmark it.