Search in files - wrong encoding for German Umlauts

Posted: Mon Aug 26, 2013 7:52 pm
by haeb
Hi all,

there seems to be a bug in search in files.

If I search for a word that contains German umlauts, TextPad does not find it when the file's character set is UTF-8. It makes no difference whether the file is saved with or without a BOM.

If the file's character set is ANSI, TP does find the word.

If I search for another word near the umlaut word in the UTF-8 file, TP finds the other word and displays the umlaut word in the wrong encoding.
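
The symptoms above would fit a search that compares raw bytes using the ANSI code page, although nothing in this thread confirms how TextPad works internally. A minimal Python sketch of that mismatch, assuming "ANSI" means Windows-1252 and using a made-up sample word:

```python
# Hypothetical sample word; it is not taken from the original post.
word = "Größe"

utf8_bytes = word.encode("utf-8")    # bytes as stored in a UTF-8 file
ansi_bytes = word.encode("cp1252")   # assumption: "ANSI" = Windows-1252

# The umlauts are one byte each in cp1252 but two bytes each in UTF-8,
# so a byte-level search for the ANSI form cannot match the UTF-8 data.
found_in_utf8_file = ansi_bytes in utf8_bytes

# Decoding UTF-8 bytes as cp1252 gives the typical garbled display.
mojibake = utf8_bytes.decode("cp1252")

print(found_in_utf8_file)
print(mojibake)
```

A word made only of ASCII letters has the same bytes in both encodings, which would explain why the neighbouring word is still found.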

If somebody wants to see my search results, I can send a screenshot.

Win7 x64 TP 7.0.9 German

Regards
Horst

Posted: Mon Aug 26, 2013 7:54 pm
by haeb
Hi all,

ADDITION

I meant "search in files" (Ctrl+F5, labelled STRG+F5 on a German keyboard), not searching within a single file.

Regards

the same problem with syntax definition

Posted: Mon Sep 02, 2013 9:48 am
by criss
Hi,

Keywords in a syntax definition that contain German umlauts (like üäöß) are not highlighted in UTF-8 files.

Posted: Thu Jan 16, 2014 2:09 pm
by haeb
Hi all,

this is still a bug in 7.1.0

You can't "search in files" for a string which contains an umlaut.

Horst

Posted: Tue May 06, 2014 12:58 pm
by haeb
Hi all,

... in 7.2.0 still a bug...

You can't "search in files" for a string which contains an umlaut.

Horst

Posted: Tue May 06, 2014 11:31 pm
by kengrubb
Works fine with Find, Find Next, and Find Previous. Does not work with Find In Files.

I also found the problem when searching for € (hex 80 in the ANSI code page, Windows-1252).

The problem did not occur when searching for the DEL character (ASCII hex 7F).
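
The difference between these two characters can be illustrated with a short Python sketch. It assumes the ANSI code page is Windows-1252 and only compares byte values; it says nothing certain about TextPad's own search code.

```python
euro = "\u20ac"      # the euro sign
delete = "\x7f"      # the DEL control character

# Assumption: "ANSI" means Windows-1252 (cp1252).
euro_ansi = euro.encode("cp1252")
euro_utf8 = euro.encode("utf-8")
del_ansi = delete.encode("cp1252")
del_utf8 = delete.encode("utf-8")

# Hex 7F lies in the ASCII range, so both encodings agree on it.
# Hex 80 lies above ASCII, so the euro sign is encoded differently.
print(euro_ansi.hex(), euro_utf8.hex())
print(del_ansi.hex(), del_utf8.hex())
```

So every character above hex 7F would be expected to show the same problem as the umlauts, while plain ASCII characters would not.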

Win 7 64-bit
TP 7.2.0 64-bit

Posted: Thu Jul 10, 2014 1:55 pm
by haeb
In 7.3.0 it does NOT work as expected!

Sorry for this post (which I have now corrected)!

See my new post from today.

Posted: Thu Sep 11, 2014 5:06 pm
by haeb
New tests...

The "search in files" function does NOT work for any file type that contains umlauts, except for ANSI files.

It is somewhat confusing, but I will try to explain what works and what doesn't.

I tested a folder containing 7 files covering all available file types. Each file has just one line, which describes the file:
ANSI
utf-8 without BOM
utf-8 with BOM
Unicode
Unicode/Big Endian
Unicode/Big Endian without umlauts
Unicode without umlauts
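
For anyone who wants to rebuild a comparable test folder, here is a minimal Python sketch. The file names and line contents are made up and are not the contents of the linked testfiles.zip. It also assumes that "ANSI" means Windows-1252 and that "Unicode" means UTF-16 little endian with a BOM.

```python
import codecs
import tempfile
from pathlib import Path

# Hypothetical one-line contents, not copied from the original test files.
UMLAUT_LINE = "test Übergröße\r\n"
PLAIN_LINE = "test without special characters\r\n"


def encode_utf16(text, big_endian):
    """Encode text as UTF-16 with an explicit byte order mark."""
    if big_endian:
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def build_test_files(folder):
    """Write the seven test files into folder and return their names."""
    files = {
        "ansi.txt": UMLAUT_LINE.encode("cp1252"),
        "utf8_no_bom.txt": UMLAUT_LINE.encode("utf-8"),
        "utf8_bom.txt": UMLAUT_LINE.encode("utf-8-sig"),
        "unicode.txt": encode_utf16(UMLAUT_LINE, big_endian=False),
        "unicode_be.txt": encode_utf16(UMLAUT_LINE, big_endian=True),
        "unicode_be_plain.txt": encode_utf16(PLAIN_LINE, big_endian=True),
        "unicode_plain.txt": encode_utf16(PLAIN_LINE, big_endian=False),
    }
    target = Path(folder)
    for name, data in files.items():
        (target / name).write_bytes(data)
    return sorted(files)


test_folder = tempfile.mkdtemp()
created = build_test_files(test_folder)
print(test_folder, created)
```

Every file contains the word "test", so a search that handles all of these encodings should list all seven files.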

If I search for the word "test", which is next to the umlaut word in the file, TP finds it and displays the whole line. The umlaut word is shown in the wrong encoding:
http://haeberlen.org/privat/tp/textpad_ ... auts_e.PNG

Searching for the word "test" should find all 7 files, but it finds only 4 occurrences. Unicode and Unicode/Big Endian files that contain umlauts are not in the list. But if a Unicode file does not contain umlauts, TP finds it.

Confusing - Test it yourself:
http://haeberlen.org/privat/tp/testfiles.zip

Horst