I have a couple of files for which I get the standard message that characters not in code page 1252 ... will be converted to the system characters.
I am fine with that. However, I would like to find the characters that were replaced and put some other character(s) there that can be displayed.
Does anyone know how to find the characters that got replaced?
Finding non-code page characters
-
- Posts: 3
- Joined: Sun Jun 24, 2007 6:03 am
I have the very same problem. I would like to change '?' to a less common character, such as '@', '~', '_' or '|'. '?' is way too common!
-rlrandallx
Finding 'Extended ASCII' chars. before they change to '?'
Here's a small Perl program that reads a file and reports where the "bad" characters are when TextPad says it converted them to '?'.
It gives the line #, offset, and code value of each such character, and tries to print the character plus the one that follows it. Have fun! - rlrandallx
Code:
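use strict;
use warnings;

# Read the file as UTF-8 and print the report as UTF-8.
open(my $fh, "<:encoding(UTF-8)", "file.txt") or die $!;
binmode(STDOUT, ":utf8");
my @lines = <$fh>;
close($fh);

for my $i (0 .. $#lines)    # $i is the zero-based line index
{
    my $l = length($lines[$i]);
    for my $j (0 .. $l - 1)    # $j is the zero-based character offset
    {
        my $c = substr($lines[$i], $j, 1);
        my $v = ord($c);
        if ( $v > 127 ) # must be in the extended ASCII set or Unicode
        {
            # the character plus the one that follows it
            my $ec = substr($lines[$i], $j, 2);
            print "l:$i c:$j v:$v $ec\n";
        }
    }
}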
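
The program above flags every character above 127, including ones that code page 1252 can represent (such as é). A possible refinement is to flag only the characters that have no cp1252 mapping, and to substitute a character of your own choosing instead of '?'. This is only a sketch: the helper names not_in_cp1252 and replace_non_cp1252 and the '~' substitute are made up for illustration, and it assumes the text has already been decoded into Perl characters (for example by reading it through an :encoding(UTF-8) layer as above). It relies on the core Encode module, whose encode() dies on an unmappable character when FB_CROAK is passed.

```perl
use strict;
use warnings;
use Encode qw(encode FB_CROAK);

# True if the single character $ch cannot be encoded in code page 1252.
# (Hypothetical helper name, not part of any library.)
sub not_in_cp1252 {
    my ($ch) = @_;
    my $copy = $ch;    # encode() may modify its argument when CHECK is set
    return !eval { encode('cp1252', $copy, FB_CROAK); 1 };
}

# Replace every character that is not in cp1252 with $sub, and collect
# [column, code point] pairs (columns are 1-based) for reporting.
sub replace_non_cp1252 {
    my ($line, $sub) = @_;
    my @hits;
    my $out = '';
    my $col = 0;
    for my $ch (split //, $line) {
        $col++;
        if (ord($ch) > 127 && not_in_cp1252($ch)) {
            push @hits, [$col, ord $ch];
            $out .= $sub;
        }
        else {
            $out .= $ch;
        }
    }
    return ($out, \@hits);
}

# Example: U+00E9 (e-acute) is in cp1252, U+4E2D (a CJK character) is not.
my ($fixed, $hits) = replace_non_cp1252("caf\x{E9} \x{4E2D}!", '~');
printf "col %d: U+%04X\n", @$_ for @$hits;    # prints "col 6: U+4E2D"
```

To process a whole file, call replace_non_cp1252 on each line read in the loop above and write the returned string to an output file opened with a ">:encoding(cp1252)" layer.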