Finding non-code page characters

Post by **melvers** » Thu Oct 29, 2009 8:15 pm

For a couple of files I get the standard warning that characters not on code page 1252 ... will be converted to the system characters.

I am fine with that. However, I would like to find the characters that were replaced and put some other character(s) there that can be displayed.

Does anyone know how to find the characters that got replaced?

Post by **RLRANDALLX** » Mon May 17, 2010 12:14 am

I have the very same problem. I would like to change '?' to a less common character, such as '@', '~', '_' or '|'. '?' is far too common!
-rlrandallx
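If you want to pick the substitute yourself instead of accepting '?', one option is to do the conversion outside TextPad. Below is a minimal Perl sketch, assuming the source file is UTF-8 and that the core Encode module is available. When `encode` is given a code reference as its check argument, Encode calls that code for each character cp1252 cannot represent and uses whatever it returns as the fallback. The helper name `to_cp1252` and the '~' substitute are illustrative choices, not anything TextPad provides.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode $text as cp1252, writing $substitute
# wherever a character has no cp1252 mapping (instead of '?').
# $substitute should be plain ASCII, e.g. '~', '_' or '|'.
sub to_cp1252 {
    my ($text, $substitute) = @_;
    # With a code ref as CHECK, Encode calls it with the ordinal of each
    # unmappable character and uses the returned string as the fallback.
    return encode('cp1252', $text, sub { $substitute });
}

# Usage: perl to1252.pl in.txt > out.txt   (in.txt assumed to be UTF-8)
binmode(STDOUT, ':raw');
for my $file (@ARGV) {
    open(my $fh, '<:encoding(UTF-8)', $file) or die "Cannot open $file: $!";
    while (my $line = <$fh>) {
        print to_cp1252($line, '~');
    }
    close($fh);
}
```

Characters that cp1252 does have, such as é or the curly apostrophe (U+2019, byte 0x92), are converted normally. Only the unmappable ones become '~'.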

Post by **RLRANDALLX** » Mon May 17, 2010 5:53 am

Here's a small Perl program that reads a file and shows where the "bad" characters are when TextPad says it converted them to '?'.
It gives the line number, offset and code value of each character, and tries to print the character plus the one after it. Have fun! - rlrandallx

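```perl
use strict;
use warnings;

# Usage: perl script.pl [file]   (defaults to file.txt)
my $file = shift // 'file.txt';
open(my $fh, '<:encoding(UTF-8)', $file) or die "Cannot open $file: $!";
binmode(STDOUT, ':encoding(UTF-8)');

while (my $line = <$fh>) {
    chomp $line;
    for my $j (0 .. length($line) - 1) {
        my $c = substr($line, $j, 1);
        my $v = ord($c);
        # Any code point above 127 is non-ASCII. Note this also flags
        # characters that cp1252 CAN display (e.g. accented letters).
        if ($v > 127) {
            # The character plus the one after it, for context
            my $ec = substr($line, $j, 2);
            # $. is the 1-based line number; columns are reported 1-based too
            printf "l:%d c:%d v:%d %s\n", $., $j + 1, $v, $ec;
        }
    }
}
close($fh);
```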
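The program above reports every non-ASCII character, including ones such as é or € that cp1252 can represent, so it may list more than TextPad actually replaced. Below is a sketch of a narrower check, assuming the file is UTF-8 and using Perl's core Encode module. Each non-ASCII character is test-encoded to cp1252 with `FB_CROAK`, and only the ones that fail are reported. The names `unmappable_chars` and `find1252.pl` are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode FB_CROAK LEAVE_SRC);

# Hypothetical helper: return [column, code point] pairs (1-based column)
# for each character in $line that cp1252 cannot represent.
sub unmappable_chars {
    my ($line) = @_;
    my @found;
    for my $j (0 .. length($line) - 1) {
        my $c = substr($line, $j, 1);
        next if ord($c) < 128;    # ASCII is always representable
        # FB_CROAK makes encode die on an unmappable character;
        # LEAVE_SRC stops it from modifying $c.
        my $ok = eval { encode('cp1252', $c, FB_CROAK | LEAVE_SRC); 1 };
        push @found, [ $j + 1, ord($c) ] unless $ok;
    }
    return @found;
}

# Usage: perl find1252.pl file.txt   (file assumed to be UTF-8)
for my $file (@ARGV) {
    open(my $fh, '<:encoding(UTF-8)', $file) or die "Cannot open $file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        for my $hit (unmappable_chars($line)) {
            printf "l:%d c:%d U+%04X\n", $., @$hit;
        }
    }
    close($fh);
}
```

Encoding one character at a time is slow for large files, but it keeps the logic simple and makes it easy to report the exact column of each offending character.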
