Garbage in, barbage out. I think opening things and reading as utf-8 is good. For added resilience, one has to also espace unicode encoding errors too, and replace them with anything readable. I.e. one may choose to read in binary, and then decode as utf-8 with any errors replaced, ie.:
Garbage in, barbage out. I think opening things and reading as utf-8 is good. For added resilience, one has to also espace unicode encoding errors too, and replace them with anything readable. I.e. one may choose to read in binary, and then decode as utf-8 with any errors replaced, ie.:
b'somebinary string' .decode( 'utf-8' , 'replace')