language agnostic - How to distinguish UTF-8 and ASCII files?


How can I distinguish UTF-8 (no BOM) files from ASCII files?

If the file contains any bytes with the top bit set, then it is not ASCII.

So if the only possibilities are ASCII or UTF-8, then it's UTF-8.

If the file contains only bytes with the top bit clear, it's meaningless to distinguish whether it's ASCII or UTF-8, since it represents the same series of characters either way. You can call it ASCII.
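As a rough illustration of the top-bit test, here is a minimal Python sketch. The function name `classify_ascii_or_utf8` is made up for this example, and it assumes the data is already known to be either ASCII or UTF-8; it validates nothing.

```python
def classify_ascii_or_utf8(data: bytes) -> str:
    """Return 'ascii' if every byte has the top bit clear, else 'utf-8'.

    Assumption: the input is known to be one of the two encodings.
    No check is made that high-bit bytes form valid UTF-8 sequences.
    """
    # A byte has its top bit set exactly when (byte & 0x80) is non-zero.
    if any(b & 0x80 for b in data):
        return "utf-8"
    return "ascii"
```

To use it on a file, read the file in binary mode (`open(path, "rb").read()`) and pass the resulting bytes in.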

Of course, this doesn't distinguish UTF-8 from ISO Latin or CP1252, and neither does it confirm that the so-called UTF-8 is valid.
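If validity matters, one possible way to check it is to attempt a strict decode. The sketch below uses Python's built-in `bytes.decode`; the helper name `is_valid_utf8` is hypothetical. Note that passing this check only makes UTF-8 likely: a Latin-1 or CP1252 file can, by coincidence, also consist of byte sequences that are valid UTF-8.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8, else False."""
    try:
        # Strict decoding raises UnicodeDecodeError on malformed sequences.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Data that contains high-bit bytes but fails this check is neither ASCII nor valid UTF-8, so some other encoding (or corruption) would have to be considered.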

