UTF-8 to String in Java


I am having a little problem with the UTF-8 charset. I have a UTF-8 encoded file which I want to load and analyze. I am using a BufferedReader to read the file line by line.

    BufferedReader buffReader = new BufferedReader(new InputStreamReader(new FileInputStream(file), "UTF-8"));

My problem is that the normal String methods (trim() and equals(), for example) are not suitable for the lines read from the BufferedReader in each iteration of the loop I created to read its content. For example, the encoded file contains "< menu >", which I want my program to treat as is, but it is seen as "?? < m e n u >", mixed with other strange characters. I want to know if there is a way to remove all the charset codifications and keep just the plain text, so that I can use all the methods of the String class without complications. Thank you.

If your JDK is not too old (1.5 or later), you can use a Scanner:

    Locale frLocale = new Locale("fr", "FR");
    // Decode the file's bytes as UTF-8 while reading
    Scanner scanner = new Scanner(new FileInputStream(file), "UTF-8");
    scanner.useLocale(frLocale);
    int numLine = 0;
    String line;
    for (; scanner.hasNextLine(); numLine++) {
        line = scanner.nextLine();
    }
    scanner.close();
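If trim() and equals() still misbehave on the first line, one possible cause is a byte order mark (BOM) at the start of the file: Java's UTF-8 decoder passes it through as the character U+FEFF, and trim() does not remove it. Here is a minimal sketch of that idea, assuming Java 7 or later. The class name Utf8LineReader and the method readLines are made up for illustration, and the BOM is only a guess at what the "??" prefix is. If the letters still come out separated ("m e n u"), the file may actually be UTF-16 rather than UTF-8, which is also an assumption worth checking.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class, not part of the original question or answer.
class Utf8LineReader {

    // Reads every line of a UTF-8 file and drops a leading BOM (U+FEFF) if present.
    static List<String> readLines(File file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(new FileInputStream(file), "UTF-8")) {
            boolean first = true;
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                // Assumption: the stray leading characters are a BOM on the first line.
                if (first && line.startsWith("\uFEFF")) {
                    line = line.substring(1);
                }
                first = false;
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Build a small UTF-8 test file that starts with a BOM.
        File file = File.createTempFile("menu", ".txt");
        file.deleteOnExit();
        Files.write(file.toPath(),
                "\uFEFF  < menu >  \nsecond line\n".getBytes(StandardCharsets.UTF_8));

        List<String> lines = readLines(file);
        // With the BOM gone, the usual String methods behave as expected.
        System.out.println(lines.get(0).trim().equals("< menu >")); // prints "true"
    }
}
```

Without the BOM check, the first line would start with an invisible U+FEFF character, so the equals() comparison would return false even after trim().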

The Scanner can also use delimiters other than whitespace. This example reads several items from a string:

    String input = "1 fish 2 fish red fish blue fish";
    // Split on the word "fish" and any surrounding whitespace
    Scanner s = new Scanner(input).useDelimiter("\\s*fish\\s*");
    System.out.println(s.nextInt());
    System.out.println(s.nextInt());
    System.out.println(s.next());
    System.out.println(s.next());
    s.close();

It prints the following output:

    1
    2
    red
    blue

See the Javadoc for java.util.Scanner.

