python - Managing Mac OS created filenames with non-ASCII characters in Windows environments?


I deal with a large collection of unknown files, and I have been learning Python to help me filter, sort, and otherwise wrangle these files.

A collection I am looking at has a large number of resource forks, and I wrote a little script to find them and delete them (the next step is to find them and move them, but that's for another day).

I found that in this collection there are a number of files that have non-ASCII characters in the file name, and this seems to be tripping up the os.remove call.

Example file name: ._spec com report 395 (N.B. the 3 has a small dot underneath it; I can't find an example, or figure out how to show the hex of the filename...)

I log the filenames, and the log records the file as: ._spec com report 3?95

The error is a WindowsError: it can't find the file (the string being passed is not a file name known to Windows). I put the call in a try clause to let me work round it, but I would like to deal with it properly.

I also tried using the unicode switch in the walk call, os.walk(u'.'), as per this post: Handling ascii char in python string (top answer), and I see the following error:

    Traceback (most recent call last):
      File "<stdin>", line 3, in <module>
      File "C:\Python27\lib\encodings\cp850.py", line 12, in encode
        return codecs.charmap_encode(input,errors,encoding_map)
    UnicodeEncodeError: 'charmap' codec can't encode character u'\uf022' in position 20: character maps to <undefined>

So I am guessing the answer lies in how the filename is parsed, and I was wondering if anyone might be able to point me in the right direction...

Code:

    import os
    import sys

    rootdir = r"c:\target dir walk"
    destkeep = "keepers.txt"
    destdelete = "deleted.txt"

    matchingtext = "._"
    files_removed = 1
    for folder, subs, files in os.walk(rootdir):
        outfilekeep = open(destkeep, "a")
        outfiledelete = open(destdelete, "a")
        for filename in files:
            matchscore = filename.find(matchingtext)
            src = os.path.join(folder, filename)
            srcnewline = src + ", " + str(filename) + "\n"
            if matchscore == -1:
                outfilekeep.writelines(srcnewline)
            else:
                outfiledelete.writelines(srcnewline)
                try:
                    os.remove(src)
                except WindowsError:
                    print "I was unable to delete the file:"
                    outfilekeep.writelines(srcnewline)
                files_removed += 1
                if files_removed:
                    print '%d files removed' % files_removed
                else:
                    print 'no files removed'
        outfilekeep.close()
        outfiledelete.close()

os.walk(u'.') is the normal way to get native-Unicode filenames and should work fine; it does for me.

Your problem is here instead:

    srcnewline = src + ", " + str(filename) + "\n"

str(filename) will use the default encoding to convert the Unicode string down to bytes, and because that encoding doesn't have the character U+F022(*) you get a UnicodeEncodeError. You have to choose which encoding you want to store in your output file, by doing e.g. srcnewline = '%s, %s\n' % (src, filename.encode('utf-8')), or (perhaps better) by keeping the strings as Unicode and writing them to a file opened with codecs.open.
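A minimal sketch of the second option, keeping every string as Unicode and letting the file object do the encoding. It assumes io.open rather than codecs.open (both accept an encoding argument; io.open is swapped in here because it behaves the same on Python 2.7 and Python 3). The helper names write_log and walk_entries are hypothetical, not part of the original script.

```python
import io
import os


def walk_entries(rootdir):
    """Yield (full_path, filename) pairs for every file under rootdir.

    Pass a unicode rootdir (e.g. u'.') so that Python 2 returns unicode names.
    """
    for folder, subs, files in os.walk(rootdir):
        for filename in files:
            yield os.path.join(folder, filename), filename


def write_log(logpath, entries):
    """Append 'path, name' lines to logpath, encoded as UTF-8."""
    # The file object encodes on write, so no str() call is needed and the
    # default codec is never involved.
    with io.open(logpath, "a", encoding="utf-8") as log:
        for src, filename in entries:
            log.write(u"%s, %s\n" % (src, filename))
```

For example, write_log(u"keepers.txt", walk_entries(u".")) would log every file under the current directory. UTF-8 can represent any Unicode character, including U+F022, so the write itself should not raise UnicodeEncodeError.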

(*: this is a Private Use Area character that shouldn't really be used, not that I can guess...)
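To see which code points a filename actually contains (the question mentions not knowing how to show its hex), something like the following might help. It is only a sketch: the helper names describe_name and private_use_chars are hypothetical, and the sample name in the usage note is a guess at where the odd character sits.

```python
import unicodedata


def describe_name(name):
    """Return an ASCII-only rendering of name, with non-ASCII characters
    shown as backslash escapes (for example \\uf022)."""
    return name.encode("unicode_escape").decode("ascii")


def private_use_chars(name):
    """List the Private Use characters in name as 'U+XXXX' strings.

    unicodedata reports Private Use characters with the category 'Co'.
    """
    return [u"U+%04X" % ord(ch) for ch in name
            if unicodedata.category(ch) == "Co"]
```

Given a name such as u"._spec com report 3\uf02295", describe_name returns text that is safe to print on a cp850 console, and private_use_chars returns a list containing U+F022.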

