python - Managing Mac OS created filenames with non-ASCII characters in Windows environments?
I deal with a large collection of unknown files, and I have been learning Python to help me filter, sort, and otherwise wrangle these files.
A collection I am looking at has a large number of resource forks, and I wrote a little script to find them and delete them (the next step is to find them and move them, but that's for another day).
I found that in this collection there are a number of files that have non-ASCII characters in the file name, and this seems to be tripping up the os.remove call.
Example file name: ._spec com report 395 (N.B. the 3 has a small dot underneath it; I can't find an example, or figure out how to show the hex of the filename...)
When I log the filenames, the log records this file as: ._spec com report 3?95
The error is a WindowsError: it can't find the file (the string I am passing is not a file known to the Windows OS). I put it in a try clause to allow me to work round it, but I would like to deal with it properly.
I tried using the unicode switch in the walk option, `os.walk(u'.')`, per this post: handling ascii char in python string (top answer), and I see the following error:
    Traceback (most recent call last):
      File "<stdin>", line 3, in <module>
      File "c:\python27\lib\encodings\cp850.py", line 12, in encode
        return codecs.charmap_encode(input,errors,encoding_map)
    UnicodeEncodeError: 'charmap' codec can't encode character u'\uf022' in position 20: character maps to <undefined>
So I am guessing the answer lies in how the filename is parsed, and I was wondering if anyone might be able to point me in the right direction...
Code:
    import os
    import sys

    rootdir = r"c:\target dir walk"  # raw string, so "\t" is not read as a tab
    destkeep = "keepers.txt"
    destdelete = "deleted.txt"
    matchingtext = "._"
    files_removed = 0  # count of files successfully deleted

    # Open both logs once, before walking the tree
    outfilekeep = open(destkeep, "a")
    outfiledelete = open(destdelete, "a")

    for folder, subs, files in os.walk(rootdir):
        for filename in files:
            matchscore = filename.find(matchingtext)
            src = os.path.join(folder, filename)
            srcnewline = src + ", " + str(filename) + "\n"
            if matchscore == -1:
                outfilekeep.writelines(srcnewline)
            else:
                outfiledelete.writelines(srcnewline)
                try:
                    os.remove(src)
                except WindowsError:
                    print "I was unable to delete file:"
                    outfilekeep.writelines(srcnewline)
                else:
                    files_removed += 1

    if files_removed:
        print '%d files removed' % files_removed
    else:
        print 'no files removed'

    outfilekeep.close()
    outfiledelete.close()
os.walk(u'.') is the normal way to get native-Unicode filenames, and it should work fine; it does for me.
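As a rough illustration of that point, the sketch below walks a tree from a Unicode root so that every path comes back as a Unicode string. The helper names list_unicode_names and printable are hypothetical, and the code is written to run on both Python 2.7 and Python 3.

```python
import os


def list_unicode_names(root):
    """Return the full paths of all files under root.

    On Python 2, root must be a Unicode string (u'...') so that os.walk
    asks the OS for native Unicode filenames instead of byte strings.
    """
    paths = []
    for folder, subs, files in os.walk(root):
        for filename in files:
            paths.append(os.path.join(folder, filename))
    return paths


def printable(path):
    """Return an ASCII-only rendering of a Unicode path using backslash escapes."""
    return path.encode("unicode_escape").decode("ascii")
```

Calling print(printable(p)) for each p in list_unicode_names(u'.') should avoid console code page errors, because only ASCII characters ever reach the console.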
Your problem is here instead:
srcnewline = src + ", " + str(filename) + "\n"
str(filename) will use the default encoding to convert the Unicode string down to bytes, and because that encoding doesn't have the character U+F022(*) you get a UnicodeEncodeError. You have to choose what encoding you want to store in your output file, by doing e.g. srcnewline = '%s, %s\n' % (src, filename.encode('utf-8')), or (perhaps better) by keeping your strings as Unicode and writing them to the file using a codecs.open()ed file.
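A minimal sketch of the second option, assuming UTF-8 is the encoding you want in the log. It uses io.open rather than codecs.open (a swap on my part: io.open takes the same encoding argument and exists on both Python 2.7 and Python 3). The helper name log_line is hypothetical.

```python
import io


def log_line(log_path, src, filename):
    """Append 'src, filename' to a UTF-8 encoded log file.

    src and filename are expected to be Unicode strings, e.g. as
    produced by os.walk(u'.'). The file object does the encoding on
    write, so no str() or .encode() call is needed.
    """
    with io.open(log_path, "a", encoding="utf-8") as log:
        log.write(u"%s, %s\n" % (src, filename))
```

With this in place, the srcnewline line in the script would be replaced by a call such as log_line(destdelete, src, filename).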
(*: which is a Private Use Area character that shouldn't really be used, not that you can do much about that, I guess...)
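To see which characters in a filename are causing trouble, something like the following sketch may help. It lists the code point and Unicode category of every non-ASCII character; the helper name describe_chars and the sample filename are made up for illustration.

```python
import unicodedata


def describe_chars(name):
    """Return (code point, Unicode category) for each non-ASCII character in name."""
    return [("U+%04X" % ord(ch), unicodedata.category(ch))
            for ch in name if ord(ch) > 127]


# Illustrative filename containing the character from the traceback
print(describe_chars(u"._spec com report 3\uf02295"))
```

A category of "Co" marks a private use character, which is what U+F022 reports.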