What's the way to run an Amazon Elastic Mapreduce job that depends on Numpy? -
the map portion of mapreduce job depends on numpy. so, means need have numpy installed part of bootstrap actions.
what i'm thinking of doing building custom numpy package stored on s3 fetched , installed during boostrap actions.
is there better way?
numpy comes installed on amazon elastic mapreduce instances, if want use other modules, can zip them up, distribute them workers distributedcache (using "-cachefile"), , import them python's built-in "zipimport" module.
see: http://www.cloudera.com/blog/2008/11/sending-files-to-remote-task-nodes-with-hadoop-mapreduce/
Comments
Post a Comment