How do I run an Amazon Elastic MapReduce job that depends on NumPy?


The map portion of my MapReduce job depends on NumPy, which means I need to have NumPy installed as part of the bootstrap actions.

What I'm thinking of doing is building a custom NumPy package, storing it on S3, and having it fetched and installed during the bootstrap actions.

Is there a better way?

NumPy comes installed on Amazon Elastic MapReduce instances. If you want to use other modules, you can zip them up, distribute them to the workers with the DistributedCache (using "-cacheFile"), and import them with Python's built-in "zipimport" module.

see: http://www.cloudera.com/blog/2008/11/sending-files-to-remote-task-nodes-with-hadoop-mapreduce/
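A minimal sketch of the zipimport step, assuming Python 3: `mymodule`, `double`, `build_module_zip`, and `import_from_zip` are hypothetical names used only for illustration. Placing a zip archive on `sys.path` is enough for Python's built-in zipimport machinery to load modules from it, so no direct `zipimport` calls are needed. On the cluster, the zip would presumably be the file shipped with something like `-cacheFile s3://your-bucket/modules.zip#modules.zip` (bucket and file names are placeholders), which Hadoop streaming exposes as a symlink named `modules.zip` in the task's working directory. Here the sketch builds the zip locally so it runs anywhere.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_module_zip(zip_path):
    """Write a zip holding one tiny hypothetical pure-Python module, mymodule.py."""
    source = "def double(x):\n    return 2 * x\n"
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("mymodule.py", source)


def import_from_zip(zip_path, module_name):
    """Import module_name from zip_path.

    Adding the zip to sys.path lets the built-in zipimport hook find
    modules inside it. On a worker, zip_path would be the cache-file
    symlink (e.g. "modules.zip") in the task's working directory.
    """
    if zip_path not in sys.path:
        sys.path.insert(0, zip_path)
    return importlib.import_module(module_name)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    zip_path = os.path.join(workdir, "modules.zip")
    build_module_zip(zip_path)
    mod = import_from_zip(zip_path, "mymodule")
    print(mod.double(21))  # 42
```

Note that zipimport only handles pure-Python modules (`.py`/`.pyc`); packages with compiled extensions, such as NumPy itself, cannot be imported from a zip this way, which is why relying on the preinstalled NumPy matters here.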

