What's the way to run an Amazon Elastic MapReduce job that depends on NumPy?

- August 15, 2012

The map portion of my MapReduce job depends on NumPy. That means I need to have NumPy installed as part of the bootstrap actions.

What I'm thinking of doing is building a custom NumPy package, stored on S3, that gets fetched and installed during the bootstrap actions.

Is there a better way?

NumPy comes installed on Amazon Elastic MapReduce instances, so no bootstrap action is needed for it. If you want to use other modules, you can zip them up, distribute them to the workers through the DistributedCache (using "-cacheFile"), and import them with Python's built-in "zipimport" module.

See: http://www.cloudera.com/blog/2008/11/sending-files-to-remote-task-nodes-with-hadoop-mapreduce/
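As a rough illustration of the zip-and-import approach described in the answer, here is a minimal sketch of a streaming mapper that loads a helper module from a zip archive. The archive name mylibs.zip, the module textutils, and its normalize function are all hypothetical; the sketch also assumes the archive has been made available in the task's working directory via the DistributedCache. Putting the archive on sys.path lets Python's zipimport machinery handle the import. Note that zipimport only loads pure-Python modules, not compiled extensions.

```python
import importlib
import os
import sys
import zipfile


def build_demo_archive(directory):
    """Create a small zip holding one hypothetical pure-Python module.

    In a real job this archive would be built ahead of time and uploaded
    to S3; it is created here only so the sketch is self-contained.
    """
    archive_path = os.path.join(directory, "mylibs.zip")
    source = "def normalize(word):\n    return word.strip().lower()\n"
    with zipfile.ZipFile(archive_path, "w") as archive:
        archive.writestr("textutils.py", source)
    return archive_path


def load_from_archive(archive_path, module_name):
    """Import a module from a zip archive.

    Adding the archive to sys.path makes the interpreter use its
    built-in zipimport support to find and load the module.
    """
    if archive_path not in sys.path:
        sys.path.insert(0, archive_path)
    return importlib.import_module(module_name)


def map_lines(lines, normalize):
    """Word-count style mapper: yield one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % normalize(word)


def run_mapper(archive_path="mylibs.zip"):
    """Entry point a streaming mapper script might call (assumed layout)."""
    textutils = load_from_archive(archive_path, "textutils")
    for record in map_lines(sys.stdin, textutils.normalize):
        print(record)
```

In a real mapper script, run_mapper() would be called at the bottom of the file, with the archive path matching whatever link name was given to the cached file.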
