mrjob.local - simulate Hadoop locally with subprocesses

class mrjob.local.LocalMRJobRunner(**kwargs)

Runs an MRJob locally, for testing purposes. Invoked when you run your job with -r local.

Unlike InlineMRJobRunner, which runs the whole job in a single process, this runner actually spawns multiple subprocesses for each task.
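To make the subprocess model concrete, here is a minimal standard-library sketch of running a map task in a child Python process. It is only an illustration of the idea, not mrjob's actual implementation: MAPPER_SOURCE, run_task_in_subprocess() and the two input splits are hypothetical names invented for this example.

```python
import subprocess
import sys

# Hypothetical mapper source (not part of mrjob): emits "word<TAB>1"
# for every word read from stdin.
MAPPER_SOURCE = """
import sys
for line in sys.stdin:
    for word in line.split():
        print(word + "\\t1")
"""


def run_task_in_subprocess(source, input_text):
    """Run one task as a separate Python process and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", source],  # same interpreter as the parent
        input=input_text,
        capture_output=True,
        text=True,
        check=True,  # raise if the child exits with a non-zero status
    )
    return result.stdout


# Two hypothetical input splits, each handled by its own subprocess.
splits = ["a b\n", "b c\n"]
outputs = [run_task_in_subprocess(MAPPER_SOURCE, split) for split in splits]
```

Because each task runs in its own process, a task cannot rely on state left in memory by another task, which is closer to how tasks behave on a real cluster.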

It’s rare to need to instantiate this class directly (see __init__() for details).

New in version 0.6.8: can run Spark steps as well, on the local-cluster Spark master.

LocalMRJobRunner.__init__(**kwargs)

Arguments to this constructor may also appear in mrjob.conf under runners/local.

LocalMRJobRunner’s constructor takes the same keyword args as MRJobRunner. However, please note:

  • cmdenv values from different sources are merged with combine_local_envs(), which joins *PATH variables using the local path separator
  • python_bin defaults to sys.executable (the current Python interpreter)
  • hadoop_input_format, hadoop_output_format, and partitioner are ignored because they require Java. If you need to test these, consider starting up a standalone Hadoop instance and running your job with -r hadoop.
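The cmdenv note above can be illustrated with a small standard-library sketch. It assumes the documented behavior of mrjob.conf.combine_local_envs(): later dictionaries take priority, and variables whose names end in PATH are prepended using the local path separator rather than overwritten. The helper combine_local_envs_sketch() and the sample values are hypothetical stand-ins, and the sketch ignores special cases the real function handles, such as None values and ClearedValue.

```python
import os


def combine_local_envs_sketch(*envs):
    """Hypothetical stand-in for mrjob.conf.combine_local_envs().

    Later dictionaries win, except that variables ending in PATH are
    prepended to the earlier value using os.pathsep.
    """
    combined = {}
    for env in envs:
        for key, value in env.items():
            if key.endswith("PATH") and key in combined:
                # Later paths go first, joined with the local separator
                # (":" on POSIX systems, ";" on Windows).
                combined[key] = value + os.pathsep + combined[key]
            else:
                # Non-path variables are simply overwritten.
                combined[key] = value
    return combined


# Hypothetical values, e.g. one dict from mrjob.conf and one from the
# command line.
base = {"PYTHONPATH": "/opt/lib", "TZ": "UTC"}
override = {"PYTHONPATH": "/home/me/src", "TZ": "America/Los_Angeles"}
merged = combine_local_envs_sketch(base, override)
```

With these inputs, merged["TZ"] takes the later value, while merged["PYTHONPATH"] keeps both paths with the later one first.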