Configuration quick reference
=============================

Setting configuration options
-----------------------------

You can set an option by:

* Passing it on the command line with the switch version (like
  ``--some-option``)
* Passing it as a keyword argument to the runner constructor, if you are
  creating the runner programmatically
* Putting it in one of the included config files under a runner name,
  like this:

  .. code-block:: yaml

      runners:
          local:
              python_bin: python3.6  # only used in local runner
          emr:
              python_bin: python3    # only used in Elastic MapReduce runner

See :ref:`mrjob.conf` for information on where to put config files.

Options that can't be set from mrjob.conf (all runners)
-------------------------------------------------------

There are some options that it makes no sense to set in a config file;
these can be set via command-line switches:

.. mrjob-optlist:: no_mrjob_conf

These options can be set by overriding attributes or methods in your job
class:

.. use aliases to prevent rst from making our tables huge

.. |a_hadoop_input_format| replace:: :py:attr:`~mrjob.job.MRJob.HADOOP_INPUT_FORMAT`
.. |a_hadoop_output_format| replace:: :py:attr:`~mrjob.job.MRJob.HADOOP_OUTPUT_FORMAT`
.. |a_partitioner| replace:: :py:attr:`~mrjob.job.MRJob.PARTITIONER`
.. |m_hadoop_input_format| replace:: :py:meth:`~mrjob.job.MRJob.hadoop_input_format`
.. |m_hadoop_output_format| replace:: :py:meth:`~mrjob.job.MRJob.hadoop_output_format`
.. |m_partitioner| replace:: :py:meth:`~mrjob.job.MRJob.partitioner`

====================== ======================== ======================== ========
Option                 Attribute                Method                   Default
====================== ======================== ======================== ========
*hadoop_input_format*  |a_hadoop_input_format|  |m_hadoop_input_format|  ``None``
*hadoop_output_format* |a_hadoop_output_format| |m_hadoop_output_format| ``None``
*partitioner*          |a_partitioner|          |m_partitioner|          ``None``
====================== ======================== ======================== ========

These options can be set by overriding your job's
:py:meth:`~mrjob.job.MRJob.configure_args` to call the appropriate method:

.. |add_passthru_arg| replace:: :py:meth:`~mrjob.job.MRJob.add_passthru_arg`
.. |add_file_arg| replace:: :py:meth:`~mrjob.job.MRJob.add_file_arg`

====================== ======================== ========
Option                 Method                   Default
====================== ======================== ========
*extra_args*           |add_passthru_arg|       ``[]``
====================== ======================== ========
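For example, here is a minimal sketch of a job that populates *extra_args*
this way. The job class, switch names, and defaults are hypothetical; both
methods accept the same arguments as
:py:meth:`argparse.ArgumentParser.add_argument`:

.. code-block:: python

    from mrjob.job import MRJob


    class MRCountLines(MRJob):  # hypothetical example job

        def configure_args(self):
            super(MRCountLines, self).configure_args()
            # a switch that is passed through to the job in each task;
            # its value is available as self.options.max_lines
            self.add_passthru_arg(
                '--max-lines', type=int, default=100,
                help='stop counting after this many lines')
            # a switch whose value names a local file that mrjob uploads
            # into each task's working directory
            self.add_file_arg('--stop-words-file')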
All of the above can be passed as keyword arguments to
:py:meth:`MRJobRunner.__init__() <mrjob.runner.MRJobRunner.__init__>`
(this is what makes them runner options), but you usually don't want to
instantiate runners directly.

Other options for all runners
-----------------------------

These options can be passed to any runner without an error, though some
runners may ignore some options. See the text after the table for
specifics.

.. mrjob-optlist:: all

:py:class:`~mrjob.local.LocalMRJobRunner` takes no additional options,
but:

* :mrjob-opt:`bootstrap_mrjob` is ``False`` by default
* :mrjob-opt:`cmdenv` uses the local system path separator instead of
  ``:`` all the time (so ``;`` on Windows, no change elsewhere)
* :mrjob-opt:`python_bin` defaults to the current Python interpreter

In addition, it ignores *hadoop_input_format*, *hadoop_output_format*,
*hadoop_streaming_jar*, and *jobconf*.

:py:class:`~mrjob.inline.InlineMRJobRunner` works like
:py:class:`~mrjob.local.LocalMRJobRunner`, except that it also ignores
*bootstrap_mrjob*, *cmdenv*, *python_bin*, *upload_archives*, and
*upload_files*.

Additional options for :py:class:`~mrjob.dataproc.DataprocJobRunner`
--------------------------------------------------------------------

.. mrjob-optlist:: dataproc

Additional options for :py:class:`~mrjob.emr.EMRJobRunner`
----------------------------------------------------------

.. mrjob-optlist:: emr

Additional options for :py:class:`~mrjob.hadoop.HadoopJobRunner`
----------------------------------------------------------------

.. mrjob-optlist:: hadoop
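Whichever runner you use, the usual way to launch a job from Python is not
to instantiate the runner yourself, but to pass command-line-style
arguments to your job and let :py:meth:`~mrjob.job.MRJob.make_runner`
construct it. A minimal sketch follows; the job module, runner choice, and
input path are placeholders, and
:py:meth:`~mrjob.runner.MRJobRunner.cat_output` and
:py:meth:`~mrjob.job.MRJob.parse_output` are available in recent versions
of mrjob:

.. code-block:: python

    from mr_word_freq_count import MRWordFreqCount  # hypothetical job module

    # '-r hadoop' selects the runner; other option switches are passed
    # the same way
    job = MRWordFreqCount(args=['-r', 'hadoop', 'input.txt'])

    with job.make_runner() as runner:
        runner.run()
        for key, value in job.parse_output(runner.cat_output()):
            print(key, value)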