mrjob.conf - parse and write config files

“mrjob.conf” is the name of both this module and the global config file for mrjob.

Reading and writing mrjob.conf

mrjob.conf.find_mrjob_conf()

Look for mrjob.conf, and return its path. Places we look:

  • The location specified by MRJOB_CONF
  • ~/.mrjob.conf
  • /etc/mrjob.conf

Return None if we can’t find it.
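The lookup order above can be sketched in plain Python. This is only an illustration of the documented search order, not mrjob's own code; find_conf_sketch and its environ and exists parameters are hypothetical names introduced here so the logic can be exercised without touching the real filesystem.

```python
import os


def find_conf_sketch(environ=None, exists=os.path.exists):
    """Return the first candidate config path that exists, or None.

    Hypothetical helper: mirrors the documented search order only.
    """
    environ = os.environ if environ is None else environ

    candidates = []
    # 1. The location specified by MRJOB_CONF, if set
    if environ.get('MRJOB_CONF'):
        candidates.append(environ['MRJOB_CONF'])
    # 2. ~/.mrjob.conf
    candidates.append(os.path.expanduser('~/.mrjob.conf'))
    # 3. /etc/mrjob.conf
    candidates.append('/etc/mrjob.conf')

    for path in candidates:
        if exists(path):
            return path
    return None
```

Passing a fake exists callable makes it easy to check each branch, including the case where nothing is found and the result is None.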

mrjob.conf.load_opts_from_mrjob_conf(runner_alias, conf_path=None, already_loaded=None)

Load a list of dictionaries representing the options in a given mrjob.conf for a specific runner, resolving includes. Returns [(path, values)]. If conf_path is not found, return [(None, {})].

Parameters:
  • runner_alias (str) – String identifier of the runner type, e.g. emr, local, etc.
  • conf_path (str) – location of the file to load
  • already_loaded (list) – list of real (according to os.path.realpath()) conf paths that have already been loaded (used by load_opts_from_mrjob_confs()).

Relative include: paths are resolved relative to the real path (after resolving symlinks) of the including conf file.

This will only load each config file once, even if it’s referenced from multiple paths due to symlinks.
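A rough sketch of include resolution with load-once tracking follows. It makes several assumptions that are not stated above: configs are represented as already-parsed dicts in an in-memory files mapping, options live under a top-level runners key, includes are processed before the including file, and posixpath.normpath stands in for os.path.realpath() so the example is deterministic. load_opts_sketch is a hypothetical name, not part of mrjob.

```python
import posixpath


def load_opts_sketch(runner_alias, conf_path, files, already_loaded=None):
    """Return [(path, values), ...] for one conf path, following includes.

    files maps a path to a parsed config dict (a stand-in for reading
    and parsing a file on disk).
    """
    if already_loaded is None:
        already_loaded = []

    # Stand-in for os.path.realpath(): collapse '..' and '.' segments
    real_path = posixpath.normpath(conf_path)

    if real_path not in files:
        return [(None, {})]
    if real_path in already_loaded:
        return []  # each config is only loaded once
    already_loaded.append(real_path)

    conf = files[real_path]
    includes = conf.get('include') or []
    if isinstance(includes, str):
        includes = [includes]

    results = []
    for include in includes:
        # Relative includes are resolved against the including file's dir
        include_path = posixpath.join(posixpath.dirname(real_path), include)
        results += load_opts_sketch(
            runner_alias, include_path, files, already_loaded)

    values = conf.get('runners', {}).get(runner_alias, {})
    results.append((real_path, values))
    return results


# Illustrative data only; the option names are placeholders
files = {
    '/etc/base.conf': {'runners': {'emr': {'opt_a': 1}}},
    '/home/me/.mrjob.conf': {
        'include': ['../../etc/base.conf', '/etc/base.conf'],
        'runners': {'emr': {'opt_b': 2}},
    },
}
loaded = load_opts_sketch('emr', '/home/me/.mrjob.conf', files)
```

Both include entries point at the same file, so it appears only once in loaded, ahead of the including file.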

mrjob.conf.load_opts_from_mrjob_confs(runner_alias, conf_paths=None)

Load a list of dictionaries representing the options in a given list of mrjob config files for a specific runner. Returns [(path, values), ...]. If a path is not found, use (None, {}) as its value.

If conf_paths is None, look for a config file in the default locations (see find_mrjob_conf()).

Parameters:
  • runner_alias (str) – String identifier of the runner type, e.g. emr, local, etc.
  • conf_paths – locations of the files to load

This will only load each config file once, even if it’s referenced from multiple paths due to symlinks.

Combining options

Combiner functions take a list of values to combine, with later options taking precedence over earlier ones. None values are always ignored.

mrjob.conf.combine_cmds(*cmds)

Take zero or more commands to run on the command line, and return the last one that is not None. Each command should either be a list containing the command plus switches, or a string, which will be parsed with shlex.split(). The string must either be a byte string or a unicode string containing no non-ASCII characters.

Returns either None or a list containing the command plus arguments.
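The behavior described for combine_cmds() can be approximated with the standard library alone. The function below is a hypothetical stand-in (combine_cmds_sketch is not an mrjob name) and only handles str and list inputs.

```python
import shlex


def combine_cmds_sketch(*cmds):
    """Return the last non-None command as a list of args, or None."""
    cmd = None
    for candidate in cmds:
        if candidate is not None:
            cmd = candidate

    if cmd is None:
        return None
    if isinstance(cmd, str):
        # Strings are split the way a POSIX shell would split them
        return shlex.split(cmd)
    return list(cmd)
```

For example, combine_cmds_sketch(['sort'], 'sort -r', None) yields ['sort', '-r'], since the trailing None is skipped and the string is split into arguments.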

mrjob.conf.combine_dicts(*dicts)

Combine zero or more dictionaries. Values from dicts later in the list take precedence over values earlier in the list.

If you pass in None in place of a dictionary, it will be ignored.
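A minimal sketch of this merging rule, using a hypothetical combine_dicts_sketch function rather than the library call itself:

```python
def combine_dicts_sketch(*dicts):
    """Merge dicts left to right; later values win, None dicts are skipped."""
    result = {}
    for d in dicts:
        if d is None:
            continue
        result.update(d)
    return result


merged = combine_dicts_sketch({'a': 1, 'b': 2}, None, {'b': 3, 'c': 4})
```

Here merged keeps a from the first dictionary, while b takes its value from the later one.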

mrjob.conf.combine_envs(*envs)

Combine zero or more dictionaries containing environment variables. Environment variable values may be wrapped in ClearedValue.

Environment variables from dictionaries later in the list take priority over those earlier in the list.

For variables ending with PATH, we prepend (and add a colon) rather than overwriting. Wrapping a path value in ClearedValue disables this behavior.

An environment variable set to ClearedValue(None) is deleted from the result, removing any value set earlier in the list, rather than being set to None.

If you pass in None in place of a dictionary in envs, it will be ignored.
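The rules above (later values win, PATH-like variables are prepended, cleared values overwrite or delete) can be sketched as follows. Cleared is a hypothetical stand-in for ClearedValue, and combine_envs_sketch with its sep parameter is an illustration under those assumptions, not mrjob's implementation.

```python
class Cleared:
    """Hypothetical stand-in for ClearedValue: wraps a single value."""

    def __init__(self, value):
        self.value = value


def combine_envs_sketch(*envs, sep=':'):
    result = {}
    for env in envs:
        if env is None:
            continue
        for key, value in env.items():
            if isinstance(value, Cleared):
                if value.value is None:
                    # Delete the variable rather than setting it to None
                    result.pop(key, None)
                else:
                    # Overwrite, even for *PATH variables
                    result[key] = value.value
            elif key.endswith('PATH') and result.get(key):
                # Later path entries go in front of earlier ones
                result[key] = value + sep + result[key]
            else:
                result[key] = value
    return result


env = combine_envs_sketch(
    {'PATH': '/usr/bin', 'TZ': 'UTC', 'FOO': '1'},
    None,
    {'PATH': '/opt/bin', 'FOO': Cleared(None)},
)
```

In this example env contains PATH set to /opt/bin:/usr/bin, TZ unchanged, and no FOO at all. Passing sep=';' would approximate a different path separator.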

mrjob.conf.combine_jobconfs(*jobconfs)

Like combine_dicts(), but non-string values are converted to a Java-readable string (e.g. True becomes ‘true’). Keys whose value is None are blanked out.
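The conversion can be illustrated with a small sketch. Two assumptions are made here: "blanked out" is read as dropping the key from the result, and values other than booleans and strings are converted with str(). combine_jobconfs_sketch and to_java_str_sketch are hypothetical names.

```python
def to_java_str_sketch(value):
    """Convert a Python value to a string a Java program could parse."""
    if isinstance(value, bool):
        return 'true' if value else 'false'
    if isinstance(value, str):
        return value
    return str(value)


def combine_jobconfs_sketch(*jobconfs):
    merged = {}
    for jobconf in jobconfs:
        if jobconf is None:
            continue
        merged.update(jobconf)
    # Assumption: a None value removes the key from the result
    return {k: to_java_str_sketch(v)
            for k, v in merged.items() if v is not None}


jobconf = combine_jobconfs_sketch(
    {'a': True, 'b': 1, 'c': 'x'},
    {'b': None, 'd': 2.5},
)
```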

mrjob.conf.combine_lists(*seqs)

Concatenate the given sequences into a list. Ignore None values.

Generally this is used for a list of commands we want to run; the “default” commands get run before any commands specific to your job.

Strings, bytes, and non-sequence objects (e.g. numbers) are treated as single-item lists.
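As a rough sketch of these rules (combine_lists_sketch is a hypothetical name, and treating anything iterable other than str and bytes as a sequence is a simplifying assumption):

```python
from collections.abc import Iterable


def combine_lists_sketch(*seqs):
    """Concatenate sequences; wrap strings, bytes, and scalars as one item."""
    result = []
    for seq in seqs:
        if seq is None:
            continue
        if isinstance(seq, (str, bytes)) or not isinstance(seq, Iterable):
            result.append(seq)  # treated as a single-item list
        else:
            result.extend(seq)
    return result


combined = combine_lists_sketch(['a', 'b'], None, 'c', 3, ('d',))
```

The string 'c' is kept whole instead of being split into characters, and the number 3 is appended as-is.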

mrjob.conf.combine_local_envs(*envs)

Same as combine_envs(), except that paths are combined using the local path separator (e.g. ; on Windows rather than :).

mrjob.conf.combine_path_lists(*path_seqs)

Concatenate the given sequences into a list. Ignore None values. Resolve ~ (home dir) and environment variables, and expand globs that refer to the local filesystem.

Can take single strings as well as lists.
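The expansion steps can be sketched with os.path and glob. combine_path_lists_sketch is a hypothetical function; keeping a pattern unchanged when it matches nothing, and sorting glob results, are assumptions made here for a predictable example.

```python
import glob
import os


def combine_path_lists_sketch(*path_seqs):
    """Concatenate path sequences, expanding ~, env vars, and globs."""
    results = []
    for seq in path_seqs:
        if seq is None:
            continue
        if isinstance(seq, str):
            seq = [seq]  # single strings are allowed
        for path in seq:
            expanded = os.path.expanduser(os.path.expandvars(path))
            matches = sorted(glob.glob(expanded))
            # Assumption: keep the path as-is if the glob matches nothing
            results.extend(matches or [expanded])
    return results
```

With an environment variable such as SKETCH_DIR pointing at a directory containing a.txt and b.txt, the pattern '$SKETCH_DIR/*.txt' expands to both files, in order.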

mrjob.conf.combine_paths(*paths)

Returns the last value in paths that is not None. Resolve ~ (home dir) and environment variables.
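A short sketch of the same idea for a single path, using a hypothetical combine_paths_sketch function:

```python
import os


def combine_paths_sketch(*paths):
    """Return the last non-None path with ~ and env vars expanded."""
    last = None
    for path in paths:
        if path is not None:
            last = path
    if last is None:
        return None
    return os.path.expanduser(os.path.expandvars(last))
```

Unlike the list variant, no glob expansion happens here; only the winning path is expanded.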

mrjob.conf.combine_values(*values)

Return the last value in values that is not None.

The default combiner; good for simple values (booleans, strings, numbers).
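The rule is simple enough to sketch in a few lines. combine_values_sketch is a hypothetical name; the point of the example is that only None is skipped, so falsy values such as 0 or False still count.

```python
def combine_values_sketch(*values):
    """Return the last value that is not None, or None if there is none."""
    for value in reversed(values):
        if value is not None:
            return value
    return None


chosen = combine_values_sketch(1, None, 0, None)
```

Here chosen is 0, because 0 is the last value that is not None.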