mrjob.compat - Hadoop version compatibility

Utility functions for compatibility with different versions of Hadoop.

mrjob.compat.jobconf_from_dict(jobconf, name, default=None)

Get the value of a jobconf variable from the given dictionary.

Parameters:
  • jobconf (dict) – jobconf dictionary
  • name (string) – name of the jobconf variable (e.g. 'user.name')
  • default – fallback value

If the name of the jobconf variable is different in different versions of Hadoop (e.g. in Hadoop 2, map.input.file is mapreduce.map.input.file), we’ll automatically try all variants before giving up.

Return default if that jobconf variable isn’t set.
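The fallback behavior described above can be pictured with a small sketch. This is an illustration only, not mrjob's implementation: the `_VARIANTS` table and the `lookup_jobconf` helper are hypothetical names, and the only name pair used is the map.input.file / mapreduce.map.input.file example from the text.

```python
# Hypothetical table of equivalent names. Only this one pair is taken from
# the documentation above; a real table would be much longer.
_VARIANTS = [
    ('map.input.file', 'mapreduce.map.input.file'),
]


def lookup_jobconf(jobconf, name, default=None):
    """Sketch: try *name* itself, then every known variant of it."""
    if name in jobconf:
        return jobconf[name]
    for group in _VARIANTS:
        if name in group:
            for variant in group:
                if variant in jobconf:
                    return jobconf[variant]
    # Nothing matched under any known name
    return default


conf = {'mapreduce.map.input.file': 'hdfs:///data/part-00000'}
found = lookup_jobconf(conf, 'map.input.file')
missing = lookup_jobconf(conf, 'user.name', default='nobody')
```

Here `found` is resolved through the Hadoop 2 name even though the caller asked for the older one, and `missing` falls back to the supplied default.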

mrjob.compat.jobconf_from_env(variable, default=None)

Get the value of a jobconf variable from the runtime environment.

For example, an MRJob could use jobconf_from_env('map.input.file') to get the name of the file a mapper is reading input from.

If the name of the jobconf variable is different in different versions of Hadoop (e.g. in Hadoop 2.0, map.input.file is mapreduce.map.input.file), we’ll automatically try all variants before giving up.

Return default if that jobconf variable isn’t set.
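Hadoop Streaming passes jobconf values to tasks as environment variables, with the dots in each name replaced by underscores. The sketch below shows only that name mapping. It leaves out the variant fallback described above, and `env_lookup` and its `environ` parameter are hypothetical, included so the example can run without a Hadoop task environment.

```python
import os


def env_lookup(variable, default=None, environ=os.environ):
    """Sketch: read a jobconf variable from an environment-like mapping."""
    # e.g. map.input.file is exported to the task as map_input_file
    return environ.get(variable.replace('.', '_'), default)


# Stand-in for the environment a streaming task would see
fake_env = {'map_input_file': 'hdfs:///data/part-00000'}
path = env_lookup('map.input.file', environ=fake_env)
fallback = env_lookup('user.name', default='nobody', environ=fake_env)
```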

mrjob.compat.map_version(version, version_map)

Lets you look up something by version (e.g. which jobconf variable to use), specifying only the versions where that value changed.

version is a string

version_map is a map from each version (as a string) at which the value changed to the new value as of that version.

For efficiency, version_map can also be a list of tuples of (LooseVersion(version_as_string), value), with oldest versions first.

If version is older than every version in version_map, use the value for the earliest version in version_map.
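A minimal sketch of this kind of lookup follows. It is not mrjob's code: it compares versions as tuples of integers instead of LooseVersion objects, so it only handles purely numeric version strings, and the helper names and the mapping values are made up for illustration.

```python
def parse(version):
    """Turn '0.20.2' into (0, 20, 2). Numeric components only."""
    return tuple(int(part) for part in version.split('.'))


def lookup_by_version(version, version_map):
    """Sketch: return the value for the newest entry not newer than *version*."""
    entries = sorted(
        ((parse(v), value) for v, value in version_map.items()),
        key=lambda entry: entry[0])
    target = parse(version)
    # Start from the earliest entry, which also covers too-old versions
    result = entries[0][1]
    for entry_version, value in entries:
        if entry_version <= target:
            result = value
    return result


# Illustrative mapping: the value changed at 1.0 and again at 2.0
changes = {'1.0': 'old-value', '2.0': 'new-value'}
```

With this mapping, any 1.x version resolves to 'old-value', any version from 2.0 onward resolves to 'new-value', and a version older than 1.0 also gets 'old-value' because that is the earliest entry.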

mrjob.compat.translate_jobconf(variable, version)

Translate variable to the name used by Hadoop version version. If it’s not a variable we recognize, leave it as-is.

mrjob.compat.translate_jobconf_dict(jobconf, hadoop_version=None)

Translates the configuration property names in jobconf to match those accepted in hadoop_version. Prints a warning if any configuration property name does not match its name in that Hadoop version. Combines the original jobconf with the translated jobconf.

Returns: a map consisting of the original and translated configuration property names and values.

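The "combine original with translated" step might look roughly like the sketch below. Everything here is an assumption made for illustration: `merge_translated` and `to_hadoop_2` are hypothetical, the warning message is omitted, and the choice to let a name the caller set explicitly win over a translated one is a guess, not documented behavior.

```python
def merge_translated(jobconf, translate):
    """Sketch: keep every original name and add its translated name too."""
    merged = dict(jobconf)
    for name, value in jobconf.items():
        new_name = translate(name)
        # Assumption: never overwrite a name the caller set explicitly
        merged.setdefault(new_name, value)
    return merged


def to_hadoop_2(name):
    """Hypothetical translator that knows a single name pair."""
    known = {'map.input.file': 'mapreduce.map.input.file'}
    return known.get(name, name)


merged = merge_translated({'map.input.file': 'in.txt', 'user.name': 'me'},
                          to_hadoop_2)
```

The result keeps both 'map.input.file' and 'mapreduce.map.input.file' with the same value, and leaves the unrecognized 'user.name' untouched.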
mrjob.compat.translate_jobconf_for_all_versions(variable)

Get all known variants of the given jobconf variable. Unlike translate_jobconf(), returns a list.

mrjob.compat.uses_yarn(version)

Basically, is this Hadoop 2? This also handles versions in the zero series (0.23+) where YARN originated.
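The rule described above can be sketched as a pair of range checks. This is an illustration under stated assumptions, not mrjob's code: `looks_like_yarn` is a hypothetical name, versions are parsed as tuples of integers (so suffixes such as "-alpha" are not handled), and anything from 2 upward is treated as YARN.

```python
def parse(version):
    """Turn '0.23.1' into (0, 23, 1). Numeric components only."""
    return tuple(int(part) for part in version.split('.'))


def looks_like_yarn(version):
    """Sketch: True for 2 and later, and for 0.23 and later in the 0.x series."""
    v = parse(version)
    return v >= (2,) or (0, 23) <= v < (1,)


checks = {v: looks_like_yarn(v)
          for v in ('0.20.2', '0.23.1', '1.0.3', '2.7.1')}
```

In `checks`, only '0.23.1' and '2.7.1' map to True. The 1.x series sits between the two YARN ranges and is excluded.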

mrjob.compat.version_gte(version, cmp_version_str)

Return True if version >= cmp_version_str.
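Versions need to be compared component by component, not as plain strings, since '1.10' sorts before '1.9' alphabetically. The sketch below uses integer tuples for this. It is a simplified stand-in, not mrjob's implementation, and `gte` and `parse` are hypothetical helpers that only accept purely numeric versions.

```python
def parse(version):
    """Turn '1.10' into (1, 10). Numeric components only."""
    return tuple(int(part) for part in version.split('.'))


def gte(version, cmp_version_str):
    """Sketch: True if *version* is at least *cmp_version_str*."""
    return parse(version) >= parse(cmp_version_str)


numeric_result = gte('1.10', '1.9')   # compares (1, 10) with (1, 9)
string_result = '1.10' >= '1.9'       # naive string comparison, for contrast
```

The numeric comparison reports that 1.10 is at least 1.9, while the naive string comparison gets it the wrong way round.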