Python 2 vs. Python 3

Raw protocols

Both to avoid breaking mrjob for Python 2 users and to keep writing jobs simple, jobs read their input as strs by default (even though str means bytes in Python 2 and unicode in Python 3).

The way this works in mrjob is that RawValueProtocol is actually an alias for one of two classes: BytesValueProtocol if you’re in Python 2, and TextValueProtocol if you’re in Python 3.

If you care about this distinction, you may want to set INPUT_PROTOCOL explicitly to one of these. If your input has a well-defined encoding, you probably want BytesValueProtocol, so you can decode it yourself. If it’s a bunch of text that’s mostly ASCII, with, like, some stuff that... might be UTF-8? (e.g. most log files), you probably want TextValueProtocol, which decodes each line as UTF-8 and falls back to latin-1 if that fails. But most of the time it’ll just work.
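To make the distinction concrete, here is a standard-library-only sketch of what each raw protocol hands your mapper for one input line, plus how a RawValueProtocol-style alias can be picked by Python version. It is an illustration, not mrjob's source: the class names are hypothetical, and the UTF-8-then-latin-1 fallback mirrors how mrjob describes TextValueProtocol. In a real job you would instead import the class from mrjob.protocol and assign it to INPUT_PROTOCOL on your MRJob subclass.

```python
import sys


class BytesValueSketch(object):
    """Hypothetical stand-in for BytesValueProtocol: the line stays bytes."""

    def read(self, line):
        # key is always None; value is the raw bytes line, untouched
        return None, line

    def write(self, key, value):
        # the key is discarded; value must already be bytes
        return value


class TextValueSketch(object):
    """Hypothetical stand-in for TextValueProtocol: the line becomes text."""

    def read(self, line):
        try:
            return None, line.decode('utf_8')
        except UnicodeDecodeError:
            # latin-1 maps every byte to a code point, so this cannot fail
            return None, line.decode('latin_1')

    def write(self, key, value):
        # the key is discarded; text is encoded back to bytes as UTF-8
        return value.encode('utf_8')


# Pick the "raw" protocol the way the alias is described: bytes on
# Python 2 (where str is bytes), text on Python 3 (where str is unicode).
RawValueSketch = BytesValueSketch if sys.version_info[0] == 2 else TextValueSketch
```

For example, `TextValueSketch().read(b'caf\xe9')` is not valid UTF-8, so it falls back to latin-1 and returns `(None, 'café')`, while `BytesValueSketch().read(b'caf\xe9')` returns the bytes unchanged.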

Bytes vs. strings

The following things are bytes in any version of Python (which means that in Python 3 you need to use the bytes type and/or b'...' literals):

data read or written by Protocols
lines yielded by cat_output()
anything read from cat()
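Because protocols always see bytes, a protocol you write yourself has to split and join with bytes literals and do its own decoding. The sketch below assumes mrjob's duck-typed protocol interface (an object whose read(line) returns a (key, value) pair and whose write(key, value) returns bytes); the class name TabSeparatedTextProtocol and its choice of UTF-8 are hypothetical.

```python
class TabSeparatedTextProtocol(object):
    """Hypothetical protocol: b'key<TAB>value' lines <-> (str, str) pairs."""

    def read(self, line):
        # line arrives as bytes, so split on the bytes literal b'\t';
        # line.split('\t') would raise TypeError on Python 3
        key, value = line.split(b'\t', 1)
        return key.decode('utf_8'), value.decode('utf_8')

    def write(self, key, value):
        # must return bytes, so encode each part before joining with b'\t'
        return key.encode('utf_8') + b'\t' + value.encode('utf_8')
```

Decoding in read() and encoding in write() keeps the bytes confined to the protocol, so your mapper and reducer can work with ordinary strs.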

The stdin, stdout, and stderr attributes of MRJobs are always byte streams (so, for example, self.stderr defaults to sys.stderr.buffer in Python 3).
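Since self.stderr is a byte stream, text has to be encoded before you write it. The helper below is hypothetical, not part of mrjob, and UTF-8 is an assumed encoding; it uses io.BytesIO as a stand-in for a job's stderr so the sketch runs on its own.

```python
import io


def write_status(stream, message):
    """Hypothetical helper: write a line of text to a byte stream."""
    # stream.write('...') would raise TypeError on a bytes stream in Python 3
    stream.write(message.encode('utf_8') + b'\n')
    stream.flush()


fake_stderr = io.BytesIO()  # stand-in for self.stderr
write_status(fake_stderr, 'processed 10 lines')
```

Inside a job you might call `write_status(self.stderr, ...)`; outside mrjob on Python 3, `sys.stderr.buffer` is the equivalent byte stream.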

Everything else (including file paths, URIs, arguments to commands, and logging messages) is a string: a str on Python 3, and either a unicode or an ASCII str on Python 2. As with RawValueProtocol, most of the time it’ll just work even if you don’t think about it.

python_bin

python_bin defaults to python3 in Python 3, and python in Python 2 (except on EMR AMI versions prior to 4.3.0, where we use python2.7).
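The defaulting rule can be restated as a small function. This is a hypothetical sketch, not mrjob's code: the function name and parameters are made up, AMI versions are assumed to be dotted numbers such as '3.11.0', and the 4.3.0 exception is read as applying only to Python 2.

```python
def default_python_bin(python_major, emr_ami_version=None):
    """Hypothetical restatement of the python_bin defaults.

    python_major: major version of the Python running mrjob (2 or 3).
    emr_ami_version: dotted AMI version when running on EMR, else None.
    """
    if python_major >= 3:
        return 'python3'

    # Python 2: AMIs older than 4.3.0 get the explicit python2.7 binary
    if emr_ami_version is not None:
        parts = [int(p) for p in emr_ami_version.split('.')]
        parts += [0] * (3 - len(parts))  # treat '4.3' as '4.3.0'
        if tuple(parts[:3]) < (4, 3, 0):
            return 'python2.7'

    return 'python'
```

If the default doesn't match what's installed on your cluster, you can set python_bin yourself (for example in mrjob.conf, or with the --python-bin switch).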

Your Hadoop cluster

Whatever version of Python you use, you’ll have to have a compatible version of Python installed on your Hadoop cluster. mrjob does its best to make this work on Elastic MapReduce (see bootstrap_python), but if you’re running on your own Hadoop cluster, this is up to you.
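On your own cluster, one way to check that a given python_bin is installed, and is the Python you expect, is to ask it for its version. The helpers below are a hypothetical sketch using only the standard library; "compatible" is simplified here to "same major version as the Python running mrjob", which may be stricter or looser than your jobs actually need.

```python
import subprocess
import sys


def python_major_version(python_bin):
    """Hypothetical helper: return the major version of an interpreter."""
    # raises OSError if python_bin isn't on the PATH
    out = subprocess.check_output(
        [python_bin, '-c', 'import sys; print(sys.version_info[0])'])
    return int(out.decode('ascii').strip())


def matches_local_python(python_bin):
    """True if python_bin has the same major version as this Python."""
    return python_major_version(python_bin) == sys.version_info[0]
```

Running something like `matches_local_python('python3')` on a cluster node before submitting jobs can catch a missing or mismatched interpreter early.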

mrjob v0.7.4 documentation
