Glossary

combiner
A function that converts one key and a list of values that share that key (not necessarily all values for the key) to zero or more key-value pairs. See Concepts for details.
Hadoop Streaming
A special jar that lets you run code written in any language on Hadoop. It launches a subprocess, passes it input on stdin, and receives output on stdout. See the Apache Hadoop Streaming documentation for more.
input protocol
The protocol that converts the input file to the key-value pairs seen by the first step. See Protocols for details.
internal protocol
The protocol that converts the output of one step to the input of the next. See Protocols for details.
mapper
A function that converts one key-value pair to zero or more key-value pairs. See Concepts for details.
output protocol
The protocol that converts the output of the last step to the bytes written to the output file. See Protocols for details.
protocol
An object that converts a stream of bytes to and from Python objects. See Protocols for details.
reducer
A function that converts one key and all values that share that key to zero or more key-value pairs. See Concepts for details.
step
A unit of work made up of at most one mapper, one combiner, and one reducer. Any of these may be omitted from an mrjob step as long as at least one is included.
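
To make the mapper, combiner, reducer, and step entries concrete, here is a minimal pure-Python sketch of a single word-count step. It does not use mrjob or Hadoop; the helper names (group_by_key, run_step) and the idea of passing input as a list of "splits" are assumptions made for illustration only. The point it shows is that the combiner sees only the values for a key within one split, while the reducer sees all values for that key.

```python
from collections import defaultdict
from itertools import chain


def mapper(_, line):
    # One key-value pair in (key ignored, value is a line of text),
    # zero or more key-value pairs out.
    for word in line.split():
        yield word.lower(), 1


def combiner(word, counts):
    # One key plus SOME of its values (those from a single split).
    yield word, sum(counts)


def reducer(word, counts):
    # One key plus ALL of its values.
    yield word, sum(counts)


def group_by_key(pairs):
    # Hypothetical helper: collect values per key, sorted by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def run_step(splits):
    # Hypothetical driver: run mapper and combiner per split,
    # then run the reducer over the merged combiner output.
    combined = []
    for split in splits:
        mapped = chain.from_iterable(mapper(None, line) for line in split)
        for key, values in group_by_key(mapped):
            combined.extend(combiner(key, values))
    output = []
    for key, values in group_by_key(combined):
        output.extend(reducer(key, values))
    return output


splits = [["a b a", "b c"], ["a c c"]]
result = run_step(splits)
print(result)  # [('a', 3), ('b', 2), ('c', 3)]
```

Because the combiner here performs the same summation as the reducer, omitting it would leave the result unchanged; it only reduces how many pairs travel between the map and reduce phases.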