Glossary
- combiner
- A function that converts one key and a list of values that share that
key (not necessarily all values for the key) to zero or more key-value
pairs. See Concepts for details.
- Hadoop Streaming
- A special jar that lets you run code written in any language on Hadoop.
It launches a subprocess, passes it input on stdin, and receives output
on stdout. Read more here.
- input protocol
- The protocol that converts the input file to the key-value
pairs seen by the first step. See Protocols for details.
- internal protocol
- The protocol that converts the output of one step to the input
of the next. See Protocols for details.
- mapper
- A function that converts one key-value pair to zero or more key-value
pairs. See Concepts for details.
- output protocol
- The protocol that converts the output of the last step to the
bytes written to the output file. See Protocols for details.
- protocol
- An object that converts a stream of bytes to and from Python objects.
See Protocols for details.
- reducer
- A function that converts one key and all values that share that key to
zero or more key-value pairs. See Concepts for details.
- step
- A combination of one mapper, one combiner, and one reducer. Any of
these may be omitted from a mrjob step as long as at least one is
included.
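
The terms above can be tied together with a small, self-contained sketch. It does not use mrjob or Hadoop: it simulates a single word-count step in plain Python to show how a mapper, combiner, and reducer relate, and how a protocol turns key-value pairs into bytes and back. The names `run_step`, `group_by_key`, `chunk_size`, and `ToyJSONProtocol` are hypothetical and exist only for this illustration, and the tab-separated JSON encoding is an assumption, not a statement about what any particular protocol does.

```python
import json
from collections import defaultdict


def mapper(_, line):
    # One key-value pair in (key ignored, value = a line of text),
    # zero or more key-value pairs out.
    for word in line.split():
        yield word.lower(), 1


def combiner(word, counts):
    # Sees only SOME of the values for a key (those from one chunk).
    yield word, sum(counts)


def reducer(word, counts):
    # Sees ALL remaining values for a key.
    yield word, sum(counts)


def group_by_key(pairs):
    # Hypothetical helper: collect values that share a key, sorted by key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def run_step(lines, chunk_size=2):
    # Hypothetical driver for one step: map and combine each chunk of
    # input separately, then reduce over the combined output.
    combined = []
    for start in range(0, len(lines), chunk_size):
        chunk = lines[start:start + chunk_size]
        mapped = [pair for line in chunk for pair in mapper(None, line)]
        for key, values in group_by_key(mapped):
            combined.extend(combiner(key, values))
    results = []
    for key, values in group_by_key(combined):
        results.extend(reducer(key, values))
    return results


class ToyJSONProtocol:
    # Hypothetical protocol: converts between bytes and Python objects,
    # assuming a tab-separated pair of JSON values per line.
    def read(self, line):
        raw_key, raw_value = line.split(b"\t", 1)
        return json.loads(raw_key), json.loads(raw_value)

    def write(self, key, value):
        return (json.dumps(key).encode("utf-8") + b"\t"
                + json.dumps(value).encode("utf-8"))


if __name__ == "__main__":
    protocol = ToyJSONProtocol()
    for key, value in run_step(["a b a", "b c", "a"]):
        print(protocol.write(key, value))
```

In this sketch the combiner runs once per chunk, so for the key `a` it emits two partial sums that the reducer then adds together. That is the sense in which a combiner receives "not necessarily all values for the key" while a reducer receives all of them.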