mrjob.dataproc - run on Dataproc
Runs an MRJob on Google Cloud Dataproc. Invoked when you run your job with -r dataproc.
DataprocJobRunner runs your job in a Dataproc cluster, which is basically a temporary Hadoop cluster.
Input, support, and jar files can be either local or on GCS; use gs://... URLs to refer to files on GCS.
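As a rough illustration of the local-versus-GCS distinction, the sketch below tells a gs:// URI apart from a local path and splits a URI into its bucket and object name. The helpers is_gcs_uri and split_gcs_uri, and the bucket and file names, are hypothetical and made up for this example; they are not part of mrjob's API, and mrjob's own handling may differ.

```python
from urllib.parse import urlparse


def is_gcs_uri(path):
    """Return True if path looks like a gs:// URI rather than a local path.

    Hypothetical helper for illustration; not part of mrjob.
    """
    return urlparse(path).scheme == "gs"


def split_gcs_uri(uri):
    """Split a gs:// URI into (bucket, object_name).

    Hypothetical helper for illustration; not part of mrjob.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError("not a GCS URI: %r" % (uri,))
    # The URI's path component starts with "/", which is not part
    # of the object name, so strip it.
    return parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    # Example paths are invented for this sketch.
    for path in ["data/input.txt", "gs://example-bucket/input/data.txt"]:
        if is_gcs_uri(path):
            print(path, "-> GCS:", split_gcs_uri(path))
        else:
            print(path, "-> local file")
```

A check like this is only a convenience for your own scripts; the runner accepts both kinds of path directly.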
This class has some useful utilities for talking directly to GCS and Dataproc, so you may find it useful to instantiate it without a script:
from mrjob.dataproc import DataprocJobRunner ...