mrjob.util - general utility functions
Utility functions for MRJob
-
mrjob.util.cmd_line(args)
Build a command line that works in a shell.
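As a rough illustration of what building a shell-safe command line involves, the sketch below quotes each argument with the standard library's shlex.quote() and joins the results with spaces. The name cmd_line_sketch is hypothetical, and the approach is an assumption about the general technique, not a copy of mrjob's implementation.

```python
import shlex


def cmd_line_sketch(args):
    """Join args into one string that a POSIX shell splits back into the same tokens."""
    # str() lets non-string arguments such as ints through (an assumption of this sketch)
    return ' '.join(shlex.quote(str(arg)) for arg in args)


command = cmd_line_sketch(['grep', '-e', 'hello world', 'my file.txt'])
print(command)
```

Arguments containing spaces or quotes come out wrapped in single quotes, so splitting the result with shlex.split() gives back the original list.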
-
mrjob.util.expand_path(path)
Resolve ~ (home dir) and environment variables in path.
If path is None, return None.
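The behavior described here can be approximated with two standard-library calls. The following is a minimal sketch under that assumption; expand_path_sketch and the SKETCH_DATA_DIR variable are hypothetical names used only for illustration.

```python
import os


def expand_path_sketch(path):
    """Expand environment variables and ~ in path; pass None through unchanged."""
    if path is None:
        return None
    # expand variables first, so a variable whose value starts with ~ is also resolved
    return os.path.expanduser(os.path.expandvars(path))


os.environ['SKETCH_DATA_DIR'] = '/tmp/data'  # hypothetical variable for the demo
print(expand_path_sketch('$SKETCH_DATA_DIR/input.txt'))
print(expand_path_sketch(None))
```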
-
mrjob.util.file_ext(filename)
Return the file extension, including the leading "."
>>> file_ext('foo.tar.gz')
'.tar.gz'
>>> file_ext('.emacs')
''
>>> file_ext('.mrjob.conf')
'.conf'
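One way to get results consistent with the three doctests above is to ignore leading dots and then return everything from the first remaining dot onward. This is a sketch of that logic under a hypothetical name, file_ext_sketch; it is not necessarily how mrjob does it.

```python
def file_ext_sketch(filename):
    """Return everything from the first dot onward, ignoring leading dots."""
    stripped = filename.lstrip('.')  # '.emacs' -> 'emacs', so dotfiles have no extension
    dot_index = stripped.find('.')
    if dot_index == -1:
        return ''
    return stripped[dot_index:]


for name in ('foo.tar.gz', '.emacs', '.mrjob.conf'):
    print(repr(file_ext_sketch(name)))
```

Note that this differs from os.path.splitext(), which would return only '.gz' for 'foo.tar.gz'.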
-
mrjob.util.log_to_null(name=None)
Set up a null handler for the logger with the given name, to suppress “no handlers could be found” warnings.
-
mrjob.util.log_to_stream(name=None, stream=None, format=None, level=None, debug=False)
Set up logging.
Parameters:
- name (str) – name of the logger, or None for the root logger
- stream (file object) – stream to log to (default is sys.stderr)
- format (str) – log message format (default is '%(message)s')
- level – log level to use
- debug (bool) – quick way of setting the log level: if true, use logging.DEBUG, otherwise use logging.INFO
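The parameters above map fairly directly onto the standard logging module. The sketch below shows one plausible wiring; log_to_stream_sketch is a hypothetical name, and the rule that an explicit level takes precedence over debug is an assumption, since the parameter list does not say which one wins.

```python
import io
import logging
import sys


def log_to_stream_sketch(name=None, stream=None, format=None,
                         level=None, debug=False):
    """Attach a StreamHandler to the named logger and return that logger."""
    if level is None:
        # assumption: debug only matters when no explicit level is given
        level = logging.DEBUG if debug else logging.INFO
    if stream is None:
        stream = sys.stderr
    handler = logging.StreamHandler(stream)
    handler.setLevel(level)
    handler.setFormatter(logging.Formatter(format or '%(message)s'))
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
demo_logger = log_to_stream_sketch('sketch.demo', stream=buffer, debug=True)
demo_logger.debug('starting job')
print(buffer.getvalue(), end='')
```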
-
mrjob.util.random_identifier()
A random 16-digit hex string.
-
mrjob.util.safeeval(expr, globals=None, locals=None)
Like eval(), but with nearly everything in the environment blanked out, so that it’s difficult to cause mischief.
globals and locals are optional dictionaries mapping names to values for those names (just like in eval()).
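To make the idea of a blanked-out environment concrete, here is a minimal sketch that evaluates an expression with an empty set of builtins. The name safeeval_sketch is hypothetical, and the set of names mrjob actually leaves available is not stated here, so this version allows none beyond what the caller supplies.

```python
def safeeval_sketch(expr, globals=None, locals=None):
    """Evaluate expr with no builtins available except what the caller passes in."""
    safe_globals = dict(globals or {})  # copy, so the caller's dict is untouched
    # an empty __builtins__ hides open(), __import__(), and the rest
    safe_globals['__builtins__'] = {}
    return eval(expr, safe_globals, locals)


print(safeeval_sketch('x * 2 + 1', {'x': 20}))

try:
    safeeval_sketch("open('/etc/passwd')")
except NameError as error:
    print('blocked:', error)
```

Restricting builtins this way makes mischief difficult rather than impossible, so a helper like this is not a sandbox for untrusted input.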
-
mrjob.util.save_current_environment(*args, **kwds)
Context manager that saves os.environ and loads it back again after execution.
-
mrjob.util.save_cwd(*args, **kwds)
Context manager that saves the current working directory, and chdir’s back to it after execution.
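A context manager of this kind is typically a few lines built on contextlib.contextmanager. The sketch below shows the pattern for the working directory; save_cwd_sketch is a hypothetical name, and the try/finally structure is an assumption about how such a helper would usually be written.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def save_cwd_sketch():
    """Remember the working directory and return to it on exit."""
    original_cwd = os.getcwd()
    try:
        yield
    finally:
        # runs even if the body raises, so the directory is always restored
        os.chdir(original_cwd)


before = os.getcwd()
with save_cwd_sketch():
    os.chdir(tempfile.gettempdir())  # work somewhere else for a while
after = os.getcwd()
print(before == after)
```

The same save, yield, restore shape would presumably carry over to os.environ or sys.path.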
-
mrjob.util.save_sys_path(*args, **kwds)
Context manager that saves sys.path and restores it after execution.
-
mrjob.util.save_sys_std(*args, **kwds)
Context manager that saves the current values of sys.stdin, sys.stdout, and sys.stderr, and flushes these filehandles before and after switching them out.
-
mrjob.util.shlex_split(s)
Wrapper around shlex.split() that converts its argument to str on Python versions before 2.7.3, the release in which shlex.split() gained unicode support.
-
mrjob.util.strip_microseconds(delta)
Return the given datetime.timedelta, without microseconds.
Useful for printing datetime.timedelta objects.
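The effect on printed output is easiest to see side by side. This sketch rebuilds the timedelta from its days and seconds fields only; strip_microseconds_sketch is a hypothetical name, and truncating (rather than rounding) is an assumption.

```python
from datetime import timedelta


def strip_microseconds_sketch(delta):
    """Return a copy of delta with the microseconds field dropped."""
    return timedelta(days=delta.days, seconds=delta.seconds)


elapsed = timedelta(hours=1, minutes=2, seconds=3, microseconds=456789)
print(elapsed)                             # 1:02:03.456789
print(strip_microseconds_sketch(elapsed))  # 1:02:03
```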
-
mrjob.util.to_lines(chunks)
Take in data as a sequence of bytes, and yield it, one line at a time.
Only breaks lines on \n (not \r), and does not add a trailing newline.
For efficiency, passes through anything with a readline() attribute.
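The three rules above (split only on \n, leave a final partial line as it is, pass file-like objects straight through) can be sketched as follows. The names to_lines_sketch and _split_chunks are hypothetical, and the buffering strategy is an assumption, not mrjob's code.

```python
import io


def to_lines_sketch(chunks):
    """Turn an iterable of byte chunks into an iterable of lines."""
    if hasattr(chunks, 'readline'):
        return chunks  # file-like objects already iterate line by line
    return _split_chunks(chunks)


def _split_chunks(chunks):
    leftover = b''
    for chunk in chunks:
        pieces = (leftover + chunk).split(b'\n')
        leftover = pieces.pop()  # last piece may continue in the next chunk
        for piece in pieces:
            yield piece + b'\n'
    if leftover:
        yield leftover  # final partial line, emitted without an added newline


lines = list(to_lines_sketch([b'a\nb', b'c\r\nd']))
print(lines)
```

Here the chunk boundary inside the second line is joined back up, and the \r stays attached to its line because only \n counts as a line break.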
-
mrjob.util.unarchive(archive_path, dest)
Extract the contents of a tar or zip file at archive_path into the directory dest.
dest will be created if it doesn’t already exist.
tar files can be gzip compressed, bzip2 compressed, or uncompressed. Files within zip files can be deflated or stored.
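A comparable helper can be put together from the standard tarfile and zipfile modules, as sketched below. The name unarchive_sketch is hypothetical, and detecting the archive type with is_tarfile() and is_zipfile() is an assumption; mrjob may decide differently.

```python
import os
import tarfile
import tempfile
import zipfile


def unarchive_sketch(archive_path, dest):
    """Extract a tar (plain, gzip, or bzip2) or zip archive into dest."""
    os.makedirs(dest, exist_ok=True)  # create dest if it doesn't already exist
    if tarfile.is_tarfile(archive_path):
        # mode 'r' lets tarfile detect the compression transparently
        with tarfile.open(archive_path, 'r') as archive:
            archive.extractall(dest)
    elif zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(dest)
    else:
        raise IOError('Unknown archive type: %s' % archive_path)


# Demo: build a small zip file, then extract it into a directory that doesn't exist yet.
work_dir = tempfile.mkdtemp()
zip_path = os.path.join(work_dir, 'demo.zip')
with zipfile.ZipFile(zip_path, 'w') as archive:
    archive.writestr('hello.txt', 'hi')
dest_dir = os.path.join(work_dir, 'out', 'nested')
unarchive_sketch(zip_path, dest_dir)
print(os.listdir(dest_dir))
```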
-
mrjob.util.unique(items)
Yield items from items in order, skipping duplicates.
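Order-preserving de-duplication is usually done with a set of already-seen values. The following sketch shows that approach under a hypothetical name, unique_sketch, and assumes the items are hashable.

```python
def unique_sketch(items):
    """Yield each distinct item once, in order of first appearance."""
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


print(list(unique_sketch([3, 1, 3, 2, 1])))  # [3, 1, 2]
```

Because it is a generator, nothing is consumed from items until the result is iterated.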
-
mrjob.util.which(cmd, path=None)
Like the UNIX which command: search in path for the executable named cmd. path defaults to PATH. Returns None if no such executable is found.
This is basically shutil.which() (which was introduced in Python 3.3) without the mode argument. Best practice is to always specify path as a keyword argument.
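Since the function is described as close to shutil.which(), the standard-library call itself illustrates the expected behavior, including passing path as a keyword argument. The example below uses only shutil.which(); the command name in the second lookup is made up so that the search fails.

```python
import os
import shutil
import sys

# Look for the running interpreter's file name in its own directory.
python_dir, python_name = os.path.split(sys.executable)
found = shutil.which(python_name, path=python_dir)

# A made-up command name: nothing matches, so the result is None.
missing = shutil.which('no-such-command-for-this-sketch', path=python_dir)

print(found)
print(missing)
```

Passing path by keyword keeps the call unambiguous, because the second positional parameter of shutil.which() is mode, not path.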
-
mrjob.util.zip_dir(dir, out_path, filter=None, prefix='')
Compress the given dir into a zip file at out_path.
If we encounter symlinks, include the actual file, not the symlink.
Parameters: