Pretending python is a shell

We all like python for scripting, because it’s so much more powerful than a shell. But sometimes we really need to call a shell command because it’s so much easier than writing yet another library in python or adding a dependency:

from whelk import shell
shell.zgrep("-r", "downloads", "/var/log/httpd")
# Here goes code to process the log

You can even pipe commands together:

from whelk import pipe
pipe(pipe.getent("group") | pipe.grep(":1...:"))

Installing

Installing the latest released version is as simple as:

pip install whelk

If you want to tinker with the source, you can install the latest source from github:

git clone https://github.com/seveas/whelk.git

Calling a command

The whelk.shell object can be used to call any command on your $PATH whose name is a valid python identifier. Since many command names contain a “-”, whelk also finds them when you spell the name with a “_” instead. So e.g. run-parts can be found as shell.run_parts().

If your command is not valid as a python identifier, even after substituting underscores for dashes, you can use the shell object as a dict. This dict also accepts full paths to commands, even if they are not on your $PATH.

Attributes of the shell instance are all callables. Arguments to this callable get mapped to arguments to the command via a subprocess.Popen object. Keyword arguments get mapped to keyword arguments for the Popen object:

result = shell.netstat('-tlpn')
result = shell.git('status', cwd='/home/dennis/code/whelk')
result = shell['2to3']('awesome.py')
result = shell['./Configure']('-des', '-Dusedevel')

Oh, and on Windows you can leave out the .exe suffix, just as you would on the command line:

result = shell.nmake('test')
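The underscore-to-dash lookup described at the start of this section can be approximated with the standard library alone. This is only a sketch of the idea, not whelk's actual code: candidate_names and resolve are hypothetical helpers, and the order in which the spellings are tried is an assumption.

```python
import shutil


def candidate_names(attr):
    """Return the command names to try for a python attribute name."""
    names = [attr]
    dashed = attr.replace("_", "-")
    if dashed != attr:
        # Assumption: try the literal spelling first, then the dashed one.
        names.append(dashed)
    return names


def resolve(attr):
    """Return the full path of the first candidate found on $PATH, or None."""
    for name in candidate_names(attr):
        path = shutil.which(name)
        if path:
            return path
    return None


print(candidate_names("run_parts"))  # ['run_parts', 'run-parts']
```

Names that still cannot be written as an attribute, such as 2to3 or ./Configure, are the reason the dict-style access exists.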

Shell commands return a namedtuple (returncode, stdout, stderr). These result objects can also be used as booleans. As in shell scripts, a non-zero returncode is considered False and a returncode of zero is considered True, so this simply works:

result = shell.make('test')
if not result:
    print("You broke the build!")
    print(result.stderr)

The result of pipe(...) is slightly different: instead of a single return code, it actually will give you a list of returncodes of all items in the pipeline. Result objects like this are only considered True if all elements are zero.
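To make the truthiness rules concrete, here is a minimal sketch of a result tuple that behaves as described. The Result class below is an illustration written for this README, not whelk's own class; only the field names (returncode, stdout, stderr) come from the text above.

```python
from collections import namedtuple


class Result(namedtuple("Result", ["returncode", "stdout", "stderr"])):
    """Illustrative result tuple; not whelk's actual implementation."""

    __slots__ = ()

    def __bool__(self):
        codes = self.returncode
        if isinstance(codes, int):
            # A single command: wrap its code so both cases share one check.
            codes = [codes]
        # True only if every process exited with returncode zero.
        return all(code == 0 for code in codes)


ok = Result(0, "built\n", "")
failed = Result(2, "", "make: *** [test] Error 1\n")
piped = Result([0, 1], "", "")

print(bool(ok), bool(failed), bool(piped))  # True False False
```

Because the result is still a tuple, it can be unpacked as returncode, stdout, stderr as well as tested with if.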

Keyword arguments

In addition to the subprocess.Popen arguments, whelk supports a few more keyword arguments:

  • input

    Contrary to the subprocess defaults, stdin, stdout and stderr are set to whelk.PIPE by default. Input for the command can be passed as the input keyword parameter.

    Some examples:

    result = shell.cat(input="Hello world!")
    
    result = shell.vipe(input="Some data I want to edit in an editor")
    
  • output_callback

    To process output as soon as it arrives, specify a callback to use. Whenever output arrives, this callback will be called with the shell instance, the subprocess, the file descriptor the data came in on, the actual data (or None in case of EOF) and any user-specified arguments. Here’s an example that uses this feature for logging:

    import logging

    def cb(shell, sp, fd, data, extra=""):
        if data is None:
            # EOF: there is no data to split, so log it and stop here
            logging.debug("%s<%d:%d> File descriptor closed" % (extra, sp.pid, fd))
            return
        for line in data.splitlines():
            logging.debug("%s<%d:%d> %s" % (extra, sp.pid, fd, line))
    
    shell.dmesg(output_callback=cb)
    shell.mount(output_callback=[cb, "Mountpoints: "])
    
  • raise_on_error

    This makes your shell even more pythonic: instead of returning an errorcode, a CommandFailed exception is raised whenever a command returns with a nonzero exitcode.

    The reason this is not the default is that for quite a few commands a non-zero exitcode does not indicate an error at all. For example, the venerable diff command returns 1 if there is a change and 0 if there is none.

  • exit_callback

    If you want slightly more fine-grained control than raise_on_error, you can use this argument to specify a callable to call whenever a process exits, irrespective of the returncode. The callback will be called with the command instance, the subprocess, the result tuple and any user-provided arguments.

    Both raise_on_error and exit_callback are most useful when set as a default of a Shell instance; they are not really needed when calling single commands.

    Here’s a real life example of an exit callback, which will retry git operations when they break due to repository locks:

    import random
    import time
    from whelk import Shell

    def check_lock(command, sp, res):
        if not res:
            if 'index.lock' in res.stderr:
                # Let's retry after a short random delay
                time.sleep(random.random())
                return command(*command.args, **command.kwargs)
            raise RuntimeError("%s %s failed: %s" % (command.name, ' '.join(command.args), res.stderr))
    
    git = Shell(exit_callback=check_lock).git
    git.checkout('master')
    
  • run_callback

    A function that will be called whenever the shell instance is about to create a new process. The callback will be called with the command instance and any user-provided arguments. Here’s an example that logs all starts of applications:

    import logging
    import os

    logger = logging.getLogger(__name__)

    def runlogger(cmd):
        args = [cmd.name] + list(cmd.args)
        env = cmd.sp_kwargs.get('env', '')
        if env:
            # Only show variables that differ from the current environment
            env = ['%s=%s' % (x, env[x]) for x in env if env[x] != os.environ.get(x, None)]
            env = '%s ' % ' '.join(env)
        logger.debug("Running %s%s" % (env, ' '.join(args)))
    
    shell = Shell(run_callback=runlogger)
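The raise_on_error caveat in the list above (a non-zero exit code is not always a failure) can be reproduced with nothing but the standard library. This sketch does not use whelk at all: a small python child process stands in for diff and exits with 1, and subprocess.run with check=True plays the role of "raise on any non-zero exit code".

```python
import subprocess
import sys

# Stand-in for `diff`: exits with 1 to report "files differ", which is not a failure.
fake_diff = [sys.executable, "-c", "import sys; print('1c1'); sys.exit(1)"]

# Checking the return code yourself lets you treat 1 as a normal outcome.
proc = subprocess.run(fake_diff, stdout=subprocess.PIPE, universal_newlines=True)
files_differ = proc.returncode == 1

# Raising on every non-zero exit code turns the same outcome into an exception.
try:
    subprocess.run(fake_diff, stdout=subprocess.PIPE, check=True)
    raised = False
except subprocess.CalledProcessError as exc:
    raised = True
    error_code = exc.returncode

print(files_differ, raised)
```

For commands like this, checking the result yourself (or using an exit callback that knows which codes are harmless) is usually the better fit.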
    

Piping commands together

The whelk.pipe object is similar to the shell object but has a few significant differences:

  • pipe commands can be chained with | (binary or), resembling a shell pipe. pipe takes care of the I/O redirecting.

  • The command is not started immediately, but only when wrapping it in another pipe() call (yes, the object itself is callable), or chaining it to the next.

  • In the result tuple, the returncode is actually a list of returncodes of all the processes in the pipe, in the order they are executed in.

  • The only I/O redirection you may want to override is stderr=whelk.STDOUT, or stderr=open('/dev/null', 'w') to redirect stderr of a process to stdin of the next process, or /dev/null respectively.

Some examples:

import os
import random
from whelk import pipe

result = pipe(pipe.dmesg() | pipe.grep('Bluetooth'))

cow = random.choice(os.listdir('/usr/share/cowsay/cows'))
result = pipe(pipe.fortune("-s") | pipe.cowsay("-n", "-f", cow))
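For comparison, this is roughly the I/O plumbing that a pipeline needs when written by hand with subprocess.Popen. It is a sketch of the general technique, not of whelk's internals; two small python child processes stand in for dmesg and grep so the example runs anywhere.

```python
import subprocess
import sys

# Two small python children stand in for `dmesg` and `grep Bluetooth`.
producer = [sys.executable, "-c",
            "print('usb: new device'); print('Bluetooth: hci0 ready')"]
consumer = [sys.executable, "-c",
            "import sys\n"
            "for line in sys.stdin:\n"
            "    if 'Bluetooth' in line:\n"
            "        sys.stdout.write(line)"]

first = subprocess.Popen(producer, stdout=subprocess.PIPE)
second = subprocess.Popen(consumer, stdin=first.stdout, stdout=subprocess.PIPE,
                          universal_newlines=True)
# Close our copy of the pipe so the producer notices if the consumer exits early.
first.stdout.close()
stdout, _ = second.communicate()
first.wait()

# One return code per process, in pipeline order.
returncodes = [first.returncode, second.returncode]
print(returncodes, stdout.strip())
```

The list of return codes mirrors what the result of pipe(...) is described as carrying above.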

Setting default arguments

If you want to launch many commands with the same parameters, you can set defaults by passing parameters to the Shell constructor. These are passed on to all commands launched by that shell, unless overridden in specific calls:

import os
from whelk import Shell
my_env = os.environ.copy()
my_env['http_proxy'] = 'http://webproxy.corp:3128'
shell = Shell(stderr=Shell.STDOUT, env=my_env, encoding='utf8')

shell.wget("http://google.com", "-O", "google.html")
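The "defaults unless overridden" behaviour is the same idea as functools.partial applied to subprocess.run. The sketch below uses only the standard library and is an analogy, not a description of how Shell is implemented; the second proxy URL is made up for the example.

```python
import functools
import os
import subprocess
import sys

my_env = os.environ.copy()
my_env["http_proxy"] = "http://webproxy.corp:3128"

# Defaults that apply to every call made through `run`.
run = functools.partial(subprocess.run, env=my_env, stdout=subprocess.PIPE,
                        stderr=subprocess.STDOUT, encoding="utf8")

# A child process that reports which proxy it sees.
show_proxy = [sys.executable, "-c", "import os; print(os.environ.get('http_proxy'))"]

default_result = run(show_proxy)

# A keyword passed at call time overrides the default for that call only.
other_env = dict(my_env, http_proxy="http://other.corp:8080")  # hypothetical URL
override_result = run(show_proxy, env=other_env)

print(default_result.stdout.strip())
print(override_result.stdout.strip())
```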

Python compatibility

Whelk is compatible with python 3.4 and up; python 2 is no longer supported. If you find an incompatibility, please report a bug at https://github.com/seveas/whelk.