Writing Bash Scripts like Python
25 Mar 2020In the previous post, I talked about implementing python-like logging in bash.
In that post, I said that it’s generally a bad idea to write code in one language as if it were another. But for bash, I’m willing to make an exception.
As I mentioned before, I’m no bash expert. I generally try to avoid it. But there are times when bash really is “the right tool for the job”. So in this post, I’m going to talk about how I approach writing bash scripts, and what I bring over from python.
Functions
Firstly, while it’s not necessary to declare functions with the function
keyword, I like to do it anyway for clarity. It’s especially helpful with syntax highlighting
hello() {
# do stuff
}
function hello() {
# do stuff
}
Per the usual advice, I aim for small functions that do one thing.
In generally, I would say that any pipeline or non-trivial usage of an external tool should be wrapped as a function with a description name.
For example, suppose I have a csv file with file names and sizes (and possibly some other metadata) and I want to get a list of paths sorted from largest to smallest, I might do
function paths_by_size() {
sort "$1" -rn -d, -k2 | cut -d, -f1
}
function main() {
for path in $(paths_by_size "metadata.csv"); do
# stuff
done
}
You could even go a step further and split paths_by_size
into a ‘sort by second field’ function and a ‘get first field’ function
Parameters
It can help to give function parameters names, especially when the function receives more than one
Variables declared in functions should be marked as local
to avoid ‘leakage’
function add() {
local x=$1
local y=$2
echo $(($x + $y))
}
Default Parameters
Bash has a syntax for setting default values for un-set parameters
function hello() {
local name=${1:-world}
echo "hello $name!"
}
Variadic Arguments
That is, a function parameter that can capture multiple values. In python, we’d use *args
In bash, you can capture all function arguments with $@
or $*
(depending on whether you care about splitting on whitespace)
If you want some positional arguments as well, then you can use shift
function log() {
local level=$1
shift
local message="$@"
# etc
}
log $INFO hello world
# INFO:hello world
Modules
In python, we have this construct if __name__ == '__main__'
. For those unfamiliar, the check is saying “is this file being run as a script”. If False, then the file is being imported as a module.
This is important, because when we’re importing the file as module, we don’t want it to ‘execute’ anything, we just want access to its function, classes, etc.
I did some googling, and it turn out there is an equivalent construct in bash
function main() {
# do stuff
}
if ! (return 0 2>/dev/null); then
main $@
fi
If we try return 0
in the shell we get an explanation of why it works
$ return 0
bash: return: can only `return' from a function or sourced script
In other words, if return 0
fails, then we’re not in a sourced (imported) script.
As with the python version, this allows us to source
(import) the script into other scripts for re-use or testing or whatever
source logging.sh
log $ERROR oops
As an additional point, if you use script options like set -eo pipefail
, it’s a good idea to put them inside the if. Another script sourcing it might not expect (or want) those options being set.
Returns and Errors
Bash has the return
keyword for returning an integer return code
function doit() {
return 1
}
This can be used to indicate if the function encountered an error - a return value of zero means success, a non-zero value means an error. The specific non-zero value can be used to indicate a specific error.
We can check whether a function returned non-zero with a simple if
if doit; then
# success
fi
if ! doit; then
# failure
fi
If we want to check the specific return code, we can use $?
- e.g. if [ $? -eq 2 ]; then ...
Additionally, a function can ‘return’ some value via stdout - for example
function hello() {
echo "hello $1!"
}
function main() {
msg=$(hello "world")
echo "$msg
}
And we can combine the two types of return for a sort of error handling
function divide() {
if [ $2 -eq 0 ]; then
echo "ERROR: zero division"
return 1
else
echo $(($1/$2))
fi
}
function main() {
if ! res=$(divide $x $y); then
echo "$res" >&2
exit 1
else
echo "$res"
fi
}
In fact, this is more similar to how Go does error handling
if value, err := divide(x, y); err != nil {
panic(err);
} else {
fmt.Println(value);
}
If I were to compare it to python, I would say it’s most like doing the function call in a try-except
try:
value = divide(x, y)
except Exception as e:
sys.exit(str(e))
else:
print(value)
Odds and Ends
The following are just some general ideas
- per python convention (amongst others), global variables should be uppercase, and local variables should be lowercase
- in general, avoid global variables unless they are constants; definitely don’t mutate global variables
- don’t use
exit
outside ofmain
. If another script tries to import (source) and use a function that callsexit
then that script is going to exit unexpectedly - since bash function signatures don’t declare arguments, it’s a good idea to put them in a comment/docstring
- run your bash scripts through something like shellcheck to pickup potential issues
- check dependencies - if you want to use a particular shell command, make sure it exists/is installed with something like
function require() {
hash "$1" || exit 127
}
require jq
Putting it Together
To round it all out, I’m going to show an example of a complete python script and its (nearest) equivalent in bash.
The scripts use the github API to get the real name for a given username
#!/usr/bin/env python
import logging
import sys
import requests
GITHUB_URL = "https://api.github.com/users/"
def get_name(username):
url = GITHUB_URL + username
logging.info("fetching %s", url)
resp = requests.get(url)
if resp.ok:
return resp.json()['name']
raise Exception(resp.text)
def main(username):
try:
name = get_name(username)
except Exception:
logging.critical("couldn't fetch name for %s", username)
sys.exit(1)
else:
print(name)
if __name__ == '__main__':
main(sys.argv[1])
#!/usr/bin/env bash
source "logging.sh"
source "require.sh"
require jq
GITHUB_URL="https://api.github.com/users/"
function get_name() {
local url="$GITHUB_URL$1"
log $INFO "fetching $url"
curl -s "$url" | jq -er .name
}
function main() {
local username="$1"
if ! name=$(get_name "$username"); then
log $CRITICAL "couldn't get name for $username"
exit 1
else
echo "$name"
fi
}
if ! (return 0 2>/dev/null); then
main "$1"
fi
Like I said, I’m no bash expert, so I’m not going to claim these are ‘best practices’. They might not even be good practices.
But this blog doesn’t have a comments section, so you can’t tell me otherwise ;)
Chris.
[Sometimes I write bash just to wind up my bash-phobic colleague]