Tangentially, http://www.shellcheck.net/ is an awesome static linter for shell code. Available online, as a command-line tool, but most importantly as a plugin for your favorite $EDITOR.
Whenever I attempt to learn a new language I find that static analysis tools and linters, if available, help me get over any initial confusion about what constitutes good idiomatic code. Guides like OP help too, but nothing beats a good linter. The thing is though, I've been writing Bash shell scripts for going on 15 years now, and I still overlook little details [0] all the time.
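As a rough illustration of the kind of little detail a linter catches, the sketch below writes a small throwaway script with an unquoted variable and runs `shellcheck` on it if the tool happens to be installed. The file contents and the `count_args` helper are made up for the example; the second half shows the underlying word-splitting problem in plain bash.

```shell
#!/usr/bin/env bash
# Hypothetical demo script containing an unquoted expansion.
demo="$(mktemp)"
cat > "$demo" <<'EOF'
#!/bin/bash
file="my notes.txt"
touch $file      # unquoted: would create "my" and "notes.txt"
EOF

# Run the linter only if it is available; it exits non-zero when it finds issues.
if command -v shellcheck >/dev/null 2>&1; then
  shellcheck "$demo" || true
fi
rm -f "$demo"

# The same problem without any linter: count the arguments a command receives.
count_args() { echo "$#"; }
file="my notes.txt"
unquoted_count="$(count_args $file)"      # word-split into two arguments
quoted_count="$(count_args "$file")"      # stays a single argument
```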
For a while I've been mulling over a theory about the need for style guides and linters: if a language makes them a near necessity for reasons other than aesthetics, then the language is probably best avoided. I'm not sure I agree totally with that. However, I think the most frustrating experiences I've had while programming are the result of languages that offer multiple syntaxes for doing the same thing, which behave the same most of the time.
When writing bash, there are so many hoops to jump through (e.g. syntactic traps, bashisms to avoid) that people like me, who use these languages only occasionally, will shoot themselves in the foot repeatedly.
For this population (and I think it's big, given the glue nature of *sh), a guide won't help much because occasional practice won't give its rules any time to sink in; interactive linters are a godsend.
Good rules overall, we follow similar guidelines. Any comments on:
* readonly - describing these in library scripts is a bit dangerous since the readonly is not specific to the file. So, unless you have some namespacing pattern for globals, this will bite you
* I didn't see any recommendation, but always start with set -eux -o pipefail. set -e has its own set of pitfalls, but it's best to learn what the pitfalls are and work with it :-) It helps in the long run.
* When using set -u, use the FOO=${FOO:-} syntax to initialize defaults for all your env vars.
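To make the first bullet concrete, here is a minimal sketch of how `readonly` in sourced library files can collide. It assumes two hypothetical libraries that both pick the name `VERSION`; the `LIBA_`/`LIBB_` prefixes are just one possible namespacing convention, not something the guide prescribes.

```shell
#!/usr/bin/env bash
# Two hypothetical library files, written to a temp dir for the demo.
libdir="$(mktemp -d)"
printf 'readonly VERSION="1.0"\n' > "$libdir/liba.sh"
printf 'readonly VERSION="2.3"\n' > "$libdir/libb.sh"

# Sourcing both in one shell: the second readonly assignment fails,
# because readonly applies to the whole shell, not to the file.
if ( source "$libdir/liba.sh"; source "$libdir/libb.sh" ) 2>/dev/null; then
  collision="no"
else
  collision="yes"
fi

# Same libraries with prefixed names: no clash.
printf 'readonly LIBA_VERSION="1.0"\n' > "$libdir/liba.sh"
printf 'readonly LIBB_VERSION="2.3"\n' > "$libdir/libb.sh"
namespaced="$(source "$libdir/liba.sh"; source "$libdir/libb.sh"; echo "$LIBA_VERSION/$LIBB_VERSION")"
rm -rf "$libdir"
```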
On point 2: indeed, given that they are restricting themselves to bash, I would have expected a note of `set -eu -o pipefail`, at least. `-x` (tracing), isn't appropriate for many scripts, though.
On point 3: `: "${FOO:=some default value}"` is the typical pattern for ensuring a default value is set. Note the `:=`; with `:-` the default is only substituted in place, not assigned to the variable.
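A small sketch of how the two default-value expansions behave under `set -u`; `FOO` and `BAR` are placeholder names. `${FOO:-default}` only substitutes a value at the point of use, while `${BAR:=default}` also assigns it, so only the latter has a lasting effect when combined with the `:` no-op.

```shell
#!/usr/bin/env bash
set -u
unset FOO BAR

# ":-" substitutes but does not assign: FOO is still unset afterwards.
: "${FOO:-some default value}"
foo_after="${FOO-<unset>}"

# ":=" assigns when the variable is unset or empty.
: "${BAR:=some default value}"
bar_after="${BAR-<unset>}"

# Explicit initialization to an empty default also satisfies set -u.
FOO="${FOO:-}"
foo_len="${#FOO}"
```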
I tend to avoid `set -e`. To borrow the Python adage: "explicit is better than implicit". In shell it translates to "if you need error handling, add it where necessary".
Except `set -e` (along with pipefail) is exactly what prevents you from implicitly catching failures, forcing you to be explicit (rather than implicit) with your error-handling, which is closer to how Python itself deals with unhandled exceptions. It has its limitations and caveats, and you can accomplish the same goals somewhat differently, but it is inaccurate to say that using `set -e` is tantamount to doing things implicitly.
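A minimal sketch of the behaviour under discussion, run in child `bash -c` processes so the failures don't kill the calling shell. It is not exhaustive (the caveats around `set -e` in conditionals and functions are not covered), but it shows the basic idea: with `set -e` an unhandled failure stops the script, and handling it explicitly is how you opt out.

```shell
#!/usr/bin/env bash
# Without set -e: the failure is silently ignored and the script carries on.
without_e="$(bash -c 'false; echo "still running"')"

# With set -e: the unhandled failure aborts before echo runs.
with_e="$(bash -c 'set -e; false; echo "still running"' || true)"

# With set -e plus explicit handling: the script decides what happens.
handled="$(bash -c 'set -e; false || echo "handled"; echo "still running"')"

# pipefail: the pipeline reports the failure of an earlier stage.
no_pipefail=0
bash -c 'false | true' || no_pipefail=$?
with_pipefail=0
bash -c 'set -o pipefail; false | true' || with_pipefail=$?
```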
> Bash is the only shell scripting language permitted for
> executables.
> [...]
> The only exception to this is where you're forced to by whatever
> you're coding for. One example of this is Solaris SVR4 packages
> which require plain Bourne shell for any scripts.
> [...]
> When to use Shell
>
> * If you're mostly calling other utilities and are doing
> relatively little data manipulation, shell is an acceptable choice
> for the task.
>
> * If performance matters, use something other than shell.
>
> * If you find you need to use arrays for anything more than
> assignment of ${PIPESTATUS}, you should use Python.
>
> * If you are writing a script that is more than 100 lines long,
> you should probably be writing it in Python instead. Bear in mind
> that scripts grow.
>
> * Rewrite your script in another language early to avoid a
> time-consuming rewrite at a later date.
Limiting the domain of shell scripts is the best advice in here. Shell scripts are terrible to maintain, and you should use perl or python if it's complicated.
But limiting the domain of shell scripts also probably means that you're not going to be using any advanced features of bash, so you should write it in sh instead. It's undoubtedly more portable, and it also serves to enforce the idea that you shouldn't write complicated shell scripts. I'll take a leap and say it outright, if you're using a feature of bash that isn't in bourne shell, you shouldn't be writing it in either of them.
The internet is littered with bash scripts that are compatible with bourne shell, but the shebang says '#!/bin/bash'. Well some systems don't have bash! But everyone has bourne shell.
I'm disappointed by their choice of bash and how this guide might reinforce that behaviour, but other than that, the actual style guidelines are great.
> * If you are writing a script that is more than 100 lines long,
> you should probably be writing it in Python instead. Bear in mind
> that scripts grow.
>
> * Rewrite your script in another language early to avoid a
> time-consuming rewrite at a later date.
Hehe. I did exactly that, in D, with a script we use to update some instances of our application on development machines. Our old bash script kept growing and could no longer handle a few cases.
I've never learned perl in depth but I've never seen a perl script that didn't seem either a) overly complicated or b) terse to the point of being opaque.
Shell scripts, for all their warts, tend to be fairly readable and easy to follow unless deliberately obfuscated. I would include Python here to a point as well, though Python is certainly friendly to architecture astronauts also.
I've started using #!/usr/bin/env bash for compatibility with non-FHS distros such as NixOS.
Happy to see arrays omitted; if you find yourself in need of bash arrays you should definitely consider a proper language.
I tend to stick to POSIX shell to the extent possible, but readonly variables and [[ ]] syntactic sugar make sense when maintainability is more important than portability.
Frankly, I find hard-coding #!/bin/bash sensible in a large corporate setting such as Google. It prevents PATH manipulation and they obviously have a uniform and controlled environment.
Bash arrays can be convenient, but at that level of complexity Python or Ruby are much easier to read and understand (and necessarily maintain).
(Tedious disclaimer: my opinion only, not speaking for anybody else. I'm an SRE at Google.)
We run everything on prodimage, so operating system compatibility isn't really an issue.
Realistically, any time somebody sends me a code review with a shell script in it, my first comment is going to be "Why is this a shell script? Rewrite it in a supported language."
Re: arrays -- I generally agree, but they are the best solution to certain otherwise simple problems, like storing command arguments. `cmd $FLAGS` is objectively more dangerous than `cmd "${FLAGS[@]}"` due to the globbing and word-splitting issues the former exposes. In my experience, devs are more likely to opt for the former when the latter is forbidden than to switch languages (because that's work).
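A rough sketch of the difference, using a hypothetical `count_args` helper in place of a real command. The flag values are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a real command: reports how many arguments it got.
count_args() { echo "$#"; }

# String version: the quotes inside the string are not re-parsed, so the
# message is split on whitespace; a bare * would also be glob-expanded.
FLAGS_STR='--message "hello world" --verbose'
string_count="$(count_args $FLAGS_STR)"

# Array version: each element is passed through as exactly one argument.
FLAGS=(--message "hello world" --verbose)
array_count="$(count_args "${FLAGS[@]}")"
```

The string version hands the command four arguments (`--message`, `"hello`, `world"`, `--verbose`), while the array version hands it the intended three.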
Any reason why they don't mandate something like "set -euo pipefail" at the beginning? I find it invaluable for writing and debugging scripts, and in general for avoiding weird (or dangerous) errors.
Also has some other useful BASH links and comments. Here is a quote:
> BASH is the most widely-used and widely-supported shell for Linux. There are other shells that are better than BASH in various ways, but we feel that none of these other shells are better enough to warrant replacing BASH as the de-facto standard when writing shell scripts. BASH is installed by default on almost all Unix-based operating systems, and the majority of the world’s shell scripts are written in BASH. For this reason, we suggest that all of our developers learn BASH.
> BASH scripts are a domain-specific programming language that is well-suited to managing processes and files. That being said, the large number of special characters appropriated for process management, its text expansions, and its unusual syntax make BASH poorly-suited for general purpose programming. Accordingly, we think that BASH should only be used for scripts that are predominantly concerned with processes and files.
i try to write my scripts in such a way that if all newlines were lost, the script would still run. semicolons even where optional.
requiring the use of bash for non-interactive use? good grief.
is it possible this company has a linux bias?
the usefulness of a minimal scripting shell cannot be denied. even with linux distribs that use bash, we almost always see busybox in use. busybox is more like the almquist sh than bash.
with sh, bash-only features may not work.
for example, shellshock did not work with almquist sh.
why are bash scripts not given a .bash file extension to distinguish them from sh scripts (.sh extension)?
.ksh extension is often used for korn shell scripts.
Given that every server and most desktops at Google are Linux, and that it created at least two operating systems based on Linux (Android and ChromeOS) -- yes, I think it would be safe to say there's a Linux bias :)
Why would they have any other bias? Is there any other viable alternative on the server side they should care about? Are Google realistically going to switch to it? It's not like 1990, where you coded so that your script worked with 5+ UNIX vendors.
>with sh, bash-only features may not work.
Sounds like a self-inflicted problem. Just arrange so bash handles your scripts (which they do).
>.ksh extension is often used for korn shell scripts.
Yes but they're hidden behind the little "Play" button at the left of each point. Clicking it reveals further explanation and, often, examples.
Not very discoverable; even though this feature is documented at the top right of the page, I skipped it too and only found it by accidentally clicking one of the buttons while obsessively-compulsively selecting text I was reading :D
Why can't we have setuid on shell scripts which are non-writeable?
I have often wished for a facility to allow unprivileged users to execute specific predefined tasks, and writing C programs for such things would be a huge pain.
The issue is that there's an unavoidable race condition which renders SUID for interpreted (ie. #!) executables insecure.
Suppose there's some SUID bash script owned by root in the system that starts with "#!/bin/bash".
I (an unprivileged user) create a symlink "./foo" to "/path/to/SUID_script". I execute "./foo".
The kernel follows foo, reads the SUID script's #!, and so runs "/bin/bash ./foo". Honoring the SUID, bash is run as root.
Here's the race. In between the kernel doing this and bash finishing initialisation and reading its script arg, I swap out the "foo" symlink to instead point at "./my_evil_code".
So bash reads "./foo" and executes the content as root. I now have arbitrary code execution as root.
Are there ways this could be fixed? Possibly. But not easily.
It's not enough to just say "SUID can't work through a symlink" - I could've symlinked to the SUID script's parent directory, for example.
You can't say "the kernel should pass the fully resolved path to the interpreter", this will break a million scripts that rely on changing behaviour based on $0.
I share your pain about wanting an easy, non-compiled means of creating a small SUID program. But I don't see any good way unless something changes. Maybe a util that takes a #! interpreted file, adds an ELF header and some machine code that execs the desired interpreter and feeds it static script content?
Isn't this exactly what sudo / sudoers is for? You can specify a single command or even restrict down to which arguments an unprivileged user can pass to that one command.
Pro-tip: if you're going to go this route, stick these lines in a file in sudoers.d as you're likely to end up with a lot of them, this will help make your config more readable.
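For illustration, a drop-in rule might look like the one written out below. The user `deploy` and the script path are hypothetical, and the snippet only writes to a temp file; a real drop-in would live under /etc/sudoers.d/ and is best checked with something like `visudo -cf` before it is installed.

```shell
#!/usr/bin/env bash
# Write a hypothetical drop-in to a temp file (nothing is installed).
dropin="$(mktemp)"
cat > "$dropin" <<'EOF'
# Let hypothetical user "deploy" run exactly one command as root,
# and only with this exact argument.
deploy ALL=(root) /usr/local/bin/restart-app --graceful
EOF

# Count the rules for that user in the generated file.
rule_count="$(grep -c '^deploy ' "$dropin")"
```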
That's a good point. I've been trying to find a good guide on making a nice MOTD (message of the day) script and some ASCII art, since I've been playing around with my environment more.
This is the most logical option. Tabs to add an indentation level, the width of that is unimportant and can be configured by each user. Spaces to align, since they're like any other character in a mono spaced font thereby perfectly matching the line above.
With two spaces an indentation error is still easy to catch, and harder to make in the first place. Two spaces are also easier to type than 3 or 4, which is important when a program is typed without the help of an autoindentation tool. Two spaces are enough in a fixed-width font, as is typical in a terminal, and they also save screen space, which is just 80 characters wide.
Indentation
Indent 2 spaces. No tabs.
Use blank lines between blocks to improve readability.
Indentation is two spaces.
Whatever you do, don't use tabs.
For existing files, stay faithful to the existing indentation.
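A "no tabs" rule like the ones quoted above is easy to check mechanically. A minimal sketch, assuming a hypothetical `has_leading_tabs` helper and two throwaway sample files:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if any line of the file starts with a tab.
has_leading_tabs() {
  grep -q $'^\t' "$1"
}

sample_dir="$(mktemp -d)"
printf 'if true; then\n  echo "two spaces"\nfi\n' > "$sample_dir/spaces.sh"
printf 'if true; then\n\techo "one tab"\nfi\n' > "$sample_dir/tabs.sh"

if has_leading_tabs "$sample_dir/spaces.sh"; then spaces_result="tabs"; else spaces_result="clean"; fi
if has_leading_tabs "$sample_dir/tabs.sh"; then tabs_result="tabs"; else tabs_result="clean"; fi
rm -rf "$sample_dir"
```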
I don't wish to dig up a holy war, but why would you want to use spaces instead of tabs? That's what they're for, isn't it? One tab = one indentation level. No need for messing around with spaces, which serve other purposes.
I would be happy with 100% tabs or 100% spaces, but in practice it seems like allowing tabs inevitably leads to source files indented with a combination of both, either by accident or because someone wants to have intermediate levels of indentation (for multiline conditionals for example).
Once you have mixed tabs and spaces, everything goes to hell because now that code's formatting is tied to a specific person's idea of how many spaces should a tab display as, or whether tabs refer to tab stops or just a fixed number of spaces.
I never had strong feelings about it until encountering some really pathological code bases, which had gotten to the point where there was literally no valid tab->spaces setting that would make everything look correct.
Ultimately, though, these things really just don't matter. I use 4 spaces per indent level, but it's not like I'm incapable of reading and writing code that uses tabs, or 2 spaces, or 8 spaces. Really the only thing that matters is that there is a standard and that it is consistently applied within a project.
The usual argument is that a space is always the same width, whereas a tab can have varying widths depending on your editor.
So, if you indent something with 4 spaces, then other people viewing your code will also always see 4 spaces, therefore ensuring that whatever you found to be readable indentation, does actually come out like that on the other end.
My personal reason: because tabs make aligning stuff harder than it should be. I often have conditions that span multiple lines, and if they aren't aligned, it makes everything harder to read. By using spaces, I make sure the alignment is good for everybody.
Some people absolutely want indentation to be one specific size for everybody and don't fully understand that what might look readable for them at 2 space-widths, would be more readable for some others at 4 space-widths.
Tabs are accessibility. People use spaces for the same misguided reasons they use px font sizes in css.
ASCII defines tab as a shift to the next tab stop, which terminals conventionally place every 8 characters. So a tab is too wide to use for indentation with a fixed-width font at 80 characters.
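The width difference is easy to see with the POSIX `expand` utility, which converts tabs to spaces at given tab stops. A small sketch (the variable names are just for the example):

```shell
#!/usr/bin/env bash
# One tab between "a" and "b", expanded at different tab stops.
line8="$(printf 'a\tb' | expand)"         # default: stops every 8 columns
line2="$(printf 'a\tb' | expand -t 2)"    # stops every 2 columns

# Resulting line widths in characters.
width8="${#line8}"
width2="${#line2}"
```

With the default stops the line is 9 characters wide, against 3 with stops every 2 columns, so each nesting level costs a tenth of an 80-column line.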