grep and sed Cheatsheet

If you work with the AWS CLI, chances are you are going to want to use grep on the various list and get operations to make the results more readable. And if you want a way to mimic bulk operations, you can combine it with sed to generate the commands for you.

First, let me get a rant out of the way: why, AWS CLI? Why do you not standardize on get- or list-? Truly aggravating, but I guess it just means you need to keep the link to the CLI reference bookmarked for convenience.

grep

Moving on. When you run a list, get, or describe operation, you'll probably get more information than you need, and grep is an easy way to filter down to just the lines you want. Take as an example the output below from s3api list-objects --bucket "my-aws-library":

[Image: output of aws s3api list-objects]
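
If you've never run it, the output is JSON that looks roughly like this (the bucket contents here are made up for illustration, and I've trimmed some fields):

{
    "Contents": [
        {
            "Key": "notes/chapter1.md",
            "LastModified": "2021-06-01T12:00:00+00:00",
            "Size": 2048,
            "StorageClass": "STANDARD"
        },
        {
            "Key": "notes/chapter2.md",
            "LastModified": "2021-06-02T12:00:00+00:00",
            "Size": 4096,
            "StorageClass": "STANDARD"
        }
    ]
}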

All I wanted was a list of filenames. Grep to the rescue! grep has numerous flags that modify its behavior, but there are only a few that I use regularly. These are the ones I find most useful (quick examples after the list):

  • -o this says 'only return the part of each line that matches the regex pattern' - note it returns just the match, not the whole line
  • -c this says 'just count the lines that match the regex pattern'
  • -v this inverts the match - in other words, only return/count lines that don't match the regex pattern
  • -E interpret the pattern as an extended regular expression; this is helpful if you're accustomed to the VSCode flavor of regex
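
To see each flag in action (a toy example, nothing AWS-specific):

printf 'alpha\nbeta\nalphabet\n' | grep -o 'alpha'      # prints 'alpha' twice - just the matched text, once per matching line
printf 'alpha\nbeta\nalphabet\n' | grep -c 'alpha'      # prints 2 - the number of matching lines
printf 'alpha\nbeta\nalphabet\n' | grep -v 'alpha'      # prints 'beta' - the only line with no match
printf 'a1 a22\n' | grep -o -E 'a[0-9]+'                # prints 'a1' and 'a22' - ERE quantifiers work unescaped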

So, armed with these flags, let's adjust our list objects command like so:

aws s3api list-objects --bucket "my-aws-library" | grep -o -E '"Key": .+"'

As with regex in VSCode, .+" does a greedy match of everything up to the final " on the line. This produces a list of just the filenames:

[Image: the same output after grep, showing only the "Key" lines]
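
With the made-up bucket contents from above, that's:

"Key": "notes/chapter1.md"
"Key": "notes/chapter2.md"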

sed

Well, limiting the result to just a list of filenames is useful, but what I really want is to do stuff to those files. While the AWS console makes it fairly easy to move, delete, and copy files within an account, moving between accounts must be done through the CLI. There are some policies that need to be created first to ensure that the files can be copied, which are described in AWS's documentation on copying objects between accounts. The final step in those instructions is to copy using the CLI:

aws s3 cp s3://source-DOC-EXAMPLE-BUCKET/object.txt s3://destination-DOC-EXAMPLE-BUCKET/object.txt --acl bucket-owner-full-control

Well, I have a lot of objects in this bucket, and I don't want to write that many CLI commands by hand. Now is the time to enlist sed.

sed enables regex pattern-based substitution. The basic format is sed 's/pattern/replacement/': the s tells sed this is a substitution, the first two / characters delimit the pattern and the replacement, and the final / marks the end of the replacement.
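
To see it in isolation (a toy example, nothing AWS-specific):

echo 'hello world' | sed 's/world/sed/'    # prints 'hello sed'

So, let's add on to our grepped object list like this: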

aws s3api list-objects --bucket "my-aws-library" | grep -o -E '"Key": .+"' | sed 's/Key/Filename/'

Now our results look like this:

[Image: the grepped output with "Key" replaced by "Filename"]
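
Or, with our made-up keys:

"Filename": "notes/chapter1.md"
"Filename": "notes/chapter2.md"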

Okay, that was a pretty simple substitution, but it doesn't get us to the list of CLI commands we need. This is going to get more complicated, so I like to work in an editor and put the expressions on separate lines, just so I can see what I'm doing (I join the lines back up before pasting into bash).

aws s3api list-objects --bucket "my-aws-library" | grep -o -E '"Key": .+"' | sed 's/
[regex for the pattern to be replaced will go here]/
[regex for the replacement will go here]/' 

I'm also going to take advantage of a few other sed features:

  • In order to take advantage of extended regular expressions, we'll use the -E flag here as well.
  • Since I will be working on paths that contain /, I'm going to use a different substitution delimiter. sed allows alternate delimiters, including @, %, |, ;, and :. I'm going to use @, since it won't collide with any of my text.
  • You can also use grouping and back-references in sed. To create a group, enclose a portion of the pattern in parentheses. To use a back-reference to a group in the replacement text, use the ordinal of the group escaped with a backslash, e.g. \1. (Quick examples of both follow this list.)
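
Toy examples of the alternate delimiter and of grouping with a back-reference:

echo 'a/path/here' | sed 's@/path@/dir@'                          # @ as delimiter: prints 'a/dir/here'
echo 'John Smith' | sed -E 's@([A-Za-z]+) ([A-Za-z]+)@\2, \1@'    # prints 'Smith, John'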

First, we'll build the pattern to be replaced. We want everything between the second set of double-quotes, and since we'll be reusing it, we put it in a group:

aws s3api list-objects --bucket "my-aws-library" | grep -o -E '"Key": .+"' | sed -E 's@
"Key": "(.+)"@
[regex for the replacement will go here]@'

Next, we'll copy in the AWS CLI command, replacing the key name with the back-reference to the grouping:

aws s3api list-objects --bucket "my-aws-library" | grep -o -E '"Key": .+"' | sed -E 's@
"Key": "(.+)"@
aws s3 cp s3://source-DOC-EXAMPLE-BUCKET/\1 s3://destination-DOC-EXAMPLE-BUCKET/\1 --acl bucket-owner-full-control@'

Now, when we run it we get this:

[Image: the finished list of aws s3 cp commands]
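
With our made-up keys, each line of output is a ready-to-run command:

aws s3 cp s3://source-DOC-EXAMPLE-BUCKET/notes/chapter1.md s3://destination-DOC-EXAMPLE-BUCKET/notes/chapter1.md --acl bucket-owner-full-control
aws s3 cp s3://source-DOC-EXAMPLE-BUCKET/notes/chapter2.md s3://destination-DOC-EXAMPLE-BUCKET/notes/chapter2.md --acl bucket-owner-full-control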

Slick! I love writing code to write code. Since I'm too lazy to even copy and paste, I'm going to dump it directly into a file:

aws s3api list-objects --bucket "my-aws-library" | grep -o -E '"Key": .+"' | sed -E 's@
"Key": "(.+)"@
aws s3 cp s3://source-DOC-EXAMPLE-BUCKET/\1 s3://destination-DOC-EXAMPLE-BUCKET/\1 --acl bucket-owner-full-control@'
> s3commands

Now, I can run all those copy commands in bash by running source s3commands. And scene.
