Sat Dec 28 2024

Use Wildcards to Copy Multiple Files with AWS CLI

When working with data stored in Amazon S3, you may often need to copy multiple files that match a certain pattern to your local machine or another S3 bucket. However, traditional wildcard usage with the AWS CLI isn’t as straightforward as you might expect. Let’s explore how to achieve this using the AWS CLI.

Why Wildcards Don’t Work Directly

The AWS CLI does not expand wildcards in S3 paths the way your local shell expands them in local paths. If you run a command like aws s3 cp s3://data/2016-08* ., the CLI treats 2016-08* as a literal object key and fails, because no object has that exact name. S3 itself offers no wildcard matching, so any filtering has to be requested explicitly through the CLI’s filter options.

How to Filter Files with cp

Instead of putting a wildcard in the source path, point cp at the bucket and combine the --recursive, --exclude, and --include flags. Here’s how you can copy every object whose key starts with 2016-08 from the data bucket to your current local directory:

aws s3 cp s3://data/ . --recursive --exclude "*" --include "2016-08*"
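The quotes around "*" and "2016-08*" in the command above do real work. Without them, your local shell may expand the patterns against files in the current directory before the AWS CLI ever sees them. The Bash sketch below demonstrates this in a throwaway directory. The arg_count helper and the file names are hypothetical and exist only for illustration, and no AWS calls are made.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many separate arguments the shell passed in.
arg_count() { echo "$#"; }

# Throwaway directory with two stand-in local files (names are made up).
demo_dir=$(mktemp -d)
touch "$demo_dir/2016-08-01.csv" "$demo_dir/notes.txt"

(
  cd "$demo_dir" || exit 1
  # Unquoted: the shell expands * and 2016-08* against the local files first.
  echo "unquoted: $(arg_count --exclude * --include 2016-08*) arguments"
  # Quoted: the patterns are passed through unchanged.
  echo "quoted:   $(arg_count --exclude "*" --include "2016-08*") arguments"
)

rm -rf "$demo_dir"
```

With the two stand-in files present, the unquoted form passes five arguments instead of four. The CLI would therefore receive local file names in place of the patterns you wrote. When no local file matches, Bash's default behavior passes the pattern through unchanged, so the mistake can go unnoticed until you run the command in a different directory.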

Understanding the Flags

  • --recursive: Makes cp operate on every object under the source path, including objects in nested "folders". Without it, cp treats the source as a single object key.
  • --exclude: Pass "*" (quoted, so your shell doesn’t expand it) to exclude every object.
  • --include: Pass "2016-08*" to re-include only the objects whose keys start with 2016-08, overriding the earlier exclude for those keys.

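The order of the filters is crucial. All objects are included by default, and when several filters match a key, the filter that appears later on the command line takes precedence. Excluding everything first and then including 2016-08* therefore copies only the matching keys. If you reversed the two flags, the trailing --exclude "*" would win and nothing would be copied.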
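The AWS CLI documentation states the ordering rule this way: all files are included by default, and filters that appear later in the command take precedence. The Bash sketch below models that rule to make it concrete. It is based only on the documented behavior, not on the CLI's actual implementation. would_copy is a hypothetical helper. Keys are assumed to be relative to the source path (s3://data/ here), and Bash's [[ ]] pattern matching stands in for the CLI's own matcher. In this model * also matches /, so a prefix like 2016-08/ is picked up.

```shell
#!/usr/bin/env bash
# Hypothetical model (not AWS CLI source) of --exclude/--include filtering:
# every key starts out included, filters are checked in command-line order,
# and the last filter whose pattern matches the key decides the outcome.
# Usage: would_copy KEY [include|exclude PATTERN]...
would_copy() {
  local key=$1
  shift
  local decision=include
  while (( $# >= 2 )); do
    local kind=$1 pattern=$2
    shift 2
    # An unquoted right-hand side of == is a glob; inside [[ ]], * matches "/" too.
    if [[ $key == $pattern ]]; then
      decision=$kind
    fi
  done
  [[ $decision == include ]]
}

# Keys relative to the source path s3://data/ (made-up examples).
for key in 2016-08-01.csv 2016-08/part-0.csv 2016-09-01.csv logs/2016-08-02.csv; do
  if would_copy "$key" exclude '*' include '2016-08*'; then
    echo "copy: $key"
  else
    echo "skip: $key"
  fi
done

# Reversed order: the trailing exclude matches last, so the key is skipped.
if ! would_copy 2016-08-01.csv include '2016-08*' exclude '*'; then
  echo "skip (reversed order): 2016-08-01.csv"
fi
```

In this model, the first two keys are copied and the rest are skipped. logs/2016-08-02.csv is skipped because a pattern is matched against the whole relative key, not just its last path segment.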

Tailoring the Command for Your Needs

Adjust the --include pattern to fit your own key naming scheme. For example, to get files from September 2016 rather than August, change the pattern to "2016-09*". To cover several months at once, repeat --include with one pattern each. Patterns support * (any run of characters) and ? (any single character), so the same approach works for most date-based or event-based naming conventions.
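When you need several patterns, building the argument list in a Bash array keeps each pattern intact as a single quoted argument. The sketch below uses a hypothetical build_cp_args helper that adds one --include per pattern after the catch-all exclude. It only prints the resulting command. To preview the actual copy, you could pass the same array to aws along with --dryrun, a real aws s3 cp option that lists the operations without performing them. The destination could also be another bucket, for example s3://other-bucket/ (a placeholder name).

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill the global CP_ARGS array with arguments for
# `aws s3 cp` that copy only keys matching one or more include patterns.
build_cp_args() {
  local src=$1 dest=$2
  shift 2
  # Exclude everything first; the includes that follow take precedence.
  CP_ARGS=(s3 cp "$src" "$dest" --recursive --exclude "*")
  local pattern
  for pattern in "$@"; do
    CP_ARGS+=(--include "$pattern")
  done
}

# August and September 2016 keys from s3://data/ into the current directory.
build_cp_args s3://data/ . "2016-08*" "2016-09*"

# Print the command with shell quoting for review instead of running it.
printf 'aws'
printf ' %q' "${CP_ARGS[@]}"
printf '\n'

# To preview with the real CLI, you could run:
#   aws "${CP_ARGS[@]}" --dryrun
```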

Additional Resources

For further details, consult the AWS CLI S3 command documentation, particularly its section on the use of exclude and include filters, which covers the supported pattern symbols and precedence rules in more depth.