Counting objects in S3
Blobs in a bucket
S3 organizes content into buckets. You place named blobs of data into buckets, and it’s not uncommon to accumulate millions or even billions of objects in a single bucket.
How many blobs are in that bucket anyway?
Over time, you might accumulate quite a number of items in a bucket; too many to see in the AWS console and too many to list in a terminal window.
You also have to consider two factors:
How timely do you need to be? Is it ok to have only an approximate, or slightly stale count?
Do you want to count VERSIONS of the objects? foo.txt might have seven versions of itself in the bucket. Do you wish to count that as one object or seven?
How, then, do you calculate the number of objects in a bucket?
There are three ways, each slightly different. These are summarized in the following table and discussed in more detail below.
=============================================================================================
Technique: aws s3 ls --summarize --recursive
Result: Full listing plus a summary count, not including versions.
        Incurs request costs -- not trivial for large collections.
=============================================================================================
Technique: aws s3api list-objects
Result: Equivalent to the above, but more verbose.
=============================================================================================
Technique: CloudWatch metrics
Result: Free, and easiest to see in the console.
        Includes object versions in the count.
        Separates counts for "All Storage Types" and "Standard Storage",
        which can be important if you are using, say, Glacier.
=============================================================================================
Way one: aws s3 ls
Using the AWS command line tool, you can use the following command:
> aws s3 ls --recursive s3://your_bucket_name_goes_here
This will spit out a listing of objects in the bucket akin to a directory listing. Here is an example:
2016-02-09 20:17:46 7604 404.html
2016-02-09 10:45:25 35 assets/css/examples.css
...
2016-02-09 20:17:46 12948 index.html
...
2016-02-09 20:17:46 70166 index.xml
2016-02-09 20:17:46 13533 examples/foo.html
...
Unfortunately, for buckets with billions of objects in them, this means a listing billions of lines long.
You can amend the command with the --summarize flag:
> aws s3 ls --recursive --summarize s3://your_bucket_name_goes_here
The summary will look something like this:
Total Objects: 2,307,998
Total Size: 2716513646
You can keep the long listing off your screen (though this saves no time, since the full listing still has to run) by piping the output through tail:
> aws s3 ls --recursive --summarize s3://your_bucket_name_goes_here | tail
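If all you want is the final count, you can grep the summary line out instead of eyeballing the tail. A small sketch (the bucket name is a placeholder, as above):

```shell
# Print only the "Total Objects" summary line from the listing.
aws s3 ls --recursive --summarize s3://your_bucket_name_goes_here \
  | grep "Total Objects"
```

This still walks the whole bucket; it only changes what reaches your terminal.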
Way two: aws s3api list-objects
Again using the AWS command line tool, run the following command:
> aws s3api list-objects --bucket your_bucket_name_goes_here
You don’t use the “s3://” syntax here. This API returns either XML or JSON, depending on your preferences, and is quite detailed (read: verbose).
You will have to calculate statistics, like summary counts, yourself.
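That said, the CLI can do the summary arithmetic for you with a JMESPath --query expression, so you don't have to post-process the JSON yourself. A sketch (the bucket name is a placeholder; the CLI paginates the listing automatically before applying the query):

```shell
# Count current objects via a JMESPath query.
# Note: errors if the bucket is empty, since Contents is then absent.
aws s3api list-objects \
  --bucket your_bucket_name_goes_here \
  --query 'length(Contents[])' \
  --output text

# If versioning is enabled and you want every version counted:
aws s3api list-object-versions \
  --bucket your_bucket_name_goes_here \
  --query 'length(Versions[])' \
  --output text
```

Like the listing approaches above, these issue paid LIST requests against the whole bucket, so they are slow and non-trivial in cost for very large buckets.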
Way three: Cloudwatch
You can see an object count directly in the CloudWatch section for your bucket. This technique is free, but the count includes every version of every object if you have versioning enabled. The metric is also only updated about once a day, so the number may be slightly stale.
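The same metric is available from the command line if you'd rather not open the console. A sketch (the bucket name is a placeholder; assumes GNU date; the two-day window allows for the roughly once-a-day reporting of S3 storage metrics):

```shell
# Fetch the NumberOfObjects metric for a bucket.
# The AllStorageTypes dimension counts objects across all storage classes,
# and the figure includes object versions.
aws cloudwatch get-metric-statistics \
  --namespace AWS/S3 \
  --metric-name NumberOfObjects \
  --dimensions Name=BucketName,Value=your_bucket_name_goes_here \
               Name=StorageType,Value=AllStorageTypes \
  --start-time "$(date -u -d '2 days ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time   "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 86400 \
  --statistics Average
```

Because this reads a pre-computed metric rather than listing the bucket, it returns instantly and costs nothing, at the price of being approximate and up to a day old.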