Counting objects in S3

Blobs in a bucket

S3 organizes content into buckets. You place named blobs of data into a bucket, and it’s not uncommon to accumulate millions, or even billions, of objects in one.

How many blobs are in that bucket anyway?

Over time, you might accumulate quite a number of items in a bucket; too many to see in the AWS console and too many to list in a terminal window.

You also have to consider two factors:

  1. How timely does the count need to be? Is it ok to have only an approximate, or slightly stale, count?

  2. Do you want to count VERSIONS of the objects? foo.txt might have seven versions of itself in the bucket. Do you wish to count that as one object or seven?

How, then, do you calculate the number of objects in a bucket?

There are three ways, each slightly different. These are summarized in the following table and discussed in more detail below.

=============================================================================================
Technique:  aws s3 ls --summarize --recursive
Result:     Full listing plus a summary count, not including versions.
            Incurs request costs -- not trivial for large collections.
=============================================================================================
Technique:  aws s3api list-objects
Result:     Equivalent to the above, but more verbose.
=============================================================================================
Technique:  CloudWatch metrics
Result:     Free, and the easiest to see in the console.
            Includes object versions in the count.
            Separates counts for "All Storage Types" and "Standard Storage",
              which can be important if you are using, say, Glacier.
=============================================================================================

Way one: aws s3 ls

Using the AWS command line tool, run the following command:

> aws s3 ls --recursive s3://your_bucket_name_goes_here

This will spit out a listing of objects in the bucket akin to a directory listing. Here is an example:

2016-02-09 20:17:46       7604 404.html
2016-02-09 10:45:25         35 assets/css/examples.css
...
2016-02-09 20:17:46      12948 index.html
...
2016-02-09 20:17:46      70166 index.xml
2016-02-09 20:17:46      13533 examples/foo.html
...

Unfortunately, for a bucket with billions of objects, this means a listing billions of lines long.

You can amend the command with a --summarize option, as shown below, to get a summary at the end. Unfortunately, this doesn’t really help with the long listing: a bucket with a billion entries can take a very long time to list its contents.

> aws s3 ls --recursive --summarize s3://your_bucket_name_goes_here

The summary will look something like this (the total size is in bytes):

Total Objects: 2,307,998
   Total Size: 2716513646

You can keep the long listing from printing, though it won’t save any time, by piping the output through tail:

> aws s3 ls --recursive --summarize s3://your_bucket_name_goes_here | tail
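
If all you want is the count, a common trick (a sketch, not an official feature of the tool) is to skip --summarize and count the listing lines yourself; note that this still walks the entire bucket:

> aws s3 ls --recursive s3://your_bucket_name_goes_here | wc -l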

Way two: aws s3api list-objects

Again using the AWS command line tool, run:

> aws s3api list-objects --bucket your_bucket_name_goes_here

You don’t use the “s3://” syntax here. The underlying API returns XML, but the CLI renders the result as JSON, text, or a table, depending on your output preferences, and it is quite detailed (read: verbose).

You will have to calculate statistics, like summary counts, yourself.
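
That said, you can push the counting onto the CLI itself with a JMESPath --query expression. A minimal sketch, using the same placeholder bucket name (note that length(Contents[]) will error out on an empty bucket, since the Contents key is absent):

> aws s3api list-objects --bucket your_bucket_name_goes_here --query 'length(Contents[])'

If you want versions counted too, list-object-versions takes the same shape of query; on a versioned bucket, this count will sit closer to the CloudWatch number:

> aws s3api list-object-versions --bucket your_bucket_name_goes_here --query 'length(Versions[])'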

Way three: CloudWatch

You can see an object count directly in the CloudWatch metrics for your bucket. This technique is free, but the count includes every version of every object, should you have versioning enabled. The metric is reported roughly once a day, so the number can be slightly stale.
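
You can pull the same number from the command line as well. A sketch, assuming the same placeholder bucket name; the NumberOfObjects metric lives in the AWS/S3 namespace, keyed by BucketName and a StorageType of AllStorageTypes, and the time window below is just an example:

> aws cloudwatch get-metric-statistics \
    --namespace AWS/S3 \
    --metric-name NumberOfObjects \
    --dimensions Name=BucketName,Value=your_bucket_name_goes_here \
                 Name=StorageType,Value=AllStorageTypes \
    --start-time 2016-02-08T00:00:00Z --end-time 2016-02-10T00:00:00Z \
    --period 86400 --statistics Average

The Average over a one-day period is simply the daily snapshot CloudWatch records for the bucket.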