
Setting up an AWS S3 bucket for read-only web access

We recently needed to set up a podcast hosting solution with our own hostname, and we chose to use an AWS S3 bucket. It was much harder than we expected, mostly because the documentation was confusing and scattered.

What we wanted was an S3 bucket that would be available at a specific hostname – let’s call it ‘files.xyz.com’ – where a small group of editorial people could upload mp3 files, and for those mp3 files to be publicly available via HTTP requests. Amazon has lots of sample policies and examples, but not for this particular case. I was able to string together what we wanted from documentation in several different places, however.

I won’t detail how to set up a basic AWS account with billing and so on; if you haven’t done that yet, you’ll need to go to Amazon’s site and do it first.

Step 1: Set up an S3 bucket named with your complete hostname

Setting up an S3 bucket is easy, but doing it in a way that lets us use our own custom hostname is not at all obvious, even though the actual steps are simple. Go to the S3 console and click ‘Create Bucket’.

If you want an S3 bucket that is accessible at ‘files.xyz.com’, you have to name it ‘files.xyz.com’ when you create it – the exact hostname you’ll be using. This will NOT by itself make the bucket appear at ‘files.xyz.com’, but it is a required part of the puzzle.
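We created the bucket through the console, but if you happen to have the AWS CLI installed and configured, the equivalent would be roughly the following sketch (the bucket name is our example one, and you may need to add a --region flag for buckets outside us-east-1):

# create a bucket whose name exactly matches the hostname you'll be using
aws s3 mb s3://files.xyz.com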

Step 2: Set up your DNS

Amazon encourages you to use its Route 53 service for DNS hosting and to set up an ‘A’ record, but we already have a DNS provider, so it was much easier for us to set up a CNAME record.

Once you’ve created your bucket (in our case, ‘files.xyz.com’), the URL for that bucket is http://files.xyz.com.s3.amazonaws.com/, which also translates to http://s3.amazonaws.com/files.xyz.com (this assumes nobody else has already taken the bucket name you selected).

If you use a CNAME record to point ‘files.xyz.com’ to ‘files.xyz.com.s3.amazonaws.com’, the S3 service does a primitive version of ‘name-based virtual hosting’: it looks at the hostname of the incoming request (‘files.xyz.com’) and assumes that’s the bucket name to use (‘s3.amazonaws.com/files.xyz.com’). Amazon explains the process in its documentation on virtual hosting of buckets.

If your bucket doesn’t have exactly the same name as the hostname the request comes in on, Amazon will not find the bucket and you’ll see a 404. For instance, ‘www.files.xyz.com’ is not the name of our ‘files.xyz.com’ bucket, so if you need to handle those requests as well you’ll need to set up another bucket and put in a redirect.
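The exact syntax depends on your DNS provider, but in BIND-style zone-file notation the CNAME record would look roughly like this (using our example hostname and a made-up TTL of one hour):

; point the custom hostname at the bucket's S3 endpoint
files.xyz.com.    3600    IN    CNAME    files.xyz.com.s3.amazonaws.com.

The trailing dots mark both names as fully qualified.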

Step 3: Create a bucket policy to make all the content public

By default, files you upload to an S3 bucket are ‘private’. Instead of offering a simple checkbox somewhere to change that default, AWS requires you to add a JSON-formatted ‘bucket policy’ to the bucket.

In the S3 console, click on your bucket, click on ‘Properties’, then expand the ‘Permissions’ menu. There you’ll see a link to add or edit the bucket policy. Here’s the policy you’ll need, though you need to replace ‘files.xyz.com’ with the name of your bucket:

{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "AllowPublicRead",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::files.xyz.com/*"
    }
  ]
}

Here’s what’s going on in that JSON:

  • ‘Effect’ – ‘Allow’;
  • ‘Principal’ – the person accessing it, in this case ‘*’ AKA everybody;
  • ‘Action’ – ‘s3:GetObject’ aka Get a file (in S3-speak all files are ‘objects’); and
  • ‘Resource’ – ‘arn:aws:s3:::files.xyz.com/*’, the Amazon Resource Name (ARN) of your bucket (‘arn:aws:s3:::your.bucket.name’) plus ‘/*’, AKA everything within it.

Now everything in your bucket is readable by the world. Without this ‘bucket policy’, whenever a staffer uploaded a file, he or she would have to manually click on that file afterwards and choose ‘Make Public’.
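If you prefer the command line to the console, the same policy can be applied with the AWS CLI – a sketch, assuming you’ve saved the JSON above to a local file called public-read-policy.json:

# attach the public-read policy to the bucket
aws s3api put-bucket-policy --bucket files.xyz.com --policy file://public-read-policy.json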

Step 4: Create a user policy to give read/write access to the bucket

I want to give a group of people read/write access to the bucket. The way we handle this in AWS is to go into the Identity and Access Management (IAM) console and create a ‘policy’, then create a ‘group’ and attach the policy to it, and finally create users and add them to that group. We’ll start with the policy.

In the IAM console, click on ‘Policies’ and select ‘Create Policy’, then click ‘Create Your Own Policy’.

You’ll need to provide a name – use something very specific and descriptive. Our policy gives read/write access to our S3 ‘files.xyz.com’ bucket, so ‘S3FilesXYZComReadWrite’ would be good. The name has to be alphanumeric only, with no spaces; CamelCase is recommended. You should put in a nice description as well.

In the ‘Policy Document’ window paste in the policy below, as usual replacing ‘files.xyz.com’ with your bucket name.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::files.xyz.com"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl",
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:DeleteObject",
                "s3:DeleteObjectVersion"
            ],
            "Resource": "arn:aws:s3:::files.xyz.com/*"
        }
    ]
}

Two things are happening here:

  • First, allow whoever is affected by this policy ListBucket access (list the files and folders) to the ‘files.xyz.com’ bucket;
  • Second, allow whoever is affected by this policy Get, Put, and Delete access to all of the objects within the bucket. This second statement also allows getting and deleting previous versions of objects (if versioning is enabled on the bucket), and allows the user to change an object’s ACL (“access control list” – make something public or private).

Click ‘Validate Policy’ and you’ll see if there are any errors – the most likely one would be that the ‘resource doesn’t exist’, because the name of the bucket doesn’t match an existing bucket. If all is well, click ‘Create Policy’.

Note that if you ‘cancel’ at any point you’ll have to start from scratch!
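For the record, the command-line equivalent is a single call – a sketch, assuming the JSON above is saved locally as readwrite-policy.json:

# create the managed IAM policy from the JSON document
aws iam create-policy --policy-name S3FilesXYZComReadWrite --policy-document file://readwrite-policy.json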

Step 5: Create the group and attach the policy to it

In the IAM console click on ‘Groups’ and ‘Create New Group’.

Give the group a descriptive name. A user can belong to multiple groups, and a group can have multiple policies attached to it. If only a very specific and limited number of people should have read/write access to this bucket, a group name like ‘ReadWriteAccessToFilesXYZCom’ might be appropriate.

The next step is to click the ‘Attach Policy’ button. Find the policy you created earlier in the list, select it, and click the ‘Attach Policy’ button again.
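If you’re working from the command line instead, the equivalent is roughly the following – the policy ARN includes your AWS account ID, shown here as a placeholder:

# create the group
aws iam create-group --group-name ReadWriteAccessToFilesXYZCom

# attach the read/write policy created earlier (replace 123456789012 with your account ID)
aws iam attach-group-policy --group-name ReadWriteAccessToFilesXYZCom --policy-arn arn:aws:iam::123456789012:policy/S3FilesXYZComReadWrite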

Step 6: Create users and add them to the group

If you already have users set up in IAM, adding them to the group is simple – select the user in the IAM console and click ‘Add User to Groups’.

If you need to create users, open the ‘Create User’ dialogue and enter the usernames. I don’t want my users to have to bother with the AWS Management Console – they’ll use a client like Cloudberry (Windows) or Cyberduck (Mac) instead – so I’ll leave the default ‘Generate an access key for each user’ selected.

As I enter the usernames, I’m given the opportunity to ‘download access keys’ – key/secret pairs that you can save and then distribute to the users you create. If you forget to do so, you can always go back in and generate new access key/secret pairs for each user.

We don’t need to bother with creating passwords, as those are only needed to log in to the AWS Management Console. Our users will primarily be using Cyberduck, and that uses our access key/secret pairs.
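For reference, the same user setup from the AWS CLI would look roughly like this – ‘jane’ is a made-up username:

# create the user, add her to the group, and generate an access key/secret pair
aws iam create-user --user-name jane
aws iam add-user-to-group --user-name jane --group-name ReadWriteAccessToFilesXYZCom
aws iam create-access-key --user-name jane

The create-access-key call prints the key and secret once; save them before handing them out, because the secret can’t be retrieved again later.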

Step 7: Making Cyberduck work and fixing an SSL problem

For Mac users, Cyberduck (free) is the best client for uploading and managing files in an S3 bucket – it is also very popular for working with FTP sites and the like, so Mac users will very likely be familiar with it already. When setting up a new bookmark in Cyberduck, they’ll use ‘s3.amazonaws.com’ as the Server (this is the default), then put in the access key as the username and the ‘secret’ as the password.

The ‘path’ will be whatever name you gave the bucket you created – e.g. ‘/files.xyz.com/’.

When the user tries to connect, they’ll almost certainly get a warning from Cyberduck that there’s an ‘SSL Mismatch’ between ‘files.xyz.com.s3.amazonaws.com’ and ‘*.s3.amazonaws.com’. This is a known issue that affects any S3 bucket with a ‘.’ in its name: the wildcard certificate ‘*.s3.amazonaws.com’ only covers one level of subdomain, and our custom-hostname bucket certainly has dots in it.

Fortunately, there’s a solution: open the ‘Terminal’ application on the Mac and enter the following line:

defaults write ch.sudo.cyberduck s3.bucket.virtualhost.disable true

and hit the ‘return’ key.

This tells Cyberduck to stop trying to do the automatic ‘virtual host’ translation from ‘s3.amazonaws.com/files.xyz.com’ to ‘files.xyz.com.s3.amazonaws.com’.

Now everything should work!
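As a final sanity check, you can upload a test file with one of the new users’ credentials and confirm that it’s publicly readable at your custom hostname – a sketch, with a made-up filename:

# upload a file to the bucket (using an access key/secret from one of the new users)
aws s3 cp episode1.mp3 s3://files.xyz.com/episode1.mp3

# the file should now be readable anonymously, thanks to the bucket policy
curl -I http://files.xyz.com/episode1.mp3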