Getting Started with S3

September 29th, 2006 by morgan

OK, once you have done the steps mentioned in the article “Getting Started with AWS” you should be able to actually do something. Probably the easiest service to start with is S3 (short for Simple Storage Service).

What is S3?

Good Question. The AWS folks describe S3 as:

… a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of web sites. The service aims to maximize benefits of scale and to pass those benefits on to developers.

OK, so that is about half marketing-speak and about a third gobbledygook. Imagine S3 as a very large, very simple place to store files. It isn’t a hierarchical file system. Instead, everything is stored in buckets, which are containers for objects stored on S3. Each object has a key that uniquely identifies it within its bucket. You use a bucket like a directory, an object like a file, and its key like the file name.

The documentation says that:

Every object in Amazon S3 can be uniquely addressed through the combination of the Service endpoint, bucket name, and key, as in http://s3.amazonaws.com/doc/2006-03-01/AmazonS3.wsdl, where “doc” is the name of the bucket, and “2006-03-01/AmazonS3.wsdl” is the key.
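To make that addressing scheme concrete, here is a minimal Python sketch that assembles an object URL from the service endpoint, bucket name, and key. The function name object_url is my own, made up for illustration, and is not part of any AWS library.

```python
from urllib.parse import quote

def object_url(bucket, key, endpoint="s3.amazonaws.com"):
    """Build a path-style URL: endpoint, then bucket, then key."""
    # Percent-encode the key, but keep "/" so keys with slashes stay readable.
    return "http://{}/{}/{}".format(endpoint, bucket, quote(key, safe="/"))

# The example from the documentation: bucket "doc", key "2006-03-01/AmazonS3.wsdl".
print(object_url("doc", "2006-03-01/AmazonS3.wsdl"))
# prints http://s3.amazonaws.com/doc/2006-03-01/AmazonS3.wsdl
```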

Actually Doing Something

This all sounds great, but in reality we want to actually do something with this giant file system. Now, here is the tricky part. S3 is really set up as a developer’s system, and the tools are in a bit of a rough form.

At this point, you have three choices:

  1. Buy a tool that has S3 support in it.
  2. Pick a language and code it yourself.
  3. Use ready made tools.

There are already lists of tools that support S3, and examples of #2 and #3 in the S3 Code Samples, and they are worth checking out. Each of them requires a varying degree of configuration, and what you use is really up to you. I would recommend #1 if you are using Windows, or if you aren’t an experienced developer and aren’t willing to learn the hard way.

I am fortunate enough to use Mac OS X, so I really have my run of things I can use. I picked s3curl because I am lazy, don’t want to have to make, configure, or compile anything, and am comfortable on the command line. This didn’t work (missing Perl packages, methinks), so I switched to s3-shell, although I did have to install ant to do it. It worked fine, but it is missing some pretty basic things, like the ability to set access to the buckets and objects you have created. This won’t do!

The best solution I could find was jSh3ll. It is a ready-made Java tool that comes compiled and has a lot of nice command line options, including the ability to run scripted commands from an input file. So, for now, that is what I am going to use in my descriptions.

The Nitty Gritty

Now, start jSh3ll and enter the following commands (substituting your own access_key_id and secret_access_id):

$ java -jar dist/jSh3ll.jar

Welcome to jSh3ll (Amazon S3 command shell for Java) (c) 2006 SilvaSoft, Inc.
Type ‘help’ for command list.

jSh3ll> host s3.amazonaws.com
jSh3ll> user access_key_id
jSh3ll> pass secret_access_id
jSh3ll>

At this point, you should be logged in. To see what commands are available, you can use help:

jSh3ll> help
bucket [bucketname]
count [prefix]
createbucket
delete
deleteall [prefix]
deletebucket
exit
get
getacl ['bucket'|'item']
getfile
getfilez
gettorrent
head ['bucket'|'item']
host [hostname]
list [prefix] [max]
listatom [prefix] [max]
listrss [prefix] [max]
listbuckets
pass [password]
put
putfile
putfilez
putfilewacl ['private'|'public-read'|'public-read-write'|'authenticated-read']
putfilezwacl ['private'|'public-read'|'public-read-write'|'authenticated-read']
quit
setacl ['bucket'|'item'] ['private'|'public-read'|'public-read-write'|'authenticated-read']
time ['none'|'long'|'all']
threads [num]
user [username]
jSh3ll>

Next, create a bucket (substitute its name for bucket-name below) and upload a file to that location.

jSh3ll> bucket bucket-name
Bucket set to 'bucket-name'
jSh3ll> createbucket
Created bucket 'bucket-name'
[runtime: 2.614s]
jSh3ll> list
Item list for bucket 'bucket-name'
jSh3ll>

This shouldn’t return anything, as the bucket should be empty. Now, let’s try uploading something (substitute your file’s name for file-name below):

jSh3ll> putfile key-name file-name
Stored item 'bucket-name/key-name'
jSh3ll> list
Item list for bucket 'bucket-name'
key=key-name, owner=user-name, size=XXXX bytes, last modified=Fri Sep 29 23:45:36 CDT 2006
[runtime: 2.077s]
jSh3ll>

At this point, you should see key-name, as this is the name for the file on the remote system. Also, at this point this exercise is costing you money! Just to really prove to yourself that it is there, you can look at it through your web browser. First, we have to open up the Access Control List (or ACL) so that the bucket and object can be seen:

jSh3ll> setacl bucket bucket-name public-read
Set ACL for bucket 'bucket-name' to public-read
jSh3ll> setacl item key-name public-read
Set ACL for item 'bucket-name/key-name' to public-read

At this point, your file should be available at http://bucket-name.s3.amazonaws.com/key-name. If you wanted to be really clever, you could give your file a key-name like docs/critical/README, and it would then look like a truly hierarchical file system.
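The bucket is still flat underneath; the hierarchy is only a naming convention layered on the keys. As a rough illustration, here is a purely local Python sketch (it never talks to S3) of how a flat list of keys can be presented as files and folders by filtering on a prefix, much like the prefix argument to the list command above. The function name list_keys, the "/" delimiter, and the sample keys are my own assumptions.

```python
def list_keys(keys, prefix="", delimiter="/"):
    """Split a flat key list into 'files' and 'folders' under a prefix."""
    files, folders = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter acts like a subdirectory.
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            files.append(key)
    return files, sorted(folders)

# Hypothetical keys stored in one flat bucket.
keys = ["docs/critical/README", "docs/critical/TODO", "docs/index.html", "logo.png"]
print(list_keys(keys))                  # (['logo.png'], ['docs/'])
print(list_keys(keys, prefix="docs/"))  # (['docs/index.html'], ['docs/critical/'])
```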

Cleaning Up

The last thing you need to do is to delete the file and bucket to make sure you don’t incur any extra cost.

jSh3ll> delete key-name
Deleted 'bucket-name/key-name'
jSh3ll> deletebucket
Deleted bucket 'bucket-name'

To verify this you can check the URL again and make sure that you get back an error message.

Getting Started With AWS

September 29th, 2006 by morgan

These are my notes on getting started with Amazon Web Services, primarily for use with S3, SQS, and EC2. Hopefully these can help other people on a similar path.

Signing Up

OK, the first thing to do in order to get started with Amazon Web Services (AWS) is to create an account. This sounds trivial, and it pretty much is. However, there are a couple of things you should know:

  1. Your AWS account is not the same as your Amazon account. You will need to re-register if you want to use the services.
  2. You need to provide a way to bill you for your usage; most people will probably provide a credit card. You can change this later if you need to.
  3. The email address you provide isn’t all that important. You can change it later, and verification is done through certificates at the services level.

Access

Once you have signed up for your account, the next thing you need to do is to generate an Access Key ID. Your Access Key ID is a text string that you provide to AWS when you are attempting to do operations in S3, EC2, or whatever. It uniquely identifies you as you, but it does so in an insecure way. You can always access it through your Web Services Account Information, usually an icon in the upper right hand corner of each AWS web page.

Because the Access Key ID is not secure, AWS needs a way to know that it is really you. It would be really, really bad if someone got hold of your AWS identity and used it for nefarious reasons (especially since it is tied to your credit card). So, you also get a Secret Access Key that allows you to be verified in a secure manner.

Security

The Secret Access Key is also a text string, but it is used differently than the Access Key ID. The Secret Access Key is used to sign requests to secure services so that AWS knows that it is really you. Although it isn’t the same as a password, you can consider it the equivalent of the PIN for your ATM card. It is very, very important that you keep this a secret. You can always access it through your Web Services Account Information, usually an icon in the upper right hand corner of each AWS web page.
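To give a feel for what signing a request means, here is a small Python sketch based on my understanding of the HMAC-SHA1 scheme that the S3 REST interface documented at the time. Treat it as an illustration only: the function names are mine, the credentials are the same placeholders used elsewhere in these notes, and the string to sign is simplified, so check the AWS documentation for the exact rules before relying on it.

```python
import base64
import hashlib
import hmac

def sign(secret_access_key, string_to_sign):
    """Return a base64-encoded HMAC-SHA1 signature of string_to_sign."""
    digest = hmac.new(secret_access_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")

def authorization_header(access_key_id, secret_access_key, string_to_sign):
    # Only the Access Key ID and the signature travel with the request;
    # the secret itself is used locally and is never sent.
    return "AWS {}:{}".format(access_key_id,
                              sign(secret_access_key, string_to_sign))

# Placeholder credentials and a simplified string to sign (both assumptions).
string_to_sign = "GET\n\n\nFri, 29 Sep 2006 23:45:36 GMT\n/bucket-name/key-name"
print(authorization_header("access_key_id", "secret_access_id", string_to_sign))
```

The point to take away is that the Access Key ID says who you claim to be, while the signature, which only someone holding the Secret Access Key can produce, backs up that claim.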

As the documentation says:

IMPORTANT: Your Secret Access Key is a secret, and should be known only by you and AWS. You should never include your Secret Access Key in your requests to AWS. You should never e-mail your Secret Access Key to anyone. It is important to keep your Secret Access Key confidential to protect your account.

’nuff said.

In addition, AWS offers X.509 Certificates, which are required for some services. However, we don’t have to worry about them now to get started.

More Info

If you want to know more about Access Key IDs and Secret Access Keys, take a look at:

If you want more info about how to use AWS, take a look at:

Starting with Amazon EC2 and S3

September 29th, 2006 by morgan

Well, I just got into the beta for Amazon’s EC2. I am pretty excited about it, as I have had some ideas about ETL and information quality that might be a nice fit for this architecture. While it isn’t distributed computing per se, it offers some very interesting opportunities around scalability, especially in working with data transformation.

I will try to get something up and running this weekend. I am going to keep careful notes, and will try to add a step-by-step guide to help others get started. There is ample documentation on the AWS site, but I would like to do my part to make things a bit easier for newbies.

Links

Also, all of my postings are available in the AWS category, which will be updated regularly.

Quote for the Week of 2006-09-30

September 29th, 2006 by morgan

It is not worth an intelligent man’s time to be in the majority. By definition, there are already enough people to do that.
G. H. Hardy (1877 - 1947)

Musings on Metadata and Compliance

September 29th, 2006 by morgan

Frank Dravis has a new post on metadata and its growing importance in the marketplace. Dravis wonders how metadata became a subject of interest for non-technical folk …

In years past, metadata was the domain of data architects. It helped them understand what data they had and how it related to the sources and operations from which it came and to which it went. At the first mention of metadata business users would roll their eyes and head for the conference room door. Surely metadata was the stuff of arcane IT discussions best had out of earshot of people driving and running the business.

Then metadata management progressed and someone had the silly idea of articulating the business value, the value to the business side of the house, for metadata. The value came from the resolution of an age old problem. A corporate manager is sitting in a conference room looking at their regular monthly sales report and it is different from what they expected based on anecdotal evidence from the field: the numbers are too low.

Personally, I think that this recent interest is driven by a few things:

  1. Regulation and the threat of real penalties for inaccuracies in reporting. People got interested enough to protect their own hides.
  2. The rise of ERP and BPM in the marketplace. If everything is in one place then metadata suddenly becomes a lot easier to manage.

Truthfully, I wonder how all of this is going to turn out. I know there are lots of people who want to sell metadata software, but in my experience it takes a lot of resources (time, effort, and expertise) to maintain a comprehensive metadata environment. The threat of jail time helps to keep people motivated enough to save their necks, but not enough to make something useful. Being locked into an ERP package can mean the same thing, only it is your data that is locked.
