We know from the past 5 years of Gartner Magic Quadrants that AWS is a leader among IaaS vendors, placing the furthest for ‘completeness of vision’ and ‘ability to execute.’ AWS’ rapid pace of innovation contributes to its position as the leader in the space. The cloud provider releases hundreds of product and service updates every year. So, which of those are the most popular amongst our enterprise clients?
We analyzed data from our customers for the year, from a combined 100,000+ instances running monthly. The most popular AWS products and services, represented by the percentage of 2nd Watch customers utilizing them in 2016, include Amazon’s two core services for compute and storage – EC2 and S3 – and Amazon Data Transfer, each at 100% usage. Other high-ranking products include Simple Queue Service (SQS) for message queuing (84%) and Amazon Relational Database Service or RDS (72%). Usage for these services remains fairly consistent, and we would expect to see these services across most AWS deployments.
There are some relatively new AWS products and services that made the “most-popular” list for 2016 as well. AWS Lambda serverless computing (38%), Amazon WorkSpaces, a secure virtual desktop service (27%), and Kinesis, a real-time streaming data platform (12%), are quickly being adopted by AWS users and rising in popularity.
The fastest-growing services in 2016, based on CAGR, include AWS CloudTrail (48%), Kinesis (30%), Config for resource inventory, configuration history, and change notifications (24%), Elasticsearch Service for real-time search and analytics (22%), Elastic MapReduce, a tool for big data processing and analysis (20%), and Redshift, the data warehouse service alternative to systems from HP, Oracle and IBM (14%).
The accelerated use of these products demonstrates how quickly new cloud technologies are becoming the standard in today’s evolving market. Enterprises are moving away from legacy systems to cloud platforms for everything from back-end systems to business-critical, consumer-facing assets. We expect growth in each of these categories to continue as large organizations realize the benefits and ease of using these technologies.
Download the 30 Most Popular AWS Products infographic to find out which others are in high demand.
-Jeff Aden, Co-Founder & EVP Business Development & Marketing
Check out our new AWS Scorecard for a look at what we’re seeing companies typically use for their cloud services. Taken from AWS usage trends among 2nd Watch customers for July-October, 2014.
Organizations using Amazon EC2 are typically broken down in the following percentages:
38% use Small instances
19% use Medium
15% use XLarge
The very large (which include 2XLarge, 4XLarge and 8XLarge), and the very small (Micro) account for only 27% collectively.
Among our customers:
94% use Amazon’s Simple Storage Service (S3)
66% use Amazon’s Simple Notification Service (SNS) for push messaging
41% use Amazon’s Relational Database Service (RDS) to set up, operate, and scale a relational database in the cloud.
Around three-quarters of customers run Linux instances, with the remainder using Windows. However, Windows systems accounted for 31% of all computing hours, and more money is typically spent on Windows instances.
Last week, AWS announced their 42nd price reduction since 2008. This significant cut impacts many of their most popular services including EC2, S3, EMR, RDS and ElastiCache. These savings range from 10% to 65%, depending on the service you use. As you can see from the example below, this customer scenario results in savings of almost $150,000 annually, which represents a 36% savings on these services!!!
This major move not only helps existing AWS users but makes the value proposition of shifting from on-premises to the AWS cloud even greater. If you are not on AWS now, contact us to learn how we can help you take advantage of this new pricing and everything AWS has to offer.
As an AWS Premier Consulting Partner, our mission is to get you migrated to and running efficiently in the Amazon Web Services (AWS) cloud. The journey to get into the AWS cloud can be complicated, but we’ll guide you along the way and take it from there, so you can concentrate on running your business rather than your IT infrastructure.
2nd Watch provides:
Fast and Flawless enterprise grade cloud migration
Cloud IT Operations Management that goes beyond basic infrastructure management
Cloud cost/usage tracking and analytics that helps you control and allocate costs across your enterprise
The jump to the cloud can be a scary proposition. For an enterprise with systems deeply embedded in traditional infrastructure like back office computer rooms and datacenters, the move to the cloud can be daunting. The thought of having all of your data in someone else’s hands can make some IT admins cringe. However, once you start looking into cloud technologies, you start seeing some of the great benefits, especially with providers like Amazon Web Services (AWS). The cloud can be cost-effective, elastic and scalable, flexible, and secure. That same IT admin cringing at the thought of their data in someone else’s hands may finally realize that AWS is a bit more secure than a computer rack sitting under an employee’s desk in a remote office. Once the decision is finally made to “try out” the cloud, the planning phase can begin.
Most of the time the biggest question is, “How do we start with the cloud?” The answer is to use a phased approach. By picking applications and workloads that are less mission critical, you can try the newest cloud technologies with less risk. When deciding which workloads to move, you should ask yourself the following questions: Is there a business need for moving this workload to the cloud? Is the technology a natural fit for the cloud? What impact will this have on the business? If all those questions are suitably answered, your workloads are far more likely to be successful in the cloud.
One great place to start is with archiving and backups. These types of workloads are important, but the data you’re dealing with is likely just a copy of data you already have, so it is considerably less risky. The easiest way to start with archives and backups is to try out S3 and Glacier. Many of today’s backup utilities you may already be using, like Symantec NetBackup and Veeam Backup & Replication, have cloud versions that can back up directly to AWS. This allows you to start using the cloud without changing much of your embedded backup processes. By moving less critical workloads you are taking the first steps in increasing your cloud footprint.
Now that you have moved your backups to AWS using S3 and Glacier, what’s next? The next logical step would be to try some of the other services AWS offers. Another workload that can often be moved to the cloud is Disaster Recovery. DR is an area that will allow you to test more AWS services like VPC, EC2, EBS, RDS, Route53 and ELBs. DR is a perfect way to increase your cloud footprint because it will allow you to construct your current environment, which you should already be very familiar with, in the cloud. A Pilot Light DR solution is one type of DR solution commonly seen in AWS. In the Pilot Light scenario, the DR site has minimal systems and resources, with the core elements already configured to enable rapid recovery once a disaster happens. To build a Pilot Light DR solution you would create the AWS network infrastructure (VPC), deploy the core AWS building blocks needed for the minimal Pilot Light configuration (EC2, EBS, RDS, and ELBs), and determine the process for recovery (Route53). When it is time for recovery, all the other components can be quickly provisioned to give you a fully working environment. By moving DR to the cloud you’ve increased your cloud footprint even more and are on your way to cloud domination!
The next logical step is to move Test and Dev environments into the cloud. Here you can get creative with the way you use the AWS technologies. When building systems on AWS make sure to follow the Architecting Best Practices: design for failure and nothing will fail, decouple your components, take advantage of elasticity, build security into every layer, think parallel, and don’t fear constraints! Start with a proof-of-concept (POC) to test the development environment, and use AWS reference architecture to aid in the learning and planning process. Next, test your legacy application in the new environment and migrate data. The POC is not complete until you validate that it works and that performance meets your expectations. Once you get to this point, you can reevaluate the build and optimize it to the exact specifications needed. Finally, you’re one step closer to deploying actual production workloads to the cloud!
Production workloads are obviously the most important, but with the phased approach you’ve taken to increase your cloud footprint, it’s not that far of a jump from the other workloads you now have running in AWS. Some of the important things to remember to be successful with AWS include being aware of the rapid pace of the technology (this includes improved services and price drops), that security is your responsibility as well as Amazon’s, and that there isn’t a one-size-fits-all solution. Lastly, all workloads you implement in the cloud should still have stringent security and comprehensive monitoring as you would on any of your on-premises systems.
Overall, a phased approach is a great way to start using AWS. Start with simple services and traditional workloads that have a natural fit for AWS (e.g. backups and archiving). Next, start to explore other AWS services by building out environments that are familiar to you (e.g. DR). Finally, experiment with POCs and the entire gamut of AWS to benefit from more efficient production operations. Like many new technologies, it takes time for adoption. By increasing your cloud footprint over time you can set expectations for cloud technologies in your enterprise and make it a more comfortable proposition for all.
One of the main differentiators between traditional on-premises data centers and Cloud Computing through AWS is the speed at which businesses can scale their environment. So often in enterprise environments, IT and business struggle to have adequate capacity when they need it. Facilities run out of power and cooling, vendors cannot provide systems fast enough or the same type of system is not available, and business needs sometimes come without warning. AWS scales out to meet these demands in every area.
Compute capacity is expanded, often automatically with auto scaling groups, which add additional server instances as demands dictate. With auto scaling groups, demands on the environment cause more systems to come online. Even without auto scaling, systems can be cloned with Amazon Machine Images (AMIs) and started to meet capacity, expand to a new region/geography, or even be shared with a business partner to move collaboration forward.
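To make the idea of capacity following demand a little more concrete, here is a minimal sketch of the kind of arithmetic a scaling policy performs. It is not AWS's actual algorithm. The function name, the choice of average CPU as the metric, and all the numbers are hypothetical and only illustrate scaling a group in proportion to load while staying inside configured bounds.

```python
import math


def desired_capacity(current_instances, metric_value, target_value,
                     min_size, max_size):
    """Return how many instances a group should run (illustrative only).

    Scales the current instance count by the ratio of the observed metric
    to its target, then clamps the result to the group's size limits.
    """
    if current_instances <= 0 or target_value <= 0:
        raise ValueError("current_instances and target_value must be positive")
    if min_size > max_size:
        raise ValueError("min_size cannot exceed max_size")

    # Proportional scaling: twice the load per instance means twice the instances.
    proposed = math.ceil(current_instances * metric_value / target_value)

    # Never go below the floor or above the ceiling set for the group.
    return max(min_size, min(max_size, proposed))


# Hypothetical group of 4 instances with a 60% average CPU target.
busy = desired_capacity(4, metric_value=90.0, target_value=60.0,
                        min_size=2, max_size=10)
quiet = desired_capacity(4, metric_value=15.0, target_value=60.0,
                         min_size=2, max_size=10)
```

The same function covers both directions: under heavy load the group grows, and when demand falls it shrinks back toward the minimum, which is where the cost savings come from.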
Beyond compute capacity, storage capacity is a few mouse clicks (or less) away from business needs as well. Using Amazon S3, storage capacity is simply allocated dynamically as it is used. Customers do not need to do anything more than add content and storage, and that is far easier than adding disk arrays! With Elastic Block Store (EBS), volumes are added as quickly as compute instances are. Storage can be added and attached to live instances or replicated across an environment as capacity is demanded.
Growth is great, and we’ve written a great deal about how to take advantage of the elastic nature of AWS before, but what about the second part of the title? Price! It’s no secret that as customers use more AWS resources, the price increases. The more you use, the more you pay; simple. The differentiators come into play with that same elastic nature; when demand drops, resources can be released and costs saved. Auto scaling can retire instances as easily as it adds them, storage can be removed when no longer needed, and with careful usage of resources, bills can actually shrink as you become more proficient in AWS. (Of course, 2nd Watch Managed Services can also help with that proficiency!) With traditional data centers, once resources are purchased, you pay the price (often a large one). With the Cloud, resources can be purchased as needed, at just a fraction of the price.
IT wins and business wins – enterprise level computing at its best!
Backup and disaster recovery often require solutions that add complexity and additional cost to properly synchronize your data and systems. Amazon Web Services™ (AWS) helps drive down this cost and complexity with a number of services. Amazon S3 provides a highly durable (99.999999999%) storage platform for your backups. This service backs up your data to multiple availability zones (AZ) to provide you the ultimate peace of mind for your data. AWS also provides an ultra-low cost service for long-term cold storage that is aptly named Glacier. At $0.01 per GB / month this service will force you to ask, “Why am I not using AWS today?”
AWS has developed the AWS Storage Gateway to make your backups secure and efficient. For only $125 per backup location per month, you will have a robust solution that provides the following features:
Secure transfers of all data to AWS S3 storage
Compatible with your current architecture – there is no need to call up your local storage vendor for a special adapter or firmware version to use Storage Gateway
Designed for AWS – this provides a seamless integration of your current environment to AWS services
AWS Storage Gateway and Amazon EC2 (snapshots of machine images) together provide a simple cloud-hosted DR solution. Amazon EC2 allows you to quickly launch images of your production environment in AWS when you need them. The AWS Storage Gateway seamlessly orchestrates with S3 to provide you a robust backup and disaster recovery solution that meets anyone’s budget.
Not long ago, 2nd Watch published an article on Amazon Glacier. In it Caleb provides a great primer on the capabilities of Glacier and the cost benefits. Now that he’s taken the time to explain what it is, let’s talk about possible use cases for Glacier and how to avoid some of the pitfalls. As Amazon says, “Amazon Glacier is optimized for data that is infrequently accessed and for which retrieval times of several hours are suitable.” What immediately comes to mind are backups, but most AWS customers do this through EBS snapshots, which can restore in minutes, while a Glacier recall can take hours. Rather than looking at the obvious, consider these use cases for Glacier Archival storage: compliance (regulatory or internal process), conversion of paper archives, and application retirement.
Compliance often forces organizations to retain records and backups for years; customers often mention a seven-year retention policy based on regulatory compliance. In seven years, a traditional (on-premises) server can be replaced at least once, operating systems are upgraded several times, applications have been upgraded or modified, and backup hardware/software has been changed. Add to that all the media that would need to be replaced/upgraded and you have every IT department’s nightmare – needing to either maintain old tape hardware or convert all the old backup tapes to the new hardware format (and hope too many haven’t degraded over the years). Glacier removes the need to worry about the hardware and the media, and the storage fees (currently 1¢ per GB/month in US-East) are tiny compared to the cost of media and storage on premises. Upload your backup file(s) to S3, set up a lifecycle policy, and you have greatly simplified your archival process while keeping regulatory compliance.
So how do customers create these lifecycle policies so their data automatically moves to Glacier? From the AWS Management Console, once you have an S3 bucket there is a Property called ‘Lifecycle’ that can manage the migration to Glacier (and possible deletion as well). Add a rule (or rules) to the S3 bucket that can migrate files based on a filename prefix, how long since their creation date, or how long from an effective date (perhaps 1 day from the current date for things you want to move directly to Glacier). For the example above, perhaps customers take backup files, move them to S3, then have them move to Glacier after 30 days and delete after 7 years.
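For readers who would rather script this than click through the console, the 30-day / 7-year example above could be sketched as a lifecycle rule expressed as plain data. This is only an illustration: the dictionary follows the shape that, to my understanding, the boto3 call `put_bucket_lifecycle_configuration` accepts, so verify it against the current S3 documentation before relying on it. The helper name and the `backups/` prefix are made up, and seven years is approximated as 7 x 365 days.

```python
def backup_lifecycle_rule(prefix, days_to_glacier=30, retention_years=7):
    """Build one lifecycle rule: move to Glacier, then delete (sketch).

    prefix          -- key prefix the rule applies to (hypothetical value)
    days_to_glacier -- days after creation before the move to Glacier
    retention_years -- years after creation before the object is deleted
    """
    return {
        "ID": f"archive-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        # Move objects to Glacier once they are old enough.
        "Transitions": [
            {"Days": days_to_glacier, "StorageClass": "GLACIER"},
        ],
        # Delete objects at the end of the retention period.
        # Leap days are ignored in this approximation.
        "Expiration": {"Days": retention_years * 365},
    }


# One rule for everything under the (hypothetical) "backups/" prefix.
lifecycle_configuration = {"Rules": [backup_lifecycle_rule("backups/")]}
```

Keeping the rule as data makes it easy to review, store in version control, and apply to several buckets with different prefixes or retention periods.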
Before we go too far and set up lifecycles, however, one major point should be highlighted: Amazon charges customers based on GB/month stored in Glacier and a one-time fee for each file moved from S3 to Glacier. Moving a terabyte of data from S3 to Glacier could cost little more than $10/month in storage fees; however, if that data is made up of 1 KB log files, the one-time fee for that migration can be more than $50,000! While this is an extreme example, consider data management before archiving. If at all possible, compress the files into a single file (zip/tar/rar), upload those compressed files to S3 and then archive to Glacier.
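The arithmetic behind that warning can be sketched in a few lines. The storage price of $0.01 per GB/month comes from the text. The per-file fee is an assumption: $0.05 per 1,000 requests, which is the rate that reproduces the $50,000 figure. The function name is hypothetical, and sizes use binary units (1 TB = 1024^4 bytes), so the totals land slightly above the round numbers quoted.

```python
GIB = 1024 ** 3                      # bytes per gigabyte (binary units)
STORAGE_PRICE_PER_GB_MONTH = 0.01    # from the text
REQUEST_PRICE_PER_1000 = 0.05        # assumed per-file transition rate


def glacier_migration_cost(total_bytes, file_size_bytes):
    """Return (file_count, storage_per_month, one_time_fee) in dollars.

    Assumes every file is the same size and that each file moved to
    Glacier is billed as one request.
    """
    if total_bytes <= 0 or file_size_bytes <= 0:
        raise ValueError("sizes must be positive")

    # Ceiling division: a partial file still counts as one file.
    file_count = -(-total_bytes // file_size_bytes)

    storage_per_month = total_bytes / GIB * STORAGE_PRICE_PER_GB_MONTH
    one_time_fee = file_count / 1000 * REQUEST_PRICE_PER_1000
    return file_count, storage_per_month, one_time_fee


ONE_TB = 1024 ** 4

# A terabyte made of 1 KB log files versus the same data in one archive.
many_small_files = glacier_migration_cost(ONE_TB, 1024)
single_archive = glacier_migration_cost(ONE_TB, ONE_TB)
```

The monthly storage cost is identical in both cases (about $10.24). Only the one-time fee changes, from tens of thousands of dollars for roughly a billion small files to a fraction of a cent for a single archive, which is why compressing before archiving matters.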
If you haven’t heard of Amazon Glacier, you need to check it out. As its name implies, you can think of Glacier as “frozen” storage. When considering the speed of EBS and S3, Glacier by comparison moves glacially slow. Consider Glacier as essentially a cloud-based archival solution that works similarly to old-style tape backup. In the past, backups first ran to tape, then were stored locally in case of immediate access requirements, and were then taken off-site once a certain date requirement was met (once a week, once a month, etc.). Glacier essentially works as the last stage of that process.
When a snapshot in S3, for instance, gets to be a month old, you can instruct AWS to automatically move that object to Glacier. Writing it to Glacier happens pretty much immediately, though being able to see that object on your Glacier management console can take between 3 and 5 hours. If you need it back, you’ll issue a request, but that can take up to 24 hours to be resolved. Amazon hasn’t released the exact mechanics of how they’re storing the data on their end, but large tape libraries are a good bet since they jibe with one of Glacier’s best features: its price. That’s only $0.01 per gigabyte. Its second best feature is 11 nines worth of “durability” (which refers to data loss) and 4 nines worth of “reliability” (which refers to data availability). That’s 99.999999999% for those who like the visual.
Configuring Glacier, while a straightforward process, will require some technical savvy on your part. Amazon has done a nice job of representing how Glacier works in an illustration:
As you can see, the first step is to download the Glacier software development kit (SDK), which is available for Java or .NET. Once you’ve got that, you’ll need to create your vault. This is an easy step that starts with accessing your Glacier management console, selecting your service region (Glacier is automatically redundant across availability zones in your region, which is part of the reason for its high durability rating), naming your vault, and hitting the create button. I’m using the sandbox environment that comes with your AWS account to take these screen shots, so the region is pre-selected. In a live environment, this would be a drop-down menu providing you with region options.
The vault is where you’ll store your objects, which equate to a single file, like a document or a photo. But instead of proceeding directly to vault creation from the screen above, be sure and set up your vault’s Amazon Simple Notification Service (SNS) parameters.
Notifications can be created for a variety of operations and delivered to systems managers or applications using whatever protocol you need (HTTP for a homegrown web control or email for your sys admin, for example). Once you create the vault from the notifications screen, you’re in your basic Glacier management console:
Uploading and downloading documents is where it gets technical. Currently, the web-based console above doesn’t have tools for managing archive operations like you’d find with S3. Uploading, downloading, deleting or any other operation will require programming in whichever language you’ve downloaded the SDK for. You can use the AWS Identity and Access Management (IAM) service to attach user permissions to vaults and manage billing through your Account interface, but everything else happens at the code level. However, there are third-party Glacier consoles out there that can handle much of the development stuff in the background while presenting you with a much simpler management interface, such as CloudBerry Explorer 3.6. We’re not going to run through code samples here, but Amazon has plenty of resources for this off its Sample Code & Libraries site.
On the upside, while programming for Glacier operations is difficult for non-programmers, if you’ve got the skills, it provides a lot of flexibility in designing your own archive and backup processes. You can assign vaults to any of the various backup operations being run by your business and define your own archive schedules. Essentially, that means you can configure a hierarchical storage management (HSM) architecture that natively incorporates AWS.
For example, imagine a typical server farm running in EC2. At the first tier, it’s using EBS for immediate, current data transactions, similar to a hard disk or SAN LUN. When files in your EBS store have been unused for a period of time or if you’ve scheduled them to move at a recurring time (like with server snapshots), those files can be automatically moved to S3. Access between your EC2 servers and S3 isn’t quite as fast as EBS, but it’s still a nearline return on data requests. Once those files have lived on S3 for a time, you can give them a time to live (TTL) parameter after which they are automatically archived on Glacier. It’ll take some programming work, but unlike with standard on-premises archival solutions, which are usually based on a proprietary architecture, using Java or .NET means you can configure your storage management any way you like – for different geographic locations, different departments, different applications, or even different kinds of data.
And this kind of HSM design doesn’t have to be entirely cloud-based. Glacier works just as well with on-premises data, applications, or server management. There is no minimum or maximum amount of data you can archive with Glacier, though individual archives can’t be less than 1 byte or larger than 40 terabytes. To help you observe regulatory compliance issues, Glacier uses secure protocols for data transfer and encrypts all data on the server side using key management and 256-bit encryption.
Pricing is extremely low and simple to calculate. Data stored in Glacier is $0.01 per gigabyte. Upload and retrieval operations run only $0.05 per 1000 requests, and there is a pro-rated charge of $0.03 per gigabyte if you delete objects prior to 90 days of storage. Like everything else in AWS, Glacier is a powerful solution that provides highly customizable functionality for which you only pay for what you use. This service is definitely worth a very close look.
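Since the pricing really is simple, it fits in a few small functions. The three rates below are the ones quoted above. How the early-deletion charge is pro-rated is my assumption (linear over the 90-day window), and the function names are hypothetical, so treat this as a rough estimator rather than a billing reference.

```python
STORAGE_PER_GB_MONTH = 0.01      # from the text
REQUESTS_PER_1000 = 0.05         # from the text
EARLY_DELETE_PER_GB = 0.03       # from the text, before pro-rating
EARLY_DELETE_WINDOW_DAYS = 90    # from the text


def storage_cost(gigabytes, months=1):
    """Storage charge for keeping data in Glacier for some months."""
    return gigabytes * STORAGE_PER_GB_MONTH * months


def request_cost(request_count):
    """Charge for upload and retrieval requests."""
    return request_count / 1000 * REQUESTS_PER_1000


def early_deletion_fee(gigabytes, days_stored):
    """Fee for deleting before 90 days (linear pro-rating is assumed)."""
    if days_stored < 0:
        raise ValueError("days_stored cannot be negative")
    if days_stored >= EARLY_DELETE_WINDOW_DAYS:
        return 0.0
    # Charge only for the share of the 90-day window that was not used.
    unused = (EARLY_DELETE_WINDOW_DAYS - days_stored) / EARLY_DELETE_WINDOW_DAYS
    return gigabytes * EARLY_DELETE_PER_GB * unused


# Example: 100 GB uploaded in 2,000 requests, deleted after 30 days.
example_total = (storage_cost(100) + request_cost(2000)
                 + early_deletion_fee(100, 30))
```

Under these assumptions the early-deletion fee roughly equals the storage you would have paid for the rest of the 90 days, so deleting early saves little.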
For quite some time I’ve been meaning to tinker around with using Amazon S3 for a backup tool. Sure, I’ve been using S3-backed Dropbox for years now and love it, and there are a multitude of other desktop client apps out there that do the same sort of thing with varying price points and feature sets (including Amazon’s own cloud drive). The primary reason I wanted to look into using something specific to S3 is because it is economical and very highly available and secure, but it also scales well in a more enterprise setting. It is just a logical and compelling choice if you are already running IaaS in AWS.
If you’re unfamiliar with rsync, it is a UNIX tool for copying files or sets of files with many cool features. Probably the most distinctive feature is that it does differential copying, which means that it will only copy files that have changed on the source. This means if you have a file set containing thousands of files that you want to sync between the source and the destination it will only have to copy the files that have changed since the last copy/sync.
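To illustrate the "only copy what changed" idea, here is a toy sketch of the file-selection step. By default rsync decides whether a file needs attention by comparing its size and modification time on both sides, and this sketch imitates just that check. It does not implement rsync's delta-transfer algorithm, and the function name and sample data are hypothetical.

```python
from typing import Dict, List, Tuple

# (size in bytes, modification time as a Unix timestamp)
FileInfo = Tuple[int, float]


def files_to_copy(source: Dict[str, FileInfo],
                  destination: Dict[str, FileInfo]) -> List[str]:
    """Return paths that are new or differ in size/mtime at the source."""
    return sorted(
        path
        for path, info in source.items()
        # A missing file yields None, which never equals a FileInfo tuple.
        if destination.get(path) != info
    )


# Hypothetical file sets: one unchanged, one modified, one new file.
source_files = {
    "notes.txt": (120, 1000.0),
    "report.doc": (5000, 2000.0),
    "photo.jpg": (80000, 3000.0),
}
destination_files = {
    "notes.txt": (120, 1000.0),
    "report.doc": (4800, 1500.0),
}

changed = files_to_copy(source_files, destination_files)
```

With thousands of files and only a handful of changes, the list returned here stays short, which is what makes repeated syncs fast.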
Being an engineer my initial thought was, “Hey, why not just write a little Python program using the boto AWS API libs and librsync to do it?”, but I am also kind of lazy, and I know I’m not that forward-thinking, so I figured someone has probably already done this. I consulted the Google machine and sure enough… 20 seconds later I had discovered Duplicity (http://duplicity.nongnu.org/). Duplicity is an open source GPL Python-based application that does exactly what I was aiming for – it allows you to rsync files to an S3 bucket. In fact, it even has some additional functionality like encryption and password protection of the data.
A little background info on AWS storage/backups
Tying in to my earlier point about wanting to use S3 for EC2 Linux instances, traditional Linux AWS EC2 instance backups are achieved using EBS snapshots. This can work fairly well but has a number of limitations and potential pitfalls/shortcomings.
Here is a list of advantages and disadvantages of using EBS snapshots for Linux EC2 instance backup purposes. In no way are these lists fully comprehensive:
Advantages:
Easily scriptable using API tools
Pre-baked functionality built into the AWS APIs and Management Console
Disadvantages:
Non-selective (requires backing up an entire EBS volume)
EBS is more expensive than S3
Backing up an entire EBS volume can be overkill for what you actually need backed up and result in a lot of extra cost for backing up non-essential data
Pitfalls with multiple EBS volume software RAID or LVM sets
Multiple EBS volume sets are difficult to snapshot synchronously
Using the snapshots for recovery requires significant work to reconstruct volume sets
No ability to capture only files that have changed since previous backup (i.e. rsync-style backups)
Only works on EBS-backed instances
Compare that to a list of advantages/disadvantages of using the S3/Duplicity solution:
Advantages:
Inexpensive (S3 is cheap)
Data security (redundancy and geographically distributed)
Works on any Linux system that has connectivity to S3
Should work on any UNIX style OS (includes Mac OSX) as well
Only copies the deltas in the files and not the entire file or file-set
Supports “Full” and “Incremental” backups
Data is compressed with gzip
FOSS (Free and Open Source Software)
Works independently of underlying storage type (SAN, Linux MD, LVM, NFS, etc.) or server type (EC2, Physical hardware, VMWare, etc.)
Relatively easy to set up and configure
Uses syntax that is congruent with rsync (e.g. --include, --exclude)
Can be restored anywhere, anytime, and on any system with S3 access and Duplicity installed
Disadvantages:
Slower than a snapshot, which is virtually instantaneous
Not ideal for backing up data sets with large deltas between backups
No out-of-the-box type of AWS API or Management Console integration (though this is not really necessary)
No “commercial” support
On to the important stuff! How to actually get this thing up and running
Things you’ll need:
The Duplicity application (should be installable via either yum, apt, or other pkg manager). Duplicity itself has numerous dependencies but the package management utility should handle all of that.
An Amazon AWS account
Your Amazon S3 Access Key ID
Your Amazon S3 Secret Access Key
A list of files/directories you want to back up
A globally unique name for an Amazon S3 bucket (the bucket will be created if it doesn’t yet exist)
If you want to encrypt the data:
A GPG key
The corresponding GPG key passphrase
Obtain/install the application (and its pre-requisites):
If you’re running a standard Linux distro you can most likely install it from a ‘yum’ or ‘apt’ repository (depending on distribution). Try something like “sudo yum install duplicity” or “sudo apt-get install duplicity”. If all else fails (perhaps you are running some esoteric Linux distro like Gentoo?), you can always do it the old-fashioned way and download the tarball from the website and compile it (that is outside of the scope of this blog). “Use the source Luke.” If you are a Mac user you can also compile it and run it on Mac OSX (http://blog.oak-tree.us/index.php/2009/10/07/duplicity-mac), which I have not tested/verified actually works.
NOTE: On Fedora Core 18, Duplicity was already installed and worked right out of the box. On a Debian Wheezy box I had to apt-get install duplicity and python-boto. YMMV
Generate a GPG key if you don’t already have one:
If you need to create a GPG key use ‘gpg --gen-key’ to create a key with a passphrase. The default values supplied by ‘gpg’ are fine.
NOTE: record the GPG Key value that it generates because you will need that!
NOTE: keep a backup copy of your GPG key somewhere safe. Without it you won’t be able to decrypt your backups, and that could make restoration a bit difficult.
Run Duplicity, backing up whatever files/directories you want saved on the cloud. I’d recommend reading the man page for a full rundown on all the options and syntax.
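As a rough illustration of what such a run can look like, the sketch below only assembles a Duplicity command line as a list of arguments and does not execute anything. The options used (`--encrypt-key`, `--no-encryption`, `--include`, `--exclude`) and the `s3+http://bucket/prefix` target form are ones I understand Duplicity to support, but URL schemes have changed between versions, so check the man page for the release you have installed. The function name, key ID, paths, and bucket name are all placeholders.

```python
def duplicity_backup_command(source_dir, bucket, prefix,
                             gpg_key_id=None, includes=(), excludes=()):
    """Assemble (but do not run) a duplicity backup command as a list.

    Duplicity reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the
    environment, and the GPG passphrase from PASSPHRASE, so none of those
    secrets appear on the command line.
    """
    command = ["duplicity"]

    if gpg_key_id:
        # Encrypt the backup with the given GPG key.
        command += ["--encrypt-key", gpg_key_id]
    else:
        # Explicitly opt out of encryption when no key is supplied.
        command.append("--no-encryption")

    # File selection options are order-sensitive: the first match wins,
    # so includes are listed before the broader excludes.
    for pattern in includes:
        command += ["--include", pattern]
    for pattern in excludes:
        command += ["--exclude", pattern]

    # Source directory first, then the S3 target URL.
    command += [source_dir, f"s3+http://{bucket}/{prefix}"]
    return command


# Placeholder values: back up only the documents directory, encrypted.
example_command = duplicity_backup_command(
    "/home/me", "my-unique-bucket-name", "laptop",
    gpg_key_id="ABCD1234",
    includes=["/home/me/documents"],
    excludes=["**"],
)
```

Passing the resulting list to `subprocess.run` with the three environment variables set would perform the backup. Building the list separately makes it easy to print and inspect the command first.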
Overwhelmed or confused by all of this command line stuff? If so, Deja-dup might be helpful. It is a Gnome based GUI application that can perform the same functionality as Duplicity (turns out the two projects actually share a lot of code and are worked on by some of the same developers). Here is a handy guide on using Deja-dup for making Linux backups: (http://www.makeuseof.com/tag/dj-dup-perfect-linux-backup-software/)
This is pretty useful, and for $4 a month, or about the average price of a latte, you can store nearly 50GB compressed of de-duped backups in S3 (standard tier). For just a nickel you can get at least 526MB of S3 backup for a month. Well, that and the 5GB of S3 you get for free.
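For anyone who wants to check the latte math, here is a small sketch. The 5GB free allowance comes from the text. The price of $0.095 per GB per month is an assumption on my part, chosen because it is the rate that makes a nickel buy 526MB. Substitute the current S3 rate to get today's figures.

```python
ASSUMED_PRICE_PER_GB_MONTH = 0.095   # assumption inferred from the nickel figure
FREE_TIER_GB = 5                     # from the text


def gigabytes_for_budget(monthly_budget, price_per_gb=ASSUMED_PRICE_PER_GB_MONTH,
                         free_gb=0):
    """Return how many GB a monthly budget covers, plus any free allowance."""
    if monthly_budget < 0 or price_per_gb <= 0:
        raise ValueError("budget must be non-negative and price positive")
    return monthly_budget / price_per_gb + free_gb


# A nickel a month, expressed in megabytes (1 GB = 1000 MB here).
nickel_mb = gigabytes_for_budget(0.05) * 1000

# A $4 latte a month, counting the free 5GB as well.
latte_gb = gigabytes_for_budget(4.00, free_gb=FREE_TIER_GB)
```

Under that assumed rate the nickel works out to a little over 526MB and the latte to roughly 47GB, in line with the "nearly 50GB" claim.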
“We think Amazon has a three- to four-year headstart on product depth and pricing and a decade on global infrastructure,” said Jeff Aden, president of 2nd Watch, a Seattle-based systems integrator that has deployed 200 core production enterprise systems using AWS. “You’re talking potentially five to 10 years out until there’s a serious contender.” While most acknowledge AWS’ lead in the market, some might beg to differ with the challenge its rivals are facing in catching up.
I [Schwartz] asked Aden if he was exclusively tied to Amazon. He said he’s cloud-agnostic but AWS rivals have not been able to match the cost and level of infrastructure 2nd Watch requires to date. Aden said his company spends an extensive amount of time investigating alternatives, notably the newly expanded Windows Azure Infrastructure Services, as well as OpenStack-based services from HP, IBM and Rackspace.
“We continually test on Windows Azure and look at it,” Aden said. “It’s great for the marketplace overall, because competition leads to better products, but there are certain things that we have to test around security and being able to manage the services before we make recommendations on how to use it.”