Cloud Computing and Cloud Infrastructure Myths


Most common cloud computing questions

The most common question we hear about cloud computing is “What is the cloud?” There are a lot of terms, vendor-specific definitions, and general confusion about cloud infrastructure, so we’ll first define cloud computing before moving on.

Solid Logic’s cloud computing definition: instantly scalable, programmatically controllable compute, storage, and networking resources.

This definition is commonly referred to as Infrastructure-as-a-Service (IaaS). IaaS abstracts away the physical aspects of IT infrastructure and provides a set of application programming interfaces (APIs) to control every aspect of it. This is very powerful: it lets you manage what amounts to a data center from a development environment or a software application, as in the sketch below.
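As a concrete illustration, here is a minimal sketch of what “managing a data center from code” can look like, using the boto Python library for AWS. The AMI ID, key pair, and security group names are placeholders, not values from any real environment.

    # Minimal sketch: launching and inspecting a server entirely through an API,
    # using the boto library (a Python SDK for AWS). The AMI ID, key pair and
    # security group below are placeholders -- substitute your own values.
    import boto.ec2

    # Open an API connection to a region; credentials are read from the
    # environment or a boto config file in a typical setup.
    conn = boto.ec2.connect_to_region('us-east-1')

    # Provision a server programmatically -- no tickets, no waiting on hardware.
    reservation = conn.run_instances(
        'ami-12345678',               # placeholder AMI ID
        instance_type='m1.small',
        key_name='my-keypair',        # placeholder key pair name
        security_groups=['default'],
    )
    instance = reservation.instances[0]

    # The same API can tag, stop, or terminate the resource later.
    print("launched %s (state: %s)" % (instance.id, instance.state))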

Many of the people we speak to have never used Amazon Web Services (AWS), Rackspace Cloud, or another IaaS provider, for a variety of reasons. We’ve used IaaS for everything from high-performance computing to video hosting to low-cost development, test, and other non-production infrastructure. That experience serves as a guide to which workloads fit well within an IaaS model and which do not. It also allows us to prescribe a customized, phased approach to cloud adoption that minimizes cost and business risk.

The next comment that normally comes up when we speak to people about cloud infrastructure is: “The cloud sounds great, and I hear it saves a lot of money, but it’s just too risky/insecure/complex for us.”

Organizations that have not yet embraced IaaS or “the cloud” in their business generally hold back for similar reasons. Most of those reasons center on perceptions that, depending on the scenario, may be outdated or simply untrue.

In our experience, their reasons generally fall into one of the categories below:

  • Cloud Performance (CPU, Disk, Network, Bandwidth, etc.) – I heard cloud servers are slow, and the disks are slow and unpredictable.
  • Budgeting/cost modeling – How do I know or estimate what my costs will be?
  • Cloud Security – It can’t be secure; it’s called “public cloud.” Can other people access my files or servers?
  • Cloud Reliability – Netflix went down, so it’s not reliable. What do I do if it goes down?
  • Cloud Compliance – No way, can’t do it – I’m subject to ABC, DEF, or XYZ compliance requirements.
  • Cloud Audit requirements – No way, the auditors will never buy in to this.
  • Employee training – How do I find people to manage this?
  • Steep learning curve – How do I get started? It seems really complex.

Cloud misperceptions abound

As the saying goes, perception is reality, and there are plenty of misconceptions that stoke fear of the technology and keep people from moving suitable workloads to the cloud.

Popular news sources perpetuate the myths about cloud computing. It seems that every time Amazon Web Services (AWS), by far the largest cloud provider, has any sort of hiccup or downtime, reporters jump on the bandwagon and declare that cloud infrastructure is useless and breaks too often. Here is a Google News search for this: https://www.google.com/news?ncl=dvYSd5T83PVQigMPa1-2GMz-snaDM&q=aws+down&lr=English&hl=en


How we’re addressing these concerns

We’re going to address each of these concerns by sharing much of what we’ve learned along the way. We hope to shed some light on what seems to be an increasingly complicated market with more and more terminology and complex jargon used every day.

  1. We’re working on a comprehensive cloud computing benchmarking report. The report will make an apples-to-apples comparison between cloud instance sizes and existing in-house infrastructure, using common benchmarking tests that anyone can replicate in their own environment. It will allow organizations to make an informed business decision about whether they could benefit from integrating “the cloud” into their IT infrastructure and software development approach. Sign up here for a copy of the cloud computing benchmark report.
  2. We’re going to present cost models and budgets for common scenarios, integrating both tangible and intangible costs and benefits that we’ve searched for but haven’t seen included anywhere else (a simple sketch of the kind of model we mean follows below). Contact us for a cost model for a specific use case.

In all, we’ll address each of the bullet points above in detail. Stay tuned…
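To make the cost-modeling item concrete, here is a minimal, illustrative sketch of the arithmetic behind a monthly cloud budget. Every rate below is a placeholder for illustration only – substitute current pricing from your provider’s rate card, and add reserved-capacity discounts, support plans, and other line items as needed.

    # Illustrative monthly cost model for a small cloud deployment.
    # All rates are placeholder numbers, not real provider pricing.
    HOURS_PER_MONTH = 730  # average hours in a month

    servers = [
        # (role, count, placeholder $/hour)
        ("web",      2, 0.12),
        ("database", 1, 0.24),
    ]
    storage_gb       = 500    # provisioned block storage
    storage_rate     = 0.10   # placeholder $/GB-month
    bandwidth_out_gb = 200    # outbound data transfer
    bandwidth_rate   = 0.12   # placeholder $/GB

    compute   = sum(count * rate * HOURS_PER_MONTH for _, count, rate in servers)
    storage   = storage_gb * storage_rate
    bandwidth = bandwidth_out_gb * bandwidth_rate

    print("compute:   $%8.2f" % compute)
    print("storage:   $%8.2f" % storage)
    print("bandwidth: $%8.2f" % bandwidth)
    print("total:     $%8.2f" % (compute + storage + bandwidth))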


2012 Second Presidential Debate WordCloud

We like to do some ad-hoc text analysis from time to time to break things up a bit and work with new tools and software. We’ve done similar things before with Twitter #hashtag analysis in a post titled Michigan Lean Startup Conf. Twitter Visualizations.

In the spirit of the upcoming election and debates, I thought it would be interesting to put out something that summarizes the words used by both candidates in the 2012 Second Presidential Debate on October 16, 2012. We grabbed the transcript text from here. We’re not diving into anything overly complex, but it does put last night’s debate in a different context that we found interesting.

The way the graphic turned out is interesting – the most prominent words are: president, governor, jobs, that’s, people.

Link to the WordCloud: http://solidlogic.com/wp-content/uploads/2012/10/wordcloud_debate_transcript.png

2012 Second Presidential Debate Word Cloud


How to build a word cloud

The easiest way to build a word cloud is to use one of the great free online tools like Wordle to generate the graphic. If you need a more customized approach, or need to create something like this in software, several libraries make it a lot easier. More details on the methods and code behind this are coming later, but it’s based on Python and R, both of which we use quite a bit for data analysis and development projects. The code for this was created by me and our CIO, Michael Bommarito. It’s based on some of the work he’s previously made available here: Wordcloud of the Arizona et al. v. United States opinion and Archiving Tweets with Python.
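Until that write-up is posted, here is a minimal Python sketch of the general approach (not the authors’ actual code), using the open-source wordcloud package; the transcript filename is a placeholder.

    # Minimal sketch: build a word cloud from a debate transcript using the
    # open-source "wordcloud" Python package (pip install wordcloud).
    # The transcript filename is a placeholder.
    from wordcloud import WordCloud, STOPWORDS

    with open("debate_transcript.txt") as f:
        text = f.read()

    wc = WordCloud(
        width=800,
        height=600,
        background_color="white",
        stopwords=STOPWORDS,   # drop common filler words ("the", "and", ...)
    )
    wc.generate(text)
    wc.to_file("wordcloud_debate_transcript.png")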


To get customized analysis like this, or to ask us anything else, please use the contact us page.

 

Event: AWS Michigan Meetup (Presenting) – 10/09/12

Legal Informatics w/ CloudSearch & High-Performance Financial Market Apps

Solid Logic’s CEO, Eric Detterman, and CIO, Mike Bommarito, will be presenting at the AWS Michigan Meetup (http://www.awsmichigan.org/events/85530922/) at Tech Brewery (map) in Ann Arbor, MI. I’ll be presenting on how we use Amazon Web Services (AWS) in the quantitative financial trading space, with a case study and more.

Mike will be presenting on Legal Informatics using AWS CloudSearch. He will also be demonstrating an early prototype of a private enterprise information search and e-discovery application we’re creating. Mike also has a copy of his presentation available here.

Event Date: Tuesday, October 9th, 2012 @ 6:30pm

Event: AWS Michigan Meetup

More info: http://www.awsmichigan.org/events/85530922/

Below is a copy of my presentation so you can view it at your convenience.


Hadoop – Latency = Google Dremel = Apache Drill???

Hadoop is one of the IT buzzwords of the day, and for good reason – it allows an organization to get meaning and actionable analysis out of “big data” that was previously unusable because of its sheer size. This technology certainly solves a lot of problems, but…

What happens if your problem doesn’t easily fit into the Hadoop framework?

Most of the work we do in the financial sector falls into this category – it just doesn’t make sense to rewrite existing code to fit the Hadoop paradigm. Example case study here and blog post here.

As in any business, new ideas lose their ‘edge’ as they sit on the shelf or stall in execution – primarily because of opportunity costs and the growing chance that a competitor builds a product around the same idea. The faster a concept can be brought to market, the larger the advantage for its creator. This is especially true in the financial trading technology sector, where advancements are measured in minutes, hours, or days rather than weeks or months. Because of this, we’re always looking for new and creative ways to solve data and “big data” problems more quickly.

Enter Apache Drill

One of the more interesting articles we came across recently focused on a new Apache project that aims to reduce the time to get answers out of a large data set. The project is named Apache Drill and here is a quick overview slide deck.

The Apache Drill project aims to create a tool similar to Google’s Dremel that facilitates faster, interactive queries across large datasets. Here is another take on the announcement from Wired. We’re excited about this because of the direct impact it will have on our work, specifically the workloads that require real-time or near real-time answers.

Apache Drill Video Overview


Website Security – Interesting 65Gbps DDoS Case Study

The web can be a scary place, with all sorts of website and internet security issues that may arise when you’re running a public-facing site. Issues like cross-site scripting, SQL injection, email harvesting, comment spam, and DDoS attacks occur on a daily basis.

There are many different ways to combat these problems, with varying levels of success, and there is a huge industry of web security software and tools out there. The industry is changing rapidly as cloud computing reshapes the underlying infrastructure. The one approach that does not work well (and never has) is the set-it-and-forget-it approach many people take when they create a new site. Since website security is an ongoing challenge, it’s best to use professional-grade services and stay up to date. Unfortunately, some of these services can be quite pricey.

We take website security and performance very seriously and offer a range of services in these areas. We use multiple services and techniques to protect all of the public (and private) sites we create. One of them is a security service called CloudFlare, which we describe below along with a case study they published over the weekend. As a quick overview: CloudFlare is a fast-growing, venture-capital-backed web security and performance start-up.

CloudFlare presently serves over 65 BILLION (yes, billion, not million) pageviews a month across the network of sites they support.

Here is some perspective on their size from a VentureBeat article: “We do more traffic than Amazon, Wikipedia, Twitter, Zynga, AOL, Apple, Bing, eBay, PayPal and Instagram combined,” chief executive Matthew Prince told VentureBeat. “We’re about half of a Facebook, and this month we’ll surpass Yahoo in terms of pageviews and unique visitors.”

They have a great list of features:

  • Managed Security-As-A-Service
  • Completely configurable web firewall
  • Collaborative security and threat identification – Hacker identification
  • Visitor reputation security checks
  • Block list, trust list
  • Advanced security – cross-site scripting, SQL injection, comment spam, excessive bot crawling, email harvesters, denial of service
  • 20+ data centers across the globe
  • First-level cache to reduce server load and bandwidth
  • Site-level optimizations to improve performance

Over the weekend they had some interesting events in their European data centers and wrote a couple of blog posts about it, linked here and summarized below:

What Constitutes a Big DDoS?

A 65Gbps DDoS is a big attack, easily in the top 5% of the biggest attacks we see. The graph below shows the volume of the attack hitting our EU data centers (the green line represents inbound traffic). When an attack is 65Gbps that means every second 65 Gigabits of data is sent to our network. That’s the equivalent data volume of watching 3,400 HD TV channels all at the same time. It’s a ton of data. Most network connections are measured in 100Mbps, 1Gbps or 10Gbps so attacks like this would quickly saturate even a large Internet connection.

 

To launch a 65Gbps attack, you’d need a botnet with at least 65,000 compromised machines each capable of sending 1Mbps of upstream data. Given that many of these compromised computers are in the developing world where connections are slower, and many of the machines that make up part of a botnet may not be online at any given time, the actual size of the botnet necessary to launch that attack would likely need to be at least 10x that size.

 

In terms of stopping these attacks, CloudFlare uses a number of techniques. It starts with our network architecture. We use Anycast which means the response from a resolver, while targeting one particular IP address, will hit whatever data center is closest. This inherently dilutes the impact of an attack, distributing its effects across all 23 of our data centers. Given the hundreds of gigs of capacity we have across our network, even a big attack rarely saturates a connection.

 

At each of our facilities we take additional steps to protect ourselves. We know, for example, that we haven’t sent any DNS inquiries out from our network. We can therefore safely filter the responses from DNS resolvers, dropping the response packets at our routers or, in some cases, even upstream at one of our bandwidth providers. The result is that these types of attacks are relatively easily mitigated.

 

What was fun to watch was that while the customer under attack was being targeted by 65Gbps of traffic, not a single packet from that attack made it to their network or affected their operations. In fact, CloudFlare stopped the entire attack without the customer even knowing there was a problem. From the network graph you can see after about 30 minutes the attacker gave up. We think that’s pretty cool and, as we continue to expand our network, we’ll get even more resilient to attacks like this one.

Link to original post: http://blog.cloudflare.com/65gbps-ddos-no-problem

The big takeaway for us is that we’re in a better spot by using CloudFlare. Very few security tools or services out there could absorb this sort of attack, mitigate it, and then write it up in such a short period of time.
