Archives for September 2012

Hadoop – Latency = Google Dremel = Apache Drill???

Hadoop is one of the IT buzzwords of the day, and for good reason – it lets an organization pull meaningful, actionable analysis out of “big data” that was previously unusable simply because of its size. The technology certainly solves a lot of problems, but…

What happens if your problem doesn’t easily fit into the Hadoop framework?

Most of the work that we do in the financial sector falls into this category. It just doesn’t make sense to re-write existing code to fit into the Hadoop paradigm. Example case study here and blog post here.

As in any business, new ideas lose their edge while they sit on the shelf or stall in execution – primarily because of opportunity costs and the growing chance that a competitor builds a product around the same idea. The faster a concept can be brought to market, the larger the advantage for its creator. This is especially true in the financial trading tech sector, where advancements are measured in minutes, hours and days rather than weeks and months. Because of this, we’re always looking for new and creative ways to solve data and “big data” problems faster.

Enter Apache Drill

One of the more interesting articles we came across recently focused on a new Apache project that aims to reduce the time to get answers out of a large data set. The project is named Apache Drill and here is a quick overview slide deck.

The Apache Drill project aims to create a tool similar to Google’s Dremel to facilitate faster queries across large datasets. Here is another take on the announcement from Wired. We’re excited about this because of the direct impact it will have on our work, specifically the workloads that require real-time or near real-time answers.
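To make the “faster queries” goal a bit more concrete, here is a minimal sketch of the kind of interactive, SQL-style question Drill (like Dremel) is meant to answer. We use Python’s built-in sqlite3 purely as a local stand-in for the query shape – the table name, columns and numbers are invented for this example – whereas Drill itself would run something similar in parallel across a huge distributed dataset rather than a single local database.

```python
import sqlite3

# Purely illustrative: sqlite3 stands in for the query shape only.
# The "trades" table and its values are made up for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, qty INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [("AAPL", 100, 665.15), ("AAPL", 50, 664.80), ("GOOG", 10, 709.68)],
)

# The point of Drill/Dremel: an analyst gets this kind of answer interactively,
# in seconds, instead of waiting on a batch MapReduce job.
query = "SELECT symbol, SUM(qty * price) AS notional FROM trades GROUP BY symbol"
for symbol, notional in conn.execute(query):
    print(symbol, round(notional, 2))
```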

Apache Drill Video Overview


Website Security – Interesting 65Gbps DDoS Case Study

The web can be a scary place, with all sorts of website and internet security issues that may arise when you’re running a public-facing site. Issues like cross-site scripting, SQL injection, email harvesting, comment spam and DDoS attacks occur on a daily basis.

There are many different ways to combat these problems, with varying levels of success. There is a huge industry of web security software and tools out there, and it is changing rapidly as infrastructure moves to cloud computing. The one approach that does not work well (and never has) is the set-it-and-forget-it approach many people take when they create a new site. Since website security is an ongoing challenge, it’s best to use professional-level services and stay up to date with everything. Unfortunately, some of these services can be quite pricey.

We take website security and performance very seriously and offer a range of services in these areas. We use multiple services and techniques to protect all of the public (and private) sites we create. One of those methods is a security service called CloudFlare, which we describe below along with a case study they published over the weekend. Here is a quick overview of the service: CloudFlare is a fast-growing, venture-capital-backed web security and performance start-up.

CloudFlare presently serves over 65 BILLION (yes Billion, not Million) pageviews a month across the network of sites they support.

Here is some perspective on their size from a VentureBeat article: “We do more traffic than Amazon, Wikipedia, Twitter, Zynga, AOL, Apple, Bing, eBay, PayPal and Instagram combined,” chief executive Matthew Prince told VentureBeat. “We’re about half of a Facebook, and this month we’ll surpass Yahoo in terms of pageviews and unique visitors.”

They have a great list of features:

  • Managed Security-As-A-Service
  • Completely configurable web firewall
  • Collaborative security and threat identification – Hacker identification
  • Visitor reputation security checks
  • Block list, trust list
  • Advanced security – cross site scripting, SQL injection, comment spam, excessive bot crawling, email harvesters, denial of service
  • 20+ data centers across the globe
  • First-level cache to reduce server load and bandwidth
  • Site-level optimizations to improve performance

Over the weekend they had some interesting events happen in their European data centers and wrote a couple of blog posts about it, linked here and summarized below:

What Constitutes a Big DDoS?

A 65Gbps DDoS is a big attack, easily in the top 5% of the biggest attacks we see. The graph below shows the volume of the attack hitting our EU data centers (the green line represents inbound traffic). When an attack is 65Gbps, that means 65 gigabits of data are sent to our network every second. That’s the equivalent data volume of watching 3,400 HD TV channels all at the same time. It’s a ton of data. Most network connections are 100Mbps, 1Gbps or 10Gbps, so an attack like this would quickly saturate even a large Internet connection.

[Graph from the original CloudFlare post: inbound attack traffic at the EU data centers]
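As a quick sanity check on those numbers (the per-channel bitrate below is our own assumption – the post only gives the 3,400-channel figure):

```python
# Back-of-the-envelope check of the figures above.
attack_gbps = 65
hd_stream_mbps = 19                                  # assumed bitrate of one HD TV stream
print(round(attack_gbps * 1000 / hd_stream_mbps))    # ~3421 simultaneous HD channels

# At this rate, typical connections are overwhelmed almost instantly.
for link_gbps in (0.1, 1, 10):
    print(f"{link_gbps} Gbps link is oversubscribed {attack_gbps / link_gbps:.1f}x")
```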

To launch a 65Gbps attack, you’d need a botnet with at least 65,000 compromised machines, each capable of sending 1Mbps of upstream data. Given that many of these compromised computers are in the developing world where connections are slower, and that many of the machines in a botnet may not be online at any given time, the botnet needed to launch that attack would likely be at least 10x that size.
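The botnet sizing in that paragraph reduces to simple arithmetic; here is the same estimate spelled out:

```python
# The post's own botnet estimate, restated.
attack_mbps = 65 * 1000            # 65 Gbps expressed in Mbps
upstream_per_bot_mbps = 1          # each compromised machine pushes ~1 Mbps
minimum_bots = attack_mbps // upstream_per_bot_mbps
print(minimum_bots)                # 65000, if every bot is online at full speed

# Slower connections and offline machines push the likely size ~10x higher.
print(minimum_bots * 10)           # 650000
```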


In terms of stopping these attacks, CloudFlare uses a number of techniques. It starts with our network architecture. We use Anycast, which means a response from a resolver, while targeting one particular IP address, will hit whichever of our data centers is closest. This inherently dilutes the impact of an attack, distributing its effects across all 23 of our data centers. Given the hundreds of gigabits of capacity we have across our network, even a big attack rarely saturates a connection.
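Assuming the flood were spread evenly across those 23 facilities (a simplification on our part – real traffic is never perfectly even), the per-site load becomes far more manageable:

```python
# Rough effect of Anycast dilution on a 65 Gbps flood.
attack_gbps = 65
data_centers = 23
print(f"~{attack_gbps / data_centers:.1f} Gbps per data center")   # ~2.8 Gbps
```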


At each of our facilities we take additional steps to protect ourselves. We know, for example, that we haven’t sent any DNS queries out from our network, so we can safely filter out responses from DNS resolvers, dropping the response packets at our routers or, in some cases, even upstream at one of our bandwidth providers. The result is that these types of attacks are relatively easy to mitigate.
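To illustrate the idea (this is our own sketch, not CloudFlare’s filtering code): a network that never originates DNS queries can treat any inbound DNS response as unsolicited and drop it statelessly.

```python
# Illustrative only. Reflection floods arrive as DNS responses (UDP sourced from
# port 53); a network that never sends DNS queries can drop them all on sight.
def should_drop(protocol: str, src_port: int) -> bool:
    looks_like_dns_response = protocol == "udp" and src_port == 53
    we_ever_sent_dns_queries = False    # the premise stated in the post
    return looks_like_dns_response and not we_ever_sent_dns_queries

print(should_drop("udp", 53))    # True  – reflected response, dropped at the edge
print(should_drop("tcp", 443))   # False – ordinary web traffic passes
```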


What was fun to watch was that while the customer under attack was being targeted by 65Gbps of traffic, not a single packet from that attack made it to their network or affected their operations. In fact, CloudFlare stopped the entire attack without the customer even knowing there was a problem. From the network graph you can see after about 30 minutes the attacker gave up. We think that’s pretty cool and, as we continue to expand our network, we’ll get even more resilient to attacks like this one.

Link to original post: http://blog.cloudflare.com/65gbps-ddos-no-problem

The big takeaway for us is that we’re in a better spot by using CloudFlare. There are very few security software tools or services out there that would be able to handle this sort of attack, mitigate it and then describe it in such a short period of time.


Techonomy Detroit on 9/12/2012
