2012 Second Presidential Debate WordCloud

We like to do some ad-hoc text analysis from time to time to break things up a bit and work with new tools and software. We’ve done similar work with Twitter #hashtag text analysis in our post Michigan Lean Startup Conf. Twitter Visualizations.

In the spirit of the upcoming election and debates, I thought it would be interesting to put out something summarizing the words used by both candidates in the 2012 Second Presidential Debate on October 16, 2012. We grabbed the transcript text from here. We’re not diving into anything overly complex, but it does put last night’s debate in a different context that we found interesting.

The way the graphic turned out is interesting; the dominant words are president, governor, jobs, that’s, and people.

Link to the WordCloud: http://solidlogic.com/wp-content/uploads/2012/10/wordcloud_debate_transcript.png

2012 Second Presidential Debate Word Cloud


How to build a word cloud

The easiest way to build a word cloud is to use one of the great free online tools like Wordle to build the graphic. If you need a more customized approach or need to generate something like this in software, several tools make it a lot easier. More details on the methods and code behind this are coming later, but it’s based on Python and R, both of which we use quite a bit for data analysis and development projects. The code for this was created by me and our CIO, Michael Bommarito, and builds on some of the work he’s previously made available here: Wordcloud of the Arizona et al. v. United States opinion and Archiving Tweets with Python.
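In the meantime, here is a minimal sketch of the general idea in Python. It is not the code behind the image above; it assumes the third-party wordcloud package and a transcript saved locally as debate_transcript.txt (a hypothetical filename):

```python
# Minimal sketch, not our production code: build a word cloud from a transcript
# using the third-party "wordcloud" package (pip install wordcloud).
from wordcloud import WordCloud, STOPWORDS

# Hypothetical local copy of the debate transcript.
with open("debate_transcript.txt", encoding="utf-8") as f:
    text = f.read()

# Drop common English words plus a few debate fillers so substantive terms
# (president, governor, jobs, ...) dominate the image.
stopwords = STOPWORDS | {"will", "going", "said", "applause"}

cloud = WordCloud(width=1200, height=800,
                  stopwords=stopwords,
                  background_color="white").generate(text)
cloud.to_file("wordcloud_debate_transcript.png")
```

The R wordcloud package offers a similar one-call workflow if you would rather stay in R.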


To get customized analysis like this, or to ask us anything else, please contact us.

 

Event: AWS Michigan Meetup (Presenting) – 10/09/12

Legal Informatics w/ CloudSearch & High-Performance Financial Market Apps

Solid Logic’s CEO, Eric Detterman, and CIO, Mike Bommarito, will be presenting at the AWS Michigan Meetup (http://www.awsmichigan.org/events/85530922/) at Tech Brewery (map) in Ann Arbor, MI. I’ll be presenting on how we use Amazon Web Services (AWS) in the quantitative financial trading space, with a case study and more.

Mike will be presenting on Legal Informatics using AWS CloudSearch. He will also be demonstrating an early prototype of a private enterprise information search and e-discovery application we’re creating. Mike also has a copy of his presentation available here.

Event Date: Tuesday, October 9th, 2012 @ 6:30pm

Event: AWS Michigan Meetup

More info: http://www.awsmichigan.org/events/85530922/

Below is a copy of my presentation so you can view it at your convenience.


Hadoop – Latency = Google Dremel = Apache Drill???

Hadoop is one of the IT buzzwords of the day, and for good reason: it allows an organization to get meaning and actionable analysis out of “big data” that was previously unusable because of its sheer size. This technology certainly solves a lot of problems, but…

What happens if your problem doesn’t easily fit into the Hadoop framework?

Most of the work we do in the financial sector falls into this category; it just doesn’t make sense to rewrite existing code to fit the Hadoop paradigm. Example case study here and blog post here.

As in any business, new ideas lose their ‘edge’ as they sit on the shelf or stall in execution, both because of opportunity costs and because a competitor has more time to build a product around the same idea. The faster a concept can be brought to market, the larger the advantage for its creator. This is especially true in the financial trading tech sector, where advancements are measured in minutes, hours, or days rather than weeks or months. Because of this, we’re always looking for new and creative ways to solve data and “big data” problems more quickly.

Enter Apache Drill

One of the more interesting articles we came across recently focused on a new Apache project that aims to reduce the time to get answers out of a large data set. The project is named Apache Drill and here is a quick overview slide deck.

The Apache Drill project aims to create a tool similar to Google’s Dremel to facilitate faster queries across large datasets. Here is another take on the announcement from Wired. We’re excited about this because of the direct impact it will have on our work, specifically the workloads that require real-time or near real-time answers.
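Drill had not shipped at the time of writing, so there is no client API to show yet. Purely as an illustration, the snippet below sketches the kind of ad-hoc, SQL-style aggregation over a large event log that a Dremel-class engine is meant to answer interactively, without first writing a MapReduce job. The table and column names are hypothetical, and nothing here actually calls Drill:

```python
# Illustration only: the style of interactive query a Dremel/Drill-class engine
# targets. Table and column names are hypothetical placeholders.
example_query = """
SELECT symbol,
       COUNT(*)   AS trade_count,
       AVG(price) AS avg_price
FROM   trade_log
WHERE  trade_date = DATE '2012-08-01'
GROUP  BY symbol
ORDER  BY trade_count DESC
LIMIT  10
"""
print(example_query)
```

The appeal is that a query like this would run in seconds against raw files or columnar storage, rather than requiring a batch job to be written, scheduled, and waited on.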

Apache Drill Video Overview


Website Security – Interesting 65Gbps DDoS Case Study

The web can be a scary place, with all sorts of website and internet security issues that may arise when you’re running a public-facing site. Issues like cross-site scripting, SQL injection, email harvesting, comment spam, DDoS attacks, and others occur on a daily basis.

There are many different ways to combat these problems, with varying levels of success, and there is a huge industry of web security software and tools out there. The industry is changing rapidly as cloud computing changes the underlying infrastructure. The one approach that does not work well (and never has) is the set-it-and-forget-it approach that many people take when they create a new site. Since website security is an on-going challenge, it’s best to use professional-level services and stay up to date with everything. Unfortunately, some of these services can be quite pricey.

We take website security and performance very seriously and offer a range of services in these areas. We use multiple services and techniques to protect all of the public (and private) sites we create. One of those is a security service called CloudFlare, which we describe below along with a case study they published over the weekend. As a quick overview: CloudFlare is a quickly growing, venture-capital-backed web security and performance start-up.

CloudFlare presently serves over 65 BILLION (yes, billion, not million) pageviews a month across the network of sites they support.

Here is some perspective on their size from a VentureBeat article: “We do more traffic than Amazon, Wikipedia, Twitter, Zynga, AOL, Apple, Bing, eBay, PayPal and Instagram combined,” chief executive Matthew Prince told VentureBeat. “We’re about half of a Facebook, and this month we’ll surpass Yahoo in terms of pageviews and unique visitors.”

They have a great list of features:

  • Managed Security-As-A-Service
  • Completely configurable web firewall
  • Collaborative security and threat identification – Hacker identification
  • Visitor reputation security checks
  • Block list, trust list
  • Advanced security – cross site scripting, SQL injection, comment spam, excessive bot crawling, email harvesters, denial of service
  • 20+ data centers across the globe
  • First-level cache to reduce server load and bandwidth
  • Site-level optimizations to improve performance

Over the weekend they had some interesting events happen in their European data centers and wrote a couple of blog posts about it, linked here and summarized below:

What Constitutes a Big DDoS?

A 65Gbps DDoS is a big attack, easily in the top 5% of the biggest attacks we see. The graph below shows the volume of the attack hitting our EU data centers (the green line represents inbound traffic). When an attack is 65Gbps that means every second 65 Gigabits of data is sent to our network. That’s the equivalent data volume of watching 3,400 HD TV channels all at the same time. It’s a ton of data. Most network connections are measured in 100Mbps, 1Gbps or 10Gbps so attacks like this would quickly saturate even a large Internet connection.

 

To launch a 65Gbps attack, you’d need a botnet with at least 65,000 compromised machines each capable of sending 1Mbps of upstream data. Given that many of these compromised computers are in the developing world where connections are slower, and many of the machines that make up part of a botnet may not be online at any given time, the actual size of the botnet necessary to launch that attack would likely need to be at least 10x that size.

 

In terms of stopping these attacks, CloudFlare uses a number of techniques. It starts with our network architecture. We use Anycast which means the response from a resolver, while targeting one particular IP address, will hit whatever data center is closest. This inherently dilutes the impact of an attack, distributing its effects across all 23 of our data centers. Given the hundreds of gigs of capacity we have across our network, even a big attack rarely saturates a connection.

 

At each of our facilities we take additional steps to protect ourselves. We know, for example, that we haven’t sent any DNS inquiries out from our network, so we can safely filter the responses from DNS resolvers and drop the response packets at our routers or, in some cases, even upstream at one of our bandwidth providers. The result is that these types of attacks are relatively easily mitigated.

 

What was fun to watch was that while the customer under attack was being targeted by 65Gbps of traffic, not a single packet from that attack made it to their network or affected their operations. In fact, CloudFlare stopped the entire attack without the customer even knowing there was a problem. From the network graph you can see after about 30 minutes the attacker gave up. We think that’s pretty cool and, as we continue to expand our network, we’ll get even more resilient to attacks like this one.

Link to original post: http://blog.cloudflare.com/65gbps-ddos-no-problem
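As a quick sanity check on the figures in the quoted post, the arithmetic below is ours, not CloudFlare’s:

```python
# Back-of-the-envelope check of the quoted figures (our arithmetic, not CloudFlare's).
attack_gbps = 65
hd_channels = 3400
per_channel_mbps = attack_gbps * 1000 / hd_channels  # ~19 Mbps, a plausible HD stream rate

bot_upstream_mbps = 1                                 # assumed upstream per compromised machine
min_bots = attack_gbps * 1000 / bot_upstream_mbps     # 65,000 machines running flat out
likely_bots = min_bots * 10                           # the post's 10x allowance for slow/offline bots

print(f"Implied bitrate per HD channel: {per_channel_mbps:.1f} Mbps")
print(f"Minimum botnet size: {min_bots:,.0f} machines; likely size: {likely_bots:,.0f}")
```

The numbers hang together: roughly 19 Mbps per HD channel and a botnet on the order of hundreds of thousands of machines.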

The big takeaway for us is that we’re in a better spot by using CloudFlare. There are very few security software tools or services out there that would be able to handle this sort of attack, mitigate it and then describe it in such a short period of time.


Software Development Life Cycle (SDLC) Case Study – Result = $440M Loss


 

Software Development Life Cycle (SDLC) Importance

Solid Logic Technology’s founders have experience across the financial industry, specifically in the development of quantitative trading and investment systems. Many of the things we’ve learned along the way shape the way we develop software for clients in other industries. Most notably, we’ve learned that software quality is extremely important and that ‘software bugs’ cost a lot of money. The case study below shows how important in-depth software development, testing, and launch management is for a company.

On August 1st, 2012, reports came out that Knight Capital Group, a prominent electronic market-making firm specializing in NYSE equities, had lost an estimated $440 million due to a ‘software bug’. The news spread across financial news outlets like Bloomberg, the NY Times, CNBC, and The Wall Street Journal. Knight and other similar firms trade US equities electronically using sophisticated computer algorithms with little to no human involvement in the process. While we will probably never hear the full story behind the ‘software bug’, it is suspected that a software coding error that was not quickly identified caused the loss. The loss is approximately four times Knight’s 2011 net income of $115 million. It appears to have all but decimated the firm, and at this point it looks like Knight will be bought or end up in bankruptcy.

While unfortunate, this example has implications for any software project.

So what can we learn and take away from this incident?

  1. Software is not perfect, especially right after it is released
  2. A more comprehensive Software Development Life Cycle (SDLC) process and launch plan probably would have reduced the loss to a more reasonable amount.
  3. Always have a contingency plan for a new launch
  4. If new software is ‘acting funny’ then it probably has a problem and needs to be pulled from production and fixed
  5. When possible, conduct a series of small ‘pilots’ or ‘beta’ tests along the way, in a lower-impact setting
  6. If you cannot fully test the changes, then implement them slowly to minimize the potential errors in the beginning
  7. Have a ‘kill switch’ and know how to use it (see the sketch after this list)
  8. Have a formal SDLC process and follow it for all revisions
  9. Use source control for all software changes
  10. Have a defined launch process
  11. Have a way to quickly revert the changes implemented back to the previous version.
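To make item 7 concrete, here is a minimal, hypothetical sketch of a file-based kill switch: an order-submitting loop checks an external flag before every action, so operations staff can halt new activity immediately without redeploying code. The paths and function names are illustrative only, not anyone’s actual implementation.

```python
# Hypothetical sketch of a file-based "kill switch" for an automated order flow.
import os
import time

KILL_SWITCH_FILE = "/etc/trading/HALT"  # ops staff create this file to halt trading

def trading_enabled() -> bool:
    """New orders are allowed only while the halt file is absent."""
    return not os.path.exists(KILL_SWITCH_FILE)

def submit_order(order) -> None:
    print(f"submitting {order}")  # stand-in for the real order gateway

def run_trading_loop(orders) -> None:
    for order in orders:
        if not trading_enabled():
            print("Kill switch engaged; no further orders will be sent.")
            break
        submit_order(order)
        time.sleep(0.01)  # pacing placeholder

if __name__ == "__main__":
    run_trading_loop(f"order-{i}" for i in range(5))
```

The same idea extends to item 11: keeping the flag (and the previous release) one step away from production makes reverting a routine action rather than an emergency project.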

These are basic best practices that all software development firms should follow in order to consistently develop high-quality software. It’s unfortunate that there is a case study like this, but these types of incidents are more common (though rarely at this scale) than most people imagine. I’m sure the group at Knight completed many of the above items, but something got away from them.

We put a huge amount of thought and effort into the process of software development and the consistent high level of quality that a solid process brings. We’re currently working on publishing a set of Software Development best practices – please contact us for a pre-release version.

More posts by Eric

