Brad Feld, the founding investor in NewsGator, Technorati and FeedBurner circa 2004/5, has similarly incubated and funded Gnip, which he calls the plumbing of feeds, blogs, and social media. It’s really fascinating, and frankly much needed: RSS/Atom, APIs and the like are great, but what if there were a one-stop shop for all the data you wanted to automagically do stuff with, so you could focus on the magic, not the plumbing?
Anyway, Brad linked to a Gnip post he encouraged his entrepreneur to write (that’s a positive feedback loop if ever I’ve heard of one: a VC encouraging an entrepreneur to blog about what others might paranoidly consider proprietary, unique and closed). Via the Feld.com blog: “One of our investments in our Glue theme is Gnip. I had a meeting yesterday with Jud Valeski (the CTO / co-founder) and a few other folks, including the two founders of a new seed investment (in the Glue theme) we are planning on closing in early January. As part of our discussion, I encouraged Jud to toss a blog post up talking about the stack Gnip is using, the architecture, and the volume of data that is currently flowing through Gnip on a daily basis.”
The Gnip technology stack they disclose :
- nginx - HTTP server, load balancing
- JRE 1.6 - Core logic, REST Interface
- TerraCotta - shared memory for clustering/redundancy
- ejabberd - inbound XMPP server
- Ruby - data importing, cluster management
- Python - data importing
Some of the numbers :
- 99.9%: the Gnip service has 99.9% up-time.
- 2.5 million unique activities are HTTP POSTed (pushed) into Gnip’s Publisher front door each day.
- 2.8 million activities are HTTP POSTed (pushed) out Gnip’s Consumer back door each day.
- 2.4 million activities are HTTP GETed (polled) from Gnip’s Consumer back door each day.
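To put those daily figures in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes traffic is spread evenly over the day (real traffic almost certainly is not) and that the 99.9% figure is measured over a full year; the function names are mine, not Gnip's.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily activity counts quoted in the list above.
DAILY_ACTIVITIES = {
    "pushed in (HTTP POST)": 2_500_000,
    "pushed out (HTTP POST)": 2_800_000,
    "polled out (HTTP GET)": 2_400_000,
}


def per_second(daily_count):
    """Average rate per second, assuming an even spread over the day."""
    return daily_count / SECONDS_PER_DAY


def allowed_downtime_minutes(uptime_percent, period_days):
    """Minutes of downtime permitted by an uptime percentage over a period."""
    return (1 - uptime_percent / 100) * period_days * 24 * 60


if __name__ == "__main__":
    for name, count in DAILY_ACTIVITIES.items():
        print(f"{name}: about {per_second(count):.1f} per second")
    minutes = allowed_downtime_minutes(99.9, 365)
    print(f"99.9% up-time allows about {minutes / 60:.2f} hours down per year")
```

On average that works out to roughly 29 activities per second coming in and about 60 per second going out (push plus poll combined), with a downtime budget of a little under nine hours a year.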
Their technical framework and approach are fascinating, as described in their Numbers + Architecture post: “We optimized for activity retrieval (outbound) as opposed to delivery into Gnip (inbound). That means every outbound POST/GET, is moving static data off of disk; no math gets done. Every inbound activity results in processing to ensure proper Filtration and distribution; we do the “hard” work on delivery. We view our core system as handling ephemeral data. This has allowed us, thus far, to avoid having a database in the environment. That means we don’t have to deal with traditional database bottlenecks. To be sure, we have other challenges as a result, but we decided to take on those as opposed to have the “database maintenance and administration” ball and chain perpetually attached. So, in order to share contentious state across multiple VMs, across multiple machine instances, we use shared memory in the form of TerraCotta. I’d say TerraCotta is “easy” for “simple” apps, but challenges emerge when you start dealing with very large data sets in memory (multiple giga-bytes).”
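The "do the hard work on the way in, serve static data on the way out" idea can be illustrated with a toy sketch. This is not Gnip's code (their core is Java on the JRE with Terracotta for shared state); it is a minimal single-process Python illustration, and the `ActivityRouter` class, its method names, the per-consumer `.jsonl` files and the lambda filters are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


class ActivityRouter:
    """Toy router: filter and distribute on write, serve static files on read."""

    def __init__(self, root):
        self.root = Path(root)
        self.filters = {}  # consumer name -> predicate(activity) -> bool

    def _bucket(self, consumer):
        # One append-only file per consumer (a hypothetical layout).
        return self.root / f"{consumer}.jsonl"

    def subscribe(self, consumer, predicate):
        self.filters[consumer] = predicate
        self._bucket(consumer).touch()

    def publish(self, activity):
        """Inbound path: all filtering and distribution happens here."""
        line = json.dumps(activity, sort_keys=True) + "\n"
        delivered = []
        for consumer, predicate in self.filters.items():
            if predicate(activity):
                with self._bucket(consumer).open("a", encoding="utf-8") as f:
                    f.write(line)
                delivered.append(consumer)
        return delivered

    def poll(self, consumer):
        """Outbound path: hand back the pre-built file as-is, no computation."""
        return self._bucket(consumer).read_text(encoding="utf-8")


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        router = ActivityRouter(tmp)
        router.subscribe("alice", lambda a: a["actor"] == "jud")
        router.subscribe("bob", lambda a: a["action"] == "upload")
        router.publish({"actor": "jud", "action": "post"})
        router.publish({"actor": "eric", "action": "upload"})
        return {name: router.poll(name) for name in ("alice", "bob")}


if __name__ == "__main__":
    print(demo())
```

The cost of each publish grows with the number of subscribed filters, while each poll is just a file read. That is the trade the quote describes, and there is no database anywhere in the path.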
Other Related Reading :
- Gnip: YouTube, Flickr, Analytics Publisher Platform
- VentureBeat on FriendFeed’s “SUP” format
- ReadWriteWeb: Browzmi’s Firefox browser-based XMPP implementation to plug in your friends.