Sunday, June 15, 2008

Scaling the Beast (Time and Hardware)

These are interesting times. Computing power keeps increasing while its price keeps falling. Bandwidth continues to expand. As a result of all this horsepower, languages continue to proliferate. In the last few years I have spent a lot of time with Ruby and Python, two languages that are very powerful and yet, in relative terms, slow running. The great thing is that it no longer matters.

But this is old news. Now it feels like we are heading for an even bigger shift. The virtualization and commoditization of data centers is in full swing. Companies are offering complete hosting stacks as a full-blown service. Check out AWS, Mosso, 3Tera, and App Engine for just a taste of what is coming. These offerings include not only pay-as-you-go hosting and hardware services from a very small scale up (some even start at free!) but are also starting to offer powerful software services such as data (not just file) storage, message queueing, and content delivery.

Software components running on the grid, storing their data on the grid, being accessed from other components on the grid. All running on this virtualized platform. No OS patches, no external security concerns, brain-dead deployment. Point and click (or invoke your batch script) to send your component to the sky, spin up instances on demand, and revel in an ever more complete abstraction from almost all of the scaling, cost, and security issues that plague web developers today (Twitter, anyone?).

Because of this upcoming shift away from hardware instances and toward a component or service based computing model, many language concerns are slowly becoming obsolete. As virtual machines continue to gain popularity with developers, OSes and compile-time dependencies are slowly being replaced by platforms. These platforms behave almost the same regardless of the underlying system. They all have the ability to make and receive network calls to provide and consume services. As a result, more and more services are being accessed over the network using simple RESTful or other largely text-based APIs. This lifts the burden of writing, porting, and maintaining each service for every single language out there. Standard libraries can shrink, and services can grow.
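To make the idea concrete, here is a minimal sketch of a component that exposes a text-based service over HTTP, and a client that consumes it, using only Python's standard library. The `/greetings/<lang>` path, the `GREETINGS` data, and the function names are all made up for illustration; the point is only that any language able to make an HTTP call and parse JSON could play the client role.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory data; a real service would sit in front of a data store.
GREETINGS = {"en": "hello", "es": "hola"}


class GreetingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A path such as /greetings/es identifies one resource.
        lang = self.path.rstrip("/").split("/")[-1]
        if lang in GREETINGS:
            status, payload = 200, {"lang": lang, "greeting": GREETINGS[lang]}
        else:
            status, payload = 404, {"error": "unknown language"}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_service():
    # Port 0 asks the OS for any free port; the server runs in a background thread.
    server = HTTPServer(("127.0.0.1", 0), GreetingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_greeting(host, port, lang):
    # The client knows only the URL layout and the JSON format,
    # not the language the service is written in.
    conn = HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", f"/greetings/{lang}")
        response = conn.getresponse()
        return response.status, json.loads(response.read().decode("utf-8"))
    finally:
        conn.close()


if __name__ == "__main__":
    server = start_service()
    host, port = server.server_address[:2]
    print(fetch_greeting(host, port, "es"))
    server.shutdown()
    server.server_close()
```

The client and the service share nothing but a URL and a text format, which is what lets either side be rewritten in another language without touching the other.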

CouchDB, Solr, SimpleDB, and Google's Database API are all fine examples of this from a data perspective. There is the Stomp messaging protocol for talking to JMS services, and XMPP for instant messaging, presence, and a host of other interesting applications. Then there are IMAP, POP3, SMTP, and many other internet protocols that have been around for eons. All these protocols allow components to expose and consume services without having to worry about many of the complexities that have been associated with component based architectures. Not only does this let you write a single API that can be consumed by any language, it also skirts the memory management issues of making calls to libraries in process. Issues like who cleans up the stack? Who allocates and frees which memory? All thorny issues that for the most part simply get in the way and soak up brain cycles. As a secondary effect, it sets up each component to scale out. Many bad habits that developers can fall into just aren't available in a loosely coupled component architecture. There are firmer parameters that, when followed, allow for near infinite scaling (disregarding cost issues for the moment). These parameters encourage good programming practices (horizontal scaling, loose coupling, separately versioned components).
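As an illustration of how little a text-based protocol asks of a language, here is a rough sketch of building and parsing a Stomp frame in Python. A frame is a command line, `key:value` header lines, a blank line, a body, and a terminating NUL byte. The helper names and the `/queue/orders` destination are invented for the example, and the sketch deliberately ignores header escaping and the `content-length` header, so treat it as a simplification rather than a conforming client.

```python
def encode_frame(command, headers, body=""):
    # Command line, then one "key:value" line per header,
    # then a blank line, the body, and a NUL terminator.
    lines = [command] + [f"{key}:{value}" for key, value in headers.items()]
    return ("\n".join(lines) + "\n\n" + body + "\x00").encode("utf-8")


def decode_frame(data):
    text = data.decode("utf-8")
    if not text.endswith("\x00"):
        raise ValueError("frame is missing its NUL terminator")
    # Everything before the first blank line is the command plus headers.
    head, _, body = text[:-1].partition("\n\n")
    command, *header_lines = head.split("\n")
    headers = {}
    for line in header_lines:
        key, _, value = line.partition(":")
        # If a header repeats, keep the first occurrence.
        headers.setdefault(key, value)
    return command, headers, body


if __name__ == "__main__":
    frame = encode_frame("SEND", {"destination": "/queue/orders"}, "order 42")
    print(frame)
    print(decode_frame(frame))
```

Nothing here allocates memory on behalf of another library or shares a call stack with it; the only contract between the two sides is the bytes on the wire.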

As more and more functionality moves out of in-process libraries and into network components, developers will be free to choose a language not for its libraries but for the syntax of the language itself. This should be good for everyone. There is no one-size-fits-all language, and developers all have different personalities. The ability to choose a platform based on components rather than virtual machine, language, or OS is something I am personally really looking forward to.