The Summit Blog

The MakaluMedia Weblog

Towards the Semantic Web

Fri, 2008/03/07 - 11:58

Imagine a world where much of the data that’s flowing around the net takes on meaning, thereby becoming knowledge. Imagine a world where a system in Germany can infer that I’m the uncle of Wade, based on data from a system in the US indicating that my brother Page is the father of Wade. Imagine a world where an online product review from Steve is displayed for me ahead of the other 500 reviews, because of inferred trust derived from the knowledge that both Steve and I happen to share a common friend, you.

Although we’re still a long way off, recent events — in which MakaluMedia staff have played an important part — have brought us a few steps closer to such a world.

MakaluMedia hacker and researcher Arto Bendiken has long been interested in distributed systems and “information about information”, which naturally led him to the Resource Description Framework (RDF). RDF, in short, is a technology that allows data to be represented as “knowledge”. If two independent systems store their data in RDF and share common semantic “vocabularies”, then the two systems can effectively share their “knowledge”. What does RDF look like? Here is a simple example, taken from this Quick Introduction to RDF.

    @prefix : <http://www.example.org/> .
    :john a :Person .
    :john :hasMother :susan .
    :john :hasFather :richard .
    :richard :hasBrother :luke .
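To make the idea of inference concrete, here is a minimal sketch in plain Python (not a real RDF toolkit) that treats the statements from the example as subject/predicate/object tuples and derives a new fact from them. The `hasUncle` predicate and the rule itself are assumptions made for illustration; they are not part of the quoted example or of any standard vocabulary.

```python
# Triples mirror the RDF example; prefixes are dropped for brevity.
TRIPLES = {
    ("john", "a", "Person"),
    ("john", "hasMother", "susan"),
    ("john", "hasFather", "richard"),
    ("richard", "hasBrother", "luke"),
}

PARENT_PREDICATES = {"hasMother", "hasFather"}


def infer_uncles(triples):
    """Derive (child, "hasUncle", uncle) triples.

    Rule (hypothetical, for illustration only):
    if X has parent P and P hasBrother B, then X hasUncle B.
    """
    inferred = set()
    for child, predicate, parent in triples:
        if predicate not in PARENT_PREDICATES:
            continue
        for subject, other_predicate, brother in triples:
            if subject == parent and other_predicate == "hasBrother":
                inferred.add((child, "hasUncle", brother))
    return inferred


if __name__ == "__main__":
    for triple in sorted(infer_uncles(TRIPLES)):
        print(triple)
```

The point is that nothing in the original data says who John's uncle is; that fact only appears once a rule is applied to statements that could just as well have come from two different systems.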

As it happens, our Drupal team, led by Arto, has been working for a few years with our colleagues at M.C. Dean and Raincity Studios on the development of a sophisticated collaboration and communication platform for the US government, based on Drupal. (It is certainly one of the largest and most complex Drupal instances in the world.) This platform presently supports more than 60 international clients, servicing use cases ranging from policy definition collaboration, to natural crisis management, to school operations on the African continent.

This project represents a natural fit for RDF technology, given the value realized in sharing “knowledge”, not just “data”, between the various instances of the platform, as well as the growing number of other RDF-enabled systems around the world. Towards this end, the project team has been working intensively during the past months to design, develop and begin to integrate an RDF storage, management and access framework into Drupal. And since a primary objective of this project is to release the developed products as freely available open-source software, much of this RDF work can be tracked and accessed from the Drupal RDF project page.

Given that Drupal forms the core technology of the platform, the project team naturally maintains a close relationship with its founder and leader, Dries Buytaert. In guiding the evolution of the Drupal platform, Dries has always demonstrated a willingness to take bold steps in the direction of progress, and this was evidenced once again this week. In his keynote speech at the Drupalcon Boston 2008 conference, Dries made the big announcement that future versions of Drupal (beginning with version seven) will be based on RDF.

Drupal presently dominates the market of open-source content management systems, and so this announcement represents a huge step towards building a truly “Semantic Web”. If interested, you can read various reactions from the blogosphere at Network World and SitePoint.

We are tremendously proud to have been a part of this progress, and look forward to continuing work towards a world of networked knowledge.

Catalog Choice registers half a million users!

Mon, 2008/01/28 - 18:47

Catalog Choice on the Today Show.

On January 24, Catalog Choice saw its biggest day yet, when it was covered in a fantastic piece on NBC’s “The Today Show”:

Over the course of the day, the catalogchoice.org website saw over two million page views, and registered 60,000 new user accounts, bringing the total number of registered users, three days later, to over 500,000!

In addition, “Catalog Choice” was the number one search term for the day on Google:


Coping with the traffic.

Coping with a sudden increase in traffic, orders of magnitude above typical levels, was a challenge. The front-end web application servers quickly became overloaded, and later the back-end DB server did too (we were servicing over 2,000 DB queries per second!). Since it’s still not possible (with our hosting providers, at least) to bring on additional servers on demand, we quickly made several modifications to the application:

  1. We made a number of layout modifications in the application that would allow us to cache content to a far greater extent.

  2. These same modifications also targeted the reduction of DB queries.

With these modifications, we were able to cope well with the secondary traffic surge.
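As a rough illustration of why caching reduces DB load, here is a minimal Python sketch of a time-based cache placed in front of an expensive lookup. It is not the Rails code used on the site; `TTLCache`, `FakeDatabase` and `load_catalog_page` are hypothetical names invented for this example, and the 60-second lifetime is an arbitrary assumption.

```python
import time


class TTLCache:
    """Keep the result of an expensive lookup for a fixed number of seconds."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader        # function called on a cache miss
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so tests can control time
        self._entries = {}           # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh: no DB query
        value = self._loader(key)    # miss or expired: hit the DB once
        self._entries[key] = (now + self._ttl, value)
        return value


class FakeDatabase:
    """Stand-in for the real DB server; it only counts queries."""

    def __init__(self):
        self.query_count = 0

    def load_catalog_page(self, catalog_id):
        self.query_count += 1
        return f"<h1>Catalog {catalog_id}</h1>"


if __name__ == "__main__":
    db = FakeDatabase()
    cache = TTLCache(db.load_catalog_page, ttl_seconds=60)
    for _ in range(1000):
        cache.get(42)
    print("DB queries issued:", db.query_count)
```

The fewer per-user details a page fragment contains, the more requests can share one cached copy, which is presumably why layout changes were needed before caching could be applied more broadly.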

Lessons learned.

It’s quite possible that Catalog Choice is now one of the largest Ruby on Rails applications running on the internet, in terms of number of users. Over the past few months of operation, we’ve learned some lessons:

  1. Although not related to Rails, we’ve learned that it’s a good idea, especially for a site with this broad a user base, to be conservative in the use of client-side technology. When originally launched, the site had elegant page transitions, live type-ahead in the catalog finder, and other similar UI features, all done with JavaScript (AJAX) in a way that gave the site a desktop-application feel. We considered this acceptable practice, as we were designing for IE 6/7, Safari 2/3 and Firefox 2/3.

    However, when you have 500,000 users, even 1% on older browsers represents quite a large crowd! So we’ve since modified the site to work in a far more traditional manner, relying very little on client-side JavaScript, and where necessary, degrading very gracefully.

  2. For hosting, our infrastructure, like many these days, is based on virtual machines. We have N front-end web application servers, each practically maxed out in terms of CPU and memory. Based on the experience with the Today Show traffic, we’re now thinking that it might be better to have 2N front-end servers, each with half the CPU and memory, since it’s a lot easier to quickly add CPU and memory to an existing server (to meet demand) than it is to bring on additional VMs. (This assumes that 2N front-end servers with half the CPU and memory each are roughly comparable in cost to N servers with double the CPU and memory, which might not be the case.)
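The reasoning in the second lesson can be checked with a little arithmetic. The sketch below uses made-up numbers (N = 4 servers, a maximum VM size of 8 capacity units) and assumes that every VM can be resized in place up to that same maximum; none of these figures come from the actual deployment.

```python
def fleet_capacity(server_count, units_per_server):
    """Total capacity of the fleet as currently sized."""
    return server_count * units_per_server


def resize_headroom(server_count, units_per_server, max_units_per_server):
    """Extra capacity gained by growing every existing VM to its maximum size."""
    return server_count * (max_units_per_server - units_per_server)


if __name__ == "__main__":
    n = 4            # hypothetical number of front-end servers
    max_units = 8    # hypothetical largest VM size, in arbitrary units

    # Option A: N servers, each already at the maximum size.
    print("A capacity:", fleet_capacity(n, max_units))
    print("A headroom:", resize_headroom(n, max_units, max_units))

    # Option B: 2N servers, each at half the maximum size.
    print("B capacity:", fleet_capacity(2 * n, max_units // 2))
    print("B headroom:", resize_headroom(2 * n, max_units // 2, max_units))
```

Under these assumptions both options start with the same total capacity, but only the second can grow without provisioning new VMs: resizing every server to the maximum would double it.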

It has been a very exciting experience to watch the site grow, analyze the usage patterns, and adjust the application and its user interface, not only to improve usability and the user experience, but also to adapt to the changing user profiles (now that over 500,000 of our visitors are no longer first-timers, and we have over 1,000 merchants in the system).

How the site is doing.

When the site first launched, the consumer response was (and continues to be) nothing short of amazing. It is clear that this site is meeting a very big need in the United States: the reduction of unwanted paper catalogs. The industry’s response was, as expected, lukewarm, especially after the Direct Marketing Association (the DMA) issued an email to all its members to “Just say no!” to Catalog Choice.

However, with half a million vocal consumers behind it, Catalog Choice has become an influential heavyweight. A website feature we launched last week alerts users to which specific merchants have refused to honor their opt-out requests, and provides each merchant’s customer support telephone number, just in case the consumer would like to give them a call. Within 24 hours, merchants that had been inundated with phone calls from angry customers were changing their minds :-)

A misconception in the industry (promoted by the DMA) is that Catalog Choice seeks to do away with catalogs altogether. That couldn’t be further from the truth. Catalog Choice is about doing away with just those catalogs that are unsolicited and unwanted.

All in all, Catalog Choice has been a fantastic project for MakaluMedia. We’re fortunate to be one of very few companies with the opportunity to build and operate such a large-scale Rails site, and one that serves such a meaningful social purpose!