A week ago I stretched my rudimentary knowledge of high school biology to come up with an analogy that, for me, shed some light on web services. Today, I am going to try an analogy from physics, which I know even less about.
Most industrial processes produce byproducts. More and more, we see the output of these processes (even the waste) as a potential input for another process. An example of this is co-generation – generating energy from the waste heat of existing industrial processes. Like co-generation in an industrial process, many web services are driven by the data that is created as a byproduct of a user’s interaction with another service.
When you use a search engine to find mortgage deals, you reveal your intention as a byproduct of your search. That intention data becomes the input to a different system – the search engine’s advertising system. The intention data is a byproduct of the search, not the purpose of the search; it is data exhaust. When Amazon recommends a book, it is using the exhaust from your prior purchases and those of people who have purchased similar books to show you books you may like. People who tag links in Del.icio.us do it so they can find those sites again. As a byproduct, Del.icio.us has created a curated list of interesting things on the web that has become a powerful discovery tool for lots of people who have never tagged anything.
The analogy, however, seems to break down when you compare the potential efficiency of “co-generation” in physics and in web services. In physics, the potential of a co-generation system is limited by the Second Law of Thermodynamics, which says (I think) that the available energy in an isolated system can never increase. This means that no matter how efficient your co-generation system is, you always lose the use of some of the energy in the system. Otherwise we would have perpetual motion machines. It appears, however, that the exhaust of a web service can be more useful than the service itself. Webmasters linked to other sites to make their own sites more useful to their users. That took a certain amount of energy and provided a certain utility to their users. They did not intend to power a search engine. But then Google captured this exhaust and used it to organize the web.
So the extra-credit question is… does the apparent increasing utility of data violate the Second Law of Thermodynamics? Is data different, or are we fooled by the inherent efficiency of data storage, transmission, and manipulation so that we don’t even notice the incremental energy lost? Perhaps I have not accounted for all of the energy in the system (e.g. the user’s interaction with the data adds energy). This is more than an academic question. If data is somehow different, even if only in that the efficiency of its reuse is greater, then it will affect the way web services companies package, deliver, and get compensated for the use of their data assets.