Are Web Services Scale Free? ============================ <p align="right"><i>by Robert van Engelen</i></p> This article originally appeared in <> in June 16, 2005. Introduction ------------ Many man-made and naturally occurring phenomena, including city sizes, incomes, word frequencies, and earthquake magnitudes, are distributed according to a *power-law distribution*. A power-law implies that small occurrences are extremely common, whereas large instances are extremely rare. For example, there are few large earthquakes but many small ones. This regularity or "law" is sometimes also referred to as *Zipf* and *Pareto*. Power-law distributions have been observed for the degree of connectivity of nodes on the Web [[1]](#Faloutsos:99), the distribution of network messages, web-site and weblogs rankings [[2]](#Kottke:03), class relationships in object-oriented source code [[6]](#Wheeldon:03), and the number of references to run-time objects during the execution of Java programs [[4]](#Potanin:05) to name a few. The power law of natural phenomena and computer and network artifacts demonstrates a striking common property: scale-free geometry. Scale-free geometry is very different from the geometry of a graph in which links are randomly distributed among nodes. In a random graph geometry, nearly all the nodes have approximately the same number of links. Thus, every random graph has as its "typical scale" the average number of links per node. By contrast, the Web has no typical scale to its connectivity, a property closely related to fractals [[3]](#PhysicsWeb). I thought it would be interesting to see if XML Web services are also scale free. If the complexity of Web services is scale free, there is no typical Web service application size or complexity, which in turn might suggest that there are no limiting factors inhibiting the use of the technology in real-world environments for developing increasingly complex Web service systems. Thus, it is reasonable to say that the scale-free properties of Web service application size or complexity (not popularity!) are observed, the technology has outgrown the "toy examples" stage. Testing the Hypothesis ---------------------- To test the scale-free property of a reasonably large collection of representative Web service applications, I downloaded and installed the entire set of over 400 WSDL files from XMethods [[7]](#XMethods), a free repository of publicly advertised open Web services. Now, to determine power law distributions, we need to check if the number of occurrences <i>N</i><sub><i>k</i></sub> of some event of size <i>k</i> is proportional to <i>k</i> raised to some power. The easiest way to see a power law is to plot <i>N</i> versus <i>R</i> on logarithmic scales; if the distribution follows a power law, we would expect to see a straight line with slope <i>s</i>. One drawback is that it is not possible to directly compare the sizes of the WSDL files to rank them, because the XML-based layout of WSDL files can vary significantly. Some files are hand-made, others are automatically generated, and a few include large portions of documentation. A couple of WSDL files require importing other WSDL files and/or XML Schema files. To normalize the WSDLs, I used the <i>wsdl2h</i> translator of the gSOAP toolkit [[5]](#gSOAP) to convert each WSDL into a C++ header file format. The <i>wsdl2h</i> tool automatically imports any dependent WSDLs and Schemas and verifies the correctness of the content (about 10 WSDLs out of 408 were rejected due to content errors). I then counted and ranked the number of lines of source code in the output produced by <i>wsdl2h</i> for each WSDL. The results are shown in the figure below. The Web service rank according to application size in number of generated source code lines indeed follows a power law distribution. <div class="chart"><a href="images/powerlaw.png" data-lightbox="image-1"><img src="images/powerlaw.png"/></a></div> It appears that the world of Web services is indeed scale-free, just like the Web. The data conforms quite well to a power law curve. The R-squared value, a measure of how well the curve fits the data (1.0 is a perfect fit), is 0.94. By comparison, the study of the top 200 weblogs popularity rankings has an R-squared value of 0.95 [[2]](#Kottke:03). What is surprising about this result is that it usually takes several years for an emerging technology to mature and succeed to gain sufficient support before a power law distribution is observed. This is remarkable, given that Web services are perceived as more difficult to develop compared to "regular" programs, which does not appear to be the case. Complexity versus Size: Does it Matter? --------------------------------------- At this point you may have argued that the study merely ranked the normalized WSDL size and not application complexity. Complexity can be measured in many different ways, based on the frequencies of application's programming constructs, number of edges in subroutine call graphs, data structure graphs, and so on. Determining program complexity is a research topic in itself. However, you will see that when the application size follows the power law so does complexity. The relationship between application size <i>n</i> and complexity <i>m</i> is arguably nonlinear and can be of the form: <table border="0" width="100%"><tr><td> <table align="center" cellspacing="0" cellpadding="0"><tr><td nowrap="nowrap" align="center"> <i>m</i> = <i>c</i>&nbsp;<i>n</i><sup><i>a</i></sup> </td></tr></table> </td></tr></table> where <i>c</i> &gt; 0 and <i>a</i> <u>&gt;</u> 1. For example, when <i>a</i> = 2 the relationship is quadratic reflecting a tight coupling of all the components within an application. However, the particular values of <i>a</i> and <i>c</i> do not change the scale free property. By substituting <i>c</i>&nbsp;<i>n</i><sup><i>a</i></sup> for <i>x</i> in the power law formula <table border="0" width="100%"><tr><td> <table align="center" cellspacing="0" cellpadding="0"><tr><td nowrap="nowrap" align="center"> <i>y</i> = <i>a</i>&nbsp;<i>x</i><sup><i>s</i></sup> </td></tr></table> </td></tr></table> and by taking logarithms on both sides <table border="0" width="100%"><tr><td> <table align="center" cellspacing="0" cellpadding="0"><tr><td nowrap="nowrap" align="center"> log <i>y</i> = log <i>a</i> + <i>s</i> log(<i>c</i>&nbsp;<i>n</i><sup><i>a</i></sup>) </td></tr></table> </td></tr></table> we have <table border="0" width="100%"><tr><td> <table align="center" cellspacing="0" cellpadding="0"><tr><td nowrap="nowrap" align="center"> log <i>y</i> = log <i>a</i> + log <i>c</i> + <i>a</i> <i>s</i> log <i>n</i> </td></tr></table> </td></tr></table> which is a power law distribution with slope <i>a</i> <i>s</i>. Thus, the value of <i>a</i> merely affects the slope of the line on the log-log scale (the value of <i>c</i> affects the position of the line with respect to the <i>y</i> axis). Therefore, if the application's code size follows the power law so does its complexity. References ---------- <a href="#CITEFaloutsos:99" name="Faloutsos:99">[1]</a> M.&nbsp;Faloutsos, P.&nbsp;Faloutsos, and C.&nbsp;Faloutsos. On power-law relationships of the internet topology. In Computer Communication Review, 29(4), 1999. <a href="#CITEKottke:03" name="Kottke:03">[2]</a> J.&nbsp;Kottke. Weblogs and power laws, February 2003. Available from <a href="#CITEPhysicsWeb" name="PhysicsWeb">[3]</a> PhysicsWeb. The physics of the Web, July 2001. Available from <a href="#CITEPotanin:05" name="Potanin:05">[4]</a> Alex Potanin, James Noble, Marcus Frean, and Robert Biddle. Scale-free geometry in OO programs. In Communications of the ACM, 48(5):99-103, 2005. <a href="#CITEgSOAP" name="gSOAP">[5]</a> Robert van Engelen. The gSOAP toolkit for C and C++ web services. <a href="#CITEWheeldon:03" name="Wheeldon:03">[6]</a> Richard Wheeldon and Steve Counsell. Power law distributions in class relationships. In Third IEEE International Workshop on Source Code Analysis and Manipulation, page 45, 2003. <a href="#CITEXMethods" name="XMethods">[7]</a> XMethods. XMethods service listings. Available from [![To top](images/go-up.png) To top](#)