Schema Slicing Methods to Reduce Development Costs of WSDL-Based Web Services
=============================================================================

<p align="right"><i>by Dr. Robert van Engelen and Wei Zhang, July 9, 2018.<br>Genivia Research Labs</i></p>

A version of this article appeared in the proceedings of the IEEE International Conference on Web Services, San Francisco, July 6, 2018.

Web Services provide a standards-based open platform for integrating distributed service components. The development of large distributed XML Web Services is greatly simplified with XML data binding tools that automate XML parsing and serialization by binding XML to native data structures. This paper presents a schema slicing method to remove unused schema components from schemas, thereby significantly reducing the XML data binding code size of WSDL-based Web Services. Our results show that schema slicing applied to large Web Services, such as ONVIF, results in the removal of 70% of the schema components on average. Our method also obtains significant schema size reductions for several popular WSDL-based Web Services, such as eBay Web Services (10% reduction), PayPal Web Services (18% reduction), Microsoft Exchange Web Services (4% reduction), Amazon S3 Web Services (22% reduction) and ESRI ArcGIS Web Services (42% to 59% reduction). We implemented schema slicing in the popular gSOAP toolkit.

1. Introduction
---------------

Web Services provide a standards-based open platform for integrating distributed service components. Web Services components developed with the aid of XML data binding tools can be large due to the inclusion of large XML schemas in WSDLs (Web Services Description Language). However, a quick investigation of WSDL-based Web Services reveals that significant portions of their schemas are actually not used. Take for example the ONVIF (Open Network Video Interface Forum) [10] Web Services, which import a common onvif.xsd schema.
Each ONVIF WSDL defines a specific part of the ONVIF Web Services, but does not require the full onvif.xsd schema to reference the actual elements and types used by the specific ONVIF protocol messages. We found that on average 70% of the ONVIF schema components can be removed without affecting the functionality of the WSDL-specific ONVIF Web Services. Furthermore, the WSDLs published by several well-known vendors use only a fraction of the imported schemas. We found that significant portions of the popular Amazon S3 (AWS-S3), Microsoft Exchange Web Services (EWS), PayPal Web Services, eBay Web Services and ESRI ArcGIS Web Services schemas are redundant.

In this paper we present a new *schema slicing* method that automatically reduces a schema (or a set of schemas) referenced by one or more WSDLs to a minimum. Schema slicing effectively removes all schema components that serve no purpose in the implementation of the Web Services.

Our schema slicing algorithm proceeds in three steps. First, a schema model and a component dependence graph are constructed for all XSDs that are embedded in WSDLs or imported. The dependence graph models the dependencies between XSD components and the WSDL dependencies on XSD components. This permits our slicing algorithm to perform an accurate in-depth analysis of the component relationships. Our model covers advanced XSD features, such as schema inclusions, overriding and redefinitions, element substitutions, and element and attribute groups, which may form cyclic relationships via other groups. Second, a depth-first recursive acyclic traversal of the graph marks all schema components either as used or as unused. Third and finally, the schemas are sliced by removing unused components.

The term *schema slicing* is borrowed from *program slicing* [16], [24], a method that reduces a program to a minimal form. Starting from a subset of a program's behavior, slicing produces a minimal program that still produces the original behavior.
Likewise, schema slicing produces minimal WSDLs and XSDs that define optimized Web Services that still produce the original behavior. By comparison, *class slicing* introduced by Tip et al. [17] reduces class hierarchies and removes unused class members by analyzing program code for references to derived and base class members. By contrast, our schema slicing method removes entire schema components and hierarchies when found to be unused by the operations of WSDL-based Web Services.

A DTD schema reduction method is introduced by Duta et al. [4] based on a normalized schema constructed using Libkin's algorithm [2]. Their approach reduces the size of a DTD but does not remove schema components that are unused, which is addressed by our work. Silva [15] and Sahu et al. [14] propose methods to slice XML documents against DTDs. The result of slicing an XML document is a new XML document (a slice) composed of those parts of the original document that satisfy some criterion (the slicing criterion) specified by the user. By contrast, our work slices schemas instead of XML documents and is fully automatic based on the XML messaging requirements defined by WSDL-based Web Services.

The remainder of this paper is organized as follows. Section 2 presents the gSOAP toolkit in which we implemented the schema slicing method. The details of the schema slicing algorithm are presented in Section 3. The results of applying schema slicing to ONVIF and other large Web Services are presented in Section 4. Section 5 concludes with a summary of our results.

2. The gSOAP Toolkit
--------------------

The gSOAP toolkit [18] is an open source Web Services development toolkit for C and C++ that maps WSDLs and schemas to C/C++ data types and vice versa, also known as *XML data bindings* [8] and *compiler-based schema-specific XML parsing* [3]. XML data bindings are typically based on the XSD schema standard approved by the W3C in 2001.
The strength of gSOAP is high-performance XML serialization with XML validation [7], [19], [20], [22]. The gSOAP toolkit supports advanced XML protocols, covering the W3C standards for WSDL 1.1/2.0, SOAP and REST Web Services and OASIS standards for WS-Security, WS-Trust, WS-Discovery, WS-Addressing, WS-ReliableMessaging and WS-Policy; see for example [21], [23]. The gSOAP toolkit is extensively used in industry to develop small to large Web Services that are sometimes called *"big" web services* [12]. The ONVIF consortium of companies lists gSOAP as one of the primary Web Services software development toolkits to develop ONVIF Web Services. The optimization of ONVIF Web Services and other large Web Services is the subject of this paper.

3. Schema Slicing
-----------------

Our schema slicing method implemented in gSOAP removes unused parts of XSD schemas imported and/or embedded in WSDLs.

### 3.1 XML Data Bindings

Manipulating, reading and writing XML is greatly simplified by the use of powerful XML data binding tools that automate the generation of XML serialization code through conversion of XSD files into XML serializers [8], [22]. In other words, an XML data binding binds each schema component to a data structure type defined in the target native programming language. Native data types can be serialized in XML while conforming to the XML schema. Essentially, the type safety of a statically typed programming language makes for an effective XML validation vehicle, removing the need for a separate XML validation pass after XML parsing. For example, an XML data binding for C++ converts each schema complex type to a C++ class with members representing the child elements and attributes. An XML data binding tool then auto-generates the XML (de)serialization code. Deserialization is performed with efficient XML pull parsing to populate a C or C++ data structure.
This means that XML parsing is schema-specific and XML validation is efficiently integrated in the deserialization process [3], which ensures strong type-safe XML deserialization that increases the security and reliability of SOAP/REST XML Web Services communications.

In this paper we will mostly focus on XML data bindings. Schema slicing has a measurable impact on the code size produced by an XML data binding. When Web Services are primarily developed with XML data binding tools such as gSOAP, removing unused schema parts results in code size reductions of the auto-coded XML serializers. When XML data binding tools are not used in the development of Web Services, the Web Services development cost reduction obtained by schema slicing can be even more pronounced, since software developers can focus their efforts on the reduced schemas to manually implement the SOAP/REST XML messaging in WSDL-based Web Services software.

### 3.2 Problem Definition

The XML (de)serialization code auto-generated by an XML data binding tool may be excessive in size when many schema components are unused by the Web Service operations defined in WSDLs. But what exactly are "unused schema components"? Schema types and elements are clearly used when referenced by the WSDL Web Service operations. A Web Service could not function without them. All other schema components are unused when not directly or indirectly referenced by other schema components that are marked as used. These unused components are irrelevant to the functioning of a WSDL-based Web Service and can be removed. However, from a schema perspective, this simple dual categorization does not suffice, because the top-level root elements should always be considered as used. According to the W3C standard, the XSD schema root elements define the content of all XML documents described and validated by the schema and thus should not be removed.
For clarity, we define the following three categories of *used*, *unused* and *orphaned* schema components:

- *used* components are the schema components used in WSDLs and the components that are directly or indirectly referenced by other used schema components;
- *unused* components are not directly or indirectly referenced by used schema components;
- *orphaned* schema types are unused types that are not directly or indirectly referenced by any top-level root element or attribute in a schema.

Consider for example the XSD and WSDL given in Fig. 1:

    [xml]
    <schema xmlns="" ...>
      <complexType name="OrphanedType">...</complexType>
      <complexType name="UnusedType">...</complexType>
      <complexType name="UsedType">...</complexType>
      <element name="UnusedElement" type="UnusedType"/>
      <element name="UsedElement" type="UsedType"/>
    </schema>

    <definitions xmlns="" ...>
      <message name="Message">
        <part name="parameters" element="tns:UsedElement"/>
      </message>
      <portType name="Interface">
        <operation name="Operation">
          <input message="Message"/>
          <output message="Message"/>
        </operation>
      </portType>
      <binding name="Binding">
        <soap:binding style="document" .../>
        <operation name="Operation">
          <soap:operation soapAction="Action"/>
          <input><soap:body use="literal"/></input>
          <output><soap:body use="literal"/></output>
        </operation>
      </binding>
    </definitions>

Fig. 1. Example XSD and WSDL with used, unused and orphaned types and top-level root elements.

The WSDL defines just one SOAP document/literal service operation based on the definition of just one SOAP message that declares that both the input and the output SOAP messages use the `UsedElement` XSD root element. The WSDL references the `UsedElement` defined in the XSD and this element in turn references `UsedType`, meaning that both are required XSD components. The XSD also defines an `OrphanedType` complex type that is not referenced by any of the XSDs and WSDLs of the Web Services and could therefore be removed.
Furthermore, the top-level root `UnusedElement` element references `UnusedType`. However, these two schema components are not used by any of the XSDs and WSDLs of the Web Services and could therefore also be removed, despite the fact that `UnusedElement` is a root element of XML documents described and validated by the XSD.

From a WSDL-based Web Services optimization perspective we can remove all unused and orphaned schema components without invalidating the functioning of the Web Services. By contrast, from an XSD schema validation perspective, which is stricter, we should only consider removing the orphaned schema types to prevent any failures to validate XML documents described by the schema. We will refer to these two slicing strategies as *WSDL slicing* and *XSD slicing*, respectively. WSDL slicing is more aggressive and may remove schema root elements, while XSD slicing retains all root elements.

### 3.3 Schema Slicing Algorithm

A schema slicing method must be safe to use and must not remove schema components that are (in)directly used by a WSDL, in order to preserve the messaging behavior of the WSDL-based Web Services and to still permit the Web Services to perform correct and accurate XML validation. Even a tiny error in the slicing algorithm can break the resulting XML data binding.

A slicing algorithm must terminate with a minimal slice of schema components such that all components that WSDL operations depend on are preserved. Termination requires care to avoid infinite recursion on schema component references that form cycles. Complex types can form cyclic dependencies via indirect references to other complex types, which must be properly handled by a slicing algorithm. Also, a group can be part of a cyclic dependence by referencing other groups that reference the group.
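
The termination requirement can be illustrated with a small sketch. The Python below is not the gSOAP implementation; it assumes a hypothetical dependence graph stored as a dictionary and reuses the component names of Fig. 1, extended with two hypothetical types `A` and `B` that reference each other to form a cycle. The set of already-marked components doubles as a visited set, so each component is processed at most once and cyclic references cannot cause endless traversal.

```python
# Illustrative sketch only: the graph layout and the names A and B are assumptions.
def mark_used(graph, roots):
    """Return the set of components reachable from roots.

    graph maps a component name to the names it references. The 'used'
    set is also the visited set, which makes cyclic references terminate.
    """
    used = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in used:
            continue  # already marked: this check is what breaks cycles
        used.add(name)
        stack.extend(graph.get(name, ()))
    return used


# Dependence graph for the components of Fig. 1, extended with a cycle A <-> B.
graph = {
    "UsedElement": ["UsedType"],
    "UsedType": ["A"],
    "A": ["B"],
    "B": ["A"],
    "UnusedElement": ["UnusedType"],
    "UnusedType": [],
    "OrphanedType": [],
}
top_level = {"UsedElement", "UnusedElement"}  # top-level root elements
wsdl_refs = {"UsedElement"}                   # referenced by WSDL message parts

wsdl_slice = mark_used(graph, wsdl_refs)             # WSDL slicing
xsd_slice = mark_used(graph, wsdl_refs | top_level)  # XSD slicing keeps all roots

print(sorted(set(graph) - wsdl_slice))  # components removed by WSDL slicing
print(sorted(set(graph) - xsd_slice))   # components removed by XSD slicing
```

In this sketch WSDL slicing removes `OrphanedType`, `UnusedElement` and `UnusedType`, whereas XSD slicing removes only `OrphanedType`, which matches the two strategies described for Fig. 1.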
To prevent the loss of document root elements and the loss of elements and attributes that can be used in place of `xs:any` and `xs:anyAttribute`, the XSD slicing algorithm must mark all top-level root elements and attributes as used by default when slicing schemas. When `xs:anyType` is used by a schema, we have two choices for slicing: mark all `simpleType` and `complexType` components as used, which means that slicing is effectively disabled, or consider representing `xs:anyType` content as a DOM in a Web Services application without an XML data binding and without XML validation. This decision is left to the Web Services architects and developers. WSDL slicing must also properly handle `xs:substitutionGroup`, because top-level root elements may otherwise not be marked as used and be removed, leading to an incorrect XML data binding with missing type definitions and, as a result, C/C++ compilation errors.

Our algorithm implemented in gSOAP has three optimization levels: -O2 activates XSD slicing, -O3 activates XSD slicing and removes unused top-level root attributes, and -O4 activates full WSDL slicing. First, a dependence graph is built that models the dependences between XSD components and the WSDLs used.
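
As a rough picture of what such a dependence graph might contain, the following Python sketch extracts edges from a tiny hypothetical schema with the standard `xml.etree.ElementTree` module. The schema, the helper names `local` and `build_graph`, and the simplifications (one shared name space for elements and types, no imports, no filtering of built-in types) are all assumptions made for illustration; this is not the graph builder used by wsdl2h.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
# Top-level component kinds considered by this sketch.
KINDS = {XS + k for k in ("element", "attribute", "simpleType",
                          "complexType", "group", "attributeGroup")}

# Hypothetical schema: Head is an abstract element and Member substitutes for it.
XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="BaseType"/>
  <xs:complexType name="DerivedType">
    <xs:complexContent>
      <xs:extension base="BaseType"/>
    </xs:complexContent>
  </xs:complexType>
  <xs:element name="Head" type="BaseType" abstract="true"/>
  <xs:element name="Member" type="DerivedType" substitutionGroup="Head"/>
</xs:schema>
"""


def local(qname):
    """Drop an optional prefix from a QName such as 'tns:Foo'."""
    return qname.split(":")[-1]


def build_graph(xsd_text):
    """Map each top-level component name to the names it depends on."""
    graph = {}
    for comp in ET.fromstring(xsd_text):
        name = comp.get("name")
        if comp.tag not in KINDS or name is None:
            continue
        deps = graph.setdefault(name, set())
        # type= and ref= on the component or its local children,
        # base= on a nested xs:extension or xs:restriction.
        for node in comp.iter():
            for attr in ("type", "ref", "base"):
                if node.get(attr):
                    deps.add(local(node.get(attr)))
        # A used head element makes its substitution group members used,
        # so the edge points from the head to the member.
        head = comp.get("substitutionGroup")
        if head:
            graph.setdefault(local(head), set()).add(name)
    return graph


for name, deps in sorted(build_graph(XSD).items()):
    print(name, "->", sorted(deps))
```

Directing the substitution group edge from the head element to its members means that a plain reachability pass over the graph also marks the members of a used head, which is one possible way to realize the substitution group handling discussed above.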
Next, the dependence graph is traversed to perform the following operations to compute the transitive closure of the dependence graph starting at the roots:

- 1) for each schema with an `xs:import`, add the referenced imported schemas to a dictionary associated with the schema, to facilitate the marking of schema components in the imported schemas as *used* when the imported components are *used* by components in the current schema;
- 2) process `xs:include`, `xs:override`, `xs:redefine` by modifying the dependence graph of the schema that contains them, with modifications compliant to the XSD 1.0/1.1 semantics;
- 3a) if -O2 is enabled: for all (imported) schemas to be sliced, mark the top-level root elements and attributes as *used* and mark all other schema components as *unused*;
- 3b) if -O3 is enabled: for all (imported) schemas to be sliced, mark the top-level root elements as *used* and mark all other schema components as *unused*;
- 3c) if -O4 is enabled: for all (imported) schemas to be sliced, mark all schema components as *unused*;
- 4) for each local or top-level `xs:element`, `xs:attribute`, `xs:group`, `xs:attributeGroup` marked *used*, mark their local/referenced types as *used*;
- 5) for each `xs:simpleType` and `xs:complexType` marked *used*, mark the `xs:extension` or `xs:restriction` base as *used* and mark all local components in the (nested) `xs:sequence`, `xs:all` and/or `xs:choice` as *used*;
- 6) if -O4 is enabled: for each top-level `xs:element` with `xs:substitutionGroup`, mark the element as *used* if the (abstract) element of the `xs:substitutionGroup` is marked *used*;
- 7a) for each WSDL 1.1 message part referencing a `xs:simpleType`, `xs:complexType` or `xs:element`, mark the type or element as *used*;
- 7b) for each WSDL 2.0 operation input and/or output that references an `xs:element` in a schema, mark the referenced element as *used*;
- 7c) for each WSDL 2.0 SOAP header and SOAP fault that references an `xs:element`, mark the referenced element as *used*;
- 8) go to step 4) until no more used components have *unused* local components or referenced components.

The transitive closure of the dependence graph is computed in steps 4-8, resulting in the separation of graph nodes into reachable, i.e. the components that are *used*, and unreachable, i.e. the components that are *unused* (and orphaned).

4. Results
----------

ONVIF (Open Network Video Interface Forum) [10] is a global and open industry forum with the goal of facilitating the development and use of a global open standard for the interface of physical IP-based security products. The standard defines communication protocols for IP products within video surveillance and other physical security areas. The ONVIF Core Specification standardizes the ONVIF network interface of network video products. It defines a network video communication framework based on relevant IETF and Web Services standards, including security and IP configuration requirements. More specifically, the Core Specification version 1.0 covers the areas of IP configuration, device discovery, device management, media configuration, real-time viewing, event handling, PTZ camera control, video analytics, and security.
Two sets of WSDLs were also combined and compiled together as combo 1 and combo 2, which are typical combinations of protocols documented by the ONVIF forum:

<table class="doxtable">
<tr><th>ONVIF WSDL</th><th>WSDLs imported</th><th>#msgs</th><th>#comps</th></tr>
<tr><td>devicemgmt.wsdl</td><td></td><td>180</td><td>806</td></tr>
<tr><td>event.wsdl</td><td>bw-2.wsdl rw-2.wsdl</td><td>54</td><td>75</td></tr>
<tr><td>display.wsdl</td><td></td><td>20</td><td>635</td></tr>
<tr><td>deviceio.wsdl</td><td>devicemgmt.wsdl</td><td>238</td><td>869</td></tr>
<tr><td>imaging.wsdl</td><td></td><td>22</td><td>639</td></tr>
<tr><td>media.wsdl</td><td></td><td>164</td><td>778</td></tr>
<tr><td>ptz.wsdl</td><td></td><td>56</td><td>671</td></tr>
<tr><td>receiver.wsdl</td><td></td><td>16</td><td>631</td></tr>
<tr><td>recording.wsdl</td><td></td><td>42</td><td>661</td></tr>
<tr><td>search.wsdl</td><td></td><td>28</td><td>643</td></tr>
<tr><td>remotediscovery.wsdl</td><td></td><td>6</td><td>n/a</td></tr>
<tr><td>replay.wsdl</td><td></td><td>8</td><td>623</td></tr>
<tr><td>analytics.wsdl</td><td></td><td>26</td><td>643</td></tr>
<tr><td>analyticsdevice.wsdl</td><td></td><td>34</td><td>649</td></tr>
<tr><td>combo 1</td><td>devicemgmt.wsdl media.wsdl</td><td>344</td><td>971</td></tr>
<tr><td>combo 2</td><td>deviceio.wsdl display.wsdl receiver.wsdl recording.wsdl search.wsdl</td><td>344</td><td>952</td></tr>
</table>

The number of service messages and the number of schema components are listed in the two rightmost columns of the table. The component count covers the schema top-level root elements, attributes and simple/complex types. Local components are not counted. We omitted remotediscovery.wsdl from our results because this WSDL is a wrapper for WS-Discovery. The WS-Discovery schema is integrated in the gSOAP library. Therefore, slicing this WSDL does not remove any WS-Discovery schema components.
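
As a quick arithmetic check on the #comps column of the ONVIF table above, the mean number of schema components per individual WSDL can be computed directly. The sketch below leaves out remotediscovery.wsdl, which has no count, and the two combos, which are combinations rather than individual WSDLs; that selection is an assumption of this sketch.

```python
# #comps values copied from the ONVIF table (individual WSDLs only).
comps = {
    "devicemgmt.wsdl": 806,
    "event.wsdl": 75,
    "display.wsdl": 635,
    "deviceio.wsdl": 869,
    "imaging.wsdl": 639,
    "media.wsdl": 778,
    "ptz.wsdl": 671,
    "receiver.wsdl": 631,
    "recording.wsdl": 661,
    "search.wsdl": 643,
    "replay.wsdl": 623,
    "analytics.wsdl": 643,
    "analyticsdevice.wsdl": 649,
}

mean = sum(comps.values()) / len(comps)
print(round(mean))  # 640
```

With 75 components, event.wsdl is by far the smallest entry; every other WSDL in the selection has more than 600 components.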
Results for wsdl2h option -O3 are omitted from this section because they were the same as the results obtained with option -O2.

<div class="chart"><a href="images/figoptperc1.png" data-lightbox="image-2"><img alt="Fig. 2." src="images/figoptperc1.png"/></a></div>

Fig. 2 above shows the percentage of unused schema components removed with XSD slicing (-O2) and WSDL slicing (-O4) applied to the ONVIF WSDLs. The schema reductions by slicing are significant and the methods are very effective for most ONVIF WSDLs except for event.wsdl, which is a relatively small WSDL with 75 schema components compared to the average ONVIF WSDL size of 640 components. From the figure we observe that on average -O2 removed 55.5% of the schema components and -O4 removed 70.5% of the schema components. WSDL slicing with option -O4 yields the greatest reduction, leaving just 29.5% of the schema components in use on average.

We also applied slicing to seven popular Web Services WSDLs, ranging from the modest-size OPC Data Access Web Services [11] and Amazon S3 Web Services [1], to the medium-size ESRI ArcGIS Web Services [6], to the large-size PayPal Payments SOAP Web Services [13], Microsoft Exchange Web Services [9] and eBay SOAP Web Services [5], as listed in the table below:

<table class="doxtable">
<tr><th>WSDL</th><th>vendor protocol</th><th>#msgs</th><th>#comps</th></tr>
<tr><td>OPC_DA.wsdl</td><td>OPC Data Access 1.0</td><td>16</td><td>54</td></tr>
<tr><td>AmazonS3.wsdl</td><td>Amazon S3</td><td>32</td><td>72</td></tr>
<tr><td>MapServer.wsdl</td><td>ESRI ArcGIS MapServer</td><td>54</td><td>221</td></tr>
<tr><td>GeodataServer.wsdl</td><td>ESRI ArcGIS GeodataServer</td><td>42</td><td>246</td></tr>
<tr><td>PayPalSvc.wsdl</td><td>PayPal</td><td>115</td><td>519</td></tr>
<tr><td>EWS.wsdl</td><td>Microsoft Exchange</td><td>120</td><td>611</td></tr>
<tr><td>ebaySvc.wsdl</td><td>eBay V.1039</td><td>435</td><td>1260</td></tr>
</table>

The number of service messages and the number of schema components are listed in the two rightmost columns of the table. The component count covers the schema top-level root elements, attributes and simple/complex types. Local components and top-level element and attribute groups are not counted in the column, since these are expanded by schema normalization performed internally by wsdl2h. For example, EWS.wsdl defines four element groups and one attribute group that are not separately counted in the table.

<div class="chart"><a href="images/figoptpercall1.png" data-lightbox="image-2"><img alt="Fig. 3." src="images/figoptpercall1.png"/></a></div>

Fig. 3 above shows the percentage of unused schema components removed with XSD slicing (-O2) and WSDL slicing (-O4) applied to the WSDLs in this table. From the figure we observe that the impact of WSDL slicing (-O4) on schema reductions varies from a low 3.6% reduction for EWS to as much as a 58.9% reduction for the ESRI ArcGIS GeodataServer. On average, the reduction obtained by WSDL slicing with -O4 is 26%. The higher percentages for the ESRI ArcGIS services are due to the inclusion of common ESRI ArcGIS XSD types in all ESRI ArcGIS WSDLs, whether or not the XSD types are used by the service operations. Even more surprising are the results for the other Web Services in the figure (OPC_DA.wsdl, AmazonS3.wsdl, PayPalSvc.wsdl, EWS.wsdl and ebaySvc.wsdl), because the service operations of these WSDLs use schemas that are designed to specifically define the WSDL operation parameter types. In other words, there is no technical reason for any of the unused components to be included in these schemas.

5. Conclusions
--------------

This paper introduced a new schema slicing method to minimize the size of WSDL-based Web Services.
Schema slicing effectively removes unused schema components to obtain a lean executable footprint for Web Services applications. We implemented schema slicing in the popular gSOAP toolkit. Our results show that on average 70% of the ONVIF schema components are removable by slicing. This results in significant code size reductions of ONVIF Web Services without affecting the functioning of these services. On average 26% of the schema components are removable from some of the most popular Web Services, such as the eBay Web Services (10% reduction), PayPal Web Services (18% reduction), Microsoft Exchange Web Services (4% reduction), Amazon S3 Web Services (22% reduction) and ESRI ArcGIS Web Services MapServer and GeodataServer (42% and 59% reduction), without affecting their client-side and service-side functioning.

References
----------

[1] Amazon. AWS-S3 web services. Accessed: 2018-02-05.

[2] M. Arenas and L. Libkin. A normal form for XML documents. ACM Trans. Database Syst., 29(1):195–232, Mar. 2004.

[3] K. Chiu and W. Lu. A compiler-based approach to schema-specific XML parsing. In The First International Workshop on High Performance XML Processing, 2004.

[4] A. C. Duta, K. Barker, and R. Alhajj. RA: an XML schema reduction algorithm. In ADBIS 2006, Advances in Databases and Information Systems, Communications of the Tenth East-European Conference on Advances in Databases and Information Systems, Thessaloniki, Hellas, September 3-7, 2006.

[5] eBay. Web services. Accessed: 2018-02-05.

[6] ESRI. ArcGIS web services. Accessed: 2018-02-05.

[7] M. R. Head, M. Govindaraju, R. Van Engelen, and W. Zhang. Benchmarking XML processors for applications in grid web services. In SC 2006 Conference, Proceedings of the ACM/IEEE, pages 30–30. IEEE, 2006.

[8] B. McLaughlin. Java & XML data binding. O’Reilly Media, Inc., 2002.

[9] Microsoft. Exchange web services. Accessed: 2018-02-05.

[10] ONVIF. Open network video interface forum. Accessed: 2018-02-05.

[11] OPC Foundation.
Data access web services. Accessed: 2018-02-05.

[12] C. Pautasso, O. Zimmermann, and F. Leymann. Restful web services vs. "big" web services: making the right architectural decision. In Proceedings of the 17th international conference on World Wide Web, pages 805–814. ACM, 2008.

[13] PayPal. Payments web services. Accessed: 2018-02-05.

[14] M. Sahu and D. P. Mohapatra. Slicing XML documents using dependence graph. In ICDCIT, volume 7753 of Lecture Notes in Computer Science, pages 444–454. Springer, 2013.

[15] J. Silva. A program slicing based method to filter XML/DTD documents. In SOFSEM (1), volume 4362 of Lecture Notes in Computer Science, pages 771–782. Springer, 2007.

[16] F. Tip. A survey of program slicing techniques. Technical report, CWI (Centre for Mathematics and Computer Science), Amsterdam, The Netherlands, 1994.

[17] F. Tip, J.-D. Choi, J. Field, and G. Ramalingam. Slicing class hierarchies in C++. ACM SIGPLAN Notices, 31(10):179–197, 1996.

[18] R. A. van Engelen. The gSOAP toolkit 2.0, 2002. Accessed: 2018-02-05.

[19] R. A. van Engelen. Pushing the SOAP envelope with web services for scientific computing. In IEEE International Conference on Web Services, pages 346–352. IEEE, 2003.

[20] R. A. van Engelen. Constructing finite state automata for high performance web services. In IEEE International Conference on Web Services. Citeseer, 2004.

[21] R. A. van Engelen. A framework for service-oriented computing with C and C++ web service components. ACM Transactions on Internet Technology (TOIT), 8(3):12, 2008.

[22] R. A. van Engelen and K. A. Gallivan. The gSOAP toolkit for web services and peer-to-peer computing networks. In Cluster Computing and the Grid, 2002. 2nd IEEE/ACM International Symposium on, pages 128–128. IEEE, 2002.

[23] R. A. van Engelen and W. Zhang. An overview and evaluation of web services security performance optimizations. In IEEE International Conference on Web Services, pages 137–144. IEEE, 2008.

[24] M. Weiser.
Program slicing. In International Conference on Software Engineering, pages 439–449. IEEE Press, 1981.

<p align="right"><i>Copyright (c) 2018, Robert van Engelen, Genivia Inc. All rights reserved.</i></p>