XML-XSD Introduction
XML stands for Extensible Markup Language. XML lets you define your own markup language. For example, we all know HTML, the HyperText Markup Language. HTML is a standard and its markup is well defined: if we want a line break we write <br/>, and because it is a standard, all browsers interpret it the same way. HTML is, however, limited to what the standard defines. XML was introduced to provide the ability to define one's own markup language. It took the world by storm, and almost all technologies, tools and frameworks adopted it quickly.
Before XML came into the picture, people represented data using CSV (comma-separated values) files or files with similar separators. However, these plain text files were hard for humans to read, and their syntax could not be validated. The lack of a standard for defining them also resulted in a lack of tools to handle them robustly. Let's write a simple XML document and note how much easier it is to read than the same information represented in a CSV file.
user.xml
<?xml version="1.0" encoding="UTF-8"?>
<user startDate="2008-04-04">
  <homeAddress country="India">
    <houseNo>D-1</houseNo>
    <society>Akshay Park</society>
    <locality>Thergaon</locality>
    <city>Pune</city>
    <pin>411033</pin>
  </homeAddress>
  <officeAddress country="India">
    <houseNo>10</houseNo>
    <society>Akshay Center</society>
    <locality>Thergaon</locality>
    <city>Pune</city>
    <pin>411033</pin>
  </officeAddress>
  <productBought>
    <product productNo="AAA123">
      <name>YoYo</name>
      <quantity>4</quantity>
      <price>89</price>
      <comment>Green Colour</comment>
    </product>
    <product productNo="XYZ123">
      <name>Rocking Chair</name>
      <quantity>1</quantity>
      <price>3000</price>
    </product>
  </productBought>
</user>
Let's look into some of the rules of XML syntax.
- XML naming convention: Blank spaces are not permitted in XML names. Names are case sensitive: <product> and <Product> are two different elements. A name must start with a letter or an underscore.
- Prolog: The line at the top of an XML document, <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>, is the XML declaration and forms part of the prolog. It is not mandatory, but it is good practice. If it is present, version is mandatory; the other two attributes are optional. encoding identifies the character encoding of the document. standalone indicates whether the document relies on external markup declarations (such as an external DTD).
- An XML document contains exactly one root element. For example, in user.xml the root element is <user>.
- A comment is written as <!-- this is a comment -->
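These rules can be tried out with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are made up for illustration. It only checks well-formedness, not validity against a schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Illustrative samples (not taken from user.xml)
good = '<?xml version="1.0" encoding="UTF-8"?><user><!-- a comment --><city>Pune</city></user>'
unbalanced = "<user><city>Pune</user>"   # <city> is never closed
two_roots = "<user/><user/>"             # more than one root element
case_mismatch = "<product></Product>"    # names are case sensitive

for sample in (good, unbalanced, two_roots, case_mismatch):
    print(is_well_formed(sample))
```

Only the first sample is accepted; the other three break one of the rules listed above.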
Elements and Attributes
An element in XML might have more elements nested in it. Also it can have attributes.
<officeAddress country="India"> <houseNo>10</houseNo> <society>Akshay Center</society>
In the above case, country is an attribute and houseNo is an element. You have to choose whether to represent a piece of data as an attribute or as an element. Some rules of thumb are:
- If multiple instances of a child item are possible, it has to be represented as an element, because an attribute can appear only once on an element.
- Usually an inherent property of an element is represented as an attribute, and its sub-parts are represented as child elements. Remember that this is just a rule of thumb. Attributes also reduce the size of the XML to some extent.
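The difference also shows up when the data is read back. Here is a small sketch with Python's xml.etree.ElementTree, using a shortened, illustrative version of the productBought element: attributes are read with get(), while repeated child elements are collected with findall().

```python
import xml.etree.ElementTree as ET

# Shortened sample data, for illustration only
doc = """
<productBought>
  <product productNo="AAA123"><name>YoYo</name><quantity>4</quantity></product>
  <product productNo="XYZ123"><name>Rocking Chair</name><quantity>1</quantity></product>
</productBought>
"""

root = ET.fromstring(doc)

# Repeated items have to be child elements; findall returns all of them
products = root.findall("product")

# An attribute holds a single value per element
product_numbers = [p.get("productNo") for p in products]

# Child element content is read as text
names = [p.findtext("name") for p in products]

print(product_numbers)
print(names)
```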
CDATA
A CDATA section marks a block of text as literal, so that it is not parsed for tags and symbols but is instead treated as a plain string of characters.
<![CDATA[ <html> <p> </html> ]]>
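As a quick check, the sketch below wraps the CDATA section above in a hypothetical <comment> element and parses it with Python's xml.etree.ElementTree. The markup inside the section comes back as plain text, whereas the same content without CDATA is rejected because <p> is never closed.

```python
import xml.etree.ElementTree as ET

# The <comment> wrapper element is an assumption made for this example
with_cdata = "<comment><![CDATA[ <html> <p> </html> ]]></comment>"
without_cdata = "<comment> <html> <p> </html> </comment>"

comment = ET.fromstring(with_cdata)
literal_text = comment.text        # the characters between <![CDATA[ and ]]>
child_count = len(list(comment))   # no child elements were created

try:
    ET.fromstring(without_cdata)
    parsed_without_cdata = True
except ET.ParseError:
    parsed_without_cdata = False

print(repr(literal_text), child_count, parsed_without_cdata)
```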
Namespace
Namespaces are used to prevent naming collisions. If no namespace is explicitly declared, the XML elements are not in any namespace, so two elements with the same name cannot be told apart. Suppose we have two XML fragments:
<Book> <Name>The Wonder that was India</Name> <Author>Basham</Author> </Book>
The other XML fragment:
<Book> <Train>Jhelum</Train> <Number>1028</Number> </Book>
Combining the two fragments creates a conflict, because both use the element name Book with different meanings.
A namespace is declared using the xmlns attribute:
xmlns:lib="http://www.oyejava.com/lib" xmlns:trn="http://www.oyejava.com/trn"
A default namespace is declared without a prefix:
xmlns="http://www.crayom.com/lib"
Now the conflict can be resolved using namespace prefixes:
<lib:Book> <lib:Name>The Wonder that was India</lib:Name> <lib:Author>Basham</lib:Author> </lib:Book> <trn:Book> <trn:Train>Jhelum</trn:Train> <trn:Number>1028</trn:Number> </trn:Book>
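To see how a parser keeps the two Book elements apart, here is a minimal sketch with Python's xml.etree.ElementTree. The <root> wrapper element is an assumption added so that both fragments fit into one document; the namespace URIs are the ones declared above.

```python
import xml.etree.ElementTree as ET

# <root> is a hypothetical wrapper so both fragments live in one document
doc = """
<root xmlns:lib="http://www.oyejava.com/lib" xmlns:trn="http://www.oyejava.com/trn">
  <lib:Book><lib:Name>The Wonder that was India</lib:Name></lib:Book>
  <trn:Book><trn:Train>Jhelum</trn:Train><trn:Number>1028</trn:Number></trn:Book>
</root>
"""

# Prefix-to-URI map used for lookups; the prefixes here need not match the document's
ns = {
    "lib": "http://www.oyejava.com/lib",
    "trn": "http://www.oyejava.com/trn",
}

root = ET.fromstring(doc)
lib_book = root.find("lib:Book", ns)
trn_book = root.find("trn:Book", ns)

# Internally each name is qualified by its namespace URI
print(lib_book.tag)
print(trn_book.tag)

title = lib_book.findtext("lib:Name", namespaces=ns)
train = trn_book.findtext("trn:Train", namespaces=ns)
```

The parser identifies elements by namespace URI plus local name, so the prefix itself is only shorthand.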
Validation of XML
One of the big reasons for the popularity of XML is that XML documents can be validated using tools. Validation checks that:
- The document is well formed: it conforms to the XML specification, with tags closed properly and in a balanced way.
- The document conforms to the constraints defined in a schema definition.
The grammar of an XML document is defined using a schema definition language. There are two ways to define it:
- DTD (Document Type Definition) - This has almost gone out of fashion, but you might still see a DTD referenced in XML documents.
- XSD (XML Schema Definition) - This is the prevalent way of defining XML grammar. An XSD is itself written in XML, so it can be processed by tools that understand XML.
XSD
Let's write an XSD for our user.xml.
user.xsd
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      User schema for oyejava.com
      Copyright 2008 oyejava.com. All rights reserved.
    </xsd:documentation>
  </xsd:annotation>
  <xsd:element name="user" type="UserType"/>
  <!-- Global element referenced below through ref="comment" -->
  <xsd:element name="comment" type="xsd:string"/>
  <xsd:complexType name="UserType">
    <xsd:sequence>
      <xsd:element name="homeAddress" type="Address"/>
      <xsd:element name="officeAddress" type="Address"/>
      <xsd:element ref="comment" minOccurs="0"/>
      <xsd:element name="productBought" type="Products"/>
    </xsd:sequence>
    <xsd:attribute name="startDate" type="xsd:date"/>
  </xsd:complexType>
  <xsd:complexType name="Address">
    <xsd:sequence>
      <xsd:element name="houseNo" type="xsd:string"/>
      <xsd:element name="society" type="xsd:string"/>
      <xsd:element name="locality" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
      <xsd:element name="pin" type="xsd:decimal"/>
    </xsd:sequence>
    <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="India"/>
  </xsd:complexType>
  <xsd:complexType name="Products">
    <xsd:sequence>
      <xsd:element name="product" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="name" type="xsd:string"/>
            <xsd:element name="quantity">
              <xsd:simpleType>
                <xsd:restriction base="xsd:positiveInteger">
                  <xsd:maxExclusive value="100"/>
                </xsd:restriction>
              </xsd:simpleType>
            </xsd:element>
            <xsd:element name="price" type="xsd:decimal"/>
            <xsd:element ref="comment" minOccurs="0"/>
            <xsd:element name="purchaseDate" type="xsd:date" minOccurs="0"/>
          </xsd:sequence>
          <xsd:attribute name="productNo" type="productCode" use="required"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
  <!-- Product Code, a code for identifying products -->
  <xsd:simpleType name="productCode">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[A-Z][A-Z][A-Z][0-9][0-9][0-9]"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
Let's look into different definitions in XSD.
Elements
An element declaration defines the content of an element in an XML data document. For example:
<xsd:element name="user" type="UserType"/>
An element's type can be primitive (simple) or complex. Primitive types are defined by the XML Schema specification. Commonly used ones include:
- string
- hexBinary and base64Binary
- boolean
- decimal
- double
- float
- anyURI
- dateTime
- duration
We can also define new simple types.
The number of occurrences of an element can be constrained.
- minOccurs for the minimum number of occurrences
- maxOccurs for the maximum number of occurrences
- When unspecified, each defaults to 1.
<xsd:element name="product" minOccurs="1" maxOccurs="unbounded">
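To make the meaning of these two attributes concrete, the sketch below hand-rolls the same check in Python. It is not how a schema validator is implemented; the function occurs_ok is hypothetical, and None is used here to stand in for maxOccurs="unbounded".

```python
import xml.etree.ElementTree as ET


def occurs_ok(parent, tag, min_occurs=1, max_occurs=1):
    """Check the number of direct children named tag against the bounds.

    Both bounds default to 1, mirroring the XSD defaults.
    max_occurs=None stands in for maxOccurs="unbounded".
    """
    count = len(parent.findall(tag))
    if count < min_occurs:
        return False
    return max_occurs is None or count <= max_occurs


# Illustrative instances
bought = ET.fromstring("<productBought><product/><product/></productBought>")
empty = ET.fromstring("<productBought/>")

print(occurs_ok(bought, "product", 1, None))  # minOccurs="1" maxOccurs="unbounded"
print(occurs_ok(empty, "product", 1, None))   # at least one product is required
print(occurs_ok(empty, "product", 0, None))   # minOccurs="0" makes it optional
print(occurs_ok(bought, "product"))           # defaults: exactly one occurrence
```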
Complex types consist of other elements and attributes.
<xsd:complexType name="UserType"> <xsd:sequence> <xsd:element name="homeAddress" type="Address"/> <xsd:element name="officeAddress" type="Address"/> <xsd:element ref="comment" minOccurs="0"/> <xsd:element name="productBought" type="Products"/> </xsd:sequence> <xsd:attribute name="startDate" type="xsd:date"/> </xsd:complexType>
In the hierarchical structure of elements, the lowest-level elements (those with no child elements or attributes) are of simple type; the rest are all complex types. Elements can be nested very deeply, and the schema language does not impose any restriction on this.
When the definition of an element's type is not going to be reused, we can define it inline as a nameless (anonymous) type.
<xsd:complexType name="Products"> <xsd:sequence> <xsd:element name="product" minOccurs="0" maxOccurs="unbounded"> <xsd:complexType> <xsd:sequence> <xsd:element name="name" type="xsd:string"/> <xsd:element name="quantity"> … </xsd:sequence> <xsd:attribute name="productNo" type="productCode" use="required"/> </xsd:complexType>
Attributes
Attributes provide additional information to an XML data element.
<xsd:complexType name="Address"> <xsd:sequence> … </xsd:sequence> <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="India"/> </xsd:complexType>
To make an attribute mandatory, set use="required". Attributes are optional by default (use="optional"); minOccurs and maxOccurs apply to elements, not to attributes.
<xsd:attribute name="productNo" type="productCode" use="required"/>
An enumeration can be used to restrict the allowed values:
<xsd:attribute name="country" default="India"> <xsd:simpleType> <xsd:restriction base="xsd:string"> <xsd:enumeration value="India"></xsd:enumeration> <xsd:enumeration value="Nepal"></xsd:enumeration> </xsd:restriction> </xsd:simpleType> </xsd:attribute>
Inheritance of Complex Types
XML Schema provides two types of inheritance:
- Extension
- Restriction
Extension inherits the elements and attributes of the base type and adds new ones.
<xsd:complexType name="ExtendedAddress"> <xsd:complexContent> <xsd:extension base="om:Address"> <xsd:sequence> <xsd:element name="state" type="xsd:string" /> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType>
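Restriction narrows the base type: the derived type lists only the elements it keeps, and every instance of the restricted type must still be valid against the base type. An element can therefore be left out only if it is optional (minOccurs="0") in the base type, so the example below is valid only if the omitted elements are declared optional in Address.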
<xsd:complexType name="RestrictedAddress"> <xsd:complexContent> <xsd:restriction base="om:Address"> <xsd:sequence> <xsd:element name="city" type="xsd:string" /> <xsd:element name="pin" type="xsd:decimal" /> </xsd:sequence> </xsd:restriction> </xsd:complexContent> </xsd:complexType>
Types derived by extension or restriction can be used polymorphically in instance documents through the xsi:type attribute. Extension:
<homeAddress country="India" xsi:type="om:ExtendedAddress"> <houseNo>D-1</houseNo> …. </homeAddress>
Restriction:
<officeAddress country="India" xsi:type="RestrictedAddress"> <houseNo>A-10</houseNo> … </officeAddress>
The parser validates the element against the derived type named in xsi:type.
Abstract Type
An abstract type cannot be used directly in an instance document. A type derived from it must be used instead.
<xsd:complexType name="Address" abstract="true"> <xsd:sequence>
Final
A complex type can be declared as final to control how other types may derive from it.
<xsd:complexType name="Address" final="extension"> <xsd:sequence>
Possible values for the final attribute are:
- restriction – prevents derivation by restriction.
- extension – prevents derivation by extension.
- #all – the type cannot be derived from at all.
Inheritance of Simple Type
We can fine-tune validation by deriving new simple types from existing ones.
<xsd:simpleType name="price"> <xsd:restriction base="xsd:float"> <xsd:minInclusive value="0"/> <xsd:maxExclusive value="10000"/> </xsd:restriction> </xsd:simpleType>
Facets are used to restrict the allowed values.
The important ones for float are:
- maxInclusive – inclusive upper bound
- maxExclusive – exclusive upper bound
- minInclusive – inclusive lower bound
- minExclusive – exclusive lower bound
- enumeration – set of allowed values
- pattern – format of values using regular expression.
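The effect of these facets can be imitated in plain Python to get a feel for what a validator checks. This is only a sketch: valid_price mirrors the minInclusive and maxExclusive values of the price type above, and valid_product_code assumes the intended productCode pattern is three uppercase letters followed by three digits. Both function names are made up for this example.

```python
import re

# Assumed productCode pattern: three uppercase letters, then three digits
PRODUCT_CODE = re.compile(r"[A-Z][A-Z][A-Z][0-9][0-9][0-9]")


def valid_price(text):
    """Imitate minInclusive="0" and maxExclusive="10000" on a float value."""
    try:
        value = float(text)
    except ValueError:
        return False
    return 0 <= value < 10000


def valid_product_code(text):
    """Imitate the pattern facet; XSD patterns must match the whole value."""
    return PRODUCT_CODE.fullmatch(text) is not None


print(valid_price("89"), valid_price("10000"))
print(valid_product_code("AAA123"), valid_product_code("XYYZ123"))
```

Note that fullmatch is used rather than search, because a pattern facet always applies to the entire value.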
Importing Schema
A schema may import types from other schemas, allowing more modular schema design and type reuse.
xmlns:abc="http://www.oyejava.com/ABC" .... <import namespace="http://www.oyejava.com/ABC" schemaLocation="http://www.oyejava.com/ABC/abc.xsd"/>
The import mechanism makes it possible to combine schemas from different namespaces into larger, more complex schemas.
To combine schemas that have exactly the same targetNamespace, use include instead.
Schema 1: (Location - http://www.oyejava.com/XYZ/xyz_1.xsd)
<schema targetNamespace="http://www.oyejava.com/XYZ" … >
The second schema includes the first one under the same target namespace.
Schema 2: (Location - http://www.oyejava.com/XYZ/xyz.xsd)
<schema targetNamespace="http://www.oyejava.com/XYZ" … > <include schemaLocation="http://www.oyejava.com/XYZ/xyz_1.xsd"/>