In most (if not all) relational database managers, when we perform a query the results are presented in grid format (or tables), with the data organized in rows and columns. Some, such as SQL Server, still allow this result to be obtained in a different format, such as in tabular text.
Generally, the layout of the data in the form of a table is already sufficient to obtain the expected results, which are often limited to the superficial analysis of the filtered information. However, there are situations where it is necessary to visualize and export the results in a more structured format, that can be read by individuals with little technical knowledge, as well as that can be easily used by other applications, regardless of platform.
XML meets these requirements well and is currently one of the most widely used formats for data traffic between applications (including applications written in different languages and environments). To address this type of situation, SQL Server provides extensive support for XML data formatting, allowing query results to be easily organized, read, and prepared for export.
XML: A Quick Review
XML (eXtensible Markup Language) is a markup language whose main purpose is to share information in a format that allows you to represent the data in a hierarchical structure and can be read in a simple way.
Currently, most programming languages allow you to work with XML for data representation and, mainly, for the exchange of information.
This language is based on tags and information can be represented in the form of elements and attributes:
<clients> <client code="1" name="John"/> <client code="2" name="Mary"/> <client code="3" name="Suzan"/> </clients>
In this example, each tag (client and clients) represents an element, while the code and name are attributes.
XML Results in SQL Server
To format the results of a query as XML, you must use the FOR XML statement, after the SELECT, followed by the specific format you want. This format can take one of the following values:
- RAW
- AUTO
- EXPLICIT
- PATH
The formats mentioned indicate how the result will be organized since the XML standard is very flexible and allows the same sample of data to be displayed with different configurations.
RAW
The RAW representation mode causes each row of the query to be transformed into a generic element named row, whereas each column then represents an attribute of that element. Let’s see a practical example of using RAW in a sales order query:
SELECT Sales.ID AS Sale, SaleItems.Product_ID AS Product, SaleItems.Value, SaleItems.Quantity FROM SaleItems INNER JOIN Sales ON SaleItems.ID_Sale = Sales.ID FOR XML RAW
The result of this query is all the sold items represented by XML elements named as row, and the Sales, Product, Value, and Quantity columns represented as attributes:
<row Sale="1" Product="0101" Value="10" Quantity="2" /> <row Sale="1" Product="1111" Value="20" Quantity="1" /> <row Sale="1" Product="2233" Value="60" Quantity="1" /> <row Sale="2" Product="2233" Value="60" Quantity="2" /> <row Sale="2" Product="1111" Value="20" Quantity="1" /> <row Sale="2" Product="0101" Value="10" Quantity="6" /> <row Sale="3" Product="4444" Value="50" Quantity="1" /> <row Sale="4" Product="9988" Value="15" Quantity="2" />
For practical purposes, however, it will usually be necessary to rename the elements according to what they actually represent, rather than using the generic row nomenclature. To do this, simply indicate after RAW the name that the elements should have. In addition, it is also possible for columns to be represented as elements (children of the element representing the row), rather than attributes, simply using the ELEMENTS directive. Take a look at the previous example changed to use these options:
SELECT Sales.ID AS Sale, SaleItems.Product_ID AS Product, SaleItems.Value, SaleItems.Quantity FROM SaleItems INNER JOIN Sales ON SaleItems.ID_Sale = Sales.ID FOR XML RAW ('SaleItem'), ELEMENTS
And the respective result:
<SaleItem> <Sale>1</Sale> <Product>0101</Product> <Value>10</Value> <Quantity>2</Quantity> </SaleItem> <SaleItem> <Sale>1</Sale> <Product>1111</Product> <Value>20</Value> <Quantity>1</Quantity> </SaleItem> <SaleItem> <Sale>1</Sale> <Product>2233</Product> <Value>60</Value> <Quantity>1</Quantity> </SaleItem>
AUTO
AUTO returns the query result in the form of hierarchically nested elements, where each table involved is represented as an element and its columns as attributes (or child elements if we use the ELEMENTS directive at the end as well as RAW).
This format does not allow for complex customizations over the resulting XML format, so it is useful when you want to get a simple representation quickly while retaining the original characteristics of the tables and columns. Let us see an example of using AUTO:
SELECT Sales.ID AS Sale, SaleItems.Product_ID AS Product, SaleItems.Value, SaleItems.Quantity FROM SaleItems INNER JOIN Sales ON SaleItems.ID_Sale = Sales.ID FOR XML AUTO
And its result in sequence:
<Sales Sale="1"> <SaleItems Product="0101" Value="10" Quantity="2" /> <SaleItems Product="1111" Value="20" Quantity="1" /> <SaleItems Product="2233" Value="60" Quantity="1" /> </Sales> <Sales Sale="2"> <SaleItems Product="2233" Value="60" Quantity="2" /> <SaleItems Product="1111" Value="20" Quantity="1" /> <SaleItems Product="0101" Value="10" Quantity="6" /> </Sales> <Sales Sale="3"> <SaleItems Product="4444" Value="50" Quantity="1" /> </Sales> <Sales Sale="4"> <SaleItems Product="9988" Value="15" Quantity="2" /> </Sales>
Note that this mode gives us a more suitable result for the representation of the data in this example, since it presents the items as child elements of each sale.
EXPLICIT
EXPLICIT mode offers a more flexible, but more complex, way of forming query-driven XML. While the AUTO and RAW modes ensure the well-formed XML with a standard structure, EXPLICIT requires that the XML format be specified in the query, in order to obtain the expected result.
To get the desired structure, you need to create additional fields, name the elements/attributes and set their level in the hierarchy. Take a look at an example of use:
SELECT 1 AS Tag, NULL AS Parent, NULL AS [SoldItems!1!], NULL AS [Item!2!Sale], NULL AS [Item!2!Product], NULL AS [Item!2!Value], NULL AS [Item!2!Quantity] UNION ALL SELECT 2 AS Tag, 1 AS Parent, NULL, Sales.ID AS Code, SaleItems.Product_ID, SaleItems.Value, SaleItems.Quantity FROM SaleItems INNER JOIN Sales ON SaleItems.ID_Venda = Sales.ID FOR XML EXPLICIT
In this example, we create a SaleItems tag at the first level of the hierarchy, and we represent the items as child elements at the second level. The result can be seen in the following:
<SaleItems> <Item Sale="1" Product="0101" Value="10" Quantity="2" /> <Item Sale="1" Product="1111" Value="20" Quantity="1" /> <Item Sale="1" Product="2233" Value="60" Quantity="1" /> <Item Sale="2" Product="2233" Value="60" Quantity="2" /> <Item Sale="2" Product="1111" Value="20" Quantity="1" /> <Item Sale="2" Product="0101" Value="10" Quantity="6" /> <Item Sale="3" Product="4444" Value="50" Quantity="1" /> <Item Sale="4" Product="9988" Value="15" Quantity="2" /> </SaleItems>
PATH
PATH, like EXPLICIT, allows for more customization of the resulting XML structure, but with a simpler syntax, compared to EXPLICIT. With this mode you can easily represent data hierarchically, including subqueries, and rename elements and attributes using aliases.
Below, we have an example of a query using the PATH. Note that you can nest queries and reset the root element of the result. And compared to EXPLICIT mode, the syntax is much more intuitive:
SELECT Sales.ID AS '@Code', ( SELECT Product_ID AS '@Reference', Quantity AS '@Quantity', Value AS '@Value' FROM SaleItems WHERE Sale_ID = Sales.ID FOR XML PATH ('Product'), TYPE ) AS 'Items' FROM Sales FOR XML PATH ('Sale'), root('Sales')
In the subquery that returns the items of the sale we also apply the FOR XML PATH, so that the results are also represented as XML. The TYPE directive causes this subquery to be treated as a property of the outermost item and thus be displayed as XML, without which the results would be displayed as plain text.
After the PATH, we can name the elements (Product and Sale) as well as define the name of the root element, which in this case is the Sales tag. The result can be seen below:
<Sales> <Sale Code="1"> <Items> <Product Reference="0101" Quantity="2" Value="10" /> <Product Reference="1111" Quantity="1" Value="20" /> <Product Reference="2233" Quantity="1" Value="60" /> </Items> </Sale> <Sale Code="2"> <Items> <Product Reference="2233" Quantity="2" Value="60" /> <Product Reference="1111" Quantity="1" Value="20" /> <Product Reference="0101" Quantity="6" Value="10" /> </Items> </Sale> <Sale Code="3"> <Items> <Product Reference="4444" Quantity="1" Value="50" /> </Items> </Sale> <Sale Code="4"> <Items> <Product Reference="9988" Quantity="2" Value="15" /> </Items> </Sale> </Sales>
Conclusion
Using the FOR XML statement in queries can make it easier, for example, to export data to other applications. This feature can be used by both developers who want to implement certain functionalities in their applications, as well as by DBAs or anyone else responsible for querying and extracting information from a database.
In general, when exporting data to other applications, you should follow an XML formation scheme. Knowing the structure that should have the generated XML, just identify which of the modes is the most appropriate and make the appropriate customizations.