Main| Programming| Travel| Life| Audio Books| Movies| About Me
New Features of SQL Server 2005. A Developer's Perspective.
2009-06-03
I wrote this document about new features of MS SQL Server 2005 over 2 years ago, but it is still interesting. I am planning on writing one about MS SQL Server 2008.
This presentation is developer-centric, and is only supposed to give you a taste of new features of SQL Server 2005. I will concentrate on CLR and T-SQL. I am not going to write about Notification Services, Reporting Services, Analysis Services, Integration Services, Web Services, database encryption and many other new or enhanced features of SQL Server 2005. Don't worry though… there are still plenty of new features left. Service Broker should be actually a part of this presentation, because together with CLR integration and XML data type support it allows easy creation of high quality asynchronous applications.
The presentation is divided into 3 parts:
1. CLR Hosting
2. T-SQL Enhancements
3. Other Cool Stuff
My main source was Books On-line (BOL) and the following technical articles: XML Best Practices for Microsoft SQL Server 2005
http://msdn2.microsoft.com/en-us/library/ms345115.aspx
Introduction to XQuery in SQL Server 2005:
http://msdn2.microsoft.com/en-us/library/ms345122.aspx
Performance Optimizations for the XML Data Type in SQL Server 2005:
http://msdn2.microsoft.com/en-us/library/ms345118.aspx
SQL Server 2005 Row Versioning-Based Transaction Isolation:
http://msdn2.microsoft.com/en-us/library/ms345124.aspx
Building Reliable, Asynchronous Database Applications Using Service Broker:
http://msdn2.microsoft.com/en-us/library/ms345113.aspx
Using CLR Integration in SQL Server 2005:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql90/html/sqlclrguidance.asp
Please note that the examples are not supposed to be production quality.
Here we go!
CLR Hosting
Why would you host CLR inside SQL Server?
You cannot do some things with T-SQL alone, and you can make some things run faster using CLR. If something can be done simply in T-SQL like a query or an update, do it in T-SQL. If it requires string manipulation, computations, or a cursor, do it in CLR. If in doubt, test both scenarios.
ADO.net Extensions
Microsoft.SqlServer.Server
Context Connection
using(SqlConnection connection = new SqlConnection("context connection=true")) { connection.Open(); // Use the connection }
SqlContext
Use SqlContext to get access to SqlPipe, SqlTriggerContext and WindowsIdentity.
SqlPipe
Use SqlPipe to send data to the client.
Enabling CLR
Before you can load assemblies into SQL Server, you need to enable CLR:
EXEC sp_configure 'show advanced options', 1 EXEC sp_configure 'clr_enabled', 1 RECONFIGURE
Visual Studio 2005 has a template for SQL Server projects. Select New Project -> Visual C# -> Database -> SQL Server Project.
Then you can add template files for:
• User-Defined Function
• Aggregate
• Stored Procedure
• Trigger
• User-Defined Type
After you compile your assembly you must load it into SQL Server:
CREATE ASSEMBLY SqlClr FROM 'C:\development\SqlClr\SqlClr\bin\Debug\SqlClr.dll'
After the initial load, you can use the ALTER ASSEMBLY statement.
User-Defined Functions
For string manipulation, or other functions not present in T-SQL, using CLR functions instead of T-SQL can improve performance significantly. User-defined functions can be either scalar or table-valued. Here is an example of a table-valued UDF:
CREATE FUNCTION AllCultures() RETURNS TABLE (NativeName nvarchar(200), EnglishName nvarchar(200), LanguageTag nvarchar(20)) AS EXTERNAL NAME SqlClr.UserDefinedFunctions.GetCultures GO
The output:
SELECT * FROM AllCultures() NativeName EnglishName LanguageTag العربية Arabic Ar български Bulgarian Bg català Catalan Ca 中文(简体) Chinese (Simplified) zh-Hans
Here is the source code for SqlClr.UserDefinedFunctions.GetCultures:
using System; using System.Data; using System.Globalization; using System.Collections; using System.Data.SqlClient; using System.Data.SqlTypes; using Microsoft.SqlServer.Server; public partial class UserDefinedFunctions { [SqlFunction(FillRowMethodName = "GetCulturesFillRow")] public static IEnumerable GetCultures() { return CultureInfo.GetCultures(CultureTypes.AllCultures); } public static void GetCulturesFillRow(Object source, out SqlChars nativeName, out SqlChars englishName, out SqlChars ietfLanguageTag) { CultureInfo _Culture = (CultureInfo)source; nativeName = new SqlChars (_Culture.NativeName); englishName = new SqlChars(_Culture.EnglishName); ietfLanguageTag = new SqlChars(_Culture.IetfLanguageTag); } }
An example of a scalar-valued function:
CREATE FUNCTION TableRowCount(@tableName sysname) RETURNS bigint AS EXTERNAL NAME SqlClr.UserDefinedFunctions.TableRowCount GO
Usage:
PRINT dbo.TableRowCount('Person.Address')
And the source code:
[SqlFunction(DataAccess = DataAccessKind.Read)] public static long TableRowCount(string tableName) { using (SqlConnection _Connection = new SqlConnection("context connection=true")) { _Connection.Open(); SqlCommand _Command = new SqlCommand("SELECT COUNT_BIG(*) FROM " + tableName + " WITH (nolock)", _Connection); return (long)_Command.ExecuteScalar(); } }
Aggregates
Custom aggregates are as fast as built in T-SQL aggregates like MAX(), SUM(), etc..
CREATE AGGREGATE Concatenate (@input nvarchar(4000)) RETURNS nvarchar(max) EXTERNAL NAME SqlClr.Concatenate GO SELECT Title, dbo.Concatenate(FirstName) AS [First Names] FROM Person.Contact GROUP BY Title
The output:
Title First Names -------- --------------------------------------------- Sr. José, Jésus, Anibal, José, Luis, Gustavo, Ciro, Humberto, Alvaro, Adrian, Ramón Sra. Janeth, Pilar, Janaina Barreiro Gambaro [...]
Here is the source code for SqlClr.Concatenate:
using System; using System.Text; using System.IO; using System.Data; using System.Data.SqlClient; using System.Data.SqlTypes; using Microsoft.SqlServer.Server; [Serializable] [SqlUserDefinedAggregate( Format.UserDefined, //use clr serialization to serialize the intermediate result IsInvariantToNulls = true, //optimizer property IsInvariantToDuplicates = false, //optimizer property IsInvariantToOrder = false, //optimizer property MaxByteSize = 8000) //maximum size in bytes of persisted value ] public struct Concatenate : IBinarySerialize { private StringBuilder _IntermediateResult; private const int _MaxSize = 7994; // 8000 - 2 control bytes - 4 bytes for 2 UTF-16 characters = 7994 public void Init() { _IntermediateResult = new StringBuilder(); } public void Accumulate(SqlString value) { if (!value.IsNull && (_IntermediateResult.Length + value.GetUnicodeBytes().Length < _MaxSize)) { _IntermediateResult.Append(value.Value).Append(", "); } } public void Merge(Concatenate group) { if ((_IntermediateResult.Length + group._IntermediateResult.Length) < _MaxSize) { _IntermediateResult.Append(group._IntermediateResult); } } public SqlString Terminate() { string output = String.Empty; // Delete the trailing comma and space, if any if (_IntermediateResult != null && _IntermediateResult.Length > 1) { output = _IntermediateResult.ToString(0, _IntermediateResult.Length - 2); } return new SqlString(output); } public void Read(BinaryReader reader) { _IntermediateResult = new StringBuilder(reader.ReadString()); } public void Write(BinaryWriter writer) { writer.Write(_IntermediateResult.ToString()); } }
Stored Procedure
An example:
[..] using Microsoft.SqlServer.Server; public partial class StoredProcedures { [Microsoft.SqlServer.Server.SqlProcedure] public static void GetWorkOrders() { if (SqlContext.IsAvailable) { using (SqlConnection _SqlConnection = new SqlConnection("context connection=true")) { _SqlConnection.Open(); SqlCommand _SqlCommand = new SqlCommand("SELECT * FROM Production.WorkOrder", _SqlConnection); SqlContext.Pipe.ExecuteAndSend(_SqlCommand); } } } [Microsoft.SqlServer.Server.SqlProcedure] public static void GetWorkOrder(SqlInt32 workOrderId) { if (SqlContext.IsAvailable) { using (SqlConnection _SqlConnection = new SqlConnection("context connection=true")) { _SqlConnection.Open(); SqlCommand _SqlCommand = new SqlCommand("SELECT * FROM Production.WorkOrder WHERE WorkOrderID = @WorkOrderID", _SqlConnection); _SqlCommand.Parameters.AddWithValue("@WorkOrderID", workOrderId); SqlContext.Pipe.ExecuteAndSend(_SqlCommand); } } } }
Compile, update the assembly and create procedure in T-SQL:
ALTER ASSEMBLY SqlClr FROM 'C:\development\SqlClr\SqlClr\bin\Debug\SqlClr.dll' CREATE PROCEDURE GetWorkOrdersCLR AS EXTERNAL NAME SqlClr.StoredProcedures.GetWorkOrders CREATE PROCEDURE GetWorkOrderCLR (@workOrderId INT) AS EXTERNAL NAME SqlClr.StoredProcedures.GetWorkOrder
User-Defined Types (UDTs)
Maintenance of UDTs is difficult: to modify a user type, it has to be dropped. To be dropped, it cannot be used in any tables. There is an 8000 bytes limit to a serialized object size. UDTs are useful for small, not changing types like System.DateTime. User defined types may look like a good candidate for simplifying the application design by moving objects into the database, but currently the Xml data type may be better suited for that.
T-SQL Enhancements
Xml Data Type
Relational or XML Data Model?
If your data is highly structured with known schema, the relational model is likely to work best for data storage. On the other hand, if the structure is semi-structured or unstructured, or unknown, you have to give consideration to modeling such data.
XML is a good choice if you want a platform-independent model in order to ensure portability of the data by using structural and semantic markup. Additionally, it is an appropriate option if any of the following properties are satisfied:
Your data is sparse or you do not know the structure of the data, or the structure of your data may change significantly in the future.
Your data represents containment hierarchy, instead of references among entities, and may be recursive.
Order is inherent in your data.
You want to query into the data or update parts of it, based on its structure.
XML data can be untyped or typed. You need to provide a schema for typed data. Typed XML storage can be made significantly more compact than untyped XML, because the database will store, for example, a number instead of a string. Also, the query performance is better because there are fewer type conversions needed. Before you can create typed xml variables, parameters, or columns, you must first register the XML schema collection. The query-processing engine uses the schema for type checking and to optimize queries and data modification. Qualify your elements, to be able to query them later.
CREATE XML SCHEMA COLLECTION Workflow AS N'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="WF" targetNamespace="WF" elementFormDefault="qualified" attributeFormDefault="unqualified" > <xsd:element name="workprocess"> <xsd:complexType mixed="true"> <xsd:sequence minOccurs="1" maxOccurs="1"> <xsd:element name="openingStep"/> <xsd:element name="step" minOccurs="0" maxOccurs="unbounded"/> <xsd:element name="closingStep"/> </xsd:sequence> <xsd:attribute name="workprocessID" type="xsd:integer" use="required"/> </xsd:complexType> </xsd:element> </xsd:schema>' GO   SELECT * FROM sys.xml_schema_collections DECLARE @workflowExample XML (Workflow) SET @workflowExample = N'<workprocess xmlns="WF" workprocessID="1"> <openingStep/> <step>Hello World!</step> <step>Bye</step> <closingStep/> </workprocess> ' SELECT @workflowExample AS workflowExample --DROP XML SCHEMA COLLECTION Workflow CREATE TABLE WorkProcess (ID uniqueidentifier primary key, Definition XML (Workflow)) SET @workflowExample = N'<workprocess xmlns="WF" workprocessID="2"> <openingStep/> <step>Second work process</step> <closingStep/> </workprocess> ' INSERT INTO WorkProcess VALUES (newid(), @workflowExample)
The xml data type has the following methods: query, value, exist, modify, nodes. You call these methods by using the user-defined type method invocation syntax. Example of value :-):
WITH XMLNAMESPACES ('WF' AS wf) SELECT Definition.value( '(/wf:workprocess/@workprocessID)[1]', 'int' ) AS [Work Process ID] FROM WorkProcess
Example of exist:
WITH XMLNAMESPACES ('WF' AS wf) SELECT Definition FROM WorkProcess WHERE Definition.exist('/wf:workprocess[@workprocessID=2]') = 1
XQuery
Xml Query is based on the XPath query language. Functions include: contains, substring, last, position, avg, max, sum, data, etc.
WITH XMLNAMESPACES ('WF' AS wf) SELECT Definition.query( 'data(/wf:workprocess/@workprocessID)') AS [Work Process ID] FROM WorkProcess
FLWOR (For, Let, Where, Order by, Return)
Look at the following path expression:
doc("books.xml")/bookstore/book[price>30]/title
The expression above will select all the title elements under the book elements that are under the bookstore element that have a price element with a value that is higher than 30. The following FLWOR expression will select exactly the same as the path expression above:
for $x in doc("books.xml")/bookstore/book where $x/price>30 return $x/title
The result will be:
<title lang="en">XQuery Kick Start</title> <title lang="en">Learning XML</title>
With FLWOR you can sort the result:
for $x in doc("books.xml")/bookstore/book where $x/price>30 order by $x/title return $x/title
XML DML
The XML Data Modification Language (XML DML) is an extension of the XQuery language. The XML DML adds the following case-sensitive keywords to XQuery: insert, delete, replace value of, An example:
SET @workflowExample.modify(' insert <step>One more step after the 2nd</step> after (/workprocess/step)[2] ')
XML Indexing
The first index on the xml type column must be the primary XML index. Three types of secondary indexes are supported: PATH, VALUE, and PROPERTY. The primary XML index is a shredded and persisted representation of the XML BLOBs in the xml data type column. For each XML binary large object (BLOB) in the column, the index creates several rows of data. The number of rows in the index is approximately equal to the number of nodes in the XML binary large object.
Each row stores the following node information:
• Tag name such as an element or attribute name.
• Node value.
• Node type such as an element node, attribute node, or text node.
• Document order information, represented by an internal node identifier.
• Path from each node to the root of the XML tree. This column is searched for path expressions in the query.
• Primary key of the base table. The primary key of the base table is duplicated in the primary XML index for back join with the base table, and the maximum number of columns in primary key of the based table is limited to 15.
Enhanced Data Types
Use varchar(max), nvarchar(max), and varbinary(max) data types instead of text, ntext, and image data types which will be removed in a future version of SQL Server. The “max” data types can store up to 2^31-1 bytes (2GB). Use char when the sizes of the column data entries are consistent. Use varchar when the sizes of the column data entries vary considerably. Use varchar(max) when the sizes of the column data entries vary considerably, and the size might exceed 8,000 bytes. The large-value data types do not have the limitations of text, ntext and image data types: they can be used in cursors, triggers, and string functions like SUBSTRING, CHARINDEX. The large-value data type columns can be updated partially using the new .WRITE clause in the UPDATE statement.
Common Table Expressions (CTEs)
CTEs:
• Make code more readable.
• Allow easier implementation of recurring processing.
WITH DirectReports (ManagerID, EmployeeID, Title, DepartmentID, [Level]) AS ( -- Anchor member definition SELECT e.ManagerID, e.EmployeeID, e.Title, edh.DepartmentID, 0 AS [Level] FROM HumanResources.Employee AS e INNER JOIN HumanResources.EmployeeDepartmentHistory AS edh ON e.EmployeeID = edh.EmployeeID AND edh.EndDate IS NULL WHERE ManagerID IS NULL UNION ALL -- Recursive member definition SELECT e.ManagerID, e.EmployeeID, e.Title, edh.DepartmentID, [Level] + 1 FROM HumanResources.Employee AS e INNER JOIN HumanResources.EmployeeDepartmentHistory AS edh ON e.EmployeeID = edh.EmployeeID AND edh.EndDate IS NULL INNER JOIN DirectReports AS d ON e.ManagerID = d.EmployeeID ) -- Statement that executes the CTE SELECT ManagerID, EmployeeID, Title, DepartmentID, [Level] FROM DirectReports order by DepartmentID, [Level]
Result:
ManagerID EmployeeID Title DepartmentID Level ----------- ----------- -------------------------------------------------- ------------ ----------- 109 12 Vice President of Engineering 1 1 12 3 Engineering Manager 1 2 3 267 Senior Design Engineer 1 3 3 270 Design Engineer 1 3 […]
TRY…CATCH
Try… Catch in T-SQL does not catch everything but still can be better then checking @@error after every command. Make sure to review the list of errors which are not caught by TRY…CATCH
-- Create procedure to retrieve error information. CREATE PROCEDURE GetErrorInfo AS SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_SEVERITY() AS ErrorSeverity, ERROR_STATE() AS ErrorState, ERROR_PROCEDURE() AS ErrorProcedure, ERROR_LINE() AS ErrorLine, ERROR_MESSAGE() AS ErrorMessage; GO BEGIN TRY -- Generate divide-by-zero error. SELECT 1/0 END TRY BEGIN CATCH -- Execute the error retrieval routine. EXECUTE GetErrorInfo; END CATCH
Here is the result:
ErrorNumber ErrorSeverity ErrorState ErrorProcedure ErrorLine ErrorMessage ----------- ------------- ----------- -------------- ----------- -------------------- 8134 16 1 NULL 4 Divide by zero error
TOP (@variable)
Previously, we had to use dynamic SQL to retrieve the variable number of rows. TOP can be used not only in SELECT but also in UPDATE, DELETE and INSERT statements in SQL Server 2005. The most common use would be in SELECT. In other commands, rows selected for the operation are not guaranteed to be in any particular order even if ORDER BY is used.
DECLARE @pageSize AS INT SET @pageSize=20 SELECT TOP(@pageSize) * FROM HumanResources.Employee
PIVOT and UNPIVOT
PIVOT rotates a table-valued expression by turning the unique values from one column in the expression into multiple columns in the output, and performs aggregations where necessary on any remaining column values that are desired in the final output. UNPIVOT performs almost the reverse operation of PIVOT, by rotating columns into rows.
table_source PIVOT ( aggregate_function ( value_column ) FOR pivot_column IN ( <column_list> )
Specifies that the table_source is pivoted based on the pivot_column. table_source is a table or table expression.
The output is a table that contains all columns of the table_source except the pivot_column and value_column.
The columns of the table_source, except the pivot_column and value_column, are called the grouping columns of the pivot operator. PIVOT performs a grouping operation on the input table with regard to the grouping columns and returns one row for each group. Additionally, the output contains one column for each value specified in the column_list that appears in the pivot_column of the input_table.
SELECT ProductID, Color, ProductLine FROM Production.Product ProductID Color ProductLine ----------- --------------- ----------- 680 Black R 706 Red R 707 Red S 708 Black S 709 White M 710 White M [..] SELECT * FROM (SELECT ProductID, Color, ProductLine FROM Production.Product) AS productTable PIVOT ( COUNT(ProductID) FOR Color in ([Blue], [Red], [Black], [Silver], [Yellow]) ) AS pivotTable ProductLine Blue Red Black Silver Yellow ----------- ----------- ----------- ----------- ----------- ----------- NULL 0 0 7 12 0 M 0 0 42 30 0 R 0 37 31 0 14 S 4 1 11 1 4 T 22 0 2 0 18
ROW_NUMBER
One of the ranking functions. ROW_NUMBER() returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.
SELECT ROW_NUMBER() OVER (PARTITION BY PostalCode ORDER BY City) AS RowNumber, AddressLine1, City, PostalCode FROM Person.Address RowNumber AddressLine1 City PostalCode -------------------- ------------------------------------------------------------ ------------------------------ --------------- […] 29 Werftstr 54 Leipzig 04139 30 Wertheimer Straße 899 Leipzig 04139 31 Zur Lindung 46 Leipzig 04139 1 Welt Platz 99 Eilenburg 04838 2 Wasserstr 63 Eilenburg 04838 3 Waldstr 9 Eilenburg 04838 […]
TABLESAMPLE
For selecting random row sets. It actually selects pages (8KB) not rows, so it is approximate and does not work well for small tables.
SELECT * FROM Production.Product TABLESAMPLE (10 PERCENT)
Other Cool Stuff
DDL Triggers
CREATE TRIGGER safety ON DATABASE FOR DROP_TABLE, ALTER_TABLE AS ROLLBACK ; RAISERROR('DROP_TABLE, ALTER_TABLE not allowed', 16, 1);
User-Schema Separation
A schema is a collection of database entities that form a single namespace. A namespace is a set in which every element has a unique name.
In SQL Server 2005, schemas exist independently of the database user that creates them. We can finally group database objects around business areas or applications for improved clarity.
We can have tables like: Workflow.Process, Workflow.WorkGroup, Equifax.Address, Experian.Address etc.
Row Versioning
Transactions running under snapshot isolation take an optimistic approach to data modification by not acquiring locks on data until the data is to be modified. The selection of rows to update is done without acquiring locks. When a data row meets the update criteria, the snapshot transaction verifies that the data row has not been modified by a concurrent transaction that committed after the snapshot transaction began. If the data row has been modified outside of the snapshot transaction, an update conflict occurs and the snapshot transaction is terminated. The update conflict is handled by the Database Engine and there is no way to disable the update conflict detection. The tempdb database must have enough space for the version store. When tempdb is full, update operations will stop generating versions and continue to succeed, but read operations might fail because a particular row version that is needed no longer exists.
Summary
SQL Server 2005 brings in a lot of new functionality. I hope that I got you interested in exploring it for yourself.
There are interesting design choices that we can make to ease development and maintenance of large applications. For example, we can now move away from T-SQL stored procedures in favor of managed stored procedures. Imagine that we have all stored procedures in one assembly. That one assembly is the only thing that needs to be deployed.
We can also move away from modeling everything as a relational data in favor of a more flexible and easier to manage Xml data types. We could design tables as containers for Xml data, with one primary key column, one Xml column in most cases. We would only need to deploy new Xml schemas. The number of tables needed could be reduced by a tenfold. We no longer would need to add or alter table columns in most cases.
There are even more design choices on the horizon with .NET Language Integrated Query (LINQ) and Entities projects.