Querying XML fields in LINQ to SQL

A while back I wrote about the limitations of LINQ to SQL when querying data in XML columns. It was frustrating but I hadn’t figured out an elegant solution at the time.

Then at then end of 2010 I attended a DDD8a session which re-opened my eyes to SQL CLR functions and how much better these were than SQL Scalar functions.

And today I linked the two problems in my head – the solution is right there! Write some SQLCLR functions which allow you to query an XML column at the server, rather than having to retrieve the full XML value and search it on the client.

Here is the VB.NET code I wrote as a first test to get the first element (or child element) of a given name and return the value.

    'Get value of an Xml column element (including child elements)
    <Microsoft.SqlServer.Server.SqlFunction()> _
    Public Shared Function XmlGetElementValue(ByVal column As SqlXml,
                                              ByVal elementName As SqlString) As SqlString
        'Check for nulls
        If column.IsNull Then Return SqlString.Null
        If elementName.IsNull Then Return SqlString.Null

        'Create an Xelement object from the XML
        Dim e As XElement = XElement.Parse(column.Value)
        'Try to find an element of given name
        Dim found = e.Descendants(elementName.Value).FirstOrDefault
        If found IsNot Nothing Then
            'return value
            Return New SqlString(found.Value)
            Return SqlString.Null
        End If
    End Function

9 thoughts on “Querying XML fields in LINQ to SQL

  1. I wonder if it would be possible to provide a bit more information.

    1) Could I see the LINQ to SQL statements on the client that reference using this function? Is it same premise as SQL Scalar Functions where you drop the function signature into the *.dbml Data Context and call the function?

    2) Do you have a snippet of the sql generated using these functions? For example if you queried 1000s of rows drilling into the Xml for >= 8 fields of varying types, ordered by one of those columns, and treat it as a ‘paged’ query where you use the LINQ Skip( pagesize * pagenumber ).Take( pagesize) functions?

    I ask this because I was directed to this post from this question (http://stackoverflow.com/questions/4972312/linq-to-sql-xml-field-problem-how-low-down-can-you-intercept-the-generated-ts) and I’m curious what the performance of this method vs SQL Scalar functions are.

    • I generated the method using a SQL-CLR project in Visual Studio and this code generates T-SQL functions which access the CLR methods. I then map these T-SQL functions into my LNQ to SQL datacontext in the normal way. You then call them from your code as if they were regular scalar functions, e.g.

      var db = new MyDataContext();
      var query = from x in db.MyTable where db.MyFunction(x.ColumnName) == "somevalue" select x;

      I have not done any performance tests but I will have a look at doing some. You should check out my other post about the performance of SQL scalar functions vs CLR functions !

  2. Great. Will go read it. If you could provide the generated SQL from a query similiar to yours but selecting multiple Xml fields and filtering and sorting on 1 Xml field, I’d appreciate it. I’m just concerned that any method of querying Xml fields without using the proper XQuery syntax will effectively eliminate the benefit of the indexes SQL creates.

  3. So I’ve got some interesting, and by interesting I mean bad, news. If I write a standard LINQ paging grid query like the following:

    var rows = from r in Table
                     where udfGetXpathValueString(r.Xml,'Profile','name-last').StartsWith("A") ||
                     orderby udfGetXpathValueString(r.Xml,'Profile','name-last').StartsWith("A"),
                     select new { r.Key, 
                                       udfGetXpathValueString(r.Xml,'Profile','name-first') };

    r.Xml is essentially an attributed Xml element ().

    The UDF is basically defined to return a value like:

    @xml.value( ‘(/*[local-name()=sql:variable(“@container”)]/@*[local-name()=sql:variable(“@field”)])[1]’, ‘nvarchar( 4000 )’ ).

    I then wrote a SQL Clr function ‘equivalent’ as:

    public static SqlString GetStringValueFromXml( SqlXml xml, string fieldName, string containerName )
    	if ( xml.IsNull || string.IsNullOrEmpty( fieldName ) ) return SqlString.Null;
    	var e = XElement.Parse( xml.Value );
    	return new SqlString( (string)e.Attribute( fieldName ) );

    I then took the TSQL generated from my LINQ statement above and massaged the query and tried all three for performance testing (I can provide the TSQL if desired).

    Here are my results on a 5000 row sample:

    UDF 2 Column Sort Query: 396 millisecs
    XQuery 2 Column Sort Query: 63 millisecs
    CLR 2 Column Sort Query: 426 millisecs

    So I guess in ‘all’ cases UDFs aren’t bad (well at least not compared to CLR…still brutal when compared to XQuery).

  4. <Profile name-last="Aney" name-first="Terry"/>

    This was supposed to be the r.Xml field above. (hopefully comment doesn’t hide my LessThan and GreaterThan characters.

  5. FYI, my coworker has ‘solved’ the problem. A bit ‘evil’ but doing the job for us. http://chriscavanagh.wordpress.com/2011/03/12/manipulating-linq-to-sql-command-text/

    So basically, now that we can modify, we created some ‘placeholder’ functions in our L2S DataContext that are obviously UDFs. But we intercept command, and swap them out for proper XQuery value/exists() syntax to make sure it is as performant as possible. Let me know if you would like any additional information.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s