Friday, February 01, 2008

Entity SQL Queries for Retrieving EntityReferences and EntitySets

Updated February 2, 2008: See the "Caching Order Entities" near the end of the post and February 10, 2008: See the "Comparative Query Performance Data and T-SQL Sent to Server" topic.

Julie Lerman's Clever LINQ to Entities query trick? Or is it a bug? post of February 1, 2008 describes her query for obtaining an EntitySet (Products) with access to the associated EntityReference (Category) by CategoryID:

var _prod = Northwind.Categories.
            Where(c => c.CategoryName == id).
            OrderBy(c => c.CategoryName).
            Select(c => c.Products).
            First().ToList(); 

The preceding query replaced Brad Abrams' original query in his ASP.NET MVC Example Application over Northwind with the Entity Framework post of January 29, 2008:

List<Product> products = TheProducts.Where(c => c.Category.CategoryName == category).ToList();
              //prepare the view by explicitly loading the categories  
              products.FindAll(p => p.Category == null)
                      .ForEach(p => p.CategoryReference.Load()); 
 
which Julie analyzed in her MVC with Entity Framework ... a twist on Brad Abram's example post of January 31, 2008 and I reported in my "Julie Lerman Morphs Brad Abrams' and Lance Olson's MVC with EF Demo to AdventureWorks, VB, and Streamlined LINQ Queries" topic of LINQ and Entity Framework Posts for 1/28/2008+.
 
My post described a query similar to Julie's for returning an Orders EntitySet and its associated Customer EntityReference:
 
var query = ctxNwind.Customers
    .Where(cust => cust.CustomerID == customerIDTextBox.Text)
    .Select(c => c.Orders
    .Select(o => o))
    .FirstOrDefault()
    .OrderByDescending(o => o.OrderDate);
orderBindingSource.DataSource = query.ToList();
 
The preceding query is based on the the Entity Framework Sample Query Explorer’s LINQ to Entities | Relationship Navigation | Relationship Collection 1 (LinqToEntities70) query:
 
 
I was somewhat surprised that my query or the Explorer's sample query worked because they were missing an explicit eager loading method call. The fact that my query returns Order entities indicates that implicit loading is alive and well in EntityClient.
 
Note: If allowing execution of the preceding query is a bug, the Entity Framework Query Samples' Relationship Navigation group needs fixing, because the group's six samples use similar syntax.

Adding a Query with Explicit Eager-Loading

Julie said the ADO.NET Team's EF development lead, Danny Simmons, believed her replacement for Brad's query that returned Category EntityReferences without executing an explicit  .Include("Category") method to eager-load the tem was a bug.
 
My preceding query has the same problem, so I added an alternative query syntax that parallels Julie's new, more conventional syntax:
 
var query = ctxNwind.Orders
            .Include("Customers")
            .Where(o => o.Customers.CustomerID == customerIDTextBox.Text)
            .OrderByDescending(o => o.OrderDate);
orderBindingSource.DataSource = query.ToList();
 
I also set up a timer and a couple of check boxes with saved state so I could compare the performance of the two query versions. Here's the code for the form's two primary methods:
 
private void NwindEdm_Load(object sender, EventArgs e)
{
    swTimer.Start();
    ctxNwind = new nwEntities();
    customerBindingSource.DataSource = ctxNwind.Customers;
    txtTime.Text = (swTimer.ElapsedMilliseconds / 1000D).ToString("0.000");
    swTimer.Reset();
    isLoaded = true;
    
    // Load persisted CheckBox state
    chkQueryTimeOnly.Checked = Properties.Settings.Default.qryTimeOnly;
    chkUseOldQuery.Checked = Properties.Settings.Default.useOldQuery;
    
    // Load the DataGridView
    customerBindingSource_CurrentChanged(null, null);
}

private void customerBindingSource_CurrentChanged(object sender, EventArgs e)
{
    if (customerIDTextBox.Text.Length == 5) 
    {
        swTimer.Start();
        if (chkUseOldQuery.Checked)
        {
            // Original version from Entity Framework Query Samples project
            var query = ctxNwind.Customers
                        .Where(cust => cust.CustomerID == customerIDTextBox.Text)
                        .Select(c => c.Orders
                        .Select(o => o))
                        .FirstOrDefault()
                        .OrderByDescending(o => o.OrderDate);
            if (chkQueryTimeOnly.Checked)
                swTimer.Stop();
            orderBindingSource.DataSource = query.ToList();
        }
        else
        {
            // Updated version based on Julie Lerman's final query
            var query = ctxNwind.Orders.Include("Customers")
                        .Where(o => o.Customers.CustomerID == customerIDTextBox.Text)
                        .OrderByDescending(o => o.OrderDate);
            if (chkQueryTimeOnly.Checked)
                swTimer.Stop();
            orderBindingSource.DataSource = query.ToList();
        }
        if (isLoaded)
        {
            txtTime.Text = (swTimer.ElapsedMilliseconds / 1000D).ToString("0.000");
            swTimer.Reset();
        }
    }
}

Comparative Query Performance Data and T-SQL Sent to Server

I then ran some performance tests to compare the loading time of the customerBindingSource and orderBindingSource with a single Order EntityRef, as well as the time to update the orderDataGridView:
 

Operation
Original Query Time, s. Updated Query Time, s.
Load text boxes and DataGridView on opening 0.460 0.704
Average to alternate between ALFKI and ANATR 0.034 0.045
Average to alternate, query execute time only 0.031 0.001
 
The original query gave better performance in loading and query execution.
 
I then set up profiler and captured the T-SQL statements sent to SQL Server 2005 Express. The following Customers query executed 21 times on opening either the original or the updated query:
 
SELECT 
[Extent1].[CustomerID] AS [CustomerID], 
[Extent1].[CompanyName] AS [CompanyName], 
[Extent1].[ContactName] AS [ContactName], 
[Extent1].[ContactTitle] AS [ContactTitle], 
[Extent1].[Address] AS [Address], 
[Extent1].[City] AS [City], 
[Extent1].[Region] AS [Region], 
[Extent1].[PostalCode] AS [PostalCode], 
[Extent1].[Country] AS [Country], 
[Extent1].[Phone] AS [Phone], 
[Extent1].[Fax] AS [Fax]
FROM [dbo].[Customers] AS [Extent1]

The original query executed the following parameterized T-SQL query to fill the DataGridView:

exec sp_executesql N'SELECT 
[Project2].[CustomerID] AS [CustomerID], 
[Project2].[C1] AS [C1], 
[Project2].[OrderID] AS [OrderID], 
[Project2].[OrderDate] AS [OrderDate], 
[Project2].[RequiredDate] AS [RequiredDate], 
[Project2].[ShippedDate] AS [ShippedDate], 
[Project2].[Freight] AS [Freight], 
[Project2].[ShipName] AS [ShipName], 
[Project2].[ShipAddress] AS [ShipAddress], 
[Project2].[ShipCity] AS [ShipCity], 
[Project2].[ShipRegion] AS [ShipRegion], 
[Project2].[ShipPostalCode] AS [ShipPostalCode], 
[Project2].[ShipCountry] AS [ShipCountry], 
[Project2].[CustomerID1] AS [CustomerID1], 
[Project2].[EmployeeID] AS [EmployeeID], 
[Project2].[ShipVia] AS [ShipVia]
FROM ( SELECT 
    [Limit1].[CustomerID] AS [CustomerID], 
    [Extent2].[OrderID] AS [OrderID], 
    [Extent2].[CustomerID] AS [CustomerID1], 
    [Extent2].[EmployeeID] AS [EmployeeID], 
    [Extent2].[OrderDate] AS [OrderDate], 
    [Extent2].[RequiredDate] AS [RequiredDate], 
    [Extent2].[ShippedDate] AS [ShippedDate], 
    [Extent2].[ShipVia] AS [ShipVia], 
    [Extent2].[Freight] AS [Freight], 
    [Extent2].[ShipName] AS [ShipName], 
    [Extent2].[ShipAddress] AS [ShipAddress], 
    [Extent2].[ShipCity] AS [ShipCity], 
    [Extent2].[ShipRegion] AS [ShipRegion], 
    [Extent2].[ShipPostalCode] AS [ShipPostalCode], 
    [Extent2].[ShipCountry] AS [ShipCountry], 
    CASE WHEN ([Extent2].[OrderID] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]
    FROM   (SELECT TOP (1) 
        [Extent1].[CustomerID] AS [CustomerID]
        FROM [dbo].[Customers] AS [Extent1]
        WHERE [Extent1].[CustomerID] = @prm_0216d5c2bd6e4cffb03484188ee0983b ) AS [Limit1]
    LEFT OUTER JOIN [dbo].[Orders] AS [Extent2] ON [Limit1].[CustomerID] = [Extent2].[CustomerID]
)  AS [Project2]
ORDER BY [Project2].[CustomerID] ASC, [Project2].[C1] ASC',N'@prm_0216d5c2bd6e4cffb03484188ee0983b nvarchar(5)',@prm_0216d5c2bd6e4cffb03484188ee0983b=N'ALFKI'
 
The updated query executed the following more complex (as well as prolix and inefficient) parameterized T-SQL query:
 
exec sp_executesql N'SELECT 
[Project2].[C1] AS [C1], 
[Project2].[OrderID] AS [OrderID], 
[Project2].[OrderDate] AS [OrderDate], 
[Project2].[RequiredDate] AS [RequiredDate], 
[Project2].[ShippedDate] AS [ShippedDate], 
[Project2].[Freight] AS [Freight], 
[Project2].[ShipName] AS [ShipName], 
[Project2].[ShipAddress] AS [ShipAddress], 
[Project2].[ShipCity] AS [ShipCity], 
[Project2].[ShipRegion] AS [ShipRegion], 
[Project2].[ShipPostalCode] AS [ShipPostalCode], 
[Project2].[ShipCountry] AS [ShipCountry], 
[Project2].[EmployeeID] AS [EmployeeID], 
[Project2].[ShipVia] AS [ShipVia], 
[Project2].[CustomerID] AS [CustomerID], 
[Project2].[CompanyName] AS [CompanyName], 
[Project2].[ContactName] AS [ContactName], 
[Project2].[ContactTitle] AS [ContactTitle], 
[Project2].[Address] AS [Address], 
[Project2].[City] AS [City], 
[Project2].[Region] AS [Region], 
[Project2].[PostalCode] AS [PostalCode], 
[Project2].[Country] AS [Country], 
[Project2].[Phone] AS [Phone], 
[Project2].[Fax] AS [Fax]
FROM ( SELECT 
    [Project1].[OrderID] AS [OrderID], 
    [Project1].[EmployeeID] AS [EmployeeID], 
    [Project1].[OrderDate] AS [OrderDate], 
    [Project1].[RequiredDate] AS [RequiredDate], 
    [Project1].[ShippedDate] AS [ShippedDate], 
    [Project1].[ShipVia] AS [ShipVia], 
    [Project1].[Freight] AS [Freight], 
    [Project1].[ShipName] AS [ShipName], 
    [Project1].[ShipAddress] AS [ShipAddress], 
    [Project1].[ShipCity] AS [ShipCity], 
    [Project1].[ShipRegion] AS [ShipRegion], 
    [Project1].[ShipPostalCode] AS [ShipPostalCode], 
    [Project1].[ShipCountry] AS [ShipCountry], 
    [Extent2].[CustomerID] AS [CustomerID], 
    [Extent2].[CompanyName] AS [CompanyName], 
    [Extent2].[ContactName] AS [ContactName], 
    [Extent2].[ContactTitle] AS [ContactTitle], 
    [Extent2].[Address] AS [Address], 
    [Extent2].[City] AS [City], 
    [Extent2].[Region] AS [Region], 
    [Extent2].[PostalCode] AS [PostalCode], 
    [Extent2].[Country] AS [Country], 
    [Extent2].[Phone] AS [Phone], 
    [Extent2].[Fax] AS [Fax], 
    1 AS [C1]
    FROM   (SELECT 
        [Extent1].[OrderID] AS [OrderID], 
        [Extent1].[CustomerID] AS [CustomerID], 
        [Extent1].[EmployeeID] AS [EmployeeID], 
        [Extent1].[OrderDate] AS [OrderDate], 
        [Extent1].[RequiredDate] AS [RequiredDate], 
        [Extent1].[ShippedDate] AS [ShippedDate], 
        [Extent1].[ShipVia] AS [ShipVia], 
        [Extent1].[Freight] AS [Freight], 
        [Extent1].[ShipName] AS [ShipName], 
        [Extent1].[ShipAddress] AS [ShipAddress], 
        [Extent1].[ShipCity] AS [ShipCity], 
        [Extent1].[ShipRegion] AS [ShipRegion], 
        [Extent1].[ShipPostalCode] AS [ShipPostalCode], 
        [Extent1].[ShipCountry] AS [ShipCountry]
        FROM [dbo].[Orders] AS [Extent1]
        WHERE [Extent1].[CustomerID] IS NOT NULL ) AS [Project1]
    LEFT OUTER JOIN [dbo].[Customers] AS [Extent2] ON [Project1].[CustomerID] = [Extent2].[CustomerID]
    WHERE [Project1].[CustomerID] = @prm_a5a2d18d8e45457dbb2be17c69c09551
)  AS [Project2]
ORDER BY [Project2].[OrderDate] DESC',N'@prm_a5a2d18d8e45457dbb2be17c69c09551 nvarchar(5)',@prm_a5a2d18d8e45457dbb2be17c69c09551=N'ALFKI'

Excessive repetition in unwieldy SQL statements is one of the reasons many developers attempt to avoid code generation by O/RM tools.

Updated February 10, 2008: Fixed typo and added comment re prolix, inefficient, and unwieldy queries.

Caching Order Entities

LINQ to SQL has a "smart caching" feature that avoids a round-trip to the server if executing the query would produce objects already in the cache. Object Services doesn't support this feature. Therefore the original and updated queries force a round-trip to the server to fetch related Order entities each time the user selects a different Customer entity. If you add an Includes("Orders") method call when populating the customerBindingSource, you can save round trips in a WinForm project by applying the original query to a class-level List<Customer> that caches the Customer entity with its Orders EntitySet.

I'll provide an example of this technique as well as using the object cache as the data source in a future update.

Updated: February 2, 2008

T-SQL UPDATE Statements

For a quick check of the updatability of the bound text boxes and DataGridView controls, I added '-Benson' to Maria Anders' last name and changed the Freight charge for the first ALFKI order from $1.21 to $1.25. Following is the UPDATE batch:

exec sp_executesql N'update [dbo].[Customers]
set [ContactName] = @0
where ([CustomerID] = @1)
',N'@0 nvarchar(19),@1 nvarchar(5)',@0=N'Maria Anders-Benson',@1=N'ALFKI'

exec sp_executesql N'update [dbo].[Orders]
set [Freight] = @0
where ([OrderID] = @1)
',N'@0 decimal(5,4),@1 int',@0=1.2500,@1=11011

The above provies that Object Services default concurrency model is last user wins.

Stay tuned for additional analysis of the preceding operations as well as repetition of the preceding operations on other machines running Vista and Windows Server 2008.

0 comments: