Friday, September 28, 2007

LINQ and Entity Framework Posts for 9/28/2007+

Julie Lerman Demonstrates Detached Updates with Entity Framework

The Entity Framework (EF) and LINQ to SQL teams have been talking about about sample projects to demonstrate detached updates and WCF implementations for months. Talk is cheap, but sample projects (like whiskey) cost money, which translates to time for developers.

Julie Lerman's More on Disconnected Entity Framework of October 1, 2007 shows you the code required to achieve her objective with detached and re-attached entities: "Update only relevant properties and have the option to deal with concurrency on a granular level."

Julie starts by recommending that you add a LastModified DateTime property to each entity and field to each persistence table (similar to that required by Sync Services for SQL Server Compact Edition), rather than a timestamp, and have the client update the value when modifying property values. She then shows how to handle entity additions and deletions (no concurrency management) with LINQ queries. Finally, Julie uses the largely undocumented the EF's MetadataWorkspace object and reflection to modify entity property values programmatically for updates. At this point, the ObjectContext.SaveChanges() method handles all persistence chores.

Most of Julie's techniques apply to LINQ to SQL, although detaching and reattaching the same entity to a different DataContext is problematic, as Dinesh Kulkarni notes in his Beta2 to RTM Changes in Attach() Behavior post to the LINQ Project General forum.

Julie promises to blog about handling state in ASP.NET projects and XML serialization of entities. I'm anxious to read her posts about both projects.

From the "Talk Is Cheap" Department:

On June 2, 2007, EF architect Mike Pizzo said in response to request #10 of my Defining the Direction of LINQ to Entities/EDM post of May 29, 2007, "Define the support for n-tier architectures and provide non-trivial, loosely-coupled sample projects with WCF:"

I would love to see us do some more in-depth sample projects with WCF.

Me, too, Mike. But Julie's doing this now.

Danny Simmons says in his September 15, 2007 Concepts Part I: Getting an entity model up and running post about his proposed blog series:

[A]s we go through the series [of blog posts ... explaining various Entity Framework concepts] we probably will also spend some time exploring other app architectures like web services and web apps.

I'm not holding my breath for a WCF sample.

Finally, in his October 1, 2007 Where was I hiding (aka lame excuse for not posting before)? post (see below), Dinesh Kulkarni promises to write a blog post about "Attaching detached objects (if only I can detach them first)" as his second priority (after "Stored Procedures."

Hopefully, Dinesh will include a non-trivial WCF sample project or two.

Considering Microsoft's emphasis on service architectures, it's surprising that more forethought wasn't devoted to detached and, especially WCF, scenarios by both the EF and LINQ to SQL teams.

Dinesh Kulkarni Plans Series of LINQ to SQL Posts

Dinesh's Where was I hiding (aka lame excuse for not posting before)? post of October 1, 2007 discusses why he's been among the missing LINQ to SQL team bloggers and lists seven topics he plans to cover in future posts:

  1. Sprocs and LINQ
  2. Attaching detached objects (if only I can detach them first)
  3. Transacting in LINQ to SQL
  4. Joins vs. n queries for eager loading (why can't you do them joins)
  5. Top 3 bugs that we fixed
  6. What am I doing after Orcas (VS 2008, .NET Fx 3.5) ships
  7. Tintin in LINQ land

I've about exhausted the first "Sprocs and LINQ" topic and am starting on the second "Attaching detached objects" topic in conjunction with WCF. I've already dealt with #4, which I call "JOIN Queries from Hell" because of their poor performance.

However, I've recommended that Dinesh replace topic #5 with "LINQ to SQL vs. DataAdapter Peformance" and some code that proves Soma's claim in his September 27, 2008 VS 2008 Performance Improvements post that :

In our testing, LINQ does in fact out-perform SqlDataAdaptor on almost every test case we tried, and in many of the exceptions, it is no more than 10% slower than using a SqlDataReader to accomplish the same task.  Given the power of LINQ, we feel this is a very reasonable trade-off.

I'm not seeing such results in my tests; I can't even come close to them with my test harnesses.

Save Server Round Trips by Preloading LINQ to SQL EntityRefs

shows you how to minimize the number of server round trips required to populate EntityRef associations with stored procedures. The technique executes a custom stored procedure that retrieves all the EntityRef objects that might be expected to appear on a set of Web pages or in a Windows form session. The eager loading process for the association detects that the entities are already loaded and doesn't invoke the corresponding stored procedure call.

This article uses the same ASP.NET 3.5 test harness as the earlier Problems Using Stored Procedures for LINQ to SQL Data Retrieval (Updated) post.

Added: 10/1/2007

Problems Using Stored Procedures for LINQ to SQL Data Retrieval (Updated)

finally became available on September 30, 2007. This post summarizes the issues expected to remain at RTM when you replace LINQ to SQL's autogenerated, dynamic T-SQL prepared statements for hydrating objects with stored procedures. The post covers issues common to ADO.NET 3.5 and ASP.NET 3.5 projects, as well as those that affect only ASP.NET projects that use the LinqDataSource control.

Here's a screen capture of the OrdersLinqDataSourceSP.sln test harness for pre-loading EntityRef objects and server-side sorting (by Employee.LastName) with a LinqDataSource control:

Click image for full-size screen capture.

The above is one of six test harnesses used to verify workarounds for and analyze comparative performance of stored procedures and dynamically generated T-SQL prepared statements.

Added: 9/20/2007

David Hayden Comments on Soma's LINQ to SQL Performance Post

David Hayden's LINQ To SQL and Visual Studio 2008 Performance Update post of September 28, 2007 mentions that he "would love to hear more about improvements on the way the team is dealing with the chattiness of LINQ To SQL when you are prefetching a graph of objects." So would I, but I'd also like to hear more about the methods used to gather the data for Soma's comparison of LINQ to SQL and DataSet performance. 

David's sample query includes two 1:n associations (Customer:Orders and Order:Order_Details) populated by eager loading the Orders collection for each Customer entity and the Order_Details collection for each Order entity:

using (NorthwindDataContext context =
           new NorthwindDataContext())
{
    DataLoadOptions options = new DataLoadOptions();
    options.LoadWith<Customer>(c => c.Orders);
    options.LoadWith<Order>(o => o.Order_Details);

    context.LoadOptions = options;

    IEnumerable<Customer> customers =
           context.Customers.ToList<Customer>();
}

The preceding code executes 92 queries to load all rows from the three tables into entities and populates my test harness's three bound DataGridView controls in 1.23 seconds. Each query opens and closes a pooled connection.

The reason for 92 queries? According to Scott Guthrie, only one 1:n association is permitted in a single JOIN query to preclude data explosions, as noted in my Clarification of the Object Tracking Problem with LINQ to SQL's Eager Loading Feature post of August 29, 2007.

Note: Deferred loading (the default), more commonly called lazy loading, executes 922 queries: 1 to retrieve all Customer entities, 91 to retrieve Orders for each Customer, and 830 to retrieve Orders with their Order_Details.

Rewriting the sample query as Scott suggested to substitute a m:1 Customer:Order association for the 1:n Customer:Orders association executes two queries to retrieve all entities:

using (NorthwindDataContext context =
           new NorthwindDataContext())
{
    DataLoadOptions options = new DataLoadOptions();
    options.LoadWith<Order>(c => c.Customer);
    options.LoadWith<Order>(o => o.Order_Details);

    context.LoadOptions = options;

    IEnumerable<Order> orders =
           context.Orders.ToList<Order>();
}

The first query is a simple SELECT query that returns all Customers. The second is a JOIN Query from Hell that returns a row for each Order Details item. This query takes 3.21 seconds to execute and fill the three DataGridViews, which includes a few milliseconds to convert the  List<Order> to a filterable BindingListView<Order> that can sync the contents of the Orders DataGridView for the selected Customer entity. (See Support Sorting and Filtering for List<T> Collections of 9/7/2007 for more details on the BindingListView<Order> wrapper.) My code uses ToBindingList<Customer> and ToBindingList<Order> for autogenerated dynamic SQL queries to make all three DataGridViews sortable, which is equivalent to DataSet functionality.

The corresponding typed DataSet loads into the three DataGridViews with three queries in 0.21 seconds.

David also suggests:

Obviously when you get into these scenarios where LINQ To SQL is not providing optimized queries, you can use LINQ To SQL's support for Stored Procedures.

This might be true for very simple queries after RTM when SProcs return the same Table<TEntity> data type as autogenerated dynamic SQL queries but, as Matt Warren noted in response to my Problem Populating a LinqDataSource's EntityRef Association with a Stored Proc issue in the LINQ Project General forum:

There is currently no means by which you can use a single stored procedure to load more than a single type of entity or association at a time.  For example, there is no way to write a stored procedure to retrieve a customer and all its orders and map that result to a heirarchy of customer and order objects.  You would have one stored procedure for retrieving a customer and a separate procedure for retrieving orders for that customer's ID.

This means that LINQ to SQL executes an SProc for each entity, which generates 923 queries for the three Northwind tables, regardless of whether you eager-load or lazy-load associations.

Until RTM or later CTP bits arrive, SProcs for data retrieval deliver a second-class ISingleResult<T> data type instead of the Table<TEntity> data type with Attach() and Detach() methods. The latter also offers a ToBidingList() method so your DataGridViews are sortable (but not filterable). See Upgrade to LINQ to SQL's WinForm Databinding with Stored Procedures Scheduled for RTM for more details on the forthcoming data type change.

Stored procedures to support server-side paging and correct sorting of GridViews bound to LinqDataSource controls contain ugly as sin T-SQL statements, to be charitable. See Dynamic ORDER BY Clauses for ASP.NET Server-Side Paging with Stored Procedures of 9/18, 2007 for examples.

P.S. See Frans Bouma's comment of 9/29/2007 below about the issues raised here.

Added: 9/28/2007 1500 PST Updated: 9/29/2007 Minor edits, additions and clarifications

Mike Taulty Says "Grouping in LINQ is Weird"

His Grouping in LINQ is weird (IGrouping<K,T> is your friend) post of September 28, 2007 explains the non-public GroupedEnumerable type and the IGrouping<K,T> enumeration to those who haven't yet figured out LINQ's grouping story. (The attempt at "an attention-grabbing headline" would only interest a LINQ junkie.)

Added: 9/28/2007 1530 PST

Chris Buckett Sets Up Unit Tests for an Entity Framework Project

Entity Framework - Part 2 - Setting up a project and getting unit tests working of September 28, 2007 includes a downloadable EF Research project that contains a simple Entity Data Model (EDM) generated from the Northwind Customers table.

The most interesting aspect of Chris's article is his use of Visual Studio 2008 Professional Edition's unit test feature provided by the New Project\Test\Test Project template, which you can substitute for NUnit or TestDriven.net.

As Chris learned in this Unit testing / mocking Entity Framework post in the ADO.NET Orcas forum, "there is no planned way to mock the database access in EF," so he's working on a "mock entity data provider."

Frans Bouma's Heavy into His LINQ to SQL Provider at Day 4

In his Developing Linq to LLBLGen Pro, Day 4 post of September 28, 2007, Frans discusses the hazards of retrofitting commercial code to accommodate a new feature: support for  derived tables, which Frans defines in Developing Linq to LLBLGen Pro, Day 2 as the [in-memory] result of execution of SELECT queries in FROM clauses. (This is a narrow view of derived tables, which the SQL 2003 standard defines indirectly as the result of any table subquery.) All modern relational database management systems (RDBMSs) now support derived tables, which are especially useful with aggregate functions. Frans includes sample code for generating a derived table.

Frans next tackles issues with the visitor pattern and observes that Microsoft should have provided a generic visitor class that provider writers could adapt to their database or O/RM tool. It appears that Frans intends to write a giant "switch statement which calls handler routines" instead of a custom visitor class.

Matt Warren, the LINQ to SQL team's chief architect, discusses the visitor pattern and expression trees in LINQ: Building an IQueryable Provider - Part II (Where and Reusable Expression Tree Visitor) and LINQ: Building an IQueryable Provider - Part V (Improved Column Binding).

Note: GoF is the accepted abbreviation for the "Gang of Four," authors of Design Patterns: Elements of Reusable Object-Oriented Software, the seminal work on application of design patterns to software design.

Pablo Castro and Opening Astoria's Door to Additional Data Sources

Pablo's Astoria data sources and system layering post of September 27, 2007 starts by asking and answering, at least in part, "Why does Astoria build on top of the EDM and the Entity Framework?" I'd say that demonstrating the practicality and scalability of the EDM and Entity Framework (EF) in a large-scale data service implementation provided much if not the majority of the initial impetus.

He then goes on to define Astoria's requirements for all data sources and proposes LINQ's IQueryable interface and the use of expression trees for translating Astoria's URI-based HTTP interface into composable (IQueryable) LINQ queries. Pablo cite's Matt Warren's LINQ: Building an IQueryable Provider series as "provid[ing] a great detailed reference" for expression trees. I agree wholeheartedly.

The team also plans to:

[D]efine an interface, something like IUpdatable or whatever names works, that has the basic operations we need to perform in order to handle updates. The interface would have primitive operations for adding a new resource, remove an existing resource, applying modifications to resources and also handle linking/unlinking of resources.

On the whole, the idea of using IQueryable and IUpdatable as the interfaces sounds good to me (and much better than substituting Web3S for POX as a data format, as noted in Deciding Astoria's Final URI Addressing Scheme.)

Soma Somasegar Claims Major Performance Gains for LINQ

Soma Somasegar says the following in his September 27, 2008 VS 2008 Performance Improvements post:

In the VS 2008, we have made some good strides in improving performance for some of the common scenarios that developers care about.  We set explicit performance goals both for new product features and for improving the performance of existing features.  

For example, with the new LINQ facility we set a goal that LINQ performance be significantly better than using a SqlDataAdaptor for the same query and competitive with using a SqlDataReader, the lightest weight interface available for retrieving data from SQL Server.  In our testing, LINQ does in fact out-perform SqlDataAdaptor on almost every test case we tried, and in many of the exceptions, it is no more than 10% slower than using a SqlDataReader to accomplish the same task.  Given the power of LINQ, we feel this is a very reasonable trade-off.

My tests with Beta 2 bits don't show that LINQ to SQL outperforms the SqlDataAdapter on any case I tried. In fact, my LINQ to SQL test harnesses show far slower data retrieval time with dynamic SQL statements or stored procedures than TableAdapters. I have the feeling that Microsoft's test results didn't populate associations while hydrating objects. On the other hand, there might have been a miraculous performance improvement post Beta 2.

Watch for my "Optimizing LINQ to SQL Performance" cover article for the November 2007 issue of Visual Studio Magazine where I provide the execution times for a wide variety of tests with bound DataGridViews for Windows forms and GridView/LinqDataSource combinations for Web pages. You'll be able to download the sample code for the four LINQ to SQL test harnesses that use Northwind as the persistence store. The tests weren't intended to compare LINQ to SQL and DataSet performance, but I built a comparable Windows form test harness for typed data sets and included the load time as a reference value. You can drag and drop a similar DataSet test harness in less than 5 minutes.

Note: My "Layer Data Access with LINQ to SQL" article from the July 2007 issue covered the basics of LINQ to SQL with Beta 1 code.

I also plan to rerun my timing tests for conventional and compiled LINQ to SQL queries with Beta 2 bits to see if I show the gains reported by Rico Mariani in his and my previous posts:

Stay tuned for updates.

Update 9/29/2007: Minor fixes and additions.

Mike Griffen Explains Why He's Lukewarm about Support for LINQ in EntitySpaces: Performance

The .NET Rocks! team released on September 27, 2007 a one-hour interview with Mike Griffen, the creator of MyGeneration and EntitySpaces. MyGeneration is an open-source (BSD licensed) codegen and O/RM tool for .NET that's available for download from SourceForge. EntitySpaces is a commercial persistence layer and business object generator/manager for .NET that supports seven popular relational database management systems. According to Mike, EntitySpaces competes with LLBLGen, NHibernate, and other popular O/RM tools for .NET.

The EntitySpaces Team Blog EntitySpaces and LINQ - An Introduction - Part 1 post of June 18, 2006 (updated September 30, 2006) states: "As LINQ nears its official release we will look at translating the LINQ IQueryable<T> expression trees into our query API." Subsequent EntitySpaces versions' LINQ support was limited to querying previously loaded ObjecNameCollection lists, which implement IEnumerable.

Mike claims that his dynamic query API is very similar to LINQ (16:28). Here's his explanation (18:00):

LINQ is very similar to what we're doing. They have a different syntax than we do. I've played around with it quite a bit, actually. I find our syntax more intuitive, especially when were doing joining and stuff like that. LINQ is very powerful, though. You can LINQ into our collections; we've supported that for a long time.

But we, basically, feel that we're not going to spend a lot of time right now trying to create LINQ providers for EntitySpaces, because LINQ is basically really supported on SQL Server [only]; some other databases are working towards that, but there aren't [other] LINQ providers or good LINQ implementation guidelines for actually creating a LINQ provider.

However, LINQ's performance seems to be Mike's primary concern. Mike says in the interview at 49:20:

From what I gather, [LINQ] uses a lot of reflection. I hear a lot about performance in ADO.NET, the Entity Framework and stuff. So I hear a lot about binding performance, about slowness. ...

From an EntitySpaces perspective I'm not panicked or really worried about where LINQ goes. We're going to support a lot of the LINQ stuff, we'll take advantage of it where we can.

It sounds to me as if Mike won't be following Frans Bouma's lead in translating LINQ expression trees into his O/RM tool's native query language after all.

Added: 9/29/2007 Updated: 10/24/2007 (Fixed Mike Griffen's name in the title)

Ambition Attempts LINQ for Ruby

Java has Quaere so Ruby must deserve a LINQ clone that enables writing queries in Ruby instead of SQL. Welcome to Chris Wanstrath's Ambition project. Ambition queries behave like LINQ queries, that is, they defer execution until you invoke a kicker method, such as detect, each, each_with_index, map and first (with no argument). Examples in the initial Full of Ambition blog of August 30, 2007 use the ActiveRecord "logging hack" to display the SQL for simple User.select, User.first, User.sort_by, User.detect, etc., expressions. There's some similarity to EntitySpaces' string-based query syntax.

As of September 11, 2007's Even More Ambitious post, Ambition's developers had "moved our sights from Rack to LINQ." Ambition has the ambition "to be a query language for SQL, LDAP, XPath, the works."

Anders Noras, the developer of Quaere, describes Ambition's similarities and differences in Ambition, Quaere, Closures and Other Random Thoughts of September 27, 2007, from which I learned of Ambition's existence.

Added: 9/28/2007 1630 PST

1 comments:

Anonymous said...

Not only is their LoadWith approach broken, their hierarchical projection stuff is also broken:
http://blogs.msdn.com/mattwar/archive/2007/08/09/linq-building-an-iqueryable-provider-part-vi.aspx
which requires MARS to even work.

What's funny is that BOTH could be fixed with the same routine: the graph node fetcher could simply use the relation between the entities as specified by the field mapped onto the relation (e.g. 'customer.Orders', which is mapped onto the relation Customer - Order) and then build hash sets for both sides of the relation to merge the objects in memory.

That same routine could be used for fetching hierarchical projections. This means that you again execute one query per node, after eachother (so parent first, then sublevel, then sub sub level etc.) and again you build hash sets for both sides (parent & child) using the predicate expression used in the fetch of the related data. (see the code snippet in the post I linked above)

THis would both optimize the eager loading tremendously, and also make it possible to do efficient fetches of hierarchical projections which also work on systems without mars.(and 1 query per graph level!)

One other thing occured to me, which will be problematic with Linq to sql and Linq in general due to its deferred execution system: what if you have a loop where you fetch data, modify it, save it, fetch other data, modify it, save it etc. etc. in a transaction. You can only do that with Linq to Sql if the fetch code and the save code is in the same routine, because you need the same context! This can be problematic if the query creation code is located somewhere else in the system (and why shouldn't it be).