So you want to write a data layer?

Massive is the data layer I wish I had written. It’s small enough to just drop into any project and sensible enough that you won’t have a massive learning overhead. This is exactly what I needed when I got handed a legacy project recently and needed to make some page loads faster.

SQL Profiler reported an insane number of database calls for what looked like a relatively simple (but frequently visited) page in this application. Delving into the code left me feeling giddy as I navigated through layers of object oriented obscurity down to a custom data layer, finally discovering the root of the problem.

Recreating a cut down version of this application will afford us the pleasure of bypassing a whole heap of code with Massive. Rob Conery has my thanks and an open offer for some beers on my dime if he’s ever in London.

The Scenario

For argument’s sake, we will say we are writing a Sales system for a large organisation with a central office and several regional locations. Central needs an application to collect data from the regions. The spec calls for us to hold one row per item sold per region per quarter. A quick and dirty data diagram might look like this:

Sales Data Diagram

Sales are, as mentioned, line items; one item sold in one region in one quarter = one row in the Sales table. Let’s not argue too much about the data structure. It’s good enough for now.

The Code

Stored Procedure

In keeping with the wisdom of the time when the app was written, stored procedures are used – here’s our “get” proc for sales:

CREATE PROCEDURE [dbo].[GetSales]
    @Id uniqueidentifier = null
AS
BEGIN
    SET NOCOUNT ON;
    IF @Id is null
        BEGIN
            SELECT * FROM Sales
        END
    ELSE
        BEGIN
            SELECT * FROM Sales WHERE Id = @Id
        END

END

We’ve only one parameter in this procedure. In reality, the app I’m working on has ten or twelve with logic in the stored procedure to match. The “if” statements, as you can imagine, multiply enough to confuse the most battle hardened SQL Guru. Ok, maybe not Joe Celko, but certainly the rest of us.

Data Layer

We need an interface – interfaces are good.

namespace DataLayerCatastrophe.DataLayer
{
    public interface IDataAccess
    {
        IDataReader GetById(Guid id);
        IDataReader GetByParameters(DbParameter[] parameters);
    }
}

Every class in our data layer will implement this interface. It's pretty scanty up there; the real thing is messier.

Now, of course, our implementation in the form of a Sales class:

namespace DataLayerCatastrophe.DataLayer
{
    class Sales : IDataAccess
    {
        public IDataReader GetById(Guid id)
        {
            SqlParameter[] p = new SqlParameter[] { new SqlParameter() { ParameterName = "@Id", Value = id } };
            return GetByParameters(p);
        }

        public IDataReader GetByParameters(System.Data.Common.DbParameter[] parameters)
        {
            return Common.Common.GetDataReader(parameters, "GetSales");
        }
    }
}

The actual Sales class fleshes things out; GetById will return a single row from the database based on the Id passed in (Guids all round). GetByParameters calls out to a common function that creates a connection and command, executes a stored procedure, and returns the result. A simple (and probably badly coded) implementation could look like this:

namespace DataLayerCatastrophe.Common
{
    class Common
    {
        public static IDataReader GetDataReader(DbParameter[] p, String procName)
        {
            SqlConnection conn = new SqlConnection(ConfigurationManager.ConnectionStrings["production"].ConnectionString);
            SqlCommand cmd = new SqlCommand(procName,conn);
            cmd.CommandType = CommandType.StoredProcedure;
            foreach (SqlParameter parm in p)
            {
                cmd.Parameters.AddWithValue(parm.ParameterName, parm.Value);
            }
            conn.Open();
            // CloseConnection releases the connection when the caller closes the reader;
            // without it, this method leaks a connection on every call
            SqlDataReader reader = cmd.ExecuteReader(CommandBehavior.CloseConnection);
            return reader;
        }
    }
}

Business Layer

So far so good. Now we need a business layer. No interface this time, I’m not sure why.

namespace DataLayerCatastrophe.BusinessLayer
{
    public abstract class BusinessObject
    {
        protected IDataReader GetDataReader(Guid guid, IDataAccess dataAccess)
        {
            return dataAccess.GetById(guid);
        }
    }
}

namespace DataLayerCatastrophe.BusinessLayer
{
    class Sales : BusinessObject
    {
        public Guid Id { get; set; }
        public Guid RegionId { get; set; }
        public Guid ItemId { get; set; }
        public Guid QuarterId { get; set; }
        public decimal PriceSoldAt { get; set; }

        protected void Populate(Guid id)
        {
            IDataReader dataReader = GetDataReader(id, new DataLayer.Sales());
            while (dataReader.Read())
            {
                Id = dataReader.GetGuid(dataReader.GetOrdinal("Id"));
                RegionId = dataReader.GetGuid(dataReader.GetOrdinal("RegionId"));
                ItemId = dataReader.GetGuid(dataReader.GetOrdinal("ItemId"));
                QuarterId = dataReader.GetGuid(dataReader.GetOrdinal("QuarterId"));
                PriceSoldAt = dataReader.GetDecimal(dataReader.GetOrdinal("PriceSoldAt"));
            }
            dataReader.Close();
        }
        
        public Sales(Guid id)
        {
            Populate(id);
        }
    }
}

And hey presto – we now have a class that can inflate itself to represent a single row of Sales data. The constructor takes a guid and calls the Populate method, which in turn calls the GetDataReader method on the base class, passing in the guid and an instance of the DataLayer.Sales class. GetDataReader makes use of our IDataAccess interface, reducing our coupling.

All is peaceful in the land of sales until we want to return a list of all sales data. At which point, for some unknown reason, those who designed the system figured this would be the most logical implementation:

namespace DataLayerCatastrophe.DataLayer
{
    class ListCatastrophe
    {
        private static List&lt;Guid&gt; GetIdList(SqlParameter[] p, IDataAccess dataAccess)
        {
            List&lt;Guid&gt; guidList = new List&lt;Guid&gt;();
            IDataReader reader = dataAccess.GetByParameters(p);
            while (reader.Read())
            {
                guidList.Add(reader.GetGuid(0));
            }
            reader.Close();
            return guidList;
        }
        public static List&lt;BusinessLayer.Sales&gt; GetSalesList()
        {
            List&lt;BusinessLayer.Sales&gt; sales = new List&lt;BusinessLayer.Sales&gt;();
            SqlParameter[] p = new SqlParameter[0];
            List&lt;Guid&gt; ids = GetIdList(p, new DataLayer.Sales());
            foreach (Guid id in ids)
            {
                sales.Add(new BusinessLayer.Sales(id));
            }
            return sales;
        }
    }
}

What we have here is a class that will, eventually, implement one (or more) methods for each business object, returning lists of objects as and when we need them. The lovely GetSalesList method first uses GetIdList to query the database and return each and every Id in the Sales table, calling GetByParameters on our data layer Sales class. The Sales class, if you remember, calls out to our common GetDataReader function, which actually executes the stored proc with the parameters passed in. In this case there are none, so the lovely if statement in the stored procedure falls through to the 'SELECT *' branch.

Back in GetSalesList, we loop through the list of guids and construct a list of business layer Sales objects, instantiating each one by passing in its Id field. Each business object, of course, inflates itself through the hierarchy of objects, eventually calling the stored procedure with a parameter.

In case this wasn't clear: to return a list of Sales objects, we first hit the database for a list of guids, then hit the database once more for each row of sales data (selecting a single row by its guid), inflating the business objects one by one – N + 1 queries for N rows. As you can appreciate, this pattern makes for a gradual slowdown as the dataset grows. The application collapses under its own weight.

The kicker

As if this wasn’t enough, the consuming code that rendered the page was doing this:

int counter = ListCatastrophe.GetSalesList().Count;

Having gone to all the trouble of bringing back sales objects one by one, all we use the list for is a count.

Endemic

This pattern is endemic in this application. It's the reason I decided to implement a new data layer rather than try to patch what is currently there. Adding to the weight of the layers of code seemed counterproductive when what I really want is to redesign the data and business logic layers. I considered bolting some methods onto ListCatastrophe to return various aggregates. I also considered changing some of the base classes, either in the data layer or the business layer, but it felt like I was just bloating an already overcomplicated situation.

Clean up

After I decided to side-step the current application anatomy, I thought about writing my own minimal data layer. Then I stumbled on Massive and thought I would give it a crack first.

Massive relies on .NET's DynamicObject and ExpandoObject. Dropping the single file into our application allows us to (among other things) execute a quick piece of SQL like this:

var db = Massive.DynamicModel.Open("production");
int counter = Convert.ToInt32(db.Scalar("Select count(*) from Sales"));

One database hit no matter how many rows. Much better. It feels good to regain control. Although I'm perhaps now more tightly coupled, I think stripping all the way back to this is a good thing. Backfilling the application a little at a time will allow me to refactor code incrementally rather than try to second-guess a massive (hehehe) upfront design that would be just as bad as the one I'm replacing. I will most probably end up with a new structure with model classes that inherit from Massive's DynamicModel (see Rob's docs), into which I'll move the various business layer and stored procedure logic (since the logic encoded in the stored procs is pretty much the business logic of the app).
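As a rough sketch of where that refactor might lead (the namespace and the constructor arguments are my assumptions, based on Massive's DynamicModel(connectionStringName, tableName, primaryKeyField) constructor and the schema above), a model class could look something like this:

namespace DataLayerCatastrophe.Models
{
    // "production" is the connection string name used earlier in this post
    public class Sales : Massive.DynamicModel
    {
        public Sales() : base("production", "Sales", "Id") { }
    }
}

// Usage – one query per call, however many rows come back:
// var sales = new Sales();
// var everything = sales.All();      // all rows, dynamically typed
// var one = sales.Single(someGuid);  // single row by primary key

No stored procedure, no interface, no common helper – the whole ListCatastrophe path collapses into a couple of lines.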

Most importantly, my users now have a super fast page load and hopefully, a restored faith in the power of code to help them do their jobs more efficiently.

Meteor – This is what ASP.NET Web Forms could have been

Meteor ticks the boxes for a realtime web application: "one language", "realtime as default". It looks like it eases the path of development, automatically updating your front-end templates when your data changes by implementing a subscription model.

I’m playing with it. Enjoying the feeling of coding *everything* in one language. It’s definitely a smoother process for a developer. And all of a sudden, I get a feeling of déjà vu. Someone else tried to do this before, didn’t they? Someone named Microsoft. I remember back when ASP.NET came out and we all frantically switched from the old-and-inferior scripting-based technology to the all-new-and-improved Web Forms-based technology. The promise was the same. No more hacking away in VBScript (erm, I mean ASP) for your backend and JavaScript for your frontend. No, instead you could write C# in your magical code-behind pages and all but forget about frontend jiggery-pokery.

Web Forms really tries. You define your HTML in an aspx file – these are basically your templates. Elements can be given a data source which will, for example, fill in the element's data from your database. Elements can also be set to post back to the server any time they change, get clicked or what have you.

Postbacks are tied to events in the C# code-behind pages, which means you can move all your logic server side. It makes developing for the web more like writing a traditional Windows client application.
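For anyone who never touched Web Forms, the wiring looks roughly like this (the page, control and handler names are made up for illustration):

<%-- SalesPage.aspx – the "template" --%>
<asp:Button ID="RefreshButton" runat="server"
            Text="Refresh" OnClick="RefreshButton_Click" />

// SalesPage.aspx.cs – the code-behind; clicking the button on the
// client posts the page back and fires this server-side handler
protected void RefreshButton_Click(object sender, EventArgs e)
{
    // all the "frontend" logic lives server side, in C#
}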

Now, the HTML controls, of course, rely on JavaScript to make the page post back. But it isn't JavaScript that the developer has to write – it's auto-generated. Awesome. Awesome. Awesome. One language… C#.

Of course, it isn't without its downsides. The ViewState – a massive chunk of data that keeps… state between page loads – is a nasty piece of work. Ingenious in its own way, but nasty when you bump up against it. And then there is the enormous expense of re-rendering the entire page every time a select box that is wired up to post back to the server changes. Large Web Forms applications can become sloooooow on the client side.

Meteor does something similar. Only now, with baked-in Ajax goodness, the applications actually feel usable. I wonder if Microsoft missed the boat? I wonder if they could have leveraged Web Forms, or something like it, to be more like Meteor. Maybe they do now – Web Forms is still alive and well, but I haven't (thank goodness) had to use it in a very long time, so I'm somewhat out of touch.

Where Microsoft really shines, in my opinion, is the IDE. Wiring up server-side events to a button click really is easy – just open your aspx file (the template), double-click on the button and Visual Studio will wire up the event for you and drop you into the server-side handler so you can write your code. On a large page, this really does take some pain away. You don't make stupid typos wiring up your button because you don't have to type it in. I know Vim is awesome and all, but this is a part of the developer experience that Microsoft really does well.

Somewhere in the future, I imagine the best of both worlds. Something like Meteor + Visual Studio all open sourced and ready to go.