ORM framework anti-patterns

Written by John Mikael Lindbakk - 2020-12-01

Content

Introduction

These days, most projects tend to use one form of ORM frameworks. They make development easier by hiding all that pesky database stuff away. ORM frameworks help with security, cuts down on bloat and remove a bunch of potential traps one can fall in if one opts for not using one. At this point, I’d be more surprised if a project using a database without an ORM framework.

While ORM frameworks are remarkable, it doesn’t mean that they don’t introduce challenges in other areas. Throughout this post, we will look at some of the biggest traps I’ve encountered. There are probably more traps than what I’ve covered, but these are the ones that stand out to me as especially grievous. These problems can’t be blamed on the ORM frameworks themselves, instead of how developers decide to integrate their software with their ORM framework of choice.

Terminology

An ORM nightmare

Let’s start this post with a story. I once worked on a project, let’s say it was a “school management system” (it wasn’t, but let’s avoid pointing fingers here), let’s call it SMS for the remainder of this story. This project was a C# .net WinForms monolith which had its roots back in the mid-90s. As one might imagine, such an old system has a lot of quirks, and one of them was how it dealt with its ORM framework.

SMS used an ORM framework called XPO, which is proprietary, but capable. XPO has all of the features we expect from a modern ORM framework, and XPO is not to blame for the issues we encountered in this story. Instead, in this story, we can blame the developers.

Initially, SMS were written for private tutors who had small classes, and often a single student at a time—having such a small volume of data made database interactions a breeze. No matter how many joins one threw at the database, it would only just do it. Due to this, developers often used XPO in ways which are convenient, but technically inferior.

One convenient solution that SMS’s business logic used XPO directly. In other words, there were no forms of a DAO class used in this project. If algorithms needed some data, it would query it directly using an XPCollection, which is an object which allows you to build up a query and fetch some data from the database. As we will discuss later, using ORM tools directly within business logic tends to become messy, but this is how SMS was built. Another thing to keep in mind is that SMS relied on the concrete implementation found in XPO, not its interfaces, nor were these query objects injected anywhere. Suppose we put all of these factors together. In that case, we end up with a system which can call the database from any place in the code in any number of ways and all of them being difficult to verify in any meaningful fashion.

Before continuing, I should mention that this project had a bunch of other problems which is not related to how SMS dealt with entities:

All of this doesn’t sound good, but it worked well enough for private tutors for many years. At some point, management decided to branch out and target small private schools, and other organizations which could loosely fit within the system. One day the company managed to hook the most prominent institution so far, an institution with a whopping 14 employees! Most of the time, we just converted customers data from other systems and threw them into production. In this instance, we decided to do some testing first. Management pressured us to make sure that everything went smoothly, seeing as this institution had a pretty good reputation, and their opinions held a lot of weight in the community. I converted the database, which took 12 hours after some optimizations. I restored the database in one of the test environments (this were very much pre-GDPR). I started the application, logged in and waited… and waited some more… about a minute later the UI stopped hanging and the schedule for the user popped up.

I was already worried about the data quality of the conversion due to the “large” database, but now the application was hanging? I clicked over to the “Student tab”, which showed all the information of a selected student. It took about 20 seconds to load the initial view. Maybe it was an initialization thing? So I decided to switch between a few students to see if this was a WinForms thing. Deep inside, I knew it wasn’t that, but at this point, the customer is supposed to go live in a couple of days, so I were grasping at straws. Unfortunately for me, the straw broke when I attempted to open the list which allowed the user to select a student, as it took about 5 minutes to load.

At this point, I knew that things were FUBAR, but I had to see how bad things were. I was aware that the most challenging tab we had were the accounting tab, which held all the financial information of an institution. I clicked the finance tab… About 20 minutes later the application crashed…

Let’s be clear; we are not talking about thousands of users and millions of rows of data. We are talking about 14 users and a few hundred thousands of rows spread out over a few different tables. SMS used a modern version of MS SQL, and this volume of data should not be a problem, but for us, it was. There was nothing wrong with MS SQL, XPO, C#, Winforms or .net. The problem lied squarely on us, the developers.

We quickly alerted management which ordered us to work day and night to get it fixed ASAP, while they bought us a few extra days.

During this time, we probably worked 16 hours a day, and only went home to shower and sleep. We put in a fantastic effort and eventually managed to get the application into a runnable state. One always remember these “firefighter” scenarios with fondness, but in reality, our solutions were sloppy, and the work we applied were terrible. We solved the worst performance issues, but as the customer would tell us the next day: The application were pretty much unusable in its current state.

As I alluded to earlier in this story, there were some significant issues with how SMS dealt with XPO, and this was the core of our performance issues.

SMS’s entities were designed to have the same relationships as the tables had in the database. If there was a foreign key in the database, then there was a relationship between the entities in the code as well. What we realized is that due to this pattern, we had conveniently written all of our code to use them. After all, if you want a students classes, why query the database when you can use student.getClasses()? The appropriate function was already there, and it was a convenient solution. Since we had preferred using the entities relationships directly in such a manner, we had also put these relationships to load eagerly. It worked fine to eagerly load data when we basically could hold the whole database in memory for every user, but this was not possible for this new larger customer.

The issue was, we couldn’t just switch eager loading to lazy loading. The fetch behaviour was defined in the entity, and by changing eager to lazy, we risked introducing performance issues in other parts of the application. Eventually, this would lead to a hunt where we could squash one performance issue, but X more would crop up as a result. We could not fix the root issue, because that would lead to months of dealing with performance issues which the customer would not accept. The only option left was to treat the symptoms rather than the disease.

The most common “solution” we applied was to use XPO’s XPView. Rather than returning an entity, the XPView returns a map/dictionary of the data. By using XPView, we could bypass the entity entirely, so we avoided eager fetching data we didn’t want, but we also didn’t get any of the benefits it provided. The code looked horrible, but at least it ran with an acceptable speed. While such an approach works, it will always punish you in the future. We tried to fix technical debt by covering it up with more technical debt. Now the code is a mix of XPO entities and dictionaries with hardcoded string keys for accessing values. The overall solution is faster but also more complicated and less flexible.

As previously explained, there were no DAO of any kind. We weren’t sure of what triggered database calls where, as a call could go out at any point anywhere. The other common solution was to load data early in the topmost call of a call chain. XPO would take our objects and put them in a cache, so if we managed to load the correct entities, we could avoid querying the database. The result of this approach was that we had a lot of functions loading a bunch of objects (without doing anything with them) to make sure they’re cached. Again, this was an approach that worked, but it was fragile. If any of the underlying calls were to change, it could potentially render those initial loads pointless, and the performance issues would re-merge (as they often did).

SMS has had performance issues ever since that first “big customer”. Every time we got a new “biggest user” the application would break down, and we had to firefight to fix it. Later I left the project (for other reasons), but I am not proud of the state I left it in. At the time, I hadn’t realized the core issue with the performance issues. I had chopped it up to a bunch of other reasons, but at the time it never occurred to me that this was an architectural issue. We had severely mismanaged our ORM integration, and the business has paid dearly for it. I learned a lot from working on this project, but today I would have done things much differently.

Anti-pattern #1: The code doesn't use a DAO

SMS did not use any such classes, so the complexities of the ORM framework was tightly coupled with the business logic. Not only did we access the ORM specific classes directly, but we didn’t even bother relying on the interfaces XPO provided, which means that abstracting this out later will be a painful process. The solution, which I disagree with, was to mock the database itself, which might be appropriate in some instances, however, when pretty much every function of the application accesses the database it tends to lead to a lot of extra code for setting up tests. The other issue with not abstracting out the ORM framework is that it tests has to account for behind-the-scene ORM behaviour which can lead to tricky situations.

A better approach is to merely put the queries inside its class and hide it behind an interface. Now one can write tests for the business logic without having to worry about the ORM framework or the database.

If one abstracts out the complexities of ORM frameworks, then one ends up with cleaner business code and tests. If everything is abstracted out, then we know where calls to the database will happen, which means we can put in efforts to integration test our queries to the database itself.

At this point, the industry has agreed that tight coupling between UI and business logic is a bad idea, and the same applies to tight coupling to ORM frameworks. Both the database and the ORM framework is a dependency which the application has to deal with. As with most third-party dependencies, one wants to abstract such dependencies away from the business logic.

Anti-pattern #2: One allows for lazy loading of data outside the DAO

Most ORM frameworks allow for automatic lazy loading of data when calling on in-entity references which haven’t be loaded. Queries outside objects made to deal with queries explicitly is a bad practice and should not happen. There are predominantly two reasons why one would want to avoid lazy loading:

  1. Lazy loading can lead to severe performance issues. While it, at first, might seem fair in a single function, it can quickly spiral when that function is executed in a loop, or maybe when the caller also lazy loads data. A typical example is showing the impact of lazy-loaded data in loops, but the more insidious and more challenging to solve issues arise when we have complicated logic which deals with incomplete classes. To fix such problems, one often has to re-architecture parts of the application. The architecture should make sure all the data is loaded at the right time, and it would probably be easier if the code were written in such a way that it didn’t allow for lazy loading, to begin with.
  2. On a more purist level, lazy loading infringes on the responsibilities of our DAO objects. When our business code calls the DAO, it should expect that all the required data required to complete the execution is returned. By allowing for lazy loading, we are allowing DAO to cheat at their job and return incomplete sets of data. Such behaviour is hidden from the business code, which is excellent, but it is also hidden from the developer. A developer has to instinctively know that specific values in an entity will trigger a new SQL call, and it is easy to fall into a trap.

NOTE

Default lazy loading of values is something that looks magical and great on paper, especially in demos; however, as with many things: Convenience doesn’t necessarily lead to better software. It would be interesting to see a framework which didn’t allow for lazy loading at all but instead threw an exception when one tried to access the unloaded properties of an entity.

An exception would force people too eager load data, and it would be obvious earlier in the development process that one had to write specific queries for specific use cases to get the best performance.

I do not know whether such a framework exists. As far as I can tell the ORM frameworks, I’ve touched only allows for either lazy or eager fetching of data, but they don’t allow explicitly banning lazy fetching.

It would be interesting to see how an ORM framework, without the functionality of lazy loading, would impact application architecture and whether or not it would naturally guide developers to better solutions or not.

Anti-pattern #3: Eager fetching is defined in the entity

Many ORM frameworks allow developers to state in the entity when individual relationships are being resolved. Applying load logic within the entities themselves has a lot of unfortunate traps, and the biggest of them is that it often leads to severe performance issues.

We made the mistake of defining eager loading within entities in SMS. Eager fetching on the entity level turned into a massive headache, as now all interactions with particular objects loaded vast chunks of the database. While eager fetching was convenient for certain parts of the solution, it also made it impossible to work with specific entities in other parts of the application without introducing unacceptable performance issues.

The core of the issue is that logic and rules put into the entities themselves govern all interactions with the entities. ORM frameworks will read these entities and enforce these requirements, which puts us in an odd spot. By adding eager fetching in entities, we have decided to cater to one need, but potentially alienated others. In SMS, we ended up bypassing the entities entirely, as the entities themselves were too heavy to load into memory reliably. Instead, we read the values into a map/dictionary and processed them that way. While such an approach worked, it is hardly elegant, and it was forced upon us due to poorly designed entities.

A more appropriate approach would be to eager fetch data on a query level. Most ORM frameworks will allow us to decide what to eager fetch in our DAO class. Eager fetching on a query level will enable us to determine what to lead for each query, which allows us to have queries which load a bunch of data (if that is required), or we can have very lightweight calls for other use-cases. By putting the burden on the DAO, we can cater to more use cases without affecting others.

Anti-pattern #4: Implicit queries to the database

Some ORM frameworks, like Hibernate, allows for implicit updates of objects that have changed when a transaction is committed. Whether this is something that you consider an anti-patter mainly comes down to how much you value being explicit over being implicit.

My experience is that explicit code tends to be more readable and testable. Our code should be easy to read and easy to understand, and code which gets triggered implicitly in our application goes against readability. I am not saying that features such as AOP are bad, but we should be wary of the impact it has on readability. Our code should strive to strain the reader as little as possible.

Grady Booch once said:

"Clean code is simple and direct. Clean code reads like well-written prose. Clean code never obscures the designer’s intent but rather is full of crisp abstractions and straightforward lines of control."

By allowing for implicit updates, we are not using crisp abstractions, and we don’t have straightforward lines of control. The persistence call happens in the background, hidden from the reader. My complaints about implicit calls to the database do not come from a feature point of view, because it undeniably works, but it comes from a readability point of view. If we intend to save something, then our code should explicitly say that it is saving something. The same thing goes for loading objects which we looked at anti-pattern #2.

Anti-pattern #5: No integration tests

While this one isn’t related to ORMs, it is still something I’ve often seen some projects miss. Most projects tend to have unit tests, but some decide not to implement tests for the database. I tend to implement these tests as narrow integration tests.

Do note that that there’s no requirement to test every single query, primarily when queries rely heavily on structures provided by an ORM framework. For example, built-in functions from Spring data’s crudrepository offers a bunch of built-in functions, and we might not want to test all of them. In such circumstances, we should verify that the ORM framework can map the object to, and from, the database. There might be other system tests which also trigger database reads, so if the contract between the application and database were to be broken, then those tests will fail.

When working with ORM frameworks test coverage doesn’t always provide full benefit. Sometimes it just verifies the same thing which other similar queries have confirmed. Instead, focus on what is essential:

ORM frameworks can make mistakes

So far we have discussed things that I would consider to be anti-patterns where developers do a lousy job integrating with their ORM framework, but let’s look at a very devious “gotcha” which ORM frameworks may introduce.

In the project described in the introduction, SMS, we occasionally found that the ORM framework generated sub-par queries. Some were easy to catch by merely looking at the query and how XPO decided to build up a query, and sometimes XPO decided to generate queries which were very inefficient for our production versions of MS SQL. As you might imagine, the latter one was a nightmare to figure out as we were running other versions when testing the application. The takeaway here is not that XPO is terrible, but preferably that ORM frameworks are made to be generic, and sometimes they decide to do some things which don’t make sense in your specific use case. The lesson here should then be: When writing complicated queries, which the ORM framework will use to generate SQL queries, double-check the generated SQL query.

Some ORM frameworks try to be smart and only allow you to suggest whether something should be eagerly or lazily fetched. Most of the time, the ORM framework do decide on the correct behaviour, but once in a blue moon, the framework might decide on something which goes against the developer’s wish and expectations. In my experience, this behaviour only happens when one has to deal with big and wild queries and unconventional ways of processing data. The root of these issues usually come from dealing with a poorly designed database, where I had to compensate for it in the business logic and queries. We get to blame the developer once again, but the lesson should be: When writing complicated flows and queries in an ORM framework, which might decide to load things at its own leisure, make sure to double-check when items are loaded.

Conclusions

As with everything, a developer must be wary of the tools at their disposal. ORM frameworks make life much more comfortable but also come with their own set of design challenges which may cost us dearly if ignored. Developers must design their application to work with ORM frameworks, not the other way around.

Suppose you felt this post gave you nothing new, then great! That means you are less likely to make these mistakes. If you have a project with these anti-patterns, then I urge you to fix them. Technical debt from ORM frameworks is the kind of technical debt that will grow and expand as the application changes. Problems will only get worse, and solutions will only become harder to implement. There’s no better time than present to take action.

The core of this post was to raise awareness of the various challenges ORM frameworks may bring, and I hope I've done a decent job of it. These anti-patterns are, by far, not a complete list and there are more out there probably more than I am aware of. The anti-patterns in this posts are just those which I, personally, find extra problematic. If I encounter others, I'll be adding them to the list. If you feel that I've missed anything, feel free to let me know on one of my socials or through the website.