Laziness (in proxies) is a virtue

We use Hibernate for object-relational mapping (ORM) and organize our code into domain objects and data access objects.

Historically many of our domain objects have been marked with the “@Proxy(lazy = false)” Hibernate annotation. This annotation tells Hibernate that it should NOT create lazy proxies for the annotated class.

At Redfin, these were almost all bugs. We should never use “@Proxy(lazy = false)” without a big comment explaining why it’s necessary. Our default should be “@Proxy(lazy = true)”. Laziness is good!


Here’s my quick understanding of the effects of the @Proxy annotation. As with everything in Hibernate, each individual piece seems simple, but when you consider all the features that Hibernate exposes, and how they interact, it can become pretty complicated.

Hibernate Load Options

When Hibernate loads objects that refer to other objects (i.e. have member objects), it needs to do something about the associated objects. For example, suppose that Cat objects contain (optional) references to Owner objects. When Hibernate is loading a Cat object into memory, it has to decide what to do about the Owner member variable. There are a number of things it COULD do:

  1. When it constructs SQL to load the Cat, it could include the Owner table and columns in the SELECT clause, so that all the data is loaded at once
  2. It could load the Cat object, and subsequently load the Owner object (via a second SQL statement)
  3. It could load the Cat object, and set the Owner member to a placeholder (a proxy), which can be filled in later when the Owner information is needed

Note that it CANNOT simply do nothing about the Owner. If it instantiates a Cat and leaves the Owner member null when the DB says that the Cat DOES have an Owner, then consumers of the Cat will be misinformed: they’ll think that the Cat has no Owner, which is false.

Option 1 (get all the info in 1 SQL statement) is efficient when loading multiple Cats for which the Owner information is needed. For example, if some code needed to iterate over 1000 Cats, and get Owner information for each one, this approach would be efficient.

However, option 1 is inefficient in cases where the secondary information is not needed. E.g. if some code needed to iterate over 1000 Cats but did NOT need to get Owner information, then loading the Owner information is an obvious inefficiency.

Worse, taking option 1 to the extreme can cause an explosion in the data load. For example, a Cat might have an Owner, the Owner might have a Home, the Home might have an Address, which might have a City, which might have a State, etc. Loading the whole object graph into memory via SQL could be very inefficient. Further, every change to domain objects could cause many SQL statements to get hairier (e.g. adding a Country member to the State object would effectively add to the SQL needed to load Cat objects.)

Option 2 (load the Cat, then load the Owner) is simple, and often not bad, but never optimal. If you know you’ll need the Owner info, it’s more efficient to load it in a single SQL statement; if you know you won’t use it, you should never load it; and if you won’t know until later, delaying the load is better.

However, option 2 is particularly bad when bulk operations are being performed. For instance, if some code were to load up every Cat object in the database to do some processing, the Cats themselves could be loaded via a single SQL statement (though it’d probably be better to break it into chunks of, say, 10,000 Cat objects.) Under option 2, however, Hibernate would also run a “SELECT * FROM owners”-type statement for every Cat object that has an Owner: potentially millions of SQL statements.

Option 3 (load the Cat and set its Owner member variable to a proxy, loading the Owner info on demand) is a compromise. It allows code to do bulk operations without loading the ancillary information (e.g. load all Cat objects without ever loading any Owner objects.) However, it requires additional SQL statements to load the secondary information IF that info is needed (e.g. if code loaded all Cat objects, then accessed the Owner for each Cat, option 3 would result in potentially millions of SQL statements.) Note that if the Owner information is never needed, then option 3 is the most efficient: the information is never loaded.
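To make the trade-offs between the three options concrete, here is a minimal sketch in plain Java that just counts statements. It does not use Hibernate at all: CountingDb, LazyOwner and LoadStrategies are hypothetical names invented for this illustration, and the sketch assumes every Cat has exactly one Owner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical stand-in for the database: it only counts the statements it is asked to run. */
class CountingDb {
    private final int catCount;
    int statements = 0;

    CountingDb(int catCount) { this.catCount = catCount; }

    /** Like "SELECT ... FROM cats": one statement, returns every cat id. */
    List<Integer> selectCats() {
        statements++;
        List<Integer> ids = new ArrayList<>();
        for (int id = 0; id < catCount; id++) ids.add(id);
        return ids;
    }

    /** Like "SELECT ... FROM cats JOIN owners": one statement, returns one owner name per cat. */
    List<String> selectCatsJoinOwners() {
        statements++;
        List<String> owners = new ArrayList<>();
        for (int id = 0; id < catCount; id++) owners.add("owner-" + id);
        return owners;
    }

    /** Like "SELECT ... FROM owners" for a single cat: one statement per call. */
    String selectOwnerOfCat(int catId) {
        statements++;
        return "owner-" + catId;
    }
}

/** Placeholder that runs its loader only the first time the owner is actually needed. */
class LazyOwner {
    private final Supplier<String> loader;
    private String name; // null until the first access

    LazyOwner(Supplier<String> loader) { this.loader = loader; }

    String getName() {
        if (name == null) name = loader.get();
        return name;
    }
}

class LoadStrategies {
    /** Option 1: a single joined statement, whether or not the owners get used. */
    static int joinFetch(int cats, boolean useOwners) {
        CountingDb db = new CountingDb(cats);
        List<String> owners = db.selectCatsJoinOwners();
        if (useOwners) for (String owner : owners) owner.length();
        return db.statements;
    }

    /** Option 2: load the cats, then immediately run one more statement per owner. */
    static int immediateSelect(int cats, boolean useOwners) {
        CountingDb db = new CountingDb(cats);
        List<String> owners = new ArrayList<>();
        for (int id : db.selectCats()) owners.add(db.selectOwnerOfCat(id));
        if (useOwners) for (String owner : owners) owner.length();
        return db.statements;
    }

    /** Option 3: load the cats, attach placeholders, and pay per owner only on access. */
    static int lazyProxy(int cats, boolean useOwners) {
        CountingDb db = new CountingDb(cats);
        List<LazyOwner> owners = new ArrayList<>();
        for (int id : db.selectCats()) owners.add(new LazyOwner(() -> db.selectOwnerOfCat(id)));
        if (useOwners) for (LazyOwner owner : owners) owner.getName();
        return db.statements;
    }
}
```

In this toy model, 1000 cats cost 1 statement under option 1, 1001 statements under option 2, and either 1 or 1001 statements under option 3 depending on whether the owners are ever touched.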

Hibernate allows programmers to influence which strategy it will take. It offers (at least) two types of control: direct control over the SQL it generates, and control over the proxies.

See https://www.hibernate.org/315.html and https://www.hibernate.org/162.html for information on Hibernate fetching strategies and lazy loading.

Controlling SQL

When you’re implementing a DAO method, you can tell Hibernate whether it should proactively fetch information about member objects.

Under the Criteria API, Hibernate lets you call criteria.setFetchMode to tell Hibernate that it should load the additional info immediately, or should defer it. Hibernate uses the term “eager” to mean “load immediately”, and “lazy” to mean “defer loading.”

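When using HQL, you can use the FETCH keyword in a join to get the equivalent of an eager fetch (e.g. “from Cat c left join fetch c.owner” loads each Cat together with its Owner in one statement.)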

When using SQL, you can use the query.addJoin method to tell Hibernate that you’ve written SQL which retrieves information for member objects. In this case, you’ll be responsible for writing the joins, etc., yourself.

Controlling Proxies

Hibernate also lets you control the existence and behavior of proxies via the @Proxy annotation mentioned above. Annotating a class with “@Proxy(lazy = false)” tells Hibernate to NOT support lazy proxies for that type of object (of course “@Proxy(lazy = true)” tells Hibernate to support lazy proxies.) This allows the writer of the domain object to essentially override the wishes of the writer of the DAO. If the DAO writer would like to load members in a lazy manner, but the domain object in question doesn’t support lazy loading, then Hibernate will NOT lazy load the object (since it cannot.)

If you’re writing a class for which lazy loading would be dangerous, then you SHOULD disallow lazy proxies, since DAO writers probably won’t understand the detailed load requirements of your class. However, this is unusual. In most cases, lazy proxies are safe.

Since the writer of the domain object can control what choices are available to the writer of the DAO object, they need to use that power judiciously. You CAN code all of your domain objects to disallow lazy loading, which will force all writers of DAOs to use load options 1 or 2 (load all members via fancy SQL, or load all members via secondary SQL statements.) But you generally should not. DAO writers often rely on option 3 (lazy loading), particularly when they know that the member objects will never be accessed (or when they’re not sure.) If you specify “@Proxy(lazy = false)”, you’ve made it impossible for DAO writers to use option 3, which means it may be difficult for them to get their code to perform well. Worse, the writer of the DAO may not realize that you did that, or may not understand the implications. Hibernate queries are actually kinda hard to view, so the writer of the DAO may have created a huge performance problem and not even known it (until you go into production.)

Only the client really knows

Even the writer of the DAO doesn’t know how the client will use the objects it returns. If you’re implementing the CatDAO, you might add a method like getBasementCatsAndOwners, which would return all black cats and pre-fetch the corresponding owners. You think you’re clever because you’ve avoided a major performance problem, but a caller might try to get the Home for each Owner, defeating your pre-fetching strategy. The DAO writer should do their best to anticipate the needs of their callers, and to name and document their methods such that callers can understand what they do, but ultimately the caller is in control, and can (unintentionally) defeat the optimizations of the DAO writer. If your database were large and you knew that you had clients that sometimes needed Owners, sometimes needed Owners and Homes, etc., you might make three methods: CatDAO.getBasementCats, CatDAO.getBasementCatsAndOwners, and CatDAO.getBasementCatsAndOwnersAndHomes.
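As a rough illustration of how a caller can defeat a DAO's pre-fetching, here is a plain-Java sketch with no Hibernate involved. SketchCatDao and its nested classes are hypothetical, the "basement cats" are just generated in memory, and the statements field counts the SQL a real DAO would presumably issue.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory DAO; `statements` counts the SQL a real DAO would issue. */
class SketchCatDao {
    int statements = 0;

    class Home {
        final String address;
        Home(String address) { this.address = address; }
    }

    class Owner {
        private final int id;
        private Home home; // null means "not fetched yet"

        Owner(int id, Home prefetchedHome) {
            this.id = id;
            this.home = prefetchedHome;
        }

        /** Loads the Home on first access, costing one extra statement if it was not pre-fetched. */
        Home getHome() {
            if (home == null) {
                statements++;
                home = new Home("home-" + id);
            }
            return home;
        }
    }

    class Cat {
        final Owner owner;
        Cat(Owner owner) { this.owner = owner; }
    }

    /** Pre-fetches each Cat's Owner in one statement; Homes are left to load on demand. */
    List<Cat> getBasementCatsAndOwners(int count) {
        statements++;
        List<Cat> cats = new ArrayList<>();
        for (int id = 0; id < count; id++) cats.add(new Cat(new Owner(id, null)));
        return cats;
    }

    /** Pre-fetches Owners and their Homes in one statement. */
    List<Cat> getBasementCatsAndOwnersAndHomes(int count) {
        statements++;
        List<Cat> cats = new ArrayList<>();
        for (int id = 0; id < count; id++) {
            cats.add(new Cat(new Owner(id, new Home("home-" + id))));
        }
        return cats;
    }
}
```

A caller that walks from each of 100 cats to owner.getHome() pays 101 statements after getBasementCatsAndOwners, but only 1 after getBasementCatsAndOwnersAndHomes. The method names are what tell the caller which of the two they are getting.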

Conclusion: @Proxy(lazy = false) is generally evil

As mentioned above, when you indicate that a domain object should not support lazy proxies, you make it hard for DAO writers to get their code to perform well. Worse, you disable a capability that they may be counting on, and they may not notice until there are major performance problems. Unless you have a good reason to, use “@Proxy(lazy = true)” on your domain objects.

P.S.

Lazy proxies do have some known problems.

First, the lazy proxy is NOT the same as the actual object. If you depend on the exact datatype of the object, you may have problems, since the type of the proxy isn’t the same as the type of the actual object (e.g. the runtime class of a proxy for an Owner is not Owner itself; it’s a generated subclass, so a check like getClass() == Owner.class fails.)

Second, you may have to think carefully about methods like equals() or hashCode(), since the proxies may not do what you expect.
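Both problems can be shown with a hand-written stand-in for a proxy. This is only a sketch: OwnerProxy below is a hypothetical class written by hand rather than anything Hibernate generates, and it assumes (as a simplification) that the proxy's own fields are not populated, so state has to be read through getters.

```java
/** Plain domain object with a "strict" equals() that compares runtime classes. */
class Owner {
    private final long id;

    Owner(long id) { this.id = id; }

    long getId() { return id; }

    /** Strict equality: false whenever the two runtime classes differ, proxy or not. */
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (other == null || getClass() != other.getClass()) return false;
        return getId() == ((Owner) other).getId();
    }

    @Override
    public int hashCode() { return Long.hashCode(getId()); }

    /** Proxy-tolerant comparison: instanceof instead of getClass(), getters instead of fields. */
    boolean sameOwnerAs(Object other) {
        return other instanceof Owner && getId() == ((Owner) other).getId();
    }
}

/** Hand-written stand-in for a generated proxy: a subclass that answers through overridden getters. */
class OwnerProxy extends Owner {
    private final long realId;

    OwnerProxy(long realId) {
        super(-1L); // the inherited field is deliberately NOT the real value
        this.realId = realId;
    }

    @Override
    long getId() { return realId; }
}
```

With `Owner real = new Owner(7)` and `Owner proxy = new OwnerProxy(7)`, `proxy instanceof Owner` is true, but `proxy.getClass() == Owner.class` is false and `real.equals(proxy)` is false, while `real.sameOwnerAs(proxy)` is true.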

P.P.S.
Thanks to carloneworld for the great lazy kitty photo!
