Hibernate

Tuning Queries when using Pagination with JPA and Hibernate

In recent performance tuning of some EJBQL search queries, I’ve had a lot of discussions with other developers on database pagination. There are some definite nuances that you have to be aware of when using Hibernate’s pagination feature, so I thought I would explain them here.

Quick Introduction to Database Pagination with JPA

Database pagination allows you to step through a result set in manageable chunks (say 5 at a time). This is an important feature when a result set is large. Imagine if the user selects the first result of 1000. Essentially 999 out of 1000 results were wasted. This is wasteful in terms of CPU cycles on the database server, network usage, CPU cycles on the application server, and memory allocation. On the other hand, if we only loaded 10 results into memory, we’ve only wasted 9 results. As the result set grows, this problem becomes more important to address.

Database pagination with JPA is quite simple through the javax.persistence.Query. The following method invocations retrieve the first 10 results for the query:

javax.persistence.Query query =
  em.createQuery("select order from Order as order
  left join order.customer as customer
  where customer.name like '%' || :name || '%'");

query.setParameter("name", name);
query.setFirstResult(0);
query.setMaxResults(10);

// returns 10 or less orders
List<Order> orders = query.getResultList();

The max number of results to retrieve at one time can be any number you choose. As the user pages through the data, we alter the setFirstResult(int) to retrieve the next set of results.

Query Tuning with Fetch Joins

When paging through a result-set, you may be interested in performing fetch joins to enhance query performance. This avoids the N+1 select problem when walking lazy relationships for displaying data. For example, let’s say we are working with an order management system. This order management system allows users to search for orders that have been placed by customers. Our domain would look something like the following, where an Order has one Customer and a Customer can be associated to many Orders.

This relationship could be described in the Order entity as:

@Entity
public class Order
{
  // ... ...

  @ManyToOne
  private Customer customer;

  // ... ...
}

In the search results, the users want to see both Order and Customer information on the page. Lazily loading the Customer results in a query being executed to retrieve the Customer for each Order displayed. To avoid this, we can perform a fetch join on the Customer when retrieving the Order results. Here is the resulting EJBQL:

select order from Order as order
  left join fetch order.customer as customer
  where customer.name = '%' || :name || '%'

This ensures that only a single query is executed to load both the Order and the Customer results. An example of what the SQL result set might look like in this case would be:

| order_id | cust_id | cust_name       |
----------------------------------------
| 1        | 1       | Jacob Orshalick |
| 2        | 2       | Nirav Assar     |
| 3        | 3       | John Doe        |

As you can see, each Order is associated to a single Customer which ensures a unique result set. In this case we are guaranteed that limiting the result set to 5 will always result in 5 or less unique Order results. This is generally the right solution for a @OneToOne or a @ManyToOne relationship.

Fetching One-to-many or Many-to-many Relationships

Fetching one-to-many or many-to-many relationships gets a bit tricky. The moment you introduce a fetch join for a one-to-many or many-to-many relationship, Hibernate will load all results into memory and then only return you the max number of results you requested. This is due to the semantics of SQL queries.

Going back to our example, we will likely have a list of LineItem entries for each order that tell us what Products the Customer purchased on the Order.

And the Order entity would now look like:

@Entity
public class Order
{
  // ... ...

  @ManyToOne
  private Customer customer;

  @OneToMany
  private List<LineItem> lineItems;

  // Getters and Setters
}

The users request that we display the LineItem entries below each Order in the search results. So we can just do another fetch join and load this data as well right? Here is the resulting EBJQL:

select distinct order from Order as order
  left join fetch order.customer as customer
  left join fetch order.lineItems
  where customer.name = '%' || :name || '%'

Once you introduce this additional fetch into the query, Hibernate will present the following message in the log:

  [org.hibernate.hql.ast.QueryTranslatorImpl] firstResult/maxResults
  specified with collection fetch; applying in memory!

This message is telling you that Hibernate is retrieving all results from the database, and then only returning the first 10 results (or the number of max results you specified). So why does Hibernate do this? Let’s have a look at an example of what the SQL result set generated from this query might look like.

| order_id | cust_id | cust_name       | line_id | product_sku |
----------------------------------------------------------------
| 1        | 1       | Jacob Orshalick | 1       | 1403-1209   |
| 1        | 1       | Jacob Orshalick | 2       | 1405-1333   |
| 2        | 2       | Nirav Assar     | 3       | 1300-1222   |
| 3        | 3       | John Doe        | 4       | 1400-3029   |
| 3        | 3       | John Doe        | 5       | 1401-1000   |
| 3        | 3       | John Doe        | 6       | 1200-1000   |

Each database has it’s own SQL syntax for limiting the result set, but assuming we limit the result-set to 5 results on the database side we would only get the first 5 results. As you can see, the result set returned duplicates the Order and Customer information for each LineItem on the Order. Thanks to the way Hibernate processes these results, we would still see the 3 expected orders (order_id = 1, 2, 3), but the database would only return us 2 of the LineItem entries for John Doe’s order. This is an incorrect result from the user’s point-of-view.

Knowing this, Hibernate rightfully retrieves all results in this case and then returns you the 3 Order results with all associated LineItem entries. But, to ensure correctness, you lose the value of pagination. So will we always face the N+1 select problem when using pagination with @OneToMany or @ManyToMany relationships? Not if you consider other options from a user experience perspective.

Other Options for one-to-many Relationships

There are a number of ways to enhance performance without losing the advantages of database pagination.

Display LineItem Entries only when Requested

Technology combinations like RichFaces and Seam make this simple. Basically you can walk the lazy relationship only when the user requests this information through an AJAX request. Through use of a <rich:togglePanel> a link can be provided to expand the Order data for the user. Because Seam allows an EntityManager to span requests lazily loading this data is simple.

Another simple option is using REST and JSON to retrieve the LineItem entries through an AJAX request when accessed by the user. A simple RESTful invocation (http://my-server/order/1/lineItems) allows the LineItem entries to be retrieved for an Order and we can then parse the results and display them back to the user. RESTEasy makes this simple for any Java application.

Display the LineItem Entries on a Details Page

This is the easiest and most obvious solution. Just display high-level Order information on the search results and the user can then access a details page that provides additional details. In general, this is the solution I generally push users toward for simplicity.

Display a High-level LineItem Summary Information

Another option is to give high-level information (e.g. number of LineItem entries) on the search page, and then display all information on a detail page. With the flexibility of EJBQL, you can use aggregate functions (e.g. count(lineItem.id) ) with a group-by clause to avoid the issues with a one-to-many. But, this also generally requires introduction of DTOs to hold the query result data or additional parsing of the result set.

Performance Tuning Always has Trade-offs

As I always say when discussing performance tuning, there are always trade-offs. Whether it’s additional complexity or changes to user experience, we always have to consider the implications of tuning our applications.

Extending query by example through annotation

Query by example (QBE) is a great feature for easily executing Criteria using Hibernate. By simply filling in an object and submitting it, Hibernate automatically generates the query based on the values set in the object. While this is an attractive option, in my experience it usefulness quickly becomes limited. In many cases, specifying Restrictions is required which quickly leads to ugly code. For example, lets take a Product class:

@Entity
@Table(name="PRODUCTS")
public class Product {
  @Id
  @Column(name="ID")
  private BigDecimal id;

  @Column(name="SKU")
  private String sku;

  @Column(name="CATEGORY")
  private Category category;

  @Column(name="NAME")
  private String name;

  @Column(name="DESCRIPTION")
  private String description;

  @Column(name="STARTED_OFFERING")
  private Date startedOffering;

  @Column(name="PRICE")
  private BigDecimal price;
  ...
}

This class has a number of attributes we could search on. The easiest of which are probably the category and name. With the id and name we can simply populate and object instance and use QBE:

...
Product product = new Product();

product.setCategory(Category.APPAREL);
product.setName("slacks");

Example productExample =
  Example.create(product).ignoreCase().enableLike();

List results = session.createCriteria(Product.class)
  .add(productExample)
  .list();
...

This works fine, but what if we want to search for products in a price range say gift ideas from $30.00 to $50.00, or products that we started offering between December 1, 2006 and March 31, 2007. Obviously we can’t accomplish this by simply setting values into our example product. We need to use the Criteria API in this case:

...
List products = session.createCriteria(Product.class)
  .add(Restrictions.between("price", minPrice, maxPrice))
  .add(Restrictions.
      between("startedOffering", startDate, endDate))
  .list();
...

This seems pretty simple but what if we need to add this to the UI. Well, first we need an object to bind our data to:

...
public class ProductCriteria {
  private BigDecimal minPrice;
  private BigDecimal maxPrice;
  private Date beginStartedOffering;
  private Date endStartedOffering;
  ...
  // getters and setters for attributes
  ...
}

Now we will need to perform checks to determine how to execute the search:

...
if(productCriteria.getMaxPrice() != null)
{
  criteria.add(Restrictions.less("price",
    productCriteria.getMaxPrice()))
}

if(productCriteria.getMinPrice() != null)
{
  criteria.add(Restrictions.greater("price",
    productCriteria.getMinPrice()))
}
...

Etc, etc… As you can imagine the problem is exacerbated the more Restrictions that become required to execute the search. So what about a hybrid approach? Lets define a few annotations:

@Target(ElementType.TYPE)
@Retention(RetentionPolicy.RUNTIME)
@Inherited
public @interface Criteria {
  Class entity;
}
...
@Target(ElementType.TYPE)
@Retention(RetentionPolicy.RUNTIME)
@Inherited
public @interface Criterion {
  String property;
  Restriction operator default Restriction.EQUALS;
}
...

Restriction is simply a Java5 enum that will map to a Hibernate Restriction (this can be done through a polymorphic method in your Restriction implementation). Alright, now lets place these annotations on our Criteria object that we bound to the UI:

@Criteria(entity=Product.class)
public class ProductCriteria {
  @Criterion(property="price", operator=Restrictions.LESS)
  private BigDecimal minPrice;

  @Criterion(property="price", operator=Restrictions.GREATER)
  private BigDecimal maxPrice;

  @Criterion(property="startedOffering",
    operator=Restrictions.GREATER)
  private Date beginStartedOffering;

  @Criterion(property="startedOffering",
    operator=Restrictions.LESS)
  private Date endStartedOffering;

  ...
  // getters and setters for attributes
  ...
}

At run-time the annotations can be processed to create and execute the criteria provided the object that we bound to the UI. Assuming we have a ProductCriteria instance populated with data from our UI interaction, the call would look like:

...
List <Product> products =
  criteriaManager.execute(productCriteria, entityManager);
...

The CriteriaManager knows the entity type we’re searching for (we placed it in the @Criteria annotation), it knows the restrictions we’re applying and which @Entity property we are applying them to (that is in the @Criterion annotations), and it knows the values that will be used in executing (based on the attribute the @Criterion annotation is placed on). We could also directly inject the EntityManager instance into the CriteriaProcessor.

So you may say this just seems like we’re moving the problem around. Well, not really. There are some benefits to this approach:

  • Vendor independence: The use of custom APIs is eliminated. Instead of calling on the Hibernate Criteria API, it becomes very simple to swap out our implementation. Simply alter the code that processes the annotations we defined.
  • One-stop shop: The annotations place the meta-data about how a criteria should execute with the actual criteria values. If changes need to be made to your example query, you make the changes in a single location rather than having to alter both your object bound to the UI and some DAO that processes that object. Additionally, the logic associated with executing criteria is right where it belongs, on the criteria object.
  • Remove unnecessary DAOs: Removes the need for a custom DAO to process the object bound to the UI with a bunch of if/thens. The EntityManager provided by JPA was a recognition that the constant DAOs implementing CRUD interfaces could be generalized. This approach recognizes that constant implementation of DAOs handling if/then scenarios can be generalized.
  • JPA Consistency: The @Criteria and @Criterion annotations look very similar to @Entity and @Column. This is intentional to create consistency between our @Entity mappings and our @Criteria mappings.
  • Testability: All logic for creation of criteria is in one place, the annotation processor. This one place can be heavily tested ensuring its functionality. A custom annotation processor could be developed to make testing your criteria classes very simple (development of this annotation processor is actually very simple).
  • Code generation: Tools like grails use compile-time annotation processing through an APT to auto-generate things like JSPs, classes, configuration, etc. This same approach can be taken using @Criteria. It would be fairly simple to write an APT that generates search criteria JSPs based on the @Criteria instance and the provided annotations.

Essentially we have a very reusable criteria execution strategy. There are of course many extensions that could be included with this including:

  • conjunctions – what about ANDs and ORs?
  • associations – what happens if my mappings contain a @OneToOne, @OneToMany, or even a @ManyToMany?
  • dynamic operators – what if my operators are not static (i.e. the user can choose an operator)?

These questions will be answered in an upcoming post so stay tuned.