In recent performance tuning of some EJBQL search queries, I’ve had a lot of discussions with other developers on database pagination. There are some definite nuances that you have to be aware of when using Hibernate’s pagination feature, so I thought I would explain them here.

Quick Introduction to Database Pagination with JPA

Database pagination allows you to step through a result set in manageable chunks (say 5 at a time). This is an important feature when a result set is large. Imagine if the user selects the first result of 1000. Essentially 999 out of 1000 results were wasted. This is wasteful in terms of CPU cycles on the database server, network usage, CPU cycles on the application server, and memory allocation. On the other hand, if we only loaded 10 results into memory, we’ve only wasted 9 results. As the result set grows, this problem becomes more important to address.

Database pagination with JPA is quite simple through the javax.persistence.Query. The following method invocations retrieve the first 10 results for the query:

javax.persistence.Query query =
  em.createQuery("select order from Order as order
  left join order.customer as customer
  where customer.name like '%' || :name || '%'");

query.setParameter("name", name);
query.setFirstResult(0);
query.setMaxResults(10);

// returns 10 or less orders
List<Order> orders = query.getResultList();

The max number of results to retrieve at one time can be any number you choose. As the user pages through the data, we alter the setFirstResult(int) to retrieve the next set of results.

Query Tuning with Fetch Joins

When paging through a result-set, you may be interested in performing fetch joins to enhance query performance. This avoids the N+1 select problem when walking lazy relationships for displaying data. For example, let’s say we are working with an order management system. This order management system allows users to search for orders that have been placed by customers. Our domain would look something like the following, where an Order has one Customer and a Customer can be associated to many Orders.

This relationship could be described in the Order entity as:

@Entity
public class Order
{
  // ... ...

  @ManyToOne
  private Customer customer;

  // ... ...
}

In the search results, the users want to see both Order and Customer information on the page. Lazily loading the Customer results in a query being executed to retrieve the Customer for each Order displayed. To avoid this, we can perform a fetch join on the Customer when retrieving the Order results. Here is the resulting EJBQL:

select order from Order as order
  left join fetch order.customer as customer
  where customer.name = '%' || :name || '%'

This ensures that only a single query is executed to load both the Order and the Customer results. An example of what the SQL result set might look like in this case would be:

| order_id | cust_id | cust_name       |
----------------------------------------
| 1        | 1       | Jacob Orshalick |
| 2        | 2       | Nirav Assar     |
| 3        | 3       | John Doe        |

As you can see, each Order is associated to a single Customer which ensures a unique result set. In this case we are guaranteed that limiting the result set to 5 will always result in 5 or less unique Order results. This is generally the right solution for a @OneToOne or a @ManyToOne relationship.

Fetching One-to-many or Many-to-many Relationships

Fetching one-to-many or many-to-many relationships gets a bit tricky. The moment you introduce a fetch join for a one-to-many or many-to-many relationship, Hibernate will load all results into memory and then only return you the max number of results you requested. This is due to the semantics of SQL queries.

Going back to our example, we will likely have a list of LineItem entries for each order that tell us what Products the Customer purchased on the Order.

And the Order entity would now look like:

@Entity
public class Order
{
  // ... ...

  @ManyToOne
  private Customer customer;

  @OneToMany
  private List<LineItem> lineItems;

  // Getters and Setters
}

The users request that we display the LineItem entries below each Order in the search results. So we can just do another fetch join and load this data as well right? Here is the resulting EBJQL:

select distinct order from Order as order
  left join fetch order.customer as customer
  left join fetch order.lineItems
  where customer.name = '%' || :name || '%'

Once you introduce this additional fetch into the query, Hibernate will present the following message in the log:

  [org.hibernate.hql.ast.QueryTranslatorImpl] firstResult/maxResults
  specified with collection fetch; applying in memory!

This message is telling you that Hibernate is retrieving all results from the database, and then only returning the first 10 results (or the number of max results you specified). So why does Hibernate do this? Let’s have a look at an example of what the SQL result set generated from this query might look like.

| order_id | cust_id | cust_name       | line_id | product_sku |
----------------------------------------------------------------
| 1        | 1       | Jacob Orshalick | 1       | 1403-1209   |
| 1        | 1       | Jacob Orshalick | 2       | 1405-1333   |
| 2        | 2       | Nirav Assar     | 3       | 1300-1222   |
| 3        | 3       | John Doe        | 4       | 1400-3029   |
| 3        | 3       | John Doe        | 5       | 1401-1000   |
| 3        | 3       | John Doe        | 6       | 1200-1000   |

Each database has it’s own SQL syntax for limiting the result set, but assuming we limit the result-set to 5 results on the database side we would only get the first 5 results. As you can see, the result set returned duplicates the Order and Customer information for each LineItem on the Order. Thanks to the way Hibernate processes these results, we would still see the 3 expected orders (order_id = 1, 2, 3), but the database would only return us 2 of the LineItem entries for John Doe’s order. This is an incorrect result from the user’s point-of-view.

Knowing this, Hibernate rightfully retrieves all results in this case and then returns you the 3 Order results with all associated LineItem entries. But, to ensure correctness, you lose the value of pagination. So will we always face the N+1 select problem when using pagination with @OneToMany or @ManyToMany relationships? Not if you consider other options from a user experience perspective.

Other Options for one-to-many Relationships

There are a number of ways to enhance performance without losing the advantages of database pagination.

Display LineItem Entries only when Requested

Technology combinations like RichFaces and Seam make this simple. Basically you can walk the lazy relationship only when the user requests this information through an AJAX request. Through use of a <rich:togglePanel> a link can be provided to expand the Order data for the user. Because Seam allows an EntityManager to span requests lazily loading this data is simple.

Another simple option is using REST and JSON to retrieve the LineItem entries through an AJAX request when accessed by the user. A simple RESTful invocation (http://my-server/order/1/lineItems) allows the LineItem entries to be retrieved for an Order and we can then parse the results and display them back to the user. RESTEasy makes this simple for any Java application.

Display the LineItem Entries on a Details Page

This is the easiest and most obvious solution. Just display high-level Order information on the search results and the user can then access a details page that provides additional details. In general, this is the solution I generally push users toward for simplicity.

Display a High-level LineItem Summary Information

Another option is to give high-level information (e.g. number of LineItem entries) on the search page, and then display all information on a detail page. With the flexibility of EJBQL, you can use aggregate functions (e.g. count(lineItem.id) ) with a group-by clause to avoid the issues with a one-to-many. But, this also generally requires introduction of DTOs to hold the query result data or additional parsing of the result set.

Performance Tuning Always has Trade-offs

As I always say when discussing performance tuning, there are always trade-offs. Whether it’s additional complexity or changes to user experience, we always have to consider the implications of tuning our applications.