Tuning Queries when using Pagination with JPA and Hibernate

In recent performance tuning of some EJBQL search queries, I’ve had a lot of discussions with other developers on database pagination. There are some definite nuances that you have to be aware of when using Hibernate’s pagination feature, so I thought I would explain them here.

Quick Introduction to Database Pagination with JPA

Database pagination allows you to step through a result set in manageable chunks (say 5 at a time). This is an important feature when a result set is large. Imagine if the user selects the first result of 1000. Essentially 999 out of 1000 results were wasted. This is wasteful in terms of CPU cycles on the database server, network usage, CPU cycles on the application server, and memory allocation. On the other hand, if we only loaded 10 results into memory, we’ve only wasted 9 results. As the result set grows, this problem becomes more important to address.

Database pagination with JPA is quite simple through the javax.persistence.Query. The following method invocations retrieve the first 10 results for the query:

javax.persistence.Query query =
  em.createQuery("select order from Order as order
  left join order.customer as customer
  where customer.name like '%' || :name || '%'");

query.setParameter("name", name);
query.setFirstResult(0);
query.setMaxResults(10);

// returns 10 or less orders
List<Order> orders = query.getResultList();

The max number of results to retrieve at one time can be any number you choose. As the user pages through the data, we alter the setFirstResult(int) to retrieve the next set of results.

Query Tuning with Fetch Joins

When paging through a result-set, you may be interested in performing fetch joins to enhance query performance. This avoids the N+1 select problem when walking lazy relationships for displaying data. For example, let’s say we are working with an order management system. This order management system allows users to search for orders that have been placed by customers. Our domain would look something like the following, where an Order has one Customer and a Customer can be associated to many Orders.

This relationship could be described in the Order entity as:

@Entity
public class Order
{
  // ... ...

  @ManyToOne
  private Customer customer;

  // ... ...
}

In the search results, the users want to see both Order and Customer information on the page. Lazily loading the Customer results in a query being executed to retrieve the Customer for each Order displayed. To avoid this, we can perform a fetch join on the Customer when retrieving the Order results. Here is the resulting EJBQL:

select order from Order as order
  left join fetch order.customer as customer
  where customer.name = '%' || :name || '%'

This ensures that only a single query is executed to load both the Order and the Customer results. An example of what the SQL result set might look like in this case would be:

| order_id | cust_id | cust_name       |
----------------------------------------
| 1        | 1       | Jacob Orshalick |
| 2        | 2       | Nirav Assar     |
| 3        | 3       | John Doe        |

As you can see, each Order is associated to a single Customer which ensures a unique result set. In this case we are guaranteed that limiting the result set to 5 will always result in 5 or less unique Order results. This is generally the right solution for a @OneToOne or a @ManyToOne relationship.

Fetching One-to-many or Many-to-many Relationships

Fetching one-to-many or many-to-many relationships gets a bit tricky. The moment you introduce a fetch join for a one-to-many or many-to-many relationship, Hibernate will load all results into memory and then only return you the max number of results you requested. This is due to the semantics of SQL queries.

Going back to our example, we will likely have a list of LineItem entries for each order that tell us what Products the Customer purchased on the Order.

And the Order entity would now look like:

@Entity
public class Order
{
  // ... ...

  @ManyToOne
  private Customer customer;

  @OneToMany
  private List<LineItem> lineItems;

  // Getters and Setters
}

The users request that we display the LineItem entries below each Order in the search results. So we can just do another fetch join and load this data as well right? Here is the resulting EBJQL:

select distinct order from Order as order
  left join fetch order.customer as customer
  left join fetch order.lineItems
  where customer.name = '%' || :name || '%'

Once you introduce this additional fetch into the query, Hibernate will present the following message in the log:

  [org.hibernate.hql.ast.QueryTranslatorImpl] firstResult/maxResults
  specified with collection fetch; applying in memory!

This message is telling you that Hibernate is retrieving all results from the database, and then only returning the first 10 results (or the number of max results you specified). So why does Hibernate do this? Let’s have a look at an example of what the SQL result set generated from this query might look like.

| order_id | cust_id | cust_name       | line_id | product_sku |
----------------------------------------------------------------
| 1        | 1       | Jacob Orshalick | 1       | 1403-1209   |
| 1        | 1       | Jacob Orshalick | 2       | 1405-1333   |
| 2        | 2       | Nirav Assar     | 3       | 1300-1222   |
| 3        | 3       | John Doe        | 4       | 1400-3029   |
| 3        | 3       | John Doe        | 5       | 1401-1000   |
| 3        | 3       | John Doe        | 6       | 1200-1000   |

Each database has it’s own SQL syntax for limiting the result set, but assuming we limit the result-set to 5 results on the database side we would only get the first 5 results. As you can see, the result set returned duplicates the Order and Customer information for each LineItem on the Order. Thanks to the way Hibernate processes these results, we would still see the 3 expected orders (order_id = 1, 2, 3), but the database would only return us 2 of the LineItem entries for John Doe’s order. This is an incorrect result from the user’s point-of-view.

Knowing this, Hibernate rightfully retrieves all results in this case and then returns you the 3 Order results with all associated LineItem entries. But, to ensure correctness, you lose the value of pagination. So will we always face the N+1 select problem when using pagination with @OneToMany or @ManyToMany relationships? Not if you consider other options from a user experience perspective.

Other Options for one-to-many Relationships

There are a number of ways to enhance performance without losing the advantages of database pagination.

Display LineItem Entries only when Requested

Technology combinations like RichFaces and Seam make this simple. Basically you can walk the lazy relationship only when the user requests this information through an AJAX request. Through use of a <rich:togglePanel> a link can be provided to expand the Order data for the user. Because Seam allows an EntityManager to span requests lazily loading this data is simple.

Another simple option is using REST and JSON to retrieve the LineItem entries through an AJAX request when accessed by the user. A simple RESTful invocation (http://my-server/order/1/lineItems) allows the LineItem entries to be retrieved for an Order and we can then parse the results and display them back to the user. RESTEasy makes this simple for any Java application.

Display the LineItem Entries on a Details Page

This is the easiest and most obvious solution. Just display high-level Order information on the search results and the user can then access a details page that provides additional details. In general, this is the solution I generally push users toward for simplicity.

Display a High-level LineItem Summary Information

Another option is to give high-level information (e.g. number of LineItem entries) on the search page, and then display all information on a detail page. With the flexibility of EJBQL, you can use aggregate functions (e.g. count(lineItem.id) ) with a group-by clause to avoid the issues with a one-to-many. But, this also generally requires introduction of DTOs to hold the query result data or additional parsing of the result set.

Performance Tuning Always has Trade-offs

As I always say when discussing performance tuning, there are always trade-offs. Whether it’s additional complexity or changes to user experience, we always have to consider the implications of tuning our applications.

Getting Groovy with Seam after TSSJS

TSSJS was a great conference this year. Neal Ford, well-known author and ThoughtWorker, initiated the conference by delivering an inspiring keynote address discussing dynamic languages, DSLs (Domain-specific languages), and their role in the evolution of software development. Matt Raible blogged about the presentation and you can find a brief summary here. Polyglot programming became a theme of the conference with Ted Neward delivering yet another inspiring keynote encouraging developers to create their own language.

Building on this inspiration, Scott Davis discussed taking the Groovy Red Pill. Well, you would be happy to know I have taken the Red Pill, but of course with a Seam twist. I have been working on a time-tracking application (for my company’s internal use) utilizing Seam and Groovy that I will release as a freely available example. Groovy and Seam integration will be discussed in-depth in JBoss Seam 2E, but let’s look at the Groovy way to initialize a Timesheet:

@Entity
class GroovyTimesheet
{
  @Id @GeneratedValue
  Long id;

  @OneToMany
  @JoinColumn(name="TIMESHEET_ID")
  List<GroovyTimeEntry> entries = new ArrayList<GroovyTimeEntry>();

  GroovyTimesheet(PayPeriod payPeriod, int month, int year)
  {
    (payPeriod.getStartDate(month, year)..
      payPeriod.getEndDate(month, year)).each
    {
      entries << new GroovyTimeEntry(hours:0, date:it);
    }
  }

  // ... ...
}

So what is going on here? Essentially we define a range of dates that are iterated over. The PayPeriod is a fairly simple enum that determines the start and end dates of a pay period. By specifying (startDate..endDate) we define a range. Groovy understands the meaning of a range of dates allowing us to express this in a very concise manner (try expressing this in Java and you’ll get the picture). In addition, we use the each operation on this range. The each operation allows us to define a closure that executes as Groovy loops through our range of dates a day at a time. This allows us to initialize each GroovyTimeEntry instance for the pay period.

You will also note use of the << operator (or the leftShift operator). This operator is defined for a List allowing us to add elements to the list through this syntactic sugar. The GroovyTimeEntry instance is initialized using the default construction approach. By default, named constructor parameters can be specified in any order to initialize an object instance. If you define a constructor this is no longer provided by default. Finally, you will notice the use of the keyword it in the closure we defined for the each operation. The keyword it provides the value of the current element in the iteration. So in our instance, as we loop through the date range, each date will be provided iteratively in the range.

@Entity
class GroovyTimeEntry {
  @Id @GeneratedValue
  Long id;

  BigDecimal hours;
  Date date;
}

Wow, is that all the code? Looks pretty nice doesn’t it. As mentioned, the default constructor allows us to specify named parameters. In addition, getters and setters are automatically provided for each of our attributes.

You’ve probably noticed the use of JPA annotations here. This is perfectly legal and your groovy class will be a JPA entity. The same is true for Seam components annotated with @Name. So how does this work? Groovy classes are compiled to Java bytecode under the covers so JEE and Seam features are fully available to your Groovy classes at runtime. Simply use the groovyc compiler that can be accessed here or include the groovyc Ant task into your build. This will be covered in-depth in JBoss Seam 2E.

So what if we want to add GroovyTimeEntry instances to the GroovyTimesheet instance programmatically? The following code implements this:

@Entity
class GroovyTimesheet
{
  // ... ...

  void leftShift(GroovyTimeEntry entry)
  {
    entries << entry;
  }

  // ... ...
}

As mentioned, the leftShift operator is provided by Groovy and can be overloaded within your custom implementations. By defining a custom leftShift implementation we are now able to add a GroovyTimeEntry instance through the following:

// ... ...
timesheet << new GroovyTimeEntry(new Date());
// ... ...

Operator overloading isn’t limited to the leftShift operation. We can also overload other operators such as +:

// ... ...
BigDecimal plus(GroovyTimeEntry entry)
{
  this.hours + entry.hours;
}
// ... ...

This allows us to add the hours of two GroovyTimeEntry instances using the simple + operator. Notice that a return is not specified. This is optional as the last line is assumed to be a return statement.

As mentioned, a complete time-tracking application will be provided that not only demonstrates Groovy entities but also Groovy Seam components. In addition, the time-tracking application will make use of the ExpandoMetaClass to extend final Java classes at run-time! The code for the application is in development and will be made available through Google code. Also, stay tuned for JBoss Seam 2E which will provide the intimate details of Groovy and Seam integration. So take the Red Pill and use the meta-programming features of Groovy in your own Seam applications!

Extending query by example through annotation

Query by example (QBE) is a great feature for easily executing Criteria using Hibernate. By simply filling in an object and submitting it, Hibernate automatically generates the query based on the values set in the object. While this is an attractive option, in my experience it usefulness quickly becomes limited. In many cases, specifying Restrictions is required which quickly leads to ugly code. For example, lets take a Product class:

@Entity
@Table(name="PRODUCTS")
public class Product {
  @Id
  @Column(name="ID")
  private BigDecimal id;

  @Column(name="SKU")
  private String sku;

  @Column(name="CATEGORY")
  private Category category;

  @Column(name="NAME")
  private String name;

  @Column(name="DESCRIPTION")
  private String description;

  @Column(name="STARTED_OFFERING")
  private Date startedOffering;

  @Column(name="PRICE")
  private BigDecimal price;
  ...
}

This class has a number of attributes we could search on. The easiest of which are probably the category and name. With the id and name we can simply populate and object instance and use QBE:

...
Product product = new Product();

product.setCategory(Category.APPAREL);
product.setName("slacks");

Example productExample =
  Example.create(product).ignoreCase().enableLike();

List results = session.createCriteria(Product.class)
  .add(productExample)
  .list();
...

This works fine, but what if we want to search for products in a price range say gift ideas from $30.00 to $50.00, or products that we started offering between December 1, 2006 and March 31, 2007. Obviously we can’t accomplish this by simply setting values into our example product. We need to use the Criteria API in this case:

...
List products = session.createCriteria(Product.class)
  .add(Restrictions.between("price", minPrice, maxPrice))
  .add(Restrictions.
      between("startedOffering", startDate, endDate))
  .list();
...

This seems pretty simple but what if we need to add this to the UI. Well, first we need an object to bind our data to:

...
public class ProductCriteria {
  private BigDecimal minPrice;
  private BigDecimal maxPrice;
  private Date beginStartedOffering;
  private Date endStartedOffering;
  ...
  // getters and setters for attributes
  ...
}

Now we will need to perform checks to determine how to execute the search:

...
if(productCriteria.getMaxPrice() != null)
{
  criteria.add(Restrictions.less("price",
    productCriteria.getMaxPrice()))
}

if(productCriteria.getMinPrice() != null)
{
  criteria.add(Restrictions.greater("price",
    productCriteria.getMinPrice()))
}
...

Etc, etc… As you can imagine the problem is exacerbated the more Restrictions that become required to execute the search. So what about a hybrid approach? Lets define a few annotations:

@Target(ElementType.TYPE)
@Retention(RetentionPolicy.RUNTIME)
@Inherited
public @interface Criteria {
  Class entity;
}
...
@Target(ElementType.TYPE)
@Retention(RetentionPolicy.RUNTIME)
@Inherited
public @interface Criterion {
  String property;
  Restriction operator default Restriction.EQUALS;
}
...

Restriction is simply a Java5 enum that will map to a Hibernate Restriction (this can be done through a polymorphic method in your Restriction implementation). Alright, now lets place these annotations on our Criteria object that we bound to the UI:

@Criteria(entity=Product.class)
public class ProductCriteria {
  @Criterion(property="price", operator=Restrictions.LESS)
  private BigDecimal minPrice;

  @Criterion(property="price", operator=Restrictions.GREATER)
  private BigDecimal maxPrice;

  @Criterion(property="startedOffering",
    operator=Restrictions.GREATER)
  private Date beginStartedOffering;

  @Criterion(property="startedOffering",
    operator=Restrictions.LESS)
  private Date endStartedOffering;

  ...
  // getters and setters for attributes
  ...
}

At run-time the annotations can be processed to create and execute the criteria provided the object that we bound to the UI. Assuming we have a ProductCriteria instance populated with data from our UI interaction, the call would look like:

...
List <Product> products =
  criteriaManager.execute(productCriteria, entityManager);
...

The CriteriaManager knows the entity type we’re searching for (we placed it in the @Criteria annotation), it knows the restrictions we’re applying and which @Entity property we are applying them to (that is in the @Criterion annotations), and it knows the values that will be used in executing (based on the attribute the @Criterion annotation is placed on). We could also directly inject the EntityManager instance into the CriteriaProcessor.

So you may say this just seems like we’re moving the problem around. Well, not really. There are some benefits to this approach:

  • Vendor independence: The use of custom APIs is eliminated. Instead of calling on the Hibernate Criteria API, it becomes very simple to swap out our implementation. Simply alter the code that processes the annotations we defined.
  • One-stop shop: The annotations place the meta-data about how a criteria should execute with the actual criteria values. If changes need to be made to your example query, you make the changes in a single location rather than having to alter both your object bound to the UI and some DAO that processes that object. Additionally, the logic associated with executing criteria is right where it belongs, on the criteria object.
  • Remove unnecessary DAOs: Removes the need for a custom DAO to process the object bound to the UI with a bunch of if/thens. The EntityManager provided by JPA was a recognition that the constant DAOs implementing CRUD interfaces could be generalized. This approach recognizes that constant implementation of DAOs handling if/then scenarios can be generalized.
  • JPA Consistency: The @Criteria and @Criterion annotations look very similar to @Entity and @Column. This is intentional to create consistency between our @Entity mappings and our @Criteria mappings.
  • Testability: All logic for creation of criteria is in one place, the annotation processor. This one place can be heavily tested ensuring its functionality. A custom annotation processor could be developed to make testing your criteria classes very simple (development of this annotation processor is actually very simple).
  • Code generation: Tools like grails use compile-time annotation processing through an APT to auto-generate things like JSPs, classes, configuration, etc. This same approach can be taken using @Criteria. It would be fairly simple to write an APT that generates search criteria JSPs based on the @Criteria instance and the provided annotations.

Essentially we have a very reusable criteria execution strategy. There are of course many extensions that could be included with this including:

  • conjunctions – what about ANDs and ORs?
  • associations – what happens if my mappings contain a @OneToOne, @OneToMany, or even a @ManyToMany?
  • dynamic operators – what if my operators are not static (i.e. the user can choose an operator)?

These questions will be answered in an upcoming post so stay tuned.