The best way to use JOIN FETCH and Pagination with Spring

Are you struggling with performance issues in your Spring, Jakarta EE, or Java EE application?

What if there were a tool that could automatically detect what caused performance issues in your JPA and Hibernate data access layer?

Wouldn’t it be awesome to have such a tool to watch your application and prevent performance issues during development, long before they affect production systems?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, Micronaut, or Play Framework.

So, rather than fixing performance issues in your production system on a Saturday night, you are better off using Hypersistence Optimizer to help you prevent those issues so that you can spend your time on the things that you love!

Introduction

In this article, we are going to see how we can use the JOIN FETCH clause when fetching a child collection eagerly while also limiting the number of parent records using pagination in a Spring Data JPA application.

I decided to write this article because the most common solution used in many projects turns out to be extremely inefficient.

Domain Model

Let’s assume we have a parent Post entity that has a one-to-many relationship with the child PostComment entity.

Post and PostComment entities

The Post parent entity is mapped as follows:

@Entity(name = "Post")
@Table(name = "post")
public class Post {

    @Id
    private Long id;

    private String title;

    @Column(
        name = "created_on", 
        nullable = false
    )
    private LocalDateTime createdOn;

    @OneToMany(
        mappedBy = "post", 
        cascade = CascadeType.ALL, 
        orphanRemoval = true
    )
    private List<PostComment> comments = new ArrayList<>();
}

And the PostComment child entity, like this:

@Entity(name = "PostComment")
@Table(name = "post_comment")
public class PostComment {

    @Id
    private Long id;

    @ManyToOne(fetch = FetchType.LAZY)
    private Post post;

    private String review;

    @Column(name = "created_on")
    private LocalDateTime createdOn;
}

Using JOIN FETCH with Spring Pagination

Our business requirement is to filter the Post entities by a given predicate, order them by the creation timestamp, and fetch the Top N such Post entities along with all their associated comments.

The most obvious but wrong way of implementing this requirement is to combine a JOIN FETCH on the child collection with the parent entity pagination in a single Spring @Query that looks as follows:

@Query(
    value = """
        select p
        from Post p
        left join fetch p.comments
        where p.title like :titlePattern
        """,
    countQuery = """
        select count(p)
        from Post p
        where p.title like :titlePattern
        """
)
Page<Post> findAllByTitleWithCommentsAntiPattern(
    @Param("titlePattern") String titlePattern, 
    Pageable pageable
);

At first sight, this query looks like the natural solution to our problem, but when running it:

Page<Post> posts = postRepository.findAllByTitleWithCommentsAntiPattern(
    "High-Performance Java Persistence %",
    PageRequest.of(
        0, 
        maxCount, 
        Sort.by("createdOn")
    )
);

We get the following entries printed in the log:

WARN: HHH90003004: firstResult/maxResults specified with collection fetch; 
                   applying in memory

-- JOIN FETCH query
SELECT 
    p1_0.id,
    c1_0.post_id,
    c1_0.id,
    c1_0.created_on,
    c1_0.review,
    p1_0.created_on,
    p1_0.title
FROM 
    post p1_0
LEFT JOIN 
    post_comment c1_0 ON p1_0.id=c1_0.post_id
WHERE 
    p1_0.title LIKE 'High-Performance Java Persistence %' ESCAPE ''
ORDER BY 
    p1_0.created_on ASC

-- Count query
SELECT 
    count(p1_0.id)
FROM 
    post p1_0
WHERE 
    p1_0.title LIKE 'High-Performance Java Persistence %' ESCAPE ''

The WARN message tells us that the pagination was not done in the database but in the application memory, and if we inspect the SQL query that follows the warning message, we can see that, indeed, there’s no SQL-level pagination.

Because a WARN message can be overlooked, it’s better to enable the Hibernate hibernate.query.fail_on_pagination_over_collection_fetch property which will throw a HibernateException when detecting this issue.

spring.jpa.properties.hibernate.query.fail_on_pagination_over_collection_fetch=true

Check out this article for a detailed explanation of this property.

To verify the performance of a given SQL query, the only valid strategy is to inspect the execution plan.

However, istead of using EXPLAIN ANALYZE, which would just give us the estimated plan, we are going to use auto_explain so that we can collect the actual execution plan for a given executed SQL query.

In our case, if we inspect the execution plan for the aforementioned SQL query, we can see that it fetches 100,000 records from the database (e.g., 10,000 matching post entries multiplied by 10 post_comments for each post):

LOG:  duration: 368.062 ms
Query Text: 
    SELECT p1_0.id,
           c1_0.post_id,
           c1_0.id,
           c1_0.created_on,
           c1_0.review,
           p1_0.created_on,
           p1_0.title
    FROM post p1_0
    LEFT JOIN post_comment c1_0 ON p1_0.id=c1_0.post_id
    WHERE p1_0.title like $1 ESCAPE ''
    ORDER BY p1_0.created_on ASC
Sort  
  (cost=2659.65..2660.90 rows=500 width=1072) 
  (actual time=137.324..168.576 rows=100000 loops=1)
  Sort Key: p1_0.created_on
  Sort Method: external merge  Disk: 13512kB
  ->  Hash Right Join  
        (cost=248.63..2637.24 rows=500 width=1072) 
        (actual time=4.356..51.336 rows=100000 loops=1)
        Hash Cond: (c1_0.post_id = p1_0.id)
        ->  Seq Scan on post_comment c1_0  
            (cost=0.00..2126.00 rows=100000 width=540) 
            (actual time=0.033..14.343 rows=100000 loops=1)
        ->  Hash  
              (cost=248.00..248.00 rows=50 width=532) 
              (actual time=4.306..4.307 rows=10000 loops=1)
              Buckets: 16384 (originally 1024)  
              Batches: 1 (originally 1)  Memory Usage: 1075kB
              ->  Seq Scan on post p1_0  
                    (cost=0.00..248.00 rows=50 width=532) 
                    (actual time=0.056..2.470 rows=10000 loops=1)
                    Filter: ((title)::text ~~ 'High-Performance Java Persistence %'::text)

Without using SQL-level pagination, such SQL queries are prone to fetching all the post records matching the filtering criteria along with all their associated post_comments.

Fetching the parent-child hierarchy while limiting the parent entities using SQL

Spring 6 uses Hibernate 6, and starting with this version, JPQL queries can use Window Functions and Derived Tables, which we can also use for our business use case.

So, a much more efficient way to JOIN FETCH the child collection while also using pagination on the parent entities is to use the following Spring @Query:

@Query("""
    select p
    from Post p
    left join fetch p.comments pc
    where p.id in (
        select id
        from (
          select
             id as id,
             dense_rank() over (order by createdOn ASC) as ranking
          from Post
          where title like :titlePattern
        ) pr
        where ranking <= :maxCount
    )
    """
)
List<Post> findFirstByTitleWithCommentsByTitle(
    @Param("titlePattern") String titlePattern,
    @Param("maxCount") int maxCount
);

When running the findFirstByTitleWithCommentsByTitle query method:

List<Post> posts = postRepository.findFirstByTitleWithCommentsByTitle(
    "High-Performance Java Persistence %",
    maxCount
);

Hibernate generates the following SQL query:

SELECT 
    p1_0.id,
    c1_0.post_id,
    c1_0.id,
    c1_0.created_on,
    c1_0.review,
    p1_0.created_on,
    p1_0.title
FROM 
    post p1_0
LEFT JOIN 
    post_comment c1_0 ON p1_0.id=c1_0.post_id
WHERE p1_0.id IN (
    SELECT 
        pr1_0.id
    FROM (
        SELECT 
            p2_0.id,
            dense_rank() over(
                ORDER BY p2_0.created_on ASC
            )
        FROM 
            post p2_0
        WHERE 
            p2_0.title LIKE 'High-Performance Java Persistence %' ESCAPE ''
    ) pr1_0(id, ranking)
    WHERE 
        pr1_0.ranking<=25
 )

The execution plan for this query looks as follows:

LOG:  duration: 8.569 ms
Query Text:     
    SELECT p1_0.id,
           c1_0.post_id,
           c1_0.id,
           c1_0.created_on,
           c1_0.review,
           p1_0.created_on,
           p1_0.title
    FROM post p1_0
    LEFT JOIN post_comment c1_0 ON p1_0.id=c1_0.post_id
    WHERE p1_0.id in
        (SELECT pr1_0.id
         FROM
           (SELECT p2_0.id,
                   dense_rank() over(
                                     ORDER BY p2_0.created_on ASC)
            FROM post p2_0
            WHERE p2_0.title like $1 ESCAPE '') pr1_0(id, ranking)
         WHERE pr1_0.ranking<=$2)
Nested Loop Left Join  
  (cost=251.70..1228.20 rows=500 width=1072) 
  (actual time=5.630..7.673 rows=250 loops=1)
  ->  Hash Semi Join  
        (cost=251.41..501.22 rows=50 width=532) 
        (actual time=5.532..7.240 rows=25 loops=1)
        Hash Cond: (p1_0.id = pr1_0.id)
        ->  Seq Scan on post p1_0  
            (cost=0.00..223.00 rows=10000 width=532) 
            (actual time=0.018..0.821 rows=10000 loops=1)
        ->  Hash  
              (cost=250.79..250.79 rows=50 width=8) 
              (actual time=5.500..5.502 rows=25 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 9kB
              ->  Subquery Scan on pr1_0  
                    (cost=249.41..250.79 rows=50 width=8) 
                    (actual time=5.452..5.488 rows=25 loops=1)
                    ->  WindowAgg  
                          (cost=249.41..250.29 rows=50 width=24) 
                          (actual time=5.451..5.484 rows=25 loops=1)
                          Run Condition: (dense_rank() OVER (?) <= '25'::bigint)
                          ->  Sort  
                                (cost=249.41..249.54 rows=50 width=16) 
                                (actual time=5.415..5.417 rows=27 loops=1)
                                Sort Key: p2_0.created_on
                                Sort Method: quicksort  Memory: 931kB
                                ->  Seq Scan on post p2_0  
                                      (cost=0.00..248.00 rows=50 width=16) 
                                      (actual time=0.024..3.271 rows=10000 loops=1)
                                      Filter: (
                                        (title)::text ~~ 
                                        'High-Performance Java Persistence %'::text
                                      )
  ->  Index Scan using idx_post_comment_post_id on post_comment c1_0  
        (cost=0.29..9.54 rows=500 width=540) 
        (actual time=0.007..0.013 rows=10 loops=25)
        Index Cond: (post_id = p1_0.id)

Compared to the previous query that took 368 ms, this one takes only 8 ms because it only fetches 250 records from the database, not 100_000 rows.

Awesome, right?

Fetching the parent-child hierarchy using Blaze Persistence

Another way to fetch a parent-child hierarchy while painting the parent entities is to use Blaze Persistence.

For that, we will create a CustomPostRepository:

public interface CustomPostRepository {

    List<Post> findAllByTitleWithComments(
        String titlePattern, 
        PageRequest pageRequest
    );
}

And we will provide the following implementation for it:

public class CustomPostRepositoryImpl implements CustomPostRepository {

    @PersistenceContext
    private EntityManager entityManager;

    @Autowired
    private CriteriaBuilderFactory criteriaBuilderFactory;

    @Override
    public List<Post> findAllByTitleWithComments(
            String titlePattern, 
            PageRequest pageRequest) {
        return criteriaBuilderFactory
            .create(entityManager, Post.class)
            .fetch(Post_.COMMENTS)
            .where(Post_.TITLE)
                .like().value(titlePattern).noEscape()
            .orderBy(Post_.CREATED_ON, true)
            .orderBy(Post_.ID, true)
            .page(
                (int) pageRequest.getOffset(), 
                pageRequest.getPageSize()
            )
            .withCountQuery(false)
            .getResultList();
    }
}

The findAllByTitleWithComments query method uses Blaze Persistence to create the query dynamically, which uses both JOIN FETCH and Post entity pagination.

To use the methods from the CustomPostRepository, the PostRepository will extend from it:

@Repository
public interface PostRepository 
    extends BaseJpaRepository<Post, Long>, 
            CustomPostRepository {
            
}

For more details about using custom Repositories with Spring Data JPA< check out this article as well.

When calling the findAllByTitleWithComments method defined by the CustomPostRepository:

List<Post> posts = postRepository.findAllByTitleWithComments(
    "High-Performance Java Persistence %",
    PageRequest.of(0, maxCount, Sort.by("createdOn"))
);

Hibernate will generate the following SQL query:

SELECT 
    p1_0.id,
    c1_0.post_id,
    c1_0.id,
    c1_0.created_on,
    c1_0.review,
    p1_0.created_on,
    p1_0.title
FROM 
    post p1_0
LEFT JOIN 
    post_comment c1_0 ON p1_0.id=c1_0.post_id
WHERE p1_0.id in (
    SELECT 
        p2_0.id
    FROM 
        post p2_0
    WHERE 
        p2_0.title like 'High-Performance Java Persistence %' ESCAPE ''
    ORDER BY 
        p2_0.created_on ASC,
        p2_0.id ASC 
    FETCH FIRST 25 ROWS ONLY
)
ORDER BY 
    p1_0.created_on ASC,
    p1_0.id ASC

And when analyzing the execution plan:

LOG:  duration: 6.075 ms  plan:
    Query Text: 
        SELECT p1_0.id,
               c1_0.post_id,
               c1_0.id,
               c1_0.created_on,
               c1_0.review,
               p1_0.created_on,
               p1_0.title
        FROM post p1_0
        LEFT JOIN post_comment c1_0 ON p1_0.id=c1_0.post_id
        WHERE p1_0.id in
            (SELECT p2_0.id
             FROM post p2_0
             WHERE p2_0.title like $1 ESCAPE ''
             ORDER BY p2_0.created_on ASC,p2_0.id ASC FETCH FIRST 25 ROWS ONLY)
        ORDER BY p1_0.created_on ASC,
                 p1_0.id ASC
    Sort  
      (cost=795.08..795.71 rows=250 width=1072) 
      (actual time=5.384..5.406 rows=250 loops=1)
      Sort Key: p1_0.created_on, p1_0.id
      Sort Method: quicksort  Memory: 64kB
      ->  Nested Loop Left Join  
            (cost=250.36..785.12 rows=250 width=1072) 
            (actual time=5.083..5.250 rows=250 loops=1)
            ->  Nested Loop  
                  (cost=250.07..421.63 rows=25 width=532) 
                  (actual time=5.046..5.106 rows=25 loops=1)
                  ->  HashAggregate  
                        (cost=249.79..250.04 rows=25 width=8) 
                        (actual time=4.986..4.996 rows=25 loops=1)
                        Group Key: p2_0.id
                        Batches: 1  Memory Usage: 24kB
                        ->  Limit  
                              (cost=249.41..249.47 rows=25 width=16) 
                              (actual time=4.970..4.974 rows=25 loops=1)
                              ->  Sort  
                                    (cost=249.41..249.54 rows=50 width=16) 
                                    (actual time=4.968..4.970 rows=25 loops=1)
                                    Sort Key: p2_0.created_on, p2_0.id
                                    Sort Method: top-N heapsort  Memory: 26kB
                                    ->  Seq Scan on post p2_0  
                                          (cost=0.00..248.00 rows=50 width=16) 
                                          (actual time=0.072..3.628 rows=10000 loops=1)
                                          Filter: ((title)::text ~~ 'High-Performance Java Persistence %'::text)
                  ->  Index Scan using post_pkey on post p1_0  
                        (cost=0.29..6.86 rows=1 width=532) 
                        (actual time=0.004..0.004 rows=1 loops=25)
                        Index Cond: (id = p2_0.id)
            ->  Index Scan using idx_post_comment_post_id on post_comment c1_0  
                  (cost=0.29..9.54 rows=500 width=540) 
                  (actual time=0.003..0.004 rows=10 loops=25)
                  Index Cond: (post_id = p1_0.id)

We can see that the executed SQL query is extremely efficient. So, Blaze Persistence offers a solution that’s both convenient and efficient.

For more details about Blaze Persistence, check out this article as well.

Split the filtering and the fetching into two separate queries

The previous solution works great for a Top-N query, but if you want to expand it to a Next-N query, you’d need to use Keyset Pagination.

However, there’s another approach that is worth mentioning.

We can split the filtering and the fetching into two separate queries like this:

@Query(
    value = """
        select p.id
        from Post p
        where p.title like :titlePattern
        """,
    countQuery = """
        select count(p)
        from Post p
        where p.title like :titlePattern
        """
)
List<Long> findAllPostIdsByTitle(
    @Param("titlePattern") String titlePattern, 
    Pageable pageable
);

@Query("""
    select p
    from Post p
    left join fetch p.comments
    where p.id in :postIds
    """
)
List<Post> findAllByIdWithComments(
    @Param("postIds") List<Long> postIds
);

The first query filters the Post identifiers that match the filtering criteria and fetches the page of identifiers we are interested in.

And the second query takes a list of Post identifiers and fetches the matching Post entities along with their associated PostComment child entities.

The ForumService would then define a single transaction method that calls these two PostRepository query methods:

@Transactional(readOnly = true)
public List<Post> findAllPostsByTitleWithComments(
        String titlePattern, 
        PageRequest pageRequest) {
    return postRepository.findAllByIdWithComments(
        postRepository.findAllPostIdsByTitle(
            titlePattern,
            pageRequest
        )
    );
}

The reason why we are using a Service method is that we want these two queries to be executed in the context of the same database transaction.

And when calling the findAllPostsByTitleWithComments service method:

List<Post> posts = forumService.findAllPostsByTitleWithComments(
    "High-Performance Java Persistence %",
    PageRequest.of(
        0, 
        maxCount, 
        Sort.by("createdOn")
    )
);

We would get two very simple SQL queries executed by Hibernate:

-- The findAllPostIdsByTitle query
SELECT 
    p1_0.id
FROM 
    post p1_0
WHERE 
    p1_0.title LIKE 'High-Performance Java Persistence %' ESCAPE ''
ORDER BY 
    p1_0.created_on ASC
OFFSET 0 ROWS 
FETCH FIRST 25 ROWS ONLY

-- The findAllByIdWithComments query
SELECT 
    p1_0.id,
    c1_0.post_id,
    c1_0.id,
    c1_0.created_on,
    c1_0.review,
    p1_0.created_on,
    p1_0.title
FROM 
    post p1_0
LEFT JOIN 
    post_comment c1_0 ON p1_0.id=c1_0.post_id
WHERE p1_0.id IN (
    1, 2, 3, 4, 5, 
    6, 7, 8, 9, 10,
    11, 12, 13, 14, 15,
    16, 17, 18, 19, 20,
    21, 22, 23, 24, 25
)

The execution plans for these two queries looks as follows:

LOG:  duration: 4.084 ms
    Query Text:     
        SELECT p1_0.id
        FROM post p1_0
        WHERE p1_0.title like $1 ESCAPE ''
        ORDER BY p1_0.created_on ASC
        OFFSET $2 ROWS FETCH FIRST $3 ROWS ONLY
    Limit  
      (cost=249.41..249.47 rows=25 width=16) 
      (actual time=4.069..4.072 rows=25 loops=1)
      ->  Sort  
            (cost=249.41..249.54 rows=50 width=16) 
            (actual time=4.067..4.069 rows=25 loops=1)
            Sort Key: created_on
            Sort Method: top-N heapsort  Memory: 26kB
            ->  Seq Scan on post p1_0  
                  (cost=0.00..248.00 rows=50 width=16) 
                  (actual time=0.082..2.934 rows=10000 loops=1)
                  Filter: ((title)::text ~~ 'High-Performance Java Persistence %'::text)

LOG:  duration: 20.008 ms
    Query Text:     
        SELECT p1_0.id,
               c1_0.post_id,
               c1_0.id,
               c1_0.created_on,
               c1_0.review,
               p1_0.created_on,
               p1_0.title
        FROM post p1_0
        LEFT JOIN post_comment c1_0 ON p1_0.id=c1_0.post_id
        WHERE p1_0.id in(
            $1, $2, $3, $4, $5,
            $6, $7, $8, $9, $10,
            $11, $12, $13, $14, $15,
            $16, $17, $18, $19, $20,
            $21, $22, $23, $24, $25
        )
    Hash Right Join  
      (cost=142.23..2530.84 rows=250 width=1072) 
      (actual time=0.091..19.621 rows=250 loops=1)
      Hash Cond: (c1_0.post_id = p1_0.id)
      ->  Seq Scan on post_comment c1_0  
            (cost=0.00..2126.00 rows=100000 width=540) 
            (actual time=0.022..11.337 rows=100000 loops=1)
      ->  Hash  
            (cost=141.92..141.92 rows=25 width=532) 
            (actual time=0.062..0.063 rows=25 loops=1)
            Buckets: 1024  Batches: 1  Memory Usage: 11kB
            ->  Bitmap Heap Scan on post p1_0  
                  (cost=79.38..141.92 rows=25 width=532) 
                  (actual time=0.053..0.055 rows=25 loops=1)
                  Recheck Cond: (id = ANY ('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25}'::bigint[]))
                  Heap Blocks: exact=1
                  ->  Bitmap Index Scan on post_pkey  
                        (cost=0.00..79.31 rows=25 width=0) 
                        (actual time=0.050..0.050 rows=25 loops=1)
                        Index Cond: (id = ANY ('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25}'::bigint[]))

This solution will work just fine for both Top-N queries that display the first page and Next-N queries that can display the remaining pages.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

The best way to use JOIN FETCH on a child collection while limiting the parent entities is to use SQL-level pagination, and starting with Spring 6 and Hibernate 6, this can be done using a single JPQL query.

On Spring 5 and Hibernate 5, or if you need both Top-N and Next-N queries, you can use the two-query approach. All in all, just avoid the initial JPQL query that mixes both JOIN FETCH and Pagination since it could lead to terrible performance issues.

Transactions and Concurrency Control eBook

4 Comments on “The best way to use JOIN FETCH and Pagination with Spring

  1. You forget the other option, which is to use the Spring Data integration of Blaze-Persistence that generates these kinds of pagination queries automatically 🙂

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.