Ticket #138 (assigned enhancement)

Opened 2 years ago

Last modified 1 year ago

Radial and Treemap views when combined with filters have very long running queries

Reported by: bruno Assigned to: julians (accepted)
Priority: critical Milestone:
Component: GUI Version: 0.1
Keywords: Cc:

Description

Each level that's represented in a view from the Radial or Treemap, when filtered (in this example, I used "root" or "uid=0".

SELECT sum(size) AS sum_size FROM files  JOIN directories ON files.directory_id = directories.id WHERE (files.name LIKE '%' AND files.uid = 0) AND (directories.lft >= (SELECT lft FROM directories WHERE id=65313)  AND directories.lft <= (SELECT rgt FROM directories WHERE id=65313)  AND directories.server_id = 2)

The explain plan for each of these queries is:

-------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=76161.92..76161.93 rows=1 width=8)
   InitPlan
     ->  Index Scan using directory_info_pkey on directories  (cost=0.00..8.57 rows=1 width=4)
           Index Cond: (id = 64466)
     ->  Index Scan using directory_info_pkey on directories  (cost=0.00..8.57 rows=1 width=4)
           Index Cond: (id = 64466)
   ->  Hash Join  (cost=1198.69..76139.09 rows=2273 width=8)
         Hash Cond: (files.directory_id = directories.id)
         ->  Seq Scan on files  (cost=0.00..71275.43 rows=728449 width=12)
               Filter: (((name)::text ~~ '%'::text) AND (uid = 0))
         ->  Hash  (cost=1197.99..1197.99 rows=282 width=4)
               ->  Bitmap Heap Scan on directories  (cost=145.12..1197.99 rows=282 width=4)
                     Recheck Cond: ((lft >= $0) AND (lft <= $1))
                     Filter: (server_id = 2)
                     ->  Bitmap Index Scan on index_directories_on_lft  (cost=0.00..145.05 rows=452 width=0)
                           Index Cond: ((lft >= $0) AND (lft <= $1))
(16 rows)

and issuing an "explain analyze" takes around 5 seconds. Take into account that the whole database is stored in a RAM file system.

I think we need to look at optimising these queries (if we can).

Change History

02/23/07 18:26:08 changed by julians

  • type changed from defect to enhancement.

Changing type to enhancement.

It's worth noting that uid 0 owns approx. 50% of the files on the filesystem in question (737,171 files out of 1,502,822). If you analyze the same query with uid 19198, owning only 88 files, you get the following result:

 Aggregate  (cost=6534.43..6534.44 rows=1 width=8)
   InitPlan
     ->  Index Scan using directory_info_pkey on directories  (cost=0.00..8.57 rows=1 width=4)
           Index Cond: (id = 65313)
     ->  Index Scan using directory_info_pkey on directories  (cost=0.00..8.57 rows=1 width=4)
           Index Cond: (id = 65313)
   ->  Nested Loop  (cost=145.12..6517.24 rows=18 width=8)
         ->  Bitmap Heap Scan on directories  (cost=145.12..1372.87 rows=274 width=4)
               Recheck Cond: ((lft >= $0) AND (lft <= $1))
               Filter: (server_id = 2)
               ->  Bitmap Index Scan on index_directories_on_lft  (cost=0.00..145.05 rows=452 width=0)
                     Index Cond: ((lft >= $0) AND (lft <= $1))
         ->  Index Scan using index_files_uid_directory_id on files  (cost=0.00..18.73 rows=4 width=12)
               Index Cond: ((files.uid = 19198) AND (files.directory_id = directories.id))
               Filter: ((name)::text ~~ '%'::text)
(15 rows)

This means that the sequential scan found in the query for uid 0 is probably justified - since half of the rows are subject to the query, might as well walk through all of them instead of using the index.

02/23/07 18:39:41 changed by julians

This can most likely be reduced to one query involving SUM by using SQL's GROUP BY and HAVING operators.

Also, dropping the path column from table files (which we can do once the sort-by-size for the flat view is committed) will probably help as the sequential scan should become faster.

02/23/07 18:51:57 changed by julians

The N queries submitted (for N directories in the "outer ring" of the radial diagram) CAN be combined into one statement along those lines:

 SELECT SUM(size) AS sum_size, lft 
   FROM files  
   JOIN directories ON files.directory_id = directories.id 
   WHERE (files.uid = 0) 
     AND directories.server_id=2 
     AND directories.level >= 
         ((SELECT level FROM directories WHERE id=123) + number_of_display_levels - 1) 
   GROUP BY directories.lft 
   HAVING directories.lft >= (SELECT lft FROM directories WHERE id=123)  
      AND directories.lft <= (SELECT rgt FROM directories WHERE id=123)

(The GROUP BY clause can probably be improved.)

This should speed up this part of the database interaction by factor N (but the results might need some post processing if the GROUP BY clause can't be optimized.)

02/24/07 15:15:50 changed by julians

  • owner changed from kenji to julians.
  • status changed from new to assigned.

The above optimization (including improved GROUP BY clause) is implemented in r1004. Please let me know if that helps.

02/26/07 11:53:24 changed by bruno

  • version set to 0.1.

10/18/07 16:06:13 changed by matthewl

  • milestone deleted.

Milestone Uncommited Work deleted