index scan vs. table scan - ibm-db2
This is a discussion on index scan vs. table scan - ibm-db2 ; I have an application that runs a query something like this: select resp_desc, count(*) from userid.survey_fact a, userid.client_service_dim b, userid.samp_group_fact c where b.client_id = 8877 and a.client_service_id = b.client_service_sid and a.survey_id = c.survey_id and mailing_date_id between 20080101 and 20090101 group ...
![]() |
| | LinkBack | Thread Tools | Display Modes |
|
#1
| |||
| |||
| select resp_desc, count(*) from userid.survey_fact a, userid.client_service_dim b, userid.samp_group_fact c where b.client_id = 8877 and a.client_service_id = b.client_service_sid and a.survey_id = c.survey_id and mailing_date_id between 20080101 and 20090101 group by resp_desc; This query does a full table scan on the samp_group_fact table and I can't understand why. According to the explain plan (see below), the optimizer is expecting about 330,000 rows to be fetched from survey_fact. Since the HSJOIN expects to return only 103,013 rows and that the samp_group_fact table actually has more than 48 million total rows, I really can't understand why the optimizer would choose not to utilize the index. The stats are all up to date. Thanks in advance for any help. Greig DB2 Universal Database Version 9.1, 5622-044 (c) Copyright IBM Corp. 1991, 2006 Licensed Material - Program Property of IBM IBM DATABASE 2 Explain Table Format Tool ******************** EXPLAIN INSTANCE ******************** DB2_VERSION: 09.01.7 SOURCE_NAME: SQLC2F0A SOURCE_SCHEMA: NULLID SOURCE_VERSION: EXPLAIN_TIME: 2009-09-08-17.17.21.646137 EXPLAIN_REQUESTER: USERID Database Context: ---------------- Parallelism: None CPU Speed: 2.361721e-07 Comm Speed: 100 Buffer Pool size: 7000 Sort Heap size: 256 Database Heap size: 1200 Lock List size: 100 Maximum Lock List: 10 Average Applications: 1 Locks Available: 640 Package Context: --------------- SQL Type: Dynamic Optimization Level: 5 Blocking: Block All Cursors Isolation Level: Cursor Stability ---------------- STATEMENT 1 SECTION 201 ---------------- QUERYNO: 1 QUERYTAG: CLP Statement Type: Select Updatable: No Deletable: No Query Degree: 1 Original Statement: ------------------ select resp_desc, count(*) from userid.survey_fact a, userid.client_service_dim b, userid.samp_group_fact c where b.client_id = 8877 and a.client_service_id = b.client_service_sid and a.survey_id = c.survey_id and mailing_date_id between 20080101 and 20090101 group by resp_desc Optimized Statement: ------------------- SELECT Q5.$C0 AS "RESP_DESC", Q5.$C1 FROM (SELECT Q4.$C0, COUNT(*) FROM (SELECT Q1.RESP_DESC FROM USERID.SAMP_GROUP_FACT AS Q1, USERID.CLIENT_SERVICE_DIM AS Q2, USERID.SURVEY_FACT AS Q3 WHERE (Q3.MAILING_DATE_ID <= 20090101) AND (20080101 <= Q3.MAILING_DATE_ID) AND (Q3.SURVEY_ID = Q1.SURVEY_ID) AND (Q3.CLIENT_SERVICE_ID = Q2.CLIENT_SERVICE_SID) AND (Q2.CLIENT_ID = 8877)) AS Q4 GROUP BY Q4.$C0) AS Q5 Access Plan: ----------- Total Cost: 1.08498e+06 Query Degree: 0 Rows RETURN ( 1) Cost I/O | 58880 GRPBY ( 2) 1.08498e+06 516914 | 58880 TBSCAN ( 3) 1.08497e+06 516914 | 58880 SORT ( 4) 1.08465e+06 516761 | 103013 HSJOIN ( 5) 1.0826e+06 516608 /--------+--------\ 4.8195e+07 329934 TBSCAN NLJOIN ( 6) ( 7) 762852 180285 428492 71089.1 | /------+-------\ 4.8195e+07 225 1466.37 TABLE: USERID TBSCAN FETCH SAMP_GROUP_FACT ( 8) ( 14) Q1 228.537 474.637 31.5519 186.843 | /---+----\ 225 4638.77 1.62881e+08 SORT IXSCAN TABLE: USERID ( 9) ( 15) SURVEY_FACT 228.483 71.8377 Q3 31.5519 5.5801 | | 225 1.62881e+08 FETCH INDEX: USERID ( 10) SF_CLI_IDX 228.407 Q3 31.5519 /----+-----\ 225 35113 RIDSCN TABLE: USERID ( 11) CLIENT_SERVICE_DIM 13.289 Q2 1.01885 | 225 SORT ( 12) 13.2807 1.01885 | 225 IXSCAN ( 13) 13.2046 1.01885 | 35113 INDEX: USERID CLI_SRV_DIM_IDX_1 Q2 Extended Diagnostic Information: -------------------------------- No extended Diagnostic Information for this statment. Plan Details: ------------- 1) RETURN: (Return Result) Cumulative Total Cost: 1.08498e+06 Cumulative CPU Cost: 4.30831e+10 Cumulative I/O Cost: 516914 Cumulative Re-Total Cost: 329.285 Cumulative Re-CPU Cost: 2.04703e+08 Cumulative Re-I/O Cost: 153 Cumulative First Row Cost: 1.0847e+06 Estimated Bufferpool Buffers: 175 Arguments: --------- BLDLEVEL: (Build level) DB2 v9.1.0.7 : s090308 ENVVAR : (Environment Variable) DB2_ANTIJOIN = Y ENVVAR : (Environment Variable) DB2_EXTENDED_OPTIMIZATION = Y HEAPUSE : (Maximum Statement Heap Usage) 112 Pages PREPTIME: (Statement prepare time) 0 milliseconds STMTHEAP: (Statement heap size) 4096 Input Streams: ------------- 19) From Operator #2 Estimated number of rows: 58880 Number of columns: 2 Subquery predicate ID: Not Applicable Column Names: ------------ +Q6.$C1+Q6.RESP_DESC 2) GRPBY : (Group By) Cumulative Total Cost: 1.08498e+06 Cumulative CPU Cost: 4.30569e+10 Cumulative I/O Cost: 516914 Cumulative Re-Total Cost: 323.097 Cumulative Re-CPU Cost: 1.78501e+08 Cumulative Re-I/O Cost: 153 Cumulative First Row Cost: 1.0847e+06 Estimated Bufferpool Buffers: 175 Arguments: --------- AGGMODE : (Aggregration Mode) FINAL GROUPBYC: (Group By columns) TRUE GROUPBYN: (Number of Group By columns) 1 GROUPBYR: (Group By requirement) 1: Q4.RESP_DESC ONEFETCH: (One Fetch flag) FALSE Input Streams: ------------- 18) From Operator #3 Estimated number of rows: 58880 Number of columns: 1 Subquery predicate ID: Not Applicable Column Names: ------------ +Q4.RESP_DESC(A) Output Streams: -------------- 19) To Operator #1 Estimated number of rows: 58880 Number of columns: 2 Subquery predicate ID: Not Applicable Column Names: ------------ +Q6.$C1+Q6.RESP_DESC 3) TBSCAN: (Table Scan) Cumulative Total Cost: 1.08497e+06 Cumulative CPU Cost: 4.30422e+10 Cumulative I/O Cost: 516914 Cumulative Re-Total Cost: 319.62 Cumulative Re-CPU Cost: 1.63781e+08 Cumulative Re-I/O Cost: 153 Cumulative First Row Cost: 1.0847e+06 Estimated Bufferpool Buffers: 175 Arguments: --------- MAXPAGES: (Maximum pages for prefetch) ALL PREFETCH: (Type of Prefetch) SEQUENTIAL SCANDIR : (Scan Direction) FORWARD Input Streams: ------------- 17) From Operator #4 Estimated number of rows: 58880 Number of columns: 1 Subquery predicate ID: Not Applicable Column Names: ------------ +Q4.RESP_DESC(A) Output Streams: -------------- 18) To Operator #2 Estimated number of rows: 58880 Number of columns: 1 Subquery predicate ID: Not Applicable Column Names: ------------ +Q4.RESP_DESC(A) 4) SORT : (Sort) Cumulative Total Cost: 1.08465e+06 Cumulative CPU Cost: 4.28784e+10 Cumulative I/O Cost: 516761 Cumulative Re-Total Cost: 0 Cumulative Re-CPU Cost: 0 Cumulative Re-I/O Cost: 153 Cumulative First Row Cost: 1.08465e+06 Estimated Bufferpool Buffers: 437302 Arguments: --------- AGGMODE : (Aggregration Mode) PARTIAL DUPLWARN: (Duplicates Warning flag) FALSE NUMROWS : (Estimated number of rows) 58880 ROWWIDTH: (Estimated width of rows) 20 SORTKEY : (Sort Key column) 1: Q4.RESP_DESC(A) SPILLED : (Pages spilled to bufferpool or disk) 175 TEMPSIZE: (Temporary Table Page Size) 8192 UNIQUE : (Uniqueness required flag) FALSE Input Streams: ------------- 16) From Operator #5 Estimated number of rows: 103013 Number of columns: 1 Subquery predicate ID: Not Applicable Column Names: ------------ +Q4.RESP_DESC Output Streams: -------------- 17) To Operator #3 Estimated number of rows: 58880 Number of columns: 1 Subquery predicate ID: Not Applicable Column Names: ------------ +Q4.RESP_DESC(A) 5) HSJOIN: (Hash Join) Cumulative Total Cost: 1.0826e+06 Cumulative CPU Cost: 4.25133e+10 Cumulative I/O Cost: 516608 Cumulative Re-Total Cost: 1.0826e+06 Cumulative Re-CPU Cost: 4.25133e+10 Cumulative Re-I/O Cost: 516608 Cumulative First Row Cost: 1.0826e+06 Estimated Bufferpool Buffers: 437127 Arguments: --------- BITFLTR : (Hash Join Bit Filter used) 325900 EARLYOUT: (Early Out flag) NONE HASHCODE: (Hash Code Size) 24 BIT HASHTBSZ: (Number of hash table entries) 329934 TEMPSIZE: (Temporary Table Page Size) 8192 Predicates: ---------- 5) Predicate used in Join Comparison Operator: Equal (=) Subquery Input Required: No Filter Factor: 6.4783e-09 Predicate Text: -------------- (Q3.SURVEY_ID = Q1.SURVEY_ID) Input Streams: ------------- 2) From Operator #6 Estimated number of rows: 4.8195e +07 Number of columns: 2 Subquery predicate ID: Not Applicable Column Names: ------------ +Q1.RESP_DESC+Q1.SURVEY_ID 15) From Operator #7 Estimated number of rows: 329934 Number of columns: 2 Subquery predicate ID: Not Applicable Column Names: ------------ +Q2.CLIENT_SERVICE_SID+Q3.SURVEY_ID Output Streams: -------------- 16) To Operator #4 Estimated number of rows: 103013 Number of columns: 1 Subquery predicate ID: Not Applicable Column Names: ------------ +Q4.RESP_DESC 6) TBSCAN: (Table Scan) Cumulative Total Cost: 762852 Cumulative CPU Cost: 3.00314e+10 Cumulative I/O Cost: 428492 Cumulative Re-Total Cost: 762852 Cumulative Re-CPU Cost: 3.00314e+10 Cumulative Re-I/O Cost: 428492 Cumulative First Row Cost: 12.8528 Estimated Bufferpool Buffers: 428492 Arguments: --------- JN INPUT: (Join input leg) OUTER MAXPAGES: (Maximum pages for prefetch) ALL PREFETCH: (Type of Prefetch) SEQUENTIAL ROWLOCK : (Row Lock intent) NEXT KEY SHARE SCANDIR : (Scan Direction) FORWARD TABLOCK : (Table Lock intent) INTENT SHARE TBISOLVL: (Table access Isolation Level) CURSOR STABILITY Input Streams: ------------- 1) From Object USERID.SAMP_GROUP_FACT Estimated number of rows: 4.8195e +07 Number of columns: 3 Subquery predicate ID: Not Applicable Column Names: ------------ +Q1.$RID$+Q1.RESP_DESC+Q1.SURVEY_ID Output Streams: -------------- 2) To Operator #5 Estimated number of rows: 4.8195e +07 Number of columns: 2 Subquery predicate ID: Not Applicable Column Names: ------------ +Q1.RESP_DESC+Q1.SURVEY_ID 7) NLJOIN: (Nested Loop Join) Cumulative Total Cost: 180285 Cumulative CPU Cost: 2.72554e+09 Cumulative I/O Cost: 71089.1 Cumulative Re-Total Cost: 72420 Cumulative Re-CPU Cost: 2.60062e+09 Cumulative Re-I/O Cost: 34852.2 Cumulative First Row Cost: 267.254 Estimated Bufferpool Buffers: 42039.8 Arguments: --------- EARLYOUT: (Early Out flag) NONE FETCHMAX: (Override for FETCH MAXPAGES) IGNORE ISCANMAX: (Override for ISCAN MAXPAGES) IGNORE JN INPUT: (Join input leg) INNER Predicates: ---------- 6) Predicate used in Join Comparison Operator: Equal (=) Subquery Input Required: No Filter Factor: 2.84795e-05 Predicate Text: -------------- (Q3.CLIENT_SERVICE_ID = Q2.CLIENT_SERVICE_SID) Input Streams: ------------- 10) From Operator #8 Estimated number of rows: 225 Number of columns: 1 Subquery predicate ID: Not Applicable Column Names: ------------ +Q2.CLIENT_SERVICE_SID(A) 14) From Operator #14 Estimated number of rows: 1466.37 Number of columns: 2 Subquery predicate ID: Not Applicable Column Names: ------------ +Q3.CLIENT_SERVICE_ID(A)+Q3.SURVEY_ID Output Streams: -------------- 15) To Operator #5 Estimated number of rows: 329934 Number of columns: 2 Subquery predicate ID: Not Applicable Column Names: ------------ +Q2.CLIENT_SERVICE_SID+Q3.SURVEY_ID 8) TBSCAN: (Table Scan) Cumulative Total Cost: 228.537 Cumulative CPU Cost: 2.01104e+06 Cumulative I/O Cost: 31.5519 Cumulative Re-Total Cost: 0.165827 Cumulative Re-CPU Cost: 702144 Cumulative Re-I/O Cost: 0 Cumulative First Row Cost: 228.484 Estimated Bufferpool Buffers: 0 Arguments: --------- JN INPUT: (Join input leg) OUTER MAXPAGES: (Maximum pages for prefetch) ALL PREFETCH: (Type of Prefetch) NONE SCANDIR : (Scan Direction) FORWARD Input Streams: ------------- 9) From Operator #9 Estimated number of rows: 225 Number of columns: 1 Subquery predicate ID: Not Applicable Column Names: ------------ +Q2.CLIENT_SERVICE_SID(A) Output Streams: -------------- 10) To Operator #7 Estimated number of rows: 225 Number of columns: 1 Subquery predicate ID: Not Applicable Column Names: ------------ +Q2.CLIENT_SERVICE_SID(A) 9) SORT : (Sort) Cumulative Total Cost: 228.483 Cumulative CPU Cost: 1.7828e+06 Cumulative I/O Cost: 31.5519 Cumulative Re-Total Cost: 0.111924 Cumulative Re-CPU Cost: 473907 Cumulative Re-I/O Cost: 0 Cumulative First Row Cost: 228.483 Estimated Bufferpool Buffers: 32.5519 Arguments: --------- DUPLWARN: (Duplicates Warning flag) FALSE NUMROWS : (Estimated number of rows) 225 ROWWIDTH: (Estimated width of rows) 8 SORTKEY : (Sort Key column) 1: Q2.CLIENT_SERVICE_SID(A) TEMPSIZE: (Temporary Table Page Size) 8192 UNIQUE : (Uniqueness required flag) FALSE Input Streams: ------------- 8) From Operator #10 Estimated number of rows: 225 Number of columns: 1 Subquery predicate ID: Not Applicable Column Names: ------------ +Q2.CLIENT_SERVICE_SID Output Streams: -------------- 9) To Operator #8 Estimated number of rows: 225 Number of columns: 1 Subquery predicate ID: Not Applicable Column Names: ------------ +Q2.CLIENT_SERVICE_SID(A) 10) FETCH : (Fetch) Cumulative Total Cost: 228.407 Cumulative CPU Cost: 1.46062e+06 Cumulative I/O Cost: 31.5519 Cumulative Re-Total Cost: 0.111924 Cumulative Re-CPU Cost: 473907 Cumulative Re-I/O Cost: 0 Cumulative First Row Cost: 26.1332 Estimated Bufferpool Buffers: 32.5519 Arguments: --------- MAX RIDS: (Maximum RIDs per list prefetch request) 512 PREFETCH: (Type of Prefetch) LIST ROWLOCK : (Row Lock intent) NEXT KEY SHARE TABLOCK : (Table Lock intent) INTENT SHARE TBISOLVL: (Table access Isolation Level) CURSOR STABILITY Predicates: ---------- 7) Sargable Predicate Comparison Operator: Equal (=) Subquery Input Required: No Filter Factor: 0.00640788 Predicate Text: -------------- (Q2.CLIENT_ID = 8877) Input Streams: ------------- 6) From Operator #11 Estimated number of rows: 225 Number of columns: 1 Subquery predicate ID: Not Applicable Column Names: ------------ +Q2.$RID$(A) 7) From Object USERID.CLIENT_SERVICE_DIM Estimated number of rows: 35113 Number of columns: 2 Subquery predicate ID: Not Applicable Column Names: ------------ +Q2.CLIENT_SERVICE_SID+Q2.CLIENT_ID Output Streams: -------------- 8) To Operator #9 Estimated number of rows: 225 Number of columns: 1 Subquery predicate ID: Not Applicable Column Names: ------------ +Q2.CLIENT_SERVICE_SID 11) RIDSCN: (Row Identifier Scan) Cumulative Total Cost: 13.289 Cumulative CPU Cost: 832971 Cumulative I/O Cost: 1.01885 Cumulative Re-Total Cost: 0.016967 Cumulative Re-CPU Cost: 71841.7 Cumulative Re-I/O Cost: 0 Cumulative First Row Cost: 13.2807 Estimated Bufferpool Buffers: 2.01885 Arguments: --------- NUMROWS : (Estimated number of rows) 225 Input Streams: ------------- 5) From Operator #12 Estimated number of rows: 225 Number of columns: 1 Subquery predicate ID: Not Applicable Column Names: ------------ +Q2.$RID$(A) Output Streams: -------------- 6) To Operator #10 Estimated number of rows: 225 Number of columns: 1 Subquery predicate ID: Not Applicable Column Names: ------------ +Q2.$RID$(A) 12) SORT : (Sort) Cumulative Total Cost: 13.2807 Cumulative CPU Cost: 797784 Cumulative I/O Cost: 1.01885 Cumulative Re-Total Cost: 0.00865681 Cumulative Re-CPU Cost: 36654.7 Cumulative Re-I/O Cost: 0 Cumulative First Row Cost: 13.2807 Estimated Bufferpool Buffers: 2.01885 Arguments: --------- DUPLWARN: (Duplicates Warning flag) TRUE NUMROWS : (Estimated number of rows) 225 ROWWIDTH: (Estimated width of rows) 16 SORTKEY : (Sort Key column) 1: Q2.$RID$(A) TEMPSIZE: (Temporary Table Page Size) 8192 UNIQUE : (Uniqueness required flag) TRUE Input Streams: ------------- 4) From Operator #13 Estimated number of rows: 225 Number of columns: 0 Subquery predicate ID: Not Applicable Output Streams: -------------- 5) To Operator #11 Estimated number of rows: 225 Number of columns: 1 Subquery predicate ID: Not Applicable Column Names: ------------ +Q2.$RID$(A) 13) IXSCAN: (Index Scan) Cumulative Total Cost: 13.2046 Cumulative CPU Cost: 475600 Cumulative I/O Cost: 1.01885 Cumulative Re-Total Cost: 0.00865681 Cumulative Re-CPU Cost: 36654.7 Cumulative Re-I/O Cost: 0 Cumulative First Row Cost: 12.8612 Estimated Bufferpool Buffers: 2.01885 Arguments: --------- MAXPAGES: (Maximum pages for prefetch) 1 PREFETCH: (Type of Prefetch) NONE ROWLOCK : (Row Lock intent) NONE SCANDIR : (Scan Direction) FORWARD TABLOCK : (Table Lock intent) INTENT NONE TBISOLVL: (Table access Isolation Level) CURSOR STABILITY Predicates: ---------- 7) Start Key Predicate Comparison Operator: Equal (=) Subquery Input Required: No Filter Factor: 0.00640788 Predicate Text: -------------- (Q2.CLIENT_ID = 8877) 7) Stop Key Predicate Comparison Operator: Equal (=) Subquery Input Required: No Filter Factor: 0.00640788 Predicate Text: -------------- (Q2.CLIENT_ID = 8877) Input Streams: ------------- 3) From Object USERID.CLI_SRV_DIM_IDX_1 Estimated number of rows: 35113 Number of columns: 2 Subquery predicate ID: Not Applicable Column Names: ------------ +Q2.$RID$+Q2.CLIENT_ID Output Streams: -------------- 4) To Operator #12 Estimated number of rows: 225 Number of columns: 0 Subquery predicate ID: Not Applicable 14) FETCH : (Fetch) Cumulative Total Cost: 474.637 Cumulative CPU Cost: 1.21766e+07 Cumulative I/O Cost: 186.843 Cumulative Re-Total Cost: 385.646 Cumulative Re-CPU Cost: 1.16604e+07 Cumulative Re-I/O Cost: 185.843 Cumulative First Row Cost: 38.7704 Estimated Bufferpool Buffers: 1.53176e+06 Arguments: --------- JN INPUT: (Join input leg) INNER MAXPAGES: (Maximum pages for prefetch) 181 PREFETCH: (Type of Prefetch) SEQUENTIAL ROWLOCK : (Row Lock intent) NEXT KEY SHARE TABLOCK : (Table Lock intent) INTENT SHARE TBISOLVL: (Table access Isolation Level) CURSOR STABILITY Predicates: ---------- 3) Sargable Predicate Comparison Operator: Less Than or Equal (<=) Subquery Input Required: No Filter Factor: 0.947733 Predicate Text: -------------- (Q3.MAILING_DATE_ID <= 20090101) 4) Sargable Predicate Comparison Operator: Less Than or Equal (<=) Subquery Input Required: No Filter Factor: 0.36838 Predicate Text: -------------- (20080101 <= Q3.MAILING_DATE_ID) Input Streams: ------------- 12) From Operator #15 Estimated number of rows: 4638.77 Number of columns: 1 Subquery predicate ID: Not Applicable Column Names: ------------ +Q3.CLIENT_SERVICE_ID(A) 13) From Object USERID.SURVEY_FACT Estimated number of rows: 1.62881e+08 Number of columns: 2 Subquery predicate ID: Not Applicable Column Names: ------------ +Q3.MAILING_DATE_ID+Q3.SURVEY_ID Output Streams: -------------- 14) To Operator #7 Estimated number of rows: 1466.37 Number of columns: 2 Subquery predicate ID: Not Applicable Column Names: ------------ +Q3.CLIENT_SERVICE_ID(A)+Q3.SURVEY_ID 15) IXSCAN: (Index Scan) Cumulative Total Cost: 71.8377 Cumulative CPU Cost: 564929 Cumulative I/O Cost: 5.5801 Cumulative Re-Total Cost: 58.8753 Cumulative Re-CPU Cost: 88952.1 Cumulative Re-I/O Cost: 4.5801 Cumulative First Row Cost: 25.7144 Estimated Bufferpool Buffers: 160823 Arguments: --------- MAXPAGES: (Maximum pages for prefetch) 4 PREFETCH: (Type of Prefetch) NONE ROWLOCK : (Row Lock intent) NEXT KEY SHARE SCANDIR : (Scan Direction) FORWARD TABLOCK : (Table Lock intent) INTENT SHARE TBISOLVL: (Table access Isolation Level) CURSOR STABILITY Predicates: ---------- 6) Start Key Predicate Comparison Operator: Equal (=) Subquery Input Required: No Filter Factor: 2.84795e-05 Predicate Text: -------------- (Q3.CLIENT_SERVICE_ID = Q2.CLIENT_SERVICE_SID) 6) Stop Key Predicate Comparison Operator: Equal (=) Subquery Input Required: No Filter Factor: 2.84795e-05 Predicate Text: -------------- (Q3.CLIENT_SERVICE_ID = Q2.CLIENT_SERVICE_SID) Input Streams: ------------- 11) From Object USERID.SF_CLI_IDX Estimated number of rows: 1.62881e+08 Number of columns: 2 Subquery predicate ID: Not Applicable Column Names: ------------ +Q3.CLIENT_SERVICE_ID(A)+Q3.$RID$ Output Streams: -------------- 12) To Operator #14 Estimated number of rows: 4638.77 Number of columns: 1 Subquery predicate ID: Not Applicable Column Names: ------------ +Q3.CLIENT_SERVICE_ID(A) Objects Used in Access Plan: --------------------------- Schema: USERID Name: CLI_SRV_DIM_IDX_1 Type: Index Time of creation: 2007-06-17-11.03.01.382438 Last statistics update: 2009-09-08-02.59.29.504485 Number of columns: 3 Number of rows: 35113 Width of rows: -1 Number of buffer pool pages: 197 Distinct row values: Yes Tablespace name: DIMIT_TBSP1 Tablespace overhead: 12.670000 Tablespace transfer rate: 0.180000 Source for statistics: Single Node Prefetch page count: 24 Container extent page count: 16 Index clustering statistic: 0.900742 Index leaf pages: 159 Index tree levels: 2 Index full key cardinality: 35113 Index first key cardinality: 7275 Index first 2 keys cardinality: 34626 Index first 3 keys cardinality: 35113 Index first 4 keys cardinality: -1 Index sequential pages: 59 Index page density: 47 Index avg sequential pages: 19 Index avg gap between sequences:31 Index avg random pages: 16 Fetch avg sequential pages: 31 Fetch avg gap between sequences:307 Fetch avg random pages: 1107 Index RID count: 35114 Index deleted RID count: 1 Index empty leaf pages: 0 Base Table Schema: USERID Base Table Name: CLIENT_SERVICE_DIM Columns in index: CLIENT_ID(A) SERV_TYPE(A) SITE_ID(A) Schema: USERID Name: SF_CLI_IDX Type: Index Time of creation: 2007-06-17-11.03.01.953638 Last statistics update: 2009-03-03-23.12.25.155122 Number of columns: 1 Number of rows: 162881175 Width of rows: -1 Number of buffer pool pages: 1372081 Distinct row values: No Tablespace name: FACTIT_TBSP1 Tablespace overhead: 12.670000 Tablespace transfer rate: 0.180000 Source for statistics: Single Node Prefetch page count: 24 Container extent page count: 16 Index clustering statistic: 0.914914 Index leaf pages: 160821 Index tree levels: 3 Index full key cardinality: 25547 Index first key cardinality: 25547 Index first 2 keys cardinality: -1 Index first 3 keys cardinality: -1 Index first 4 keys cardinality: -1 Index sequential pages: 556 Index page density: 19 Index avg sequential pages: 9 Index avg gap between sequences:2763 Index avg random pages: 3 Fetch avg sequential pages: 44 Fetch avg gap between sequences:112 Fetch avg random pages: 206 Index RID count: 162919863 Index deleted RID count: 38688 Index empty leaf pages: 0 Base Table Schema: USERID Base Table Name: SURVEY_FACT Columns in index: CLIENT_SERVICE_ID(A) Schema: USERID Name: CLIENT_SERVICE_DIM Type: Table Time of creation: 2008-01-27-01.31.48.312712 Last statistics update: 2009-09-08-02.59.29.504485 Number of columns: 7 Number of rows: 35113 Width of rows: 24 Number of buffer pool pages: 197 Number of data partitions: 1 Distinct row values: No Tablespace name: DIMIT_TBSP1 Tablespace overhead: 12.670000 Tablespace transfer rate: 0.180000 Source for statistics: Single Node Prefetch page count: 24 Container extent page count: 16 Table overflow record count: 2268 Table Active Blocks: -1 Average Row Compression Ratio: 0 Percentage Rows Compressed: 0 Average Compressed Row Size: 0 Schema: USERID Name: SAMP_GROUP_FACT Type: Table Time of creation: 2009-08-12-17.44.57.693881 Last statistics update: 2009-09-08-09.35.29.208003 Number of columns: 5 Number of rows: 48195029 Width of rows: 32 Number of buffer pool pages: 428492 Number of data partitions: 1 Distinct row values: No Tablespace name: USERSPACE1 Tablespace overhead: 12.670000 Tablespace transfer rate: 0.180000 Source for statistics: Single Node Prefetch page count: 192 Container extent page count: 32 Table overflow record count: 0 Table Active Blocks: -1 Average Row Compression Ratio: 0 Percentage Rows Compressed: 0 Average Compressed Row Size: 0 Schema: USERID Name: SURVEY_FACT Type: Table Time of creation: 2008-01-27-01.31.48.312712 Last statistics update: 2009-03-03-23.12.25.155122 Number of columns: 15 Number of rows: 162881175 Width of rows: 29 Number of buffer pool pages: 1372081 Number of data partitions: 1 Distinct row values: No Tablespace name: FACTIT_TBSP1 Tablespace overhead: 12.670000 Tablespace transfer rate: 0.180000 Source for statistics: Single Node Prefetch page count: 24 Container extent page count: 16 Table overflow record count: 0 Table Active Blocks: -1 Average Row Compression Ratio: 0 Percentage Rows Compressed: 0 Average Compressed Row Size: 0 |
|
#2
| |||
| |||
| "Greig" news:aa047342-5e6a-4987-98f6-af8ceead0787@z30g2000yqz.googlegroups.com... >I have an application that runs a query something like this: > > select resp_desc, count(*) from userid.survey_fact a, > userid.client_service_dim b, > userid.samp_group_fact c > where b.client_id = 8877 > and a.client_service_id = b.client_service_sid > and a.survey_id = c.survey_id > and mailing_date_id between 20080101 and 20090101 > group by resp_desc; > > This query does a full table scan on the samp_group_fact table and I > can't understand why. According to the explain plan (see below), the > optimizer is expecting about 330,000 rows to be fetched from > survey_fact. Since the HSJOIN expects to return only 103,013 rows and > that the samp_group_fact table actually has more than 48 million total > rows, I really can't understand why the optimizer would choose not to > utilize the index. > > The stats are all up to date. > > Thanks in advance for any help. > Greig I have not looked at your query or explain in detail, but DB2 will generally do a table scan if it guesses (using statistics) that it will have to read at least one row on all the pages (or even a large percentage of the pages). The reason for this is that it is usually faster to do the table scan in such cases, even if some pages are needed (mostly because of page prefetch). It might be in your particular case that a better access plan can be chosen, I don't know. You could try to manually update some of the statistics (or alter the tables to volatile) to force it to use an index and then see if it really runs faster that way. Don't just rely on the relative cost of the explain, run some tests with different stats (and access plans) to see what the actual elapsed time results are. |
|
#3
| |||
| |||
|
On Sep 8, 11:46*pm, "Mark A" > "Greig" > > news:aa047342-5e6a-4987-98f6-af8ceead0787@z30g2000yqz.googlegroups.com... > > > > > > >I have an application that runs a query something like this: > > > select resp_desc, count(*) from userid.survey_fact a, > > userid.client_service_dim b, > > userid.samp_group_fact c > > where b.client_id = 8877 > > and a.client_service_id = b.client_service_sid > > and a.survey_id = c.survey_id > > and mailing_date_id between 20080101 and 20090101 > > group by resp_desc; > > > This query does a full table scan on the samp_group_fact table and I > > can't understand why. *According to the explain plan (see below), the > > optimizer is expecting about 330,000 rows to be fetched from > > survey_fact. Since the HSJOIN expects to return only 103,013 rows and > > that the samp_group_fact table actually has more than 48 million total > > rows, I really can't understand why the optimizer would choose not to > > utilize the index. > > > The stats are all up to date. > > > Thanks in advance for any help. > > Greig > > I have not looked at your query or explain in detail, but DB2 will generally > do a table scan if it guesses (using statistics) that it will have to read > at least one row on all the pages (or even a large percentage of the pages). > The reason for this is that it is usually faster to do the table scan in > such cases, even if some pages are needed (mostly because of page prefetch). > > It might be in your particular case that a better access plan can be chosen, > I don't know. You could try to manually update some of the statistics (or > alter the tables to volatile) to force it to use an index and then see ifit > really runs faster that way. Don't just rely on the relative cost of the > explain, run some tests with different stats (and access plans) to see what > the actual elapsed time results are.- Hide quoted text - > > - Show quoted text - Run a DB2 Advisor for that Query for Index Recomendations. Looks like the Index Column order is not liked by DB2 Optimizer. If you change that .....it will pick it up. Cheers... Shashi Mannepalli |
|
#4
| |||
| |||
|
WGreig wrote: > I have an application that runs a query something like this: > > select resp_desc, count(*) from userid.survey_fact a, > userid.client_service_dim b, > userid.samp_group_fact c > where b.client_id = 8877 > and a.client_service_id = b.client_service_sid > and a.survey_id = c.survey_id > and mailing_date_id between 20080101 and 20090101 > group by resp_desc; > > This query does a full table scan on the samp_group_fact table and I > can't understand why. According to the explain plan (see below), the > optimizer is expecting about 330,000 rows to be fetched from > survey_fact. Since the HSJOIN expects to return only 103,013 rows and > that the samp_group_fact table actually has more than 48 million total > rows, I really can't understand why the optimizer would choose not to > utilize the index. > > The stats are all up to date. Without DDL (and some stats) for the samp_group_fact table and associated objects, no one will be able to help answer the question better than Mark A already did. |
|
#5
| |||
| |||
|
On Sep 9, 1:50*pm, Ian > WGreig wrote: > > I have an application that runs a query something like this: > > > select resp_desc, count(*) from userid.survey_fact a, > > userid.client_service_dim b, > > userid.samp_group_fact c > > where b.client_id = 8877 > > and a.client_service_id = b.client_service_sid > > and a.survey_id = c.survey_id > > and mailing_date_id between 20080101 and 20090101 > > group by resp_desc; > > > This query does a full table scan on the samp_group_fact table and I > > can't understand why. *According to the explain plan (see below), the > > optimizer is expecting about 330,000 rows to be fetched from > > survey_fact. Since the HSJOIN expects to return only 103,013 rows and > > that the samp_group_fact table actually has more than 48 million total > > rows, I really can't understand why the optimizer would choose not to > > utilize the index. > > > The stats are all up to date. > > Without DDL (and some stats) for the samp_group_fact table and > associated objects, no one will be able to help answer the > question better than Mark A already did.- Hide quoted text - > > - Show quoted text - Well, I can say that if I parameterize the query like this: select resp_desc, count(*) from userid.survey_fact a, userid.client_service_dim b, userid.samp_group_fact c where b.client_id = ? and a.client_service_id = b.client_service_sid and a.survey_id = c.survey_id and mailing_date_id between ? and ? group by resp_desc; Then the optimizer does utilize the index and the query runs in seconds rather than minutes (when I plug in matching parameters with the first query). As for the DDL for SAMP_GROUP_FACT, it's pretty simple. It's got a survey_id, resp_id and resp_desc. There is 1 index on survey_id. For every survey_id in survey_fact, there is at most 1 matching record in samp_group_fact. What else would you like to see in the way of stats? Thanks again for the help. Greig |
![]() |
« Previous Thread
|
Next Thread »
| Thread Tools | |
| Display Modes | |
| |
All times are GMT -4. The time now is 12:38 AM.



Linear Mode