+ Reply to Thread
Results 1 to 6 of 6

SELECT DISTINCT slow, how can I speed up

  1. SELECT DISTINCT slow, how can I speed up


    Hello. I have a question about how I can structure a SQL query to make
    the results come back faster. I have experimented a bit and include
    some results here.

    We have read-only SELECT access to tables for a payroll and absence
    system running Oracle 10.0.1 on (I think) a dedicated Windows 2003
    Server in the IT department on the same site. We do not have
    privileges to UPDATE, DELETE, or create anymore indexes and if I did,
    it would probably invalidate our software support contract.

    We have staff using Access 2003 as a front end to the Oracle tables
    for the purposes of ad hoc queries. These staff regularly enter
    queries across 5-6 tables with various bits added to the SQL to limit
    the number of rows returned and match criteria. All the rows starts to
    come back within a second of the SQL starting to run.

    When the SQL is like this (I have anonymised the table names).


    SELECT a1,a2,a3,a5,a10
    FROM table1, table2, table3, table4
    WHERE table1.empID=table2.empID
    AND table2.empID=table3.empID
    AND table3.empID=table4.empID
    AND table4.absenceT IN (15,19)
    AND table4.year=2008;

    The rows start to come back immediately. Around 1700-1800 rows
    typically come back and there is a lot of network IO. If I type

    SELECT COUNT(*) FROM
    (
    SELECT a1,a2,a3,a5,a10
    FROM table1, table2, table3, table4
    WHERE table1.empID=table2.empID
    AND table2.empID=table3.empID
    AND table3.empID=table4.empID
    AND table4.absenceT IN (15,19)
    AND table4.year=2008
    )

    it returns in 2.14seconds saying it found 1792 rows.

    If I type SELECT DISTINCT a1,a2,a3,a5,a10

    only 20-50 rows come back and the query takes 28 seconds and there is
    a lot less network IO.

    I only want the 20-50 rows but I don't want to wait 28 seconds. As
    Oracle seems to find all 1792 rows in 2.14seconds, why does it take 28
    seconds only to return the unique ones.

    These are ad hoc queries so I am not trying to optimize a single
    query. My question is Is there a more quick way of generally returning
    a DISTINCT set of rows other than using DISTINCT in the SELECT
    statement.

    Thank you
    Barry



  2. Re: SELECT DISTINCT slow, how can I speed up

    On Aug 14, 2:18*pm, Barry Bulsara wrote:
    > Hello. I have a question about how I can structure a SQL query to make
    > the results come back faster. I have experimented a bit and include
    > some results here.
    >
    > We have read-only SELECT access to tables for a payroll and absence
    > system running Oracle 10.0.1 on (I think) a dedicated Windows 2003
    > Server in the IT department on the same site. We do not have
    > privileges to UPDATE, DELETE, or create anymore indexes and if I did,
    > it would probably invalidate our software support contract.
    >
    > We have staff using Access 2003 as a front end to the Oracle tables
    > for the purposes of ad hoc queries. These staff regularly enter
    > queries across 5-6 tables with various bits added to the SQL to limit
    > the number of rows returned and match criteria. All the rows starts to
    > come back within a second of the SQL starting to run.
    >
    > When the SQL is like this (I have anonymised the table names).
    >
    > SELECT a1,a2,a3,a5,a10
    > *FROM table1, table2, table3, table4
    > * WHERE table1.empID=table2.empID
    > * * *AND table2.empID=table3.empID
    > * * *AND table3.empID=table4.empID
    > * * *AND table4.absenceT IN (15,19)
    > * * *AND table4.year=2008;
    >
    > The rows start to come back immediately. Around 1700-1800 rows
    > typically come back and there is a lot of network IO. If I type
    >
    > SELECT COUNT(*) FROM
    > (
    > SELECT a1,a2,a3,a5,a10
    > *FROM table1, table2, table3, table4
    > * WHERE table1.empID=table2.empID
    > * * *AND table2.empID=table3.empID
    > * * *AND table3.empID=table4.empID
    > * * *AND table4.absenceT IN (15,19)
    > * * *AND table4.year=2008
    > )
    >
    > it returns in 2.14seconds saying it found 1792 rows.
    >
    > If I type SELECT DISTINCT a1,a2,a3,a5,a10
    >
    > only 20-50 rows come back and the query takes 28 seconds and there is
    > a lot less network IO.
    >
    > I only want the 20-50 rows but I don't want to wait 28 seconds. As
    > Oracle seems to find all 1792 rows in 2.14seconds, why does it take 28
    > seconds only to return the unique ones.
    >
    > These are ad hoc queries so I am not trying to optimize a single
    > query. My question is Is there a more quick way of generally returning
    > a DISTINCT set of rows other than using DISTINCT in the SELECT
    > statement.
    >
    > Thank you
    > Barry


    Explain plans as well as output from a 10046 trace would be helpful
    for something like this.

    In general SELECT DISTINCT is going to have some additional overhead
    in oracle.

    Having a SELECT COUNT return in 2+ seconds is not quite the same as
    having 2000 rows come back as oracle may be coming up with quite
    different execution plans between the SELECT COUNT and the SELECT
    columns.

    Can you code the query in such as way that you don't need to use
    DISTINCT? The EXISTS/NOT EXISTS and IN/NOT IN constructs are often
    useful in situations like this if you cannot force uniqueness just by
    correct table joins.

  3. Re: SELECT DISTINCT slow, how can I speed up

    On Thu, 14 Aug 2008 11:18:41 -0700 (PDT), Barry Bulsara
    wrote:

    >If I type SELECT DISTINCT a1,a2,a3,a5,a10
    >
    >only 20-50 rows come back and the query takes 28 seconds and there is
    >a lot less network IO.
    >

    Sure, every query consists of
    parse
    execute
    fetch

    Distinct requires SORT and SORT is in the execute phase. Execute is on
    the database server.

    >I only want the 20-50 rows but I don't want to wait 28 seconds. As
    >Oracle seems to find all 1792 rows in 2.14seconds, why does it take 28
    >seconds only to return the unique ones.
    >



    using DISTINCT requires a SORT operation.
    Sorts can be tuned.
    >These are ad hoc queries so I am not trying to optimize a single
    >query. My question is Is there a more quick way of generally returning
    >a DISTINCT set of rows other than using DISTINCT in the SELECT
    >statement.


    You would still need a SORT, wouldn't you?

    --
    Sybrand Bakker
    Senior Oracle DBA

  4. Re: SELECT DISTINCT slow, how can I speed up

    On Aug 14, 1:18*pm, Barry Bulsara wrote:
    > Hello. I have a question about how I can structure a SQL query to make
    > the results come back faster. I have experimented a bit and include
    > some results here.
    >
    > We have read-only SELECT access to tables for a payroll and absence
    > system running Oracle 10.0.1 on (I think) a dedicated Windows 2003
    > Server in the IT department on the same site. We do not have
    > privileges to UPDATE, DELETE, or create anymore indexes and if I did,
    > it would probably invalidate our software support contract.
    >
    > We have staff using Access 2003 as a front end to the Oracle tables
    > for the purposes of ad hoc queries. These staff regularly enter
    > queries across 5-6 tables with various bits added to the SQL to limit
    > the number of rows returned and match criteria. All the rows starts to
    > come back within a second of the SQL starting to run.
    >
    > When the SQL is like this (I have anonymised the table names).
    >
    > SELECT a1,a2,a3,a5,a10
    > *FROM table1, table2, table3, table4
    > * WHERE table1.empID=table2.empID
    > * * *AND table2.empID=table3.empID
    > * * *AND table3.empID=table4.empID
    > * * *AND table4.absenceT IN (15,19)
    > * * *AND table4.year=2008;
    >
    > The rows start to come back immediately. Around 1700-1800 rows
    > typically come back and there is a lot of network IO. If I type
    >
    > SELECT COUNT(*) FROM
    > (
    > SELECT a1,a2,a3,a5,a10
    > *FROM table1, table2, table3, table4
    > * WHERE table1.empID=table2.empID
    > * * *AND table2.empID=table3.empID
    > * * *AND table3.empID=table4.empID
    > * * *AND table4.absenceT IN (15,19)
    > * * *AND table4.year=2008
    > )
    >
    > it returns in 2.14seconds saying it found 1792 rows.
    >
    > If I type SELECT DISTINCT a1,a2,a3,a5,a10
    >
    > only 20-50 rows come back and the query takes 28 seconds and there is
    > a lot less network IO.
    >
    > I only want the 20-50 rows but I don't want to wait 28 seconds. As
    > Oracle seems to find all 1792 rows in 2.14seconds, why does it take 28
    > seconds only to return the unique ones.
    >
    > These are ad hoc queries so I am not trying to optimize a single
    > query. My question is Is there a more quick way of generally returning
    > a DISTINCT set of rows other than using DISTINCT in the SELECT
    > statement.
    >
    > Thank you
    > Barry


    You might try this, although I don't know how much faster, if at all,
    it will be:

    SELECT a1,a2,a3,a5,a10
    FROM table1, table2, table3, table4
    WHERE table1.empID=table2.empID
    AND table2.empID=table3.empID
    AND table3.empID=table4.empID
    AND table4.absenceT IN (15,19)
    AND table4.year=2008
    group by a1, a2, a3, a5, a10;

    Possibly you should ask about increasing the hash_area_size parameter
    value as 10g uses hashing algorithms to process such requests, rather
    than the old-style sort operations.


    David Fitzjarrell

  5. Re: SELECT DISTINCT slow, how can I speed up

    On Aug 14, 2:45*pm, "fitzjarr...@cox.net" wrote:
    > On Aug 14, 1:18*pm, Barry Bulsara wrote:
    >
    >
    >
    >
    >
    > > Hello. I have a question about how I can structure a SQL query to make
    > > the results come back faster. I have experimented a bit and include
    > > some results here.

    >
    > > We have read-only SELECT access to tables for a payroll and absence
    > > system running Oracle 10.0.1 on (I think) a dedicated Windows 2003
    > > Server in the IT department on the same site. We do not have
    > > privileges to UPDATE, DELETE, or create anymore indexes and if I did,
    > > it would probably invalidate our software support contract.

    >
    > > We have staff using Access 2003 as a front end to the Oracle tables
    > > for the purposes of ad hoc queries. These staff regularly enter
    > > queries across 5-6 tables with various bits added to the SQL to limit
    > > the number of rows returned and match criteria. All the rows starts to
    > > come back within a second of the SQL starting to run.

    >
    > > When the SQL is like this (I have anonymised the table names).

    >
    > > SELECT a1,a2,a3,a5,a10
    > > *FROM table1, table2, table3, table4
    > > * WHERE table1.empID=table2.empID
    > > * * *AND table2.empID=table3.empID
    > > * * *AND table3.empID=table4.empID
    > > * * *AND table4.absenceT IN (15,19)
    > > * * *AND table4.year=2008;

    >
    > > The rows start to come back immediately. Around 1700-1800 rows
    > > typically come back and there is a lot of network IO. If I type

    >
    > > SELECT COUNT(*) FROM
    > > (
    > > SELECT a1,a2,a3,a5,a10
    > > *FROM table1, table2, table3, table4
    > > * WHERE table1.empID=table2.empID
    > > * * *AND table2.empID=table3.empID
    > > * * *AND table3.empID=table4.empID
    > > * * *AND table4.absenceT IN (15,19)
    > > * * *AND table4.year=2008
    > > )

    >
    > > it returns in 2.14seconds saying it found 1792 rows.

    >
    > > If I type SELECT DISTINCT a1,a2,a3,a5,a10

    >
    > > only 20-50 rows come back and the query takes 28 seconds and there is
    > > a lot less network IO.

    >
    > > I only want the 20-50 rows but I don't want to wait 28 seconds. As
    > > Oracle seems to find all 1792 rows in 2.14seconds, why does it take 28
    > > seconds only to return the unique ones.

    >
    > > These are ad hoc queries so I am not trying to optimize a single
    > > query. My question is Is there a more quick way of generally returning
    > > a DISTINCT set of rows other than using DISTINCT in the SELECT
    > > statement.

    >
    > > Thank you
    > > Barry

    >
    > You might try this, although I don't know how much faster, if at all,
    > it will be:
    >
    > SELECT a1,a2,a3,a5,a10
    > *FROM table1, table2, table3, table4
    > * WHERE table1.empID=table2.empID
    > * * *AND table2.empID=table3.empID
    > * * *AND table3.empID=table4.empID
    > * * *AND table4.absenceT IN (15,19)
    > * * *AND table4.year=2008
    > group by a1, a2, a3, a5, a10;
    >
    > Possibly you should ask about increasing the hash_area_size parameter
    > value as 10g uses hashing algorithms to process such requests, rather
    > than the old-style sort operations.
    >
    > David Fitzjarrell- Hide quoted text -
    >
    > - Show quoted text -


    With a 8 million records and no index I find the group by executes in
    less than half of the time of distinct:

    SQL> select distinct ename
    2 from dist_tst;

    ENAME
    --------------------
    NANCY10
    NANCY2
    NANCY7
    NANCY9
    NANCY5
    NANCY3
    NANCY6
    NANCY1
    NANCY4
    NANCY8

    10 rows selected.

    Elapsed: 00:00:14.32

    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 3730636855

    ---------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes |TempSpc| Cost
    (%CPU)| Time |
    ---------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 8049K| 92M| |
    45750 (9)| 00:09:10 |
    | 1 | HASH UNIQUE | | 8049K| 92M| 308M|
    45750 (9)| 00:09:10 |
    | 2 | TABLE ACCESS FULL| DIST_TST | 8049K| 92M| | 5918
    (10)| 00:01:12 |
    ---------------------------------------------------------------------------------------


    Note
    -----
    - dynamic sampling used for this statement


    Statistics
    ----------------------------------------------------------
    49 recursive calls
    1 db block gets
    47196 consistent gets
    11429 physical reads
    1680880 redo size
    347 bytes sent via SQL*Net to client
    246 bytes received via SQL*Net from client
    2 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
    10 rows processed

    SQL>
    SQL> select ename
    2 from dist_tst
    3 group by ename;

    ENAME
    --------------------
    NANCY9
    NANCY10
    NANCY2
    NANCY6
    NANCY4
    NANCY7
    NANCY5
    NANCY3
    NANCY1
    NANCY8

    10 rows selected.

    Elapsed: 00:00:05.79

    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 1228703371

    -------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
    Time |
    -------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 8049K| 92M| 9067 (42)|
    00:01:49 |
    | 1 | HASH GROUP BY | | 8049K| 92M| 9067 (42)|
    00:01:49 |
    | 2 | TABLE ACCESS FULL| DIST_TST | 8049K| 92M| 5918 (10)|
    00:01:12 |
    -------------------------------------------------------------------------------

    Note
    -----
    - dynamic sampling used for this statement


    Statistics
    ----------------------------------------------------------
    4 recursive calls
    0 db block gets
    23699 consistent gets
    11462 physical reads
    0 redo size
    348 bytes sent via SQL*Net to client
    246 bytes received via SQL*Net from client
    2 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
    10 rows processed

    SQL>

    So it may help you, it may not. You'll need to test this on your own
    data.


    David Fitzjarrell

  6. Re: SELECT DISTINCT slow, how can I speed up

    On Aug 14, 7:18*pm, Barry Bulsara wrote:
    > Hello. I have a question about how I can structure a SQL query to make
    > the results come back faster. I have experimented a bit and include
    > some results here.
    >
    > We have read-only SELECT access to tables for a payroll and absence
    > system running Oracle 10.0.1 on (I think) a dedicated Windows 2003
    > Server in the IT department on the same site. We do not have
    > privileges to UPDATE, DELETE, or create anymore indexes and if I did,
    > it would probably invalidate our software support contract.
    >
    > We have staff using Access 2003 as a front end to the Oracle tables
    > for the purposes of ad hoc queries. These staff regularly enter
    > queries across 5-6 tables with various bits added to the SQL to limit
    > the number of rows returned and match criteria. All the rows starts to
    > come back within a second of the SQL starting to run.
    >
    > When the SQL is like this (I have anonymised the table names).
    >
    > SELECT a1,a2,a3,a5,a10
    > *FROM table1, table2, table3, table4
    > * WHERE table1.empID=table2.empID
    > * * *AND table2.empID=table3.empID
    > * * *AND table3.empID=table4.empID
    > * * *AND table4.absenceT IN (15,19)
    > * * *AND table4.year=2008;
    >
    > The rows start to come back immediately. Around 1700-1800 rows
    > typically come back and there is a lot of network IO. If I type
    >
    > SELECT COUNT(*) FROM
    > (
    > SELECT a1,a2,a3,a5,a10
    > *FROM table1, table2, table3, table4
    > * WHERE table1.empID=table2.empID
    > * * *AND table2.empID=table3.empID
    > * * *AND table3.empID=table4.empID
    > * * *AND table4.absenceT IN (15,19)
    > * * *AND table4.year=2008
    > )
    >
    > it returns in 2.14seconds saying it found 1792 rows.
    >
    > If I type SELECT DISTINCT a1,a2,a3,a5,a10
    >
    > only 20-50 rows come back and the query takes 28 seconds and there is
    > a lot less network IO.
    >
    > I only want the 20-50 rows but I don't want to wait 28 seconds. As
    > Oracle seems to find all 1792 rows in 2.14seconds, why does it take 28
    > seconds only to return the unique ones.
    >
    > These are ad hoc queries so I am not trying to optimize a single
    > query. My question is Is there a more quick way of generally returning
    > a DISTINCT set of rows other than using DISTINCT in the SELECT
    > statement.
    >
    > Thank you
    > Barry


    A COUNT query can often use a very different execution plan to the
    corresponding SELECT , because without the need to
    access the columns it may be able to get the required result entirely
    from indexes, avoiding some table accesses.

+ Reply to Thread