+ Reply to Thread
Page 2 of 2 FirstFirst 1 2
Results 11 to 17 of 17

dealing with skewed data

  1. Re: dealing with skewed data

    Mark D Powell wrote:

    > Richard Kuhler wrote in message news:...
    >
    >>I'm looking for suggestions and experience on dealing with skewed data
    >>in a datawarehouse. Unfortunately, I don't see how histograms can help
    >>with what I believe is a fairly standard star schema. Imagine a
    >>dimension table D that has two rows with natural keys 'D1' and 'D2'.
    >>Imagine a fact table F that has 10 million rows. The typical query
    >>looks something like this:
    >>
    >>select f.*
    >>from d, f
    >>where d.key = 'D1'
    >> and f.d_id = d.d_id
    >>
    >>Now let's say 99% of those rows belong to dimension 'D1' and 1% to 'D2'.
    >> When querying with 'D1' we want to do a full table scan, with 'D2' an
    >>index scan is faster.
    >>
    >>What can be done to get the optimizer on track? Obviously, we could
    >>eliminate the surrogate keys and then build a histogram on the fact
    >>table. Is that how this is typically handled? We're also considering
    >>bitmap join indexes but in reality these queries typically select many
    >>columns from the dimension tables as well.
    >>
    >>Ideas?
    >>
    >>
    >>Oracle9i Enterprise Edition Release 9.2.0.2.0 - Production
    >>With the Partitioning, OLAP and Oracle Data Mining options
    >>JServer Release 9.2.0.2.0 - Production

    >
    >
    > Richard, we had a situation similar to your in our OLTP. The CBO said
    > full table scan. The developer wanted the query to use an index. The
    > index only had six distinct values of which 85% of the data had one
    > value.
    >
    > I looked at it and demonstrated that the CBO was correct for 85% of
    > the data in our case so the developer coded a test on the bind
    > variable. If it was the 85% value we let the optimizer run the
    > statement the way it wants. For any other value we execute a hinted
    > version of the statement that uses the index.
    >
    > HTH -- Mark D Powell --


    If there is any chance you could send me the item you just described I
    would very much like to show it to my students. Thanks.

    --
    Daniel Morgan
    http://www.outreach.washington.edu/e...ad/oad_crs.asp
    http://www.outreach.washington.edu/e...oa/aoa_crs.asp
    damorgan@x.washington.edu
    (replace 'x' with a 'u' to reply)


  2. Re: dealing with skewed data

    Mark D Powell wrote:

    > Richard Kuhler wrote in message news:...
    >
    >>I'm looking for suggestions and experience on dealing with skewed data
    >>in a datawarehouse. Unfortunately, I don't see how histograms can help
    >>with what I believe is a fairly standard star schema. Imagine a
    >>dimension table D that has two rows with natural keys 'D1' and 'D2'.
    >>Imagine a fact table F that has 10 million rows. The typical query
    >>looks something like this:
    >>
    >>select f.*
    >>from d, f
    >>where d.key = 'D1'
    >> and f.d_id = d.d_id
    >>
    >>Now let's say 99% of those rows belong to dimension 'D1' and 1% to 'D2'.
    >> When querying with 'D1' we want to do a full table scan, with 'D2' an
    >>index scan is faster.
    >>
    >>What can be done to get the optimizer on track? Obviously, we could
    >>eliminate the surrogate keys and then build a histogram on the fact
    >>table. Is that how this is typically handled? We're also considering
    >>bitmap join indexes but in reality these queries typically select many
    >>columns from the dimension tables as well.
    >>
    >>Ideas?
    >>
    >>
    >>Oracle9i Enterprise Edition Release 9.2.0.2.0 - Production
    >>With the Partitioning, OLAP and Oracle Data Mining options
    >>JServer Release 9.2.0.2.0 - Production

    >
    >
    > Richard, we had a situation similar to your in our OLTP. The CBO said
    > full table scan. The developer wanted the query to use an index. The
    > index only had six distinct values of which 85% of the data had one
    > value.
    >
    > I looked at it and demonstrated that the CBO was correct for 85% of
    > the data in our case so the developer coded a test on the bind
    > variable. If it was the 85% value we let the optimizer run the
    > statement the way it wants. For any other value we execute a hinted
    > version of the statement that uses the index.
    >
    > HTH -- Mark D Powell --


    If there is any chance you could send me the item you just described I
    would very much like to show it to my students. Thanks.

    --
    Daniel Morgan
    http://www.outreach.washington.edu/e...ad/oad_crs.asp
    http://www.outreach.washington.edu/e...oa/aoa_crs.asp
    damorgan@x.washington.edu
    (replace 'x' with a 'u' to reply)


  3. Re: dealing with skewed data

    Mark D Powell wrote:

    > Richard Kuhler wrote in message news:...
    >
    >>I'm looking for suggestions and experience on dealing with skewed data
    >>in a datawarehouse. Unfortunately, I don't see how histograms can help
    >>with what I believe is a fairly standard star schema. Imagine a
    >>dimension table D that has two rows with natural keys 'D1' and 'D2'.
    >>Imagine a fact table F that has 10 million rows. The typical query
    >>looks something like this:
    >>
    >>select f.*
    >>from d, f
    >>where d.key = 'D1'
    >> and f.d_id = d.d_id
    >>
    >>Now let's say 99% of those rows belong to dimension 'D1' and 1% to 'D2'.
    >> When querying with 'D1' we want to do a full table scan, with 'D2' an
    >>index scan is faster.
    >>
    >>What can be done to get the optimizer on track? Obviously, we could
    >>eliminate the surrogate keys and then build a histogram on the fact
    >>table. Is that how this is typically handled? We're also considering
    >>bitmap join indexes but in reality these queries typically select many
    >>columns from the dimension tables as well.
    >>
    >>Ideas?
    >>
    >>
    >>Oracle9i Enterprise Edition Release 9.2.0.2.0 - Production
    >>With the Partitioning, OLAP and Oracle Data Mining options
    >>JServer Release 9.2.0.2.0 - Production

    >
    >
    > Richard, we had a situation similar to your in our OLTP. The CBO said
    > full table scan. The developer wanted the query to use an index. The
    > index only had six distinct values of which 85% of the data had one
    > value.
    >
    > I looked at it and demonstrated that the CBO was correct for 85% of
    > the data in our case so the developer coded a test on the bind
    > variable. If it was the 85% value we let the optimizer run the
    > statement the way it wants. For any other value we execute a hinted
    > version of the statement that uses the index.
    >
    > HTH -- Mark D Powell --


    If there is any chance you could send me the item you just described I
    would very much like to show it to my students. Thanks.

    --
    Daniel Morgan
    http://www.outreach.washington.edu/e...ad/oad_crs.asp
    http://www.outreach.washington.edu/e...oa/aoa_crs.asp
    damorgan@x.washington.edu
    (replace 'x' with a 'u' to reply)


  4. Re: dealing with skewed data


    It's unfortunate that the CBO has a number
    of variations on the basic calculations that
    kick in when the numbers of rows are very
    small.

    I think you are seeing the result of two
    symptoms. First that the join selectivity
    algorithm is doing something non-standard
    when the driving table is known to return
    just one row. Second that Oracle uses (or
    seems to use) number of distinct values for
    join estimates rather than the density - not
    that it would help you to use the density,
    because you still only get one access path
    to cover two very different queries.

    By the way, you don't get a star_transformation
    unless you have at least two dimension tables
    involved, so if your query is a very common
    requirement it puts you a little to the edge of the
    standard 'star schema' processing and you might
    get some benefit if you can partition your table
    on the target column so that you get one small
    and one large partition. Oracle has a run-time
    join optimisation that would allow it to determine
    whether it should be visiting the small or large
    partition - apart from that, I don't know of any
    mechanism that Oracle has that would allow it
    to determine that the value in the row selected
    for the join is the special case value in the fact
    table distribution.

    If your queries are only for
    d.key = 'D1'
    and not
    d.other_column = 'xxx'
    then you might consider a
    function-based bitmap index
    on the fact table (perhaps on decode()
    if the dimension table is guaranteed
    to be static), rather than a join index.

    --
    Regards

    Jonathan Lewis
    http://www.jlcomp.demon.co.uk

    The educated person is not the person
    who can answer the questions, but the
    person who can question the answers -- T. Schick Jr


    Next public appearance2:
    March 2004 Hotsos Symposium - Keynote
    March 2004 Charlotte NC - OUG Tutorial
    April 2004 Iceland


    One-day tutorials:
    http://www.jlcomp.demon.co.uk/tutorial.html


    Three-day seminar:
    see http://www.jlcomp.demon.co.uk/seminar.html
    ____UK___February


    The Co-operative Oracle Users' FAQ
    http://www.jlcomp.demon.co.uk/faq/ind_faq.html


    "Richard Kuhler" wrote in message
    news:MyFNb.82210$Vs3.69341@twister.socal.rr.com...
    > I'm looking for suggestions and experience on dealing with skewed data
    > in a datawarehouse. Unfortunately, I don't see how histograms can help
    > with what I believe is a fairly standard star schema. Imagine a
    > dimension table D that has two rows with natural keys 'D1' and 'D2'.
    > Imagine a fact table F that has 10 million rows. The typical query
    > looks something like this:
    >
    > select f.*
    > from d, f
    > where d.key = 'D1'
    > and f.d_id = d.d_id
    >
    > Now let's say 99% of those rows belong to dimension 'D1' and 1% to 'D2'.
    > When querying with 'D1' we want to do a full table scan, with 'D2' an
    > index scan is faster.
    >
    > What can be done to get the optimizer on track? Obviously, we could
    > eliminate the surrogate keys and then build a histogram on the fact
    > table. Is that how this is typically handled? We're also considering
    > bitmap join indexes but in reality these queries typically select many
    > columns from the dimension tables as well.
    >
    > Ideas?
    >
    >
    > Oracle9i Enterprise Edition Release 9.2.0.2.0 - Production
    > With the Partitioning, OLAP and Oracle Data Mining options
    > JServer Release 9.2.0.2.0 - Production
    >
    > --
    > Richard Kuhler
    >




  5. Re: dealing with skewed data


    It's unfortunate that the CBO has a number
    of variations on the basic calculations that
    kick in when the numbers of rows are very
    small.

    I think you are seeing the result of two
    symptoms. First that the join selectivity
    algorithm is doing something non-standard
    when the driving table is known to return
    just one row. Second that Oracle uses (or
    seems to use) number of distinct values for
    join estimates rather than the density - not
    that it would help you to use the density,
    because you still only get one access path
    to cover two very different queries.

    By the way, you don't get a star_transformation
    unless you have at least two dimension tables
    involved, so if your query is a very common
    requirement it puts you a little to the edge of the
    standard 'star schema' processing and you might
    get some benefit if you can partition your table
    on the target column so that you get one small
    and one large partition. Oracle has a run-time
    join optimisation that would allow it to determine
    whether it should be visiting the small or large
    partition - apart from that, I don't know of any
    mechanism that Oracle has that would allow it
    to determine that the value in the row selected
    for the join is the special case value in the fact
    table distribution.

    If your queries are only for
    d.key = 'D1'
    and not
    d.other_column = 'xxx'
    then you might consider a
    function-based bitmap index
    on the fact table (perhaps on decode()
    if the dimension table is guaranteed
    to be static), rather than a join index.

    --
    Regards

    Jonathan Lewis
    http://www.jlcomp.demon.co.uk

    The educated person is not the person
    who can answer the questions, but the
    person who can question the answers -- T. Schick Jr


    Next public appearance2:
    March 2004 Hotsos Symposium - Keynote
    March 2004 Charlotte NC - OUG Tutorial
    April 2004 Iceland


    One-day tutorials:
    http://www.jlcomp.demon.co.uk/tutorial.html


    Three-day seminar:
    see http://www.jlcomp.demon.co.uk/seminar.html
    ____UK___February


    The Co-operative Oracle Users' FAQ
    http://www.jlcomp.demon.co.uk/faq/ind_faq.html


    "Richard Kuhler" wrote in message
    news:MyFNb.82210$Vs3.69341@twister.socal.rr.com...
    > I'm looking for suggestions and experience on dealing with skewed data
    > in a datawarehouse. Unfortunately, I don't see how histograms can help
    > with what I believe is a fairly standard star schema. Imagine a
    > dimension table D that has two rows with natural keys 'D1' and 'D2'.
    > Imagine a fact table F that has 10 million rows. The typical query
    > looks something like this:
    >
    > select f.*
    > from d, f
    > where d.key = 'D1'
    > and f.d_id = d.d_id
    >
    > Now let's say 99% of those rows belong to dimension 'D1' and 1% to 'D2'.
    > When querying with 'D1' we want to do a full table scan, with 'D2' an
    > index scan is faster.
    >
    > What can be done to get the optimizer on track? Obviously, we could
    > eliminate the surrogate keys and then build a histogram on the fact
    > table. Is that how this is typically handled? We're also considering
    > bitmap join indexes but in reality these queries typically select many
    > columns from the dimension tables as well.
    >
    > Ideas?
    >
    >
    > Oracle9i Enterprise Edition Release 9.2.0.2.0 - Production
    > With the Partitioning, OLAP and Oracle Data Mining options
    > JServer Release 9.2.0.2.0 - Production
    >
    > --
    > Richard Kuhler
    >




  6. Re: dealing with skewed data


    It's unfortunate that the CBO has a number
    of variations on the basic calculations that
    kick in when the numbers of rows are very
    small.

    I think you are seeing the result of two
    symptoms. First that the join selectivity
    algorithm is doing something non-standard
    when the driving table is known to return
    just one row. Second that Oracle uses (or
    seems to use) number of distinct values for
    join estimates rather than the density - not
    that it would help you to use the density,
    because you still only get one access path
    to cover two very different queries.

    By the way, you don't get a star_transformation
    unless you have at least two dimension tables
    involved, so if your query is a very common
    requirement it puts you a little to the edge of the
    standard 'star schema' processing and you might
    get some benefit if you can partition your table
    on the target column so that you get one small
    and one large partition. Oracle has a run-time
    join optimisation that would allow it to determine
    whether it should be visiting the small or large
    partition - apart from that, I don't know of any
    mechanism that Oracle has that would allow it
    to determine that the value in the row selected
    for the join is the special case value in the fact
    table distribution.

    If your queries are only for
    d.key = 'D1'
    and not
    d.other_column = 'xxx'
    then you might consider a
    function-based bitmap index
    on the fact table (perhaps on decode()
    if the dimension table is guaranteed
    to be static), rather than a join index.

    --
    Regards

    Jonathan Lewis
    http://www.jlcomp.demon.co.uk

    The educated person is not the person
    who can answer the questions, but the
    person who can question the answers -- T. Schick Jr


    Next public appearance2:
    March 2004 Hotsos Symposium - Keynote
    March 2004 Charlotte NC - OUG Tutorial
    April 2004 Iceland


    One-day tutorials:
    http://www.jlcomp.demon.co.uk/tutorial.html


    Three-day seminar:
    see http://www.jlcomp.demon.co.uk/seminar.html
    ____UK___February


    The Co-operative Oracle Users' FAQ
    http://www.jlcomp.demon.co.uk/faq/ind_faq.html


    "Richard Kuhler" wrote in message
    news:MyFNb.82210$Vs3.69341@twister.socal.rr.com...
    > I'm looking for suggestions and experience on dealing with skewed data
    > in a datawarehouse. Unfortunately, I don't see how histograms can help
    > with what I believe is a fairly standard star schema. Imagine a
    > dimension table D that has two rows with natural keys 'D1' and 'D2'.
    > Imagine a fact table F that has 10 million rows. The typical query
    > looks something like this:
    >
    > select f.*
    > from d, f
    > where d.key = 'D1'
    > and f.d_id = d.d_id
    >
    > Now let's say 99% of those rows belong to dimension 'D1' and 1% to 'D2'.
    > When querying with 'D1' we want to do a full table scan, with 'D2' an
    > index scan is faster.
    >
    > What can be done to get the optimizer on track? Obviously, we could
    > eliminate the surrogate keys and then build a histogram on the fact
    > table. Is that how this is typically handled? We're also considering
    > bitmap join indexes but in reality these queries typically select many
    > columns from the dimension tables as well.
    >
    > Ideas?
    >
    >
    > Oracle9i Enterprise Edition Release 9.2.0.2.0 - Production
    > With the Partitioning, OLAP and Oracle Data Mining options
    > JServer Release 9.2.0.2.0 - Production
    >
    > --
    > Richard Kuhler
    >




  7. Re: dealing with skewed data


    It's unfortunate that the CBO has a number
    of variations on the basic calculations that
    kick in when the numbers of rows are very
    small.

    I think you are seeing the result of two
    symptoms. First that the join selectivity
    algorithm is doing something non-standard
    when the driving table is known to return
    just one row. Second that Oracle uses (or
    seems to use) number of distinct values for
    join estimates rather than the density - not
    that it would help you to use the density,
    because you still only get one access path
    to cover two very different queries.

    By the way, you don't get a star_transformation
    unless you have at least two dimension tables
    involved, so if your query is a very common
    requirement it puts you a little to the edge of the
    standard 'star schema' processing and you might
    get some benefit if you can partition your table
    on the target column so that you get one small
    and one large partition. Oracle has a run-time
    join optimisation that would allow it to determine
    whether it should be visiting the small or large
    partition - apart from that, I don't know of any
    mechanism that Oracle has that would allow it
    to determine that the value in the row selected
    for the join is the special case value in the fact
    table distribution.

    If your queries are only for
    d.key = 'D1'
    and not
    d.other_column = 'xxx'
    then you might consider a
    function-based bitmap index
    on the fact table (perhaps on decode()
    if the dimension table is guaranteed
    to be static), rather than a join index.

    --
    Regards

    Jonathan Lewis
    http://www.jlcomp.demon.co.uk

    The educated person is not the person
    who can answer the questions, but the
    person who can question the answers -- T. Schick Jr


    Next public appearance2:
    March 2004 Hotsos Symposium - Keynote
    March 2004 Charlotte NC - OUG Tutorial
    April 2004 Iceland


    One-day tutorials:
    http://www.jlcomp.demon.co.uk/tutorial.html


    Three-day seminar:
    see http://www.jlcomp.demon.co.uk/seminar.html
    ____UK___February


    The Co-operative Oracle Users' FAQ
    http://www.jlcomp.demon.co.uk/faq/ind_faq.html


    "Richard Kuhler" wrote in message
    news:MyFNb.82210$Vs3.69341@twister.socal.rr.com...
    > I'm looking for suggestions and experience on dealing with skewed data
    > in a datawarehouse. Unfortunately, I don't see how histograms can help
    > with what I believe is a fairly standard star schema. Imagine a
    > dimension table D that has two rows with natural keys 'D1' and 'D2'.
    > Imagine a fact table F that has 10 million rows. The typical query
    > looks something like this:
    >
    > select f.*
    > from d, f
    > where d.key = 'D1'
    > and f.d_id = d.d_id
    >
    > Now let's say 99% of those rows belong to dimension 'D1' and 1% to 'D2'.
    > When querying with 'D1' we want to do a full table scan, with 'D2' an
    > index scan is faster.
    >
    > What can be done to get the optimizer on track? Obviously, we could
    > eliminate the surrogate keys and then build a histogram on the fact
    > table. Is that how this is typically handled? We're also considering
    > bitmap join indexes but in reality these queries typically select many
    > columns from the dimension tables as well.
    >
    > Ideas?
    >
    >
    > Oracle9i Enterprise Edition Release 9.2.0.2.0 - Production
    > With the Partitioning, OLAP and Oracle Data Mining options
    > JServer Release 9.2.0.2.0 - Production
    >
    > --
    > Richard Kuhler
    >




+ Reply to Thread
Page 2 of 2 FirstFirst 1 2