+ Reply to Thread
Page 1 of 3 1 2 3 LastLast
Results 1 to 10 of 29

lots of waiting on 'db file parallel write'

  1. lots of waiting on 'db file parallel write'

    Folks,
    I'm currently trying to get to the bottom of why my 4 database
    writters spend so much time
    waiting on 'db file parallel write'... even when it appears that the
    system is not terribly busy.

    Environment:
    Oracle 10.1.0.4
    Solaris 8 - 64bit
    IBM SAN T700
    Veritas filesystem (mostly everything is 0+1 raid with a few luns at
    raid 5).
    4 db writters

    So at different times during the day I notice that the system just
    "hangs". When I check what is
    waiting I see the number of wait seconds on my dbwn processes is
    anywhere from 10-140 seconds.
    Obviously if the db writters can go fast enough, the I start to see
    sessions pile up and
    lots of waits on 'free buffer wait'.

    Our system is growing very fast because the company is growing at
    almost 50% rate...so I'm just
    not sure if we are hitting hardware limitations.

    I do have an itar open with Oracle, but I'm just not getting anywhere
    with them.
    Our system is running faster than ever, but now at certain times
    during the day it just hangs...

    Here are some parameter settings
    DB_CACHE_SIZE = 2GB
    SHARED_POOL_SIZE = 1GB.

    Something interesting in the AWR reports....

    Load Profile
    ~~~~~~~~~~~~ Per Second Per
    Transaction
    ---------------
    ---------------
    Redo size: 416,701.01
    17,329.98
    Logical reads: 27,387.68
    1,139.01
    Block changes: 2,974.96
    123.72
    Physical reads: 5,293.28
    220.14
    Physical writes: 532.99
    22.17
    User calls: 1,619.75
    67.36
    Parses: 53.37
    2.22
    Hard parses: 0.51
    0.02
    Sorts: 67.82
    2.82
    Logons: 2.85
    0.12
    Executes: 1,466.37
    60.98
    Transactions: 24.05


    Instance Efficiency Percentages (Target 100%)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Buffer Nowait %: 99.92 Redo NoWait %: 100.00
    Buffer Hit %: 83.69 In-memory Sort %: 99.99
    Library Hit %: 99.94 Soft Parse %: 99.05
    Execute to Parse %: 96.36 Latch Hit %: 99.69
    Parse CPU to Parse Elapsd %: 56.18 % Non-Parse CPU: 99.63

    Shared Pool Statistics Begin End
    ------ ------
    Memory Usage %: 93.66 93.77
    % SQL with executions>1: 89.18 88.96
    % Memory for SQL w/exec>1: 90.72 90.46


    Class Waits Total Wait Time (s) Avg Time (ms)
    ------------------ ----------- ------------------- --------------
    data block 76,951 1,206 16
    undo header 71 39 545
    segment header 6 0 10
    undo block 161 0 0
    -------------------------------------------------------------

    Top 5 Timed Events
    ~~~~~~~~~~~~~~~~~~ % Total
    Event Waits Time (s) DB Time
    Wait Class
    ------------------------------ ------------ ----------- ---------
    --------------
    db file sequential read 6,554,919 42,254 55.17
    User I/O
    CPU time 22,239 29.04
    db file parallel write 207,339 5,367 7.01
    System I/O
    PX Deq Credit: send blkd 28,227 5,064 6.61
    Other
    db file scattered read 200,481 3,363 4.39
    User I/O


    'db file sequentail read' is always the top wait event..but this is
    normal for us.

    truss on one of the db writers.


    kaio(AIOWAIT, 0xFFFFFFFF7FFFDE10) = 1
    timeout: 600.000000 sec
    kaio(AIONOTIFY, -2229511408) = 0
    lwp_mutex_lock(0xFFFFFFFF7C909EC0) = 0
    mutex type: USYNC_THREAD
    kaio(AIONOTIFY, -2232272112) = 0
    kaio(AIONOTIFY, -2254603504) = 0
    lwp_mutex_lock(0xFFFFFFFF7C909EC0) = 0
    mutex type: USYNC_THREAD
    lwp_mutex_wakeup(0xFFFFFFFF7C909EC0) = 0
    mutex type: USYNC_THREAD
    kaio(AIONOTIFY, -2229732592) = 0
    lwp_mutex_wakeup(0xFFFFFFFF7C909EC0) = 0
    mutex type: USYNC_THREAD
    lwp_mutex_lock(0xFFFFFFFF7C909EC0) = 0
    mutex type: USYNC_THREAD
    lwp_mutex_wakeup(0xFFFFFFFF7C909EC0) = 0
    mutex type: USYNC_THREAD
    pwrite(266, "06A2\0\002 A } bF5 6DC }".., 57344, 0x2FAC4000) = 57344


    Any help would be appreciated..

    thanks
    -peter


  2. Re: lots of waiting on 'db file parallel write'

    Advance search metalink for "db file parallel write" for starters.

    Check out v$bh - you may be thrashing buffers, interfering with the db
    writers ability to finish the i/o they have to do. With the large SGA
    you have, Oracle will consider larger tables as MRU, effectively
    caching them for full table scans. These tables should be moved to
    keep or recycle pools, as appropriate.

    But don't ignore those sequential reads - are you _sure_ you've tuned
    your app as much as you can? Are you sure those undo header waits
    aren't happening during the slowdowns? Are you sure Veritas isn't
    doing some housekeeping?

    jg
    --
    @home.com is bogus.
    http://www.businessweek.com/innovate...224_155318.htm


  3. Re: lots of waiting on 'db file parallel write'

    Advance search metalink for "db file parallel write" for starters.

    Check out v$bh - you may be thrashing buffers, interfering with the db
    writers ability to finish the i/o they have to do. With the large SGA
    you have, Oracle will consider larger tables as MRU, effectively
    caching them for full table scans. These tables should be moved to
    keep or recycle pools, as appropriate.

    But don't ignore those sequential reads - are you _sure_ you've tuned
    your app as much as you can? Are you sure those undo header waits
    aren't happening during the slowdowns? Are you sure Veritas isn't
    doing some housekeeping?

    jg
    --
    @home.com is bogus.
    http://www.businessweek.com/innovate...224_155318.htm


  4. Re: lots of waiting on 'db file parallel write'

    The application I maintain is not an OLTP database nor a
    warehouse..it's sort of in between with
    large data uploads and large selects going on. Most of the data access
    in our system is Index Fast Full Scans and full table scans.. (we hash
    join large data sets ). I'm not sure I understand what you mean by
    thrashing buffers...

    By looking at AWR reports, my buffer cache ratio is never about 92%,
    but at the same time there aren't lots of waits for free buffers
    either...only when the db writers get hung up. The waits on undo
    blocks seems insignificant compared to everything else..

    As far as the size of my SGA .. we generate histograms on a large part
    of our data..which is constantly changing distributions based on
    regular data uploads..so the SGA needs to keep lots of information in
    the shared pool. On average the memory usage of the shared pool is
    always above 93% so even that leads me to believe that my shared pool
    could probably benefit from being slightly larger... I've looked at
    AWR reports where the INDEX namespace has had as much as 15% misses for
    index information in the shared pool...

    Our system is very much IO bound...it's always been, and we are in the
    process of making some data model changes to reduce that, but I still
    can't make sense of the db writers...

    Can you explain what you mean about v$bh and buffer thrashing and
    veritas housekeeping??

    thanks
    -peter


  5. Re: lots of waiting on 'db file parallel write'

    The application I maintain is not an OLTP database nor a
    warehouse..it's sort of in between with
    large data uploads and large selects going on. Most of the data access
    in our system is Index Fast Full Scans and full table scans.. (we hash
    join large data sets ). I'm not sure I understand what you mean by
    thrashing buffers...

    By looking at AWR reports, my buffer cache ratio is never about 92%,
    but at the same time there aren't lots of waits for free buffers
    either...only when the db writers get hung up. The waits on undo
    blocks seems insignificant compared to everything else..

    As far as the size of my SGA .. we generate histograms on a large part
    of our data..which is constantly changing distributions based on
    regular data uploads..so the SGA needs to keep lots of information in
    the shared pool. On average the memory usage of the shared pool is
    always above 93% so even that leads me to believe that my shared pool
    could probably benefit from being slightly larger... I've looked at
    AWR reports where the INDEX namespace has had as much as 15% misses for
    index information in the shared pool...

    Our system is very much IO bound...it's always been, and we are in the
    process of making some data model changes to reduce that, but I still
    can't make sense of the db writers...

    Can you explain what you mean about v$bh and buffer thrashing and
    veritas housekeeping??

    thanks
    -peter


  6. Re: lots of waiting on 'db file parallel write'

    On Fri, 24 Feb 2006 09:39:27 -0800, peter wrote:

    >
    > So at different times during the day I notice that the system just
    > "hangs". When I check what is
    > waiting I see the number of wait seconds on my dbwn processes is
    > anywhere from 10-140 seconds.
    > Obviously if the db writters can go fast enough, the I start to see
    > sessions pile up and
    > lots of waits on 'free buffer wait'.


    Question here is what are you writing? You have more then one dbwr (you
    used plural when talking about db WRITERS) but statistics below doesn't
    explain all that writing.

    >
    > Our system is growing very fast because the company is growing at
    > almost 50% rate...so I'm just
    > not sure if we are hitting hardware limitations.


    I wouldn't want to exclude that possibility, and the occasional waits for
    free buffers certainly point in that direction, but the huge amount of
    spent CPU time shows me that you are running very expensive SQL commands.
    You may want to tune them properly. I assume that your site is an OLTP
    site?

    >
    > Here are some parameter settings
    > DB_CACHE_SIZE = 2GB
    > SHARED_POOL_SIZE = 1GB.
    >
    > Something interesting in the AWR reports....
    >
    > Load Profile
    > ~~~~~~~~~~~~ Per Second Per
    > Transaction
    > ---------------
    > ---------------
    > Redo size: 416,701.01
    > 17,329.98
    > Logical reads: 27,387.68
    > 1,139.01
    > Block changes: 2,974.96
    > 123.72
    > Physical reads: 5,293.28
    > 220.14
    > Physical writes: 532.99


    This is very small number of writes, compared to the number of reads. How
    is it that so few writes are generating such an enormous quantity of redo?
    Are you using BLOBS or something of that nature?


    > 22.17
    > User calls: 1,619.75
    > 67.36
    > Parses: 53.37
    > 2.22
    > Hard parses: 0.51
    > 0.02
    > Sorts: 67.82


    More sorts then transactions? Hmmm, your CPU usage may
    be rather high. Are you sure that all those sorts are
    necessary?


    > 2.82
    > Logons: 2.85
    > 0.12
    > Executes: 1,466.37
    > 60.98
    > Transactions: 24.05


    Number of transactions is extremely low....Confusing.


    >
    >
    > Instance Efficiency Percentages (Target 100%)
    > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    > Buffer Nowait %: 99.92 Redo NoWait %: 100.00


    Hey, your system is performing well, you have almost 100% BCHR :)


    > Buffer Hit %: 83.69 In-memory Sort %: 99.99
    > Library Hit %: 99.94 Soft Parse %: 99.05
    > Execute to Parse %: 96.36 Latch Hit %: 99.69
    > Parse CPU to Parse Elapsd %: 56.18 % Non-Parse CPU: 99.63
    >
    > Shared Pool Statistics Begin End
    > ------ ------
    > Memory Usage %: 93.66 93.77
    > % SQL with executions>1: 89.18 88.96


    You are either using bind variables or masterfully reusing your SQL.

    > % Memory for SQL w/exec>1: 90.72 90.46
    >
    >
    > Class Waits Total Wait Time (s) Avg Time (ms)
    > ------------------ ----------- ------------------- --------------
    > data block 76,951 1,206 16
    > undo header 71 39 545
    > segment header 6 0 10
    > undo block 161 0 0
    > -------------------------------------------------------------
    >
    > Top 5 Timed Events
    > ~~~~~~~~~~~~~~~~~~ % Total
    > Event Waits Time (s) DB Time
    > Wait Class
    > ------------------------------ ------------ ----------- ---------
    > --------------
    > db file sequential read 6,554,919 42,254 55.17


    This is index access. That's is OK for an OLTP site. Some indicators that
    I see are completely inconsistent with an assumption about you managing an
    OLTP site.

    > User I/O
    > CPU time 22,239 29.04


    CPU time is extremely high. You may want to tune your SQL and make sure
    that it uses the best possible algorithms. What is using all this CPU?
    You have very few transactions, very few writes, yet you have an
    exorbitant amount of CPU usage My experience tells me that so high amount
    of CPU time is usually spent sorting or hashing. This is inconsistent with
    my initial assumption about your site being an OLTP site.

    > db file parallel write 207,339 5,367 7.01


    Are you using parallel DML? You should figure out what is it that you are
    writing so intensely and what is burning your CPU? If there are batch
    jobs running on the system, you may want to reschedule them, you will have
    to look for the most expensive SQL statements and tune them, you will have
    to profile your applications and figure out what is going on. That is a
    job for a very serious performance analyst. You can always submit your
    trace files to Hotsos (http://www.hotsos.com) and they do a great job
    with figuring out what's going on. If you are in the US, they can probably
    send you a consultant, which I don't know because I've never worked for
    them. Your problem sounds very serious and it's not a job for an
    inexperienced person. You need a good performance analyst, someone who
    knows both the OS side and Oracle. I must confess that your data looks
    very puzzling and confusing to me.


    --
    http://www.mgogala.com


  7. Re: lots of waiting on 'db file parallel write'

    On Fri, 24 Feb 2006 09:39:27 -0800, peter wrote:

    >
    > So at different times during the day I notice that the system just
    > "hangs". When I check what is
    > waiting I see the number of wait seconds on my dbwn processes is
    > anywhere from 10-140 seconds.
    > Obviously if the db writters can go fast enough, the I start to see
    > sessions pile up and
    > lots of waits on 'free buffer wait'.


    Question here is what are you writing? You have more then one dbwr (you
    used plural when talking about db WRITERS) but statistics below doesn't
    explain all that writing.

    >
    > Our system is growing very fast because the company is growing at
    > almost 50% rate...so I'm just
    > not sure if we are hitting hardware limitations.


    I wouldn't want to exclude that possibility, and the occasional waits for
    free buffers certainly point in that direction, but the huge amount of
    spent CPU time shows me that you are running very expensive SQL commands.
    You may want to tune them properly. I assume that your site is an OLTP
    site?

    >
    > Here are some parameter settings
    > DB_CACHE_SIZE = 2GB
    > SHARED_POOL_SIZE = 1GB.
    >
    > Something interesting in the AWR reports....
    >
    > Load Profile
    > ~~~~~~~~~~~~ Per Second Per
    > Transaction
    > ---------------
    > ---------------
    > Redo size: 416,701.01
    > 17,329.98
    > Logical reads: 27,387.68
    > 1,139.01
    > Block changes: 2,974.96
    > 123.72
    > Physical reads: 5,293.28
    > 220.14
    > Physical writes: 532.99


    This is very small number of writes, compared to the number of reads. How
    is it that so few writes are generating such an enormous quantity of redo?
    Are you using BLOBS or something of that nature?


    > 22.17
    > User calls: 1,619.75
    > 67.36
    > Parses: 53.37
    > 2.22
    > Hard parses: 0.51
    > 0.02
    > Sorts: 67.82


    More sorts then transactions? Hmmm, your CPU usage may
    be rather high. Are you sure that all those sorts are
    necessary?


    > 2.82
    > Logons: 2.85
    > 0.12
    > Executes: 1,466.37
    > 60.98
    > Transactions: 24.05


    Number of transactions is extremely low....Confusing.


    >
    >
    > Instance Efficiency Percentages (Target 100%)
    > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    > Buffer Nowait %: 99.92 Redo NoWait %: 100.00


    Hey, your system is performing well, you have almost 100% BCHR :)


    > Buffer Hit %: 83.69 In-memory Sort %: 99.99
    > Library Hit %: 99.94 Soft Parse %: 99.05
    > Execute to Parse %: 96.36 Latch Hit %: 99.69
    > Parse CPU to Parse Elapsd %: 56.18 % Non-Parse CPU: 99.63
    >
    > Shared Pool Statistics Begin End
    > ------ ------
    > Memory Usage %: 93.66 93.77
    > % SQL with executions>1: 89.18 88.96


    You are either using bind variables or masterfully reusing your SQL.

    > % Memory for SQL w/exec>1: 90.72 90.46
    >
    >
    > Class Waits Total Wait Time (s) Avg Time (ms)
    > ------------------ ----------- ------------------- --------------
    > data block 76,951 1,206 16
    > undo header 71 39 545
    > segment header 6 0 10
    > undo block 161 0 0
    > -------------------------------------------------------------
    >
    > Top 5 Timed Events
    > ~~~~~~~~~~~~~~~~~~ % Total
    > Event Waits Time (s) DB Time
    > Wait Class
    > ------------------------------ ------------ ----------- ---------
    > --------------
    > db file sequential read 6,554,919 42,254 55.17


    This is index access. That's is OK for an OLTP site. Some indicators that
    I see are completely inconsistent with an assumption about you managing an
    OLTP site.

    > User I/O
    > CPU time 22,239 29.04


    CPU time is extremely high. You may want to tune your SQL and make sure
    that it uses the best possible algorithms. What is using all this CPU?
    You have very few transactions, very few writes, yet you have an
    exorbitant amount of CPU usage My experience tells me that so high amount
    of CPU time is usually spent sorting or hashing. This is inconsistent with
    my initial assumption about your site being an OLTP site.

    > db file parallel write 207,339 5,367 7.01


    Are you using parallel DML? You should figure out what is it that you are
    writing so intensely and what is burning your CPU? If there are batch
    jobs running on the system, you may want to reschedule them, you will have
    to look for the most expensive SQL statements and tune them, you will have
    to profile your applications and figure out what is going on. That is a
    job for a very serious performance analyst. You can always submit your
    trace files to Hotsos (http://www.hotsos.com) and they do a great job
    with figuring out what's going on. If you are in the US, they can probably
    send you a consultant, which I don't know because I've never worked for
    them. Your problem sounds very serious and it's not a job for an
    inexperienced person. You need a good performance analyst, someone who
    knows both the OS side and Oracle. I must confess that your data looks
    very puzzling and confusing to me.


    --
    http://www.mgogala.com


  8. Re: lots of waiting on 'db file parallel write'

    This isn't an OLTP system although we do have lots of single
    record/short dml transactions being processed on a regular basis. Our
    system is mostly read and those reads select lots of data .. queries
    that pull anywhere from 10K to 6MM records...these happen all day long.
    Most of our queries do lots of hash joins and sorts...hence the CPU
    usage...

    I suspect that what we are writing so intensely is these
    inserts/updates coming from the web when users register or subscribe to
    online retailers... these dml's are not batched so throughtout the day
    they hit our system. Our short transactions do use bind variables, but
    our large selects dont' because the WHERE clause of these selects can
    be anything the customers want it to be .... so we use
    dynamic_sampling=4 (when customers user MOD for example) and use
    histograms because data distributions are always changing... our
    customer upload data (large data sets ..millions of records) on a
    regular/daily basis too...to update their customer base.... this could
    also be the reason we are writting so much more.. we've recently
    switched to linux application servers (apache on linux) and we are now
    able to upload data much faster... (more writing).

    I guess I can always turn off async io .. but I didn't want to go there
    yet.
    thanks for the feedback.
    -peter


  9. Re: lots of waiting on 'db file parallel write'

    This isn't an OLTP system although we do have lots of single
    record/short dml transactions being processed on a regular basis. Our
    system is mostly read and those reads select lots of data .. queries
    that pull anywhere from 10K to 6MM records...these happen all day long.
    Most of our queries do lots of hash joins and sorts...hence the CPU
    usage...

    I suspect that what we are writing so intensely is these
    inserts/updates coming from the web when users register or subscribe to
    online retailers... these dml's are not batched so throughtout the day
    they hit our system. Our short transactions do use bind variables, but
    our large selects dont' because the WHERE clause of these selects can
    be anything the customers want it to be .... so we use
    dynamic_sampling=4 (when customers user MOD for example) and use
    histograms because data distributions are always changing... our
    customer upload data (large data sets ..millions of records) on a
    regular/daily basis too...to update their customer base.... this could
    also be the reason we are writting so much more.. we've recently
    switched to linux application servers (apache on linux) and we are now
    able to upload data much faster... (more writing).

    I guess I can always turn off async io .. but I didn't want to go there
    yet.
    thanks for the feedback.
    -peter


  10. Re: lots of waiting on 'db file parallel write'

    >Can you explain what you mean about v$bh and buffer thrashing and
    >veritas housekeeping??


    Well, let me first say this is just random shooting at symptoms, you
    should listen to Mladen.

    What I mean about v$bh is, your dbwr are waiting on something, probably
    i/o. So, if there are things you can do to lessen i/o, they are likely
    to be good things. In a system as you have described, much is
    dependent on what everything else is doing. So, if you are reading
    things into the SGA (which the parallel reads don't do), then updating
    them, you have things happening in the SGA. Your users are continually
    pounding on the SGA, making oracle perform a lot of work moving things
    in and around. Your index scans especially will be flushing out the
    SGA, requiring more I/O (all those sequential waits). So, are you
    using multiple buffer pools? By placing the full-scanned tables into
    the recycle pool and pounded on indices in the keep pool, you lessen
    the number of tables moving stuff in and out of the SGA (and hopefully,
    some associated I/O), and in addition, keep Oracle from placing some
    moderately sized objects in the MRU end of the LRU list, shortening
    what needs to be searched through to get updates done in the SGA.

    Since there tends to be cascading performance problems when you hit
    multiple bottlenecks, the slight change of using multiple buffer pools
    can have a dramatic effect, especially in a hybrid system. Some laugh
    at the implication of a silver bullet, but really it is just pretty
    basic tuning that happens to be on a performance inflection point.

    As far as Veritas, it's been a long time since I worked with it, so I
    just have some hazy recollection of it sometimes just going bonkers
    with something like trying to catch up with some journalling under
    load. For what it's worth, which isn't much. But I seem to remember
    something about archiving logs just overwhelming everything due to
    their sequential write nature.

    jg
    --
    @home.com is bogus.
    Death of blogs, news at 11:
    http://online.wsj.com/public/article...html?mod=blogs


+ Reply to Thread
Page 1 of 3 1 2 3 LastLast