+ Reply to Thread
Page 2 of 3 FirstFirst 1 2 3 LastLast
Results 11 to 20 of 25

Application that generates 162 million records per day

  1. Re: Application that generates 162 million records per day

    Sorry if my question was a bit vague.

    What I'm looking for is the feasibility of the above and your opinions
    given the following conditions:

    1. We do not have a data center of our own and our server is co-located
    instead.
    2. Due to budgetary concerns, we only have 1 server at the moment and
    the disc capacity is 160 GB.
    3. The version of MySQL we're using is 4.0.27 MAX, since the other
    developer's web-development tools don't work with MySQL 5.0.
    4. Most of the time we are looking at the aggregate of the data. The
    aggregate function is defined as the count of each record based on
    whether a particular field's value is 1 (ad displayed) or 2 (ad
    clicked).

    Thanks!

    On Dec 10, 3:45 am, Michael Austin wrote:
    > Erol Fornoles wrote:
    > > What's your take on an application that generates a minimum of 162
    > > million records per day? I need your opinion with regards to this
    > > because the app I'm developing behaves exactly like that.Is there a particular issue with generating that much data? What is the problem

    > you need to solve? Home use? Business use? No one can give their "take" as
    > there is insufficient data to make even the wildest of guess as to what you are
    > looking for.
    >
    > I work with databases that generate 5TB+/mth. To do some quick calculations,
    > 162M*30=4.86B records and depending on the record size - I would say you have a
    > pretty healthy size database. Let's say your nominal record length is 250
    > bytes, then you are storing ~1.22Tb /mth.
    >
    > Hope you have LOTS of disc space.
    >
    > --
    > Michael Austin.
    > Database Consultant



  2. Re: Application that generates 162 million records per day

    Erol Fornoles wrote:
    > Sorry if my question was a bit vague.
    >
    > What I'm looking for is the feasibility of the above and your opinions
    > given the following conditions:
    >
    > 1. We do not have a data center of our own and our server is co-located
    > instead.
    > 2. Due to budgetary concerns, we only have 1 server at the moment and
    > the disc capacity is 160 GB.
    > 3. The version of MySQL we're using is 4.0.27 MAX, since the other
    > developer's web-development tools don't work with MySQL 5.0.
    > 4. Most of the time we are looking at the aggregate of the data. The
    > aggregate function is defined as the count of each record based on
    > whether a particular field's value is 1 (ad displayed) or 2 (ad
    > clicked).


    With Michael Austin's estimate of 1.22Tb/mth, your hard drive will be
    filled more than 8 times a month. You simply do not have enough space to
    store all this information. Store summary data instead.

    Cheers,
    Nicholas Sherlock

    --
    http://www.sherlocksoftware.org

  3. Re: Application that generates 162 million records per day

    Erol Fornoles wrote:
    > Sorry if my question was a bit vague.
    >
    > What I'm looking for is the feasibility of the above and your opinions
    > given the following conditions:
    >
    > 1. We do not have a data center of our own and our server is co-located
    > instead.
    > 2. Due to budgetary concerns, we only have 1 server at the moment and
    > the disc capacity is 160 GB.
    > 3. The version of MySQL we're using is 4.0.27 MAX, since the other
    > developer's web-development tools don't work with MySQL 5.0.
    > 4. Most of the time we are looking at the aggregate of the data. The
    > aggregate function is defined as the count of each record based on
    > whether a particular field's value is 1 (ad displayed) or 2 (ad
    > clicked).


    With Michael Austin's estimate of 1.22Tb/mth, your hard drive will be
    filled more than 8 times a month. You simply do not have enough space to
    store all this information. Store summary data instead.

    Cheers,
    Nicholas Sherlock

    --
    http://www.sherlocksoftware.org

  4. Re: Application that generates 162 million records per day


    Erol Fornoles ¼g¹D¡G

    > What's your take on an application that generates a minimum of 162
    > million records per day? I need your opinion with regards to this
    > because the app I'm developing behaves exactly like that.


    My company experience this as well, we have a lot of web log to be
    written into DB (more than 162 million records per day), The only way
    is to store the summary of data


  5. Re: Application that generates 162 million records per day


    Erol Fornoles ¼g¹D¡G

    > What's your take on an application that generates a minimum of 162
    > million records per day? I need your opinion with regards to this
    > because the app I'm developing behaves exactly like that.


    My company experience this as well, we have a lot of web log to be
    written into DB (more than 162 million records per day), The only way
    is to store the summary of data


  6. Re: Application that generates 162 million records per day

    Nicholas Sherlock wrote:

    > Erol Fornoles wrote:
    >
    >> Sorry if my question was a bit vague.
    >>
    >> What I'm looking for is the feasibility of the above and your opinions
    >> given the following conditions:
    >>
    >> 1. We do not have a data center of our own and our server is co-located
    >> instead.
    >> 2. Due to budgetary concerns, we only have 1 server at the moment and
    >> the disc capacity is 160 GB.
    >> 3. The version of MySQL we're using is 4.0.27 MAX, since the other
    >> developer's web-development tools don't work with MySQL 5.0.
    >> 4. Most of the time we are looking at the aggregate of the data. The
    >> aggregate function is defined as the count of each record based on
    >> whether a particular field's value is 1 (ad displayed) or 2 (ad
    >> clicked).

    >
    >
    > With Michael Austin's estimate of 1.22Tb/mth, your hard drive will be
    > filled more than 8 times a month. You simply do not have enough space to
    > store all this information. Store summary data instead.
    >
    > Cheers,
    > Nicholas Sherlock
    >

    that can be revised down a bit since the record size appears (I am guessing
    based on the info provided by the OP) to be date(8bytes=64bits), host
    name(50??), host ip (16bbytes?)2, client ip (16bytes?), clicked (1) displayed(1)
    ?? Again pure speculation but it looks like: 92bytes*162M = ~14.9Gb/day or
    447.2GB/mth which means you are doing a LOT of purging... after the aggregate
    functions complete.

    --
    Michael Austin.
    Database Consultant

  7. Re: Application that generates 162 million records per day

    Nicholas Sherlock wrote:

    > Erol Fornoles wrote:
    >
    >> Sorry if my question was a bit vague.
    >>
    >> What I'm looking for is the feasibility of the above and your opinions
    >> given the following conditions:
    >>
    >> 1. We do not have a data center of our own and our server is co-located
    >> instead.
    >> 2. Due to budgetary concerns, we only have 1 server at the moment and
    >> the disc capacity is 160 GB.
    >> 3. The version of MySQL we're using is 4.0.27 MAX, since the other
    >> developer's web-development tools don't work with MySQL 5.0.
    >> 4. Most of the time we are looking at the aggregate of the data. The
    >> aggregate function is defined as the count of each record based on
    >> whether a particular field's value is 1 (ad displayed) or 2 (ad
    >> clicked).

    >
    >
    > With Michael Austin's estimate of 1.22Tb/mth, your hard drive will be
    > filled more than 8 times a month. You simply do not have enough space to
    > store all this information. Store summary data instead.
    >
    > Cheers,
    > Nicholas Sherlock
    >

    that can be revised down a bit since the record size appears (I am guessing
    based on the info provided by the OP) to be date(8bytes=64bits), host
    name(50??), host ip (16bbytes?)2, client ip (16bytes?), clicked (1) displayed(1)
    ?? Again pure speculation but it looks like: 92bytes*162M = ~14.9Gb/day or
    447.2GB/mth which means you are doing a LOT of purging... after the aggregate
    functions complete.

    --
    Michael Austin.
    Database Consultant

  8. Re: Application that generates 162 million records per day

    Michael Austin wrote:
    > date(8bytes=64bits), host name(50??), host ip (16bbytes?)2, client ip
    > (16bytes?), clicked (1) displayed(1) ?? Again pure speculation but it
    > looks like: 92bytes*162M = ~14.9Gb/day or 447.2GB/mth which means you
    > are doing a LOT of purging... after the aggregate functions complete.



    I wouldn't recommend storing both client hostname and client IP if space
    and throughput are issues. Resolving the client hostname can be slow,
    and storing it is likely to be redundant if the IP is stored.
    Especially if aggregate data is the normal view mode, because individual
    hostnames probably won't be displayed anyway.

    If you do need to store the hostname as a string, read about the
    tradeoffs between char and varchar. Char is a fixed-length datatype,
    and if all your columns in a given table are of fixed length, access to
    that table can be quicker.

    Store IP addresses as unsigned integers, not strings. One should
    convert the dotted quad representation to its integer equivalent with
    the INET_ATON() function.

    You could store the date as a UNIX-style timestamp in an unsigned
    integer column, instead of a MySQL DATETIME datatype. That will reduce
    it to 4 bytes instead of 8.

    You can probably combine clicked and displayed into one byte-sized
    column by making them bitfields of a single TINYINT.

    There -- that reduced 92 bytes/row to 13 bytes/row! Now it requires
    only 2008MB/day. You should be able to purge once a month and you'll be
    fine.

    Regards,
    Bill K.

  9. Re: Application that generates 162 million records per day

    Michael Austin wrote:
    > date(8bytes=64bits), host name(50??), host ip (16bbytes?)2, client ip
    > (16bytes?), clicked (1) displayed(1) ?? Again pure speculation but it
    > looks like: 92bytes*162M = ~14.9Gb/day or 447.2GB/mth which means you
    > are doing a LOT of purging... after the aggregate functions complete.



    I wouldn't recommend storing both client hostname and client IP if space
    and throughput are issues. Resolving the client hostname can be slow,
    and storing it is likely to be redundant if the IP is stored.
    Especially if aggregate data is the normal view mode, because individual
    hostnames probably won't be displayed anyway.

    If you do need to store the hostname as a string, read about the
    tradeoffs between char and varchar. Char is a fixed-length datatype,
    and if all your columns in a given table are of fixed length, access to
    that table can be quicker.

    Store IP addresses as unsigned integers, not strings. One should
    convert the dotted quad representation to its integer equivalent with
    the INET_ATON() function.

    You could store the date as a UNIX-style timestamp in an unsigned
    integer column, instead of a MySQL DATETIME datatype. That will reduce
    it to 4 bytes instead of 8.

    You can probably combine clicked and displayed into one byte-sized
    column by making them bitfields of a single TINYINT.

    There -- that reduced 92 bytes/row to 13 bytes/row! Now it requires
    only 2008MB/day. You should be able to purge once a month and you'll be
    fine.

    Regards,
    Bill K.

  10. Re: Application that generates 162 million records per day

    Thanks for the replies!

    We've normalized the hostnames to another table, so only an id
    (integer) gets saved along with the ips (integers encoded via
    inet_aton).

    Aside from the allowable date ranges, would there be a trade-off if I
    used Unix-style timestamps? Is there a conversion function available
    that converts the Unix-style timestamp to a normal timestamp or
    datetime?

    On Dec 11, 2:56 am, Bill Karwin wrote:
    > Michael Austin wrote:
    > > date(8bytes=64bits), host name(50??), host ip (16bbytes?)2, client ip
    > > (16bytes?), clicked (1) displayed(1) ?? Again pure speculation but it
    > > looks like: 92bytes*162M = ~14.9Gb/day or 447.2GB/mth which means you
    > > are doing a LOT of purging... after the aggregate functions complete.I wouldn't recommend storing both client hostname and client IP if space

    > and throughput are issues. Resolving the client hostname can be slow,
    > and storing it is likely to be redundant if the IP is stored.
    > Especially if aggregate data is the normal view mode, because individual
    > hostnames probably won't be displayed anyway.
    >
    > If you do need to store the hostname as a string, read about the
    > tradeoffs between char and varchar. Char is a fixed-length datatype,
    > and if all your columns in a given table are of fixed length, access to
    > that table can be quicker.
    >
    > Store IP addresses as unsigned integers, not strings. One should
    > convert the dotted quad representation to its integer equivalent with
    > the INET_ATON() function.
    >
    > You could store the date as a UNIX-style timestamp in an unsigned
    > integer column, instead of a MySQL DATETIME datatype. That will reduce
    > it to 4 bytes instead of 8.
    >
    > You can probably combine clicked and displayed into one byte-sized
    > column by making them bitfields of a single TINYINT.
    >
    > There -- that reduced 92 bytes/row to 13 bytes/row! Now it requires
    > only 2008MB/day. You should be able to purge once a month and you'll be
    > fine.
    >
    > Regards,
    > Bill K.



+ Reply to Thread
Page 2 of 3 FirstFirst 1 2 3 LastLast