-
Re: Application that generates 162 million records per day
Sorry if my question was a bit vague.
What I'm looking for is the feasibility of the above and your opinions
given the following conditions:
1. We do not have a data center of our own and our server is co-located
instead.
2. Due to budgetary concerns, we only have 1 server at the moment and
the disc capacity is 160 GB.
3. The version of MySQL we're using is 4.0.27 MAX, since the other
developer's web-development tools don't work with MySQL 5.0.
4. Most of the time we are looking at the aggregate of the data. The
aggregate function is defined as the count of each record based on
whether a particular field's value is 1 (ad displayed) or 2 (ad
clicked).
Thanks!
On Dec 10, 3:45 am, Michael Austin wrote:
> Erol Fornoles wrote:
> > What's your take on an application that generates a minimum of 162
> > million records per day? I need your opinion with regards to this
> > because the app I'm developing behaves exactly like that.Is there a particular issue with generating that much data? What is the problem
> you need to solve? Home use? Business use? No one can give their "take" as
> there is insufficient data to make even the wildest of guess as to what you are
> looking for.
>
> I work with databases that generate 5TB+/mth. To do some quick calculations,
> 162M*30=4.86B records and depending on the record size - I would say you have a
> pretty healthy size database. Let's say your nominal record length is 250
> bytes, then you are storing ~1.22Tb /mth.
>
> Hope you have LOTS of disc space.
>
> --
> Michael Austin.
> Database Consultant
-
Re: Application that generates 162 million records per day
Erol Fornoles wrote:
> Sorry if my question was a bit vague.
>
> What I'm looking for is the feasibility of the above and your opinions
> given the following conditions:
>
> 1. We do not have a data center of our own and our server is co-located
> instead.
> 2. Due to budgetary concerns, we only have 1 server at the moment and
> the disc capacity is 160 GB.
> 3. The version of MySQL we're using is 4.0.27 MAX, since the other
> developer's web-development tools don't work with MySQL 5.0.
> 4. Most of the time we are looking at the aggregate of the data. The
> aggregate function is defined as the count of each record based on
> whether a particular field's value is 1 (ad displayed) or 2 (ad
> clicked).
With Michael Austin's estimate of 1.22Tb/mth, your hard drive will be
filled more than 8 times a month. You simply do not have enough space to
store all this information. Store summary data instead.
Cheers,
Nicholas Sherlock
--
http://www.sherlocksoftware.org
-
Re: Application that generates 162 million records per day
Erol Fornoles wrote:
> Sorry if my question was a bit vague.
>
> What I'm looking for is the feasibility of the above and your opinions
> given the following conditions:
>
> 1. We do not have a data center of our own and our server is co-located
> instead.
> 2. Due to budgetary concerns, we only have 1 server at the moment and
> the disc capacity is 160 GB.
> 3. The version of MySQL we're using is 4.0.27 MAX, since the other
> developer's web-development tools don't work with MySQL 5.0.
> 4. Most of the time we are looking at the aggregate of the data. The
> aggregate function is defined as the count of each record based on
> whether a particular field's value is 1 (ad displayed) or 2 (ad
> clicked).
With Michael Austin's estimate of 1.22Tb/mth, your hard drive will be
filled more than 8 times a month. You simply do not have enough space to
store all this information. Store summary data instead.
Cheers,
Nicholas Sherlock
--
http://www.sherlocksoftware.org
-
Re: Application that generates 162 million records per day
Erol Fornoles ¼g¹D¡G
> What's your take on an application that generates a minimum of 162
> million records per day? I need your opinion with regards to this
> because the app I'm developing behaves exactly like that.
My company experience this as well, we have a lot of web log to be
written into DB (more than 162 million records per day), The only way
is to store the summary of data
-
Re: Application that generates 162 million records per day
Erol Fornoles ¼g¹D¡G
> What's your take on an application that generates a minimum of 162
> million records per day? I need your opinion with regards to this
> because the app I'm developing behaves exactly like that.
My company experience this as well, we have a lot of web log to be
written into DB (more than 162 million records per day), The only way
is to store the summary of data
-
Re: Application that generates 162 million records per day
Nicholas Sherlock wrote:
> Erol Fornoles wrote:
>
>> Sorry if my question was a bit vague.
>>
>> What I'm looking for is the feasibility of the above and your opinions
>> given the following conditions:
>>
>> 1. We do not have a data center of our own and our server is co-located
>> instead.
>> 2. Due to budgetary concerns, we only have 1 server at the moment and
>> the disc capacity is 160 GB.
>> 3. The version of MySQL we're using is 4.0.27 MAX, since the other
>> developer's web-development tools don't work with MySQL 5.0.
>> 4. Most of the time we are looking at the aggregate of the data. The
>> aggregate function is defined as the count of each record based on
>> whether a particular field's value is 1 (ad displayed) or 2 (ad
>> clicked).
>
>
> With Michael Austin's estimate of 1.22Tb/mth, your hard drive will be
> filled more than 8 times a month. You simply do not have enough space to
> store all this information. Store summary data instead.
>
> Cheers,
> Nicholas Sherlock
>
that can be revised down a bit since the record size appears (I am guessing
based on the info provided by the OP) to be date(8bytes=64bits), host
name(50??), host ip (16bbytes?)2, client ip (16bytes?), clicked (1) displayed(1)
?? Again pure speculation but it looks like: 92bytes*162M = ~14.9Gb/day or
447.2GB/mth which means you are doing a LOT of purging... after the aggregate
functions complete.
--
Michael Austin.
Database Consultant
-
Re: Application that generates 162 million records per day
Nicholas Sherlock wrote:
> Erol Fornoles wrote:
>
>> Sorry if my question was a bit vague.
>>
>> What I'm looking for is the feasibility of the above and your opinions
>> given the following conditions:
>>
>> 1. We do not have a data center of our own and our server is co-located
>> instead.
>> 2. Due to budgetary concerns, we only have 1 server at the moment and
>> the disc capacity is 160 GB.
>> 3. The version of MySQL we're using is 4.0.27 MAX, since the other
>> developer's web-development tools don't work with MySQL 5.0.
>> 4. Most of the time we are looking at the aggregate of the data. The
>> aggregate function is defined as the count of each record based on
>> whether a particular field's value is 1 (ad displayed) or 2 (ad
>> clicked).
>
>
> With Michael Austin's estimate of 1.22Tb/mth, your hard drive will be
> filled more than 8 times a month. You simply do not have enough space to
> store all this information. Store summary data instead.
>
> Cheers,
> Nicholas Sherlock
>
that can be revised down a bit since the record size appears (I am guessing
based on the info provided by the OP) to be date(8bytes=64bits), host
name(50??), host ip (16bbytes?)2, client ip (16bytes?), clicked (1) displayed(1)
?? Again pure speculation but it looks like: 92bytes*162M = ~14.9Gb/day or
447.2GB/mth which means you are doing a LOT of purging... after the aggregate
functions complete.
--
Michael Austin.
Database Consultant
-
Re: Application that generates 162 million records per day
Michael Austin wrote:
> date(8bytes=64bits), host name(50??), host ip (16bbytes?)2, client ip
> (16bytes?), clicked (1) displayed(1) ?? Again pure speculation but it
> looks like: 92bytes*162M = ~14.9Gb/day or 447.2GB/mth which means you
> are doing a LOT of purging... after the aggregate functions complete.
I wouldn't recommend storing both client hostname and client IP if space
and throughput are issues. Resolving the client hostname can be slow,
and storing it is likely to be redundant if the IP is stored.
Especially if aggregate data is the normal view mode, because individual
hostnames probably won't be displayed anyway.
If you do need to store the hostname as a string, read about the
tradeoffs between char and varchar. Char is a fixed-length datatype,
and if all your columns in a given table are of fixed length, access to
that table can be quicker.
Store IP addresses as unsigned integers, not strings. One should
convert the dotted quad representation to its integer equivalent with
the INET_ATON() function.
You could store the date as a UNIX-style timestamp in an unsigned
integer column, instead of a MySQL DATETIME datatype. That will reduce
it to 4 bytes instead of 8.
You can probably combine clicked and displayed into one byte-sized
column by making them bitfields of a single TINYINT.
There -- that reduced 92 bytes/row to 13 bytes/row! Now it requires
only 2008MB/day. You should be able to purge once a month and you'll be
fine.
Regards,
Bill K.
-
Re: Application that generates 162 million records per day
Michael Austin wrote:
> date(8bytes=64bits), host name(50??), host ip (16bbytes?)2, client ip
> (16bytes?), clicked (1) displayed(1) ?? Again pure speculation but it
> looks like: 92bytes*162M = ~14.9Gb/day or 447.2GB/mth which means you
> are doing a LOT of purging... after the aggregate functions complete.
I wouldn't recommend storing both client hostname and client IP if space
and throughput are issues. Resolving the client hostname can be slow,
and storing it is likely to be redundant if the IP is stored.
Especially if aggregate data is the normal view mode, because individual
hostnames probably won't be displayed anyway.
If you do need to store the hostname as a string, read about the
tradeoffs between char and varchar. Char is a fixed-length datatype,
and if all your columns in a given table are of fixed length, access to
that table can be quicker.
Store IP addresses as unsigned integers, not strings. One should
convert the dotted quad representation to its integer equivalent with
the INET_ATON() function.
You could store the date as a UNIX-style timestamp in an unsigned
integer column, instead of a MySQL DATETIME datatype. That will reduce
it to 4 bytes instead of 8.
You can probably combine clicked and displayed into one byte-sized
column by making them bitfields of a single TINYINT.
There -- that reduced 92 bytes/row to 13 bytes/row! Now it requires
only 2008MB/day. You should be able to purge once a month and you'll be
fine.
Regards,
Bill K.
-
Re: Application that generates 162 million records per day
Thanks for the replies!
We've normalized the hostnames to another table, so only an id
(integer) gets saved along with the ips (integers encoded via
inet_aton).
Aside from the allowable date ranges, would there be a trade-off if I
used Unix-style timestamps? Is there a conversion function available
that converts the Unix-style timestamp to a normal timestamp or
datetime?
On Dec 11, 2:56 am, Bill Karwin wrote:
> Michael Austin wrote:
> > date(8bytes=64bits), host name(50??), host ip (16bbytes?)2, client ip
> > (16bytes?), clicked (1) displayed(1) ?? Again pure speculation but it
> > looks like: 92bytes*162M = ~14.9Gb/day or 447.2GB/mth which means you
> > are doing a LOT of purging... after the aggregate functions complete.I wouldn't recommend storing both client hostname and client IP if space
> and throughput are issues. Resolving the client hostname can be slow,
> and storing it is likely to be redundant if the IP is stored.
> Especially if aggregate data is the normal view mode, because individual
> hostnames probably won't be displayed anyway.
>
> If you do need to store the hostname as a string, read about the
> tradeoffs between char and varchar. Char is a fixed-length datatype,
> and if all your columns in a given table are of fixed length, access to
> that table can be quicker.
>
> Store IP addresses as unsigned integers, not strings. One should
> convert the dotted quad representation to its integer equivalent with
> the INET_ATON() function.
>
> You could store the date as a UNIX-style timestamp in an unsigned
> integer column, instead of a MySQL DATETIME datatype. That will reduce
> it to 4 bytes instead of 8.
>
> You can probably combine clicked and displayed into one byte-sized
> column by making them bitfields of a single TINYINT.
>
> There -- that reduced 92 bytes/row to 13 bytes/row! Now it requires
> only 2008MB/day. You should be able to purge once a month and you'll be
> fine.
>
> Regards,
> Bill K.