-
Suggestions for refactoring unusual tables
Hello.
I'm looking at a medium/large web application with 72 tables on a
company intranet. Two of the tables are unusual; they appear to have
been created to avoid having too many additional (very small) tables in
the database. The rough structure of these tables is like this:
Table "catalog":
----------------
id (int, primary key)
systext (string)
dshort (string)
dlong (string)
Table "catalog_entry":
----------------------
id (int, primary key)
catalog_id (int, references catalog)
systext (string)
dshort (string)
dlong (string)
rank (int)
"systext" is an alternate unique human-readable identifier for the
record, so that it can be referenced by name from SQL queries.
"dshort" and "dlong" are short and long labels, respectively.
"rank" is used to determine in which order the catalog_entry records
should be displayed.
Most of the other tables in the schema reference catalog_entry, for
various purposes (such as classification, current status, etc).
For example:
Table "contact":
----------------
id (int, primary key)
name (string)
....
type_id (int, references catalog_entry)
language_id (int, references catalog_entry)
billing_type_id (int, references catalog_entry
When the users edit a record, they usually see a dropdown box to select
one of the entries in a catalog. That's where the "dshort", "dlong", and
"rank" fields are used.
There are 63 catalogs. Only 6 of them contain 20+ entries (max. is 57),
half of them contain 5 entries or less (min. is 2). Here is some example
data from catalog_entry:
SELECT cat.systext, entry.systext
FROM catalog cat
JOIN catalog_entry entry ON entry.catalog_id = cat.id
WHERE cat.systext IN ('coverage', 'document_type');
cat.systext | entry.systext
---------------+------------------------
coverage | international
coverage | national
coverage | regional
document_type | notice_of_registration
document_type | reminder_1
document_type | reminder_2
( ... 25 more results ... )
document_type | publication_invoice
document_type | registragion_expired
My first thought was this looks unhealthy, and that every catalog should
be a separate table. That way the referencial integrity checks could be
enforced by the database - as it is, it would not be a foreign key
violation if the "language_id" value from the "contact" table pointed to
a catalog_entry that is actually part of the "document_type" catalog.
However, I hesitate to break this up, not just because that would almost
double the number of tables, or because it would mean a lot of
restructuring, but also because most of these new tables would be *very*
small (5 rows or less), and they would all have the same columns.
To make matters worse, catalogs can (optionally) be organized
hierarchically - some catalogs have one or more parent catalogs (there
is a separate table for this n:m relationship).
I'm not sure how to approach this. Is that part of the database design
"good enough" to be left alone, or should I break it up? If so, is it
really usual or desirable to have lots of mini-tables that all look
alike, and only have 2-5 rows each?
On the other hand, I suppose I could solve the current weakness
concerning referential integrity with column constraints, so that
shouldn't be a problem.
Any suggestions or hints would be very appreciated.
TIA,
stefan
-
Re: Suggestions for refactoring unusual tables
>> Any suggestions or hints would be very appreciated. <<
This nightmare is called an EAV design; you can Google for the
painful details. Youn will have to throw it all out and start over
with a relational design. But if this has been used for about a year,
you can expect to have orphans and bad data everywhere.
-
Re: Suggestions for refactoring unusual tables
On 2008-09-16 15:23, --CELKO-- wrote:
>>> Any suggestions or hints would be very appreciated. <<
>
> This nightmare is called an EAV design; you can Google for the
> painful details. Youn will have to throw it all out and start over
> with a relational design. But if this has been used for about a year,
> you can expect to have orphans and bad data everywhere.
Thanks for your reply. I have never implemented an EAV design myself,
but I'm somewhat familiar with what it means, and what the main problems
are. I wasn't aware that the design that I described would also qualify
as EAV. It looked more like a concatenation of a number of small tables
with identical layout into one larger table (like categories). Orphans
would not be possible in this case, but bad references could happen.
After 4 years in use, I've only ever seen that happen during development
- I guess we were lucky.
I agree that it doesn't look good. I did not design that database
myself, I only write the software that uses it. Would it really be
better to split the catalog_entries table into 63 very small tables, like:
print_status:
id systext dshort dlong
-----------------------------------------------------
1 "pending" "Queued" "Queued to be sent to the printer"
2 "error" "Error" "Print job had errors"
3 "done" "Printed" "Document was printed without errors"
renewal_type:
id systext dshort dlong
-----------------------------------------------------
1 "expiration" "Expiration" "Date of expiration"
2 "grant" "Issuance" "Date of issuance"
(etc.)
All of these 63 tables would look exactly alike; some of them really
only have two rows, others have 40-50 rows. Simple enumeration wouldn't
work, because the text fields dshort and dlong are necessary and can be
edited by (some) users.
What are the best practices of dealing with very small tables like
these? I assume that they must exist in some larger data models (I heard
that the Siebel CRM uses 1500+ tables).
- Conrad
-
Re: Suggestions for refactoring unusual tables
Conrad Lender skrev:
> On 2008-09-16 15:23, --CELKO-- wrote:
>>>> Any suggestions or hints would be very appreciated. <<
>> This nightmare is called an EAV design; you can Google for the
>> painful details. Youn will have to throw it all out and start over
>> with a relational design. But if this has been used for about a year,
>> you can expect to have orphans and bad data everywhere.
No need to throw a perfectly good design out because of programmer
incompetence. Better throw the programmer away :-)
>
> Thanks for your reply. I have never implemented an EAV design myself,
> but I'm somewhat familiar with what it means, and what the main problems
> are. I wasn't aware that the design that I described would also qualify
> as EAV. It looked more like a concatenation of a number of small tables
> with identical layout into one larger table (like categories). Orphans
> would not be possible in this case, but bad references could happen.
> After 4 years in use, I've only ever seen that happen during development
> - I guess we were lucky.
If the programming is done right, and if data-manipulation is done
through a few carefully tested proceures, like
catalog_id = catalog_add(catalog,dshort,dlong)
catalogEntry_id =catalogEntry_add(catalog_id,systext,dshort,dlong,rank)
getCatalogByName(catalogname)
etc, orphans shouldn't happen.
Also, it appears to me that catalog and catalog_entry is only populated
during th development, not later. Later only dshort and dlong is changed.
>
> I agree that it doesn't look good. I did not design that database
> myself, I only write the software that uses it. Would it really be
> better to split the catalog_entries table into 63 very small tables, like:
Why doesn't it look good? Now, you understand what's going on, can you
give a reason for not using EAV?
> All of these 63 tables would look exactly alike; some of them really
> only have two rows, others have 40-50 rows. Simple enumeration wouldn't
> work, because the text fields dshort and dlong are necessary and can be
> edited by (some) users.
Horrible idea. Would you also create 63 routines for editing each of the
63 tables?
Leif
-
Re: Suggestions for refactoring unusual tables
>> No need to throw a perfectly good design out because of programmer incompetence. Better throw the programmer away :-) <<
Throw away the idiot who did the EAV. This is not a good design -- in
fact, it is not a design at all. It is a metadata framework which is
lacking in any kind of data integrity or performance.
>> If the programming is done right, and if data-manipulation is done through a few carefully tested procedures, like
catalog_id = catalog_add(catalog,dshort,dlong)
catalogEntry_id
=catalogEntry_add(catalog_id,systext,dshort,dlong,rank)
getCatalogByName(catalogname)
etc, orphans shouldn't happen. <<
Wrong, as I know all too well from experience. Where do you prevent
an improper reference in the data (i.e. a postal code that does not
exist)? How do you maintain data integrity (we have to cascade a
delete or update on a postal code).
>> Also, it appears to me that catalog and catalog_entry is only populated during the development, not later. Later only dshort and dlong is changed. <<
Where did you get that spec? How do prevent inserts, updates and
deletes to catalog and catalog_entry?
>>Why doesn't it look good? Now, you understand what's going on, can you give a reason for not using EAV? <<
I found an old "cut & paste". Someone like you posted this:
CREATE TABLE EAV -- no key declared
(key_col VARCHAR (10), -- what does null mean?
attrib_value VARCHAR (50)); -- what does null mean?
INSERT INTO EAV VALUES ('LOCATION', 'Bedroom');
INSERT INTO EAV VALUES ('LOCATION', 'Dining Room');
INSERT INTO EAV VALUES ('LOCATION', 'Bathroom');
INSERT INTO EAV VALUES ('LOCATION', 'courtyard');
INSERT INTO EAV VALUES ('EVENT', 'verbal aggression');
INSERT INTO EAV VALUES ('EVENT', 'peer');
INSERT INTO EAV VALUES ('EVENT', 'bad behavior');
INSERT INTO EAV VALUES ('EVENT', 'other');
CREATE TABLE EAV_DATA --no constraints, defaults, DRI
(id INTEGER IDENTITY (1, 1) NOT NULL, --vague names
bts_id INTEGER NULL,
key_col VARCHAR (10) NULL,
attrib_value VARCHAR (50) NULL );
INSERT INTO EAV_DATA VALUES (1, 'LOCATION', 'Bedroom');
INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'bad behavior');
INSERT INTO EAV_DATA VALUES (2, 'LOCATION', 'Bedroom');
INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'verbal aggression');
INSERT INTO EAV_DATA VALUES (3, 'LOCATION', 'courtyard');
INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'peer');
Ideally, the result set of the query would be Location Event count
(headings if possible)
Bedroom verbal aggression 1
Bedroom peer 0
Bedroom bad behavior 0
Bedroom other 2
Dining Room verbal aggression 0
Dining Room peer 0
Dining Room bad behavior 0
Dining Room other 0
Bathroom verbal aggression 0
Bathroom peer 0
Bathroom bad behavior 0
Bathroom other 0
courtyard verbal aggression 0
courtyard peer 1
courtyard bad behavior 0
courtyard other 1
Also, if possible, another query would return this result set. (I
think I know how to do this one.)
Location Event count
Bedroom verbal aggression 1
Bedroom other 2
courtyard peer 1
courtyard other 1
Here is an answer from Thomas Coleman
SELECT Locations.locationvalue, Events.eventvalue,
(SELECT COUNT(*)
FROM (SELECT LocationData.locationvalue,
EventData.eventvalue
FROM (SELECT TD1.bts_id, TD1.value AS locationvalue
FROM eav_data AS TD1
WHERE TD1.key = 'location') AS LocationData
INNER JOIN
(SELECT TD2.bts_id, TD2.value AS eventvalue
FROM eav_data AS TD2
WHERE TD2.key = 'event'
) AS EventData
ON LocationData.bts_id = EventData.bts_id
) AS CollatedEventData
WHERE CollatedEventData.locationvalue = Locations.locationvalue
AND CollatedEventData.eventvalue = Events.eventvalue
FROM (SELECT T1.value AS locationvalue
FROM EAV AS T1
WHERE T1.key = 'location') AS Locations,
(SELECT T2.value AS eventvalue
FROM EAV AS T2
WHERE T2.key = 'event') AS Events
ORDER BY Locations.locationvalue, Events.eventvalue ,
SELECT Locations.locationvalue, Events.eventvalue
(SELECT COUNT(*)
FROM (SELECT LocationData.locationvalue,
EventData.eventvalue
FROM (SELECT TD1.bts_id, TD1.value AS locationvalue
FROM eav_data AS TD1
WHERE TD1.key = 'location') AS LocationData
INNER JOIN
(SELECT TD2.bts_id, TD2.value AS eventvalue
FROM eav_data AS TD2
WHERE TD2.key = 'event') AS EventData
ON LocationData.bts_id = EventData.bts_id)
AS CollatedEventData
WHERE CollatedEventData.locationvalue = Locations.locationvalue
AND CollatedEventData.eventvalue = Events.eventvalue)
FROM (SELECT T1.value AS locationvalue
FROM EAV AS T1
WHERE T1.key = 'location') AS Locations,
(SELECT T2.value AS eventvalue
FROM EAV AS T2
WHERE T2.key = 'event') AS Events;
Is the same thing in a proper schema as:
SELECT L.locationvalue, E.eventvalue, COUNT(*)
FROM Locations AS L, Events AS E
WHERE L.btd_id = E.btd_id
GROUP BY L.locationvalue, E.eventvalue;
The reason that I had to use so many subqueries is that those entities
are all plopped into the same table. There should be separate tables
for Locations and Events. Now write this same thing for a 8 table
query.
The column names are seriously painful. Don't use reserved words like
"key" and "value" for column names; that is metadata. It means that
the developer *has* surround the column name with double quotes for
everything. And they are too vague to be data element names anyway!
There is such a thing as "too" generic. There has to be some structure
or everything becomes nothing more than a couple of tables called
"things". The real key (no pun intended) is commonality. Is there a
pattern to the data that they want to store? It may not be possible to
create one structure to rule them all and in the darkness bind them.
"To be is to be something in particular; to be nothing in particular
is to be nothing." --Aristotle
All data integrity is destroyed. Any typo becomes a new attribute or
entity. Entities are found missing attributes, so all the reports are
wrong.
Try to write a single CHECK() constraint that works for all the
attributes of those 30+ entities your users created because you were
too dumb or too lazy to do your job. It can be done! You need a case
expression almost 70 WHEN clauses for a simple invoice and order
system when I tried it as an exercise.
Try to write a single DEFAULT clause for 30+ entities crammed into one
column. Impossible!
Try to set up DRI actions among the entities. If you thought the WHEN
clauses in the single CASE expression were unmaintainable, wait until
you see the "TRIGGERs from Hell" -- Too bad that they might not fit
into older SQL Server which had some size limits. Now maintain it.
For those who are interested, there are couple of links to articles I
found on the net:
Generic Design of Web-Based Clinical Databases
http://www.jmir.org/2003/4/e27*/
The EAV/CR Model of Data Representation
http://ycmi.med.yale.edu/nadka*rni/eav_CR_contents.htm
An Introduction to Entity-Attribute-Value Design for Generic
Clinical Study Data Management Systems
http://ycmi.med.yale.edu/nadka*rni/I...20*systems.htm
Data Extraction and Ad Hoc Query of an Entity— Attribute— Value
Database
http://www.pubmedcentral.nih.g*ov/ar...=pub*med&pubme...
Exploring Performance Issues for a Clinical Database Organized Using
an Entity-Attribute-Value Representation
http://www.pubmedcentral.nih.g*ov/ar...=pub*med&pubme...
A really good horror story about this kind of disaster is at:
http://www.simple-talk.com/opinion/o...ces/bad-carma/
>> All of these 63 tables would look exactly alike; <<
Not in a properly designed schema. Each table will model one and only
set of entities or one and only one relationship. Identical tables
would be another common error -- attribute splitting. This is basic
stuff!!
>> Would you also create 63 routines for editing each of the 63 tables? <<
I work with properly designed schemas that have hundred tables.
Thanks to declarative DRI actions, DEFAULT, and CHECK() constraints,
most of the work is done by the SQL engine itself. But, yes we do
have more than one procedure to maintain something like the database
for General Motors, a major hospital, etc.
-
Re: Suggestions for refactoring unusual tables
--CELKO-- wrote:
>> No need to throw a perfectly good design out because of programmer
>> incompetence. Better throw the programmer away :-) <<
>
> Throw away the idiot who did the EAV. This is not a good design -- in
> fact, it is not a design at all. It is a metadata framework which is
> lacking in any kind of data integrity or performance.
So you took the flamebait :-)
EAV is useful in this case, where it is used for a lot of pull-down options
on a data-entry screen.
It is a lookup-table. It is not the "real" data on each person/order/case
record.
>
>> If the programming is done right, and if data-manipulation is done
>> through a few carefully tested procedures, like
>
>
>> etc, orphans shouldn't happen. <<
>
> Wrong, as I know all too well from experience. Where do you prevent
> an improper reference in the data (i.e. a postal code that does not
> exist)? How do you maintain data integrity (we have to cascade a
> delete or update on a postal code).
Simple: If the postal code does not exist, you can't select it from the data
entry screen pulldown.
And you don't just delete or update the entries in the catalog.
Eg, if you change 90210 from "Beverly Hills" to "Beverly Mountains" in the
catalog there i no cascading to the data, as only the postcode is changed.
If you split 90210 into 90211 "Upper Beverly Hills" and 90212 "Lower Beverly
Hills", you somehow have to manually/programmatically look at each street
and decide which goes in "Upper" and "Lower"
Then again, you only update the datatable, where postalcode and street is in
its own fields in the records.
If you delete a postcode, would you expect the persons in that postcode to
be deleted from the table by a cascade?
>>> Also, it appears to me that catalog and catalog_entry is only
>>> populated during the development, not later. Later only dshort and
>>> dlong is changed. <<
>
> Where did you get that spec? How do prevent inserts, updates and
> deletes to catalog and catalog_entry?
>
The entries in the catalog is like this
document_type | publication_invoice
document_type | registragion_expired
There should not be deletes to this; if a document is marked "Registration
expired", this should never change.
There might come new types, but never should old removed. There might
however be a marking for "Obsolete", so it is not used for new records.
>>> Why doesn't it look good? Now, you understand what's going on, can
>>> you give a reason for not using EAV? <<
I guess it is a little religious, how much is to be done in the database,
and how much in the application.
In this case of lookup-tables for 64 fields, find the EAV approach usable
(Not sure EAV is the proper term in this case)
I'd think more than twice of doing EAV on "real data"
One place where it could be usefull is instead of having
phone1,phone2,phone3,fax1,fax2,fax3,email1,email2,email3 as contact methods,
I would have a contact-table of
person-id
contact_method (phone,fax,email,telex,carrierpigeon)
contact_value(phonenumber,faxnumber,emailadress....)
priority
comments ("Only emergency on weekends","No female callers, wife jealous"...)
So it would be possible to have as many contacts as needed.
Again, use EAV, when usefull, otherwise "flat tables" or what you would call
it.
Leif
-
Re: Suggestions for refactoring unusual tables
On 2008-09-24 06:08, --CELKO-- wrote:
>> No need to throw a perfectly good design out because of
>> programmer incompetence. Better throw the programmer away :-)
>
> Throw away the idiot who did the EAV.
Thank you both for your replies, although I'm a little surprised at the
heated comments. The programmer in this project (not the database
designer) is me, and see no reason to throw myself away or consider
myself incompetent for having to work with a certain kind of data model.
I saw what I thought was a dubious design choice, and asked for
opinions, that's all. The rest of the schema is designed very well.
We're not going to throw anybody away.
I'm still not convinced that the two tables I described would bring all
kinds of EAV-related horrors upon on us... Maybe it helps if you think
of the catalog table as "categories" and the catalog_entries table as
"subcategories", like they are commonly used with tagging. To continue
the analogy, we would require certain records to be tagged with certain
subcategories (the records have foreign key fields to the
subcategories), that's all.
>> If the programming is done right, and if data-manipulation is
>> done through a few carefully tested procedures, like
>
> catalog_id = catalog_add(catalog,dshort,dlong)
> catalogEntry_id
> =catalogEntry_add(catalog_id,systext,dshort,dlong,rank)
> getCatalogByName(catalogname)
>
> etc, orphans shouldn't happen. <<
>
> Wrong, as I know all too well from experience. Where do you prevent
> an improper reference in the data (i.e. a postal code that does not
> exist)?
The catalog tables aren't used for things like postal codes, these are
kept in the address records where they belong. They are used mostly for
classification of other records (job status, contact type, expiration
status, etc). We could (but don't at the moment) add constraints to
prevent improper references, like this (using PostgreSQL):
CREATE FUNCTION belongs_to_catalog (VARCHAR(255), INT)
RETURNS BOOLEAN
AS '
SELECT EXISTS (
SELECT 1
FROM catalog c,
catalog_entry ce
WHERE ce.catalog = c.id
AND c.systext = $1
AND ce.id = $2
)
' LANGUAGE 'SQL';
CREATE TABLE made_up_example (
id SERIAL NOT NULL PRIMARY KEY,
foo TEXT NOT NULL DEFAULT '-',
example_type_id INT NOT NULL,
CONSTRAINT fk_example_type_id
FOREIGN KEY (example_type_id)
REFERENCES catalog_entry (id),
CONSTRAINT check_catalog
CHECK (belongs_to_catalog('example_type', example_type_id))
);
> How do you maintain data integrity (we have to cascade a
> delete or update on a postal code).
We don't remove catalog entries -> no cascades.
Edits on catalog entries are OK, but still no need for cascades.
>> Also, it appears to me that catalog and catalog_entry is only
>> populated during the development, not later. Later only dshort
>> and dlong is changed.
>
> Where did you get that spec? How do prevent inserts, updates and
> deletes to catalog and catalog_entry?
Yes, I should have mentioned that the catalog related records are
usually defined during development. The descriptions can change, and
although we do have an interface to add and delete them later, that's
only used rarely and only by persons who know exactly what they are
doing. If an entry is already used, the delete will fail, and no harm
will be done.
>> Why doesn't it look good? Now, you understand what's going on,
>> can you give a reason for not using EAV?
Call it a hunch. It didn't "feel right" to put attributes like
{active, cancelled, locked, accounted}
from the "invoice_status" catalog and
{internal, external, none}
from the "translation_type" catalog into the same table, because they
represent different sets.
> I found an old "cut & paste". Someone like you posted this:
>
> CREATE TABLE EAV -- no key declared
> (key_col VARCHAR (10), -- what does null mean?
> attrib_value VARCHAR (50)); -- what does null mean?
[rest snipped]
Not sure which of us you're referring to as "someone like you" here, but
this setup is not very related to what we're using. The monster query
that followed the table definitions is certainly ugly, but why should we
do aggregation like that on a categories table? We don't have all the
other entities in a single table, so our queries would look more like this:
SELECT f.title,
type_cat.dlong AS type,
status_cat.dlong AS status
FROM file f
JOIN catalog_entry type_cat
ON type_cat.id = f.type_id
JOIN catalog_entry status_cat
ON status_cat.id = f.status_id
WHERE status_cat.systext = 'review';
This would select all files that have the status "review" set, and also
show the file type, both fields coming from the catalog_entries table.
> All data integrity is destroyed. Any typo becomes a new attribute or
> entity. Entities are found missing attributes, so all the reports are
> wrong.
I'm not sure how you reach that conclusion. In our setup, typos would
hardly have any practical consequences[*]: entity tables contain foreign
key references to the respective catalog_entries which in turn have a
foreign key to their catalog. For example, if the "file" table from the
example above had a status_id field that's NOT NULL, how could there be
missing attributes?
[*] except for the systext field, but that is a different problem and
has nothing to do with the table layout per se. This field also can't be
edited except by writing directly to the DB.
> Try to write a single CHECK() constraint that works for all the
> attributes of those 30+ entities your users created because you were
> too dumb or too lazy to do your job. It can be done! You need a case
> expression almost 70 WHEN clauses for a simple invoice and order
> system when I tried it as an exercise.
See the (improvised and only quickly tested) belongs_to_catalog()
function from above. This check is really all that's needed, I think.
> Try to write a single DEFAULT clause for 30+ entities crammed into one
> column. Impossible!
Default values are not desirable here, and hence not allowed.
> A really good horror story about this kind of disaster is at:
>
> http://www.simple-talk.com/opinion/o...ces/bad-carma/
Quoting that article:
|
| Here are some facts about the Vision system:
| - The data model comprised a single table named DATA.
| - The DATA table had 240+ columns.
| (...)
| - When the Vision system was finally decommissioned, a year after
| it went into production [in 1994!], the DATA table consumed 50GB
| of space.
| - 40+ associated indexes consumed another 250GB of space
This is just mindboggling. To think that somebody actually built a thing
like that... But in all honesty, you can't really compare a monster like
that to the two tables I'm dealing with. Those are sitting in a
perfectly normal relational database, and have 3 columns each, and 63
and 591 rows, respectively.
>>> All of these 63 tables would look exactly alike;
>
> Not in a properly designed schema. Each table will model one and only
> set of entities or one and only one relationship. Identical tables
> would be another common error -- attribute splitting. This is basic
> stuff!!
Then please enlighten me. What would be a better design? To continue
with the "file" query example from above, we would need two new tables here:
file_status:
id systext dshort dlong
---------------------------------------
1 pending pending Pending
2 active active Active
3 review in review Currently in review
4 conflict contested Contents have been contested
5 closed aev Moved to archive
file_type
id systext dshort dlong
---------------------------------------
1 ae_request AE Request/AE
2 adr_record ADR Record/ADR
3 noi_recor NOI Record/NOI
We could move the identical columns to a another new table, which would
then look pretty much like the catalog_entries table that we had before.
I have to ask again:
is it really desirable, or "good design", to have 63 very small tables,
that either
a) all have the same columns, or
b) only have an ID column, after we have moved the text columns to
another table.
The more I think about it, the less I'm actually worried by what I'm
working with. But I would sure like to know how a pro like you would go
about this.
- Conrad
-
Re: Suggestions for refactoring unusual tables
Conrad Lender wrote:
>On 2008-09-24 06:08, --CELKO-- wrote:
>>> No need to throw a perfectly good design out because of
>>> programmer incompetence. Better throw the programmer away :-)
>>
>> Throw away the idiot who did the EAV.
>
>Thank you both for your replies, although I'm a little surprised at the
>heated comments. The programmer in this project (not the database
It is a mistake that people keep making. We see postings about
it here over and over and over.
Did you know that Russian Roulette has an over 80% success rate?
Would you like to play?
I advise against it and EAV.
>designer) is me, and see no reason to throw myself away or consider
>myself incompetent for having to work with a certain kind of data model.
>I saw what I thought was a dubious design choice, and asked for
>opinions, that's all. The rest of the schema is designed very well.
And you got them, didn't you?
When you know, EAV is not dubious (doubtful). It is just plain
bad.
>We're not going to throw anybody away.
[snip]
Sincerely,
Gene Wirchenko
Computerese Irregular Verb Conjugation:
I have preferences.
You have biases.
He/She has prejudices.
-
Re: Suggestions for refactoring unusual tables
On 2008-09-25 01:12, Gene Wirchenko wrote:
> Conrad Lender wrote:
>>On 2008-09-24 06:08, --CELKO-- wrote:
>>> Throw away the idiot who did the EAV.
>>Thank you both for your replies, although I'm a little surprised at the
>>heated comments. The programmer in this project (not the database
>
> It is a mistake that people keep making. We see postings about
> it here over and over and over.
>
> Did you know that Russian Roulette has an over 80% success rate?
>
> Would you like to play?
>
> I advise against it and EAV.
[...]
> When you know, EAV is not dubious (doubtful). It is just plain
> bad.
I can see that a number of people have had bad experiences with EAV
designs, and are vehemently opposed to the whole idea; I understand the
reasons for this. However, I'm still not convinced that our two
"catalog" tables can really be described as EAV.
There was an implicit conclusion that the tables should be refactored,
but no suggestions about how go about it. Allow me to rephrase my question:
1) When you have 63 groups of relatively static categorization options
like I have described (all with the same fields, but belonging to
semantically different categories), would it *really* be advisable to
break that larger table up into 63 separate very small tables with the
same column layout?
2) With the added constraint checks from my previous post, is there any
way that the data/references could possibly become inconsistent?
3) How would you design a database where many of the entities can have
distinct "types" in one or more fields (eg. a "status" field could have
the values "old", "active", "in_review"), and these types need to have
editable textual descriptions? Do you create small
{entityname}_{typename} tables for each of those fields, each containing
only a few rows?
- Conrad
-
Re: Suggestions for refactoring unusual tables
On Oct 7, 12:20*am, Conrad Lender wrote:
> On 2008-09-25 01:12, Gene Wirchenko wrote:
>
>
>
> > Conrad Lender wrote:
> >>On 2008-09-24 06:08, --CELKO-- wrote:
> >>> Throw away the idiot who did the EAV.
> >>Thank you both for your replies, although I'm a little surprised at the
> >>heated comments. The programmer in this project (not the database
>
> > * * *It is a mistake that people keep making. *We see postings about
> > it here over and over and over.
>
> > * * *Did you know that Russian Roulette has an over 80% success rate?
>
> > * * *Would you like to play?
>
> > * * *I advise against it and EAV.
> [...]
> > * * *When you know, EAV is not dubious (doubtful). *It is just plain
> > bad.
>
> I can see that a number of people have had bad experiences with EAV
> designs, and are vehemently opposed to the whole idea; I understand the
> reasons for this. However, I'm still not convinced that our two
> "catalog" tables can really be described as EAV.
>
> There was an implicit conclusion that the tables should be refactored,
> but no suggestions about how go about it. Allow me to rephrase my question:
>
> 1) When you have 63 groups of relatively static categorization options
> like I have described (all with the same fields, but belonging to
> semantically different categories), would it *really* be advisable to
> break that larger table up into 63 separate very small tables with the
> same column layout?
Yes. What's wrong with several small tables? You may actually save a
little space (the "type" column likely becomes the new table name so
it is not repeated).
>
> 2) With the added constraint checks from my previous post, is there any
> way that the data/references could possibly become inconsistent?
I do not know, but I think it must be possible to get it inconsistent.
Just set a new programmer up to add some new values.
>
> 3) How would you design a database where many of the entities can have
> distinct "types" in one or more fields (eg. a "status" field could have
> the values "old", "active", "in_review"), and these types need to have
> editable textual descriptions? Do you create small
> {entityname}_{typename} tables for each of those fields, each containing
> only a few rows?
I have used a Descriptions table, but only in applications which had
to be multilingual. If you never configure a translation of your
application, why split the description out?
Do you really want to let users (even power users who should "know
what they are doing") change value descriptions?
>
> * - Conrad
I am currently working on a system that has a design similar to yours.
When programming I have to either:
hard code the ID values (very bad style and brittle code, ie, easy to
break)
hard code the descriptions and query for the IDs (less brittle, more
human friendly)
Because of the nested relationships among the codes in my system, I
could not write simple SQL statements to insert a new service. It
required writing a procedure to handle the relationships.
It would be a lot easier to understand and program for a system which
has Status codes in a status table and Service types in a Service
table and so on. Even if those tables have only a few rows. It
actually would make it a lot easier to expand the options.
So I would argue less from the Relational theory view and more from a
practical, how-the-hell-do-I-maintain- this-beast, view.
Take the chance while you have it and split those tables out. You will
be much happier later.