
Suggestions for refactoring unusual tables

  1. Suggestions for refactoring unusual tables

    Hello.

    I'm looking at a medium/large web application with 72 tables on a
    company intranet. Two of the tables are unusual; they appear to have
    been created to avoid having too many additional (very small) tables in
    the database. The rough structure of these tables is like this:

    Table "catalog":
    ----------------
    id (int, primary key)
    systext (string)
    dshort (string)
    dlong (string)

    Table "catalog_entry":
    ----------------------
    id (int, primary key)
    catalog_id (int, references catalog)
    systext (string)
    dshort (string)
    dlong (string)
    rank (int)

    "systext" is an alternate unique human-readable identifier for the
    record, so that it can be referenced by name from SQL queries.
    "dshort" and "dlong" are short and long labels, respectively.
    "rank" is used to determine in which order the catalog_entry records
    should be displayed.
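
    For reference, a rough DDL sketch of the two tables (types simplified;
    the real schema may differ in details such as lengths and uniqueness
    scope):

    CREATE TABLE catalog (
        id      INT PRIMARY KEY,
        systext VARCHAR(255) NOT NULL UNIQUE,  -- stable human-readable name
        dshort  VARCHAR(255) NOT NULL,         -- short label
        dlong   TEXT NOT NULL                  -- long label
    );

    CREATE TABLE catalog_entry (
        id         INT PRIMARY KEY,
        catalog_id INT NOT NULL REFERENCES catalog (id),
        systext    VARCHAR(255) NOT NULL,      -- unique at least within its catalog
        dshort     VARCHAR(255) NOT NULL,
        dlong      TEXT NOT NULL,
        rank       INT NOT NULL                -- display order
    );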

    Most of the other tables in the schema reference catalog_entry, for
    various purposes (such as classification, current status, etc).
    For example:

    Table "contact":
    ----------------
    id (int, primary key)
    name (string)
    ....
    type_id (int, references catalog_entry)
    language_id (int, references catalog_entry)
    billing_type_id (int, references catalog_entry)

    When the users edit a record, they usually see a dropdown box to select
    one of the entries in a catalog. That's where the "dshort", "dlong", and
    "rank" fields are used.

    There are 63 catalogs. Only 6 of them contain 20+ entries (max. 57);
    half of them contain 5 entries or fewer (min. 2). Here is some example
    data from catalog_entry:

    SELECT cat.systext, entry.systext
    FROM catalog cat
    JOIN catalog_entry entry ON entry.catalog_id = cat.id
    WHERE cat.systext IN ('coverage', 'document_type');

    cat.systext | entry.systext
    ---------------+------------------------
    coverage | international
    coverage | national
    coverage | regional
    document_type | notice_of_registration
    document_type | reminder_1
    document_type | reminder_2
    ( ... 25 more results ... )
    document_type | publication_invoice
    document_type | registration_expired

    My first thought was this looks unhealthy, and that every catalog should
    be a separate table. That way the referential integrity checks could be
    enforced by the database - as it is, it would not be a foreign key
    violation if the "language_id" value from the "contact" table pointed to
    a catalog_entry that is actually part of the "document_type" catalog.
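
    To illustrate with a made-up example: nothing stops the following
    UPDATE, because the foreign key only checks that the id exists
    somewhere in catalog_entry, not that it belongs to the right catalog
    (the contact id 42 is hypothetical):

    UPDATE contact
    SET language_id = (SELECT ce.id
                       FROM catalog_entry ce
                       JOIN catalog c ON c.id = ce.catalog_id
                       WHERE c.systext = 'document_type'
                         AND ce.systext = 'reminder_1')
    WHERE id = 42;
    -- the plain FK to catalog_entry (id) is still satisfied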

    However, I hesitate to break this up, not just because that would almost
    double the number of tables, or because it would mean a lot of
    restructuring, but also because most of these new tables would be *very*
    small (5 rows or less), and they would all have the same columns.

    To make matters worse, catalogs can (optionally) be organized
    hierarchically - some catalogs have one or more parent catalogs (there
    is a separate table for this n:m relationship).

    I'm not sure how to approach this. Is that part of the database design
    "good enough" to be left alone, or should I break it up? If so, is it
    really usual or desirable to have lots of mini-tables that all look
    alike, and only have 2-5 rows each?
    On the other hand, I suppose I could solve the current weakness
    concerning referential integrity with column constraints, so that
    shouldn't be a problem.

    Any suggestions or hints would be very appreciated.


    TIA,
    stefan

  2. Re: Suggestions for refactoring unusual tables

    >> Any suggestions or hints would be very appreciated. <<

    This nightmare is called an EAV design; you can Google for the
    painful details. You will have to throw it all out and start over
    with a relational design. But if this has been used for about a year,
    you can expect to have orphans and bad data everywhere.




  3. Re: Suggestions for refactoring unusual tables

    On 2008-09-16 15:23, --CELKO-- wrote:
    >>> Any suggestions or hints would be very appreciated. <<

    >
    > This nightmare is called an EAV design; you can Google for the
    > painful details. You will have to throw it all out and start over
    > with a relational design. But if this has been used for about a year,
    > you can expect to have orphans and bad data everywhere.


    Thanks for your reply. I have never implemented an EAV design myself,
    but I'm somewhat familiar with what it means, and what the main problems
    are. I wasn't aware that the design that I described would also qualify
    as EAV. It looked more like a concatenation of a number of small tables
    with identical layout into one larger table (like categories). Orphans
    would not be possible in this case, but bad references could happen.
    After 4 years in use, I've only ever seen that happen during development
    - I guess we were lucky.

    I agree that it doesn't look good. I did not design that database
    myself, I only write the software that uses it. Would it really be
    better to split the catalog_entries table into 63 very small tables, like:

    print_status:
    id  systext    dshort     dlong
    --  ---------  ---------  -------------------------------------
    1   "pending"  "Queued"   "Queued to be sent to the printer"
    2   "error"    "Error"    "Print job had errors"
    3   "done"     "Printed"  "Document was printed without errors"

    renewal_type:
    id  systext       dshort        dlong
    --  ------------  ------------  --------------------
    1   "expiration"  "Expiration"  "Date of expiration"
    2   "grant"       "Issuance"    "Date of issuance"

    (etc.)

    All of these 63 tables would look exactly alike; some of them really
    only have two rows, others have 40-50 rows. Simple enumeration wouldn't
    work, because the text fields dshort and dlong are necessary and can be
    edited by (some) users.

    What are the best practices of dealing with very small tables like
    these? I assume that they must exist in some larger data models (I heard
    that the Siebel CRM uses 1500+ tables).


    - Conrad

  4. Re: Suggestions for refactoring unusual tables

    Conrad Lender wrote:
    > On 2008-09-16 15:23, --CELKO-- wrote:
    >>>> Any suggestions or hints would be very appreciated. <<

    >> This nightmare is called an EAV design; you can Google for the
    >> painful details. You will have to throw it all out and start over
    >> with a relational design. But if this has been used for about a year,
    >> you can expect to have orphans and bad data everywhere.


    No need to throw a perfectly good design out because of programmer
    incompetence. Better throw the programmer away :-)
    >
    > Thanks for your reply. I have never implemented an EAV design myself,
    > but I'm somewhat familiar with what it means, and what the main problems
    > are. I wasn't aware that the design that I described would also qualify
    > as EAV. It looked more like a concatenation of a number of small tables
    > with identical layout into one larger table (like categories). Orphans
    > would not be possible in this case, but bad references could happen.
    > After 4 years in use, I've only ever seen that happen during development
    > - I guess we were lucky.


    If the programming is done right, and if data manipulation is done
    through a few carefully tested procedures, like

    catalog_id = catalog_add(catalog,dshort,dlong)
    catalogEntry_id =catalogEntry_add(catalog_id,systext,dshort,dlong,rank)
    getCatalogByName(catalogname)

    etc, orphans shouldn't happen.
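
    For instance, catalogEntry_add could be little more than a thin
    wrapper (a sketch as a PostgreSQL SQL function; adjust for your
    database, and the parameter names are mine):

    CREATE FUNCTION catalogEntry_add (p_catalog_id INT,
                                      p_systext VARCHAR,
                                      p_dshort VARCHAR,
                                      p_dlong TEXT,
                                      p_rank INT)
    RETURNS INT
    AS '
        INSERT INTO catalog_entry (catalog_id, systext, dshort, dlong, rank)
        VALUES ($1, $2, $3, $4, $5)
        RETURNING id;
    ' LANGUAGE SQL;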

    Also, it appears to me that catalog and catalog_entry are only populated
    during development, not later. Later, only dshort and dlong are changed.
    >
    > I agree that it doesn't look good. I did not design that database
    > myself, I only write the software that uses it. Would it really be
    > better to split the catalog_entries table into 63 very small tables, like:


    Why doesn't it look good? Now that you understand what's going on, can
    you give a reason for not using EAV?

    > All of these 63 tables would look exactly alike; some of them really
    > only have two rows, others have 40-50 rows. Simple enumeration wouldn't
    > work, because the text fields dshort and dlong are necessary and can be
    > edited by (some) users.


    Horrible idea. Would you also create 63 routines for editing each of the
    63 tables?

    Leif

  5. Re: Suggestions for refactoring unusual tables

    >> No need to throw a perfectly good design out because of programmer incompetence. Better throw the programmer away :-) <<

    Throw away the idiot who did the EAV. This is not a good design -- in
    fact, it is not a design at all. It is a metadata framework which is
    lacking in any kind of data integrity or performance.

    >> If the programming is done right, and if data-manipulation is done through a few carefully tested procedures, like


    catalog_id = catalog_add(catalog,dshort,dlong)
    catalogEntry_id
    =catalogEntry_add(catalog_id,systext,dshort,dlong,rank)
    getCatalogByName(catalogname)

    etc, orphans shouldn't happen. <<

    Wrong, as I know all too well from experience. Where do you prevent
    an improper reference in the data (i.e. a postal code that does not
    exist)? How do you maintain data integrity (we have to cascade a
    delete or update on a postal code).

    >> Also, it appears to me that catalog and catalog_entry are only populated during development, not later. Later, only dshort and dlong are changed. <<


    Where did you get that spec? How do you prevent inserts, updates and
    deletes to catalog and catalog_entry?

    >> Why doesn't it look good? Now that you understand what's going on, can you give a reason for not using EAV? <<


    I found an old "cut & paste". Someone like you posted this:

    CREATE TABLE EAV -- no key declared
    (key_col VARCHAR (10), -- what does null mean?
    attrib_value VARCHAR (50)); -- what does null mean?

    INSERT INTO EAV VALUES ('LOCATION', 'Bedroom');
    INSERT INTO EAV VALUES ('LOCATION', 'Dining Room');
    INSERT INTO EAV VALUES ('LOCATION', 'Bathroom');
    INSERT INTO EAV VALUES ('LOCATION', 'courtyard');
    INSERT INTO EAV VALUES ('EVENT', 'verbal aggression');
    INSERT INTO EAV VALUES ('EVENT', 'peer');
    INSERT INTO EAV VALUES ('EVENT', 'bad behavior');
    INSERT INTO EAV VALUES ('EVENT', 'other');

    CREATE TABLE EAV_DATA --no constraints, defaults, DRI
    (id INTEGER IDENTITY (1, 1) NOT NULL, --vague names
    bts_id INTEGER NULL,
    key_col VARCHAR (10) NULL,
    attrib_value VARCHAR (50) NULL );

    INSERT INTO EAV_DATA VALUES (1, 'LOCATION', 'Bedroom');
    INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'other');
    INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'bad behavior');
    INSERT INTO EAV_DATA VALUES (2, 'LOCATION', 'Bedroom');
    INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'other');
    INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'verbal aggression');
    INSERT INTO EAV_DATA VALUES (3, 'LOCATION', 'courtyard');
    INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'other');
    INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'peer');

    Ideally, the result set of the query would be Location Event count
    (headings if possible)

    Bedroom verbal aggression 1
    Bedroom peer 0
    Bedroom bad behavior 0
    Bedroom other 2
    Dining Room verbal aggression 0
    Dining Room peer 0
    Dining Room bad behavior 0
    Dining Room other 0
    Bathroom verbal aggression 0
    Bathroom peer 0
    Bathroom bad behavior 0
    Bathroom other 0
    courtyard verbal aggression 0
    courtyard peer 1
    courtyard bad behavior 0
    courtyard other 1

    Also, if possible, another query would return this result set. (I
    think I know how to do this one.)

    Location Event count
    Bedroom verbal aggression 1
    Bedroom other 2
    courtyard peer 1
    courtyard other 1

    Here is an answer from Thomas Coleman

    SELECT Locations.locationvalue, Events.eventvalue,
           (SELECT COUNT(*)
              FROM (SELECT LocationData.locationvalue,
                           EventData.eventvalue
                      FROM (SELECT TD1.bts_id, TD1.value AS locationvalue
                              FROM eav_data AS TD1
                             WHERE TD1.key = 'location') AS LocationData
                           INNER JOIN
                           (SELECT TD2.bts_id, TD2.value AS eventvalue
                              FROM eav_data AS TD2
                             WHERE TD2.key = 'event') AS EventData
                           ON LocationData.bts_id = EventData.bts_id
                   ) AS CollatedEventData
             WHERE CollatedEventData.locationvalue = Locations.locationvalue
               AND CollatedEventData.eventvalue = Events.eventvalue)
      FROM (SELECT T1.value AS locationvalue
              FROM EAV AS T1
             WHERE T1.key = 'location') AS Locations,
           (SELECT T2.value AS eventvalue
              FROM EAV AS T2
             WHERE T2.key = 'event') AS Events
     ORDER BY Locations.locationvalue, Events.eventvalue;

    Is the same thing in a proper schema as:

    SELECT L.locationvalue, E.eventvalue, COUNT(*)
    FROM Locations AS L, Events AS E
    WHERE L.bts_id = E.bts_id
    GROUP BY L.locationvalue, E.eventvalue;

    The reason that I had to use so many subqueries is that those entities
    are all plopped into the same table. There should be separate tables
    for Locations and Events. Now write this same thing for an 8-table
    query.
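
    For reference, the separate base tables that the short query assumes
    would look something like this (a sketch; whatever bts_id actually
    references would get a proper foreign key):

    CREATE TABLE Locations
    (bts_id INTEGER NOT NULL,
     locationvalue VARCHAR(50) NOT NULL,
     PRIMARY KEY (bts_id, locationvalue));

    CREATE TABLE Events
    (bts_id INTEGER NOT NULL,
     eventvalue VARCHAR(50) NOT NULL,
     PRIMARY KEY (bts_id, eventvalue));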

    The column names are seriously painful. Don't use reserved words like
    "key" and "value" for column names; that is metadata. It means that
    the developer *has to* surround the column name with double quotes for
    everything. And they are too vague to be data element names anyway!

    There is such a thing as "too" generic. There has to be some structure
    or everything becomes nothing more than a couple of tables called
    "things". The real key (no pun intended) is commonality. Is there a
    pattern to the data that they want to store? It may not be possible to
    create one structure to rule them all and in the darkness bind them.

    "To be is to be something in particular; to be nothing in particular
    is to be nothing." --Aristotle

    All data integrity is destroyed. Any typo becomes a new attribute or
    entity. Entities are found missing attributes, so all the reports are
    wrong.

    Try to write a single CHECK() constraint that works for all the
    attributes of those 30+ entities your users created because you were
    too dumb or too lazy to do your job. It can be done! You need a CASE
    expression with almost 70 WHEN clauses; a simple invoice and order
    system needed that many when I tried it as an exercise.
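
    To make that concrete, the single constraint on the one generic column
    ends up as a giant CASE, something like this fragment (the value lists
    come from the example above; the real thing goes on for dozens of WHEN
    clauses):

    ALTER TABLE EAV_DATA
    ADD CONSTRAINT check_attrib_value
    CHECK (CASE
             WHEN key_col = 'LOCATION'
                  AND attrib_value IN ('Bedroom', 'Dining Room',
                                       'Bathroom', 'courtyard')
             THEN 1
             WHEN key_col = 'EVENT'
                  AND attrib_value IN ('verbal aggression', 'peer',
                                       'bad behavior', 'other')
             THEN 1
             -- ... one branch per entity crammed into the column ...
             ELSE 0
           END = 1);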

    Try to write a single DEFAULT clause for 30+ entities crammed into one
    column. Impossible!

    Try to set up DRI actions among the entities. If you thought the WHEN
    clauses in the single CASE expression were unmaintainable, wait until
    you see the "TRIGGERs from Hell" -- Too bad that they might not fit
    into older SQL Server which had some size limits. Now maintain it.

    For those who are interested, here are a couple of links to articles I
    found on the net:

    Generic Design of Web-Based Clinical Databases
    http://www.jmir.org/2003/4/e27/

    The EAV/CR Model of Data Representation
    http://ycmi.med.yale.edu/nadkarni/eav_CR_contents.htm

    An Introduction to Entity-Attribute-Value Design for Generic
    Clinical Study Data Management Systems
    http://ycmi.med.yale.edu/nadkarni/I...20systems.htm

    Data Extraction and Ad Hoc Query of an Entity-Attribute-Value
    Database
    http://www.pubmedcentral.nih.gov/ar...=pubmed&pubme...

    Exploring Performance Issues for a Clinical Database Organized Using
    an Entity-Attribute-Value Representation
    http://www.pubmedcentral.nih.gov/ar...=pubmed&pubme...

    A really good horror story about this kind of disaster is at:

    http://www.simple-talk.com/opinion/o...ces/bad-carma/

    >> All of these 63 tables would look exactly alike; <<


    Not in a properly designed schema. Each table will model one and only
    one set of entities or one and only one relationship. Identical tables
    would be another common error -- attribute splitting. This is basic
    stuff!!

    >> Would you also create 63 routines for editing each of the 63 tables? <<


    I work with properly designed schemas that have hundreds of tables.
    Thanks to declarative DRI actions, DEFAULT, and CHECK() constraints,
    most of the work is done by the SQL engine itself. But, yes, we do
    have more than one procedure to maintain something like the database
    for General Motors, a major hospital, etc.

  6. Re: Suggestions for refactoring unusual tables


    --CELKO-- wrote:
    >> No need to throw a perfectly good design out because of programmer
    >> incompetence. Better throw the programmer away :-) <<

    >
    > Throw away the idiot who did the EAV. This is not a good design -- in
    > fact, it is not a design at all. It is a metadata framework which is
    > lacking in any kind of data integrity or performance.


    So you took the flamebait :-)
    EAV is useful in this case, where it is used for a lot of pull-down options
    on a data-entry screen.
    It is a lookup-table. It is not the "real" data on each person/order/case
    record.

    >
    >> If the programming is done right, and if data-manipulation is done
    >> through a few carefully tested procedures, like

    >
    >
    >> etc, orphans shouldn't happen. <<

    >
    > Wrong, as I know all too well from experience. Where do you prevent
    > an improper reference in the data (i.e. a postal code that does not
    > exist)? How do you maintain data integrity (we have to cascade a
    > delete or update on a postal code).


    Simple: if the postal code does not exist, you can't select it from the
    data entry screen pulldown.
    And you don't just delete or update the entries in the catalog.
    E.g., if you change 90210 from "Beverly Hills" to "Beverly Mountains" in
    the catalog, there is no cascading to the data, since the data records
    store only the postcode, not the name.
    If you split 90210 into 90211 "Upper Beverly Hills" and 90212 "Lower
    Beverly Hills", you somehow have to manually/programmatically look at
    each street and decide which goes in "Upper" and which in "Lower".

    Then again, you only update the data table, where postal code and street
    are in their own fields in the records.

    If you delete a postcode, would you expect the persons in that postcode to
    be deleted from the table by a cascade?


    >>> Also, it appears to me that catalog and catalog_entry are only
    >>> populated during development, not later. Later, only dshort and
    >>> dlong are changed. <<

    >
    > Where did you get that spec? How do you prevent inserts, updates and
    > deletes to catalog and catalog_entry?
    >

    The entries in the catalog are like this:

    document_type | publication_invoice
    document_type | registration_expired

    There should be no deletes here; if a document is marked "Registration
    expired", that should never change.
    New types might be added, but old ones should never be removed. There
    might, however, be an "Obsolete" marker, so an entry is not used for
    new records.

    >>> Why doesn't it look good? Now that you understand what's going on,
    >>> can you give a reason for not using EAV? <<


    I guess it is a little religious how much is to be done in the database
    and how much in the application.
    In this case of lookup tables for 64 fields, I find the EAV approach
    usable (not sure EAV is the proper term in this case).
    I'd think more than twice before doing EAV on "real data".

    One place where it could be useful is instead of having
    phone1,phone2,phone3,fax1,fax2,fax3,email1,email2,email3 as contact methods,
    I would have a contact table of
    person-id
    contact_method (phone,fax,email,telex,carrierpigeon)
    contact_value (phonenumber,faxnumber,emailaddress,...)
    priority
    comments ("Only emergency on weekends","No female callers, wife jealous",...)

    So it would be possible to have as many contacts as needed.
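
    A rough sketch of that contact table (my table name and types; it
    assumes a person table with an id column):

    CREATE TABLE person_contact (
        person_id      INT NOT NULL REFERENCES person (id),
        contact_method VARCHAR(20) NOT NULL
            CHECK (contact_method IN ('phone', 'fax', 'email',
                                      'telex', 'carrierpigeon')),
        contact_value  VARCHAR(255) NOT NULL,
        priority       INT NOT NULL DEFAULT 1,
        comments       TEXT,
        PRIMARY KEY (person_id, contact_method, contact_value)
    );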

    Again, use EAV when useful, otherwise "flat tables" or whatever you
    would call them.

    Leif



  7. Re: Suggestions for refactoring unusual tables

    On 2008-09-24 06:08, --CELKO-- wrote:
    >> No need to throw a perfectly good design out because of
    >> programmer incompetence. Better throw the programmer away :-)

    >
    > Throw away the idiot who did the EAV.


    Thank you both for your replies, although I'm a little surprised at the
    heated comments. The programmer in this project (not the database
    designer) is me, and I see no reason to throw myself away or consider
    myself incompetent for having to work with a certain kind of data model.
    I saw what I thought was a dubious design choice, and asked for
    opinions, that's all. The rest of the schema is designed very well.
    We're not going to throw anybody away.

    I'm still not convinced that the two tables I described would bring all
    kinds of EAV-related horrors upon us... Maybe it helps if you think
    of the catalog table as "categories" and the catalog_entries table as
    "subcategories", like they are commonly used with tagging. To continue
    the analogy, we would require certain records to be tagged with certain
    subcategories (the records have foreign key fields to the
    subcategories), that's all.

    >> If the programming is done right, and if data-manipulation is
    >> done through a few carefully tested procedures, like

    >
    > catalog_id = catalog_add(catalog,dshort,dlong)
    > catalogEntry_id
    > =catalogEntry_add(catalog_id,systext,dshort,dlong,rank)
    > getCatalogByName(catalogname)
    >
    > etc, orphans shouldn't happen. <<
    >
    > Wrong, as I know all too well from experience. Where do you prevent
    > an improper reference in the data (i.e. a postal code that does not
    > exist)?


    The catalog tables aren't used for things like postal codes; those are
    kept in the address records where they belong. They are used mostly for
    classification of other records (job status, contact type, expiration
    status, etc). We could (but don't at the moment) add constraints to
    prevent improper references, like this (using PostgreSQL):

    CREATE FUNCTION belongs_to_catalog (VARCHAR(255), INT)
    RETURNS BOOLEAN
    AS '
        SELECT EXISTS (
            SELECT 1
            FROM catalog c,
                 catalog_entry ce
            WHERE ce.catalog_id = c.id
              AND c.systext = $1
              AND ce.id = $2
        )
    ' LANGUAGE 'SQL';

    CREATE TABLE made_up_example (
        id SERIAL NOT NULL PRIMARY KEY,
        foo TEXT NOT NULL DEFAULT '-',
        example_type_id INT NOT NULL,
        CONSTRAINT fk_example_type_id
            FOREIGN KEY (example_type_id)
            REFERENCES catalog_entry (id),
        CONSTRAINT check_catalog
            CHECK (belongs_to_catalog('example_type', example_type_id))
    );
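
    With that in place, a reference into the wrong catalog is rejected at
    write time; e.g. this (deliberately wrong) insert fails the
    check_catalog constraint, because 'reminder_1' belongs to the
    'document_type' catalog, not to 'example_type':

    INSERT INTO made_up_example (foo, example_type_id)
    SELECT 'test', ce.id
    FROM catalog_entry ce
    JOIN catalog c ON c.id = ce.catalog_id
    WHERE c.systext = 'document_type'
      AND ce.systext = 'reminder_1';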

    > How do you maintain data integrity (we have to cascade a
    > delete or update on a postal code).


    We don't remove catalog entries -> no cascades.
    Edits on catalog entries are OK, but still no need for cascades.

    >> Also, it appears to me that catalog and catalog_entry are only
    >> populated during development, not later. Later, only dshort
    >> and dlong are changed.

    >
    > Where did you get that spec? How do you prevent inserts, updates and
    > deletes to catalog and catalog_entry?


    Yes, I should have mentioned that the catalog related records are
    usually defined during development. The descriptions can change, and
    although we do have an interface to add and delete them later, that's
    only used rarely and only by persons who know exactly what they are
    doing. If an entry is already used, the delete will fail, and no harm
    will be done.

    >> Why doesn't it look good? Now that you understand what's going on,
    >> can you give a reason for not using EAV?


    Call it a hunch. It didn't "feel right" to put attributes like
    {active, cancelled, locked, accounted}
    from the "invoice_status" catalog and
    {internal, external, none}
    from the "translation_type" catalog into the same table, because they
    represent different sets.

    > I found an old "cut & paste". Someone like you posted this:
    >
    > CREATE TABLE EAV -- no key declared
    > (key_col VARCHAR (10), -- what does null mean?
    > attrib_value VARCHAR (50)); -- what does null mean?

    [rest snipped]

    Not sure which of us you're referring to as "someone like you" here, but
    this setup is not very related to what we're using. The monster query
    that followed the table definitions is certainly ugly, but why should we
    do aggregation like that on a categories table? We don't have all the
    other entities in a single table, so our queries would look more like this:

    SELECT f.title,
           type_cat.dlong   AS type,
           status_cat.dlong AS status
    FROM file f
    JOIN catalog_entry type_cat
         ON type_cat.id = f.type_id
    JOIN catalog_entry status_cat
         ON status_cat.id = f.status_id
    WHERE status_cat.systext = 'review';

    This would select all files that have the status "review" set, and also
    show the file type, both fields coming from the catalog_entries table.

    > All data integrity is destroyed. Any typo becomes a new attribute or
    > entity. Entities are found missing attributes, so all the reports are
    > wrong.


    I'm not sure how you reach that conclusion. In our setup, typos would
    hardly have any practical consequences[*]: entity tables contain foreign
    key references to the respective catalog_entries which in turn have a
    foreign key to their catalog. For example, if the "file" table from the
    example above had a status_id field that's NOT NULL, how could there be
    missing attributes?
    [*] except for the systext field, but that is a different problem and
    has nothing to do with the table layout per se. This field also can't be
    edited except by writing directly to the DB.

    > Try to write a single CHECK() constraint that works for all the
    > attributes of those 30+ entities your users created because you were
    > too dumb or too lazy to do your job. It can be done! You need a CASE
    > expression with almost 70 WHEN clauses; a simple invoice and order
    > system needed that many when I tried it as an exercise.


    See the (improvised and only quickly tested) belongs_to_catalog()
    function from above. This check is really all that's needed, I think.

    > Try to write a single DEFAULT clause for 30+ entities crammed into one
    > column. Impossible!


    Default values are not desirable here, and hence not allowed.

    > A really good horror story about this kind of disaster is at:
    >
    > http://www.simple-talk.com/opinion/o...ces/bad-carma/


    Quoting that article:
    |
    | Here are some facts about the Vision system:
    | - The data model comprised a single table named DATA.
    | - The DATA table had 240+ columns.
    | (...)
    | - When the Vision system was finally decommissioned, a year after
    | it went into production [in 1994!], the DATA table consumed 50GB
    | of space.
    | - 40+ associated indexes consumed another 250GB of space

    This is just mindboggling. To think that somebody actually built a thing
    like that... But in all honesty, you can't really compare a monster like
    that to the two tables I'm dealing with. Those are sitting in a
    perfectly normal relational database, and have 3 columns each, and 63
    and 591 rows, respectively.

    >>> All of these 63 tables would look exactly alike;

    >
    > Not in a properly designed schema. Each table will model one and only
    > one set of entities or one and only one relationship. Identical tables
    > would be another common error -- attribute splitting. This is basic
    > stuff!!


    Then please enlighten me. What would be a better design? To continue
    with the "file" query example from above, we would need two new tables here:

    file_status:
    id  systext   dshort     dlong
    --  --------  ---------  ----------------------------
    1   pending   pending    Pending
    2   active    active     Active
    3   review    in review  Currently in review
    4   conflict  contested  Contents have been contested
    5   closed    aev        Moved to archive

    file_type:
    id  systext     dshort  dlong
    --  ----------  ------  ----------
    1   ae_request  AE      Request/AE
    2   adr_record  ADR     Record/ADR
    3   noi_recor   NOI     Record/NOI

    We could move the identical columns to another new table, which would
    then look pretty much like the catalog_entries table that we had before.

    I have to ask again:
    is it really desirable, or "good design", to have 63 very small tables
    that either
    a) all have the same columns, or
    b) only have an ID column, after we have moved the text columns to
    another table?

    The more I think about it, the less I'm actually worried by what I'm
    working with. But I would sure like to know how a pro like you would go
    about this.


    - Conrad

  8. Re: Suggestions for refactoring unusual tables

    Conrad Lender wrote:

    >On 2008-09-24 06:08, --CELKO-- wrote:
    >>> No need to throw a perfectly good design out because of
    >>> programmer incompetence. Better throw the programmer away :-)

    >>
    >> Throw away the idiot who did the EAV.

    >
    >Thank you both for your replies, although I'm a little surprised at the
    >heated comments. The programmer in this project (not the database


    It is a mistake that people keep making. We see postings about
    it here over and over and over.

    Did you know that Russian Roulette has an over 80% success rate?

    Would you like to play?

    I advise against it and EAV.

    >designer) is me, and I see no reason to throw myself away or consider
    >myself incompetent for having to work with a certain kind of data model.
    >I saw what I thought was a dubious design choice, and asked for
    >opinions, that's all. The rest of the schema is designed very well.


    And you got them, didn't you?

    When you know, EAV is not dubious (doubtful). It is just plain
    bad.

    >We're not going to throw anybody away.


    [snip]

    Sincerely,

    Gene Wirchenko

    Computerese Irregular Verb Conjugation:
    I have preferences.
    You have biases.
    He/She has prejudices.

  9. Re: Suggestions for refactoring unusual tables

    On 2008-09-25 01:12, Gene Wirchenko wrote:
    > Conrad Lender wrote:
    >>On 2008-09-24 06:08, --CELKO-- wrote:
    >>> Throw away the idiot who did the EAV.

    >>Thank you both for your replies, although I'm a little surprised at the
    >>heated comments. The programmer in this project (not the database

    >
    > It is a mistake that people keep making. We see postings about
    > it here over and over and over.
    >
    > Did you know that Russian Roulette has an over 80% success rate?
    >
    > Would you like to play?
    >
    > I advise against it and EAV.

    [...]
    > When you know, EAV is not dubious (doubtful). It is just plain
    > bad.


    I can see that a number of people have had bad experiences with EAV
    designs, and are vehemently opposed to the whole idea; I understand the
    reasons for this. However, I'm still not convinced that our two
    "catalog" tables can really be described as EAV.

    There was an implicit conclusion that the tables should be refactored,
    but no suggestions about how to go about it. Allow me to rephrase my question:

    1) When you have 63 groups of relatively static categorization options
    like I have described (all with the same fields, but belonging to
    semantically different categories), would it *really* be advisable to
    break that larger table up into 63 separate very small tables with the
    same column layout?

    2) With the added constraint checks from my previous post, is there any
    way that the data/references could possibly become inconsistent?

    3) How would you design a database where many of the entities can have
    distinct "types" in one or more fields (eg. a "status" field could have
    the values "old", "active", "in_review"), and these types need to have
    editable textual descriptions? Do you create small
    {entityname}_{typename} tables for each of those fields, each containing
    only a few rows?


    - Conrad

  10. Re: Suggestions for refactoring unusual tables

    On Oct 7, 12:20 am, Conrad Lender wrote:
    > On 2008-09-25 01:12, Gene Wirchenko wrote:
    >
    >
    >
    > > Conrad Lender wrote:
    > >>On 2008-09-24 06:08, --CELKO-- wrote:
    > >>> Throw away the idiot who did the EAV.
    > >>Thank you both for your replies, although I'm a little surprised at the
    > >>heated comments. The programmer in this project (not the database

    >
    > > It is a mistake that people keep making. We see postings about
    > > it here over and over and over.

    >
    > > Did you know that Russian Roulette has an over 80% success rate?

    >
    > > Would you like to play?

    >
    > > I advise against it and EAV.

    > [...]
    > > When you know, EAV is not dubious (doubtful). It is just plain
    > > bad.

    >
    > I can see that a number of people have had bad experiences with EAV
    > designs, and are vehemently opposed to the whole idea; I understand the
    > reasons for this. However, I'm still not convinced that our two
    > "catalog" tables can really be described as EAV.
    >
    > There was an implicit conclusion that the tables should be refactored,
    > but no suggestions about how to go about it. Allow me to rephrase my question:
    >
    > 1) When you have 63 groups of relatively static categorization options
    > like I have described (all with the same fields, but belonging to
    > semantically different categories), would it *really* be advisable to
    > break that larger table up into 63 separate very small tables with the
    > same column layout?


    Yes. What's wrong with several small tables? You may actually save a
    little space (the "type" column likely becomes the new table name so
    it is not repeated).

    >
    > 2) With the added constraint checks from my previous post, is there any
    > way that the data/references could possibly become inconsistent?


    I do not know, but I think it must be possible to get it inconsistent.
    Just set a new programmer up to add some new values.
    >
    > 3) How would you design a database where many of the entities can have
    > distinct "types" in one or more fields (eg. a "status" field could have
    > the values "old", "active", "in_review"), and these types need to have
    > editable textual descriptions? Do you create small
    > {entityname}_{typename} tables for each of those fields, each containing
    > only a few rows?


    I have used a Descriptions table, but only in applications which had
    to be multilingual. If you never configure a translation of your
    application, why split the description out?
    Do you really want to let users (even power users who should "know
    what they are doing") change value descriptions?
    >
    > - Conrad


    I am currently working on a system that has a design similar to yours.
    When programming I have to either:
    hard-code the ID values (very bad style and brittle code, i.e., easy to
    break), or
    hard-code the descriptions and query for the IDs (less brittle, more
    human friendly) - see the sketch below.
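
    Concretely, the two options look roughly like this (hypothetical IDs
    and names, using your systext column as the stable name):

    -- brittle: depends on a magic number that can differ per installation
    SELECT * FROM file WHERE status_id = 3;

    -- less brittle: look the ID up by its stable name
    SELECT f.*
    FROM file f
    JOIN catalog_entry ce ON ce.id = f.status_id
    WHERE ce.systext = 'review';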

    Because of the nested relationships among the codes in my system, I
    could not write simple SQL statements to insert a new service. It
    required writing a procedure to handle the relationships.

    It would be a lot easier to understand and program for a system which
    has status codes in a Status table, service types in a Service table,
    and so on, even if those tables have only a few rows. It would actually
    also make it a lot easier to expand the options.

    So I would argue less from the relational theory view and more from a
    practical, how-the-hell-do-I-maintain-this-beast view.

    Take the chance while you have it and split those tables out. You will
    be much happier later.
