a neophytish normalization question - databases
This is a discussion on a neophytish normalization question - databases ; Hi all, I am designing a database for a social science project I'm doing. It is based on a Linnaen taxonomy (family, subfamily, etc). I know to get it up to 3rd normal form, I will have to have separate ...
![]() |
| | LinkBack | Thread Tools | Display Modes |
|
#1
| |||
| |||
| I am designing a database for a social science project I'm doing. It is based on a Linnaen taxonomy (family, subfamily, etc). I know to get it up to 3rd normal form, I will have to have separate tables for each level in the hierarchy, with UIDs in each: tblFamily UID Name 1 Fam1 2 Fam2 tblSubfamily UID Name DominatedBy 1 SubFam1 FamUID_1 2 SubFam2 FamUID_1 3 SubFam3 FamUID_2 tblBranch UID Name DominatedBy 1 Branch1 SubFamUID_3 .... and so on. As I understand things (having never taken a formal database design course), I can keep it in 3rd normal form by moving down the levels and only keeping the DominatedBy field on an "immediate ancestor"/ nontransitive basis. And there is actually not all that much other attribute data until I get to the bottom level. My problem is that as I move down through the levels, the classifications become more and more controversial, as they are wont to do. And while I am going to assign things on the bottom a certain way, I am also going to release this database to others for them to change as they see fit. So, in a way, I need to build in a 4th form capacity, because things on the bottom may have a many-to-many relationship by the time somebody else gets done with it. So my question is, how do I build in the 4th form? I think I do it by doing this: tblBranch UID Name DominatedBySubFam DominatedByFam 1 Branch1 SubFamUID_3 FamUID_2 I hope I am not mucking up my explanation of the problem. Any advice is sincerely appreciated. Guy |
|
#2
| |||
| |||
|
On Oct 30, 8:36 pm, justaguy > Hi all, > > I am designing a database for a social science project I'm doing. It > is based on a Linnaen taxonomy (family, subfamily, etc). I know to > get it up to 3rd normal form, I will have to have separate tables for > each level in the hierarchy, with UIDs in each: > > tblFamily > UID Name > 1 Fam1 > 2 Fam2 > > tblSubfamily > UID Name DominatedBy > 1 SubFam1 FamUID_1 > 2 SubFam2 FamUID_1 > 3 SubFam3 FamUID_2 > > tblBranch > UID Name DominatedBy > 1 Branch1 SubFamUID_3 > > ... and so on. > > As I understand things (having never taken a formal database design > course), I can keep it in 3rd normal form by moving down the levels > and only keeping the DominatedBy field on an "immediate ancestor"/ > nontransitive basis. And there is actually not all that much other > attribute data until I get to the bottom level. > > My problem is that as I move down through the levels, the > classifications become more and more controversial, as they are wont > to do. And while I am going to assign things on the bottom a certain > way, I am also going to release this database to others for them to > change as they see fit. So, in a way, I need to build in a 4th form > capacity, because things on the bottom may have a many-to-many > relationship by the time somebody else gets done with it. > > So my question is, how do I build in the 4th form? I think I do it by > doing this: > > tblBranch > UID Name DominatedBySubFam DominatedByFam > 1 Branch1 SubFamUID_3 FamUID_2 > > I hope I am not mucking up my explanation of the problem. Any advice > is sincerely appreciated. > > Guy You might want to take a look at nested sets. |
|
#3
| |||
| |||
|
>> I am designing a database for a social science project I'm doing. *Itis based on a Linnaen taxonomy (family, subfamily, etc). *I know to get it up to 3rd normal form, I will have to have separate tables for each level in the hierarchy, with UIDs in each: << NO! NO! and NO with a cherry on top! There is no such thing as a magical UID in RDBMS -- do you also believe in the "Elixir of Life" of Kabhalah numbers? The levels are attributes of the creature you are categorizing, not entities by themselves. Then Tables never have that silly "tbl-" prefix to violate both ISO-11179 rules and basic data m The Linnaen taxonomy levels are attributes of the creature you are classifying, not entities in themselves. >> As I understand things (having never taken a formal database design course), I can keep it in 3rd normal form by moving down the levels and only keeping the DominatedBy field [sic: columns are not fields] on an "immediateancestor"/ non-transitive basis. *And there is actually not all that much other attribute data until I get to the bottom level. << Nope, totally wrong idea of 3NF. Have you worked with Dewey Decimal in a library? Would you build a table for the 100"s, one for the tens and one for the digits? As I recall and my biology is waaaaay out of date, you have less than ten levels that start at Kingdoms (Animal, Vegetable, Mineral?) so that is not too many columns. But you will want axillary tables that are referenced by the columns , so CREATE TABLE NoahArk (kingdom_name CHAR(10) NOT NULL REFERENCES Kingdoms(kingdom_name), ..); You might want to put the whole taxonomy into a Nested sets model, but I don't know enough to advise you. |
|
#4
| |||
| |||
|
On Oct 30, 8:36*pm, justaguy > Hi all, > > I am designing a database for a social science project I'm doing. *It > is based on a Linnaen taxonomy (family, subfamily, etc). *I know to > get it up to 3rd normal form, I will have to have separate tables for > each level in the hierarchy, with UIDs in each: > > tblFamily > UID * * Name > 1 * * * *Fam1 > 2 * * * *Fam2 > > tblSubfamily > UID * * *Name * * * * *DominatedBy > 1 * * * *SubFam1 * * *FamUID_1 > 2 * * * *SubFam2 * * *FamUID_1 > 3 * * * *SubFam3 * * *FamUID_2 > > tblBranch > UID * * * Name * * * * DominatedBy > 1 * * * * Branch1 * * * *SubFamUID_3 > > ... and so on. > > As I understand things (having never taken a formal database design > course), I can keep it in 3rd normal form by moving down the levels > and only keeping the DominatedBy field on an "immediate ancestor"/ > nontransitive basis. *And there is actually not all that much other > attribute data until I get to the bottom level. > > My problem is that as I move down through the levels, the > classifications become more and more controversial, as they are wont > to do. *And while I am going to assign things on the bottom a certain > way, I am also going to release this database to others for them to > change as they see fit. *So, in a way, I need to build in a 4th form > capacity, because things on the bottom may have a many-to-many > relationship by the time somebody else gets done with it. > > So my question is, how do I build in the 4th form? *I think I do it by > doing this: > > tblBranch > UID * * * *Name * * * DominatedBySubFam * * * * DominatedByFam > 1 * * * * Branch1 * * * *SubFamUID_3 * * * * * * * * * *FamUID_2 > > I hope I am not mucking up my explanation of the problem. *Any advice > is sincerely appreciated. > > Guy Hi, I am really curious why you think third normal form requires a UID. I see people making that assumption and I am just puzzled as to where they (you in this case) got that notion. As you say, it was not from formal training on database design. Thanks, Ed |
|
#5
| |||
| |||
|
Hi all, Thank you for your various replies. I'd like to provide some further information that I probably should have furnished in the first place, and hopefully my reasoning will seem a little more thought-out (if not accurate!). When I mentioned the Linnaen taxonomy, I didn't want to get too much into what I thought were unnecessary specifics. But in point of fact, what I'm trying to do is to build a GIS database that maps linguistic diversity. So the taxonomy comes into play in that linguistics utilizes such labels. What I would like to do is create separate feature classes for each level of the taxonomy - e.g., the extent of families, subfamilies, and so on. (The issue that language boundaries are never clean-cut is duly acknowledged - this is just what can be done based on historical data. Also, as it is historical data, the boundaries are considerably more discrete than they would be if I were mapping present-day data.) Here's an example of what I expect users to want to do: they want to isolate and look at the distribution of a set of languages from subfamilyA, and compare it to the distribution of an entire subfamilyB. If families, subfamilies, branches and so on were all in one large table, and the columns moved from top - down as they went left - right, surely it would violate 2NF - right? I hope I'm not wrong about that as well. So I have separate tables: one for languages that lists their entire taxonomy, which can be used to generate the set of languages. And, to keep things simple for the user, they can then just bring in a separate table that has only subfamilies and select the one they want. That's essentially why I don't want only a large table that moves from down - top as columns go from left - right. Given my need for separate feature classes that can be loaded independently in a GIS, I threw together the structure that I mentioned above. (I should say here that while I'm experienced in GIS and working with databases, it's the first time that I've independently tried to build something from the ground up. So I apologize for what must be an odd mix of jargon and cluelessness.) Given all that, looking at your replies, here is my reply: I concede the point that having broken down tables the way I did, the UIDs are not necessary. I didn't mean to imply that it was required nor "magic". But if it makes sense to have separate tables for each level, as in my user scenario, why not assign them UIDs? Also, I will look into nested sets, but I would like to make sure that I'm straight on this part first. And, looking at my user scenario, I see that I could have a language table that has family, subfamily, etc as attributes (i.e., down-top:left-right) and still get away with having separate tables for each level. I *think* this eliminates the concerns shown about how I'm not meeting 3NF. Sorry this is so long - I've tried to be concise. If anybody is still reading this, I look forward to hearing if I am right about 3NF and how, given the constraints I've named, I can get to 4NF. |
![]() |
« Previous Thread
|
Next Thread »
| Thread Tools | |
| Display Modes | |
| |
All times are GMT -4. The time now is 02:40 PM.




Linear Mode