dbaspot
Tags Register FAQ Calendar Search Today's Posts Mark Forums Read

a neophytish normalization question - databases

This is a discussion on a neophytish normalization question - databases ; Hi all, I am designing a database for a social science project I'm doing. It is based on a Linnaen taxonomy (family, subfamily, etc). I know to get it up to 3rd normal form, I will have to have separate ...


Home > Database Forum > Other Databases > databases > a neophytish normalization question

Reply

 

LinkBack Thread Tools Display Modes
  #1  
Old 10-30-2008, 09:36 PM
Database Bot
 
Join Date: Sep 2009
Posts: 1,236,254
Database Administrator is on a distinguished road
Default a neophytish normalization question

Hi all,

I am designing a database for a social science project I'm doing. It
is based on a Linnaen taxonomy (family, subfamily, etc). I know to
get it up to 3rd normal form, I will have to have separate tables for
each level in the hierarchy, with UIDs in each:

tblFamily
UID Name
1 Fam1
2 Fam2

tblSubfamily
UID Name DominatedBy
1 SubFam1 FamUID_1
2 SubFam2 FamUID_1
3 SubFam3 FamUID_2

tblBranch
UID Name DominatedBy
1 Branch1 SubFamUID_3

.... and so on.

As I understand things (having never taken a formal database design
course), I can keep it in 3rd normal form by moving down the levels
and only keeping the DominatedBy field on an "immediate ancestor"/
nontransitive basis. And there is actually not all that much other
attribute data until I get to the bottom level.

My problem is that as I move down through the levels, the
classifications become more and more controversial, as they are wont
to do. And while I am going to assign things on the bottom a certain
way, I am also going to release this database to others for them to
change as they see fit. So, in a way, I need to build in a 4th form
capacity, because things on the bottom may have a many-to-many
relationship by the time somebody else gets done with it.

So my question is, how do I build in the 4th form? I think I do it by
doing this:

tblBranch
UID Name DominatedBySubFam DominatedByFam
1 Branch1 SubFamUID_3 FamUID_2

I hope I am not mucking up my explanation of the problem. Any advice
is sincerely appreciated.

Guy
Reply With Quote
  #2  
Old 11-01-2008, 02:14 PM
Database Bot
 
Join Date: Sep 2009
Posts: 1,236,254
Database Administrator is on a distinguished road
Default Re: a neophytish normalization question

On Oct 30, 8:36 pm, justaguy wrote:
> Hi all,
>
> I am designing a database for a social science project I'm doing. It
> is based on a Linnaen taxonomy (family, subfamily, etc). I know to
> get it up to 3rd normal form, I will have to have separate tables for
> each level in the hierarchy, with UIDs in each:
>
> tblFamily
> UID Name
> 1 Fam1
> 2 Fam2
>
> tblSubfamily
> UID Name DominatedBy
> 1 SubFam1 FamUID_1
> 2 SubFam2 FamUID_1
> 3 SubFam3 FamUID_2
>
> tblBranch
> UID Name DominatedBy
> 1 Branch1 SubFamUID_3
>
> ... and so on.
>
> As I understand things (having never taken a formal database design
> course), I can keep it in 3rd normal form by moving down the levels
> and only keeping the DominatedBy field on an "immediate ancestor"/
> nontransitive basis. And there is actually not all that much other
> attribute data until I get to the bottom level.
>
> My problem is that as I move down through the levels, the
> classifications become more and more controversial, as they are wont
> to do. And while I am going to assign things on the bottom a certain
> way, I am also going to release this database to others for them to
> change as they see fit. So, in a way, I need to build in a 4th form
> capacity, because things on the bottom may have a many-to-many
> relationship by the time somebody else gets done with it.
>
> So my question is, how do I build in the 4th form? I think I do it by
> doing this:
>
> tblBranch
> UID Name DominatedBySubFam DominatedByFam
> 1 Branch1 SubFamUID_3 FamUID_2
>
> I hope I am not mucking up my explanation of the problem. Any advice
> is sincerely appreciated.
>
> Guy


You might want to take a look at nested sets.
Reply With Quote
  #3  
Old 11-01-2008, 04:23 PM
Database Bot
 
Join Date: Sep 2009
Posts: 1,236,254
Database Administrator is on a distinguished road
Default Re: a neophytish normalization question

>> I am designing a database for a social science project I'm doing. *Itis based on a Linnaen taxonomy (family, subfamily, etc). *I know to get it up to 3rd normal form, I will have to have separate tables for
each level in the hierarchy, with UIDs in each: <<

NO! NO! and NO with a cherry on top! There is no such thing as a
magical UID in RDBMS -- do you also believe in the "Elixir of Life" of
Kabhalah numbers? The levels are attributes of the creature you are
categorizing, not entities by themselves.

Then Tables never have that silly "tbl-" prefix to violate both
ISO-11179 rules and basic data m

The Linnaen taxonomy levels are attributes of the creature you are
classifying, not entities in themselves.

>> As I understand things (having never taken a formal database design course), I can keep it in 3rd normal form by moving down the levels and only keeping the DominatedBy field [sic: columns are not fields] on an "immediateancestor"/ non-transitive basis. *And there is actually not all that much other attribute data until I get to the bottom level. <<


Nope, totally wrong idea of 3NF.

Have you worked with Dewey Decimal in a library? Would you build a
table for the 100"s, one for the tens and one for the digits?

As I recall and my biology is waaaaay out of date, you have less than
ten levels that start at Kingdoms (Animal, Vegetable, Mineral?) so
that is not too many columns.

But you will want axillary tables that are referenced by the columns ,
so

CREATE TABLE NoahArk
(kingdom_name CHAR(10) NOT NULL
REFERENCES Kingdoms(kingdom_name),
..);

You might want to put the whole taxonomy into a Nested sets model, but
I don't know enough to advise you.
Reply With Quote
  #4  
Old 11-04-2008, 09:40 AM
Database Bot
 
Join Date: Sep 2009
Posts: 1,236,254
Database Administrator is on a distinguished road
Default Re: a neophytish normalization question

On Oct 30, 8:36*pm, justaguy wrote:
> Hi all,
>
> I am designing a database for a social science project I'm doing. *It
> is based on a Linnaen taxonomy (family, subfamily, etc). *I know to
> get it up to 3rd normal form, I will have to have separate tables for
> each level in the hierarchy, with UIDs in each:
>
> tblFamily
> UID * * Name
> 1 * * * *Fam1
> 2 * * * *Fam2
>
> tblSubfamily
> UID * * *Name * * * * *DominatedBy
> 1 * * * *SubFam1 * * *FamUID_1
> 2 * * * *SubFam2 * * *FamUID_1
> 3 * * * *SubFam3 * * *FamUID_2
>
> tblBranch
> UID * * * Name * * * * DominatedBy
> 1 * * * * Branch1 * * * *SubFamUID_3
>
> ... and so on.
>
> As I understand things (having never taken a formal database design
> course), I can keep it in 3rd normal form by moving down the levels
> and only keeping the DominatedBy field on an "immediate ancestor"/
> nontransitive basis. *And there is actually not all that much other
> attribute data until I get to the bottom level.
>
> My problem is that as I move down through the levels, the
> classifications become more and more controversial, as they are wont
> to do. *And while I am going to assign things on the bottom a certain
> way, I am also going to release this database to others for them to
> change as they see fit. *So, in a way, I need to build in a 4th form
> capacity, because things on the bottom may have a many-to-many
> relationship by the time somebody else gets done with it.
>
> So my question is, how do I build in the 4th form? *I think I do it by
> doing this:
>
> tblBranch
> UID * * * *Name * * * DominatedBySubFam * * * * DominatedByFam
> 1 * * * * Branch1 * * * *SubFamUID_3 * * * * * * * * * *FamUID_2
>
> I hope I am not mucking up my explanation of the problem. *Any advice
> is sincerely appreciated.
>
> Guy


Hi,
I am really curious why you think third normal form requires a UID. I
see people making that assumption and I am just puzzled as to where
they (you in this case) got that notion. As you say, it was not from
formal training on database design.

Thanks,
Ed
Reply With Quote
  #5  
Old 11-06-2008, 06:17 PM
Database Bot
 
Join Date: Sep 2009
Posts: 1,236,254
Database Administrator is on a distinguished road
Default Re: a neophytish normalization question

Hi all,

Thank you for your various replies. I'd like to provide some further
information that I probably should have furnished in the first place,
and hopefully my reasoning will seem a little more thought-out (if not
accurate!).

When I mentioned the Linnaen taxonomy, I didn't want to get too much
into what I thought were unnecessary specifics. But in point of fact,
what I'm trying to do is to build a GIS database that maps linguistic
diversity. So the taxonomy comes into play in that linguistics
utilizes such labels. What I would like to do is create separate
feature classes for each level of the taxonomy - e.g., the extent of
families, subfamilies, and so on. (The issue that language boundaries
are never clean-cut is duly acknowledged - this is just what can be
done based on historical data. Also, as it is historical data, the
boundaries are considerably more discrete than they would be if I were
mapping present-day data.)

Here's an example of what I expect users to want to do: they want to
isolate and look at the distribution of a set of languages from
subfamilyA, and compare it to the distribution of an entire
subfamilyB. If families, subfamilies, branches and so on were all in
one large table, and the columns moved from top - down as they went
left - right, surely it would violate 2NF - right? I hope I'm not
wrong about that as well. So I have separate tables: one for
languages that lists their entire taxonomy, which can be used to
generate the set of languages. And, to keep things simple for the
user, they can then just bring in a separate table that has only
subfamilies and select the one they want. That's essentially why I
don't want only a large table that moves from down - top as columns go
from left - right.

Given my need for separate feature classes that can be loaded
independently in a GIS, I threw together the structure that I
mentioned above. (I should say here that while I'm experienced in GIS
and working with databases, it's the first time that I've
independently tried to build something from the ground up. So I
apologize for what must be an odd mix of jargon and cluelessness.)

Given all that, looking at your replies, here is my reply: I concede
the point that having broken down tables the way I did, the UIDs are
not necessary. I didn't mean to imply that it was required nor
"magic". But if it makes sense to have separate tables for each
level, as in my user scenario, why not assign them UIDs? Also, I will
look into nested sets, but I would like to make sure that I'm straight
on this part first. And, looking at my user scenario, I see that I
could have a language table that has family, subfamily, etc as
attributes (i.e., down-top:left-right) and still get away with having
separate tables for each level. I *think* this eliminates the
concerns shown about how I'm not meeting 3NF.

Sorry this is so long - I've tried to be concise. If anybody is still
reading this, I look forward to hearing if I am right about 3NF and
how, given the constraints I've named, I can get to 4NF.
Reply With Quote
Reply

Thread Tools
Display Modes



All times are GMT -4. The time now is 02:40 PM.