+ Reply to Thread
Results 1 to 9 of 9

is there an equivavlent to auto_increment in ingres ?

  1. is there an equivavlent to auto_increment in ingres ?

    The header says it all really, I can't seem to find any reference in the
    docs explaining how to do auto incrementing primary keys in tables in an
    ingres DB. I've gone with incrementing a biging value in a separate
    table every time an entry gets added, but this seems like a pretty hacky
    work-around. Any suggestions ?

    Morgan.

  2. Re: is there an equivavlent to auto_increment in ingres ?

    "morgan brickley" wrote in message
    news:JTLCe.2611$R5.577@news.indigo.ie...
    > The header says it all really, I can't seem to find any reference in the
    > docs explaining how to do auto incrementing primary keys in tables in an
    > ingres DB. I've gone with incrementing a biging value in a separate
    > table every time an entry gets added, but this seems like a pretty hacky
    > work-around. Any suggestions ?


    IMO the absence of an autoincrementing type is actually a valuable feature
    of Ingres, not a gap.

    Ingres r3 supports sequences, which can be used to obtain a unique value in
    a marginally more elegant way than using a counter in a table.

    I am aware that your question above is well-informed about the
    near-universal practices of our industry, but that doesn't mean that what
    you are seeking to do is wise or desirable. I could object to the idea of
    an *automatically* generated synthetic key; I could object to the concept
    that one logical key could be objectively superior (primary) to another
    logical key, and I could object to monotonic (incrementing) synthetic key
    values. And indeed I do. But if you are determined to do it, Ingres at
    least gives you sequences, if not an autoincrementing type.

    Happily you invite suggestions. In the Resources section of our website you
    will find a rather stale old paper called "Key Points About Surrogate Keys."
    It talks about the pros and cons of the hacky approaches to what you want to
    do, though today I find myself backing away from most of what it says.
    However it has an appendix which I wrote more recently, which is the reason
    I allow it to stay on the site. The appendix explains my (possibly
    absurd-sounding) views above and describes a far preferable approach.

    Roy Hann (rhann at rationalcommerce dot com)
    Rational Commerce Ltd.
    www.rationalcommerce.com
    "Ingres development, tuning, and training experts"



  3. Re: is there an equivavlent to auto_increment in ingres ?

    >> auto incrementing primary keys .<<

    A contradiction in terms! You never learned RDBMS, did you? Back to
    foundations!

    A key is a subset of attributes that are unique to each instance of a
    kind of entity model by a table. A auto-increment is an exposed
    physical locator based on the internal state of the hardware at the
    time of insertion. Like a pointer change in a 1970's navigational
    database, only you can see it and have to maintain it yourself.

    Let me go ahead one more step and play Q&A with the direction I think
    you are going:

    Q: Couldn't a compound key become very long?

    A1: So what? This is the 21-s's century and we have much better
    computers than we did in the 1950's when key size was a real physical
    issue. What is funny to me is the number of idiots who replace a
    natural two or three integer compound key with a huge GUID that no
    human being or other system can possibly understand because they think
    it will be "faster and easy" to program.

    A2: This is an implementation problem that the SQL engine can handle.
    For example, Teradata is an SQL designed for VLDB apps that uses
    hashing instead of B-tree or other indexes. They guarantee that no
    search requires more than two probes, no matter how large the database.
    A tree index requires more and more probes as the size of the database
    increases.

    A3: A long key is not always a bad thing for performance. For example,
    if I use (city, state) as my key, I get a free index on just (city) in
    a tree index. Longer keys hash better, etc.

    You are faking a sequential file's positional record number to find the
    physical storage location? Sure, if I want to lose all the advantages
    of an abstract data model, SQL set oriented programming, carry extra
    data and destroy the portability of code!

    More and more programmers who have absolutely no database training are
    being told to design a database. They are using GUIDs, IDENTITY, ROWID
    and other proprietary auto-numbering "features" in SQL products to
    imitate either a record number (sequential file system mindset) or OID
    (OO mindset) since they don't know anything else.

    Experienced database designers tend toward intelligent keys they find
    in industry standard codes, such as UPC, VIN, GTIN, ISBN, etc. They
    know that they need to verify the data against the reality they are
    modeling. A trusted external source is a good thing to have.

    The auto-increment column is a holdover from the early programming
    languages which were very close to the hardware. For example, the
    fields (not columns; big difference) in a COBOL or FORTRAN program were
    assumed to be physically located in main storage in the order they were
    declared in the program. The languages have constructs using that
    model -- logical and physical implementations are practically one! The
    data has meaning BECAUSE of the program reading it (i.e. the same bits
    could be a character in one program and be an integer in another)

    The early SQLs were based on existing file systems. The data was kept
    in physically contiguous disk pages, in physically contiguous rows,
    made up of physically contiguous columns. In short, just like a deck
    of punch cards or a magnetic tape. Most programmer still carry that
    mental model, which is why I keep doing that rant about file vs. table,
    row vs. record and column vs. field.

    But physically contiguous storage is only one way of building a
    relational database and it is not the best one. The basic idea of a
    relational database is that user is not supposed to know *how* or
    *where* things are stored at all, much less write code that depends on
    the particular physical representation in a particular release of a
    particular product on particular hardware at a particular time.

    One of the biggest errors is the IDENTITY column (actually property,
    not a column at all) in the Sybase/SQL Server family. People actually
    program with this "feature" and even use it as the primary key for the
    table! Now, let's go into painful details as to why this thing is bad.


    The first practical consideration is that IDENTITY is proprietary and
    non-portable, so you know that you will have maintenance problems when
    you change releases or port your system to other products. Newbies
    actually think they will never port code! Perhaps they only work for
    companies that are failing and will be gone. Perhaps their code is
    such crap nobody else want their application.

    But let's look at the logical problems. First try to create a table
    with two columns and try to make them both IDENTITY. If you cannot
    declare more than one column to be of a certain data type, then that
    thing is not a datatype at all, by definition. It is a property which
    belongs to the PHYSICAL table, not the LOGICAL data in the table.

    Next, create a table with one column and make it an IDENTITY. Now try
    to insert, update and delete different numbers from it. If you cannot
    insert, update and delete rows from a table, then it is not a table by
    definition.

    Finally create a simple table with one IDENTITY and a few other
    columns. Use a few statements like

    INSERT INTO Foobar (a, b, c) VALUES ('a1', 'b1', 'c1');
    INSERT INTO Foobar (a, b, c) VALUES ('a2', 'b2', 'c2');
    INSERT INTO Foobar (a, b, c) VALUES ('a3', 'b3', 'c3');

    To put a few rows into the table and notice that the IDENTITY
    sequentially numbered them in the order they were presented. If you
    delete a row, the gap in the sequence is not filled in and the sequence
    continues from the highest number that has ever been used in that
    column in that particular table. This is how we did record numbers in
    pre-allocated sequential files in the 1950's, by the way. A utility
    program would then "pack" or "compress" the records that were flagged
    as deleted or unused to move the empty space to the physical end of the
    physical file. IDENTITY leaves the gaps.

    But now use a statement with a query expression in it, like this:

    INSERT INTO Foobar (a, b, c)
    SELECT x, y, z
    FROM Floob;

    Since a query result is a table, and a table is a set which has no
    ordering, what should the IDENTITY numbers be? The entire, whole,
    completed set is presented to Foobar all at once, not a row at a time.
    There are (n!) ways to number (n) rows, so which one do you pick? The
    answer has been to use whatever the *physical* order of the result set
    happened to be. That non-relational phrase "physical order" again!

    But it is actually worse than that. If the same query is executed
    again, but with new statistics or after an index has been dropped or
    added, the new execution plan could bring the result set back in a
    different physical order.

    Can you explain from a logical model why the same rows in the second
    query get different IDENTITY numbers? In the relational model, they
    should be treated the same if all the values of all the attributes are
    identical.

    Using IDENTITY as a primary key is a sign that there is no data model,
    only an imitation of a sequential file system. Since this "magic,
    all-purpose, one-size-fits-all" pseudo-identifier exists only as a
    result of the physical state of a particular piece of hardware at a
    particular time as read by the current release of a particular database
    product, how do you verify that an entity has such a number in the
    reality you are modeling?

    You will see newbies who design tables like this:

    CREATE Drivers
    (driver_id IDENTITY (1,1) NOT NULL PRIMARY KEY,
    ssn CHAR(9) NOT NULL REFERENCES Personnel(ssn),
    vin CHAR(17) NOT NULL REFERENCES Motorpool(vin));

    Now input data and submit the same row a thousand times, a million
    times. Your data integrity is trashed. The natural key was this:

    CREATE Drivers
    (ssn CHAR(9) NOT NULL REFERENCES Personnel(ssn),
    vin CHAR(17) NOT NULL REFERENCES Motorpool(vin),
    PRIMARY KEY (ssn, vin));

    To demonstrate, here is a typical newbie schema -- you will them all
    over the news groups. I call them "idiots" because they always name
    the IDENTITY property column "id" in EVERY table. They don't
    understand basic data modeling -- one and only name for an attribute.
    About half the time they don't use any DRI, but let's show it.

    CREATE TABLE MotorPool
    (id IDENTITY (1,1) NOT NULL PRIMARY KEY,
    ssn CHAR(9) NOT NULL REFERENCES Personnel(snn),
    vin CHAR(17) NOT NULL REFERENCES Vehicle(vin));

    CREATE TABLE Personnel
    (id IDENTITY (1,1) NOT NULL PRIMARY KEY,
    ssn CHAR(9) NOT NULL UNIQUE,
    ..);

    CREATE TABLE Vehicles
    (id IDENTITY (1,1) NOT NULL PRIMARY KEY,
    vin CHAR(17) NOT NULL UNIQUE,
    ...);

    Now change a row in Personnel:

    UPDATE Personnel
    SET ssn = '666666666'
    WHERE id = 1;

    or

    UPDATE Personnel
    SET ssn = '666666666'
    WHERE ssn = '999999999';

    or
    BEGIN ATOMIC
    DELETE FROM Personnel WHERE id = 1;
    INSERT INTO Personnel VALUES ('666666666');
    END;

    What happened in Motorpool? There is no logical relationship between
    the real key and the exposed physical locator, not is there any way to
    make one.

    Another cute way to destroy data integrity:

    BEGIN ATOMIC
    DELETE FROM Foobar
    WHERE id = <>;
    INSERT INTO Foobar
    VALUES ( <>)
    END;

    Logically this should do nothing, but since IDENTITY has gaps, it
    trashes the data.

    Now you are REALLY thinking about relations and keys instead of 1950's
    sequential record numbering. Adding an IDENTITY column to either of
    these tables as a candidate key would be dangerously redundant; one
    query uses the IDENTITY and another uses the real key, and like a man
    with two watches, you are never sure what time it is.

    Finally, an appeal to authority, with a quote from Dr. Codd:
    "..Database users may cause the system to generate or delete a
    surrogate, but they have no control over its value, nor is its value
    ever displayed to them ..."(Dr. Codd in ACM TODS, pp 409-410) and Codd,
    E. (1979), Extending the database relational model to capture more
    meaning. ACM Transactions on Database Systems, 4(4). pp. 397-434.

    This means that a surrogate ought to act like an index; created by the
    user, managed by the system and NEVER seen by a user. That means never
    used in queries, DRI or anything else that a user does.

    Codd also wrote the following:

    "There are three difficulties in employing user-controlled keys as
    permanent surrogates for entities.

    (1) The actual values of user-controlled keys are determined by users
    and must therefore be subject to change by them (e.g. if two companies
    merge, the two employee databases might be combined with the result
    that some or all of the serial numbers might be changed.).

    (2) Two relations may have user-controlled keys defined on distinct
    domains (e.g. one uses social security, while the other uses employee
    serial numbers) and yet the entities denoted are the same.

    (3) It may be necessary to carry information about an entity either
    before it has been assigned a user-controlled key value or after it has
    ceased to have one (e.g. and applicant for a job and a retiree).

    These difficulties have the important consequence that an equi-join on
    common key values may not yield the same result as a join on common
    entities. A solution - proposed in part [4] and more fully in [14] -
    is to introduce entity domains which contain system-assigned
    surrogates. Database users may cause the system to generate or delete
    a surrogate, but they have no control over its value, nor is its value
    ever displayed to them....." (Codd in ACM TODS, pp 409-410).

    References

    Codd, E. (1979), Extending the database relational model to capture
    more meaning. ACM Transactions on Database Systems, 4(4). pp. 397-434


  4. Re: is there an equivavlent to auto_increment in ingres ?

    --CELKO-- wrote:

    >>>auto incrementing primary keys .<<

    >
    >
    > A contradiction in terms! You never learned RDBMS, did you? Back to
    > foundations!


    I'm not sure that auto incrementing keys are always a bad thing, though
    your argument is quite solid on theoretical grounds. Consider the
    following situation.

    System A drops a request for SYSTEM B to do something into a table. The
    requests consists of a customer ID, a site ID, some action flags, and a
    date/time stamp.

    System A doesn't care how to identify the request; it's done its job.

    System B comes along and starts doing its job. It spawns information out
    into other tables, sends letters, does other stuff.

    Now, while it is true that the combined information in the original
    request is unique and constitutes a valid primary key, it is not a very
    usable key. It may consist of several columns and it's a perishing
    nuisance to have to replicate all these columns into tables where their
    only usefulness is to act as the primary key. Developers make mistakes
    when they forget to add the n-th part of the join, and the QEP goes
    spare with the apparent multiplicity of join conditions. A short snappy
    reference number is also useful for customers and administrators.

    In this instance, an autoincrement field can be very useful, especially
    since you can use it in set based operations when you are inserting a
    hundred or thousand such request en masse.

    Personally, I prefer to add a randomly generated ID to such a record
    (rather than sequential) (there are several reasons for this), but then
    you have to do your own checks that the ID is unique.

  5. [Info-ingres] Re: is there an equivavlent to auto_increment iningres ?

    Gang,

    For the last three years, I've been working on a system that
    has the LOGICAL problem of ambiguous entities. Without
    surrogate keys, things would be a mess. Our "entities" actually
    lie about their primary key data. And figuring out which entities
    are lying about which conflicting primary key data is difficult,
    time-consuming, and sometimes impossible.

    Worse, the ambiguous entity is the "root" of the hierarchy. We
    often have to "merge" entities when their apparent identity is
    revealed.

    The user never sees these surrogate keys anywhere in the application.

    Many O/R mapping technologies require an integer surrogate key.
    But one piece of misinformation is that the surrogate key must
    be the primary key. This is simply not true. Who cares if there
    is a surrogate key in the table as long as it can be ignored by
    the application programmers, and more importantly, the user.

    But "auto increment" and "sequences" are still useful. We use them
    in places where the user WANTS an artificial identifier. A
    receipt number, a voucher number, etc. I realize a receipt's logical
    key is " + ". But customer support
    specialists will tell you that they can key in a 10 digit receipt
    number to find a transaction faster than looking through a customer
    order lookup screen using last name, first name, middle name,
    address, date.

    A better example is FedEx/UPS/DHL. A package's "tracking number" is a
    definite surrogate key. No doubt about it. But do you think
    the actual logical key is usable? No doubt it needs to be
    considered and modeled, but you can't seriously expect the
    customer to use it. And you can't expect every related
    entity to repeat such a key in their tables further and further
    down the line, can you?

    My last example of a surrogate key that is useful is the "user"
    you need to create at nearly every commerce site on the Internet.
    This "user", although often your e-mail address, is a completely
    artificial piece of data. It is a "handle" to an entire customer
    entity that you cannot possibly expect someone to re-identify upon
    login. Worse, reasons of privacy don't allows a "narrowing" lookup
    interface like a customer support representative might enjoy.

    These "engineering" problems are just a few examples of a great
    iceberg of issues that make the academic statement that "surrogate
    keys are wrong" just plain nuts.

    --
    Michael Leo Java, J2EE, BEA WebLogic,
    Caribou Lake LLC Oracle, Open Source, Ingres,
    mleo@cariboulake.com Real Enterprise Applications

  6. Re: [Info-ingres] Re: is there an equivavlent to auto_increment in ingres ?

    "Michael Leo" wrote in message
    news:mailman.1121792403.2069.info-ingres@cariboulake.com...

    > These "engineering" problems are just a few examples of a great
    > iceberg of issues that make the academic statement that "surrogate
    > keys are wrong" just plain nuts.


    Mikey, as I know you know, I make many of the same arguments you do, in my
    "Key Points About Surrogate Keys" paper. So plainly I agree that surrogate
    keys can be useful--when used with discipline, understanding, and very great
    reluctance.

    When an entity originates in a system, of course the system can generate a
    reference number for it. For that reason a billing system can, and must
    generate invoice numbers. (But strictly speaking, that is a synthetic key,
    not a surrogate key. Surrogate keys are invisible to the user.)

    Where I disagree with most database designers is that I would use a
    surrogate as a last resort, while many seem to resort to it first. And that
    is just plain nuts. For one thing, it usually creates an entirely spurious
    illusion of certainty where none can really exist because it has masked
    errors in the analysis of the business process or the business model. Just
    because it makes failures of identification undetectable, that doesn't mean
    they've gone away.

    Next you'll be lining up with Emiliano to tell me that null-yet-unique keys
    make sense! ;-)

    Roy







  7. Re: [Info-ingres] Re: is there an equivavlent to auto_incrementin ingres ?

    At 8:07 PM +0100 7/19/05, Roy Hann wrote:
    >"Michael Leo" wrote in message
    >news:mailman.1121792403.2069.info-ingres@cariboulake.com...
    >
    >> These "engineering" problems are just a few examples of a great
    >> iceberg of issues that make the academic statement that "surrogate
    >> keys are wrong" just plain nuts.

    >
    >Mikey, as I know you know, I make many of the same arguments you do, in my
    >"Key Points About Surrogate Keys" paper. So plainly I agree that surrogate
    >keys can be useful--when used with discipline, understanding, and very great
    >reluctance.
    >
    >When an entity originates in a system, of course the system can generate a
    >reference number for it. For that reason a billing system can, and must
    >generate invoice numbers. (But strictly speaking, that is a synthetic key,
    >not a surrogate key. Surrogate keys are invisible to the user.)


    I strive to make surrogate keys invisible to the programmer. Not always
    possible, but desirable.

    >Where I disagree with most database designers is that I would use a
    >surrogate as a last resort, while many seem to resort to it first. And that
    >is just plain nuts. For one thing, it usually creates an entirely spurious
    >illusion of certainty where none can really exist because it has masked
    >errors in the analysis of the business process or the business model. Just
    >because it makes failures of identification undetectable, that doesn't mean
    >they've gone away.


    Oh, I agree with all of the above. Especially the last point about
    masking errors in analysis. That is why it takes a lot for me to make
    a surrogate key a primary key.

    >Next you'll be lining up with Emiliano to tell me that null-yet-unique keys
    >make sense! ;-)
    >
    >Roy


    Nope. Can't bring myself to believe that.

    --
    Michael Leo Java, J2EE, BEA WebLogic,
    Caribou Lake LLC Oracle, Open Source, Ingres,
    mleo@cariboulake.com Real Enterprise Applications

  8. Re: is there an equivavlent to auto_increment in ingres ?

    >> A better example is FedEx/UPS/DHL. A package's "tracking number" is a definite surrogate key. No doubt about it. <<

    No, no, no; those are not surrogate keys. Go back to my post and read
    Dr. Codd's definition. They are "intelligent keys" that can be put in
    a bar code and used by machinery to track and to route a package. They
    have check digits, the destination, etc. in the created key.

    In the book trade, we have the SAN (Standard Address Number) which was
    used for shipping and controlled by a trusted outside source.
    Bookdealers, publishers, etc. were not in the shipping, so this was
    pretty handy. FedEx/UPS/DHL et al **are** the shipping business, so
    they have to be their own internal trusted source.


  9. [Info-ingres] Re: is there an equivavlent to auto_increment iningres ?

    At 2:30 AM -0700 7/20/05, --CELKO-- wrote:
    > >> A better example is FedEx/UPS/DHL. A package's "tracking

    >number" is a definite surrogate key. No doubt about it. <<
    >
    >No, no, no; those are not surrogate keys. Go back to my post and read
    >Dr. Codd's definition. They are "intelligent keys" that can be put in
    >a bar code and used by machinery to track and to route a package. They
    >have check digits, the destination, etc. in the created key.
    >
    >In the book trade, we have the SAN (Standard Address Number) which was
    >used for shipping and controlled by a trusted outside source.
    >Bookdealers, publishers, etc. were not in the shipping, so this was
    >pretty handy. FedEx/UPS/DHL et al **are** the shipping business, so
    >they have to be their own internal trusted source.
    >


    I'm not sure I agree. I don't think FedEx/UPS/DHL considers the
    "tracking number" as merely an intelligent key used by machinery.

    Some experience with their systems tells me they are used as the
    primary key for everything.

    Remember the consumer experience with shipping is very simple. 99%
    of us never have more than one item in transit at any one time.

    But shippers work with multi-national companies that ship 100,000
    items per day from 100 locations to 30,000 destinations at varying
    urgencies, transports, and shipping specifications.

    Thinking about it, I guess they aren't surrogate keys, as they are
    exposed to the user and not sitting side by side with the logical
    primary key. I suspect they started as surrogate keys.

    When the truly logical primary key is so hideously verbose an
    complex as a shipped package's tracking identifier, the human mind (in my
    opinion) reverts to something easier to represent it. I don't think
    anyone relates to package "12395757631923" by anything other than
    that number. I don't think people mentally substitute the dozen or
    so unique fields that logically distinguish one shipped package
    from another.

    So I'm not sure if we agree or disagree. But I really can't see how
    we can live without abstract "keys" exposed to users in everyday
    life.

    Cheers,

    --
    Michael Leo Java, J2EE, BEA WebLogic,
    Caribou Lake LLC Oracle, Open Source, Ingres,
    mleo@cariboulake.com Real Enterprise Applications

+ Reply to Thread