+ Reply to Thread
Results 1 to 4 of 4

Converting text column to XML

  1. Converting text column to XML

    We just completed the conversion of one text column in our database to
    the XML datatype (about 15 million rows):
    ALTER TABLE xmlTable ALTER COLUMN xmlColumn XML

    Our database has gone from 36GB to 54GB with this single change (the
    36GB and 54GB both indicate post-shrink utilized space).

    I had read in another thread that XML was stored in a more efficient
    binary format so, I was rather suprised at the growth of the database.

    Is this to be expected or is there something I can configure to reduce
    the storage requirement?

    We're going to test a text to varchar(max) conversion next, to see how
    that stacks up against the XML conversion but would love to just store
    XML as XML.

    Thanks,

    Ken

    Assuming I can link directly so the google group aggregator, the thread
    with the more efficient reference can be viewed at
    http://groups.google.com/group/micro...a6e8a32f668783


  2. Re: Converting text column to XML

    Hello KenJ,

    > We just completed the conversion of one text column in our database to
    > the XML datatype (about 15 million rows):
    > ALTER TABLE xmlTable ALTER COLUMN xmlColumn XML
    > Our database has gone from 36GB to 54GB with this single change (the
    > 36GB and 54GB both indicate post-shrink utilized space).


    I'm not too surprised, actually. The TEXT datatype was storing the XML which
    is essentially one byte of storage for one byte of lexical XML. My guess
    is that you only stored UTF-8 encoded XML in that type instance. The internal
    storage of XML datatype is actually UTF-16 (as SQL Server understands collations,
    not encodings). So in effect, you've likely doubled the amount of space consumed
    to store the instances. Why not exactly double? SQL Server does do some tokenizing
    of entity names, so we do see something less than 2x in most cases.

    > Is this to be expected or is there something I can configure to reduce
    > the storage requirement?


    Expect it, and no, there's only ways to increase the space impact, not really
    reduce it.

    > We're going to test a text to varchar(max) conversion next, to see how
    > that stacks up against the XML conversion but would love to just store
    > XML as XML.


    The critical question comes downt to this: are you validating or other processing
    the XML with the SQL context or not? If not, don't use the XML type for exactly
    this reason. If SQL Server is just secured storage of the XML for some other
    application, consider using varbinary(max) instead so you don't have to worry
    about the encoding issues.

    Thanks,
    Kent Tegels, DevelopMentor
    http://staff.develop.com/ktegels/



  3. Re: Converting text column to XML

    I have found that it's not obvious what the most compact way to save xml data is. I blogged the tests I did and how did them here:

    http://pluralsight.com/blogs/dan/arc.../06/32814.aspx

    Dan AT Pluralsight DOT COM

  4. Re: Converting text column to XML

    Hi Kent,

    Great reply from both you and Dan Sullivan. The UTF-16 encoding was
    what caught us! We would have really been in trouble had we converted
    both TEXT columns; the high-water mark for converting the one column
    was about 215GB :)

    Since the SQL 2000 database we're looking to upgrade just uses the text
    columns as dumb storage for XML documents, we can definitely get by
    without using the XML datatype. Assuming nobody is interested in
    having the database enforce valid XML (it has to make it through MS
    bizTalk before it gets to the DB), I'll take a closer look at
    varbinary(max); I had never even considered that one.

    FYI: As I mentioned, we did run a varchar(max) conversion and it was
    virtually instantaneous. I'm guessing this is handled internally in
    almost the exact same manner as the text data type.

    Thanks again for the helpful information.

    KenJ


+ Reply to Thread