+ Reply to Thread
Results 1 to 3 of 3

Data aggregation, syhcronization, and search

  1. Data aggregation, syhcronization, and search

    Hello All,

    I know that this problem is not SQL Server specific but I decided to
    post it here because it is a problem that I am certain many of you
    expericenced (or at least thought about).

    In my system, I have a central SQL Server Database and number of
    remote "data providers". Some of these data providers are other SQL
    Servers and others are other databases or other data provider services
    that expose a database-like interface.

    We provide two levels of search capability in the system. The user can
    search either the central database, or search one specific data
    provider. One of the new requirements, was to provide a global search,
    which would search the central database and all the data providers at
    once.

    Performing a distributed is prohibitive in our scenario because we
    have many data providers and they remotelly located. To minimize the
    search time, my first reaction was to create a copy of the data from
    the remote providers on the central database and keep it in synch with
    the data in the remote data providers. By doing this, I would expect
    to simplify the search by having simply a search in the central
    database.
    However, a number of new question concerning data synchronization
    arise:
    1. How can I keep the data in the central db in synch with the remote
    data providers? (If they all were SQL Server databases, I would use
    merger replication...but they aren't?)
    2. How to handle disconnects? If the remote provider disconnets and
    reconnects, some of the data might be stale. In pronciple, I could
    take a new snapshot and keep the data in synch from that point.
    However, this is very expensive!! Do you know of any other techniques
    for data synchronization that would minimize network traffic?
    3. Is this a good approach? What do you think?

    Your opinion is gratly appreciated.

    Kind regards
    CD


  2. RE: Data aggregation, syhcronization, and search

    Hi

    "crbd98@yahoo.com" wrote:

    > Hello All,
    >
    > I know that this problem is not SQL Server specific but I decided to
    > post it here because it is a problem that I am certain many of you
    > expericenced (or at least thought about).
    >
    > In my system, I have a central SQL Server Database and number of
    > remote "data providers". Some of these data providers are other SQL
    > Servers and others are other databases or other data provider services
    > that expose a database-like interface.
    >
    > We provide two levels of search capability in the system. The user can
    > search either the central database, or search one specific data
    > provider. One of the new requirements, was to provide a global search,
    > which would search the central database and all the data providers at
    > once.
    >
    > Performing a distributed is prohibitive in our scenario because we
    > have many data providers and they remotelly located. To minimize the
    > search time, my first reaction was to create a copy of the data from
    > the remote providers on the central database and keep it in synch with
    > the data in the remote data providers. By doing this, I would expect
    > to simplify the search by having simply a search in the central
    > database.
    > However, a number of new question concerning data synchronization
    > arise:
    > 1. How can I keep the data in the central db in synch with the remote
    > data providers? (If they all were SQL Server databases, I would use
    > merger replication...but they aren't?)
    > 2. How to handle disconnects? If the remote provider disconnets and
    > reconnects, some of the data might be stale. In pronciple, I could
    > take a new snapshot and keep the data in synch from that point.
    > However, this is very expensive!! Do you know of any other techniques
    > for data synchronization that would minimize network traffic?
    > 3. Is this a good approach? What do you think?
    >
    > Your opinion is gratly appreciated.
    >
    > Kind regards
    > CD
    >

    You don't say how you currently search the remote data providers! It may be
    more acceptable for the users if the global search initially searched the
    central database and returned the results and then searched the remote
    databases which would return something to the user quicker and reduce the
    need to speed up the remote searches.

    If you held the data centrally, you would also need to know how much latency
    would be acceptable for the data, if you could get away with uploading once a
    day out of hours then this could be an easier solution to implement.

    John

  3. Re: Data aggregation, syhcronization, and search

    Answers inline.

    --
    Hilary Cotter

    Looking for a SQL Server replication book?
    http://www.nwsu.com/0974973602.html

    Looking for a FAQ on Indexing Services/SQL FTS
    http://www.indexserverfaq.com



    wrote in message
    news:1176094164.164628.321030@q75g2000hsh.googlegroups.com...
    > Hello All,
    >
    > I know that this problem is not SQL Server specific but I decided to
    > post it here because it is a problem that I am certain many of you
    > expericenced (or at least thought about).
    >
    > In my system, I have a central SQL Server Database and number of
    > remote "data providers". Some of these data providers are other SQL
    > Servers and others are other databases or other data provider services
    > that expose a database-like interface.
    >
    > We provide two levels of search capability in the system. The user can
    > search either the central database, or search one specific data
    > provider. One of the new requirements, was to provide a global search,
    > which would search the central database and all the data providers at
    > once.
    >
    > Performing a distributed is prohibitive in our scenario because we
    > have many data providers and they remotelly located. To minimize the
    > search time, my first reaction was to create a copy of the data from
    > the remote providers on the central database and keep it in synch with
    > the data in the remote data providers. By doing this, I would expect
    > to simplify the search by having simply a search in the central
    > database.
    > However, a number of new question concerning data synchronization
    > arise:
    > 1. How can I keep the data in the central db in synch with the remote
    > data providers? (If they all were SQL Server databases, I would use
    > merger replication...but they aren't?)


    I would use transactional replication to replicate the remote SQL Servers to
    the central location. For the non-SQL Server data providers depending on the
    amount of changes, the volume of data and whether you can track changes you
    should be able to write something that will bring the data locally.
    > 2. How to handle disconnects? If the remote provider disconnets and
    > reconnects, some of the data might be stale. In pronciple, I could
    > take a new snapshot and keep the data in synch from that point.
    > However, this is very expensive!! Do you know of any other techniques
    > for data synchronization that would minimize network traffic?


    You need to implement some method of change tracking so only the changes
    will move each time rather than the entire data set.
    > 3. Is this a good approach? What do you think?
    >

    The network hop is always problematic and moving data locally is one
    approach to avoid it. Another option might be to move all providers locally
    and then have the clients use a remote access solution to access the central
    server/repository. This will work well if your links are stable and well
    connected.

    > Your opinion is gratly appreciated.
    >
    > Kind regards
    > CD
    >




+ Reply to Thread