dbaspot
Tags Register FAQ Calendar Search Today's Posts Mark Forums Read

BTree comparison function - berkeley-db

This is a discussion on BTree comparison function - berkeley-db ; Hi, I'd be really grateful for any advice that you can give me regarding user-defined btree comparison functions. It might help to have a concrete example of what I'm trying to do, so for the sake of argument let's say ...


Home > Database Forum > Other Databases > berkeley-db > BTree comparison function

Reply

 

LinkBack Thread Tools Display Modes
  #1  
Old 09-01-2003, 07:57 AM
Database Bot
 
Join Date: Sep 2009
Posts: 1,236,254
Database Administrator is on a distinguished road
Default BTree comparison function

Hi,

I'd be really grateful for any advice that you can give me regarding
user-defined btree comparison functions.


It might help to have a concrete example of what I'm trying to do, so
for the sake of argument let's say that my database contains keys of
the form:
struct my_key {
a_type_t A;
b_type_t B;
c_type_t C;
};

Perhaps this is a bad decision, but I have set up the database so that
the associated data are 0 bytes long. I took this decision because:
1) all the information I need is contained logically in the key,
2) a structure of this form arrives all nicely prepared for me, and
3) I want to be able to search the DB based on any field in the key,
as described below.


I would like to be able to perform searches on the database of the
kind: "Find all the database entries where key field B==value_to_match
regardless of what is in the other fields". In order to achieve this
I would like to pass to my btree comparison function a structure
initialised with:

struct my_key search_key;

search_key.A = IGNORE_ME;
search_key.B = value_to_match;
search_key.C = IGNORE_ME;

The comparison function that I had imagined I'd write would skip
fields A and C which contain the special value IGNORE_ME and would
return 0 (indicating a match) for each database entry which has
B==value_to_match in the key.

The thing that I'm unsure about is the return value from the
comparison function for cases where the keys don't match and how the
return is used internally by Berkeley DB. The docs say that the
comparison function:
"must return an integer value less than, equal to, or greater than
zero if the first key argument is considered to be respectively less
than, equal to, or greater than the second key argument."
There is also the requirement that the comparison function leads to
the entries being well-ordered.

What are the implications on these requirements of returning 0,
indicating that a match has been found, when only some parts of the
key have been compared?

Is the database affected in any way when a non-zero value is returned
by the comparison function when a match is not found? How important
is the value of the non-zero return for non-matches?

Is there a better way of doing this kind of search than the one I'm
proposing?

Thanks a lot for any thoughts and suggestions.

Mark


--
Mark Stevens
Concept Systems Ltd
Reply With Quote
  #2  
Old 09-01-2003, 11:29 AM
Database Bot
 
Join Date: Sep 2009
Posts: 1,236,254
Database Administrator is on a distinguished road
Default Re: BTree comparison function

Mark Stevens wrote:

> I would like to be able to perform searches on the database of the
> kind: "Find all the database entries where key field B==value_to_match
> regardless of what is in the other fields". In order to achieve this
> I would like to pass to my btree comparison function a structure
> initialised with:


> struct my_key search_key;


> search_key.A = IGNORE_ME;
> search_key.B = value_to_match;
> search_key.C = IGNORE_ME;


I don't think you can do that; or if you can, it's a bad idea.

The problem is that the comparison function is used not only for
searches, but also for *constructing* the BTree --- it defines the
sort order of entries in the database. This sort order is exploited
when running searches, too. Having an essentially unsortable key
field value like your "IGNORE_ME" violates this structure.

You should rather store your data as data in a RECNO or other
database, and set up a secondary index database for every field of
struct my_key.

--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.
Reply With Quote
  #3  
Old 09-02-2003, 04:35 AM
Database Bot
 
Join Date: Sep 2009
Posts: 1,236,254
Database Administrator is on a distinguished road
Default Re: BTree comparison function

Hans-Bernhard Broeker wrote in message news:...
> Mark Stevens wrote:
>
> > I would like to be able to perform searches on the database of the
> > kind: "Find all the database entries where key field B==value_to_match
> > regardless of what is in the other fields". In order to achieve this
> > I would like to pass to my btree comparison function a structure
> > initialised with:

>
> > struct my_key search_key;

>
> > search_key.A = IGNORE_ME;
> > search_key.B = value_to_match;
> > search_key.C = IGNORE_ME;

>
> I don't think you can do that; or if you can, it's a bad idea.
>
> The problem is that the comparison function is used not only for
> searches, but also for *constructing* the BTree --- it defines the
> sort order of entries in the database. This sort order is exploited
> when running searches, too. Having an essentially unsortable key
> field value like your "IGNORE_ME" violates this structure.
>


I understand that the comparison function will be used for
constructing the BTree. It would certainly only ever be the case that
complete structures would be inserted into the database (i.e. with no
fields set to IGNORE_ME). In this case the comparison function would
compare every field in the structure and should work OK, so long as it
conforms to the rules for comparison functions.

What I am suggesting is to use incomplete structures (i.e. with some
fields marked as IGNORE_ME) for *searching* the database. In other
words, my database would have previously been populated with
structures where A, B and C are all real values, and at some time
later I want to find all the entries which have field B==some_value,
regardless of what is in A and C.


> You should rather store your data as data in a RECNO or other
> database, and set up a secondary index database for every field of
> struct my_key.


I'm a bit reluctant to do this since the structures that I'm dealing
with are going to have at least 10 fields (possibly many more). Is
there an easy way to set up large numbers of secondary indices without
doing everything by hand?


Thanks,
Mark
Reply With Quote
  #4  
Old 09-02-2003, 07:49 AM
Database Bot
 
Join Date: Sep 2009
Posts: 1,236,254
Database Administrator is on a distinguished road
Default Re: BTree comparison function

Mark Stevens wrote:

> What I am suggesting is to use incomplete structures (i.e. with some
> fields marked as IGNORE_ME) for *searching* the database.


I did understand that the first time round. But that can't work.

You can't have the comparison function sort the actual keys like this

(A1,B1) < (A1,B2) < (A2,B1) < (A2,B2) < (A3,B3)

and at the same time have it return both

(*,B1) == (A1,B1)
and
(*,B1) == (A2,B1)

because these two results would also, by the requirement of
transitivity of the comparison function, mandate

(A1,B1) == (A2,B1)

which violates the original sorting order shown above, since that has

(A1,B1) < (A2,B1)

Your comparison function would no longer set-up a well-ordering of
your keys.

> I'm a bit reluctant to do this since the structures that I'm dealing
> with are going to have at least 10 fields (possibly many more). Is
> there an easy way to set up large numbers of secondary indices
> without doing everything by hand?


Well, maybe BerkeleyDB isn't for you, then. Automatic handling of
such indices is one of the reasons the elders invented relational
database management systems...
--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.
Reply With Quote
  #5  
Old 09-02-2003, 12:53 PM
Database Bot
 
Join Date: Sep 2009
Posts: 1,236,254
Database Administrator is on a distinguished road
Default Re: BTree comparison function

Hans-Bernhard Broeker wrote in message news:...
Hans-Bernhard Broeker wrote in message news:...

> Well, maybe BerkeleyDB isn't for you, then. Automatic handling of
> such indices is one of the reasons the elders invented relational
> database management systems...


Berkeley DB can automatically maintain large numbers of secondary
indices, it's not a problem. You have to create a secondary index
function for each one, and call the Db.associate method to associate
the two databases, but that's not a lot of work. For more information,
see the "Secondary indices" section of the Berkeley DB Reference Guide,
included in your download package and also available at:

http://www.sleepycat.com/docs/ref/am/second.html

Regards,
--keith

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Keith Bostic bostic@sleepycat.com
Sleepycat Software Inc. keithbosticim (ymsgid)
118 Tower Rd. +1-781-259-3139
Lincoln, MA 01773 http://www.sleepycat.com
Reply With Quote
  #6  
Old 09-02-2003, 04:27 PM
Database Bot
 
Join Date: Sep 2009
Posts: 1,236,254
Database Administrator is on a distinguished road
Default Re: BTree comparison function

Hans-Bernhard Broeker wrote in message news:...
> > Mark Stevens wrote:
> > I'm a bit reluctant to do this since the structures that I'm dealing
> > with are going to have at least 10 fields (possibly many more). Is
> > there an easy way to set up large numbers of secondary indices
> > without doing everything by hand?

>
> Well, maybe BerkeleyDB isn't for you, then. Automatic handling of
> such indices is one of the reasons the elders invented relational
> database management systems...



Or maybe you should try Berkeley DB XML, which adds automatic indexing
and a query layer to Berkeley DB. See the Sleepycat website for
details and download.

The obvious downside is that db-xml works only with XML documents, and
not everone is happy to accomodate the verbosity of XML records and
parsing overhead etc. However I recently suggested that db-xml allow
the application to install toXML and fromXML hooks when dealing with
physical records being passed to/from the database. These hooks would
map your own records to an equivalent XML document (often trivial),
and (if necessary -- I'm not sure that it is yet), convert an XML
document into one of your own records.

I think this would allow many classes of legacy application to keep
their data in the current packed-binary form, and still make use of
db-xml for indexing and queries. Obviously some parsing overhead
remains, but the application should not have to change very much,
which is the main thing.

The db-xml developers are considering this request. Apparently db-xml
2.0 architecture may cause problems with this, but it could be done
fairly easily to the existing (released) v1.0.

If you are interested then I'd recommend posting to xml@sleepycat.com
and support@sleepycat.com to add weight to the request.

Regards,

Chris
Reply With Quote
  #7  
Old 09-02-2003, 04:27 PM
Database Bot
 
Join Date: Sep 2009
Posts: 1,236,254
Database Administrator is on a distinguished road
Default Re: BTree comparison function

Hans-Bernhard Broeker wrote in message news:...
> > Mark Stevens wrote:
> > I'm a bit reluctant to do this since the structures that I'm dealing
> > with are going to have at least 10 fields (possibly many more). Is
> > there an easy way to set up large numbers of secondary indices
> > without doing everything by hand?

>
> Well, maybe BerkeleyDB isn't for you, then. Automatic handling of
> such indices is one of the reasons the elders invented relational
> database management systems...



Or maybe you should try Berkeley DB XML, which adds automatic indexing
and a query layer to Berkeley DB. See the Sleepycat website for
details and download.

The obvious downside is that db-xml works only with XML documents, and
not everone is happy to accomodate the verbosity of XML records and
parsing overhead etc. However I recently suggested that db-xml allow
the application to install toXML and fromXML hooks when dealing with
physical records being passed to/from the database. These hooks would
map your own records to an equivalent XML document (often trivial),
and (if necessary -- I'm not sure that it is yet), convert an XML
document into one of your own records.

I think this would allow many classes of legacy application to keep
their data in the current packed-binary form, and still make use of
db-xml for indexing and queries. Obviously some parsing overhead
remains, but the application should not have to change very much,
which is the main thing.

The db-xml developers are considering this request. Apparently db-xml
2.0 architecture may cause problems with this, but it could be done
fairly easily to the existing (released) v1.0.

If you are interested then I'd recommend posting to xml@sleepycat.com
and support@sleepycat.com to add weight to the request.

Regards,

Chris
Reply With Quote
  #8  
Old 09-03-2003, 05:14 AM
Database Bot
 
Join Date: Sep 2009
Posts: 1,236,254
Database Administrator is on a distinguished road
Default Re: BTree comparison function

bostic@sleepycat.com (Keith Bostic) wrote in message news:...
>
> Berkeley DB can automatically maintain large numbers of secondary
> indices, it's not a problem. You have to create a secondary index
> function for each one, and call the Db.associate method to associate
> the two databases, but that's not a lot of work.


That looks like it might do the job, provided that it will work over
RPC.

My understanding from the RPC docs is that the primary and secondary
databases must be created locally on the server. I take this to mean
that the application which writes to the primary and secondary
databases cannot do so via RPC - it must be running on the machine
where berkeley_db_svc runs. From the docs it appears that any client
which needs to search the database using secondary indices may do so
via RPC so long as it opens the primary and secondary databases as
DB_RDONLY.

Is all this correct?

Many thanks for all the help and feedback that has been provided.

Mark
Reply With Quote
  #9  
Old 09-03-2003, 04:10 PM
Database Bot
 
Join Date: Sep 2009
Posts: 1,236,254
Database Administrator is on a distinguished road
Default Re: BTree comparison function

mas78910@hotmail.com (Mark Stevens) wrote in message news:...
> bostic@sleepycat.com (Keith Bostic) wrote in message news:...


> My understanding from the RPC docs is that the primary and secondary
> databases must be created locally on the server. I take this to mean
> that the application which writes to the primary and secondary
> databases cannot do so via RPC - it must be running on the machine
> where berkeley_db_svc runs. From the docs it appears that any client
> which needs to search the database using secondary indices may do so
> via RPC so long as it opens the primary and secondary databases as
> DB_RDONLY.
>
> Is all this correct?


Not quite. Because Berkeley DB uses function callbacks to
implement secondary indices, it's not possible to create a
secondary index from an RPC client -- there's not function
callback available.

However, if the server creates the necessary secondary index
databases and relationships, they can then be read and written
from the RPC client, as it's the server's callback function
that will be used.

Regards,
--keith

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Keith Bostic bostic@sleepycat.com
Sleepycat Software Inc. keithbosticim (ymsgid)
118 Tower Rd. +1-781-259-3139
Lincoln, MA 01773 http://www.sleepycat.com
Reply With Quote
Reply

Thread Tools
Display Modes



All times are GMT -4. The time now is 12:19 PM.