BTree comparison function - berkeley-db
This is a discussion on BTree comparison function - berkeley-db ; Hi, I'd be really grateful for any advice that you can give me regarding user-defined btree comparison functions. It might help to have a concrete example of what I'm trying to do, so for the sake of argument let's say ...
![]() |
| | LinkBack | Thread Tools | Display Modes |
|
#1
| |||
| |||
| I'd be really grateful for any advice that you can give me regarding user-defined btree comparison functions. It might help to have a concrete example of what I'm trying to do, so for the sake of argument let's say that my database contains keys of the form: struct my_key { a_type_t A; b_type_t B; c_type_t C; }; Perhaps this is a bad decision, but I have set up the database so that the associated data are 0 bytes long. I took this decision because: 1) all the information I need is contained logically in the key, 2) a structure of this form arrives all nicely prepared for me, and 3) I want to be able to search the DB based on any field in the key, as described below. I would like to be able to perform searches on the database of the kind: "Find all the database entries where key field B==value_to_match regardless of what is in the other fields". In order to achieve this I would like to pass to my btree comparison function a structure initialised with: struct my_key search_key; search_key.A = IGNORE_ME; search_key.B = value_to_match; search_key.C = IGNORE_ME; The comparison function that I had imagined I'd write would skip fields A and C which contain the special value IGNORE_ME and would return 0 (indicating a match) for each database entry which has B==value_to_match in the key. The thing that I'm unsure about is the return value from the comparison function for cases where the keys don't match and how the return is used internally by Berkeley DB. The docs say that the comparison function: "must return an integer value less than, equal to, or greater than zero if the first key argument is considered to be respectively less than, equal to, or greater than the second key argument." There is also the requirement that the comparison function leads to the entries being well-ordered. What are the implications on these requirements of returning 0, indicating that a match has been found, when only some parts of the key have been compared? Is the database affected in any way when a non-zero value is returned by the comparison function when a match is not found? How important is the value of the non-zero return for non-matches? Is there a better way of doing this kind of search than the one I'm proposing? Thanks a lot for any thoughts and suggestions. Mark -- Mark Stevens Concept Systems Ltd |
|
#2
| |||
| |||
|
Mark Stevens > I would like to be able to perform searches on the database of the > kind: "Find all the database entries where key field B==value_to_match > regardless of what is in the other fields". In order to achieve this > I would like to pass to my btree comparison function a structure > initialised with: > struct my_key search_key; > search_key.A = IGNORE_ME; > search_key.B = value_to_match; > search_key.C = IGNORE_ME; I don't think you can do that; or if you can, it's a bad idea. The problem is that the comparison function is used not only for searches, but also for *constructing* the BTree --- it defines the sort order of entries in the database. This sort order is exploited when running searches, too. Having an essentially unsortable key field value like your "IGNORE_ME" violates this structure. You should rather store your data as data in a RECNO or other database, and set up a secondary index database for every field of struct my_key. -- Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de) Even if all the snow were burnt, ashes would remain. |
|
#3
| |||
| |||
|
Hans-Bernhard Broeker > Mark Stevens > > > I would like to be able to perform searches on the database of the > > kind: "Find all the database entries where key field B==value_to_match > > regardless of what is in the other fields". In order to achieve this > > I would like to pass to my btree comparison function a structure > > initialised with: > > > struct my_key search_key; > > > search_key.A = IGNORE_ME; > > search_key.B = value_to_match; > > search_key.C = IGNORE_ME; > > I don't think you can do that; or if you can, it's a bad idea. > > The problem is that the comparison function is used not only for > searches, but also for *constructing* the BTree --- it defines the > sort order of entries in the database. This sort order is exploited > when running searches, too. Having an essentially unsortable key > field value like your "IGNORE_ME" violates this structure. > I understand that the comparison function will be used for constructing the BTree. It would certainly only ever be the case that complete structures would be inserted into the database (i.e. with no fields set to IGNORE_ME). In this case the comparison function would compare every field in the structure and should work OK, so long as it conforms to the rules for comparison functions. What I am suggesting is to use incomplete structures (i.e. with some fields marked as IGNORE_ME) for *searching* the database. In other words, my database would have previously been populated with structures where A, B and C are all real values, and at some time later I want to find all the entries which have field B==some_value, regardless of what is in A and C. > You should rather store your data as data in a RECNO or other > database, and set up a secondary index database for every field of > struct my_key. I'm a bit reluctant to do this since the structures that I'm dealing with are going to have at least 10 fields (possibly many more). Is there an easy way to set up large numbers of secondary indices without doing everything by hand? Thanks, Mark |
|
#4
| |||
| |||
|
Mark Stevens > What I am suggesting is to use incomplete structures (i.e. with some > fields marked as IGNORE_ME) for *searching* the database. I did understand that the first time round. But that can't work. You can't have the comparison function sort the actual keys like this (A1,B1) < (A1,B2) < (A2,B1) < (A2,B2) < (A3,B3) and at the same time have it return both (*,B1) == (A1,B1) and (*,B1) == (A2,B1) because these two results would also, by the requirement of transitivity of the comparison function, mandate (A1,B1) == (A2,B1) which violates the original sorting order shown above, since that has (A1,B1) < (A2,B1) Your comparison function would no longer set-up a well-ordering of your keys. > I'm a bit reluctant to do this since the structures that I'm dealing > with are going to have at least 10 fields (possibly many more). Is > there an easy way to set up large numbers of secondary indices > without doing everything by hand? Well, maybe BerkeleyDB isn't for you, then. Automatic handling of such indices is one of the reasons the elders invented relational database management systems... -- Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de) Even if all the snow were burnt, ashes would remain. |
|
#5
| |||
| |||
|
Hans-Bernhard Broeker Hans-Bernhard Broeker > Well, maybe BerkeleyDB isn't for you, then. Automatic handling of > such indices is one of the reasons the elders invented relational > database management systems... Berkeley DB can automatically maintain large numbers of secondary indices, it's not a problem. You have to create a secondary index function for each one, and call the Db.associate method to associate the two databases, but that's not a lot of work. For more information, see the "Secondary indices" section of the Berkeley DB Reference Guide, included in your download package and also available at: http://www.sleepycat.com/docs/ref/am/second.html Regards, --keith =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Keith Bostic bostic@sleepycat.com Sleepycat Software Inc. keithbosticim (ymsgid) 118 Tower Rd. +1-781-259-3139 Lincoln, MA 01773 http://www.sleepycat.com |
|
#6
| |||
| |||
|
Hans-Bernhard Broeker > > Mark Stevens > > I'm a bit reluctant to do this since the structures that I'm dealing > > with are going to have at least 10 fields (possibly many more). Is > > there an easy way to set up large numbers of secondary indices > > without doing everything by hand? > > Well, maybe BerkeleyDB isn't for you, then. Automatic handling of > such indices is one of the reasons the elders invented relational > database management systems... Or maybe you should try Berkeley DB XML, which adds automatic indexing and a query layer to Berkeley DB. See the Sleepycat website for details and download. The obvious downside is that db-xml works only with XML documents, and not everone is happy to accomodate the verbosity of XML records and parsing overhead etc. However I recently suggested that db-xml allow the application to install toXML and fromXML hooks when dealing with physical records being passed to/from the database. These hooks would map your own records to an equivalent XML document (often trivial), and (if necessary -- I'm not sure that it is yet), convert an XML document into one of your own records. I think this would allow many classes of legacy application to keep their data in the current packed-binary form, and still make use of db-xml for indexing and queries. Obviously some parsing overhead remains, but the application should not have to change very much, which is the main thing. The db-xml developers are considering this request. Apparently db-xml 2.0 architecture may cause problems with this, but it could be done fairly easily to the existing (released) v1.0. If you are interested then I'd recommend posting to xml@sleepycat.com and support@sleepycat.com to add weight to the request. Regards, Chris |
|
#7
| |||
| |||
|
Hans-Bernhard Broeker > > Mark Stevens > > I'm a bit reluctant to do this since the structures that I'm dealing > > with are going to have at least 10 fields (possibly many more). Is > > there an easy way to set up large numbers of secondary indices > > without doing everything by hand? > > Well, maybe BerkeleyDB isn't for you, then. Automatic handling of > such indices is one of the reasons the elders invented relational > database management systems... Or maybe you should try Berkeley DB XML, which adds automatic indexing and a query layer to Berkeley DB. See the Sleepycat website for details and download. The obvious downside is that db-xml works only with XML documents, and not everone is happy to accomodate the verbosity of XML records and parsing overhead etc. However I recently suggested that db-xml allow the application to install toXML and fromXML hooks when dealing with physical records being passed to/from the database. These hooks would map your own records to an equivalent XML document (often trivial), and (if necessary -- I'm not sure that it is yet), convert an XML document into one of your own records. I think this would allow many classes of legacy application to keep their data in the current packed-binary form, and still make use of db-xml for indexing and queries. Obviously some parsing overhead remains, but the application should not have to change very much, which is the main thing. The db-xml developers are considering this request. Apparently db-xml 2.0 architecture may cause problems with this, but it could be done fairly easily to the existing (released) v1.0. If you are interested then I'd recommend posting to xml@sleepycat.com and support@sleepycat.com to add weight to the request. Regards, Chris |
|
#8
| |||
| |||
| bostic@sleepycat.com (Keith Bostic) wrote in message news: > > Berkeley DB can automatically maintain large numbers of secondary > indices, it's not a problem. You have to create a secondary index > function for each one, and call the Db.associate method to associate > the two databases, but that's not a lot of work. That looks like it might do the job, provided that it will work over RPC. My understanding from the RPC docs is that the primary and secondary databases must be created locally on the server. I take this to mean that the application which writes to the primary and secondary databases cannot do so via RPC - it must be running on the machine where berkeley_db_svc runs. From the docs it appears that any client which needs to search the database using secondary indices may do so via RPC so long as it opens the primary and secondary databases as DB_RDONLY. Is all this correct? Many thanks for all the help and feedback that has been provided. Mark |
|
#9
| |||
| |||
| mas78910@hotmail.com (Mark Stevens) wrote in message news: > bostic@sleepycat.com (Keith Bostic) wrote in message news: > My understanding from the RPC docs is that the primary and secondary > databases must be created locally on the server. I take this to mean > that the application which writes to the primary and secondary > databases cannot do so via RPC - it must be running on the machine > where berkeley_db_svc runs. From the docs it appears that any client > which needs to search the database using secondary indices may do so > via RPC so long as it opens the primary and secondary databases as > DB_RDONLY. > > Is all this correct? Not quite. Because Berkeley DB uses function callbacks to implement secondary indices, it's not possible to create a secondary index from an RPC client -- there's not function callback available. However, if the server creates the necessary secondary index databases and relationships, they can then be read and written from the RPC client, as it's the server's callback function that will be used. Regards, --keith =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Keith Bostic bostic@sleepycat.com Sleepycat Software Inc. keithbosticim (ymsgid) 118 Tower Rd. +1-781-259-3139 Lincoln, MA 01773 http://www.sleepycat.com |
![]() |
« Previous Thread
|
Next Thread »
| Thread Tools | |
| Display Modes | |
| |
All times are GMT -4. The time now is 12:19 PM.




Linear Mode