Unified key-value interface

Currently there are 5 (or more?) ways of handling key-value pairs in SWI-Prolog:

They all provide roughly the same set of functionality, but with different performance/storage costs. Their interfaces are inconsistent, e.g.

  • member(K-V, List), memberchk(K-V, List), ord_memberchk(K-V, List).
  • get_assoc(Key, Assoc, Value), gen_assoc(Key, Assoc, Value)
  • rb_lookup(Key, Value, Rbtree), rb_in(Key, Value, Rbtree)
  • get_dict(Key, Dict, Value)

I propose creating a new library(kv_pairs) that provides a unified set of lookup/update/delete predicates.

I have two reasons for doing this:

  1. It’s not obvious which data structure provides best performance, so it’s nice to be able to change algorithms by changing only the data structure creation calls.
  2. Leverage the existing test cases for all the algorithms.

If this new library seems like a good idea, I’ll write up a proposal for the unified predicates.

3 Likes

Indeed, I’d love interchanging assoc with rbtrees to compare.

Right now i have a wrapper for assoc, and my plan is to adapt rbtrees to the wrapper – although, the application (semantic) of the wrapper is somewhat different to both, so its a kind of third implementation.

If they all can be consistently put under one API structure + semantic that would be great.

One could then even envision a switch during runtime as well

If I recall correctly, YAP decided to rewrite library(assoc) as a thin interface to library(rbtrees), so the two disappear. Maybe we should do the same. I have been hesitating about this for a long time. Library(assoc) feels so “good old” and so many people contributed to it.

Before doing so we should make sure we gain performance or at least only use marginally. One of the issues is that get_assoc/3 uses a C helper when present, whereas rb_lookup/3 does not. Maybe the helper works there too. Maybe it needs a little tweaking.

That would remove one. Surely the runtime behavior of library(assoc) and library(rbtrees) is very similar, so we only have three. I personally never (almost?) use library(pairs) as a key-value set where you deal with individual keys. More as a set mapping on which to do things like ordering or grouping.

Still, a common interface is probably a good idea. It is not so easy to see how though. Do you consider a runtime switch, e.g.

:- module(kv_pairs, ...)

kv_get(Key, Dict, Value), is_dict(Dict) => get_dict(Key, Dict, Value).
kv_get(Key, RbTree, Value), RbTree = t(_,_) => rb_lookup(Key, Value, RbTree).
...

Or some other way? A module where you can decide which implementation it loads? A set of macros that select an implementation?

This is surely a good idea to get a set of tests you an share and running the same tests on the different implementations is a good way to avoid buggy tests.

I envisage something similar to the following, so that individual kv-trees can have different implementations:

kv_empty_assoc(kv_assoc(Assoc)) :- empty_assoc(Assoc).

kv_empty_rbtree(kv_rbtree(Rbtree)) :- rb_empty(Rbtree).

kv_empty_dict(kv_dict(Dict), Tag) :- Dict = Tag{}.

kv_empty_list(kv_list(List) :- List = [].

and similar for creating from a list.

Then lookup would be something like:

kv_get(kv_assoc(Assoc), Key, Value) => get_assoc(Key, Value, Assoc).
kv_get(kv_rbtree(Rbtree), Key, Value) => rb_lookup(Key, Value, Rbtree).
kv_get(kv_dict(Dict), Key, Value) => dict_get(Key, Dict, Value).
kv_get(kv_list(List), Key, Value) => memberchk(Key-Value, List).

and so on for the other predicates in the unified interface.

For now, I’d prefer to leave assoc and rbtrees as-is. I suspect that they have different performance with things like creating in key order or random order.

I doubt it. Both are balancing binary trees. As I understand it, the rbtree balancing is a bit cheaper/more relaxed than AVL trees used by library assoc. Most likely it is only get_dict/3 that relies on low level C scanning that makes a difference.

I thought I read somewhere that rbtrees is faster at updates/inserts and assoc is faster at lookups. But I might be mistaken. Also, assoc has unnecessary checks for ground, so that might slow it down. (Many years ago, I implemented a simple database using B±trees (lots of fun making sure no data was lost if the machine crashed at just the wrong moment) – as I recall, insertion in key order resulted in a lot of node rotations and it was faster to insert in random order … whether this also applies to AVL or R-B trees, I don’t know and my algorithms book isn’t handy.)

For debugging, dicts are nicer; it’s nice to be able to start do initial development with a dict and later switch to rbtrees (or assoc) by changing a single line of code. So, I’d like to keep the one level of indirection, to allow both dicts and rb-trees … and extending this to assoc is trivial.

Perhaps library(option) should also be added as a choice?

Also, library(assoc)'s get_assoc/3 can probably be sped up a bit by removing the call to must_be(assoc,Assoc), using SSU. However, this will change the behavior when a non-assoc is given to it - the documentation says that a type_error(assoc,...) is thrown but this would become an existence_error(matching_rule,...). I presume that such incompatible changes are acceptable?

I doubt it. Switching between library(options) and one of the others is unlikely.

I can live with that. In my view there are errors that are causes by the programmer and that merely require an error that helps the programmer. The matching rule error has to become familiar with users. That will take some time. As a means to see what is wrong it is typically ok and we could even provide rules that map specific failing rule matches to more appropriate errors.

On the other hand you have errors causes by the environment, such as open/3 telling you the file does not exist. These must throw well defined errors as the programmer may want to handle these.

Just noticed Splay Trees.

These seem in particular useful in a Prolog world, where backtracking can cause prior lookups to occur again.

Dan

“Unified key-value interface” sounds interesting also for me, because
the state model of zdd library of mine has both assoc and key-value hash table as main components. I noticed the memo/2 is very similar to table/1, but
according to W. Jan, shift/reduce is difficult to be integrated into the table/1.
As my zdd uses shift/reduce not in serious sense but for its ease to use, the zdd use memo explicitly. I want to revise in the future version to hide ‘memo’ in an elegant way. Fortunately, memo seems about twice time fast thanks to term_hash compared to table/1 for Fibonacci series things.

As I am satisfied with having two types key-value structures separately, hash and assoc, I have no idea about the unified key-value interface. But I am looking forward to seeing nice concept to be proposed.

In addition, the zdd state recently was revised so that it can be nested
for making garbage_collect easy to free unused objects, though I am not sure it works as I expected. However my ZDD has become drastically robust for the “rectangle grid graph problem” in a way more than I expected. Still I don’t know why deleting the key-hash table works so well. Rather I suspect I am still missing very fundamental property of the problem.

% ?- open_state(S), set_key(a, 1, S), get_key(a, V, S), 
%	memo(m-2, S), memo(m-M, S).
%@ S = ..,
%@ V = 1,
%@ M = 2.
% ?- open_state(S), memo(a-1, S), memo(a-V, S).
%@ S = ..,
%@ V = 1.
% ?- open_state(S), memo(a-V, S), memo(a-1, S).
%@ S = ..,
%@ V = 1.
% ?- open_state(S), memo(a-1, S), memo(a-V, S), set_memo(a-2, S), memo(a-U, S).
%@ S = ..,
%@ V = 1,
%@ U = 2.
% ?- open_state(S), memo(a-1, S), memochk(a-X, S).
%@ S = ..,
%@ X = 1.
% ?- open_state(S), memo(a-1, S), memoq(a-X, S).	% false
%@ false.
% ?- open_state(S), memo(a-X, S), memoq(a-1, S).	% false
%@ false.