Library(yall) expression killing performance ... until you reload the code a second time

I wanted to test how “large dicts” (50’000 entries) behave, i.e. how fast lookup works on them.

The answer: Pretty fast (I still have to compare with library(assoc))!

Unless…

I hit a bit of a phenomenon.

(First things first, this is a “Intel® Xeon® CPU W3520 @ 2.67GHz” with 20-GiB of RAM, so there is space in the system)

The code (it’s quite big and I have dumped all the necessary predicates into this single file for easy running):

It is supposed to be run like this:

?- using_dict(50_000,500_000,builtin,dict,verbose).

which means

“build a dict with 50’000 entries (which will be the dict on which lookups will be performed) using the builtin dict_create/3, create a lookup sequence (just a list of valid keys) of length 500’000 using a dict (to look up keys randomly by index), and finally be verbose when doing the test, printing a line every 10’000 lookups”,

Then the following happens:

  • on the first run, lookup is quite slow, one reaches about 1100 dict lookup/s. So the test runs a few minutes.
  • you then consult the code again
  • after that the lookup is quite fast, one reaches about 800’000 lookup/s in the large dict. So the test runs in under a second.

I thought this had something to do with the memory growth (hence the printout of process information), or some code problem however, it apparently turns out that it has to do with the lambda expression in the test!

The above is observed if the code uses this implementation to run the 500’000 lookups on Dict by visiting each key in the list Sequence. Counters is just a list of monotonically increasing integers, used to provide a counting service for the predicate in maplist/4.

perform_lookup_and_store_timed_2(Dict,Quiet,Sequence,SequenceLength,Counters) :-
   maplist(
      ({Dict,Quiet,SequenceLength}/[Key,Value,C]
      >>
         (get_dict(Key,Dict,Value),
          emit(Quiet,SequenceLength,C)))
     ,Sequence,Result,Counters),
   length(Result,ResultLength),
   format("ResultLength is ~d~n",[ResultLength]).

If the code uses a replacement implementation that does not rely on a library(yall) expression one reaches top performance on first load (that code is commented out in the source for now):

perform_lookup_and_store_timed_2(Dict,Quiet,Sequence,SequenceLength,Counters) :-
   maplist(
      inside_maplist_b(Dict,Quiet,SequenceLength),
      Sequence,Result,Counters),
   length(Result,ResultLength),
   format("ResultLength is ~d~n",[ResultLength]).

inside_maplist_b(Dict,Quiet,SequenceLength,Key,Value,C) :-
   get_dict(Key,Dict,Value),
   emit(Quiet,SequenceLength,C).

I don’t know why though. It’s a mystery!

Took me some time to find it it too.

For my particular use case – many more inserts than lookups – library(rbtree) was considerably faster than dicts, and somewhat faster than library(assoc). Unfortunately, the interfaces aren’t consistent, so I wrote a thin layer on top of rbtree/assoc/dicts to allow easy benchmarking.

As to your performance question, it might be worth using profile/1 rather than time/1, to see where the extra time is being spent

1 Like

As you do not load library(yall) explicitly the first run ends up using the predicates from library(yall) rather than the compilation. If you look at the implemention you’ll see it performs term copying on the arguments. That gets rather costly with huge dicts. In general, if you use library yall, make sure you explicitly load it as well as the libraries for any meta predicate in which the lambdas are embedded. That way you should normally end up with the compiled yall which performs as good as what you would write by hand. If you want to be sure, use listing/1 to verify the code.

I think library(yall) is the only library that suffers this bad. maplist/2 and friends are also subject to compilation using library(apply_macros), but the impact is far less drastic. The implementation merely replaces a meta call using call/2-N with a direct call. Unless you call something pretty cheap inside the maplist the difference is barely measurable.

1 Like

I tried loading it with

:- use_module(library(yall)).

explicitly (with the instruction in the source file, or given on the toplevel).

But this doesn’t “fix” it.

After a first load, the compiler has done nothing with the yall expression yet:

?- [dictperftest].

?- listing(perform_lookup_and_store_timed_2/5).
perform_lookup_and_store_timed_2(Dict, Quiet, Sequence, SequenceLength, Counters) :-
    maplist({Dict, Quiet, SequenceLength}/[Key, Value, C]>>(get_dict(Key, Dict, Value), emit(Quiet, SequenceLength, C)),
            Sequence,
            Result,
            Counters),
    length(Result, ResultLength),
    format("ResultLength is ~d~n", [ResultLength]).

true.

Only after the second consultation of the same file do we see some compiler action:

?- [dictperftest].
true.

?- listing(perform_lookup_and_store_timed_2/5).
perform_lookup_and_store_timed_2(A, B, C, D, E) :-
    maplist('__aux_yall_2a8ae290a22f3a975127a35c048883e898a6b77a'(A,
                                                                  B,
                                                                  D),
            C,
            F,
            E),
    length(F, G),
    format("ResultLength is ~d~n", [G]).

true.

I must still be missing something?

P.S.

(Listing __aux_yall_2a8ae290a22f3a975127a35c048883e898a6b77a seems to bring up the whole yall library code, which is unexpected:

?- with_output_to(string(X),listing(__aux_yall_2a8ae290a22f3a975127a35c048883e898a6b77a/3)),string_length(X,L)
...
L = 787349.

770 KiB of text!

Solution: load both library(yall) and library(apply_macros)

I also have have this in the source:

:- use_module(library(apply_macros)).

Then one immediately obtains:

?- [dictperftest].
true.

?- listing(perform_lookup_and_store_timed_2/5).
perform_lookup_and_store_timed_2(Dict, Quiet, Sequence, SequenceLength, Counters) :-
    '__aux_maplist/4___aux_yall_2a8ae290a22f3a975127a35c048883e898a6b77a+3'(Sequence,
                                                                            Result,
                                                                            Counters,
                                                                            Dict,
                                                                            Quiet,
                                                                            SequenceLength),
    length(Result, ResultLength),
    format("ResultLength is ~d~n", [ResultLength]).

true.

with a fast execution.

And:

?- listing('__aux_maplist/4___aux_yall_2a8ae290a22f3a975127a35c048883e898a6b77a+3'/6).
'__aux_maplist/4___aux_yall_2a8ae290a22f3a975127a35c048883e898a6b77a+3'([], [], [], _, _, _).
'__aux_maplist/4___aux_yall_2a8ae290a22f3a975127a35c048883e898a6b77a+3'([A|B], [C|D], [E|F], G, H, I) :-
    '__aux_yall_2a8ae290a22f3a975127a35c048883e898a6b77a'(G,
                                                          H,
                                                          I,
                                                          A,
                                                          C,
                                                          E),
    '__aux_maplist/4___aux_yall_2a8ae290a22f3a975127a35c048883e898a6b77a+3'(B,
                                                                            D,
                                                                            F,
                                                                            G,
                                                                            H,
                                                                            I).

true.

Well, library(apply) instead of library(apply_macros) is enough. The compiler needs to know maplist/3 is a meta-predicate. Using library(apply_macros) also compiles the maplist/3 call, but that is far less important than compiling the yall expression.

1 Like

Thanks! There is no sign at the SWI-Prolog doc site that a Red-Black tree implementation exists.

One has to search for it in the search box :sweat: ?

Wel, it’s in here:

library index

but not

library section

I never encountered it.

Only libraries that appear in the main PDF docs are in the latter. I’m not sure I’m so happy with library(assoc), a good old lib and library(rbtrees). The latter comes from YAP. The former originates from good old DEC10 library and is around in most Prolog systems, though most have replaced the old non-balancing trees with an AVL tree. Red-Black trees seem to be slightly more efficient. We could take YAP’s route and make library(assoc) be a thin layer around library(rbtrees). I don’t know whether there are any disadvantages doing so.

2 Likes