Ha, yes - I’ve actually got an mmap implementation that spreads the stacks all over the address space, so they never have to shift at all, but like you said that only works quite so well in a 64-bit (well, 48-bit) address space.
Anyway, the reason I’m looking into it isn’t that the stack shifts themselves are eating much CPU time. It’s that a stack shift is always possible, so the VM registers have to be saved and restored around every call out to other libswipl code, and that causes a lot of register/memory churn that is unnecessary precisely because stack shifts are so rare.
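To make the churn concrete, here is a minimal toy sketch of the pattern. It is not the actual libswipl code: `toy_engine`, `toy_shift`, `toy_callout`, and `toy_vm_step` are all hypothetical names, and `realloc` stands in for a real stack shift.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy engine: one growable stack plus the "published"
 * in-memory copy of a single VM register. */
typedef struct {
    long  *base;   /* start of the stack; may change on a shift */
    size_t size;   /* capacity in cells */
    long  *frame;  /* in-memory copy of the VM's frame pointer */
} toy_engine;

/* Stand-in for a stack shift: realloc may move the stack, so the
 * published pointer is relocated by its offset from the base. */
static int toy_shift(toy_engine *e, size_t new_size) {
    ptrdiff_t off = e->frame - e->base;   /* computed before the move */
    long *nb = realloc(e->base, new_size * sizeof *nb);
    if (nb == NULL)
        return 0;
    e->base  = nb;
    e->size  = new_size;
    e->frame = nb + off;
    return 1;
}

/* Stand-in for "other libswipl code" that might trigger a shift. */
static int toy_callout(toy_engine *e) {
    return toy_shift(e, e->size * 2);
}

/* One VM step: the register lives in a local, so it has to be written
 * back before the call and reloaded afterwards, every single time. */
long toy_vm_step(toy_engine *e) {
    long *fr = e->frame;   /* cached copy, ideally in a hardware register */
    *fr = 42;
    e->frame = fr;         /* save: publish the register to memory */
    if (!toy_callout(e))
        return -1;
    fr = e->frame;         /* restore: the stack may have moved */
    return *fr;
}

long toy_demo(void) {
    toy_engine e;
    long r;
    e.size = 16;
    e.base = malloc(e.size * sizeof *e.base);
    if (e.base == NULL)
        return -1;
    e.frame = e.base + 3;
    r = toy_vm_step(&e);
    free(e.base);
    return r;
}
```

The save and the restore in `toy_vm_step` are paid on every call-out, even though the shift that makes them necessary almost never happens.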
One way around this, on mmap’able systems at least, would be to go a Python-esque “easier to ask forgiveness than permission” route: simply assume the values won’t change, have the GC unmap the old regions whenever there’s a stack shift, and handle the resulting segfault gracefully by fetching the new values and resuming execution. That requires signal handling, though, which is never particularly fun, and it doesn’t work on systems where we don’t have some level of control over the hardware memory map.
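Roughly, the segfault route could look like the sketch below. This is a toy under several assumptions, not a proposal for the real code: it assumes Linux-like behaviour (a fault on an inaccessible page is delivered as `SIGSEGV`), it uses `mprotect(PROT_NONE)` instead of a real `munmap` so the old address range stays reserved and the stale pointer is guaranteed to fault, and it recovers with `sigsetjmp`/`siglongjmp` rather than by patching the faulting context. All names (`current_stack`, `shift_stack`, `segv_demo`) are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

static sigjmp_buf recover_point;
static long *volatile current_stack;  /* authoritative pointer, updated by the "GC" */

/* On a fault, jump back to the recovery point instead of returning. */
static void on_segv(int sig) {
    (void)sig;
    siglongjmp(recover_point, 1);
}

static long *map_stack(size_t bytes) {
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Simulated stack shift: copy to a new mapping, then make the old one
 * inaccessible (stand-in for unmapping it). */
static int shift_stack(size_t old_bytes, size_t new_bytes) {
    long *old = current_stack;
    long *fresh = map_stack(new_bytes);
    if (fresh == NULL)
        return 0;
    memcpy(fresh, old, old_bytes);
    current_stack = fresh;
    return mprotect(old, old_bytes, PROT_NONE) == 0;
}

long segv_demo(void) {
    const size_t bytes = 4096;
    struct sigaction sa, old_sa;
    long *volatile cached;  /* volatile: changed between sigsetjmp and siglongjmp */
    long result;

    current_stack = map_stack(bytes);
    if (current_stack == NULL)
        return -1;
    current_stack[0] = 7;
    cached = current_stack;           /* the VM's "register" copy */

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old_sa) != 0)
        return -1;

    if (!shift_stack(bytes, 2 * bytes))   /* cached is now stale */
        return -1;

    /* Save the signal mask too (second argument 1), so SIGSEGV is
     * unblocked again after jumping out of the handler. */
    if (sigsetjmp(recover_point, 1) != 0)
        cached = current_stack;       /* arrived via the handler: reload */
    result = cached[0];               /* first attempt faults, retry succeeds */

    sigaction(SIGSEGV, &old_sa, NULL);
    return result;                    /* mappings are left in place for brevity */
}
```

Even in this toy, the handler has to be installed process-wide and the recovery point has to be valid whenever a fault can occur, which is a fair hint at why this gets unpleasant in a real multi-threaded VM.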
Alternatively, we could go all-in on the register-pinning and pin all three of `PC`, `ARGP`, and `FR` to hardware registers, at which point GC running on that thread could alter the pointer values with the VM being none the wiser. Of course, that’s three hardware registers (four, if you include `LD`) that are now unusable for any other purpose in the whole of the libswipl codebase. And, for 32-bit x86 systems especially, I think that may in fact be all the available registers. (And, if I’m remembering my computer science courses correctly, not allowing the computer to use any registers could in some cases be a Bad Thing.)
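For reference, the pinning itself could be sketched with GCC’s global register variable extension. Everything here is an assumption for illustration: `vm_fr` is a hypothetical stand-in for `FR`, `r15` is an arbitrary choice of x86-64 register, and the fallback branch just uses an ordinary global so the sketch still builds on other compilers and architectures. In a real build every translation unit would need to see the declaration (or be compiled with something like `-ffixed-r15`).

```c
#include <stddef.h>

#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__)
/* GCC extension: reserve r15 for this variable in this translation unit. */
register long *vm_fr __asm__("r15");
#define VM_FR_PINNED 1
#else
/* Fallback: an ordinary global, so the sketch still compiles elsewhere. */
static long *vm_fr;
#define VM_FR_PINNED 0
#endif

static long old_stack[4] = {10, 20, 30, 40};
static long new_stack[8];

/* Stand-in for a stack shift run by GC on the same thread: it rewrites
 * the pinned pointer directly, with no save/restore in the caller. */
static void gc_shift(void) {
    ptrdiff_t off = vm_fr - old_stack;
    size_t i;
    for (i = 0; i < 4; i++)
        new_stack[i] = old_stack[i];
    vm_fr = new_stack + off;
}

long pinned_demo(void) {
    long *saved = vm_fr;   /* r15 is callee-saved: keep our caller's value */
    long value;

    vm_fr = old_stack + 2;
    gc_shift();            /* the "VM" never reloads anything */
    value = (vm_fr == new_stack + 2) ? *vm_fr : -1;

    vm_fr = saved;         /* put the register back before returning */
    return value;
}
```

The manual save and restore of `r15` in `pinned_demo` is only there because this toy lives inside a program that does not otherwise know the register is reserved; it also illustrates the cost, since that register is lost to the compiler everywhere the declaration is visible.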
Anyway, yeah, that’s why I’m trying to figure out what else can be done with the Prolog stacks.