Experience using macOS Monterey on the M1 (aka Apple silicon)

Apple M1-based machines are getting popular, and for good reason. I’ve got a new MacBook Air (M1, 2020) running Monterey, which should allow us to provide M1 binaries in due time. I’ve built SWI-Prolog (current Git version) using Xcode 13, with MacPorts providing the dependencies. See here. The build works fine, except for the Qt-based console. The console is based on Qt5, which only exists for Intel on the Mac. Qt6 supports both Intel and the M1, but SWI-Prolog doesn’t build with it.

Here are some comparisons with my development machine, an AMD 3950X (16 cores, 32 threads) running Ubuntu 21.10 (times in seconds):

Action              M1      AMD 3950X
Build time (s)      26      19
Benchmark avg (s)   0.648   0.678

I also tried building with GCC 11 from MacPorts (port install gcc11). The build does something wrong with GCC’s runtime library, but you get a working build using

CC=gcc-mp-11 CXX=g++-mp-11 cmake -DCMAKE_BUILD_TYPE=PGO -G Ninja ..
cp /opt/local/lib/libgcc/libgcc_s.1.1.dylib src
ninja

This produces some suspicious warnings and a version that doesn’t pass ctest. It does run the above benchmark in 0.617 seconds, compared to 0.648 seconds for Clang 13. That is a marginal gain compared to the almost twofold speedup GCC gives over Clang on Intel Macs. It is rumored that GCC 12 will provide better support for the M1.

Finally, I did some concurrency stress testing by running chat80 N times concurrently in M threads. Times are elapsed (wall-clock) times in seconds.

M (threads)  N (runs)  M1 (s)    AMD 3950X (s)
1            1,000     5.599     6.416
2            1,000     5.790     6.346
4            1,000     6.005     6.447
8            1,000     8.840     6.553
16           1,000     17.579    6.692
32           1,000     42.723    9.753
16           10,000    193.825   67.625

We see the M1 slowing down between 4 and 8 threads. AFAIK it has 4 fast (performance) and 4 slower (efficiency) cores, which may explain that. The last two rows hint at CPU throttling to avoid overheating. It is still impressive, and completely silent as it has no fan! For the desktop we see a slowdown between 16 and 32 threads due to hyperthreading. Otherwise it scales nearly perfectly.
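To make the setup concrete, here is a minimal C sketch of this kind of stress test. It is an illustration under stated assumptions, not the code actually used (the original runs chat80 inside SWI-Prolog): run_job() is a hypothetical CPU-bound stand-in for one chat80 run, and, judging from the table (elapsed time stays roughly flat as M grows with N fixed), N is taken to be the number of runs per thread. Under perfect scaling the elapsed time would then stay constant as M increases.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

enum { MAX_THREADS = 64 };

/* Hypothetical stand-in for one chat80 run: a fixed chunk of CPU work. */
static unsigned long run_job(void)
{
    volatile unsigned long acc = 0;
    for (unsigned long i = 0; i < 100000UL; i++)
        acc += i * i;
    return acc;
}

typedef struct {
    long runs;        /* N: how many times this thread runs the job (assumed) */
    long completed;   /* written only by the owning thread */
} worker_arg;

static void *worker(void *p)
{
    worker_arg *w = p;
    for (long i = 0; i < w->runs; i++) {
        run_job();
        w->completed++;
    }
    return NULL;
}

/* Start m threads that each run the job n times. Returns the elapsed
 * wall-clock time in seconds, or -1.0 if not all threads could be started.
 * The total number of completed runs is stored in *total. */
static double stress(int m, long n, long *total)
{
    pthread_t tid[MAX_THREADS];
    worker_arg args[MAX_THREADS];
    struct timespec t0, t1;
    int started = 0;

    assert(m > 0 && m <= MAX_THREADS);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (; started < m; started++) {
        args[started].runs = n;
        args[started].completed = 0;
        if (pthread_create(&tid[started], NULL, worker, &args[started]) != 0)
            break;
    }
    *total = 0;
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);   /* join makes 'completed' safe to read */
        *total += args[i].completed;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (started < m)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Dividing the single-thread row by another row gives a rough per-thread efficiency: on the M1, 5.599 / 8.840 ≈ 0.63 at 8 threads, versus 6.416 / 6.692 ≈ 0.96 at 16 threads on the AMD 3950X.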

Copy-pasting the three lines onto the command line, with only minor changes, worked, though I understand little of what they do. Thanks. Now I am looking forward to the coming Homebrew release of gcc-12.

Note that while all tests pass with the Clang-compiled version, a number of them fail with the GCC build. Don’t trust that version. The issues could be caused by illegal C code or by a bug in the current version of GCC for the M1. I haven’t looked into that and I’m not planning to do so shortly.

I could not reproduce compiling SWI-Prolog for the M1 Mac with gcc-11.
Now I think I must have been under an illusion.

This time I see messages about a missing library, seemingly related to natural language processing, and I am also required to use sudo for installing. Anyway, I would like to take this occasion to learn how to build with gcc-11 instead of Xcode’s toolchain, as far as I can.

Finally, I have managed to recover from the illusion.

% swipl --version
SWI-Prolog version 8.5.3 for arm64-darwin

One of the fixes was:
cp /opt/homebrew/opt/gcc/lib/gcc/11/libgcc_s.1.1.dylib src

I found this by trial and error, using the information in the gcc-11 -v output below, without any real Prolog expertise.

% gcc-11 -v
Using built-in specs.
COLLECT_GCC=gcc-11
COLLECT_LTO_WRAPPER=/opt/homebrew/Cellar/gcc/11.2.0_3/bin/../libexec/gcc/aarch64-apple-darwin21/11/lto-wrapper
Target: aarch64-apple-darwin21
Configured with: ../configure --prefix=/opt/homebrew/opt/gcc --libdir=/opt/homebrew/opt/gcc/lib/gcc/11 --disable-nls --enable-checking=release --with-gcc-major-version-only --enable-languages=c,c++,objc,obj-c++,fortran --program-suffix=-11 --with-gmp=/opt/homebrew/opt/gmp --with-mpfr=/opt/homebrew/opt/mpfr --with-mpc=/opt/homebrew/opt/libmpc --with-isl=/opt/homebrew/opt/isl --with-zstd=/opt/homebrew/opt/zstd --with-pkgversion='Homebrew GCC 11.2.0_3' --with-bugurl=https://github.com/Homebrew/homebrew-core/issues --build=aarch64-apple-darwin21 --with-system-zlib --with-native-system-header-dir=/usr/include --with-sysroot=/Library/Developer/CommandLineTools/SDKs/MacOSX12.sdk
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.2.0 (Homebrew GCC 11.2.0_3)

EDIT:
I have compared the timings of swipl built with gcc-11 and with Clang on my M1 MacBook Pro. Result: the Clang build is a little faster than the gcc-11 one. I doubt this information is useful to an expert like you, but just in case. Of course, I have no explanation for this result.

Clang

% ?- time(test(rect(7, 7), C)).
%@ % 412,054,447 inferences, 22.903 CPU in 22.954 seconds (100% CPU, 17991570 Lips)
%@ C = 789360053252.

% ?- time(test(rect(7, 7), C)).
%@ % 412,054,578 inferences, 24.623 CPU in 24.677 seconds (100% CPU, 16734847 Lips)
%@ C = 789360053252.

% ?- time(test(rect(7, 7), C)).
%@ % 412,054,447 inferences, 23.614 CPU in 23.670 seconds (100% CPU, 17449909 Lips)
%@ C = 789360053252.

gcc-11

% ?- time(test(rect(7, 7), C)).
%@ % 412,054,578 inferences, 26.118 CPU in 26.246 seconds (100% CPU, 15776842 Lips)
%@ C = 789360053252.

% ?- time(test(rect(7, 7), C)).
%@ % 412,054,447 inferences, 24.308 CPU in 24.428 seconds (100% CPU, 16951342 Lips)
%@ C = 789360053252.

% ?- time(test(rect(7, 7), C)).
%@ % 412,054,578 inferences, 26.043 CPU in 26.170 seconds (100% CPU, 15822058 Lips)
%@ C = 789360053252.
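To put the six runs above side by side, here is a small C sketch that computes the mean and the spread (max minus min) of each build's CPU times. The numbers are copied from the output above; mean() and spread() are illustrative helpers, not part of SWI-Prolog.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n timings (seconds). */
static double mean(const double *t, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += t[i];
    return sum / (double)n;
}

/* Spread (max - min) of n timings: a crude indication of run-to-run noise. */
static double spread(const double *t, size_t n)
{
    assert(n > 0);
    double lo = t[0], hi = t[0];
    for (size_t i = 1; i < n; i++) {
        if (t[i] < lo) lo = t[i];
        if (t[i] > hi) hi = t[i];
    }
    return hi - lo;
}

/* CPU seconds of the three runs per build, copied from the output above. */
static const double clang_cpu[] = { 22.903, 24.623, 23.614 };
static const double gcc11_cpu[] = { 26.118, 24.308, 26.043 };
```

With these figures the gcc-11 build averages about 25.49 s against about 23.71 s for Clang, i.e. roughly 7.5% more CPU time, while the runs of each build individually vary by about 1.7 to 1.8 s. With only three runs each, the difference is suggestive rather than conclusive.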

I noticed that cmake/BuildType.cmake only uses -O2 … wouldn’t -O3 (and perhaps some other optimization settings) be better?

And if it’s clang, are any optimization flags set?

BTW, I tried -Wextra -Wuninitialized -Wmaybe-uninitialized and found a few potential bugs (I’ve only fixed the code I’m currently working on). These flags don’t work as well in “debug” mode because they require some level of optimization to do flow analysis.

I don’t know. For a long time anything higher than -O2 miscompiled the code due to strict aliasing rules. I think that is OK now. For GCC, -O2 and -O3 made no significant difference when I last tested it, while -O3 compiles more slowly and produces larger binaries. That was some years ago, though. The effect of these options depends on the compiler version and CPU. It might be wise to reinvestigate this for at least some popular combinations.
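For readers unfamiliar with the strict-aliasing issue: C forbids accessing an object through a pointer of an incompatible type, and optimizing compilers exploit this (GCC documents -fstrict-aliasing as enabled at -O2, -O3 and -Os). The sketch below shows the classic violation, in a comment, and the portable memcpy-based replacement. float_bits() is a hypothetical example, not code from SWI-Prolog, and it assumes a 32-bit float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits, as on all platforms discussed here. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* NOT like this: reading a float through a uint32_t pointer breaks the
 * strict aliasing rule, so an optimizer may reorder or drop the access:
 *
 *     uint32_t bits = *(uint32_t *)&f;
 */

/* Aliasing-safe: copy the object representation instead. Compilers
 * typically turn this memcpy into a plain register move. */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

For example, float_bits(1.0f) yields 0x3f800000. Where existing code relies on pointer-cast punning that cannot easily be changed, both GCC and Clang accept -fno-strict-aliasing to disable the optimization.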

CMake’s default options. Checking build.ninja, this seems to be -O2.

I’m always in favor of cleaner code, with one exception: I do not want to initialize a variable with a dummy value at its declaration just to avoid possibly-uninitialized warnings. If there are multiple paths along which the variable is assigned and used, it is much better when the compiler figures out that all paths are covered. A dummy initialization avoids the warning, but prevents this useful analysis. New compilers come with new warnings. Sometimes they really find a bug. Silencing gcc-11 required quite a few nasty changes, while no bug was fixed :frowning: There have been more productive times …
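A minimal C sketch of the style argued for here; hex_value() is a hypothetical example, not SWI-Prolog code. The variable is deliberately left uninitialized and every path assigns it before use. If a later edit dropped the final else, GCC can report that v may be used uninitialized (-Wmaybe-uninitialized, which, as noted above, works best with optimization enabled). A dummy initializer such as int v = 0; would silence that warning and turn the bug into a silently wrong result.

```c
#include <assert.h>

/* Hypothetical helper: numeric value of a hex digit, or -1 if c is not one. */
static int hex_value(int c)
{
    int v;                        /* deliberately no dummy initializer */

    if (c >= '0' && c <= '9')
        v = c - '0';
    else if (c >= 'a' && c <= 'f')
        v = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F')
        v = c - 'A' + 10;
    else
        v = -1;                   /* every path assigns v before it is read */

    return v;
}
```

Compiling with something like gcc -O2 -Wall -Wextra -Wmaybe-uninitialized lets the compiler verify that all paths are covered; note that -Wmaybe-uninitialized is a GCC option, while Clang offers -Wuninitialized and -Wsometimes-uninitialized.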