Enable arm64 optimizations that exist for power/x86 #3393

Merged
nurse merged 3 commits into ruby:master from AGSaidi:arm64-unaligned on Aug 13, 2020

@AGSaidi
Contributor

@AGSaidi AGSaidi commented Aug 6, 2020

Enable a set of optimizations that exist already for power and x86 for aarch64/arm64 systems.

Passes make check after these changes.

On the string benchmarks, the unaligned access change improves performance
by 1.04x on average (min 0.96x, max 1.21x, median 1.01x).
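
For readers unfamiliar with this kind of optimization, here is a minimal sketch of how code can take a word-at-a-time path on targets that tolerate unaligned loads. It is not the code from this pull request: the macro `SKETCH_UNALIGNED_WORD_ACCESS` and the helpers `load_u64` and `bytes_equal` are hypothetical names made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical feature macro for this sketch (not Ruby's own name):
 * enable the word-at-a-time path on targets that allow unaligned loads. */
#if defined(__x86_64__) || defined(__i386__) || \
    defined(__powerpc64__) || defined(__aarch64__)
# define SKETCH_UNALIGNED_WORD_ACCESS 1
#else
# define SKETCH_UNALIGNED_WORD_ACCESS 0
#endif

/* Load 8 bytes from an arbitrarily aligned address. memcpy keeps this
 * well-defined C; compilers typically lower it to one load instruction
 * on targets where unaligned access is permitted. */
static inline uint64_t load_u64(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Return 1 if the first n bytes of a and b are equal, 0 otherwise. */
static int bytes_equal(const unsigned char *a, const unsigned char *b, size_t n)
{
#if SKETCH_UNALIGNED_WORD_ACCESS
    /* Fast path: compare 8 bytes per iteration regardless of alignment. */
    while (n >= sizeof(uint64_t)) {
        if (load_u64(a) != load_u64(b)) return 0;
        a += sizeof(uint64_t);
        b += sizeof(uint64_t);
        n -= sizeof(uint64_t);
    }
#endif
    /* Tail (or the whole input on strict-alignment targets): byte by byte. */
    for (; n > 0; n--) {
        if (*a++ != *b++) return 0;
    }
    return 1;
}
```

Calling `bytes_equal(x, y + 1, len)` with a deliberately misaligned `y + 1` exercises the fast path on the listed targets, while other targets silently use the byte loop.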

The gc optimization improves benchmark/gc/hash1 by 5%

The vm_exec changes make a massive difference on some benchmarks (e.g. 1.38x).

AGSaidi added 3 commits Aug 6, 2020
64-bit Arm platforms support unaligned accesses.

Running the string benchmarks this change improves performance
by an average of 1.04x, min .96x, max 1.21x, median 1.01x
Similar to x86 and powerpc optimizations.

|       |compare-ruby|built-ruby|
|:------|-----------:|---------:|
|hash1  |       0.225|     0.237|
|       |           -|     1.05x|
|hash2  |       0.110|     0.110|
|       |       1.00x|         -|

|                               |compare-ruby|built-ruby|
|:------------------------------|-----------:|---------:|
|vm_array                       |     26.501M|   27.959M|
|                               |           -|     1.06x|
|vm_attr_ivar                   |     21.606M|   31.429M|
|                               |           -|     1.45x|
|vm_attr_ivar_set               |     21.178M|   26.113M|
|                               |           -|     1.23x|
|vm_backtrace                   |       6.621|     6.668|
|                               |           -|     1.01x|
|vm_bigarray                    |     26.205M|   29.958M|
|                               |           -|     1.14x|
|vm_bighash                     |    504.155k|  479.306k|
|                               |       1.05x|         -|
|vm_block                       |     16.692M|   21.315M|
|                               |           -|     1.28x|
|block_handler_type_iseq        |       5.083|     7.004|
|                               |           -|     1.38x|
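
The ratio rows in these tables can be reproduced with simple arithmetic. The sketch below assumes, based on where the ratios appear in the tables, that higher numbers are better and that the ratio is printed under the faster build. The function names are hypothetical.

```c
/* Ratio of the faster build to the slower one.
 * Assumes both inputs are positive and that higher is better. */
static double speedup(double compare_ruby, double built_ruby)
{
    return built_ruby >= compare_ruby ? built_ruby / compare_ruby
                                      : compare_ruby / built_ruby;
}

/* Nonzero when the ratio belongs in the built-ruby column. */
static int built_is_faster(double compare_ruby, double built_ruby)
{
    return built_ruby > compare_ruby;
}
```

For example, `speedup(5.083, 7.004)` is about 1.378, which matches the 1.38x listed for block_handler_type_iseq, and `speedup(504.155, 479.306)` is about 1.052, which matches the 1.05x shown under compare-ruby for vm_bighash.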
@nurse
nurse approved these changes Aug 6, 2020
#elif defined(__GNUC__) && defined(__aarch64__)
DECL_SC_REG(const VALUE *, pc, "19");
DECL_SC_REG(rb_control_frame_t *, cfp, "20");
#define USE_MACHINE_REGS 1

Comment on lines +99 to +103
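
As background for the hunk above, here is a minimal sketch of the general technique of pinning an interpreter's program counter to a fixed machine register with the GCC `register ... __asm__("...")` syntax. It is not Ruby's `vm_exec.c`: the macro `SKETCH_SC_REG`, the opcodes, and `run` are hypothetical, and the sketch assumes that the `"19"` argument names register x19, which is callee-saved under the AArch64 procedure call standard. GCC documents local register variables as guaranteed only for extended asm operands, so in plain C the pin acts as a strong hint to the register allocator.

```c
#include <stddef.h>

/* Hypothetical macro for this sketch: on GCC-compatible aarch64 compilers,
 * request a specific x register; elsewhere fall back to a plain local. */
#if defined(__GNUC__) && defined(__aarch64__)
# define SKETCH_SC_REG(type, name, reg) register type name __asm__("x" reg)
#else
# define SKETCH_SC_REG(type, name, reg) type name
#endif

enum { OP_PUSH, OP_ADD, OP_HALT };

/* Tiny stack-machine loop; returns the top of stack at OP_HALT,
 * or -1 on a malformed program. */
static long run(const long *code)
{
    SKETCH_SC_REG(const long *, pc, "19"); /* program counter pinned to x19 */
    long stack[16];
    size_t sp = 0;

    pc = code;
    for (;;) {
        switch (*pc++) {
          case OP_PUSH:
            if (sp == 16) return -1;       /* stack overflow */
            stack[sp++] = *pc++;           /* operand follows the opcode */
            break;
          case OP_ADD:
            if (sp < 2) return -1;         /* stack underflow */
            sp--;
            stack[sp - 1] += stack[sp];
            break;
          case OP_HALT:
            return sp ? stack[sp - 1] : 0;
          default:
            return -1;                     /* unknown opcode */
        }
    }
}
```

Running the program `{OP_PUSH, 2, OP_PUSH, 40, OP_ADD, OP_HALT}` returns 42 on any target. Whether the pin helps is a measurement question, which is what the discussion below is about.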

@shyouhei

shyouhei Aug 6, 2020
Member

Does this really help? We know that recent compilers are smarter than they were when we wrote the sibling code for the other architectures. Read more: https://bugs.ruby-lang.org/issues/12225

cc @nurse

@AGSaidi

AGSaidi Aug 6, 2020
Author Contributor

@shyouhei the only changes between compare-ruby and built-ruby in the numbers in the commit message above are the two hunks in vm_exec.c. I'm happy to run other benchmarks if you'd like, but it appears to improve performance substantially. I double-checked my result by removing all diffs and comparing against the ruby I built prior to my patches; the results were within ±2%. I then reapplied these two hunks, re-ran the benchmarks, and observed the improvements shown here (up to 1.38x).

@nurse

nurse Aug 6, 2020
Member

As far as I remember, there is another example with clang showing that it is still effective.
And the commit message reports 1.2x, which seems worth introducing this change for.

@shyouhei

shyouhei Aug 7, 2020
Member

OK then. We need to investigate what is going on, but that can be handled separately from this pull request.

@AGSaidi

AGSaidi Aug 8, 2020
Author Contributor

@nurse anything else you'd like to see before you merge?

@nurse

nurse Aug 12, 2020
Member

I think this is OK to merge.
@shyouhei Do you have another topic?

@shyouhei

shyouhei Aug 13, 2020
Member

@nurse No, it is LGTM.

@nurse nurse merged commit 511b55b into ruby:master Aug 13, 2020
100 checks passed
@AGSaidi AGSaidi deleted the AGSaidi:arm64-unaligned branch Aug 19, 2020
3 participants