Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge MJIT infrastructure with conservative JIT compiler #1782

Closed
wants to merge 39 commits into from

Conversation

@k0kubun
Copy link
Member

@k0kubun k0kubun commented Dec 25, 2017

https://bugs.ruby-lang.org/issues/14235

Background

In Feature#12589, Vladimir Makarov proposed to improve VM performance by replacing VM instructions
to RTL and introduce method JIT compiler based on those instructions.

While his approach for JIT (write C code to local file system, let C compiler executable
to compile it to shared object file and load it dynamically) was great and proven to work,
replacing all VM instructions may change the behavior of all Ruby programs and we can't turn off
such changes once it's released, even if we find a problem. So it's a little risky unlike optional JIT enablement.

Then I developed a JIT compiler called YARV-MJIT, which does not require any VM instruction changes.
After it, I heard Vladimir started to work on another approach to compile from current YARV instructions and
use RTL as IR for JIT compilation, but it's not published yet as far as I know.

Problems

  • We're developing the same JIT infrastructure independently, which can be shared for both implementations
    • it's definitely a waste of time, unlike seeking different optimization approaches
  • If we continue to develop JIT in a big feature branch,
    • affected places will be big too, and thus it'll be a dangerous release
    • all of us will continue to waste our time by day-to-day conflict resolution against trunk
    • many valuable commit logs will be lost when we maintain the branch for rebase or squash commits on merge

Solution

  • Proposal: Merge MJIT infrastructure from Vladimir's patch with a conservative JIT compiler in early 2.6 development.
    • MJIT infrastructure means: JIT worker thread, profiler, gcc/clang compiler support, loading function from shared object file, some hooks to ensure JIT does not cause SEGV, etc...

What's the "conservative JIT compiler"?

  • Based on my YARV-MJIT, but this drops some problematic optimizations and is slower
  • Pass make test, make test-all, make test-spec with and without JIT https://travis-ci.org/ruby/ruby/builds/321589821
  • Unlike MJIT on RTL, we can play optcarrot (not just for benchmark, but on GUI) and run Rails application stably (not tested on production yet though)

Notes

  • As YARV-MJIT implementation improved MJIT infrastructure too, pthread was already ported to Windows native threads and it can be compiled with Visual Studio.
    • That's exactly why we should develop this in cooperation
  • Visual Sudio is not supported as C compiler for JIT compilation. I did some experiments and had some ideas to support cl.exe, but I didn't want to add extra complexity to initial merge.
    • But it's perfectly working on MinGW and this will be available on Windows if a user uses RubyInstaller2.

Optcarrot

Benchmarked with: Intel 4.0GHz i7-4790K with 16GB memory under x86-64 Ubuntu 8 Cores

Note: result was changed when it's commited. See ed935aa.

2.0.0 2.5.0 JIT off JIT on
optcarrot fps 35.05 46.75 46.05 63.06
vs 2.0.0 1.00x 1.33x 1.31x 1.80x
  • 2.0.0: Ruby 2.0.0-p0
  • 2.5.0: Ruby 2.5.0 (r61468)
  • JIT off: Patched Ruby (based on r61475), JIT disabled
  • JIT on: Patched Ruby (based on r61475), JIT enabled w/ gcc 5.4.0

Disclaimer: This JIT compiler performs better with gcc compared to clang for now, so it may be slow on macOS (clang).

Micro benchmarks

I used Vladimir's benchmark set which I modified for my convenience https://github.com/benchmark-driver/mjit-benchmarks.

Note: result was changed when it's commited. See ed935aa.

2.0.0-p0 2.5.0 JIT off JIT on
aread 1.00 1.01 0.97 2.33
aref 1.00 0.96 0.96 3.01
aset 1.00 1.39 1.37 3.70
awrite 1.00 1.07 1.03 2.54
call 1.00 1.25 1.22 3.39
const 1.00 0.96 0.96 4.00
const2 1.00 0.96 0.96 3.97
fannk 1.00 0.98 1.02 1.00
fib 1.00 1.16 1.24 3.19
ivread 1.00 0.94 0.93 4.96
ivwrite 1.00 1.09 1.09 3.32
mandelbrot 1.00 0.98 0.98 1.27
meteor 1.00 3.02 2.85 3.16
nbody 1.00 1.02 0.99 1.47
nest-ntimes 1.00 1.05 1.01 1.31
nest-while 1.00 0.96 0.96 1.63
norm 1.00 1.06 1.07 1.26
nsvb 1.00 0.98 0.88 0.88
red-black 1.00 1.03 1.02 1.54
sieve 1.00 1.22 1.22 1.75
trees 1.00 1.07 1.08 1.32
while 1.00 0.96 0.96 5.13

Interface details

If the proposal is accepted, I'm going to add following CLI options:

  -j, --jit       use MJIT with default options
  -j:option, --jit:option
                  use MJIT with an option

MJIT options:
  c, cc           C compiler to generate native code (gcc, clang)
  s, save-temps   Save MJIT temporary files in $TMP or /tmp
  w, warnings     Enable printing MJIT warnings
  d, debug        Enable MJIT debugging (very slow)
  v=num, verbose=num
                  Print MJIT logs of level num or less to stderr
  n=num, num-cache=num
                  Maximum number of JIT codes in a cache

Note that -j:l/--jit:llvm are changed to -j:c/--jit:cc so that we can support cl.exe (Visual Studio) in the future.

Also, for testing, I would like to have following module and method.

MJIT.enabled? #=> true / false

See the commit log for details.

mjit.h Outdated

extern int mjit_compile(FILE *f, const struct rb_iseq_constant_body *body, const char *funcname);
extern void mjit_init(struct mjit_options *opts);
extern void mjit_finish();

This comment has been minimized.

@nobu

nobu Dec 26, 2017
Member

Some functions are declared/defined in old K&R style.

This comment has been minimized.

@k0kubun

k0kubun Dec 26, 2017
Author Member

Unified the style in e22a8e2


if (pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL) != 0) {
fprintf(stderr, "Cannot enable cancelation in MJIT worker\n");
}

This comment has been minimized.

@nurse

nurse Dec 27, 2017
Member

It's nice to set thread name:

#ifdef SET_CURRENT_THREAD_NAME
    SET_CURRENT_THREAD_NAME("ruby-mjitworker");
#endif

Note that the max size of the name is 16 byte including NUL byte of the string.
A natural name of the thread "ruby-mjit-worker" is 16 byte; it exceeds 1 byte.

This comment has been minimized.

@k0kubun

k0kubun Dec 28, 2017
Author Member

Thank you to catch, fixed ddb2ff7

static unsigned long __stdcall
mjit_worker(void *arg)
{
void (*worker_func)(void) = arg;

This comment has been minimized.

@nurse

nurse Dec 27, 2017
Member

Add following:

rb_w32_set_thread_description(GetCurrentThread(), L"ruby-mjitworker");

SetThreadDescription doesn't have the max size limit but I think the name should equal to Unix.

This comment has been minimized.

@k0kubun

k0kubun Dec 28, 2017
Author Member

fixed at ddb2ff7

@k0kubun k0kubun force-pushed the k0kubun:merge-mjit branch from 674c4a6 to 21677b8 Dec 29, 2017
ruby.c Outdated
@@ -272,6 +272,7 @@ usage(const char *name, int help)
M("s", ", save-temps", "Save MJIT temporary files in $TMP or /tmp"),
M("w", ", warnings", "Enable printing MJIT warnings"),
M("d", ", debug", "Enable MJIT debugging (very slow)"),
M("a=num", ", aot=num", "Ahead of Time Compileation after num calls"),

This comment has been minimized.

@rurban

rurban Jan 3, 2018

typo: Compileation

This comment has been minimized.

@k0kubun

k0kubun Jan 5, 2018
Author Member

Thank you a87c8ef

@radarek
Copy link

@radarek radarek commented Jan 4, 2018

Where I should report issues related to mjit? I did some performance tests and I would like to share my results and observations with you.

@Fudoshiki
Copy link

@Fudoshiki Fudoshiki commented Jan 5, 2018

@k0kubun
Copy link
Member Author

@k0kubun k0kubun commented Jan 5, 2018

Thank you for your comments. Final results should be written to https://bugs.ruby-lang.org/issues/14235 for easy tracking, but at this moment here or https://github.com/k0kubun/yarv-mjit/issues is better.

That's because the result may be changed before merge. To remove all bugs, I'm going to add some guards and the benchmark score will be a little worse after that.

@k0kubun k0kubun force-pushed the k0kubun:merge-mjit branch 2 times, most recently from 813a526 to db6e3e4 Jan 6, 2018
@k0kubun
Copy link
Member Author

@k0kubun k0kubun commented Jan 7, 2018

I added some guards to let all tests pass with AOT compilation, and removed some optimizations to omit SEGV by race condition (which should be fixed and added later).

Then unfortunately performance is made worse as follows 😞, but now it's probably ready to merge.

Optcarrot

JIT (before fix) JIT (after fix) 2.6.0 2.0.0
optcarrot fps 60.96 52.72 45.50 34.74
vs 2.0.0 1.75x 1.51x 1.31x 1.00x

Micro benchmarks (revised)

Used https://github.com/benchmark-driver/mjit-benchmarks

2.0.0-p0 2.5.0 JIT off JIT on
aread 1.00 1.07 1.04 2.25
aref 1.00 0.96 0.96 2.44
aset 1.00 1.37 1.34 3.13
awrite 1.00 1.05 1.05 2.22
call 1.00 1.25 1.22 2.41
const 1.00 0.96 0.98 2.31
const2 1.00 0.97 0.98 2.31
fannk 1.00 0.98 0.99 0.98
fib 1.00 1.27 1.23 2.01
ivread 1.00 0.94 0.94 3.14
ivwrite 1.00 1.09 1.08 2.79
mandelbrot 1.00 0.93 0.95 1.18
meteor 1.00 2.97 2.95 3.19
nbody 1.00 1.03 1.00 1.43
nest-ntimes 1.00 1.05 1.08 1.26
nest-while 1.00 0.95 0.95 1.55
norm 1.00 1.06 1.02 1.18
nsvb 1.00 1.07 1.07 1.07
red-black 1.00 1.03 0.97 1.01
sieve 1.00 1.22 1.25 1.59
trees 1.00 1.17 1.18 1.29
while 1.00 0.96 0.96 3.23

Discourse v1.8.7

Without JIT

Your Results: (note for timings- percentile is first, duration is second in millisecs)
---
categories_admin:
  50: 12
  75: 13
  90: 15
  99: 23
home_admin:
  50: 12
  75: 12
  90: 12
  99: 19
topic_admin:
  50: 12
  75: 12
  90: 13
  99: 19
categories:
  50: 16
  75: 17
  90: 18
  99: 25
home:
  50: 3
  75: 3
  90: 4
  99: 9
topic:
  50: 10
  75: 10
  90: 11
  99: 19
timings:
  load_rails: 1229
ruby-version: 2.6.0-p-1
rss_kb: 201528
pss_kb: 133763
kernelversion: 4.4.0
virtual: physical
physicalprocessorcount: 1
architecture: amd64
operatingsystem: Ubuntu
processor0: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
memorysize: 15.37 GB
rss_kb_19439: 297812
pss_kb_19439: 228245
rss_kb_19517: 297072
pss_kb_19517: 227262
rss_kb_19622: 297448
pss_kb_19622: 227615

With JIT (-j:n=10000)

With JIT, somehow things become worse 😢. I haven't researched the reason yet, but probably speculation of JIT is not met and JIT execution is frequently canceled... We should improve this.

Your Results: (note for timings- percentile is first, duration is second in millisecs)
---
categories_admin:
  50: 13
  75: 15
  90: 18
  99: 27
home_admin:
  50: 13
  75: 14
  90: 16
  99: 22
topic_admin:
  50: 13
  75: 14
  90: 16
  99: 21
categories:
  50: 18
  75: 20
  90: 24
  99: 28
home:
  50: 4
  75: 4
  90: 5
  99: 9
topic:
  50: 11
  75: 12
  90: 13
  99: 19
timings:
  load_rails: 1215
ruby-version: 2.6.0-p-1
rss_kb: 218600
pss_kb: 151622
kernelversion: 4.4.0
virtual: physical
physicalprocessorcount: 1
architecture: amd64
operatingsystem: Ubuntu
processor0: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
memorysize: 15.37 GB
rss_kb_2030: 303228
pss_kb_2030: 233047
rss_kb_2397: 305372
pss_kb_2397: 235033
rss_kb_2560: 303944
pss_kb_2560: 233322
@k0kubun k0kubun mentioned this pull request Jan 7, 2018
@k0kubun
Copy link
Member Author

@k0kubun k0kubun commented Jan 9, 2018

Current status

I got some ideas to improve above performance to original proposal's one.

Until Jan 24th (next Ruby developers meeting at Tokyo), I'll work on it, try to figure out why it becomes worse with Rails, and create a script that generates compile_insn from insns.def for maintainability.

If things go well, hepefully this (or part of this) will be merged on that day.

(edit) I'm building JIT compiler generator to decrease maintenance cost before merging.

@k0kubun k0kubun force-pushed the k0kubun:merge-mjit branch 3 times, most recently from da27ad5 to 276c0b0 Jan 9, 2018
@matzbot matzbot force-pushed the ruby:trunk branch 2 times, most recently from 2677ddd to ce7ad3a Jan 18, 2018
@k0kubun k0kubun force-pushed the k0kubun:merge-mjit branch 7 times, most recently from 8807822 to 9afa513 Jan 20, 2018
k0kubun added 5 commits Dec 25, 2017
which is created by transforming a preprocessed vm.c.

Makefile.in: generate MJIT header for UNIX environments

win32/Makefile.sub: generate MJIT header for Windows environments

tool/transform_mjit_header.rb: New. This script was originally written by
Vladimir N. Makarov <vmakarov@redhat.com>. Then I refactored a little,
fixed some bugs and ported it to work on Windows.

Also, as original minimize_mjit_header.rb takes too long time to run,
this is modified to skip minimization step because having *static*
unused definitions does not waste compilation time on -O2 as compiler
can skip to compile unused static functions.

This header installation does NOT include a header to automatically
export symbols used by MJIT. That's because original MJIT code was
failing to export symbols in the import header. But I would like to have
the functionality for maintainability in the future.
mjit.c is authored by Vladimir Makarov <vmakarov@redhat.com>.
After he invented great method JIT infrastructure for MRI as MJIT,
Lars Kanis <lars@greiz-reinsdorf.de> sent the patch to support MinGW
in MJIT. In addition to merging it, I ported pthread to Windows native
threads. Now this MJIT infrastructure can be compiled on Visual Studio.

This commit simplifies mjit.c to decrease code at initial merge. For
example, this commit does not provide AOT compiler and multiple JIT
threads support. We can resurrect them later if we really want them,
but I wanted to minimize diff to make it easier to review this patch.

mjit.h: New. It has `mjit_exec` interface similar to `vm_exec`, which is
for triggering MJIT. This drops interface for AOT compared to the original
MJIT.

Makefile.in: define macros to let MJIT know the path of MJIT header.
Probably we can refactor this to reduce the number of macros (TODO).
win32/Makefile.sub: ditto.

common.mk: compile mjit.o and mjit_compile.o. Unlike original MJIT, this
commit separates MJIT infrastructure and JIT compiler code as
independent object files. As initial patch is going to have not-ultra-fast
JIT compiler, it's likely to replace JIT compiler, e.g. original MJIT's
compiler or some future JIT impelementations which are not public now.
mjit_compile.c: added empty compiler so that you can reuse this commit
to build your own JIT compiler.

inits.c: define MJIT module. This is added because `MJIT.enabled?` was
necessary for testing.
test/lib/zombie_hunter.rb: skip if `MJIT.enabled?`. Obviously this
wouldn't work with current code when JIT is enabled.
test/ruby/test_io.rb: skip this too. This would make no sense with MJIT.

ruby.c: define MJIT CLI options. As major difference from original MJIT,
"-j:l"/"--jit:llvm" are renamed to "-j:c"/"--jit:cc" because I want to
support not only gcc/clang but also cl.exe (Visual Studio) in the future.
It takes only "-j:c=gcc", "-j:c=clang" for now. Original "-j:c" is
renamed to "-j:n" as it conflicts. Also this initializes MJIT thread and
variables.
eval.c: finalize MJIT worker thread and variables.
test/ruby/test_rubyoptions.rb: fix number of CLI options for -j addition.

thread_pthread.c: change for pthread abstraction in MJIT. Prefix rb_ for
functions which are used by other files.
thread_win32.c: ditto, for Windows.  Those pthread porting is one of
major works that YARV-MJIT created, which is my fork of MJIT. We should
share those improvements.
thread.c: follow rb_ prefix changes

vm.c: trigger MJIT on VM invocation. Also trigger `mjit_mark` to avoid
SEGV by race between JIT and GC of ISeq. The improvement was provided by
wanabe <s.wanabe@gmail.com>.
vm_insnhelper.h: trigger MJIT on method calls during VM execution.

vm_core.h: add fields required for mjit.c. `bp` must be `cfp[6]` because
rb_control_frame_struct is likely to be casted to another struct. The
last position is the safest place to add the new field.
vm_insnhelper.c: save initial value of cfp->ep as cfp->bp. This is an
optimization which are done in both MJIT and YARV-MJIT. So this change
is added in this commit. Calculating bp from ep is a little heavy work,
so bp is kind of cache for it.

iseq.c: notify ISeq GC to MJIT. We should know which iseq in MJIT queue
is GCed to avoid SEGV.

gc.c: add hooks so that MJIT can wait GC, and vice versa. Simultaneous
JIT and GC executions may cause SEGV and so we should synchronize them.

cont.c: save continuation information in MJIT worker. As MJIT shouldn't
unload JIT-ed code which is being used, MJIT wants to know full list of
saved execution contexts for continuation and detect ISeqs in use.
which has been developed by Takashi Kokubun <takashikkbn@gmail> as
YARV-MJIT. Many of its bugs are fixed by wanabe <s.wanabe@gmail.com>.

This JIT compiler is designed to be a safe migration path to introduce
JIT compiler to MRI. So this commit does not include any bytecode
changes or dynamic instruction modifications, which are done in original
MJIT.

This commit strips off some aggressive optimizations even from
YARV-MJIT, and thus it's slower than YARV-MJIT too. But it's still
fairly faster than Ruby 2.5.

Note that this JIT compiler passes `make test`, `make test-all`, `make
test-spec` without JIT, AND even with JIT. Not only it's perfectly safe
with JIT disabled because it does not replace VM instructions unlike
MJIT, but also with JIT enabled it stably runs Ruby applications
including Rails applications.

I'm expecting this version as just "initial" JIT compiler. I have many
optimization ideas which are skipped for initial merging, and you may
easily replace this JIT compiler with a faster one by just replacing
mjit_compile.c. `mjit_compile` interface is designed for the purpose.

common.mk: update dependencies for mjit_compile.c.

internal.h: declare `rb_vm_insn_addr2insn` for MJIT.

vm.c: exclude some definitions if `-DMJIT_HEADER` is provided to
compiler. This avoids to include some functions which take a long time
to compile, e.g. vm_exec_core. Some of the purpose is achieved in
transform_mjit_header.rb (see `IGNORED_FUNCTIONS`) but others are
manually resolved for now. Load mjit_helper.h for MJIT header.
mjit_helper.h: New. This is a file used only by JIT-ed code. I'll
refactor `mjit_call_cfunc` later.
vm_eval.c: add some #ifdef switches to skip compiling some functions
like Init_vm_eval.

win32/mkexports.rb: export thread/ec functions, which are used by MJIT.

include/ruby/defines.h: add MJIT_FUNC_EXPORTED macro alis to clarify
that a function is exported only for MJIT.

array.c: export a function used by MJIT.
bignum.c: ditto.
class.c: ditto.
compile.c: ditto.
error.c: ditto.
gc.c: ditto.
hash.c: ditto.
iseq.c: ditto.
numeric.c: ditto.
object.c: ditto.
proc.c: ditto.
re.c: ditto.
st.c: ditto.
string.c: ditto.
thread.c: ditto.
variable.c: ditto.
vm_backtrace.c: ditto.
vm_insnhelper.c: ditto.
vm_method.c: ditto.

I would like to improve maintainability of function exports, but I
believe this way is acceptable as initial merging if we clarify the
new exports are for MJIT (so that we can use them as TODO list to fix)
and add unit tests to detect unresolved symbols.

I'll add unit tests of JIT compilations once `MJIT.compile` feature
(synchronous JIT compilation of ISeq) is accepted.
@k0kubun k0kubun force-pushed the k0kubun:merge-mjit branch from 22c7511 to fe02108 Feb 3, 2018
@k0kubun k0kubun force-pushed the k0kubun:merge-mjit branch from fe02108 to 4e19426 Feb 3, 2018
@k0kubun
Copy link
Member Author

@k0kubun k0kubun commented Feb 4, 2018

While current trunk's performance is worse than my original benchmark and I have some uncommitted ideas to fix it, most of this work is merged in 0af44e7, fd44a57 and ed935aa.

There are still a lot of things to be done, but it's time to close this PR.

@k0kubun k0kubun closed this Feb 4, 2018
@k0kubun k0kubun deleted the k0kubun:merge-mjit branch Feb 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Linked issues

Successfully merging this pull request may close these issues.

None yet

6 participants
You can’t perform that action at this time.