
compiler: use hash calculation for determining archive staleness #805

Open · wants to merge 12 commits into master
Conversation

@myitcv (Member) commented Apr 23, 2018

Work towards #804 (will require a follow-up to zero modification times).

Fixes #440 and likely other stale cache related issues.

@myitcv changed the title from "[WIP] compiler: use hash calculation for determining staleness" to "[WIP] compiler: use hash calculation for determining archive staleness" Apr 23, 2018
@myitcv changed the title from "[WIP] compiler: use hash calculation for determining archive staleness" to "compiler: use hash calculation for determining archive staleness" Apr 23, 2018
@myitcv (Member, Author) commented Apr 23, 2018

@shurcooL @hajimehoshi this is ready for discussion and review.

@hajimehoshi (Member) commented Apr 23, 2018

Basically, what this PR does is remove SrcModTime?

@myitcv (Member, Author) commented Apr 23, 2018

> Basically, what this PR does is remove SrcModTime?

Currently, staleness of an archive is determined using the mod times of the transitive set of input files, the gopherjs binary, and the archive itself.

This PR switches from using modtimes (which can give false positives and incorrect answers) to using a hashing strategy similar to the build cache in Go 1.10 (although nowhere near as advanced - we still maintain just a single archive).

It also has the advantage of allowing us to introduce other aspects into the staleness calculation, such as build tags, which I have also done; this fixes #440.

This PR also likely fixes a number of other staleness/caching issues that randomly get seen/reported.

It also paves the way for fixing #804 by allowing us to zero the modification times.

}

binHash := sha256.New()
io.Copy(binHash, binFile)

@hajimehoshi (Member), Apr 23, 2018:

Check error

@myitcv (Author, Member), Apr 23, 2018:

Thanks - yes, I'll check these copies because they are reads from a file.

}

binHash := sha256.New()
io.Copy(binHash, binFile)
binFile.Close()

@hajimehoshi (Member), Apr 23, 2018:

Actually, (*File).Close doesn't usually return an error (if it does, that's a critical error), but checking the error wouldn't be bad here?

@myitcv (Author, Member), Apr 23, 2018:

I'd agree; if that fails we're in serious trouble. I think I'll leave this.

@@ -596,20 +597,28 @@ func (s *Session) BuildPackage(pkg *PackageData) (*compiler.Archive, error) {
return archive, nil
}

pkgHash := sha256.New()

@hajimehoshi (Member), Apr 23, 2018:

Could you add a comment describing the format that is passed to sha256?

@myitcv (Author, Member), Apr 23, 2018:

Done.

}
defer objFile.Close()
return nil, err
}

@hajimehoshi (Member), Apr 23, 2018:

defer objFile.Close() here?

return nil, err
}
archive, err := compiler.ReadArchive(pkg.PkgObj, pkg.ImportPath, objFile, s.Types)
objFile.Close()

@hajimehoshi (Member), Apr 23, 2018:

Why is objFile closed here?

@myitcv (Author, Member), Apr 23, 2018:

Because we end up opening a number of files as part of this hash calculation (recursive calls), I wanted to be clearer on when those open files will be closed.

@hajimehoshi (Member), Apr 23, 2018:

OK, so could you add a comment, since this is not a usual case? I think defer objFile.Close() is necessary anyway in case an error happens.

@myitcv (Author, Member), Apr 23, 2018:

I've just pushed up a commit that refactors to use defer; that way it's more consistent with other places. I agree.

tool.go Outdated
@@ -298,11 +298,13 @@ func main() {
run := cmdTest.Flags().String("run", "", "Run only those tests and examples matching the regular expression.")
short := cmdTest.Flags().Bool("short", false, "Tell long-running tests to shorten their run time.")
verbose := cmdTest.Flags().BoolP("verbose", "v", false, "Log all tests as they are run. Also print all text from Log and Logf calls even if the test succeeds.")
buildVerbose := cmdTest.Flags().BoolP("bv", "", false, "Use a verbose build context.")

@myitcv (Author, Member), Apr 23, 2018:

Not essential to have this... but it helps to verify whether we get cache misses when running tests.

@hajimehoshi (Member), Apr 23, 2018:

I was wondering why we need this besides the existing verbose.

@myitcv (Member, Author) commented Apr 23, 2018

Thanks for taking a look @hajimehoshi - addressed your feedback. PTAL.

// determine staleness. Set hashDebug to see this in action. The format is:
//
// ## sync
// gopherjs bin: 0x519d22c6ab65a950f5b6278e4d65cb75dbd3a7eb1cf16e976a40b9f1febc0446

@hajimehoshi (Member), Apr 23, 2018:

What does gopherjs bin mean here?

@myitcv (Author, Member), Apr 23, 2018:

This is a hash of the running executable itself (see https://godoc.org/os#Executable), i.e. the gopherjs binary.

@hajimehoshi (Member), Apr 23, 2018:

Hmm, there could be a better name but I don't have a good idea. At least, I feel like the keys should end with hash for consistency?

@myitcv (Author, Member), Apr 23, 2018:

OK, I've made it gopherjs binary hash to be explicit. I've also moved the calculation of this hash into a separate function, because it need only be calculated once.

@myitcv myitcv force-pushed the myitcv:make_gopherjs_stateleness_hash_based branch from 1b4cb77 to f0ccf72 Apr 23, 2018
}

fmt.Fprintf(hw, "import: %v\n", importedPkgPath)
fmt.Fprintf(hw, " hash: %#x\n", importedArchive.Hash)
}

for _, name := range append(pkg.GoFiles, pkg.JSFiles...) {

@hajimehoshi (Member), Apr 23, 2018:

Is it assured that pkg.GoFiles and pkg.JSFiles are in alphabetical order?

@myitcv (Author, Member), Apr 24, 2018:

Yes it is, correct.

sort.Strings(orderedBuildTags)

fmt.Fprintf(hw, "build tags: %v\n", strings.Join(orderedBuildTags, ","))

for _, importedPkgPath := range pkg.Imports {

@hajimehoshi (Member), Apr 23, 2018:

Is it assured that pkg.Imports is in alphabetical order?

@myitcv (Author, Member), Apr 24, 2018:

Yes, this is also assured.

// For non-main and test packages we build up a hash that will help
// determine staleness. Set hashDebug to see this in action. The format is:
//
// ## sync

@hajimehoshi (Member), Apr 23, 2018:

Why do we need this?

@myitcv (Author, Member), Apr 24, 2018:

This is actually a package import path for sync - it was an example. I've changed the example now to use placeholders instead of concrete examples, to hopefully make it clearer.

@myitcv myitcv force-pushed the myitcv:make_gopherjs_stateleness_hash_based branch from 3ed84d1 to b85c2e4 Apr 24, 2018
@myitcv (Member, Author) commented Apr 24, 2018

@hajimehoshi - I've tweaked a couple of things in my latest commit which also addresses the points you made. I've also moved from gopherjs binary hash to compiler binary hash in case something other than gopherjs is importing this build package. PTAL.

func init() {
// We do this here because it will only fail in truly bad situations, i.e.
// machine running out of resources. We also panic if there is a problem
// because it's unlikely anything else will be useful/work

@flimzy (Member), Apr 24, 2018:

Rather than panicking here, would it be useful to fall back to a cache-disabled state?

Maybe that would allow running the gopherjs compiler in unusual environments (such as within a browser?)? (That may already be impossible for other reasons, idk).

@myitcv (Author, Member), Apr 24, 2018:

That's a fair point. I think I'd like to understand that situation a bit better; because a whole load of logic in build depends on the OS being available (writing archives etc).

The compiler is run within the browser by the playground, but that does not import build, just compiler.

@hajimehoshi (Member) left a comment

lgtm

@shurcooL, please take a look.

//
// file: <file path>
// <file contents>
// N bytes

@hajimehoshi (Member), Apr 24, 2018:

Hmm, I feel like this format is a little bit arbitrary. Can we use JSON or YAML or something formatted? I don't have a strong opinion though...

@flimzy (Member), Apr 24, 2018:

A possible problem with JSON/YAML is that it's not inherently ordered. While we could find a way to ensure constant ordering, it wouldn't necessarily happen by default, which could lead to false negative cache hits.

@myitcv (Author, Member), Apr 24, 2018:

The goal here is to create a hash as quickly as possible in order to determine whether we have a cache miss. So yes, whilst the format is arbitrary, it is simple, and requires no additional processing in order to write to the hash. Using JSON/YAML adds unnecessary overhead in the middle, because ultimately the output is a 256-bit value. Humans will only ever read this whilst debugging (itself a rare occurrence) with hashDebug = true.


pkgHash := sha256.New()
var hw io.Writer = pkgHash
var hashDebugOut *bytes.Buffer

@hajimehoshi (Member), Apr 24, 2018:

This is an optional suggestion: Instead of bytes.Buffer, how about using text/template?

@myitcv (Author, Member), Apr 24, 2018:

I'm unclear, why/how would we use text/template here?

@myitcv (Member, Author) commented Apr 24, 2018

@hajimehoshi thanks for taking a look. I've responded to your questions and removed the --bv flag for gopherjs test.

@dmitshur (Member) commented Apr 24, 2018

> @shurcooL, please take a look.

Sorry, but I don't have the bandwidth to properly review this change soon (maybe the coming weekend, but maybe not until the one after).

I have the following concern and question: how can we gain enough confidence that this change will not introduce regressions in the build staleness calculations? It would be quite bad if the compiler starts to use old code and not rebuild in some situations, and there are many edge cases.

Also, this PR changes code, but I would first like to see a high level description of the algorithm used, and review that. Afterwards, it would be easier to review the code and make sure it implements the algorithm.

Thanks for working on this. I agree in general that using hash of content can be better than the modify time, as it avoids unnecessary rebuilds when the content hasn't changed but modtime has, but it needs to work in all cases that the previous approach handled. I see you also took on fixing #440 which is great, but that issue needs to be re-verified whether it still exists in latest master version. I hope we can get this PR to a state where we can merge it with high confidence.

@myitcv (Member, Author) commented Apr 28, 2018

Hi @shurcooL - very happy to add more detail. I pushed up the code to help with our discussion.

What follows is based on my understanding of how archive staleness is currently calculated, i.e. in master. To state the obvious, this understanding might be wrong/incomplete.

Below, ModTime always refers to os.FileInfo.ModTime().

> how can we gain enough confidence that this change will not introduce regressions in the build staleness calculations?

Currently, an archive for a package P is deemed stale if its ModTime (i.e. the ModTime of the archive on disk) is before any of:

  • the ModTime of the gopherjs binary; more specifically, the running program that has imported github.com/gopherjs/gopherjs/build
  • the latest (i.e. most recent) ModTime of the transitive dependencies of P, including standard library packages
  • the latest (i.e. most recent) ModTime of the Go and JS files of P

Test programs and main packages are not archived.

Here ModTime is acting as a proxy for the contents of the files in question; whether that be the gopherjs binary, or the archives of the dependencies of P. It is therefore possible, and does happen, that the ModTime falls out of "sync" with the contents; i.e. the contents change without the ModTime changing or vice versa. It is therefore an imperfect proxy.

Far better is to follow an approach that uses the actual contents of those files to determine staleness. A hash-based approach does exactly this. The approach I've adopted here is similar to the build cache introduced in Go 1.10, but differs in one important respect: with the go tool, the computed hash effectively acts as a pointer to an area of the cache. With the approach adopted here, the computed hash is simply used to determine whether the archive we have on disk is stale or not. The approach taken by the go tool is far more advanced and robust, but requires a cache eviction policy (else your disk runs out of space); hence as a first cut I have gone for the "hash as a means of determining staleness" approach. A more complex approach could be adopted but doesn't seem particularly worthwhile at this stage.

> I would first like to see a high level description of the algorithm used, and review that

The algorithm is therefore quite simple: we take all inputs that go into determining/building a package archive, hash them in a well defined order, compare that hash with the hash in the current archive. If the hashes are different then the archive on disk is stale, at which point we recompile it and rewrite it to disk.

This approach allows us to include arbitrary data in the input hash, including build tags.

So the inputs to the hash calculation are now:

  • a hash of the compiler binary (the running executable)
  • the build tags, in sorted order
  • the import path and archive hash of each imported package
  • the names and contents of the package's Go and JS files

We use crypto/sha256, just as the go tool does, for all our hash calculations.

> ... but it needs to work in all cases that the previous approach handled

Because ModTime is an imperfect proxy for the file contents, there are many cases that don't get handled by the ModTime approach, and it's hard to know when we are going to be bitten by those edge cases. By using a hash of the file contents we sidestep this entirely and solve all of those cases.

The one case that is handled by neither the hash-based nor the current ModTime approach is that of race conditions on the on-disk archive. That is, if two gopherjs processes try to determine staleness/write an archive at the same time, they effectively race. But given this is not currently handled, we can simply note the exception and fix it in a future PR if required.

> I see you also took on fixing #440 which is great, but that issue needs to be re-verified whether it still exists in latest master version

Based on my understanding, this will still be broken on master, because build tags are not currently part of the staleness calculation; hence the bug. With this PR they are, hence staleness is correctly calculated.

> I hope we can get this PR to a state where we can merge it with high confidence.

So to my mind we can have a high degree of confidence in this PR if we can be confident that we have the correct list of inputs; and I believe we do.

The best way to see this in action (in the absence of specific tests at this point) is to:

cd $(go list -f '{{.Dir}}' github.com/gopherjs/gopherjs) && \
git fetch origin pull/805/head && \
git checkout FETCH_HEAD && \
go install
gopherjs serve -v

Then try out something like the playground via http://localhost:8080/github.com/gopherjs/gopherjs.github.io/playground/. On the first load you should see the following output:

Cache miss for errors
Cache miss for internal/race
Cache miss for runtime
Cache miss for runtime/internal/sys
Cache miss for github.com/gopherjs/gopherjs/js
Cache miss for sync/atomic
Cache miss for sync
Cache miss for io
Cache miss for unicode
Cache miss for unicode/utf8
Cache miss for bytes
Cache miss for bufio
Cache miss for math
Cache miss for syscall
Cache miss for internal/poll
Cache miss for time
Cache miss for github.com/gopherjs/gopherjs/nosync
Cache miss for internal/testlog
Cache miss for os
Cache miss for strconv
Cache miss for reflect
Cache miss for fmt
Cache miss for sort
Cache miss for go/token
Cache miss for strings
Cache miss for path/filepath
Cache miss for go/scanner
Cache miss for go/ast
Cache miss for io/ioutil
Cache miss for go/parser
Cache miss for text/tabwriter
Cache miss for go/printer
Cache miss for go/format
Cache miss for golang.org/x/tools/go/ast/astutil
Cache miss for path
Cache miss for regexp/syntax
Cache miss for regexp
Cache miss for github.com/gopherjs/gopherjs.github.io/playground/internal/imports
Cache miss for encoding/binary
Cache miss for encoding
Cache miss for math/bits
Cache miss for encoding/gob
Cache miss for encoding/base64
Cache miss for unicode/utf16
Cache miss for encoding/json
Cache miss for container/heap
Cache miss for math/rand
Cache miss for math/big
Cache miss for go/constant
Cache miss for go/types
Cache miss for github.com/gopherjs/gopherjs/compiler/astutil
Cache miss for github.com/gopherjs/gopherjs/compiler/typesutil
Cache miss for github.com/gopherjs/gopherjs/compiler/analysis
Cache miss for github.com/gopherjs/gopherjs/compiler/filter
Cache miss for github.com/gopherjs/gopherjs/compiler/prelude
Cache miss for github.com/gopherjs/gopherjs/compiler/vendor/github.com/neelance/astrewrite
Cache miss for net/url
Cache miss for text/template/parse
Cache miss for text/template
Cache miss for go/doc
Cache miss for log
Cache miss for go/build
Cache miss for text/scanner
Cache miss for golang.org/x/tools/go/types/typeutil
Cache miss for github.com/gopherjs/gopherjs/compiler
Cache miss for github.com/neelance/go-angularjs
Cache miss for honnef.co/go/js/dom
Cache miss for honnef.co/go/js/util
Cache miss for honnef.co/go/js/xhr
Cache miss for github.com/gopherjs/gopherjs.github.io/playground

If you empty cache and hard reload the page you should see:

Cache miss for github.com/gopherjs/gopherjs.github.io/playground

And that is because main packages and test programs are not cached.

Indeed if you kill and restart gopherjs serve -v and force reload the page you will see the same output; because none of the inputs have changed with respect to the hash calculation.

If you then kill gopherjs serve -v and instead run gopherjs serve --tags blah -v, then force reload the page, you will see all the cache misses again.

Would very much welcome feedback/questions on the above and this PR.

Thanks.

@myitcv (Member, Author) commented Apr 28, 2018

I've also just added a basic test to show this in action and working.

myitcv added 11 commits Apr 23, 2018
@myitcv myitcv force-pushed the myitcv:make_gopherjs_stateleness_hash_based branch from 796778c to 1ed4e6a May 4, 2018
@myitcv (Member, Author) commented May 5, 2018

Got hit by another staleness bug this afternoon. It's still unclear how mod times end up out of sync with contents, but the hash-based approach has been 100% good since I started using it.

@antong commented May 29, 2018

Fixes #827.

@antong commented Sep 3, 2018

I've been using this for a while now without problems. It would be really cool to have this merged, as the current mod-time staleness calculation can't be trusted (#827).
