Skip to content

HTTPS Imports #36430

bmeck started this conversation in Ideas
HTTPS Imports #36430
Dec 7, 2020 · 18 comments · 88 replies

bmeck
Dec 7, 2020
Collaborator

Github PR UI is not well suited to this level of conversation

Discussing PR #36328

Discussing implications of HTTPS imports as a whole

Replies

18 comments
·
88 replies

bmeck
Dec 9, 2020
Collaborator Author

Concern To Mitigate: IO Timeouts

2 replies
@jasnell

Closely related to this is a never-ending import (a malicious server sending a never-ending stream of data).

Likely need a max import size as well as timeouts.

@bmeck

bmeck Dec 11, 2020
Collaborator Author

@jasnell agreed, but there are multiple concepts of granularity here. Graph size vs individual source text size. I think we have to solve for both either by mitigating or scoping out. I'm thinking for this topic we are really talking about mitigations on the individual level, but the infinite recursion might be better for the strongly connected static graph mitigations. I think at least for the initial approach to size, byte limits might be enough as laid out in https://github.com/bmeck/node/tree/policy-max-body . Timeouts are more complex.

bmeck
Dec 9, 2020
Collaborator Author

Concern To Mitigate: Infinite Recursive Importing

5 replies
@targos

How can that happen?

@bmeck

bmeck Dec 10, 2020
Collaborator Author

A server could dynamically generate a payload that ~= import "import.meta.url + 1 ". This would prevent the ESM graph from ever evaluating any code and memory leak until the process dies.

@bmeck

bmeck Dec 11, 2020
Collaborator Author

It is possible to do this in CJS already, but with ESM this is a bit different. CJS can write files as they are loaded. ESM must load files after they are written using dynamic import. With the introduction of Loaders and/or 3rd party actors like servers that makes static import also able to cause infinite recursion.

I'm not entirely sure how we want to limit this, could limit the dep graph source size of an import or the total distance of the graph from entry or from the main entry point. When we get to that point if the problem was through static imports it won't be too much of an issue since we can GC the modules prior to linking (linking makes them live for the life of the global unfortunately [cantfix / upstream]). With dynamic imports, we couldn't reclaim that lost memory.

bmeck
Dec 9, 2020
Collaborator Author

Concern To Mitigate: Server Downtime

2 replies
@bmeck

bmeck Dec 21, 2020
Collaborator Author

We can somewhat do this already with URL redirects, see a policy with:

{
  "scopes": {
    "file:": {
       "integrity": true,
       "dependencies": {
           "https://example.com/foo": "file://cache/https/example.com/foo"
       }
    }
  }
}

That would not cover the HTTP redirect semantics, leak the wrong import.meta.url, and would not respect cache / updating. It does bring up some questions though about what the update semantics should be for a cache. Should it ever be automatic? I somewhat lean on "no it should always be a manual step to sync the cache".

@targos

I agree the cache update should be manual. I think we need a good CLI to work with it though.

bmeck
Dec 9, 2020
Collaborator Author

Concern To Mitigate: Alteration of Source Code

13 replies
@bmeck

bmeck Dec 10, 2020
Collaborator Author

  • We already have integrity checks via policies, but the use cases of HTTPS imports also can lean towards automatic updating which wouldn't work with integrity checks.

HTTPS has other means of granting trust such as certificate verification of various types.

  • Support for HPKP and/or Certificate Transparency would rely on a similar need for out of band persistence as policies but does allow for ensuring that the entity that provided the previous source text is the same as the current source text.
  • Root CAs are already configurable but not able to be configured solely for import purposes.
@mcollina

This is really important to provide safety against social engineering attacks, e.g. left-pad and event-stream.

@bmeck

bmeck Dec 11, 2020
Collaborator Author

@mcollina those attacks would already be mitigated by policies as they exist today. Is there a specific aspect of those that you are seeking to guard against and should it be outside of the scope of HTTPS itself?

I guess a good starting point would be to look at the npm cli because those problems are all also present there; they're just hidden behind the veneer of the cli.

I think we also need to clarify something from the discussion that happened over in the PR: when we say "HTTPS imports" do we mean that they are simply another way to reference and import modules into the node_modules, or are they implicitly dynamic imports?

2 replies
@bmeck

bmeck Dec 10, 2020
Collaborator Author

I'm not sure I understand. My understanding / expectation is it must be possible to do dynamic fetching of source texts over the network. I think trying to state it must be either always online or offline isn't really an easy topic since if we allow only one it invalidates some of the use cases of HTTPS imports.

@wperron

that's how Deno does it and it's what makes more sense to me as well, I just wanted to clarify because a few of the comments on the PR seemed to assume it was only dynamic imports.

bmeck
Dec 11, 2020
Collaborator Author

Concern to Mitigate: Cross Origin Access

E.G. https: to file:, https: to node:

5 replies
@bmeck

bmeck Dec 11, 2020
Collaborator Author

WHATWG already only allows only same domain from http/https schemes. I think the presumption is that https: could load node:, but I actually think that isn't really a priority as many/most modules don't actually need direct access to things like fs:. I do think in general that it is a security concern, but not a new attack vector. Without really strong reasons I do think requiring https: to have those kinds of node: modules dynamically injected as params or something seems fine to me.

@bmeck

bmeck Dec 11, 2020
Collaborator Author

node_modules lives on file: so it would be prevented by nature of being cross origin and not allowing https: to file:.

@wperron

Browsers have a very clear origin to compare against, how would that work from the server? For reference this is also an ongoing conversation in Deno denoland/deno#4981

HTTPS Agent support.

Given we are going to download many files, hopefully from the same domain it's critical for "first time" startup performance to keep the connections alive.

6 replies
@bmeck

bmeck Dec 14, 2020
Collaborator Author

I don't think we should use Agent, it is mutable and means that an application could manipulate it to intercept things it shouldn't

@mcollina

I think it should have an Agent which is kept private for esm.

@bmeck

bmeck Dec 14, 2020
Collaborator Author

we could, but it cannot reuse the implementation currently in core

Visible caching on disk.

I think we should cache what is downloaded in the project folder and recommend this to be committed to the repo itself. I propose we use a web_modules folder. This is also a mitigato to availability attacks.

14 replies
@ExE-Boss

Also, it might be a good idea to require a flag (or use an external utility) to enable remote fetching, which will be used to enable remote fetching and initialise the web_modules folder.

Without the flag, any https:// import will only consult the local web_modules folder.


At the very least, there should be a way to disable remote lookup and only consult the local web_modules folder.

@wperron

Visible caching on disk is a must, on that we agree, but why create yet another folder when reusing node_modules/ would work just fine?

@mcollina

mcollina Dec 14, 2020
Maintainer

Visible caching on disk is a must, on that we agree, but why create yet another folder when reusing node_modules/ would work just fine?

The recommendation for node_modules is to not commit that to the repo. web_modules should be committed to repo. Using two folders makes this simpler.

HTTP Proxy support for downloading modules.

A good way to mitigate many availability attack is to download the modules through an HTTP proxy. Currently Node.js does not have this functionality built-in.

2 replies
@bmeck

bmeck Dec 14, 2020
Collaborator Author

Currently policies allow rewriting URLs, would allowing wildcard URL rewriting like export templates be enough?

@bmeck

bmeck Dec 21, 2020
Collaborator Author

Also, can this be done as a follow on during experimentation?

bmeck
Feb 2, 2021
Collaborator Author

I've made https://github.com/bmeck/local-fs-https-imports to play around with the purely ahead of time idea before trying to merge such a feature into core itself.

1 reply
@wperron

I like the node_modules/.https/ - seems like a nice way to make the old and the new play together nicely

Concern To Mitigate: process

  • process.exit()
  • process.env

the hole process contains way to much information that can be leaked

1 reply
@bmeck

bmeck Mar 1, 2021
Collaborator Author

Unfortunately we can't put this in scope until we have some newer JS features; this likely has to wait for Realms or some such feature. We looked into policies and other mechanisms around limiting it for ESM in a few things in the past like: #26334. We need a way to introduce new global scopes w/o creating new intrinsics for things like Array which we don't have a way to do so currently. Users could replace globalThis.process with a getter that does a callsite check like in that and the other PR we attempted to get through, but the overall performance hit is not negligible. Other ways to limit it for now would be up to source transforms detecting usage of the globals and replacing it with a corresponding import/require.

bmeck
May 12, 2021
Collaborator Author

I've made a doodle to have a call to discuss how to move this forward, likely this call will try to recur every 2 months as various things are likely to be around phasing and when/how to address concerns in this discussion when implementing this and how to move towards a stable release.

https://doodle.com/poll/wsdihf3q9gqtyack

6 replies
@mcollina

Can we move it one week further? I'm off that week.

@bmeck

bmeck May 12, 2021
Collaborator Author

bumped, updated to new link

@mcollina

Filled

Concern: npm packages using HTTPS imports.

Posting this to link discussion from @ljharb in #36328 (comment) re implications for supporting in node_modules.

Personally I would be concerned about npm packages being published that use HTTPS imports as that very much changes the security model / contract that npm provides to users in being able to comprehensively replicate a module graph via only a node_modules filesystem model.

This likely won't be a problem initially as no one would publish a package to npm that isn't supported in older Node.js versions, but once support reaches all versions, some company might think it's a good way to automatically manage updates / ensure registered code delivery.

While that is already possible today via network based processes and even install hooks, it's a pattern that we have luckily so far avoided from happening too much and I hope we can continue to work towards ensuring strong guarantees around the registry and package contracts that ensure end user agency.

I haven't watched the video so apologies if this is repeating talking points already said, feel free to fill in further here.

17 replies
@bmeck

bmeck Jun 14, 2021
Collaborator Author

Per discussions above and video, network access would not be enabled by default when running node. During experimental it will be enabled automatically under the flag while offline cache is worked out since the flag is the opt-in.

@guybedford

Thanks, not all of us have time to watch videos, and well-maintained high-level summaries would help a lot in ensuring adequate onboarding for those familiarizing.

Is the flag a CLI flag with the intention to unflag at some point? Is this not something that can be integrated into policy?

@bmeck

bmeck Jun 14, 2021
Collaborator Author

Is the flag a CLI flag with the intention to unflag at some point? Is this not something that can be integrated into policy?

Various features were punted and behavior under the flag will change to accomodate them; the flag as it exists likely won't be unflagged until policy integration etc. is done but even with policy integration features like manipulating how redirects are expected to work for HTTP are not possible currently. We do have a POC at https://github.com/bmeck/local-fs-https-imports but it doesn't properly mimic various nuances of web based loading.

bmeck
Jun 14, 2021
Collaborator Author

meta concern to mitigate: accidental inclusion of transitive mutable state via node_modules.

A user should not have node_modules introduce transitive mutable state unless they opt-in to allow such behavior. This is important to ensure that node_modules can be seen as effectively immutable. While node_modules are not technically immutable today the general practice allows them to be treated as immutable; the introduction of network based transitive mutable state is actively more unstable and a concern around security and robustness that needs to be mitigated by other means as seen in this discussion.

0 replies

bmeck
Jun 14, 2021
Collaborator Author

Concern to mitigate: using other protocols to hop to local resources.

It is possible currently for data: to access node: and file: URLs (of which there is some usage in the wild). Since it is possible at least in this PR to load data: via http{s,}: we might want to ban loading these by default.

3 replies
@bmeck

bmeck Jun 15, 2021
Collaborator Author

I've banned everything except network dependencies in the PR for now.

@targos

targos Jun 15, 2021
Maintainer

Note that depending on how the main application is started (notably via a CommonJS entry point), it's still possible to access the node: scheme using globalThis.process.mainModule.require().

@bmeck

bmeck Jun 15, 2021
Collaborator Author

@targos yup, this is known and we tried to fix Buffer/process so they were not implicit globals in ESM but failed to do so due to a perf concern. So... ambient ways to mitigate it are left up to users and not covered by things like policies. A lot of mitigations are about limiting how things can be accessed so they can properly be mitigated, and unfortunately not mitigating them in core itself. Removing data: for example means that you have to grab them via globals, or be passed them somehow via dependency injection. It removes import from the problem areas.

bmeck
Jun 17, 2021
Collaborator Author

Concern to mitigate: preventing injection of arbitrary locations to import()

It should be possible to differentiate if a dependency is allowed to be loaded by dynamic import. Static import has hard coded strings and is less vulnerable to manipulation than import() that can take an arbitrary expression. Unfortunately due to JS spec we cannot prevent import() from resolving something that was already loaded by static import.

CC: @vdeturckheim per https://docs.google.com/document/d/1tS80LmzraV4yXEYIQ-fWaCm2MIpXfbtrSYjsI8kWKko/edit#heading=h.ifctpvwuev0a

7 replies
@bmeck

bmeck Jun 17, 2021
Collaborator Author

CC: @jasnell from same doc (woops)

@targos

How does the spec prevent us from intercepting the call before it hits the cache?

@bmeck

bmeck Jun 17, 2021
Collaborator Author

@targos it states that import() and import share the cache and deterministically share the result (this enforced by V8 source text module local cache, not the global cache which node controls). If one succeeds, the other must also succeed in perpetuity for the same source text and cache key (URL + import assertions these days [may grow]). So if you do import "fs"; dynamic import feeds through to FinishDynamicImport which feeds into the same mechanism as static imports (HostResolveImportedModule) which has the restriction: "Each time this operation is called with a specific referencingScriptOrModule, specifier pair as arguments it must return the same Module Record instance if it completes normally."

This comment has been minimized.

@bmeck

This comment has been minimized.

IMHO this feature should be provided by --loader plugin according to the requirements of the situation. See https://nodejs.org/dist/latest-v16.x/docs/api/esm.html#esm_https_loader .

Therefore, it is more important to work on loaders API stability which will allow to loading modules from anywhere (e.g. from tgz file etc.) and however (on the fly, with file cache etc.).

1 reply
@JakobJingleheimer

There's quite a lot of rationale behind supporting this in core. It is possible to do with a loader, but that would likely result in divergent behaviour, security vulnerabilities, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment