Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign upFormalize support for zstd compression: v1.1.0 ? #803
Comments
@jonjohnsonjr @vbatts @mikebrow @dmcgowan @SteveLasker ptal (not sure if this is the right location for this discussion, or if it should be discussed in the OCI call; I just noticed this, so thought I'd write it down |
I had similar issues interpreting "ignore". The In case of a call, I will do my best to join. |
I must admit I'm not the most proficient reader of specifications, but good to hear I'm not the only person that was a bit confused by it I guess "ignoring" will lead to an "error" in any case, because skipping "unknown media types" should likely lead to a failure to calculate the digest |
Thanks, @thaJeztah! I also felt some relief @tych0, could you elaborate a bit on your use case? I don't want to break you a second time |
I'm not sure (3) solves the underlying problem here. That defines a way for understanding the media type, but it doesn't necessarily mean that clients can handle all possible permutations of a media type. The main issue is that if clients start pushing up images with |
Sure, I'm putting squashfs files in OCI images instead of gzipped tarballs, so I can direct mount them instead of having to extract them first. The "MUST ignore" part of the standard lets me do this, because tools like skopeo happily copy around OCI images with underlying blob types they can't decode. If we suddenly change the standard to not allow unknown blob types in images and allow tools to reject them, use cases like this will no longer be possible. Indeed, the standard does not need to change for docker to generate valid OCI images with zstd compression. The hard work goes into the tooling on the other end, but presumably docker has already done that. It might be worth adding a few additional known blob types to the spec here: https://github.com/opencontainers/image-spec/blob/master/media-types.md#oci-image-media-types but otherwise I don't generally understand the goals of this thread. |
I think in case of Skopeo, Skopeo itself is not consuming the image, and is used as a tool to pull those images; I think that's more the "distribution spec" than the "image spec" ? I think a runtime that does not support a specific type of layer should be able to reject that layer, and not accept "any" media-type. What use would there be for a runtime to pull an image with (say) For such cases, I think it'd make more sense to reject the image (/layer). |
No; the distribution spec is for repos serving content over http. skopeo translates to/from OCI images according to the OCI images spec.
If someone asks you to run something you can't run, I agree an error is warranted. But in the case of skopeo, it is a tool that is perfectly capable of handling layers with mime types it doesn't understand, and I think similar tools should not error out either. |
Yeah, poor choice of words; was trying to put in words that Skopeo itself is not the end-consumer of the image (hope I'm making sense).
The confusion in the words picked in the specs is about "mime types it doesn't understand". What makes a tool compliant with the image-spec? Should it be able to parse the manifest, or also be able to process the layers? Is While I understand the advantage of having some flexibility, if the spec does not dictate anything there, how can I know if an image would work with some tool implementing image-spec "X" ? Currently it For Skopeo's case, even though the mediaType is "unknown to the implementation", Skopeo is able to "handle" / "process" the layer (within the scope it's designed for), so perhaps "unknown" should be changed to something else; e.g.implementations should / must produce an error if they're not able to "handle" / "process" a layer-type. |
That seems like a reasonable clarification to me! |
Regarding the ambiguity of the MUST clause. The intention of that sentence is to say that implementations should act as though the layer (or manifest) doesn't exist if it doesn't know how to do whatever the user has requested, and should use an alternative layer (or manifest) if possible. This is meant to avoid implementations just breaking and always giving you an error if some extension was added to an image which doesn't concern that implementation -- it must use an alternative if possible rather than giving a hard error. Otherwise any new media-types will cause endless problems. In the example of pulling image data, arguably the tool supports pulling image data regardless of the media-type so there isn't any issue of it being "unknown [what to do with the blob] to the implementation" -- but if the image pulling is being done in order for an eventual unpacking step then you could argue that it should try to pull an alternative if it doesn't support the image type. I agree this wording could be a bit clearer though, this change was done during the period of some of the more contentious changes to the image-spec in 2016. Given that the above was the original intention of the language, I don't think it would be a breaking change to better clarify its meaning.
This is being worked on by @SteveLasker. The idea was to first register just one media-type so we get an idea of how the process works, and then to effectively go and register the rest. |
Another issue with the current way of representing compression is that the ordering of multiple media type modifiers (such as compression or encryption) isn't really well-specified since MIME technically doesn't support such things. There was some discussion last year about writing a library for dealing with MIME types so that programs can easily handle such types, but I haven't seen much since then. |
Ack: please assume the other mediaTypes will be registered. I'm providing clarity in the Artifacts Spec to help with both these issues. Once the Artifacts spec is merged, with clarity on the registration process, I'll register the other types. For the compression, what I think we're saying is this: There are other tools, like skopeo, (I think) or ORAS which work on any artifact type pushed to a registry. In these cases, they need to know some conventions to be generic. But, in the case of ORAS, it intentionally doesn't know about a specific artifact type and simply provides auth, push, pull of layers associated with a manifest. It's the outer wrapper, like Helm or Singularity that provide specific details on layer processing. We have an open agenda for the 4/22 call to discuss. |
I see I forgot to reply to some of the comments
So, I was wondering about that: I can see this "work" for a multi-manifest(isn) image, in which case there could be multiple variations of an image (currently used for multi-arch), and I can use "one" of those, but I'm having trouble understanding how this works for a single image. What if an image has layers with mixed compression?
I think it's technically possible to have mixed compressions. For example, in a situation where an existing image is pulled (using, e.g. However, the "reverse" could also make a valid use-case, to create a "fat/hybrid" image, offering alternative compressions for systems that support it ("gzip" layers for older clients, "zstd" for newer clients that support it). Looks like this needs further refinement to describe how this should be handled.
Thanks! I recall seeing a discussion (on the mailing list?) about registering, but noticed "some" were registered, but others were not, so thought I'd check |
While reviewing moby/moby#40820, I noticed that support for zstd was merged in master (proposal: #787, implementation in #788 and #790), and some runtimes started implementing this;
However, the current (v1.0.1) image-spec does not yet list zstd as a supported compression, which means that not all runtimes may support these images, and the ones that do are relying on a non-finalized specification, which limits interoperability (something that I think this specification was created for in the first place).
I think the current status is not desirable; not only does it limit interoperability (as mentioned), it will also cause complications Golang projects using this specification as a dependency; go modules will default to the latest tagged release, and some distributions (thinking of Debian) are quite strict about the use of unreleased versions. Golang project that want to support zstd would either have to "force" go mod to use a non-released version of the specification, or work around the issue by using a custom implementation (similar to the approach that containerd took: containerd/containerd#3649).
In addition to the above, concerns were raised about the growing list of media-types (#791), and suggestions were made to make this list more flexible.
The Image Manifest Property Descriptions, currently describes:
Followed by:
This part is a bit ambiguous (perhaps that's just my interpretation of it though);
+zstd
layer mediatype is not in theMUST
list, is there any reason for including it in the list of OCI Media Types? After all, any media types not included in the list "could" be supported by an implementation, and must otherwise be ignored.What's the way forward with this?
v1.1.0
, only defining+zstd
as a possible compression format for layers, but no requirement for implementations of thev1.1.0
specification to support them+zstd
compression format to the list of required media types, and tagv1.1.0
; projects implementingv1.1.0
of the specification MUST support zstd layers, or otherwise implementv1.0.x
v1.1.0
v1.1.0
release (1.
or2.
), and leave3.
for a future (v1.2.0
) release of the specification.On a side-note, I noticed that the vnd.oci.image.manifest.v1+json was registered, but other mediatypes, including media-types for image layers are not; should they be?