Using GSON to transform a nested JSON Stream

Question

Purpose: Using GSON, take an input stream of a large JSON file and expose it to downstream functions as an Iterator, with the added constraint that I physically can't store the entire JSON file in memory. Currently I have this working using some basic Java code that does the following:

skips curly braces where necessary
reads the stream until it finds the next valid JSON object
parses that object into a POJO using GSON
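
For readers unfamiliar with this kind of hand-rolled scanning, the first two steps might look roughly like the sketch below. It is not my actual code: the class name BraceScanner and the method readNextObject are hypothetical, and it assumes the text between two objects contains no braces. The third step (handing the returned text to GSON) is left out.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, for illustration only.
final class BraceScanner {
  /**
   * Returns the text of the next balanced {...} block, or null if the input
   * ends first. Braces inside JSON string literals are ignored.
   */
  static String readNextObject(Reader reader) {
    StringBuilder text = new StringBuilder();
    int depth = 0;            // current nesting level of curly braces
    boolean inString = false; // true while inside a "..." literal
    boolean escaped = false;  // true directly after a backslash in a string
    try {
      int c;
      while ((c = reader.read()) != -1) {
        char ch = (char) c;
        if (depth == 0) {
          // Skip everything until the next object starts
          if (ch == '{') {
            depth = 1;
            text.append(ch);
          }
          continue;
        }
        text.append(ch);
        if (inString) {
          if (escaped) escaped = false;
          else if (ch == '\\') escaped = true;
          else if (ch == '"') inString = false;
        } else if (ch == '"') {
          inString = true;
        } else if (ch == '{') {
          depth++;
        } else if (ch == '}' && --depth == 0) {
          return text.toString(); // object is complete
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return null; // end of input (or truncated object)
  }

  public static void main(String[] args) {
    Reader reader = new StringReader(
        "\"obj-1\": {\"id\":\"obj-1\"}, \"obj-2\": {\"id\":\"obj-2\"}");
    String object;
    while ((object = readNextObject(reader)) != null) {
      System.out.println(object); // one JSON object per line
    }
  }
}
```

Only one object is held in memory at a time, which is the property I need to keep.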

Desired Outcome: See if GSON has the built-in ability to replace my custom Java code.

SAMPLE INPUT DOCUMENT

{
    "header":
    {
        "header1":"value1",
        "header2":"value2",
        "header3":"value3"
    },
    "body":
    {
        "obj-1":
        {
            "id":"obj-1",
            "name":"obj-1-name",
            "description":"obj-1-description"
        },
        "obj-2":
        {
            "id":"obj-2",
            "name":"obj-2-name",
            "description":"obj-2-description"
        },
        "obj-3":
        {
            "id":"obj-3",
            "name":"obj-3-name",
            "description":"obj-3-description"
        },
        "obj-4":
        {
            "id":"obj-4",
            "name":"obj-4-name",
            "description":"obj-4-description"
        }
    }
}

SAMPLE OUTPUT DOCUMENT

{
    "header":
    {
        "header1":"value1",
        "header2":"value2",
        "header3":"value3"
    },
    "object":
    {
        "id":"obj-1",
        "name":"obj-1-name",
        "description":"obj-1-description"
    }
}

POJOs have been created for the "header" object, the individual elements in the "body" JSON object, and the output document.

Using the following as a stepping stone to initially solve the problem, https://howtodoinjava.com/gson/jsonreader-streaming-json-parser/, is my understanding correct that, since there is a transformation of the JSON structure, I would still need to do that basic three-step process and just translate it into GSON-specific functions?

Should every entry in body be wrapped into a new JSON object which duplicates the header? For example, would your "Sample output document" look the same for obj-2? Just to be sure that there is no misunderstanding, you want to convert InputDocument → Iterator<OutputDocument>, right? — Marcono1234, Commented Nov 25, 2021 at 0:37

Marcono1234 · Accepted Answer · 2025-03-12 21:24:59Z

As mentioned in the linked tutorial, you should use JsonReader when you want to process JSON data in a streaming way. You can then use the respective TypeAdapter instances obtained from Gson for your POJO classes and use them to parse the header and the individual body objects.
You could also use the Gson.fromJson(JsonReader, ...) method instead of directly using the TypeAdapter instances. Previously Gson did not respect the leniency setting in that method and always made the reader lenient; that is no longer the case, see JsonReader#setStrictness. Unless you explicitly need that behavior, I would recommend using the TypeAdapter instances directly, because then the leniency setting of the JsonReader is respected.

Assuming you have the following POJO classes:

public class Header {
  public String header1;
  public String header2;
  public String header3;
}

public class BodyObject {
  public String id;
  public String name;
  public String description;
}

public class OutputDocument {
  public Header header;
  public BodyObject object;
}

Then you could create a method which creates a Stream<OutputDocument> like the following. Using a Stream here has the advantage that its close method can be used to close the Reader providing the JSON data. However, it can also be implemented in a similar way using an Iterator.

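import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Spliterator;
import java.util.Spliterators.AbstractSpliterator;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

import com.google.gson.Gson;
import com.google.gson.TypeAdapter;
import com.google.gson.stream.JsonReader;
import com.google.gson.stream.JsonToken;

/**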
 * Creates a {@link Stream} which transforms the data to {@link OutputDocument} elements.
 * 
 * <p><b>Important:</b> The provided reader will be closed by this method, or by the created
 * stream. It is therefore necessary to call {@link Stream#close()} (for example by using a
 * try-with-resources statement).
 * 
 * @param inputDocumentReader JSON data stream
 * @param gson Gson object used for looking up adapters
 * @return Stream of transformed elements
 */
public static Stream<OutputDocument> transform(Reader inputDocumentReader, Gson gson) throws IOException {
  JsonReader jsonReader = new JsonReader(inputDocumentReader);
  try {
    jsonReader.beginObject();
    String headerProperty = jsonReader.nextName();
    if (!headerProperty.equals("header")) {
      throw new IllegalArgumentException("Expected 'header' property at " + jsonReader.getPath());
    }

    // Read the Header
    TypeAdapter<Header> headerAdapter = gson.getAdapter(Header.class);
    Header header = headerAdapter.read(jsonReader);


    String bodyProperty = jsonReader.nextName();
    if (!bodyProperty.equals("body")) {
      throw new IllegalArgumentException("Expected 'body' property at " + jsonReader.getPath());
    }

    // Start reading body
    jsonReader.beginObject();
    TypeAdapter<BodyObject> bodyObjectAdapter = gson.getAdapter(BodyObject.class);

    long estimatedSize = Long.MAX_VALUE; // unknown size
    // Could also add `| NONNULL`, if there are no null body objects
    int characteristics = Spliterator.ORDERED;
    Spliterator<OutputDocument> spliterator = new AbstractSpliterator<>(estimatedSize, characteristics) {
      @Override
      public boolean tryAdvance(Consumer<? super OutputDocument> action) {
        try {
          // Check if end of 'body' object has been reached
          if (!jsonReader.hasNext()) {
            // End 'body'
            jsonReader.endObject();
            // End top level object
            jsonReader.endObject();

            if (jsonReader.peek() != JsonToken.END_DOCUMENT) {
              throw new IllegalStateException("Expected end of JSON document at " + jsonReader.getPath());
            }
            // Reached end
            return false;
          } else {
            // Skip entry name
            jsonReader.skipValue();

            BodyObject object = bodyObjectAdapter.read(jsonReader);

            // Create combined OutputDocument
            OutputDocument result = new OutputDocument();
            result.header = header;
            result.object = object;

            action.accept(result);
            return true;
          }
        } catch (IOException e) {
          throw new UncheckedIOException("Failed reading next element", e);
        }
      }
    };
    return StreamSupport.stream(spliterator, false) // false, don't create parallel stream
        .onClose(() -> {
          try {
            jsonReader.close();
          } catch (IOException e) {
            throw new UncheckedIOException("Failed closing JsonReader", e);
          }
        });
  }
  catch (Exception e) {
    try {
      jsonReader.close();
    } catch (IOException suppressed) {
      e.addSuppressed(suppressed);
    }
    throw e;
  }
}

This method can then be called like this:

try (Stream<OutputDocument> stream = transform(inputDocumentReader, new Gson())) {
    ...
}

inputDocumentReader is the Reader you created for your InputDocument file. The Gson instance can either be a new instance (as shown in the example above), or it can be one which you have created with GsonBuilder in case you registered custom adapters to customize how the POJO classes or their fields are deserialized.
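
If the downstream code really needs an Iterator, one option is to wrap the returned Stream rather than reimplementing the reading logic. The following is only a sketch: ClosingIterator is a hypothetical helper (not part of Gson or the JDK), and the main method uses a plain Stream of strings as a stand-in for the stream returned by transform(...).

```java
import java.util.Iterator;
import java.util.stream.Stream;

/**
 * Hypothetical helper: exposes a Stream as an Iterator and forwards close()
 * to the stream, so that its onClose handlers (for example the one closing
 * the JsonReader) still run.
 */
final class ClosingIterator<T> implements Iterator<T>, AutoCloseable {
  private final Stream<T> stream;
  private final Iterator<T> delegate;

  ClosingIterator(Stream<T> stream) {
    this.stream = stream;
    // Stream.iterator() pulls elements lazily, one at a time
    this.delegate = stream.iterator();
  }

  @Override
  public boolean hasNext() {
    return delegate.hasNext();
  }

  @Override
  public T next() {
    return delegate.next();
  }

  @Override
  public void close() {
    stream.close();
  }

  public static void main(String[] args) {
    // Stand-in for transform(inputDocumentReader, gson)
    Stream<String> source = Stream.of("obj-1", "obj-2")
        .onClose(() -> System.out.println("closed"));

    try (ClosingIterator<String> iterator = new ClosingIterator<>(source)) {
      while (iterator.hasNext()) {
        System.out.println(iterator.next());
      }
    }
  }
}
```

The caller is still responsible for closing the iterator, for example with try-with-resources as shown in main; otherwise the underlying Reader stays open.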
