Purpose: Using GSON, take an input stream of a large JSON file and expose it to downstream functions as an Iterator, with the added constraint that I can't hold the entire JSON file in memory. I currently have this working with some basic Java code that does the following:

  • knows when to skip curly braces
  • reads the stream until it finds the next valid JSON object
  • parses that into a POJO using GSON
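For illustration only, a hand-rolled scanner of the kind described in the list above could look roughly like the following sketch. It is not the actual code from this question: the class name `ObjectScanner`, the `targetDepth` parameter, and the shortened sample JSON are all assumptions made for the example. It counts only curly braces (arrays are not tracked) and assumes the objects of interest all sit at one fixed nesting depth.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical hand-rolled scanner; not part of Gson. */
final class ObjectScanner {
    private final Reader in;
    private final int targetDepth; // 1 = the top-level object
    private int depth = 0;         // current '{' nesting, kept across calls

    ObjectScanner(Reader in, int targetDepth) {
        this.in = in;
        this.targetDepth = targetDepth;
    }

    /** Returns the raw text of the next object at targetDepth, or null at end of input. */
    String next() {
        StringBuilder current = null; // non-null while inside a target object
        boolean inString = false;     // braces inside string values are ignored
        boolean escaped = false;      // previous character was a backslash
        try {
            int c;
            while ((c = in.read()) != -1) {
                char ch = (char) c;
                if (current != null) {
                    current.append(ch);
                }
                if (inString) {
                    if (escaped) {
                        escaped = false;
                    } else if (ch == '\\') {
                        escaped = true;
                    } else if (ch == '"') {
                        inString = false;
                    }
                    continue;
                }
                if (ch == '"') {
                    inString = true;
                } else if (ch == '{') {
                    depth++;
                    if (depth == targetDepth && current == null) {
                        current = new StringBuilder("{");
                    }
                } else if (ch == '}') {
                    depth--;
                    if (current != null && depth == targetDepth - 1) {
                        return current.toString();
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return null;
    }
}

final class ObjectScannerDemo {
    public static void main(String[] args) {
        // Shortened version of the sample input; depth 3 is where the body entries live.
        String json = "{\"header\":{\"header1\":\"value1\"},"
                + "\"body\":{\"obj-1\":{\"id\":\"obj-1\"},\"obj-2\":{\"id\":\"obj-2\"}}}";
        ObjectScanner scanner = new ObjectScanner(new StringReader(json), 3);
        for (String raw = scanner.next(); raw != null; raw = scanner.next()) {
            System.out.println(raw); // each raw object could then be passed to Gson.fromJson
        }
    }
}
```

Only one object's text is buffered at a time, which is what keeps memory use bounded; the cost is that the scanner has to re-implement string and escape handling that a real JSON parser already provides.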

Desired Outcome: See whether GSON has a built-in ability that can replace my custom Java code.

SAMPLE INPUT DOCUMENT

{
    "header":
    {
        "header1":"value1",
        "header2":"value2",
        "header3":"value3"
    },
    "body":
    {
        "obj-1":
        {
            "id":"obj-1",
            "name":"obj-1-name",
            "description":"obj-1-description"
        },
        "obj-2":
        {
            "id":"obj-2",
            "name":"obj-2-name",
            "description":"obj-2-description"
        },
        "obj-3":
        {
            "id":"obj-3",
            "name":"obj-3-name",
            "description":"obj-3-description"
        },
        "obj-4":
        {
            "id":"obj-4",
            "name":"obj-4-name",
            "description":"obj-4-description"
        }
    }
}

SAMPLE OUTPUT DOCUMENT

{
    "header":
    {
        "header1":"value1",
        "header2":"value2",
        "header3":"value3"
    },  
    "object":
    {
        "id":"obj-1",
        "name":"obj-1-name",
        "description":"obj-1-description"
    }
}

POJOs have been created for the "header" object, the individual elements in the "body" JSON object, and the output document.

I used the following tutorial as a stepping stone to initially solve the problem: https://howtodoinjava.com/gson/jsonreader-streaming-json-parser/. Is my understanding correct that, since the JSON structure is being transformed, I would still need to follow that basic three-step process and just translate it into GSON-specific functions?

  • Should every entry in body be wrapped into a new JSON object which duplicates the header? For example, would your "Sample output document" look the same for obj-2? Just to be sure that there is no misunderstanding: you want to convert an InputDocument to an Iterator<OutputDocument>, right?
    Commented Nov 25, 2021 at 0:37
  • Yes. For each element the header object does not change.
    – Ad Astra
    Commented Nov 26, 2021 at 1:46

1 Answer

As mentioned in the linked tutorial, you should use JsonReader when you want to process JSON data in a streaming way. You can then use the respective TypeAdapter instances obtained from Gson for your POJO classes and use them to parse the header and the individual body objects.
You could also use the Gson.fromJson(JsonReader, ...) method instead of using the TypeAdapter instances directly. However, in older Gson versions that method did not respect the reader's leniency setting and always made the reader lenient (this is no longer the case in recent versions, see JsonReader#setStrictness). Unless you explicitly need lenient parsing, I would recommend using the TypeAdapter instances directly, because then the leniency setting of the JsonReader is always respected.

Assuming you have the following POJO classes:

public class Header {
  public String header1;
  public String header2;
  public String header3;
}

public class BodyObject {
  public String id;
  public String name;
  public String description;
}

public class OutputDocument {
  public Header header;
  public BodyObject object;
}

Then you could create a method which creates a Stream<OutputDocument> like the following. Using a Stream here has the advantage that its close method can be used to close the Reader providing the JSON data. However, it can also be implemented in a similar way using an Iterator.

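import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Spliterator;
import java.util.Spliterators.AbstractSpliterator;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

import com.google.gson.Gson;
import com.google.gson.TypeAdapter;
import com.google.gson.stream.JsonReader;
import com.google.gson.stream.JsonToken;

/**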
 * Creates a {@link Stream} which transforms the data to {@link OutputDocument} elements.
 * 
 * <p><b>Important:</b> The provided reader will be closed by this method, or by the created
 * stream. It is therefore necessary to call {@link Stream#close()} (for example by using a
 * try-with-resources statement).
 * 
 * @param inputDocumentReader JSON data stream
 * @param gson Gson object used for looking up adapters
 * @return Stream of transformed elements
 */
public static Stream<OutputDocument> transform(Reader inputDocumentReader, Gson gson) throws IOException {
  JsonReader jsonReader = new JsonReader(inputDocumentReader);
  try {
    jsonReader.beginObject();
    String headerProperty = jsonReader.nextName();
    if (!headerProperty.equals("header")) {
      throw new IllegalArgumentException("Expected 'header' property at " + jsonReader.getPath());
    }

    // Read the Header
    TypeAdapter<Header> headerAdapter = gson.getAdapter(Header.class);
    Header header = headerAdapter.read(jsonReader);


    String bodyProperty = jsonReader.nextName();
    if (!bodyProperty.equals("body")) {
      throw new IllegalArgumentException("Expected 'body' property at " + jsonReader.getPath());
    }

    // Start reading body
    jsonReader.beginObject();
    TypeAdapter<BodyObject> bodyObjectAdapter = gson.getAdapter(BodyObject.class);

    long estimatedSize = Long.MAX_VALUE; // unknown size
    // Could also add `| Spliterator.NONNULL`, if there are no null body objects
    int characteristics = Spliterator.ORDERED;
    Spliterator<OutputDocument> spliterator = new AbstractSpliterator<>(estimatedSize, characteristics) {
      @Override
      public boolean tryAdvance(Consumer<? super OutputDocument> action) {
        try {
          // Check if end of 'body' object has been reached
          if (!jsonReader.hasNext()) {
            // End 'body'
            jsonReader.endObject();
            // End top level object
            jsonReader.endObject();

            if (jsonReader.peek() != JsonToken.END_DOCUMENT) {
              throw new IllegalStateException("Expected end of JSON document at " + jsonReader.getPath());
            }
            // Reached end
            return false;
          } else {
            // Skip entry name
            jsonReader.skipValue();

            BodyObject object = bodyObjectAdapter.read(jsonReader);

            // Create combined OutputDocument
            OutputDocument result = new OutputDocument();
            result.header = header;
            result.object = object;

            action.accept(result);
            return true;
          }
        } catch (IOException e) {
          throw new UncheckedIOException("Failed reading next element", e);
        }
      }
    };
    return StreamSupport.stream(spliterator, false) // false, don't create parallel stream
        .onClose(() -> {
          try {
            jsonReader.close();
          } catch (IOException e) {
            throw new UncheckedIOException("Failed closing JsonReader", e);
          }
        });
  }
  catch (Exception e) {
    try {
      jsonReader.close();
    } catch (IOException suppressed) {
      e.addSuppressed(suppressed);
    }
    throw e;
  }
}

This method can then be called like this:

try (Stream<OutputDocument> stream = transform(inputDocumentReader, new Gson())) {
    ...
}

inputDocumentReader is the Reader you created for your InputDocument file. The Gson instance can either be a new instance (as shown in the example above), or it can be one which you have created with GsonBuilder in case you registered custom adapters to customize how the POJO classes or their fields are deserialized.
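If downstream code really needs an Iterator rather than a Stream, one option is to wrap the stream so that the close handler is not lost. The following is a minimal sketch under stated assumptions: `CloseableIterator` is a hypothetical helper name (it is not part of Gson or the JDK), and the demo uses a plain `Stream<String>` as a stand-in for the `Stream<OutputDocument>` returned by `transform(...)`.

```java
import java.util.Iterator;
import java.util.stream.Stream;

/** Hypothetical helper: an Iterator that must be closed to release the underlying source. */
final class CloseableIterator<T> implements Iterator<T>, AutoCloseable {
    private final Stream<T> stream;
    private final Iterator<T> delegate;

    private CloseableIterator(Stream<T> stream) {
        this.stream = stream;
        // For a sequential stream, elements are pulled on demand as the iterator advances.
        this.delegate = stream.iterator();
    }

    static <T> CloseableIterator<T> of(Stream<T> stream) {
        return new CloseableIterator<>(stream);
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public T next() {
        return delegate.next();
    }

    @Override
    public void close() {
        // Runs the handlers registered with Stream.onClose, e.g. closing the JsonReader.
        stream.close();
    }
}

final class CloseableIteratorDemo {
    public static void main(String[] args) {
        // Stand-in for transform(inputDocumentReader, gson)
        Stream<String> source = Stream.of("obj-1", "obj-2")
                .onClose(() -> System.out.println("source closed"));

        try (CloseableIterator<String> it = CloseableIterator.of(source)) {
            while (it.hasNext()) {
                System.out.println(it.next());
            }
        }
    }
}
```

The caller still has to close the iterator (for example with try-with-resources), for the same reason the stream has to be closed: otherwise the Reader stays open if iteration stops early.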
