
I've got a JSON file with 30-ish blocks of "dicts", where every block has an ID, like this:

{
    "ID": "23926695",
    "webpage_url": "https://.com",
    "logo_url": null,
    "headline": "aewafs",
    "application_deadline": "2020-03-31T23:59:59"
}

Since my script pulls information in the same way from an API more than once, I would like to append new "blocks" to the json file only if the ID doesn't already exist in the JSON file.

I've got something like this so far:

import os
import json

# Initialise the file with an empty JSON array if it is empty
check_empty = os.stat('pbdb.json').st_size
if check_empty == 0:
    with open('pbdb.json', 'w') as f:
        f.write('[\n]')    # writes '[', a newline, then ']' -- an empty JSON array

with open('pbdb.json') as f:
    output = json.load(f)

for i in jobs:
    output.append({
        'ID': job_id,
        'Title': jobtitle,
        'Employer': company,
        'Employment type': emptype,
        'Fulltime': tid,
        'Deadline': deadline,
        'Link': webpage
    })

with open('pbdb.json', 'w') as job_data_file:
    json.dump(output, job_data_file)

but I would like to run the output.append part only if the ID doesn't already exist in the JSON file.
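
One way to get that behaviour is to load the file, collect the IDs already present into a set, and append only entries whose ID is missing. A minimal sketch (new_jobs stands in for whatever the API-parsing code produces, and it assumes every stored block has an 'ID' key):

import json
import os

# placeholder for the API output; in the real script these fields come from
# `jobs` / `job_id` / `jobtitle` / etc.
new_jobs = [
    {'ID': '23926695', 'Title': 'aewafs', 'Employer': 'Some Company',
     'Employment type': 'Permanent', 'Fulltime': 'Full-time',
     'Deadline': '2020-03-31T23:59:59', 'Link': 'https://.com'},
]

# make sure the file exists and holds at least an empty JSON array
if not os.path.exists('pbdb.json') or os.stat('pbdb.json').st_size == 0:
    with open('pbdb.json', 'w') as f:
        f.write('[]')

with open('pbdb.json') as f:
    output = json.load(f)

# IDs already stored in the file
existing_ids = {entry['ID'] for entry in output}

for job in new_jobs:
    if job['ID'] not in existing_ids:
        output.append(job)
        existing_ids.add(job['ID'])   # also skips duplicates within the same run

with open('pbdb.json', 'w') as job_data_file:
    json.dump(output, job_data_file, indent=2)

The set lookup keeps the membership check cheap even as the file grows, although the whole file is still read and rewritten on every run.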

  • This isn't valid JSON. It's JSON Lines, and you'll have to scan the file each time to see whether the ID exists; that's O(N), so it gets progressively slower as the dataset grows. Or you could keep something in memory to track seen IDs. Is there a reason you're not using a database?
    – roganjosh
    Commented Mar 22, 2020 at 20:09
  • Actually, it's not even JSON Lines because you're putting the carriage return in a list. I don't know what you're trying to do
    – roganjosh
    Commented Mar 22, 2020 at 20:14
  • It writes the data from output.append to a JSON file, and it looks pretty much like this image: cloud.google.com/bigquery/images/create-schema-array.png. I don't know if that's valid or not, but it's not throwing any errors at the moment. It's just a fun project for me to learn, and since the dataset would never be that big, I thought a JSON or CSV file would work well enough. I'm pretty new to this; would a database make this much easier?
    – Derpa
    Commented Mar 22, 2020 at 20:51
  • That is valid JSON, but you haven't shown the newlines. Yes, a database would make this easier
    – roganjosh
    Commented Mar 22, 2020 at 20:53
  • Any recommendations for a good, lightweight database that works well with Python? (A sqlite3 sketch follows these comments.)
    – Derpa
    Commented Mar 22, 2020 at 21:31
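
A minimal sketch of that database route using Python's built-in sqlite3 module (the pbdb.sqlite3 file name, the jobs table, and its columns are just placeholders): making the ID the primary key lets INSERT OR IGNORE skip any row whose ID is already stored.

import sqlite3

conn = sqlite3.connect('pbdb.sqlite3')  # the file is created on first use
conn.execute("""
    CREATE TABLE IF NOT EXISTS jobs (
        id       TEXT PRIMARY KEY,
        title    TEXT,
        employer TEXT,
        deadline TEXT,
        link     TEXT
    )
""")

# INSERT OR IGNORE does nothing when a row with the same primary key already exists
conn.execute(
    "INSERT OR IGNORE INTO jobs (id, title, employer, deadline, link) VALUES (?, ?, ?, ?, ?)",
    ("23926695", "aewafs", "Some Company", "2020-03-31T23:59:59", "https://.com"),
)
conn.commit()
conn.close()

SQLite ships with the standard library, stores everything in a single file, and needs no server, which makes it a reasonable fit for a small hobby project like this.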

2 Answers


I am not able to complete the code you provided, but here is an example showing how you can get a list of jobs with no duplicate IDs (hopefully it helps):

import json

# suppose `data` is your input data, duplicate ids included
data = [{'id': 1, 'name': 'john'}, {'id': 1, 'name': 'mary'}, {'id': 2, 'name': 'george'}]

# a dict comprehension keyed on 'id' keeps one entry per id; calling .values() gives the deduplicated list
noduplicate = list({itm['id']: itm for itm in data}.values())

with open('pbdb.json', 'w') as job_data_file:
    json.dump(noduplicate, job_data_file)

  • Well that doesn't make sense because it just overwrites duplicate data
    – roganjosh
    Commented Mar 22, 2020 at 21:54
  • You are right; however, entries sharing an id are duplicates, and from the question it seems that keeping one copy of each is sufficient, so overwriting should not matter.
    – Bradia
    Commented Mar 22, 2020 at 22:30
  • That's not true. You have no guarantee on the order of records, generally, with JSON. So you have no idea which value of a duplicate key would be saved
    – roganjosh
    Commented Mar 22, 2020 at 22:33
  • Here is the quote from the question: "Since my script pulls information in the same way from an API more than once, I would like to append new 'blocks' to the json file only if the ID doesn't already exist in the JSON file." That means duplicate data are coming in.
    – Bradia
    Commented Mar 22, 2020 at 22:36
  • Great. And your answer doesn't deal with that. The very act of opening the file in write mode just wipes its contents
    – roganjosh
    Commented Mar 22, 2020 at 22:41
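
For reference, the dict-comprehension idea can handle that if the existing file contents are loaded and merged with the new data before writing. A rough sketch (new_jobs again stands in for the freshly fetched API data, and it assumes the stored entries use an 'ID' key):

import json
import os

# load whatever is already in the file (empty list if the file is missing or empty)
existing = []
if os.path.exists('pbdb.json') and os.stat('pbdb.json').st_size > 0:
    with open('pbdb.json') as f:
        existing = json.load(f)

# placeholder for the freshly fetched API data
new_jobs = [{'ID': '23926695', 'Title': 'aewafs'}, {'ID': '99999999', 'Title': 'something new'}]

# keying on 'ID' with the existing entries first means a later duplicate from the API
# overwrites the stored copy; reverse the concatenation if the stored copy should win
merged = list({entry['ID']: entry for entry in existing + new_jobs}.values())

with open('pbdb.json', 'w') as f:
    json.dump(merged, f, indent=2)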

I'll just go with a database, guys. Thank you for your time; we can close this thread now.
