Description
I have some code to pull metadata from YouTube:
import extruct
import requests
response = requests.get(video_url)
metadata = extruct.extract(response.text, base_url="https://youtube.com")
I have noticed some recent crashes, but only on some videos.
No crash: https://www.youtube.com/watch?v=ZY48KUAZKhM https://www.youtube.com/watch?v=ZlVI7YJGHq0
Crash: https://www.youtube.com/watch?v=987wzJ2NHBE https://www.youtube.com/watch?v=0-EF60neguk
The common factor among the videos that crash is an apostrophe in the channel name!
Traceback (most recent call last):
  File "/home/will/local/breda/src/dredger/ingest/tests/test_youtube.py", line 72, in test_one
    youtube.get_video_data("https://www.youtube.com/watch?v=987wzJ2NHBE")
  File "/home/will/local/breda/src/dredger/ingest/youtube.py", line 46, in get_video_data
    metadata = extruct.extract(response.text, base_url="https://youtube.com")
  File "/home/will/.virtualenvs/breda/lib/python3.8/site-packages/extruct/_extruct.py", line 108, in extract
    output[syntax] = list(extract(document, base_url=base_url))
  File "/home/will/.virtualenvs/breda/lib/python3.8/site-packages/extruct/jsonld.py", line 25, in extract_items
    return [
  File "/home/will/.virtualenvs/breda/lib/python3.8/site-packages/extruct/jsonld.py", line 25, in <listcomp>
    return [
  File "/home/will/.virtualenvs/breda/lib/python3.8/site-packages/extruct/jsonld.py", line 38, in _extract_items
    data = jstyleson.loads(HTML_OR_JS_COMMENTLINE.sub('', script),strict=False)
  File "/home/will/.virtualenvs/breda/lib/python3.8/site-packages/jstyleson.py", line 123, in loads
    return json.loads(dispose(text), **kwargs)
  File "/usr/lib/python3.8/json/__init__.py", line 370, in loads
    return cls(**kw).decode(s)
  File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.8/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid \escape: line 1 column 211 (char 210)
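The error is at least consistent with the apostrophe theory. A rough sketch of a reproduction, using only the standard library: the payload below is made up (it is not copied from a YouTube page), and it assumes the apostrophe arrives backslash-escaped as \' which JSON does not allow. The helper describe_json_error is a hypothetical name, meant only for looking at the text around the failing offset.

```python
import json

# Hypothetical JSON-LD payload, NOT taken from YouTube. Assumption: the
# apostrophe in the channel name is escaped as \' which is not a legal
# JSON escape sequence.
payload = r'{"@type": "VideoObject", "author": "Will\'s Channel"}'


def describe_json_error(text, width=20):
    """Return None if text parses as JSON, else details about the failure."""
    try:
        # strict=False mirrors the call shown in the traceback.
        json.loads(text, strict=False)
    except json.JSONDecodeError as err:
        start = max(err.pos - width, 0)
        return {
            "message": err.msg,
            "pos": err.pos,
            # Slice of the input around the offset the decoder complained about.
            "context": text[start:err.pos + width],
        }
    return None


report = describe_json_error(payload)
print(report)
```

Running the same helper on the script contents that extruct hands to the JSON decoder should show whether char 210 really sits on an escaped apostrophe; I have not confirmed that against the failing pages.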
I haven't had a chance today to dig into this much beyond the triage above.