Getting a http.client.BadStatusLine error after calling render() #395

Closed
sagar1025 opened this issue May 28, 2020 · 16 comments

@sagar1025 sagar1025 commented May 28, 2020

I followed the example in the documentation:

from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://python.org/')

After running this:

r.html.render()

I get this error:

File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen return opener.open(url, data, timeout)
File "/usr/lib/python3.6/urllib/request.py", line 526, in open response = self._open(req, data)
File "/usr/lib/python3.6/urllib/request.py", line 544, in _open '_open', req)
File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain result = func(*args)
File "/usr/lib/python3.6/urllib/request.py", line 1346, in http_open return self.do_open(http.client.HTTPConnection, req)
File "/usr/lib/python3.6/urllib/request.py", line 1321, in do_open r = h.getresponse()
File "/usr/lib/python3.6/http/client.py", line 1346, in getresponse response.begin()
File "/usr/lib/python3.6/http/client.py", line 307, in begin version, status, reason = self._read_status()
File "/usr/lib/python3.6/http/client.py", line 289, in _read_status raise BadStatusLine(line)
http.client.BadStatusLine: GET /json/version HTTP/1.1

r.html.html prints the entire DOM, so I'm not sure why I get an http.client.BadStatusLine error.

Is this the right way to do this, or am I missing something?

I'm currently using Python 3.6.9

Thanks

@misterch0c misterch0c commented May 30, 2020

Having the same issue

@sagar1025 sagar1025 commented Jun 3, 2020

Are you using it on Windows Subsystem for Linux?
I didn't get this error while running it natively on Windows

@xtanmy xtanmy commented Jun 5, 2020

Same issue on centos/python3.7

@justfish09 justfish09 commented Jun 10, 2020

Same issue. I am running it on WSL with Python 3.7.


File "/home/anaconda3/envs/my_package/lib/python3.7/site-packages/requests_html.py", line 586, in render self.browser = self.session.browser  # Automatically create a event loop and browser
File "/home/anaconda3/envs/my_package/lib/python3.7/site-packages/requests_html.py", line 730, in browser self._browser = self.loop.run_until_complete(super().browser)
File "/home/anaconda3/envs/my_package/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete return future.result()
File "/home/anaconda3/envs/my_package/lib/python3.7/site-packages/requests_html.py", line 714, in browser self._browser = await pyppeteer.launch(ignoreHTTPSErrors=not(self.verify), headless=True, args=self.__browser_args)
File "/home/anaconda3/envs/my_package/lib/python3.7/site-packages/pyppeteer/launcher.py", line 305, in launch return await Launcher(options, **kwargs).launch()
File "/home/anaconda3/envs/my_package/lib/python3.7/site-packages/pyppeteer/launcher.py", line 166, in launch self.browserWSEndpoint = get_ws_endpoint(self.url)
File "/home/anaconda3/envs/my_package/lib/python3.7/site-packages/pyppeteer/launcher.py", line 227, in get_ws_endpoint with urlopen(url) as f:
File "/home/anaconda3/envs/my_package/lib/python3.7/urllib/request.py", line 222, in urlopen return opener.open(url, data, timeout)
File "/home/anaconda3/envs/my_package/lib/python3.7/urllib/request.py", line 525, in open response = self._open(req, data)
File "/home/anaconda3/envs/my_package/lib/python3.7/urllib/request.py", line 543, in _open '_open', req)
File "/home/anaconda3/envs/my_package/lib/python3.7/urllib/request.py", line 503, in _call_chain result = func(*args)
File "/home/anaconda3/envs/my_package/lib/python3.7/urllib/request.py", line 1347, in http_open return self.do_open(http.client.HTTPConnection, req)
File "/home/anaconda3/envs/my_package/lib/python3.7/urllib/request.py", line 1322, in do_open r = h.getresponse()
File "/home/anaconda3/envs/my_package/lib/python3.7/http/client.py", line 1344, in getresponse response.begin()
File "/home/anaconda3/envs/my_package/lib/python3.7/http/client.py", line 306, in begin version, status, reason = self._read_status()
File "/home/anaconda3/envs/my_package/lib/python3.7/http/client.py", line 288, in _read_status raise BadStatusLine(line)
http.client.BadStatusLine: GET /json/version HTTP/1.1
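
The frames above show pyppeteer's get_ws_endpoint calling urlopen(url), and the error message shows that the path requested was /json/version. On a working setup, Chromium's remote debugging port answers that request with a small JSON document that includes a webSocketDebuggerUrl field. Below is a minimal sketch of that healthy exchange, using a stand-in server rather than a real browser. The FakeDevTools handler, read_ws_endpoint, and the WebSocket URL are made up for illustration.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener

# Made-up value; a real browser reports its own URL here.
FAKE_WS_URL = "ws://127.0.0.1:9222/devtools/browser/example-id"


class FakeDevTools(BaseHTTPRequestHandler):
    """Stand-in for the browser's debugging endpoint (illustration only)."""

    def do_GET(self):
        if self.path != "/json/version":
            self.send_error(404)
            return
        body = json.dumps({"webSocketDebuggerUrl": FAKE_WS_URL}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet


def read_ws_endpoint(base_url):
    """Fetch /json/version and return the advertised WebSocket URL."""
    opener = build_opener(ProxyHandler({}))  # ignore environment proxy settings
    with opener.open(base_url + "/json/version", timeout=5) as f:
        data = json.loads(f.read().decode())
    return data["webSocketDebuggerUrl"]


server = HTTPServer(("127.0.0.1", 0), FakeDevTools)
threading.Thread(target=server.serve_forever, daemon=True).start()
endpoint = read_ws_endpoint(f"http://127.0.0.1:{server.server_port}")
server.shutdown()
server.server_close()
print(endpoint)
```

When the reply is not a valid HTTP response at all, the client cannot get as far as parsing JSON, which is consistent with the failure surfacing inside http/client.py rather than in pyppeteer's own code.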

@elessarelfstone elessarelfstone commented Jun 17, 2020

> Same issue on centos/python3.7

Same environment, same issue.

@brendan98 brendan98 commented Jun 18, 2020

Same issue. Python 3.6.7 on WSL.

@ligou525 ligou525 commented Jun 29, 2020

Has anyone solved this problem on centos/python3.6?

@SadriShehu SadriShehu commented Jun 30, 2020

Having this problem on an Ubuntu 18.04 VPS as well!

@dusan-zivkovic dusan-zivkovic commented Jun 30, 2020

At first I thought it was something about macOS, then WSL with Ubuntu, then Ubuntu proper, but it wasn't any of them. What worked on macOS was pinning tornado==4.5.3 and notebook==5.7.8 (which allowed both HTMLSession and AsyncHTMLSession to work as intended under Jupyter). I haven't checked WSL and Ubuntu yet. Please check whether this works for you on other combinations of OS and Python.
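
To confirm that the two versions mentioned above are the ones the running interpreter actually sees, a small check along these lines may help. It is a sketch only: pin_mismatches is a hypothetical helper, and importlib.metadata requires Python 3.8 or newer, which is later than some of the interpreters reported in this thread.

```python
from importlib import metadata  # available from Python 3.8

# Versions taken from the comment above.
PINS = {"tornado": "4.5.3", "notebook": "5.7.8"}


def pin_mismatches(pins):
    """Return {package: installed version or None} for packages that differ from the pin."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


# An empty dict means every pinned package is installed at the requested version.
print(pin_mismatches(PINS))
```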

@jethr0-1 jethr0-1 commented Jul 15, 2020

> At first I thought it was something about macOS, then WSL with Ubuntu, then Ubuntu proper, but it wasn't any of them. What worked on macOS was pinning tornado==4.5.3 and notebook==5.7.8 (which allowed both HTMLSession and AsyncHTMLSession to work as intended under Jupyter). I haven't checked WSL and Ubuntu yet. Please check whether this works for you on other combinations of OS and Python.

Tried this on WSL Kali Linux; unfortunately it didn't work. I'm still getting an error very similar to the others in this thread:
Traceback (most recent call last):
File "pyppeteertest.py", line 11, in <module> asyncio.get_event_loop().run_until_complete(main())
File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete return future.result()
File "pyppeteertest.py", line 5, in main browser = await launch()
File "/usr/local/lib/python3.8/dist-packages/pyppeteer/launcher.py", line 305, in launch return await Launcher(options, **kwargs).launch()
File "/usr/local/lib/python3.8/dist-packages/pyppeteer/launcher.py", line 166, in launch self.browserWSEndpoint = get_ws_endpoint(self.url)
File "/usr/local/lib/python3.8/dist-packages/pyppeteer/launcher.py", line 227, in get_ws_endpoint with urlopen(url) as f:
File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen return opener.open(url, data, timeout)
File "/usr/lib/python3.8/urllib/request.py", line 525, in open response = self._open(req, data)
File "/usr/lib/python3.8/urllib/request.py", line 542, in _open result = self._call_chain(self.handle_open, protocol, protocol +
File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain result = func(*args)
File "/usr/lib/python3.8/urllib/request.py", line 1379, in http_open return self.do_open(http.client.HTTPConnection, req)
File "/usr/lib/python3.8/urllib/request.py", line 1354, in do_open r = h.getresponse()
File "/usr/lib/python3.8/http/client.py", line 1332, in getresponse response.begin()
File "/usr/lib/python3.8/http/client.py", line 303, in begin version, status, reason = self._read_status()
File "/usr/lib/python3.8/http/client.py", line 285, in _read_status raise BadStatusLine(line)
http.client.BadStatusLine: GET /json/version HTTP/1.1

@sagar1025 sagar1025 commented Jul 16, 2020

I could be wrong, but I think Pyppeteer is not fully compatible with WSL. Pyppeteer is an unofficial port of Puppeteer, and I have found similar reports of Puppeteer not working on WSL.

Additionally, I noticed that the render() method did not download Chromium to ~/.pyppeteer in my home directory as it should. This could be why the error is being thrown. Can anyone else verify whether this directory exists and has content in it?

I tried running the same code a few times, but it did not download the Chromium browser. However, I did notice that a few processes named chromium were spawned when I ran ps -aef. I'm not entirely sure what these processes are, but they were not killed even after the program finished executing.

I may have a possible workaround (this is not fully tested):

  1. Update by running sudo apt-get update or equivalent.
  2. Install the Chromium browser manually by running sudo apt-get install chromium-browser. This installs the Chromium browser to /usr/bin/.
  3. Since requests-html looks in ~/.pyppeteer for the Chromium browser, you have to copy the Chromium browser from /usr/bin/ to ~/.pyppeteer. You can do this by running cp /usr/bin/chromium-browser ~/.pyppeteer/ (assuming you have created the directory ~/.pyppeteer).
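
For anyone wanting to answer the question above about whether the directory exists, the two checks can be scripted roughly as follows. This is a sketch using only the standard library: both function names are hypothetical, the candidate binary names are guesses, and the ~/.pyppeteer location is taken from the comment above and may not match every pyppeteer version.

```python
import shutil
from pathlib import Path


def find_system_chromium(names=("chromium-browser", "chromium", "google-chrome")):
    """Return the path of the first browser binary found on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


def pyppeteer_dir_has_content(home=None):
    """Report whether <home>/.pyppeteer exists and contains at least one entry."""
    base = Path(home) if home else Path.home()
    directory = base / ".pyppeteer"
    return directory.is_dir() and any(directory.iterdir())


print("system browser:", find_system_chromium())
print("~/.pyppeteer populated:", pyppeteer_dir_has_content())
```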

@brendan98 brendan98 commented Jul 16, 2020

sagar1025, thanks for the tips. I just installed chromium-browser and it worked. I didn't have to copy it or make a soft link and, in fact, I don't even have a .pyppeteer folder in my home folder... but it is working.

I just did sudo apt-get update and sudo apt-get install chromium-browser. Then went to my project, activated the venv and it's working. I'm able to get the rendered html.
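
The report above does not say which Chromium binary ended up being used after the install. If one wanted to remove that ambiguity when calling pyppeteer directly, its launch() function accepts an executablePath option. The sketch below assumes pyppeteer is installed and that chromium-browser is on PATH; launch_options, render_title, and main are hypothetical names, and this does not go through requests-html's render().

```python
import asyncio
import shutil


def launch_options(executable_path):
    """Build keyword arguments for pyppeteer.launch() that select a specific browser binary."""
    return {"executablePath": executable_path, "headless": True}


async def render_title(url, executable_path):
    """Open a page with the chosen browser binary and return its title."""
    # Imported here so launch_options() can be used without pyppeteer installed.
    from pyppeteer import launch

    browser = await launch(**launch_options(executable_path))
    try:
        page = await browser.newPage()
        await page.goto(url)
        return await page.title()
    finally:
        await browser.close()  # avoid leaving chromium processes behind


def main():
    path = shutil.which("chromium-browser")
    if path is None:
        print("chromium-browser not found on PATH")
        return
    print(asyncio.run(render_title("https://python.org/", path)))
```

Calling main() runs the check. Closing the browser in a finally block is meant to address the leftover chromium processes mentioned earlier in the thread, though whether that was the cause there is not confirmed.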

Was chromium supposed to be installed automatically or did I just miss that step in the requests-html setup process?

@raja-ankaha raja-ankaha commented Jul 30, 2020

@brendan98 According to the docs, it is installed automatically. For me, it isn't working on CentOS 7, but it works on macOS.

@Ankur-singh Ankur-singh commented Aug 5, 2020

> I may have a possible workaround (this is not fully tested):
>
>   1. Update by running sudo apt-get update or equivalent.
>   2. Install the Chromium browser manually by running sudo apt-get install chromium-browser. This installs the Chromium browser to /usr/bin/.
>   3. Since requests-html looks in ~/.pyppeteer for the Chromium browser, you have to copy the Chromium browser from /usr/bin/ to ~/.pyppeteer. You can do this by running cp /usr/bin/chromium-browser ~/.pyppeteer/ (assuming you have created the directory ~/.pyppeteer).

I followed these steps, but I am still getting the same error. I am using Ubuntu WSL.

@Ankur-singh Ankur-singh commented Aug 13, 2020

I got it working on WSL. These are the steps I followed:

  1. Install the library dependencies for Chrome:
sudo apt-get update && sudo apt-get install -y gconf-service libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget
  2. Install pyppdf manually:
pip install pyppdf
  3. Then use this as the first line:
import pyppdf.patch_pyppeteer
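
Before installing the whole package list, it can be useful to see which shared libraries a given Chromium binary cannot resolve. On Linux, ldd prints "=> not found" for each one. The sketch below parses that output; missing_libraries and check_binary are hypothetical helpers, and the sample text is invented to show the format rather than captured from a real system.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the names of shared libraries that ldd reported as not found."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path):
    """Run ldd on a binary (Linux only) and list its unresolved libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=False)
    return missing_libraries(result.stdout)


# Invented sample in the format ldd uses.
SAMPLE = (
    "\tlinux-vdso.so.1 (0x00007ffd3a5f2000)\n"
    "\tlibnss3.so => not found\n"
    "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1c2a000000)\n"
    "\tlibXss.so.1 => not found\n"
)

print(missing_libraries(SAMPLE))
```

Passing the path of the browser binary to check_binary gives the same list for a real installation. An empty list would suggest that missing libraries are not the problem on that machine.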

@sagar1025 sagar1025 commented Aug 13, 2020

> I got it working on WSL. These are the steps I followed:
>
>   1. Install the library dependencies for Chrome:
> sudo apt-get update && sudo apt-get install -y gconf-service libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget
>   2. Install pyppdf manually:
> pip install pyppdf
>   3. Then use this as the first line:
> import pyppdf.patch_pyppeteer

So this was a dependency issue?

@sagar1025 sagar1025 closed this Aug 18, 2020