Make pyppeteer use proxies #266

Open
oldani opened this issue Feb 18, 2019 · 10 comments
Labels
enhancement good first issue
Milestone

Comments

@oldani (Member) commented Feb 18, 2019

Using proxies with requests-html works fine as long as you don't render JS sites. Once you render a website, pyppeteer doesn't know about these proxies and will expose your IP. This is undesired behavior when scraping with proxies.

The idea is that whenever someone passes in proxies to the session object or any method call, make pyppeteer also use these proxies. #265
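One possible shape for this, as a rough sketch only: translate the requests-style proxies dict into Chromium's `--proxy-server` flag and hand it to the browser through `browser_args`. The helper name `proxies_to_chromium_arg` is hypothetical and not part of requests-html or pyppeteer. It assumes every proxy URL includes a scheme, and it drops any `user:password` part because Chromium does not read credentials from this flag.

```python
from urllib.parse import urlsplit


def proxies_to_chromium_arg(proxies):
    """Build a Chromium --proxy-server flag from a requests-style proxies dict.

    Hypothetical helper: returns None when no http/https proxy is configured.
    """
    rules = []
    for scheme in ("http", "https"):
        proxy_url = proxies.get(scheme)
        if not proxy_url:
            continue
        parts = urlsplit(proxy_url)
        # Keep scheme, host and port only; credentials are dropped on purpose.
        server = f"{parts.scheme}://{parts.hostname}"
        if parts.port:
            server += f":{parts.port}"
        # Chromium accepts per-scheme rules separated by semicolons.
        rules.append(f"{scheme}={server}")
    if not rules:
        return None
    return "--proxy-server=" + ";".join(rules)


proxy_arg = proxies_to_chromium_arg({
    "http": "http://user:pw@10.10.1.10:3128",
    "https": "socks5://10.10.1.11:1080",
})
# Assumed usage: HTMLSession(browser_args=["--no-sandbox", proxy_arg])
```

The value is passed as a single list element, so no extra shell quoting is added around it.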

@oldani oldani added the enhancement label Feb 18, 2019
@oldani oldani added the good first issue label Feb 26, 2019
@oldani oldani added this to the v0.11.0 milestone Feb 26, 2019

@Bobspadger commented Feb 26, 2019

This would be a good item to get fixed. Currently, when rendering, I have to stop using proxy servers.

@oldani (Member, Author) commented Feb 27, 2019

I will take this on.

@Bobspadger commented Feb 27, 2019

cool thanks, I was going to take a look later but I'm not up on the whole async thing yet :)

@ep4devops commented Apr 11, 2019

I am in a very restrictive corporate network and have been experiencing many issues with Python and proxies since I started using requests-html.
My goal is to scrape a Cisco site that returns a lot of its HTML via JS, so I have to use the render functionality.

1st (solved manually)
The initial Chromium download by pyppeteer does not use proxies, so I had to download it manually and check where pyppeteer expects it to be:

python -c 'import pyppeteer; print(pyppeteer.chromium_downloader.chromiumExecutable)'

>>'win64': WindowsPath('C:/Users/XXX/AppData/Local/pyppeteer/pyppeteer/local-chromium/575458/chrome-win32/chrome.exe'

2nd (solved manually)
Chromium does not accept a username and password in the --proxy-server="XXX" argument, see here

Now I am starting chromium with
session = HTMLSession(browser_args=['--no-sandbox', '--proxy-pac-url="http://XXX/XXX.pac"'])
while using the Proxy Auto Auth addon for chromium...

Start chrome.exe with the --proxy-pac-url="http://XXX/XXX.pac" argument, enter your credentials and install the Proxy Auto Auth addon. Restart chrome.exe with the same arguments and check whether you can use it without any proxy auth prompt.
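A different route from the addon, sketched here under assumptions rather than tested in this network: drive pyppeteer directly, pass only the bare proxy address to `--proxy-server`, and supply the credentials with `page.authenticate()`, which to my understanding answers the proxy's authentication challenge. The names `split_proxy_credentials` and `fetch_via_proxy` are hypothetical, and this bypasses requests-html's own `render()`.

```python
from urllib.parse import unquote, urlsplit


def split_proxy_credentials(proxy_url):
    """Return (server, credentials); server carries no user:password part."""
    parts = urlsplit(proxy_url)
    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port:
        server += f":{parts.port}"
    credentials = None
    if parts.username:
        credentials = {
            "username": unquote(parts.username),
            "password": unquote(parts.password or ""),
        }
    return server, credentials


async def fetch_via_proxy(url, proxy_url):
    """Render url in Chromium through the given proxy and return the HTML."""
    # Imported lazily so the helper above works without pyppeteer installed.
    from pyppeteer import launch

    server, credentials = split_proxy_credentials(proxy_url)
    browser = await launch(args=["--no-sandbox", f"--proxy-server={server}"])
    try:
        page = await browser.newPage()
        if credentials:
            # Must be set before navigation so the challenge can be answered.
            await page.authenticate(credentials)
        await page.goto(url)
        return await page.content()
    finally:
        await browser.close()
```

It would be called as `asyncio.run(fetch_via_proxy(url, "http://user:password@XXX:8080"))`, with the proxy address as a placeholder.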

3rd (not solved yet)
The render function does not use my proxy:

req = session.get(url=url, proxies=proxyDict, verify=False)
req.html.render()

pyppeteer.errors.PageError: net::ERR_NAME_NOT_RESOLVED at <URL>

I would be very happy if this can be solved ...

@FlyingZebra1 commented May 3, 2019

+1 On this being an amazing thing to get resolved.

@predicador37 commented Aug 22, 2019

Is there any news about this issue? Scraping behind corporate proxies is impossible right now. Is any progress planned on this? Thank you

@lauevrar77 commented Jun 29, 2020

Is there any news on this?
I saw this commit but don't know if it is the expected patch: #396

In my opinion, the best solution would be to use proxies the same way requests does (from the environment or a dict).
Is that possible at this time?
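To make the "from env or dict" idea concrete, here is a minimal sketch of how the proxies for rendering could be resolved. It is a simplification, not what requests actually does internally: it only looks at the `http_proxy` and `https_proxy` variables (either case), ignores `no_proxy` and `all_proxy`, and lets the explicit dict win. The function name `resolve_proxies` is hypothetical.

```python
import os


def resolve_proxies(explicit=None, environ=None):
    """Merge environment proxies with an explicit dict; explicit entries win."""
    environ = os.environ if environ is None else environ
    resolved = {}
    for scheme in ("http", "https"):
        # Check the lowercase variable first, then the uppercase one.
        for name in (f"{scheme}_proxy", f"{scheme.upper()}_PROXY"):
            value = environ.get(name)
            if value:
                resolved[scheme] = value
                break
    resolved.update(explicit or {})
    return resolved


merged = resolve_proxies(
    explicit={"https": "http://b:2"},
    environ={"http_proxy": "http://a:1", "HTTPS_PROXY": "http://c:3"},
)
```

The resulting dict could then feed both `session.get(..., proxies=merged)` and whatever mechanism ends up configuring the browser.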

@MrIdjit commented Jul 30, 2020

How is this going? I would like to know how I can use socks5 proxies with requests-html... and the .render() function.

@Bobspadger commented Feb 8, 2021

bump? any updates?

@kiriharu commented Oct 2, 2021

bump


8 participants