Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consider making html5lib.tokenizer public #532

Open
mgrandi opened this issue Apr 12, 2021 · 0 comments
Open

consider making html5lib.tokenizer public #532

mgrandi opened this issue Apr 12, 2021 · 0 comments

Comments

@mgrandi
Copy link

mgrandi commented Apr 12, 2021

Hello,

In version https://github.com/html5lib/html5lib-python/releases/tag/0.999999999 , html5lib.tokenizer was made private

The wpull project (https://github.com/ArchiveTeam/wpull ) uses this library, and if we were to ever migrate to using the 1.X versions, it would negatively impact the application, because instead of just tokenizing a webpage (see https://github.com/ArchiveTeam/wpull/blob/a4ff4a93f613ce18ad3c515aa3d4f5848a88b98c/wpull/document/htmlparse/html5lib_.py ), we would have to use the full tree parsing which is slower and uses more ram

is there any reason this was made private when the 1.x branch was released?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant