New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HtmlSanitizer] Introduce HtmlSanitizer component #44681
base: 6.1
Are you sure you want to change the base?
Conversation
Hi. First of all, thank you for you effort, this is really cool idea! Second: I'm not a Symony team member, so feel free to ignore me. My concern is that configuring will create a bunch of temporary objects. Maybe it's better to make a separate builder class, something like: <?php
$config = (new HtmlSanitizerConfigBuilder())
->allowElement('div')
->allowElement('a', ['title'])
->getConfig()
;
$sanitizer = new HtmlSanitizer($config); And call constructor only in the |
{ | ||
$this->config = clone $config; | ||
$this->maxInputLength = $maxInputLength; | ||
$this->parser = $parser ?: new MastermindsParser(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$this->parser = $parser ?: new MastermindsParser(); | |
$this->parser = $parser ?? new MastermindsParser(); |
if (!isset($this->domVisitors[$context])) { | ||
$this->domVisitors[$context] = $this->createDomVisitorForContext($context); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (!isset($this->domVisitors[$context])) { | |
$this->domVisitors[$context] = $this->createDomVisitorForContext($context); | |
} | |
$this->domVisitors[$context] ??= $this->createDomVisitorForContext($context); |
|
||
try { | ||
$parsed = $this->parser->parse($input); | ||
} catch (\Exception) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we be more specific about what kind of exception we expect here?
if (isset($self->blockedElements[$element])) { | ||
unset($self->blockedElements[$element]); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (isset($self->blockedElements[$element])) { | |
unset($self->blockedElements[$element]); | |
} | |
unset($self->blockedElements[$element]); |
unset()
should be idempotent. No need to safe-guard the call.
*/ | ||
class HtmlSanitizerConfig | ||
{ | ||
// Elements that should be removed but their children should be retained |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use PHPDoc syntax to document properties.
class HtmlSanitizerConfig | ||
{ | ||
// Elements that should be removed but their children should be retained | ||
private array $blockedElements = []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we be more specific on the kind of elements that the array properties in this class may contain?
|
||
public function __construct(ParserInterface $parser, \Throwable $previous = null) | ||
{ | ||
parent::__construct('HTML parsing failed using parser '.\get_class($parser), 0, $previous); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parent::__construct('HTML parsing failed using parser '.\get_class($parser), 0, $previous); | |
parent::__construct('HTML parsing failed using parser '.\get_debug_type($parser), 0, $previous); |
"league/uri": "^6.5", | ||
"masterminds/html5": "^2.4", | ||
"psr/log": "^1.0|^2.0|^3.0", | ||
"symfony/string": "^5.0|^6.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"symfony/string": "^5.0|^6.0" | |
"symfony/string": "^5.4|^6.0" |
We've bumped all Symfony dependencies to ^5.4|^6.0
on the 6.0 branch. We should do the same for new components.
private NodeInterface $parentNode; | ||
private string $text; | ||
|
||
public function __construct(NodeInterface $parentNode, string $text) | ||
{ | ||
$this->parentNode = $parentNode; | ||
$this->text = $text; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private NodeInterface $parentNode; | |
private string $text; | |
public function __construct(NodeInterface $parentNode, string $text) | |
{ | |
$this->parentNode = $parentNode; | |
$this->text = $text; | |
} | |
public function __construct( | |
private NodeInterface $parentNode, | |
private string $text | |
) { | |
} |
try { | ||
$parsed = parse($url); | ||
} catch (Exception) { | ||
return null; | ||
} | ||
|
||
return $parsed; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try { | |
$parsed = parse($url); | |
} catch (Exception) { | |
return null; | |
} | |
return $parsed; | |
try { | |
return parse($url); | |
} catch (Exception) { | |
return null; | |
} |
* | ||
* @final | ||
*/ | ||
class MastermindsParser implements ParserInterface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why this class is not actually final
?
|
||
private function visitNode(\DOMNode $domNode, Cursor $cursor) | ||
{ | ||
$nodeName = u($domNode->nodeName)->lower()->toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need to use symfony/string to lowercase the node name ? Node names are ASCII only AFAIK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this component is only used once, I would drop this dep 👍🏻
|
||
interface AttributeSanitizerInterface | ||
{ | ||
public function getSupportedElements(): array; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add documentation for those interfaces:
- method description explaining what the method is about
- phpdoc describing the type of arrays
|
||
use function League\Uri\build; | ||
use League\Uri\Exception; | ||
use function League\Uri\parse; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we start adding use statements for functions, I suggest sorting the use statements by type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be a job for php-ca-fixer
@@ -0,0 +1,91 @@ | |||
Example usage: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The readme should contain the info present in all component readme files.
// Should the URL in the sanitized document be transformed to HTTPS if they are using HTTP | ||
private bool $forceHttpsUrls = false; | ||
|
||
// Sanitizers that should be applied to specific attributes in addition to standard sanitation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo here (sanitation
is not a valid word).
// Should the URL in the sanitized document be transformed to HTTPS if they are using HTTP | ||
private bool $forceHttpsUrls = false; | ||
|
||
// Sanitizers that should be applied to specific attributes in addition to standard sanitation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The property description should be in the doc comment of the property rather than in a separate comment
$this->parser = $parser; | ||
} | ||
|
||
public function getParser(): ParserInterface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why exposing the parser in the exception ?
6.1.0 | ||
----- | ||
|
||
* added the component as experimental |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* added the component as experimental | |
* Add the component as experimental |
*/ | ||
class MastermindsParser implements ParserInterface | ||
{ | ||
public function parse(string $html): \DOMNode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inheritdoc? Or is this not needed?
use Symfony\Component\HtmlSanitizer\HtmlSanitizerConfig; | ||
use Symfony\Component\HtmlSanitizer\Visitor\AttributeSanitizer\AttributeSanitizerInterface; | ||
|
||
class HtmlSanitizerCustomTest extends TestCase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Final please
use PHPUnit\Framework\TestCase; | ||
use Symfony\Component\HtmlSanitizer\HtmlSanitizerConfig; | ||
|
||
class HtmlSanitizerConfigTest extends TestCase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Final please
use Symfony\Component\HtmlSanitizer\HtmlSanitizer; | ||
use Symfony\Component\HtmlSanitizer\HtmlSanitizerConfig; | ||
|
||
class HtmlSanitizerStandardTest extends TestCase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Final please
|
||
private function visitNode(\DOMNode $domNode, Cursor $cursor) | ||
{ | ||
$nodeName = u($domNode->nodeName)->lower()->toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this component is only used once, I would drop this dep 👍🏻
* | ||
* @final | ||
*/ | ||
class Cursor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why no real final?
use function Symfony\Component\String\u; | ||
|
||
/** | ||
* The DomVisitor iterate over the parsed DOM tree and visit nodes using the NodeVisitor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* The DomVisitor iterate over the parsed DOM tree and visit nodes using the NodeVisitor. | |
* The DomVisitor iterates over the parsed DOM tree and visits nodes using the NodeVisitor. |
?
*/ | ||
interface NodeInterface | ||
{ | ||
public function addChild(self $node): void; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please describe the methods
</php> | ||
|
||
<testsuites> | ||
<testsuite name="Symfony String Component Test Suite"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong name
Following discussion on #44144, this PR proposes to add a new HtmlSanitizer component to Symfony.
I refactored the infrastructure of my original html-sanitizer to better follow the W3C Standard Proposal about sanitizers.
The sanitizer can be used as follow:
The component is working and well tested.
I did some very basic performance testing and it seems to be **~ 2 times faster** than my original library, mostly by relying on much less objects and more on native primitives. My original library was already as fast as the fastest PHP HTML sanitizers, meaning this component may well be the fastest sanitizer in the PHP ecosystem (to better check of course!). That's exciting :) !
I still have on my TODO list:
Feel free to review the component while I work on finalizing it and the FameworkBundle integration!
The text was updated successfully, but these errors were encountered: