[Forms] trim option is not removing UTF8 ZERO WIDTH SPACE (\xE2\x80\x8B) #39230
…U+200B) and SOFT HYPHEN (U+00AD) (pmishev)

This PR was merged into the 4.4 branch.

Discussion
----------

[Form] Fixed StringUtil::trim() to trim ZERO WIDTH SPACE (U+200B) and SOFT HYPHEN (U+00AD)

| Q             | A
| ------------- | ---
| Branch?       | 4.4
| Bug fix?      | yes
| New feature?  | no
| Deprecations? | no
| Tickets       | Fix #39230
| License       | MIT

Commits
-------

258bea7 [Form] Fixed StringUtil::trim() to trim ZERO WIDTH SPACE (U+200B) and SOFT HYPHEN (U+00AD)
Symfony version(s) affected: 4.4.16
Description
StringUtil::trim() calls preg_replace('/^[\pZ\p{Cc}]+|[\pZ\p{Cc}]+$/u', '', $string) to trim strings. However, this regex does not match ZERO WIDTH SPACE (U+200B, \xE2\x80\x8B in UTF-8), so that character is left untrimmed.
How to reproduce
http://sandbox.onlinephpfunctions.com/code/5b72e8ff76c34a313c0f2799995f56e5993b6b60
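For readers who cannot open the sandbox link, the behaviour can be checked locally with a minimal sketch like the one below. It is not the sandbox script; the function name trimWithCurrentPattern() is a hypothetical stand-in, and only the pattern is taken from this issue.

```php
<?php
// Hypothetical stand-in for StringUtil::trim(); only the pattern comes from the issue.
function trimWithCurrentPattern(string $string): string
{
    // preg_replace() returns null on failure (e.g. invalid UTF-8); fall back to the input.
    return preg_replace('/^[\pZ\p{Cc}]+|[\pZ\p{Cc}]+$/u', '', $string) ?? $string;
}

$zwsp = "\u{200B}"; // ZERO WIDTH SPACE, bytes E2 80 8B in UTF-8

// Ordinary and no-break spaces are in \pZ, so they are trimmed.
var_dump(trimWithCurrentPattern(" \u{00A0}foo ") === 'foo');

// ZWSP is in category Cf, which the pattern does not cover, so it survives:
// the hex dump still starts and ends with e2808b.
var_dump(bin2hex(trimWithCurrentPattern($zwsp . 'foo' . $zwsp)));
```

The sketch needs PHP 7.0 or later for the "\u{...}" escape syntax.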
Possible Solution
\p{Cf} matches this character, but it also matches other format characters such as SOFT HYPHEN (https://en.wikipedia.org/wiki/Unicode_character_property). I don't think that should be a problem though.
I suggest changing the regex to preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $string). Cc and Cf are the categories in the C group that matter here. Note that the group also includes Cs (surrogate), Co (private use) and Cn (unassigned), so \pC is somewhat broader than [\p{Cc}\p{Cf}].
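To see what the suggested class would trim in practice, here is a rough comparison sketch. The helper trimWithClass() is hypothetical, and the explicit [\pZ\p{Cc}\p{Cf}] class is only an assumption added for comparison, not something proposed above. The private-use sample shows where the whole C group and the Cc/Cf pair differ.

```php
<?php
// Hypothetical helper: trims leading/trailing characters matching the given class.
function trimWithClass(string $class, string $string): string
{
    $pattern = '/^' . $class . '+|' . $class . '+$/u';
    // preg_replace() returns null on failure; fall back to the input in that case.
    return preg_replace($pattern, '', $string) ?? $string;
}

$proposed = '[\pZ\pC]';          // class suggested in this issue
$explicit = '[\pZ\p{Cc}\p{Cf}]'; // assumption: narrower class, for comparison only

$samples = [
    'ZERO WIDTH SPACE (Cf)'   => "\u{200B}",
    'SOFT HYPHEN (Cf)'        => "\u{00AD}",
    'PRIVATE USE U+E000 (Co)' => "\u{E000}",
];

foreach ($samples as $label => $char) {
    $input = $char . 'foo' . $char;
    printf(
        "%-24s proposed: %s  explicit: %s\n",
        $label,
        trimWithClass($proposed, $input) === 'foo' ? 'trimmed' : 'kept',
        trimWithClass($explicit, $input) === 'foo' ? 'trimmed' : 'kept'
    );
}
```

Both classes strip the two Cf characters. Only \pC also strips the private-use code point, which may or may not be desirable for form input.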