Make ArbitraryBase Unicode-aware #1299

lionel-rowe · 2023-02-22T03:42:42Z

https://mathiasbynens.be/notes/javascript-unicode#counting-symbols

know more

Describe your change:

Add an algorithm?
Fix a bug or typo in an existing algorithm?
Documentation change?

Checklist:

I have read CONTRIBUTING.md.
This pull request is all my own work -- I have not plagiarized.
I know that pull requests will not be merged if they fail the automated tests.
This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
All new JavaScript files are placed inside an existing directory.
All filenames should use the UpperCamelCase (PascalCase) style. There should be no spaces in filenames.
Example:UserProfile.js is allowed but userprofile.js,Userprofile.js,user-Profile.js,userProfile.js are not
All new algorithms have a URL in its comments that points to Wikipedia or other similar explanation.
If this pull request resolves one or more open issues then the commit message contains Fixes: #{$ISSUE_NO}.

https://mathiasbynens.be/notes/javascript-unicode#counting-symbols

appgurueu

The fix is sensible if Unicode is to be supported but it worsens the time complexity since .at is linear time in the size of the base string rather than constant time. Consider extracting the Unicode characters into an array as "preprocessing".

lionel-rowe · 2023-02-22T10:03:58Z

The fix is sensible if Unicode is to be supported but it worsens the time complexity since .at is linear time in the size of the base string rather than constant time. Consider extracting the Unicode characters into an array as "preprocessing".

I'm not convinced that's true — surely accessing an array index is constant time, just like accessing an object property. There's no reason for at to be iterating over the array, as arr.at(n) acts identically to arr[n] for non-negative integer values of n. Maybe I should just swap those out for square-bracket accessors as they'll never be negative anyway.

The only linear time parts I added were these:

  const baseOneCharacters = [...baseOneCharacterString]
  const baseTwoCharacters = [...baseTwoCharacterString]

But they only run once per function call, and in any case, the first argument is already iterated prior to being reversed.

appgurueu

Nevermind, you're correct. Glancing over the changes I missed the spread operators and confused Array.at with String.at, of which I had incorrectly assumed that it was linear time (it is not, it's constant time - it's basically just charAt + negative index support).

Is there any particular reason to prefer x.at(y) over x[y] here though? y will always be positive, so I'd prefer standard array indexing here.

Additionally, it would be good if you could convert the "test comments" into proper Jest tests.

Thanks for the fix.

raklaptudirm · 2023-02-22T21:44:17Z

@appgurueu Should I wait for the changes you suggested or good to merge?

lionel-rowe · 2023-02-23T04:55:16Z

Changing baseTwoCharacters.at(remainder) to baseTwoCharacters[remainder] uncovered a performance bug — value was often not an integer, because the division result wasn't being floored. charAt and at both silently floor their arguments, which obscured the bug:

'abc'.charAt(1.5)   // b
;[...'abc'].at(1.5) // b

However, as a result, the loop was being iterated over many more times than necessary as value < 1 got repeatedly divided until it reached floating-point 0, with all the extra zero-characters finally being lopped off at the end with .replace(new RegExp(`^${baseTwoZero}+`), ''). Without that regex replacement, convertArbitraryBase('98', '0123456789', '01234567') had 358 leading zeros (i.e. 358 unnecessary iterations).

I've fixed that by using floor division and removing the regex replacement.

Conversions/ArbitraryBase.js

appgurueu

Excellent work.

Make ArbitraryBase Unicode-aware

cff0f16

https://mathiasbynens.be/notes/javascript-unicode#counting-symbols

lionel-rowe requested review from raklaptudirm and appgurueu as code owners February 22, 2023 03:42

appgurueu reviewed Feb 22, 2023

View reviewed changes

appgurueu previously approved these changes Feb 22, 2023

View reviewed changes

appgurueu added the fix Fixes a bug label Feb 22, 2023

Fix performance bug and add Unicode test

9103435

Add BigInt version and push output chars to array

6667e4c

lionel-rowe dismissed appgurueu’s stale review via e3d6fcc February 23, 2023 04:40

lionel-rowe force-pushed the patch-1 branch from e3d6fcc to 9103435 Compare February 23, 2023 05:00

appgurueu reviewed Feb 23, 2023

View reviewed changes

Conversions/ArbitraryBase.js Outdated Show resolved Hide resolved

appgurueu previously approved these changes Feb 23, 2023

View reviewed changes

lionel-rowe dismissed appgurueu’s stale review via 6667e4c February 23, 2023 09:12

appgurueu approved these changes Feb 23, 2023

View reviewed changes

raklaptudirm approved these changes Feb 23, 2023

View reviewed changes

raklaptudirm merged commit 0c42758 into TheAlgorithms:master Feb 23, 2023
2 checks passed

Make ArbitraryBase Unicode-aware #1299

Make ArbitraryBase Unicode-aware #1299

lionel-rowe commented Feb 22, 2023

appgurueu left a comment

lionel-rowe commented Feb 22, 2023

appgurueu left a comment •

edited

raklaptudirm commented Feb 22, 2023

lionel-rowe commented Feb 23, 2023

appgurueu left a comment

Make ArbitraryBase Unicode-aware #1299

Make ArbitraryBase Unicode-aware #1299

Conversation

lionel-rowe commented Feb 22, 2023

Describe your change:

Checklist:

appgurueu left a comment

Choose a reason for hiding this comment

lionel-rowe commented Feb 22, 2023

appgurueu left a comment • edited

Choose a reason for hiding this comment

raklaptudirm commented Feb 22, 2023

lionel-rowe commented Feb 23, 2023

appgurueu left a comment

Choose a reason for hiding this comment

appgurueu left a comment •

edited