Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make ArbitraryBase Unicode-aware #1299

Merged
merged 3 commits into from Feb 23, 2023

Conversation

lionel-rowe
Copy link
Contributor

https://mathiasbynens.be/notes/javascript-unicode#counting-symbols

Open in Gitpod know more

Describe your change:

  • Add an algorithm?
  • Fix a bug or typo in an existing algorithm?
  • Documentation change?

Checklist:

  • I have read CONTRIBUTING.md.
  • This pull request is all my own work -- I have not plagiarized.
  • I know that pull requests will not be merged if they fail the automated tests.
  • This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
  • All new JavaScript files are placed inside an existing directory.
  • All filenames should use the UpperCamelCase (PascalCase) style. There should be no spaces in filenames.
    Example:UserProfile.js is allowed but userprofile.js,Userprofile.js,user-Profile.js,userProfile.js are not
  • All new algorithms have a URL in its comments that points to Wikipedia or other similar explanation.
  • If this pull request resolves one or more open issues then the commit message contains Fixes: #{$ISSUE_NO}.

Copy link
Collaborator

@appgurueu appgurueu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix is sensible if Unicode is to be supported but it worsens the time complexity since .at is linear time in the size of the base string rather than constant time. Consider extracting the Unicode characters into an array as "preprocessing".

@lionel-rowe
Copy link
Contributor Author

The fix is sensible if Unicode is to be supported but it worsens the time complexity since .at is linear time in the size of the base string rather than constant time. Consider extracting the Unicode characters into an array as "preprocessing".

I'm not convinced that's true — surely accessing an array index is constant time, just like accessing an object property. There's no reason for at to be iterating over the array, as arr.at(n) acts identically to arr[n] for non-negative integer values of n. Maybe I should just swap those out for square-bracket accessors as they'll never be negative anyway.

The only linear time parts I added were these:

  const baseOneCharacters = [...baseOneCharacterString]
  const baseTwoCharacters = [...baseTwoCharacterString]

But they only run once per function call, and in any case, the first argument is already iterated prior to being reversed.

appgurueu
appgurueu previously approved these changes Feb 22, 2023
Copy link
Collaborator

@appgurueu appgurueu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, you're correct. Glancing over the changes I missed the spread operators and confused Array.at with String.at, of which I had incorrectly assumed that it was linear time (it is not, it's constant time - it's basically just charAt + negative index support).

Is there any particular reason to prefer x.at(y) over x[y] here though? y will always be positive, so I'd prefer standard array indexing here.

Additionally, it would be good if you could convert the "test comments" into proper Jest tests.

Thanks for the fix.

@appgurueu appgurueu added the fix Fixes a bug label Feb 22, 2023
@raklaptudirm
Copy link
Member

@appgurueu Should I wait for the changes you suggested or good to merge?

@lionel-rowe
Copy link
Contributor Author

Changing baseTwoCharacters.at(remainder) to baseTwoCharacters[remainder] uncovered a performance bug — value was often not an integer, because the division result wasn't being floored. charAt and at both silently floor their arguments, which obscured the bug:

'abc'.charAt(1.5)   // b
;[...'abc'].at(1.5) // b

However, as a result, the loop was being iterated over many more times than necessary as value < 1 got repeatedly divided until it reached floating-point 0, with all the extra zero-characters finally being lopped off at the end with .replace(new RegExp(`^${baseTwoZero}+`), ''). Without that regex replacement, convertArbitraryBase('98', '0123456789', '01234567') had 358 leading zeros (i.e. 358 unnecessary iterations).

I've fixed that by using floor division and removing the regex replacement.

appgurueu
appgurueu previously approved these changes Feb 23, 2023
Copy link
Collaborator

@appgurueu appgurueu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work.

@raklaptudirm raklaptudirm merged commit 0c42758 into TheAlgorithms:master Feb 23, 2023
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Fixes a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants