Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: convert CLDR locale extraction from Gulp to Bazel tool #42230

Closed
wants to merge 8 commits into from

Conversation

devversion
Copy link
Member

@devversion devversion commented May 21, 2021

See individual commits.

This PR initially contained the update to CLDR 39 too, but I've decided to switch back to CLDR 37 so that
we can land this (without having to wait for a new major). I've also compared old generated files with the new
generated files to make sure nothing is missing/changed in terms of the locale data.

@google-cla google-cla bot added the cla: yes label May 21, 2021
@zarend zarend added the comp: build & ci label May 21, 2021
@ngbot ngbot bot added this to the Backlog milestone May 21, 2021
@devversion devversion force-pushed the build/cldr-bazel-2 branch 7 times, most recently from ae7cff0 to 56fcaa6 Compare May 23, 2021
@devversion devversion changed the title [WIP] build: convert CLDR locale extraction from Gulp to Bazel tool build: convert CLDR locale extraction from Gulp to Bazel tool May 23, 2021
@devversion devversion requested a review from josephperrott May 23, 2021
@devversion devversion marked this pull request as ready for review May 23, 2021
@pullapprove pullapprove bot requested a review from AndrewKushnir May 23, 2021
@devversion devversion requested a review from petebacondarwin May 23, 2021
@petebacondarwin petebacondarwin added action: review target: patch labels May 23, 2021
Copy link
Member

@petebacondarwin petebacondarwin left a comment

Fantastic work @devversion! I love the way you can skip from TS to Skylark and back so fluently. Here is my initial feedback. No serious blockers that I can see. Mainly just a bunch of NITs and suggestions.

package.json Show resolved Hide resolved
packages/common/locales/generate-locales-tool/cldr-data.ts Outdated Show resolved Hide resolved
packages/common/locales/BUILD.bazel Show resolved Hide resolved
packages/common/locales/BUILD.bazel Show resolved Hide resolved
packages/common/locales/closure-locale.ts Show resolved Hide resolved
@devversion
Copy link
Member Author

@devversion devversion commented May 25, 2021

@petebacondarwin Thanks for reviewing so thoroughly! I've addressed the feedback. While doing the requested cleanup of a few things within the old extract.js script, I also went ahead and simplified the base currency generation a bit.

Previously, we generated two objects of base currencies. Once for the currencies.ts file in common/src/i18n, and one just for comparisons. I've changed it so that we only have to generate one type of BaseCurrencies object. The old extract.js script generated the two different objects so that it could do comparisons using toString() (the locale currencies do not contain the fraction digits, while the base currencies does; -> this is the reason for the addDigits flag in generateBaseCurrencies)

@devversion devversion requested a review from petebacondarwin May 25, 2021
Copy link
Member

@petebacondarwin petebacondarwin left a comment

Great work! I really like the changes you made to the binaries.

Copy link
Contributor

@AndrewKushnir AndrewKushnir left a comment

@devversion looks awesome! 👍

Thanks for documenting the logic, this is really helpful. While you have all the context, I was thinking that it might also be beneficial to add a README.md file (into the packages/common/locales/generate-locales-tool folder) to describe some aspects of the process and optmiizations from high-level perspective (while keeping the details in the code as well).

@devversion
Copy link
Member Author

@devversion devversion commented May 25, 2021

@petebacondarwin Thanks! Addressed the feedback.
@AndrewKushnir I've added a little README. It's far from being very detailed, but might make some things more clear (and can serve as a foundation). Please have another look!

@devversion devversion added target: minor and removed target: patch labels May 25, 2021
@AndrewKushnir
Copy link
Contributor

@AndrewKushnir AndrewKushnir commented May 26, 2021

@devversion great, thanks for adding the readme file 👍

@AndrewKushnir AndrewKushnir removed the action: merge label Jul 13, 2021
@AndrewKushnir
Copy link
Contributor

@AndrewKushnir AndrewKushnir commented Jul 13, 2021

FYI, removing from the merge queue for now (by removing the "merge" label), since some extra checks should be performed in g3 before we can land this change.

devversion added 8 commits Jul 16, 2021
This is a pre-refactor commit allowing us to move
the CLDR locale generation to Bazel where files would
no longer be checked-in, except for the `closure-locale`
file that is synced into Google3.
Converts the CLDR locale extraction script to a Bazel tool.
This allows us to generate locale files within Bazel, so that
locales don't need to live as sources within the repo. Also
it allows us to get rid of the legacy Gulp tooling.

The migration of the Gulp script to a Bazel tool involved the
following things:

  1. Basic conversion of the `extract.js` script to TypeScript.
     This mostly was about adding explicit types. e.g. adding `locale:
     string` or `localeData: CldrStatic`.

  2. Split-up into separate files. Instead of keeping the large
     `extract.js` file, the tool has been split into separate files.
     The logic remains the same, just that code is more readable and
     maintainable.

  3. Introduction of a new `index.ts` file that is the entry-point
     for the Bazel tool. Previously the Gulp tool just generated
     all locale files, the default locale and base currency files
     at once. The new entry-point accepts a mode to be passed as
     first process argument. based on that argument, either locales
     are generated into a specified directory, or the default locale,
     base currencies or closure file is generated.

     This allows us to generate files with a Bazel genrule where
     we simply run the tool and specify the outputs. Note: It's
     necessary to have multiple modes because files live in separate
     locations. e.g. the default locale in `@angular/core`, but the
     rest in `@angular/common`.

  4. Removal of the `cldr-data-downloader` and custom CLDR resolution
     logic. Within Bazel we cannot run a downloader using network.

     We switch this to something more Bazel idiomatic with better
     caching. For this a new repository rule is introduced that
     downloads the CLDR JSON repository and extracts it. Within
     that rule we determine the supported locales so that they
     can be used to pre-declare outputs (for the locales) within
     Bazel analysis phase. This allows us to add the generated locale
     files to a `ts_library` (which we want to have for better testing,
     and consistent JS transpilation).

     Note that the removal of `cldr-data-downloader` also requires us to
     add logic for detecting locales without data. The CLDR data
     downloader overwrote the `availableLocales.json` file with a file
     that only lists locales that CLDR provides data for. We use the
     official `availableLocales` file CLDR provides, but filter out
     locales for which no data is available. This is needed until we
     update to CLDR 39 where data is available for all such locales
     listed in `availableLocales.json`.
Introduces a few Starlark macros for running the new Bazel
CLDR generation tool. Wires up the new tool so that locales
are generated properly. Also updates the existing
`closure-locale` file to match the new output generated by the Bazel tool.

This commit also re-adds a few locale files that aren't
generated by CLDR 37, but have been accidentally left in
the repository as the Gulp script never removed old locales
from previous CLDR versions. This problem is solved with the
Bazel generation of locale files, but for now we re-add these
old CLDR 33 locale files to not break developers relying on these
(even though the locale data indicies are incorrect; but there might
be users accessing the data directly)
Given that the locale files are now generated through
Bazel, the files are no longer checked-in and the
legacy TSC compilation fails due to imports resolving
to non-existent files. We fix this for the legacy
saucelabs job by copying the generated TS files into
the sources (which is acceptable for the isolated CI job)
The CLDR extraction tool has been reworked to run as part of Bazel.
This adds a initial readme explaining what the tool generates. It's
far from a detailed description but it can serve as foundation for more
detailed explanations.
In the past, the closure file has been generated so that all individual
locale files were imported individually. This resulted in a huge
slow-down in g3 due to the large amount of imports.

With 90bd984 this changed so that we
inline the locale data for the g3 closure locale file. Also the file
only contained data for locales being supported by Closure. For this a
list of locales has been extracted from Closure Compiler, as well as a
list of locale aliases.

This logic is prone to CLDR version updates, and also broke as part of
the Gulp -> Bazel migration where this logic has been slightly modified
but caused issues in G3. e.g. a locale `zh-Hant` was requested in g3,
but the locale data had the name of the alias locale that provided the
data at index zero (which represents the locale name). Note that the
locale names at index zero always could differentiate from the requested
`goog.LOCALE` due to the aliasing logic. This just didn't come up before.

We simplify this logic by generating a `goog.LOCALE` case for all
locales CLDR provides data for. We don't need to bother about aliasing
because with the refactorings to the CLDR generation tool, all locales
are built (which also captures the aliases), and we can generate the locale
file on the fly (which has not been done before).
Within Google, closure compiler is used for dealing with translations.
We generate a closure-compatible locale file that allows for
registration within Angular, so that Closure i18n works well together
with Angular applications. Closure compiler does not limit its
locales to BCP47-canonical locale identifiers. This commit updates
the generation logic so that we also support deprecated (but aliased)
locale identifiers, or other aliases which are likely used within
Closure. We use CLDR's alias supplemental data for this. It instructs
us to alias `iw` to `he` for example. `iw` is still supported in Closure.

Note that we do not manually extract all locales supported in Closure;
instead we only support the CLDR canonical locales (as done before) +
common aliases that CLDR provides data for. We are not aware of other
locale aliases within Closure that wouldn't be part of the CLDR aliases.
If there would be, then Angular/Closure would fail accordingly.
With the refactoring from a Gulp task to a Bazel too, we tried switching
away from the hard-coded list of locales and aliases for the Closure
Locale file generation. After multiple attempts of landing this, it
turned out that Closure Compiler/Closure Library relies on locale
identifiers CLDR does not capture within it's `availableLocales.json`
or `aliases.json` data.

Closure Library does not use any unknown locale identifiers here. The
locale identifiers can be resolved within CLDR using the bundle lookup
algorithm that is specified as part of CLDR; instead the problem is that
the locale identifiers do not follow any reasonable pattern and
therefore it's extremely difficult to generate them automatically (it's
almost like we'd need to build up _all_ possible combinations). Instead
of doing that, we just use the hard-coded locales and aliases from the
old Closure Locale generation script.
@josephperrott josephperrott added action: merge and removed state: blocked labels Jul 16, 2021
@alxhub alxhub closed this in f2cd6de Jul 16, 2021
alxhub pushed a commit that referenced this issue Jul 16, 2021
Converts the CLDR locale extraction script to a Bazel tool.
This allows us to generate locale files within Bazel, so that
locales don't need to live as sources within the repo. Also
it allows us to get rid of the legacy Gulp tooling.

The migration of the Gulp script to a Bazel tool involved the
following things:

  1. Basic conversion of the `extract.js` script to TypeScript.
     This mostly was about adding explicit types. e.g. adding `locale:
     string` or `localeData: CldrStatic`.

  2. Split-up into separate files. Instead of keeping the large
     `extract.js` file, the tool has been split into separate files.
     The logic remains the same, just that code is more readable and
     maintainable.

  3. Introduction of a new `index.ts` file that is the entry-point
     for the Bazel tool. Previously the Gulp tool just generated
     all locale files, the default locale and base currency files
     at once. The new entry-point accepts a mode to be passed as
     first process argument. based on that argument, either locales
     are generated into a specified directory, or the default locale,
     base currencies or closure file is generated.

     This allows us to generate files with a Bazel genrule where
     we simply run the tool and specify the outputs. Note: It's
     necessary to have multiple modes because files live in separate
     locations. e.g. the default locale in `@angular/core`, but the
     rest in `@angular/common`.

  4. Removal of the `cldr-data-downloader` and custom CLDR resolution
     logic. Within Bazel we cannot run a downloader using network.

     We switch this to something more Bazel idiomatic with better
     caching. For this a new repository rule is introduced that
     downloads the CLDR JSON repository and extracts it. Within
     that rule we determine the supported locales so that they
     can be used to pre-declare outputs (for the locales) within
     Bazel analysis phase. This allows us to add the generated locale
     files to a `ts_library` (which we want to have for better testing,
     and consistent JS transpilation).

     Note that the removal of `cldr-data-downloader` also requires us to
     add logic for detecting locales without data. The CLDR data
     downloader overwrote the `availableLocales.json` file with a file
     that only lists locales that CLDR provides data for. We use the
     official `availableLocales` file CLDR provides, but filter out
     locales for which no data is available. This is needed until we
     update to CLDR 39 where data is available for all such locales
     listed in `availableLocales.json`.

PR Close #42230
alxhub pushed a commit that referenced this issue Jul 16, 2021
Introduces a few Starlark macros for running the new Bazel
CLDR generation tool. Wires up the new tool so that locales
are generated properly. Also updates the existing
`closure-locale` file to match the new output generated by the Bazel tool.

This commit also re-adds a few locale files that aren't
generated by CLDR 37, but have been accidentally left in
the repository as the Gulp script never removed old locales
from previous CLDR versions. This problem is solved with the
Bazel generation of locale files, but for now we re-add these
old CLDR 33 locale files to not break developers relying on these
(even though the locale data indicies are incorrect; but there might
be users accessing the data directly)

PR Close #42230
alxhub pushed a commit that referenced this issue Jul 16, 2021
Given that the locale files are now generated through
Bazel, the files are no longer checked-in and the
legacy TSC compilation fails due to imports resolving
to non-existent files. We fix this for the legacy
saucelabs job by copying the generated TS files into
the sources (which is acceptable for the isolated CI job)

PR Close #42230
alxhub pushed a commit that referenced this issue Jul 16, 2021
The CLDR extraction tool has been reworked to run as part of Bazel.
This adds a initial readme explaining what the tool generates. It's
far from a detailed description but it can serve as foundation for more
detailed explanations.

PR Close #42230
alxhub pushed a commit that referenced this issue Jul 16, 2021
In the past, the closure file has been generated so that all individual
locale files were imported individually. This resulted in a huge
slow-down in g3 due to the large amount of imports.

With 90bd984 this changed so that we
inline the locale data for the g3 closure locale file. Also the file
only contained data for locales being supported by Closure. For this a
list of locales has been extracted from Closure Compiler, as well as a
list of locale aliases.

This logic is prone to CLDR version updates, and also broke as part of
the Gulp -> Bazel migration where this logic has been slightly modified
but caused issues in G3. e.g. a locale `zh-Hant` was requested in g3,
but the locale data had the name of the alias locale that provided the
data at index zero (which represents the locale name). Note that the
locale names at index zero always could differentiate from the requested
`goog.LOCALE` due to the aliasing logic. This just didn't come up before.

We simplify this logic by generating a `goog.LOCALE` case for all
locales CLDR provides data for. We don't need to bother about aliasing
because with the refactorings to the CLDR generation tool, all locales
are built (which also captures the aliases), and we can generate the locale
file on the fly (which has not been done before).

PR Close #42230
alxhub pushed a commit that referenced this issue Jul 16, 2021
Within Google, closure compiler is used for dealing with translations.
We generate a closure-compatible locale file that allows for
registration within Angular, so that Closure i18n works well together
with Angular applications. Closure compiler does not limit its
locales to BCP47-canonical locale identifiers. This commit updates
the generation logic so that we also support deprecated (but aliased)
locale identifiers, or other aliases which are likely used within
Closure. We use CLDR's alias supplemental data for this. It instructs
us to alias `iw` to `he` for example. `iw` is still supported in Closure.

Note that we do not manually extract all locales supported in Closure;
instead we only support the CLDR canonical locales (as done before) +
common aliases that CLDR provides data for. We are not aware of other
locale aliases within Closure that wouldn't be part of the CLDR aliases.
If there would be, then Angular/Closure would fail accordingly.

PR Close #42230
alxhub pushed a commit that referenced this issue Jul 16, 2021
…#42230)

With the refactoring from a Gulp task to a Bazel too, we tried switching
away from the hard-coded list of locales and aliases for the Closure
Locale file generation. After multiple attempts of landing this, it
turned out that Closure Compiler/Closure Library relies on locale
identifiers CLDR does not capture within it's `availableLocales.json`
or `aliases.json` data.

Closure Library does not use any unknown locale identifiers here. The
locale identifiers can be resolved within CLDR using the bundle lookup
algorithm that is specified as part of CLDR; instead the problem is that
the locale identifiers do not follow any reasonable pattern and
therefore it's extremely difficult to generate them automatically (it's
almost like we'd need to build up _all_ possible combinations). Instead
of doing that, we just use the hard-coded locales and aliases from the
old Closure Locale generation script.

PR Close #42230
TeriGlover pushed a commit to TeriGlover/angular that referenced this issue Jul 21, 2021
This is a pre-refactor commit allowing us to move
the CLDR locale generation to Bazel where files would
no longer be checked-in, except for the `closure-locale`
file that is synced into Google3.

PR Close angular#42230
TeriGlover pushed a commit to TeriGlover/angular that referenced this issue Jul 21, 2021
…r#42230)

Converts the CLDR locale extraction script to a Bazel tool.
This allows us to generate locale files within Bazel, so that
locales don't need to live as sources within the repo. Also
it allows us to get rid of the legacy Gulp tooling.

The migration of the Gulp script to a Bazel tool involved the
following things:

  1. Basic conversion of the `extract.js` script to TypeScript.
     This mostly was about adding explicit types. e.g. adding `locale:
     string` or `localeData: CldrStatic`.

  2. Split-up into separate files. Instead of keeping the large
     `extract.js` file, the tool has been split into separate files.
     The logic remains the same, just that code is more readable and
     maintainable.

  3. Introduction of a new `index.ts` file that is the entry-point
     for the Bazel tool. Previously the Gulp tool just generated
     all locale files, the default locale and base currency files
     at once. The new entry-point accepts a mode to be passed as
     first process argument. based on that argument, either locales
     are generated into a specified directory, or the default locale,
     base currencies or closure file is generated.

     This allows us to generate files with a Bazel genrule where
     we simply run the tool and specify the outputs. Note: It's
     necessary to have multiple modes because files live in separate
     locations. e.g. the default locale in `@angular/core`, but the
     rest in `@angular/common`.

  4. Removal of the `cldr-data-downloader` and custom CLDR resolution
     logic. Within Bazel we cannot run a downloader using network.

     We switch this to something more Bazel idiomatic with better
     caching. For this a new repository rule is introduced that
     downloads the CLDR JSON repository and extracts it. Within
     that rule we determine the supported locales so that they
     can be used to pre-declare outputs (for the locales) within
     Bazel analysis phase. This allows us to add the generated locale
     files to a `ts_library` (which we want to have for better testing,
     and consistent JS transpilation).

     Note that the removal of `cldr-data-downloader` also requires us to
     add logic for detecting locales without data. The CLDR data
     downloader overwrote the `availableLocales.json` file with a file
     that only lists locales that CLDR provides data for. We use the
     official `availableLocales` file CLDR provides, but filter out
     locales for which no data is available. This is needed until we
     update to CLDR 39 where data is available for all such locales
     listed in `availableLocales.json`.

PR Close angular#42230
TeriGlover pushed a commit to TeriGlover/angular that referenced this issue Jul 21, 2021
Introduces a few Starlark macros for running the new Bazel
CLDR generation tool. Wires up the new tool so that locales
are generated properly. Also updates the existing
`closure-locale` file to match the new output generated by the Bazel tool.

This commit also re-adds a few locale files that aren't
generated by CLDR 37, but have been accidentally left in
the repository as the Gulp script never removed old locales
from previous CLDR versions. This problem is solved with the
Bazel generation of locale files, but for now we re-add these
old CLDR 33 locale files to not break developers relying on these
(even though the locale data indicies are incorrect; but there might
be users accessing the data directly)

PR Close angular#42230
TeriGlover pushed a commit to TeriGlover/angular that referenced this issue Jul 21, 2021
Given that the locale files are now generated through
Bazel, the files are no longer checked-in and the
legacy TSC compilation fails due to imports resolving
to non-existent files. We fix this for the legacy
saucelabs job by copying the generated TS files into
the sources (which is acceptable for the isolated CI job)

PR Close angular#42230
TeriGlover pushed a commit to TeriGlover/angular that referenced this issue Jul 21, 2021
The CLDR extraction tool has been reworked to run as part of Bazel.
This adds a initial readme explaining what the tool generates. It's
far from a detailed description but it can serve as foundation for more
detailed explanations.

PR Close angular#42230
TeriGlover pushed a commit to TeriGlover/angular that referenced this issue Jul 21, 2021
In the past, the closure file has been generated so that all individual
locale files were imported individually. This resulted in a huge
slow-down in g3 due to the large amount of imports.

With 90bd984 this changed so that we
inline the locale data for the g3 closure locale file. Also the file
only contained data for locales being supported by Closure. For this a
list of locales has been extracted from Closure Compiler, as well as a
list of locale aliases.

This logic is prone to CLDR version updates, and also broke as part of
the Gulp -> Bazel migration where this logic has been slightly modified
but caused issues in G3. e.g. a locale `zh-Hant` was requested in g3,
but the locale data had the name of the alias locale that provided the
data at index zero (which represents the locale name). Note that the
locale names at index zero always could differentiate from the requested
`goog.LOCALE` due to the aliasing logic. This just didn't come up before.

We simplify this logic by generating a `goog.LOCALE` case for all
locales CLDR provides data for. We don't need to bother about aliasing
because with the refactorings to the CLDR generation tool, all locales
are built (which also captures the aliases), and we can generate the locale
file on the fly (which has not been done before).

PR Close angular#42230
@angular-automatic-lock-bot
Copy link

@angular-automatic-lock-bot angular-automatic-lock-bot bot commented Aug 16, 2021

This issue has been automatically locked due to inactivity.
Please file a new issue if you are encountering a similar or related problem.

Read more about our automatic conversation locking policy.

This action has been performed automatically by a bot.

@angular-automatic-lock-bot angular-automatic-lock-bot bot locked and limited conversation to collaborators Aug 16, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
action: merge cla: yes comp: build & ci risk: high target: minor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants