New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
build: convert CLDR locale extraction from Gulp to Bazel tool #42230
Conversation
ae7cff0
to
56fcaa6
Compare
Fantastic work @devversion! I love the way you can skip from TS to Skylark and back so fluently. Here is my initial feedback. No serious blockers that I can see. Mainly just a bunch of NITs and suggestions.
packages/common/locales/generate-locales-tool/array-deduplication.ts
Outdated
Show resolved
Hide resolved
packages/common/locales/generate-locales-tool/license-header.ts
Outdated
Show resolved
Hide resolved
85b465c
to
b2dcca8
Compare
@petebacondarwin Thanks for reviewing so thoroughly! I've addressed the feedback. While doing the requested cleanup of a few things within the old Previously, we generated two objects of base currencies. Once for the |
Great work! I really like the changes you made to the binaries.
packages/common/locales/generate-locales-tool/bin/write-locale-files-to-dist.ts
Outdated
Show resolved
Hide resolved
packages/common/locales/generate-locales-tool/object-stringify.ts
Outdated
Show resolved
Hide resolved
@devversion looks awesome!
Thanks for documenting the logic, this is really helpful. While you have all the context, I was thinking that it might also be beneficial to add a README.md file (into the packages/common/locales/generate-locales-tool
folder) to describe some aspects of the process and optmiizations from high-level perspective (while keeping the details in the code as well).
@petebacondarwin Thanks! Addressed the feedback. |
@devversion great, thanks for adding the readme file |
FYI, removing from the merge queue for now (by removing the "merge" label), since some extra checks should be performed in g3 before we can land this change. |
This is a pre-refactor commit allowing us to move the CLDR locale generation to Bazel where files would no longer be checked-in, except for the `closure-locale` file that is synced into Google3.
Converts the CLDR locale extraction script to a Bazel tool. This allows us to generate locale files within Bazel, so that locales don't need to live as sources within the repo. Also it allows us to get rid of the legacy Gulp tooling. The migration of the Gulp script to a Bazel tool involved the following things: 1. Basic conversion of the `extract.js` script to TypeScript. This mostly was about adding explicit types. e.g. adding `locale: string` or `localeData: CldrStatic`. 2. Split-up into separate files. Instead of keeping the large `extract.js` file, the tool has been split into separate files. The logic remains the same, just that code is more readable and maintainable. 3. Introduction of a new `index.ts` file that is the entry-point for the Bazel tool. Previously the Gulp tool just generated all locale files, the default locale and base currency files at once. The new entry-point accepts a mode to be passed as first process argument. based on that argument, either locales are generated into a specified directory, or the default locale, base currencies or closure file is generated. This allows us to generate files with a Bazel genrule where we simply run the tool and specify the outputs. Note: It's necessary to have multiple modes because files live in separate locations. e.g. the default locale in `@angular/core`, but the rest in `@angular/common`. 4. Removal of the `cldr-data-downloader` and custom CLDR resolution logic. Within Bazel we cannot run a downloader using network. We switch this to something more Bazel idiomatic with better caching. For this a new repository rule is introduced that downloads the CLDR JSON repository and extracts it. Within that rule we determine the supported locales so that they can be used to pre-declare outputs (for the locales) within Bazel analysis phase. This allows us to add the generated locale files to a `ts_library` (which we want to have for better testing, and consistent JS transpilation). Note that the removal of `cldr-data-downloader` also requires us to add logic for detecting locales without data. The CLDR data downloader overwrote the `availableLocales.json` file with a file that only lists locales that CLDR provides data for. We use the official `availableLocales` file CLDR provides, but filter out locales for which no data is available. This is needed until we update to CLDR 39 where data is available for all such locales listed in `availableLocales.json`.
Introduces a few Starlark macros for running the new Bazel CLDR generation tool. Wires up the new tool so that locales are generated properly. Also updates the existing `closure-locale` file to match the new output generated by the Bazel tool. This commit also re-adds a few locale files that aren't generated by CLDR 37, but have been accidentally left in the repository as the Gulp script never removed old locales from previous CLDR versions. This problem is solved with the Bazel generation of locale files, but for now we re-add these old CLDR 33 locale files to not break developers relying on these (even though the locale data indicies are incorrect; but there might be users accessing the data directly)
Given that the locale files are now generated through Bazel, the files are no longer checked-in and the legacy TSC compilation fails due to imports resolving to non-existent files. We fix this for the legacy saucelabs job by copying the generated TS files into the sources (which is acceptable for the isolated CI job)
The CLDR extraction tool has been reworked to run as part of Bazel. This adds a initial readme explaining what the tool generates. It's far from a detailed description but it can serve as foundation for more detailed explanations.
In the past, the closure file has been generated so that all individual locale files were imported individually. This resulted in a huge slow-down in g3 due to the large amount of imports. With 90bd984 this changed so that we inline the locale data for the g3 closure locale file. Also the file only contained data for locales being supported by Closure. For this a list of locales has been extracted from Closure Compiler, as well as a list of locale aliases. This logic is prone to CLDR version updates, and also broke as part of the Gulp -> Bazel migration where this logic has been slightly modified but caused issues in G3. e.g. a locale `zh-Hant` was requested in g3, but the locale data had the name of the alias locale that provided the data at index zero (which represents the locale name). Note that the locale names at index zero always could differentiate from the requested `goog.LOCALE` due to the aliasing logic. This just didn't come up before. We simplify this logic by generating a `goog.LOCALE` case for all locales CLDR provides data for. We don't need to bother about aliasing because with the refactorings to the CLDR generation tool, all locales are built (which also captures the aliases), and we can generate the locale file on the fly (which has not been done before).
Within Google, closure compiler is used for dealing with translations. We generate a closure-compatible locale file that allows for registration within Angular, so that Closure i18n works well together with Angular applications. Closure compiler does not limit its locales to BCP47-canonical locale identifiers. This commit updates the generation logic so that we also support deprecated (but aliased) locale identifiers, or other aliases which are likely used within Closure. We use CLDR's alias supplemental data for this. It instructs us to alias `iw` to `he` for example. `iw` is still supported in Closure. Note that we do not manually extract all locales supported in Closure; instead we only support the CLDR canonical locales (as done before) + common aliases that CLDR provides data for. We are not aware of other locale aliases within Closure that wouldn't be part of the CLDR aliases. If there would be, then Angular/Closure would fail accordingly.
With the refactoring from a Gulp task to a Bazel too, we tried switching away from the hard-coded list of locales and aliases for the Closure Locale file generation. After multiple attempts of landing this, it turned out that Closure Compiler/Closure Library relies on locale identifiers CLDR does not capture within it's `availableLocales.json` or `aliases.json` data. Closure Library does not use any unknown locale identifiers here. The locale identifiers can be resolved within CLDR using the bundle lookup algorithm that is specified as part of CLDR; instead the problem is that the locale identifiers do not follow any reasonable pattern and therefore it's extremely difficult to generate them automatically (it's almost like we'd need to build up _all_ possible combinations). Instead of doing that, we just use the hard-coded locales and aliases from the old Closure Locale generation script.
9cf4ab0
to
df40c39
Compare
Converts the CLDR locale extraction script to a Bazel tool. This allows us to generate locale files within Bazel, so that locales don't need to live as sources within the repo. Also it allows us to get rid of the legacy Gulp tooling. The migration of the Gulp script to a Bazel tool involved the following things: 1. Basic conversion of the `extract.js` script to TypeScript. This mostly was about adding explicit types. e.g. adding `locale: string` or `localeData: CldrStatic`. 2. Split-up into separate files. Instead of keeping the large `extract.js` file, the tool has been split into separate files. The logic remains the same, just that code is more readable and maintainable. 3. Introduction of a new `index.ts` file that is the entry-point for the Bazel tool. Previously the Gulp tool just generated all locale files, the default locale and base currency files at once. The new entry-point accepts a mode to be passed as first process argument. based on that argument, either locales are generated into a specified directory, or the default locale, base currencies or closure file is generated. This allows us to generate files with a Bazel genrule where we simply run the tool and specify the outputs. Note: It's necessary to have multiple modes because files live in separate locations. e.g. the default locale in `@angular/core`, but the rest in `@angular/common`. 4. Removal of the `cldr-data-downloader` and custom CLDR resolution logic. Within Bazel we cannot run a downloader using network. We switch this to something more Bazel idiomatic with better caching. For this a new repository rule is introduced that downloads the CLDR JSON repository and extracts it. Within that rule we determine the supported locales so that they can be used to pre-declare outputs (for the locales) within Bazel analysis phase. This allows us to add the generated locale files to a `ts_library` (which we want to have for better testing, and consistent JS transpilation). Note that the removal of `cldr-data-downloader` also requires us to add logic for detecting locales without data. The CLDR data downloader overwrote the `availableLocales.json` file with a file that only lists locales that CLDR provides data for. We use the official `availableLocales` file CLDR provides, but filter out locales for which no data is available. This is needed until we update to CLDR 39 where data is available for all such locales listed in `availableLocales.json`. PR Close #42230
Introduces a few Starlark macros for running the new Bazel CLDR generation tool. Wires up the new tool so that locales are generated properly. Also updates the existing `closure-locale` file to match the new output generated by the Bazel tool. This commit also re-adds a few locale files that aren't generated by CLDR 37, but have been accidentally left in the repository as the Gulp script never removed old locales from previous CLDR versions. This problem is solved with the Bazel generation of locale files, but for now we re-add these old CLDR 33 locale files to not break developers relying on these (even though the locale data indicies are incorrect; but there might be users accessing the data directly) PR Close #42230
Given that the locale files are now generated through Bazel, the files are no longer checked-in and the legacy TSC compilation fails due to imports resolving to non-existent files. We fix this for the legacy saucelabs job by copying the generated TS files into the sources (which is acceptable for the isolated CI job) PR Close #42230
The CLDR extraction tool has been reworked to run as part of Bazel. This adds a initial readme explaining what the tool generates. It's far from a detailed description but it can serve as foundation for more detailed explanations. PR Close #42230
In the past, the closure file has been generated so that all individual locale files were imported individually. This resulted in a huge slow-down in g3 due to the large amount of imports. With 90bd984 this changed so that we inline the locale data for the g3 closure locale file. Also the file only contained data for locales being supported by Closure. For this a list of locales has been extracted from Closure Compiler, as well as a list of locale aliases. This logic is prone to CLDR version updates, and also broke as part of the Gulp -> Bazel migration where this logic has been slightly modified but caused issues in G3. e.g. a locale `zh-Hant` was requested in g3, but the locale data had the name of the alias locale that provided the data at index zero (which represents the locale name). Note that the locale names at index zero always could differentiate from the requested `goog.LOCALE` due to the aliasing logic. This just didn't come up before. We simplify this logic by generating a `goog.LOCALE` case for all locales CLDR provides data for. We don't need to bother about aliasing because with the refactorings to the CLDR generation tool, all locales are built (which also captures the aliases), and we can generate the locale file on the fly (which has not been done before). PR Close #42230
Within Google, closure compiler is used for dealing with translations. We generate a closure-compatible locale file that allows for registration within Angular, so that Closure i18n works well together with Angular applications. Closure compiler does not limit its locales to BCP47-canonical locale identifiers. This commit updates the generation logic so that we also support deprecated (but aliased) locale identifiers, or other aliases which are likely used within Closure. We use CLDR's alias supplemental data for this. It instructs us to alias `iw` to `he` for example. `iw` is still supported in Closure. Note that we do not manually extract all locales supported in Closure; instead we only support the CLDR canonical locales (as done before) + common aliases that CLDR provides data for. We are not aware of other locale aliases within Closure that wouldn't be part of the CLDR aliases. If there would be, then Angular/Closure would fail accordingly. PR Close #42230
…#42230) With the refactoring from a Gulp task to a Bazel too, we tried switching away from the hard-coded list of locales and aliases for the Closure Locale file generation. After multiple attempts of landing this, it turned out that Closure Compiler/Closure Library relies on locale identifiers CLDR does not capture within it's `availableLocales.json` or `aliases.json` data. Closure Library does not use any unknown locale identifiers here. The locale identifiers can be resolved within CLDR using the bundle lookup algorithm that is specified as part of CLDR; instead the problem is that the locale identifiers do not follow any reasonable pattern and therefore it's extremely difficult to generate them automatically (it's almost like we'd need to build up _all_ possible combinations). Instead of doing that, we just use the hard-coded locales and aliases from the old Closure Locale generation script. PR Close #42230
This is a pre-refactor commit allowing us to move the CLDR locale generation to Bazel where files would no longer be checked-in, except for the `closure-locale` file that is synced into Google3. PR Close angular#42230
…r#42230) Converts the CLDR locale extraction script to a Bazel tool. This allows us to generate locale files within Bazel, so that locales don't need to live as sources within the repo. Also it allows us to get rid of the legacy Gulp tooling. The migration of the Gulp script to a Bazel tool involved the following things: 1. Basic conversion of the `extract.js` script to TypeScript. This mostly was about adding explicit types. e.g. adding `locale: string` or `localeData: CldrStatic`. 2. Split-up into separate files. Instead of keeping the large `extract.js` file, the tool has been split into separate files. The logic remains the same, just that code is more readable and maintainable. 3. Introduction of a new `index.ts` file that is the entry-point for the Bazel tool. Previously the Gulp tool just generated all locale files, the default locale and base currency files at once. The new entry-point accepts a mode to be passed as first process argument. based on that argument, either locales are generated into a specified directory, or the default locale, base currencies or closure file is generated. This allows us to generate files with a Bazel genrule where we simply run the tool and specify the outputs. Note: It's necessary to have multiple modes because files live in separate locations. e.g. the default locale in `@angular/core`, but the rest in `@angular/common`. 4. Removal of the `cldr-data-downloader` and custom CLDR resolution logic. Within Bazel we cannot run a downloader using network. We switch this to something more Bazel idiomatic with better caching. For this a new repository rule is introduced that downloads the CLDR JSON repository and extracts it. Within that rule we determine the supported locales so that they can be used to pre-declare outputs (for the locales) within Bazel analysis phase. This allows us to add the generated locale files to a `ts_library` (which we want to have for better testing, and consistent JS transpilation). Note that the removal of `cldr-data-downloader` also requires us to add logic for detecting locales without data. The CLDR data downloader overwrote the `availableLocales.json` file with a file that only lists locales that CLDR provides data for. We use the official `availableLocales` file CLDR provides, but filter out locales for which no data is available. This is needed until we update to CLDR 39 where data is available for all such locales listed in `availableLocales.json`. PR Close angular#42230
Introduces a few Starlark macros for running the new Bazel CLDR generation tool. Wires up the new tool so that locales are generated properly. Also updates the existing `closure-locale` file to match the new output generated by the Bazel tool. This commit also re-adds a few locale files that aren't generated by CLDR 37, but have been accidentally left in the repository as the Gulp script never removed old locales from previous CLDR versions. This problem is solved with the Bazel generation of locale files, but for now we re-add these old CLDR 33 locale files to not break developers relying on these (even though the locale data indicies are incorrect; but there might be users accessing the data directly) PR Close angular#42230
Given that the locale files are now generated through Bazel, the files are no longer checked-in and the legacy TSC compilation fails due to imports resolving to non-existent files. We fix this for the legacy saucelabs job by copying the generated TS files into the sources (which is acceptable for the isolated CI job) PR Close angular#42230
The CLDR extraction tool has been reworked to run as part of Bazel. This adds a initial readme explaining what the tool generates. It's far from a detailed description but it can serve as foundation for more detailed explanations. PR Close angular#42230
In the past, the closure file has been generated so that all individual locale files were imported individually. This resulted in a huge slow-down in g3 due to the large amount of imports. With 90bd984 this changed so that we inline the locale data for the g3 closure locale file. Also the file only contained data for locales being supported by Closure. For this a list of locales has been extracted from Closure Compiler, as well as a list of locale aliases. This logic is prone to CLDR version updates, and also broke as part of the Gulp -> Bazel migration where this logic has been slightly modified but caused issues in G3. e.g. a locale `zh-Hant` was requested in g3, but the locale data had the name of the alias locale that provided the data at index zero (which represents the locale name). Note that the locale names at index zero always could differentiate from the requested `goog.LOCALE` due to the aliasing logic. This just didn't come up before. We simplify this logic by generating a `goog.LOCALE` case for all locales CLDR provides data for. We don't need to bother about aliasing because with the refactorings to the CLDR generation tool, all locales are built (which also captures the aliases), and we can generate the locale file on the fly (which has not been done before). PR Close angular#42230
This issue has been automatically locked due to inactivity. Read more about our automatic conversation locking policy. This action has been performed automatically by a bot. |
See individual commits.
This PR initially contained the update to CLDR 39 too, but I've decided to switch back to CLDR 37 so that
we can land this (without having to wait for a new major). I've also compared old generated files with the new
generated files to make sure nothing is missing/changed in terms of the locale data.