DOC Ensures that load_boston passes numpydoc validation #22247

PurnaChandraMansingh · 2022-01-20T15:08:21Z

Reference Issues/PRs

Addresses #21350

What does this implement/fix? Explain your changes.

This PR ensures that sklearn.datasets._base.load_boston passes numpydoc validation

Removed the function from FUNCTION_DOCSTRING_IGNORE_LIST
Verify that all tests are passed

Any other comments?

My first PR to my favorite Library! : )
Thank You!

cc: (pair programming partner) @Ramanand-Yadav

…NORE_LIST and ensures that it passes numpydoc validation

PurnaChandraMansingh · 2022-01-20T16:33:21Z

Hi @thomasjpfan

I followed the instructions given in Addresses #21350

I removed sklearn.datasets._base.load_boston from FUNCTION_DOCSTRING_IGNORE_LIST

After runing "pytest sklearn/tests/test_docstrings.py -k sklearn.datasets._base.load_boston"
I got the below result

platform win32 -- Python 3.9.9, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: C:\Users\Purna\Downloads\OpenSource\scikit-learn, configfile: setup.cfg
plugins: cov-3.0.0
collected 0 items / 2 skipped

======================== 2 skipped in 0.09s ========================

I also run "pytest sklearn" and got the below result

Since there is no error message I created this PR and now I am getting this error
Could you please help me with this
Thanks

PurnaChandraMansingh · 2022-01-20T18:19:10Z

Hi @thomasjpfan
I was missing some important packages but now I got them installed.
Now I am getting the desired error messages and am I working on them.

Thanks

Hi @thomasjpfan

I followed the instructions given in Addresses #21350

I removed sklearn.datasets._base.load_boston from FUNCTION_DOCSTRING_IGNORE_LIST

After runing "pytest sklearn/tests/test_docstrings.py -k sklearn.datasets._base.load_boston" I got the below result

platform win32 -- Python 3.9.9, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 rootdir: C:\Users\Purna\Downloads\OpenSource\scikit-learn, configfile: setup.cfg plugins: cov-3.0.0 collected 0 items / 2 skipped

======================== 2 skipped in 0.09s ========================

I also run "pytest sklearn" and got the below result

Since there is no error message I created this PR and now I am getting this error Could you please help me with this Thanks

jmloyola · 2022-01-20T23:18:50Z

Thanks for the PR, @PurnaChandraMansingh!
It's great that you found what the cause error was.
Now, you should see what are the problems with the documentation.
Let me know if you need a hand.

PurnaChandraMansingh · 2022-01-21T03:16:30Z

Hi @jmloyola
Thanks for your response.

After installing all the dependencies properly I found there are 3 problems with the documentation. i.e

GL03: Double line break found; please use only one blank line to separate sections or paragraphs, and do not leave blank lines at the end of docstrings
GL09: Deprecation warning should precede extended summary
RT03: Return value has no description

I fixed GL03 & RT09 error codes successfully but now got stuck at GL09: Deprecation warning should precede extended summary.
It is in sklearn.datasets._base.load_boston
https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/datasets/_base.py

Line: 1180

I also referred to the numpydoc documentation
https://numpydoc.readthedocs.io/en/latest/format.html#sections
https://numpydoc.readthedocs.io/en/latest/format.html#deprecation-warning
still, I am trying my best to fix this message asap but don't know what am I missing.

I will be very grateful if you can guide me on this case.
Thanks

jmloyola · 2022-01-21T13:27:30Z

You can read more about the problem with GL09 here.

You can do something similar as they did in that PR:

remove the .. deprecated directive;
add the missing information from that directive into the @deprecate decorator ("See the warning message below for further details regarding the alternative datasets.").

We are showing a deprecation message two times here. After you do that, run pytest again to check that everything is OK. You can also build the documentation and see how it looks.

In the future, when you face an error, print the message here so it is easier for us to help you ^^

PurnaChandraMansingh · 2022-01-21T15:02:20Z

Thanks for the help @jmloyola

Actually, I am completely new to this git interface so a bit confused, I will keep this in my mind for future conversation.
btw Is there any active channel for scikit-learn where I can ask my doubts?

In the future, when you face an error, print the message here so it is easier for us to help you ^^

ogrisel · 2022-01-21T15:23:14Z

Actually, I am completely new to this git interface so a bit confused, I will keep this in my mind for future conversation.
btw Is there any active channel for scikit-learn where I can ask my doubts?

It's better to have a focused discussion on your github PR itself rather than a generic channel if you need still help related to this PR. Feel free to ask more precise questions if @jmloyola answers above are not enough.

ogrisel · 2022-01-21T15:29:47Z

Note: there is also a failure related to black:

https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=37188&view=logs&j=32e2e1bb-a28f-5b18-6cfc-3f01273f5609&t=fc67071d-c3d4-58b8-d38e-cafc0d3c731a

Run the black command locally to fix the formatting of this file, then commit and push again:

pip install black==21.6b0
black sklearn/datasets/_base.py

PurnaChandraMansingh · 2022-01-21T15:34:55Z

Actually, I am completely new to this git interface so a bit confused, I will keep this in my mind for future conversation.
btw Is there any active channel for scikit-learn where I can ask my doubts?

It's better to have a focused discussion on your github PR itself rather than a generic channel if you need still help related to this PR. Feel free to ask more precise questions if @jmloyola answers above are not enough.

Sure @ogrisel

ogrisel · 2022-01-21T15:36:26Z

@thomasjpfan I opened #22264 to fix the problem with the check-changelog linter.

… from that directive into the @deprecate decorator

PurnaChandraMansingh · 2022-01-21T16:23:09Z

You can also build the documentation and see how it looks.

Is there any way to build documentation for a single file only (the file I changed) ?

remove the .. deprecated directive;

add the missing information from that directive into the @deprecate decorator ("See the warning message below for further details regarding the alternative datasets.").

I did the same and now it passed pytest, flake8 and black.
So pushing them again.

jmloyola · 2022-01-21T19:04:04Z

Is there any way to build documentation for a single file only (the file I changed) ?

You can not do that (as far as I know). Remember that the documentation has hyperlinks connecting each module together.

I did the same and now it passed pytest, flake8 and black.

I would leave the warning below the description of the function. Something like this:

@deprecated(r"""`load_boston` is deprecated in 1.0 and will be removed in 1.2.""")
def load_boston(*, return_X_y=False):
    r"""Load and return the Boston house-prices dataset (regression).

    ==============   ==============
    Samples total               506
    Dimensionality               13
    Features         real, positive
    Targets           real 5. - 50.
    ==============   ==============

    Read more in the :ref:`User Guide <boston_dataset>`.

    .. warning::
        The Boston housing prices dataset has an ethical problem: as
        investigated in [1]_, the authors of this dataset engineered a
        non-invertible variable "B" assuming that racial self-segregation had a
        positive impact on house prices [2]_. Furthermore the goal of the
        research that led to the creation of this dataset was to study the
        impact of air quality but it did not give adequate demonstration of the
        validity of this assumption.

        The scikit-learn maintainers therefore strongly discourage the use of
        this dataset unless the purpose of the code is to study and educate
        about ethical issues in data science and machine learning.

        In this special case, you can fetch the dataset from the original
        source::

            import pandas as pd  # doctest: +SKIP
            import numpy as np

            data_url = "http://lib.stat.cmu.edu/datasets/boston"
            raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
            data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
            target = raw_df.values[1::2, 2]

        Alternative datasets include the California housing dataset [3]_
        (i.e. :func:`~sklearn.datasets.fetch_california_housing`) and Ames
        housing dataset [4]_. You can load the datasets as follows::

            from sklearn.datasets import fetch_california_housing
            housing = fetch_california_housing()

        for the California housing dataset and::

            from sklearn.datasets import fetch_openml
            housing = fetch_openml(name="house_prices", as_frame=True)  # noqa

        for the Ames housing dataset.

    Parameters
    ----------

Note that there was an error in raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None). The separation should be "\s+" instead of "s+". I know it isn't something you added and it was already there, just letting you know so we can fix it now 🤓.

Also, I used "Boston" for description of the function instead of "boston".

PurnaChandraMansingh · 2022-01-21T19:27:50Z

Note that there was an error in raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None). The separation should be "\s+" instead of "s+". I know it isn't something you added and it was already there, just letting you know so we can fix it now 🤓.

Also, I used "Boston" for description of the function instead of "boston".

Thanks, @jmloyola I am doing these changes now.

PurnaChandraMansingh · 2022-01-21T19:40:02Z

I would leave the warning below the description of the function. Something like this:

Note that there was an error in raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None). The separation should be "\s+" instead of "s+". I know it isn't something you added and it was already there, just letting you know so we can fix it now 🤓.

Also, I used "Boston" for description of the function instead of "boston".

All done 😊
Please have a look @jmloyola

jmloyola · 2022-01-21T21:22:46Z

Now the test test_load_boston_warning is failing :(

We forgot to run pytest there to catch it.

@ogrisel, @thomasjpfan, @jjerphan, what do you think? How should we handle this? Beforehand, the information was duplicated. One time in the @deprecated decorator and once again in the .. warning directive.

Should we change test_load_boston_warning, leave the warning information in the decorator, or duplicate the warnings (leave as it was before). I don't like the second option because the description of the functions is not at the top of the page.

thomasjpfan · 2022-01-22T14:16:30Z

I think the warning message is important enough to remain in both the FutureWarning and the docstring.

Moving forward, I think we can remove the .. warning directive and keep the content in @deprecated.
Looking at the render on main, the whole docstring became the warning because of how @deprecated auto-inserted the warning message.

PurnaChandraMansingh · 2022-01-24T07:30:02Z

Hi @thomasjpfan, @jmloyola
As per your instruction I kept the message in both @deprecated decorator and .. warning directive.
After this I had generated the html file and below is the output, kindly have look on this and kindly let me know if it's ok ro merge or I need to do any modification.
Thanks

jjerphan

LGTM.

jmloyola · 2022-01-24T12:02:35Z

LGTM

thomasjpfan

LGTM

PurnaChandraMansingh · 2022-01-24T15:04:30Z

Thank you so much @thomasjpfan 😊
It's my first contribution my most favorite library scikit-learn ❤
It was a great experience and learned a lot about how opensource, GitHub, and code base works.
Thank you so much to open this opportunity for new contributors like me.
also a lot of thanks to all the friendly members of this community specially @jmloyola, @jjerphan @ogrisel, @atharvapatil123, @genvalen, you all helped me a lot and cleared all my doubts.

As per the rule mentioned in #21350 I have successfully completed 3 PRs. could you please suggest any other issue to contribute or shall I continue my contribution to #21350?
Thanks 🙏

removed sklearn.datasets._base.load_boston from FUNCTION_DOCSTRING_IG…

4fc0d29

…NORE_LIST and ensures that it passes numpydoc validation

github-actions bot added the Documentation label Jan 20, 2022

PurnaChandraMansingh added 3 commits Jan 20, 2022

Fixed - GL03: Double line break found

98e84f1

Fixed - RT03: Return value has no description

29cde29

Merge remote-tracking branch 'upstream/main' into docstrings

10c494c

PurnaChandraMansingh added 2 commits Jan 21, 2022

Merge remote-tracking branch 'upstream/main' into docstrings

c54cf08

Merge remote-tracking branch 'upstream/main' into docstrings

6f6c255

github-actions bot added the module:datasets label Jan 21, 2022

Removed white space at line 1254

ccff8da

thomasjpfan added the No Changelog Needed label Jan 21, 2022

ogrisel mentioned this pull request Jan 21, 2022

FIX typo in .github/workflows/check-changelog.yml #22264

Merged

removed the .. deprecated directive and added the missing information…

ff20abf

… from that directive into the @deprecate decorator

Merge remote-tracking branch 'upstream/main' into docstrings

8665dda

PurnaChandraMansingh added 2 commits Jan 21, 2022

leave the warning below the description of the function

9aa1540

Merge remote-tracking branch 'upstream/main' into docstrings

138c1b0

PurnaChandraMansingh added 2 commits Jan 24, 2022

kept the warning message in both @deprecated decorator and the docstring

8700118

Merge remote-tracking branch 'upstream/main' into docstrings

77045ec

jjerphan approved these changes Jan 24, 2022

View changes

thomasjpfan approved these changes Jan 24, 2022

View changes

thomasjpfan merged commit 6bdfb9a into scikit-learn:main Jan 24, 2022
31 checks passed

scikit-learn / scikit-learn Public

DOC Ensures that load_boston passes numpydoc validation #22247

DOC Ensures that load_boston passes numpydoc validation #22247

PurnaChandraMansingh commented Jan 20, 2022

PurnaChandraMansingh commented Jan 20, 2022

PurnaChandraMansingh commented Jan 20, 2022

jmloyola commented Jan 20, 2022

PurnaChandraMansingh commented Jan 21, 2022 •

edited

jmloyola commented Jan 21, 2022 •

edited

PurnaChandraMansingh commented Jan 21, 2022

ogrisel commented Jan 21, 2022

ogrisel commented Jan 21, 2022

PurnaChandraMansingh commented Jan 21, 2022

ogrisel commented Jan 21, 2022

PurnaChandraMansingh commented Jan 21, 2022

jmloyola commented Jan 21, 2022

PurnaChandraMansingh commented Jan 21, 2022

PurnaChandraMansingh commented Jan 21, 2022 •

edited

jmloyola commented Jan 21, 2022

thomasjpfan commented Jan 22, 2022

PurnaChandraMansingh commented Jan 24, 2022

jjerphan left a comment

jmloyola commented Jan 24, 2022

thomasjpfan left a comment

PurnaChandraMansingh commented Jan 24, 2022

scikit-learn / scikit-learn Public

DOC Ensures that load_boston passes numpydoc validation #22247

DOC Ensures that load_boston passes numpydoc validation #22247

Conversation

PurnaChandraMansingh commented Jan 20, 2022

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

PurnaChandraMansingh commented Jan 20, 2022

PurnaChandraMansingh commented Jan 20, 2022

jmloyola commented Jan 20, 2022

PurnaChandraMansingh commented Jan 21, 2022 • edited

jmloyola commented Jan 21, 2022 • edited

PurnaChandraMansingh commented Jan 21, 2022

ogrisel commented Jan 21, 2022

ogrisel commented Jan 21, 2022

PurnaChandraMansingh commented Jan 21, 2022

ogrisel commented Jan 21, 2022

PurnaChandraMansingh commented Jan 21, 2022

jmloyola commented Jan 21, 2022

PurnaChandraMansingh commented Jan 21, 2022

PurnaChandraMansingh commented Jan 21, 2022 • edited

jmloyola commented Jan 21, 2022

thomasjpfan commented Jan 22, 2022

PurnaChandraMansingh commented Jan 24, 2022

jjerphan left a comment

jmloyola commented Jan 24, 2022

thomasjpfan left a comment

PurnaChandraMansingh commented Jan 24, 2022

PurnaChandraMansingh commented Jan 21, 2022 •

edited

jmloyola commented Jan 21, 2022 •

edited

PurnaChandraMansingh commented Jan 21, 2022 •

edited