Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC Ensures that load_boston passes numpydoc validation #22247

Merged
merged 13 commits into from Jan 24, 2022

Conversation

@PurnaChandraMansingh
Copy link
Contributor

@PurnaChandraMansingh PurnaChandraMansingh commented Jan 20, 2022

Reference Issues/PRs

Addresses #21350

What does this implement/fix? Explain your changes.

This PR ensures that sklearn.datasets._base.load_boston passes numpydoc validation

  • Removed the function from FUNCTION_DOCSTRING_IGNORE_LIST
  • Verify that all tests are passed

Any other comments?

My first PR to my favorite Library! : )
Thank You!

cc: (pair programming partner) @Ramanand-Yadav

…NORE_LIST and ensures that it passes numpydoc validation
@PurnaChandraMansingh
Copy link
Contributor Author

@PurnaChandraMansingh PurnaChandraMansingh commented Jan 20, 2022

Hi @thomasjpfan

I followed the instructions given in Addresses #21350

I removed sklearn.datasets._base.load_boston from FUNCTION_DOCSTRING_IGNORE_LIST

After runing "pytest sklearn/tests/test_docstrings.py -k sklearn.datasets._base.load_boston"
I got the below result

platform win32 -- Python 3.9.9, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: C:\Users\Purna\Downloads\OpenSource\scikit-learn, configfile: setup.cfg
plugins: cov-3.0.0
collected 0 items / 2 skipped

======================== 2 skipped in 0.09s ========================

I also run "pytest sklearn" and got the below result

image

Since there is no error message I created this PR and now I am getting this error
Could you please help me with this
Thanks

@PurnaChandraMansingh
Copy link
Contributor Author

@PurnaChandraMansingh PurnaChandraMansingh commented Jan 20, 2022

Hi @thomasjpfan
I was missing some important packages but now I got them installed.
Now I am getting the desired error messages and am I working on them.

Thanks

Hi @thomasjpfan

I followed the instructions given in Addresses #21350

I removed sklearn.datasets._base.load_boston from FUNCTION_DOCSTRING_IGNORE_LIST

After runing "pytest sklearn/tests/test_docstrings.py -k sklearn.datasets._base.load_boston" I got the below result

platform win32 -- Python 3.9.9, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 rootdir: C:\Users\Purna\Downloads\OpenSource\scikit-learn, configfile: setup.cfg plugins: cov-3.0.0 collected 0 items / 2 skipped

======================== 2 skipped in 0.09s ========================

I also run "pytest sklearn" and got the below result

image

Since there is no error message I created this PR and now I am getting this error Could you please help me with this Thanks

@jmloyola
Copy link
Member

@jmloyola jmloyola commented Jan 20, 2022

Thanks for the PR, @PurnaChandraMansingh!
It's great that you found what the cause error was.
Now, you should see what are the problems with the documentation.
Let me know if you need a hand.

@PurnaChandraMansingh
Copy link
Contributor Author

@PurnaChandraMansingh PurnaChandraMansingh commented Jan 21, 2022

Hi @jmloyola
Thanks for your response.

After installing all the dependencies properly I found there are 3 problems with the documentation. i.e

  • GL03: Double line break found; please use only one blank line to separate sections or paragraphs, and do not leave blank lines at the end of docstrings
  • GL09: Deprecation warning should precede extended summary
  • RT03: Return value has no description

I fixed GL03 & RT09 error codes successfully but now got stuck at GL09: Deprecation warning should precede extended summary.
It is in sklearn.datasets._base.load_boston
https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/datasets/_base.py

Line: 1180

I also referred to the numpydoc documentation
https://numpydoc.readthedocs.io/en/latest/format.html#sections
https://numpydoc.readthedocs.io/en/latest/format.html#deprecation-warning
still, I am trying my best to fix this message asap but don't know what am I missing.

I will be very grateful if you can guide me on this case.
Thanks

@jmloyola
Copy link
Member

@jmloyola jmloyola commented Jan 21, 2022

You can read more about the problem with GL09 here.

You can do something similar as they did in that PR:

  • remove the .. deprecated directive;
  • add the missing information from that directive into the @deprecate decorator ("See the warning message below for further details regarding the alternative datasets.").

We are showing a deprecation message two times here. After you do that, run pytest again to check that everything is OK. You can also build the documentation and see how it looks.

In the future, when you face an error, print the message here so it is easier for us to help you ^^

@PurnaChandraMansingh
Copy link
Contributor Author

@PurnaChandraMansingh PurnaChandraMansingh commented Jan 21, 2022

Thanks for the help @jmloyola

Actually, I am completely new to this git interface so a bit confused, I will keep this in my mind for future conversation.
btw Is there any active channel for scikit-learn where I can ask my doubts?

In the future, when you face an error, print the message here so it is easier for us to help you ^^

@ogrisel
Copy link
Member

@ogrisel ogrisel commented Jan 21, 2022

Actually, I am completely new to this git interface so a bit confused, I will keep this in my mind for future conversation.
btw Is there any active channel for scikit-learn where I can ask my doubts?

It's better to have a focused discussion on your github PR itself rather than a generic channel if you need still help related to this PR. Feel free to ask more precise questions if @jmloyola answers above are not enough.

@ogrisel
Copy link
Member

@ogrisel ogrisel commented Jan 21, 2022

Note: there is also a failure related to black:

https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=37188&view=logs&j=32e2e1bb-a28f-5b18-6cfc-3f01273f5609&t=fc67071d-c3d4-58b8-d38e-cafc0d3c731a

Run the black command locally to fix the formatting of this file, then commit and push again:

pip install black==21.6b0
black sklearn/datasets/_base.py

@PurnaChandraMansingh
Copy link
Contributor Author

@PurnaChandraMansingh PurnaChandraMansingh commented Jan 21, 2022

Actually, I am completely new to this git interface so a bit confused, I will keep this in my mind for future conversation.
btw Is there any active channel for scikit-learn where I can ask my doubts?

It's better to have a focused discussion on your github PR itself rather than a generic channel if you need still help related to this PR. Feel free to ask more precise questions if @jmloyola answers above are not enough.

Sure @ogrisel

@ogrisel
Copy link
Member

@ogrisel ogrisel commented Jan 21, 2022

@thomasjpfan I opened #22264 to fix the problem with the check-changelog linter.

@PurnaChandraMansingh
Copy link
Contributor Author

@PurnaChandraMansingh PurnaChandraMansingh commented Jan 21, 2022

You can also build the documentation and see how it looks.

Is there any way to build documentation for a single file only (the file I changed) ?

  • remove the .. deprecated directive;
  • add the missing information from that directive into the @deprecate decorator ("See the warning message below for further details regarding the alternative datasets.").

I did the same and now it passed pytest, flake8 and black.
So pushing them again.

@jmloyola
Copy link
Member

@jmloyola jmloyola commented Jan 21, 2022

Is there any way to build documentation for a single file only (the file I changed) ?

You can not do that (as far as I know). Remember that the documentation has hyperlinks connecting each module together.

I did the same and now it passed pytest, flake8 and black.

I would leave the warning below the description of the function. Something like this:

@deprecated(r"""`load_boston` is deprecated in 1.0 and will be removed in 1.2.""")
def load_boston(*, return_X_y=False):
    r"""Load and return the Boston house-prices dataset (regression).

    ==============   ==============
    Samples total               506
    Dimensionality               13
    Features         real, positive
    Targets           real 5. - 50.
    ==============   ==============

    Read more in the :ref:`User Guide <boston_dataset>`.

    .. warning::
        The Boston housing prices dataset has an ethical problem: as
        investigated in [1]_, the authors of this dataset engineered a
        non-invertible variable "B" assuming that racial self-segregation had a
        positive impact on house prices [2]_. Furthermore the goal of the
        research that led to the creation of this dataset was to study the
        impact of air quality but it did not give adequate demonstration of the
        validity of this assumption.

        The scikit-learn maintainers therefore strongly discourage the use of
        this dataset unless the purpose of the code is to study and educate
        about ethical issues in data science and machine learning.

        In this special case, you can fetch the dataset from the original
        source::

            import pandas as pd  # doctest: +SKIP
            import numpy as np

            data_url = "http://lib.stat.cmu.edu/datasets/boston"
            raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
            data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
            target = raw_df.values[1::2, 2]

        Alternative datasets include the California housing dataset [3]_
        (i.e. :func:`~sklearn.datasets.fetch_california_housing`) and Ames
        housing dataset [4]_. You can load the datasets as follows::

            from sklearn.datasets import fetch_california_housing
            housing = fetch_california_housing()

        for the California housing dataset and::

            from sklearn.datasets import fetch_openml
            housing = fetch_openml(name="house_prices", as_frame=True)  # noqa

        for the Ames housing dataset.

    Parameters
    ----------

Note that there was an error in raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None). The separation should be "\s+" instead of "s+". I know it isn't something you added and it was already there, just letting you know so we can fix it now 🤓.

Also, I used "Boston" for description of the function instead of "boston".

@PurnaChandraMansingh
Copy link
Contributor Author

@PurnaChandraMansingh PurnaChandraMansingh commented Jan 21, 2022

Note that there was an error in raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None). The separation should be "\s+" instead of "s+". I know it isn't something you added and it was already there, just letting you know so we can fix it now 🤓.

Also, I used "Boston" for description of the function instead of "boston".

Thanks, @jmloyola I am doing these changes now.

@PurnaChandraMansingh
Copy link
Contributor Author

@PurnaChandraMansingh PurnaChandraMansingh commented Jan 21, 2022

I would leave the warning below the description of the function. Something like this:

Note that there was an error in raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None). The separation should be "\s+" instead of "s+". I know it isn't something you added and it was already there, just letting you know so we can fix it now 🤓.

Also, I used "Boston" for description of the function instead of "boston".

All done 😊
Please have a look @jmloyola

@jmloyola
Copy link
Member

@jmloyola jmloyola commented Jan 21, 2022

Now the test test_load_boston_warning is failing :(

We forgot to run pytest there to catch it.

@ogrisel, @thomasjpfan, @jjerphan, what do you think? How should we handle this? Beforehand, the information was duplicated. One time in the @deprecated decorator and once again in the .. warning directive.

Should we change test_load_boston_warning, leave the warning information in the decorator, or duplicate the warnings (leave as it was before). I don't like the second option because the description of the functions is not at the top of the page.

@thomasjpfan
Copy link
Member

@thomasjpfan thomasjpfan commented Jan 22, 2022

I think the warning message is important enough to remain in both the FutureWarning and the docstring.

Moving forward, I think we can remove the .. warning directive and keep the content in @deprecated.
Looking at the render on main, the whole docstring became the warning because of how @deprecated auto-inserted the warning message.

@PurnaChandraMansingh
Copy link
Contributor Author

@PurnaChandraMansingh PurnaChandraMansingh commented Jan 24, 2022

Hi @thomasjpfan, @jmloyola
As per your instruction I kept the message in both @deprecated decorator and .. warning directive.
After this I had generated the html file and below is the output, kindly have look on this and kindly let me know if it's ok ro merge or I need to do any modification.
Thanks

image
image
image

Copy link
Member

@jjerphan jjerphan left a comment

LGTM.

@jmloyola
Copy link
Member

@jmloyola jmloyola commented Jan 24, 2022

LGTM

Copy link
Member

@thomasjpfan thomasjpfan left a comment

LGTM

@thomasjpfan thomasjpfan merged commit 6bdfb9a into scikit-learn:main Jan 24, 2022
31 checks passed
@PurnaChandraMansingh
Copy link
Contributor Author

@PurnaChandraMansingh PurnaChandraMansingh commented Jan 24, 2022

Thank you so much @thomasjpfan 😊
It's my first contribution my most favorite library scikit-learn
It was a great experience and learned a lot about how opensource, GitHub, and code base works.
Thank you so much to open this opportunity for new contributors like me.
also a lot of thanks to all the friendly members of this community specially @jmloyola, @jjerphan @ogrisel, @atharvapatil123, @genvalen, you all helped me a lot and cleared all my doubts.

As per the rule mentioned in #21350 I have successfully completed 3 PRs. could you please suggest any other issue to contribute or shall I continue my contribution to #21350?
Thanks 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants