# Testing
First off - thank you for writing test cases - they're really important in developing pandas!
## Typical Imports
```python
import pytest
import pandas.util.testing as tm
from pandas.util.testing import makeCustomDataframe as mkdf
```
## Making your tests behave well
pandas committers run test cases after every change (as does Travis), so it's important that you make your tests well-behaved. Balancing that, it's important that your test cases cover the functionality of your addition, so that when others make changes, they can be confident that they aren't introducing errors in your code. This includes:
- marking network-using test cases with `@network` (see below).
- marking slow tests with `@pytest.mark.slow`.
- using smaller test cases where it makes sense (for example, if you're testing a `numexpr` evaluation, you can generally just set `expr._MIN_ELEMENTS = 0` and go ahead, rather than needing to test on a frame of at least 10K elements).
- making sure to skip tests (or even test files) if a required import is not available (see the sketch just after this list).
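For the last item, here is a minimal sketch using pytest's built-in `pytest.importorskip`; the test body is a hypothetical illustration, not pandas code:

```python
import pytest

# Skip this module's tests at collection time if numexpr is not installed.
numexpr = pytest.importorskip("numexpr")


def test_numexpr_evaluate():
    # Only runs when the import above succeeded.
    assert numexpr.evaluate("2 * 3") == 6
```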
In addition, stylistically, the preference is to use `pytest.mark.parametrize` or `pytest` fixtures to group multiple related tests under one test function, to keep the functionality tested in any one test function specific and clear.
E.g.:

```python
@pytest.mark.slow
def test_million_element_arithmetic():
    df = mkdf(100000, 100000)
    tm.assert_frame_equal(df.mod(df) * df * 0, df * 0)
```
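And a hypothetical sketch of the `pytest.mark.parametrize` style mentioned above (the test function and inputs are illustrative, not taken from the pandas suite):

```python
import pandas as pd
import pytest


@pytest.mark.parametrize(
    "values, expected",
    [
        ([1, 2, 3], 6),
        ([0.5, 0.5], 1.0),
        ([-1, 1], 0),
    ],
)
def test_series_sum(values, expected):
    # Each parameter set becomes its own test case, keeping the function focused.
    assert pd.Series(values).sum() == expected
```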
## Additional imports
Important: make sure to document failure conditions (and use `pytest.raises` where necessary to make it clearer which exception you want to get). Testing for bare `Exception` is strongly discouraged.
## Testing using a File
The `tm.ensure_clean` context manager allows safe read/write access to a temporary file, with a generated filename (or your filename if provided). The file is automatically deleted when the context block is exited.
```python
with tm.ensure_clean('my_file_path') as path:
    # do something with the path
```
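For example, a round-trip test might look like this (a minimal sketch; the frame and filename are arbitrary):

```python
import pandas as pd
import pandas.util.testing as tm


def test_csv_roundtrip():
    df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
    with tm.ensure_clean("roundtrip.csv") as path:
        df.to_csv(path, index=False)
        result = pd.read_csv(path)
    # The temporary file is gone by this point; only the in-memory result remains.
    tm.assert_frame_equal(result, df)
```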
## Testing for Exceptions
Generally, it's not acceptable to just check that something raises `Exception`, because that tends to mask a lot of errors. For example, if a function's signature changes between releases, you could be catching the wrong kind of error altogether. Going forward, the goal is to have no test cases that pass if `Exception` or a subclass is raised (we're not quite there yet).
Another element that is helpful is to use `pytest.raises(..., match=regexp)`. It lets you be very explicit about what you expect (and prevents hiding errors like changing signatures, etc.):
```python
with pytest.raises(ValueError, match="an error"):
    raise ValueError("an error")

with pytest.raises(ValueError, match="invalid literal"):
    int('abc')
```
## Handling tests requiring network connectivity
Please run your tests without an internet connection before submitting a PR! It's really important that your tests not fail when you have no internet connection (i.e., they should skip without a network connection). In general, network tests are finicky.

All tests that involve networking must be marked as "network", either by using the `network` decorator or the `with_connectivity_check` decorator from `pandas.util.testing`. Unless you absolutely need to test that a function/method correctly handles connectivity errors, you should use the `network` decorator, which will catch all `IOError`s (which includes `URLError`). If you believe that your test case will only fail if you simply aren't connected to the internet, you can use `with_connectivity_check` to check:
```python
>>> @with_connectivity_check
... def test_my_function():
...     urllib2.urlopen("funny://rabbithead")
>>> test_my_function()
Traceback (most recent call last):
...
URLError...#some message
```
If you want to have the decorator always raise errors, just pass `raise_on_error=True` to the `network` decorator:
```python
>>> @network(raise_on_error=True)
... def test2():
...     raise URLError("WRONG!")
>>> test2()
Traceback (most recent call last):
...
URLError: WRONG!
```
The `with_connectivity_check` decorator defaults to checking http://www.google.com to determine whether it is connected. But if you had a test that depends on yahoo, it might make sense to check yahoo instead:
@with_connectivity_check("http://www.yahoo.com")
def some_test_with_yahoo():
# do something etc.
It's a good idea to break up network tests into at least two parts:

1. Tests that check that the code works and gracefully handles errors.
2. Tests that really only matter if you have network connectivity (like making sure that the current Google Analytics feed is being processed properly).
For (1), you might want to use `@network(raise_on_error=True)`, because those tests should not fail without connectivity.
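A hypothetical sketch of case (1), where `fetch_remote_csv` stands in for whatever error handling you are exercising (it is not a pandas function):

```python
import pytest
from pandas.util.testing import network


@network(raise_on_error=True)
def test_bad_scheme_raises_ioerror():
    # raise_on_error=True keeps the decorator from masking a genuine failure,
    # and the test itself does not need connectivity to pass.
    with pytest.raises(IOError):
        fetch_remote_csv("funny://rabbithead")  # hypothetical helper under test
```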
For (2), you should definitely suppress network errors and, particularly if you have a slow test, you may even want to check for connectivity first (so the test never even runs if there isn't a network connection). You can do that easily by passing `check_before_test=True` to `with_connectivity_check`:
@with_connectivity_check("http://www.somespecificsite.com", check_before_test=True)
def some_test():
for i in range(1000):
test_some_really_long_function(i)
## Testing for Warnings
To test for warnings, you can use the `assert_produces_warning` context manager, which checks that your code produces a warning.

Probably the most common case is just a test case for a `DeprecationWarning`:
```python
>>> with assert_produces_warning(DeprecationWarning):
...     some_function_that_raises_deprecation_warning()
```
With no arguments, it checks that any warning is raised.
```python
>>> import warnings
>>> with assert_produces_warning():
...     warnings.warn(UserWarning())
...
```
When passed `False`, it checks that no warnings are raised.
```python
>>> with assert_produces_warning(False):
...     warnings.warn(RuntimeWarning())
...
Traceback (most recent call last):
...
AssertionError: Caused unexpected warning(s): ['RuntimeWarning'].
```
Finally, if you pass it a warning class, it will check that the specific class of warning was raised and no other.
```python
>>> with assert_produces_warning(UserWarning):
...     warnings.warn(RuntimeWarning())
Traceback (most recent call last):
...
AssertionError: Did not see expected warning of class 'UserWarning'.
```
## Reading from either a URL or zip file
### Reading from a URL
```python
from pandas.io.common import urlopen

with urlopen('http://www.google.com') as url:
    raw_text = url.read()
```
### Reading a file named `file.txt` that's inside of a zip file named `file.zip`
```python
from pandas.io.common import ZipFile

with ZipFile('file.zip') as zf:
    raw_text = zf.read('file.txt')
```
## Hook up travis-ci
We use Travis for testing the entire library across various Python versions. If you hook up your fork to run Travis, then whether your pull request passes or fails the test suite is displayed prominently. This is incredibly helpful.
If it shows that it passes, great! We can consider merging.
If there's a failure, this lets you and us know that something is wrong and needs attention before the pull request can be considered for merging.

Sometimes Travis will say a change failed for reasons unrelated to your pull request, for example a build error or a network error. To get Travis to retest your pull request, do the following:
```sh
git commit --amend -C HEAD
git push origin <yourbranch> -f
```