Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-58451: Add optional delete_on_close parameter to NamedTemporaryFile #97015

Merged
merged 26 commits into from Oct 4, 2022

Conversation

Ev2geny
Copy link
Contributor

@Ev2geny Ev2geny commented Sep 22, 2022

This is a reincarnation of the already marked as awaiting merge PR 22431, which was subsequently closed due to some git mistake.

This PR fixes issue 58451: tempfile.NamedTemporaryFile not particularly useful on Windows

tempfile.NamedTemporaryFile is too hard to use portably when you need to open the file by name after writing it. To do that, you need to close the file first (on Windows), which means you have to pass delete=False, which in turn means that you get no help in cleaning up the actual file resource,

Hence at the moment there is no out of the box solution to use tempfile.NamedTemporaryFile on Windows in such scenario (which is often used in unit testing):

  • in test module:
  1. create and open temporary file
  2. write data to it
  3. pass name of the temporary file to the operational code
  • In operational code, being tested
  1. open file, using name of the temporary file
  2. read data from this temporary file

In this Pull Request the issue is solved by adding an additional optional argument to NamedTemporaryFile() 'delete_on_close' (default is True). It works in combination with already existing argument 'delete', and determines the deletion behaviour.

If delete is true (the default) and delete_on_close is true (the default), the file is deleted as soon as it is closed. If delete is true and delete_on_close is false, the file is deleted on context manager exit only, if no context manager was used, then the file is deleted only when the file-like object is finalized. If delete is false, the value of delete_on_close is ignored.

So, the change shall be fully backwards compatible.

@Ev2geny
Copy link
Contributor Author

Ev2geny commented Sep 22, 2022

The changes done to the code since it left the PR 22431

  • Merge conflicts are resolved

  • A new functionality is added based on the review feedback from @eryksun to cover the case if delete is true and delete_on_close is false, but the file is never used in a with context. The new functionality is that in this case the file is deleted when the instance is finalized. This is implemented by adding a __del__() method to the _TemporaryFileCloser. This required documentation update as well.
    @eryksun , please have a look at the implementation as well as documentation update.

Doc/whatsnew/3.12.rst Outdated Show resolved Hide resolved
Co-authored-by: Éric <merwok@netwok.org>
@Ev2geny
Copy link
Contributor Author

Ev2geny commented Sep 22, 2022

Ok, looks like there are some Ubuntu errors, which I will need to look at

Lib/tempfile.py Outdated Show resolved Hide resolved
…r to _TemporaryFileCloser

Test test_del_by_finalizer_if_no_with is changed from simulating finalizer by calling __del__() to making
sure finalizer really runs
Lib/tempfile.py Outdated Show resolved Hide resolved
Ev2geny added 3 commits Sep 24, 2022
…close is moved to _TemporaryFileWrapper

based on the advice from @eryksun

There are errors in the test (at least on Linux)
Lib/tempfile.py Show resolved Hide resolved
Lib/tempfile.py Outdated Show resolved Hide resolved
Lib/tempfile.py Outdated Show resolved Hide resolved
Doc/library/tempfile.rst Outdated Show resolved Hide resolved
Lib/test/test_tempfile.py Outdated Show resolved Hide resolved
Lib/test/test_tempfile.py Outdated Show resolved Hide resolved
Lib/test/test_tempfile.py Outdated Show resolved Hide resolved
Lib/test/test_tempfile.py Outdated Show resolved Hide resolved
Lib/test/test_tempfile.py Outdated Show resolved Hide resolved
Lib/test/test_tempfile.py Outdated Show resolved Hide resolved
Lib/test/test_tempfile.py Outdated Show resolved Hide resolved
Lib/tempfile.py Outdated Show resolved Hide resolved
Ev2geny and others added 3 commits Sep 25, 2022
Implemented review from @eryksun and @merwok

Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
Co-authored-by: Eryk Sun <eryksun@gmail.com>
Co-authored-by: Éric <merwok@netwok.org>
…ch takes into account delete_on_close moved to _TemporaryFileWrapper (suggestion of @ekisun)
@Ev2geny
Copy link
Contributor Author

Ev2geny commented Sep 25, 2022

@eryksun , I have implemented all suggestions, but it now fails the unit test for the issue18879

    def test_method_lookup(self):
        # Issue #18879: Looking up a temporary file method should keep it
        # alive long enough.
        f = self.do_create()
        wr = weakref.ref(f)
        write = f.write
        write2 = f.write
        del f
        write(b'foo')

@eryksun
Copy link
Contributor

eryksun commented Sep 25, 2022

I have implemented all suggestions, but it now fails the unit test for the issue18879

The delegated attribute cache is a problem. It leads to a reference cycle if the function wrapper (bpo-18879) is changed to keep a reference to the file wrapper (i.e. self) instead of the file closer (i.e. self._closer). Apparently avoiding this reference cycle was the reason the closer was added.

So, sorry to have misled you, but it looks like we're going to have to implement all delete operations in the closer. The wrapper just needs to change __init__() to instantiate the closer with delete_on_close, and change __exit__() to call the closer's cleanup() method. The latter includes a call to self.file.close() if needed. Since all deleting is handled by the closer, there's no need to implement __del__() in the wrapper. For example:

class _TemporaryFileCloser:
    """A separate object allowing proper closing of a temporary file's
    underlying file object, without adding a __del__ method to the
    temporary file."""

    cleanup_called = False
    close_called = False

    def __init__(self, file, name, delete=True, delete_on_close=True):
        self.file = file
        self.name = name
        self.delete = delete
        self.delete_on_close = delete_on_close

    def cleanup(self, windows=(_os.name == 'nt'), unlink=_os.unlink):
        if not self.cleanup_called:
            self.cleanup_called = True
            try:
                if not self.close_called:
                    self.close_called = True
                    self.file.close()
            finally:
                if self.delete and not (self.delete_on_close and windows):
                    try:
                        unlink(self.name)
                    except FileNotFoundError:
                        pass

    def close(self):
        if not self.close_called:
            self.close_called = True
            try:
                self.file.close()
            finally:
                if self.delete and self.delete_on_close:
                    self.cleanup()

    def __del__(self):
        self.cleanup()
class _TemporaryFileWrapper:
    """Temporary file wrapper

    This class provides a wrapper around files opened for
    temporary use.  In particular, it seeks to automatically
    remove the file when it is no longer needed.
    """

    def __init__(self, file, name, delete=True, delete_on_close=True):
        self.file = file
        self.name = name
        self._closer = _TemporaryFileCloser(file, name, delete,
                                            delete_on_close)

    def __getattr__(self, name):
        # Attribute lookups are delegated to the underlying file
        # and cached for non-numeric results
        # (i.e. methods are cached, closed and friends are not)
        file = self.__dict__['file']
        a = getattr(file, name)
        if hasattr(a, '__call__'):
            func = a
            @_functools.wraps(func)
            def func_wrapper(*args, **kwargs):
                return func(*args, **kwargs)
            # Avoid closing the file as long as the wrapper is alive,
            # see issue #18879.
            func_wrapper._closer = self._closer
            a = func_wrapper
        if not isinstance(a, int):
            setattr(self, name, a)
        return a

    # The underlying __enter__ method returns the wrong object
    # (self.file) so override it to return the wrapper
    def __enter__(self):
        self.file.__enter__()
        return self

    # Need to trap __exit__ as well to ensure the file gets
    # deleted when used in a with statement
    def __exit__(self, exc, value, tb):
        result = self.file.__exit__(exc, value, tb)
        self._closer.cleanup()
        return result

    def close(self):
        """
        Close the temporary file, possibly deleting it.
        """
        self._closer.close()

    # iter() doesn't use __getattr__ to find the __iter__ method
    def __iter__(self):
        # Don't return iter(self.file), but yield from it to avoid closing
        # file as long as it's being used as iterator (see issue #23700).  We
        # can't use 'yield from' here because iter(file) returns the file
        # object itself, which has a close method, and thus the file would get
        # closed when the generator is finalized, due to PEP380 semantics.
        for line in self.file:
            yield line

This passes all tests for me in both Linux and Windows.

@eryksun
Copy link
Contributor

eryksun commented Sep 25, 2022

As I mentioned previously, the exception handler in NamedTemporaryFile() also needs to be changed to check delete_on_close:

    except:
        if name is not None and not (
          _os.name == 'nt' and delete and delete_on_close):
            _os.unlink(name)
        raise

@Ev2geny
Copy link
Contributor Author

Ev2geny commented Sep 26, 2022

Isn't this code trying to remove whatever is left after unsuccessful attempt to create a temporary file? In this case why would we want to check the delete_on_close flag? Which code is going to delete it then, if we don't do it now ?

In the io.open() call, if opener() succeeds, then name is not None, and ownership of the file descriptor gets transferred to the io.FileIO instance. If io.open() ultimately fails or instantiating _TemporaryFileWrapper fails, the file descriptor gets closed. In this case, the file will be deleted automatically if _os.name == 'nt' and delete and delete_on_close, else it has to be deleted manually.

OK, thanks. Understood. I have implemented this in this commit

@Ev2geny
Copy link
Contributor Author

Ev2geny commented Sep 26, 2022

@eryksun , sorry for asking an educational question, but can you explain me please what is the reason of using class attribute here, rather than an instance attribute?

class _TemporaryFileCloser:

   # these are class attributes
    cleanup_called = False
    close_called = False

    def __init__(self, file, name, delete=True, delete_on_close=True):
        self.file = file
        self.name = name
        self.delete = delete
        self.delete_on_close = delete_on_close

        # why not just setting the default  instance attributes here?
        # cleanup_called = False
        # close_called = False

    def cleanup(self, windows=(_os.name == 'nt'), unlink=_os.unlink):
        if not self.cleanup_called:        #if done, 1st time, this is looks at class attribute
            # But this creates an instance attribute with the same name
            # From now on all calls to  self.cleanup_called will return an instance attribute 
            self.cleanup_called = True  

@eryksun
Copy link
Contributor

eryksun commented Sep 26, 2022

sorry for asking an educational question, but can you explain me please what is the reason of using class attribute here, rather than an instance attribute?

It's just less work and less memory usage. The default value of False can be referenced on the class until it gets overridden to True as an instance attribute.

	modified:   Lib/test/test_tempfile.py
@Ev2geny
Copy link
Contributor Author

Ev2geny commented Sep 29, 2022

@zooba , this is a reincarnation of the already approved by you PR 22431 (which was closed due to git error). It has been further reviewed and updated. Can you please look at it now?

Copy link
Member

@zooba zooba left a comment

Code looks good to me. Would like to make the docs easier to follow while we're here though.

following differences:

* The file is guaranteed to have a visible name in the file system
(on Unix, the directory entry is not unlinked).
Copy link
Member

@zooba zooba Sep 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this line was here before, but it doesn't make any sense to me. Should we mention this somewhere else?

Copy link
Contributor Author

@Ev2geny Ev2geny Oct 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eryksun , I think zooba talks about the line "(on Unix, the directory entry is not unlinked)." What do you think, does it make scene to move it somewhere else?

Copy link
Contributor

@eryksun eryksun Oct 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On POSIX, a file created by TemoraryFile() does not have a visible name in the filesystem. The implementation tries to use the O_TMPFILE flag if it's supported (Linux 3.11+), which creates an anonymous file. If O_TMPFILE isn't supported, the file is created with a unique name, and then it's immediately unlinked while the file descriptor is still open.

On Windows, TemporaryFile is aliased to NamedTemporaryFile. There isn't a compelling reason for this decision. It's fine to delete a file that's opened with the O_TEMPORARY flag. On Windows 10 and 11, if it's on an NTFS filesystem, the file gets unlinked immediately. Prior to Windows 10, or on filesystems other than NTFS, the file is deleted but not unlinked. In the latter state, a file has a visible entry in the directory, but it can't be opened again for any access, and it gets unlinked as soon as the last handle to it is closed.

Copy link
Contributor Author

@Ev2geny Ev2geny Oct 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eryksun , thanks for explanation. So, would you agree, that the statement on Unix, the directory entry is not unlinked does not really make too much scene the way it is written now? Would it be better to either remove it completely or to put a broader explanation, similar to what you provided?

Copy link
Contributor

@eryksun eryksun Oct 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in the context of clarifying that NamedTemporaryFile() acts "exactly" like TemporaryFile(). In that context it makes sense. But I'd prefer that the first sentence and the two bullet points were replaced with something like the following:

   This function returns a file that is guaranteed to have a visible name in
   the file system. To manage the named file, it extends the parameters of
   :func:`TemporaryFile` with *delete* and *delete_on_close* parameters that
   determine whether and how the named file should be automatically deleted.

Copy link
Contributor Author

@Ev2geny Ev2geny Oct 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now placed the statement about unlinking on Unix to a different place and slightly reworked it to make a reference to TemporaryFile(), without which I agree it was a bit confusing.

I have also implemented rewording from @eryksun

Doc/library/tempfile.rst Outdated Show resolved Hide resolved
While the named temporary file is open, the file can always be opened again
on POSIX. On Windows, it can be opened again if *delete* is false, or if
*delete_on_close* is false, or if the additional open shares delete access
(e.g. by calling :func:`os.open` with the flag ``O_TEMPORARY``). On
Windows, if *delete* is true and *delete_on_close* is false, additional
opens that do not share delete access (e.g. via builtin :func:`open`) must
be closed before exiting the context manager, else the :func:`os.unlink`
call on context manager exit will fail with a :exc:`PermissionError`.
Copy link
Member

@zooba zooba Sep 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reframe this as:

  • the file can be opened again using its name
  • on Windows, subsequent opens should share delete access (e.g.)
  • however, when delete or delete_on_close are false, sharing delete access is not necessary, but you should ensure the file is closed before leaving the context manager

This sets it up more positively for the user - what should they do, as opposed to what could go wrong.

Copy link
Contributor Author

@Ev2geny Ev2geny Oct 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check how I have written this in the new version. Not sure everybody would agree, but I have now written this information the way I would want it to be explained for me without the risk of twisting my brains.

Additional question: does it make sense to write unit tests to cover all these combinations of opening the file the second time, while it is already open. I have tested this for myself (on Windows) to make sure it is all correct, but I am not sure that it shall go to unit tests.

Doc/library/tempfile.rst Show resolved Hide resolved
Doc/library/tempfile.rst Outdated Show resolved Hide resolved
Doc/library/tempfile.rst Outdated Show resolved Hide resolved
Lib/test/test_tempfile.py Outdated Show resolved Hide resolved
Lib/tempfile.py Outdated Show resolved Hide resolved
Ev2geny and others added 3 commits Oct 1, 2022
Co-authored-by: Eryk Sun <eryksun@gmail.com>
	modified:   Doc/library/tempfile.rst
	modified:   Lib/test/test_tempfile.py
Lib/test/test_tempfile.py Outdated Show resolved Hide resolved
Lib/test/test_tempfile.py Outdated Show resolved Hide resolved
Lib/test/test_tempfile.py Outdated Show resolved Hide resolved
Lib/test/test_tempfile.py Outdated Show resolved Hide resolved
Lib/test/test_tempfile.py Outdated Show resolved Hide resolved
Lib/test/test_tempfile.py Outdated Show resolved Hide resolved
…eryksun


Files changed:
      Lib/test/test_tempfile.py

Co-authored-by: Eryk Sun <eryksun@gmail.com>
@Ev2geny
Copy link
Contributor Author

Ev2geny commented Oct 2, 2022

@eryksun , can you please have a look at this discussion about the fact, that the directory entry is not unlinked on Unix. I think you a correct person to have an idea there.

@Ev2geny
Copy link
Contributor Author

Ev2geny commented Oct 4, 2022

@zooba , I have done my best to implement your comments and also got a help of eryksun.
Can you look at it again please?

zooba
zooba approved these changes Oct 4, 2022
Doc/library/tempfile.rst Outdated Show resolved Hide resolved
Doc/library/tempfile.rst Outdated Show resolved Hide resolved
@zooba
Copy link
Member

zooba commented Oct 4, 2022

Thanks for all your hard work on this! I tweaked some of the docs slightly, and will merge once CI finishes again.

@zooba zooba merged commit 743453a into python:main Oct 4, 2022
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants