Best practices for temp files
It seems the more middleware we use, the more we need to work with temporary files, so Tech Artists usually work with temp files a lot. That makes it important to follow best practices when we do.
Note: I’ll be using Python functions and terminology, but the advice applies to any language.
It goes without saying, but use the tempfile module. Don’t read environment variables directly or roll something yourself (I’ve seen both done).
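As a minimal illustration using only the standard library, here is the difference between reading an environment variable yourself and asking tempfile. The variable name TEMP and the file name scratch.txt are just examples.

```python
import os
import tempfile

# Fragile: TEMP is usually set on Windows but often missing on
# Linux/macOS (which tend to use TMPDIR), so this may be None.
env_temp = os.environ.get("TEMP")

# tempfile.gettempdir() checks TMPDIR, TEMP and TMP, then falls back to
# platform defaults (such as /tmp), and verifies the directory is writable.
temp_root = tempfile.gettempdir()

# Prefer the higher-level helpers: they pick a unique name and clean up.
with tempfile.TemporaryDirectory() as scratch_dir:
    scratch_file = os.path.join(scratch_dir, "scratch.txt")
    with open(scratch_file, "w") as f:
        f.write("intermediate data")
    print(os.path.exists(scratch_file))  # True
print(os.path.exists(scratch_dir))  # False: removed when the block exits
```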
Never hard-code the location of a temporary file. I’ve used middleware that does the equivalent of “os.path.join(gettempdir(), 'myscratch.foo')” to get its scratch path, which means running multiple processes at the same time causes file lock errors! I’ve had to change the temp directory before calling these processes to work around it.
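A small sketch of the problem, using a single process for simplicity: the hard-coded helper (hypothetical, modeled on that middleware) hands out the same path on every call, while mkstemp creates a new file with a unique name each time.

```python
import os
import tempfile

# Anti-pattern: a hard-coded name under the temp dir.
def hardcoded_scratch_path():
    return os.path.join(tempfile.gettempdir(), "myscratch.foo")

# Every caller, in every process, gets the same file.
print(hardcoded_scratch_path() == hardcoded_scratch_path())  # True

# Better: let mkstemp pick a unique name. It creates the file atomically
# and returns an open OS-level file descriptor along with the path.
paths = []
for _ in range(2):
    fd, path = tempfile.mkstemp(suffix=".foo")
    with os.fdopen(fd, "w") as f:  # fdopen takes ownership and closes fd
        f.write("scratch data")
    paths.append(path)

print(paths[0] == paths[1])  # False: no collision
for path in paths:
    os.remove(path)
```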
Either your directory or your filename should always be generated uniquely by tempfile, so “mkstemp(dir=os.path.join(gettempdir(), 'myscratchfiles'))” is acceptable, as is “os.path.join(mkdtemp(), 'myfile.foo')”.
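Both acceptable patterns might look something like this. Note that mkstemp won’t create the directory you pass as dir, so the fixed myscratchfiles directory has to exist first. The second pattern is handy when an external tool insists on a particular filename.

```python
import os
import shutil
import tempfile

# Pattern 1: fixed, well-known directory with a unique filename.
# mkstemp raises FileNotFoundError if dir doesn't exist, so create it.
scratch_root = os.path.join(tempfile.gettempdir(), "myscratchfiles")
os.makedirs(scratch_root, exist_ok=True)
fd, unique_file = tempfile.mkstemp(dir=scratch_root, suffix=".foo")
os.close(fd)

# Pattern 2: unique directory with a fixed filename inside it.
unique_dir = tempfile.mkdtemp()
fixed_name_file = os.path.join(unique_dir, "myfile.foo")
with open(fixed_name_file, "w") as f:
    f.write("scratch data")

# Clean up when done.
os.remove(unique_file)
shutil.rmtree(unique_dir)
```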
Try to clean up your files and folders! I say ‘try’ because it isn’t always possible. Sometimes your library needs to hand back the path to a file it generates. In that case, rather than relying on the caller to clean up the file (which would be a bizarre dependency to document!), you’re better off accepting an output path as a parameter and writing the result file there. Your library can then clean up its own intermediate temp files, and the caller, which knows more about where that path lives, is responsible for deleting the result file if it needs to (hopefully it doesn’t).
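A sketch of that design, with a hypothetical export_asset function standing in for your library. The “processing” is just upper-casing text so the example runs anywhere; the point is that intermediates live in a TemporaryDirectory that disappears on exit, while the result goes wherever the caller asked.

```python
import os
import shutil
import tempfile

def export_asset(source_path, output_path):
    """Hypothetical library function: builds a result from source_path and
    writes it to output_path. The caller owns output_path; this function
    cleans up its own intermediate files."""
    with tempfile.TemporaryDirectory() as work_dir:
        intermediate = os.path.join(work_dir, "intermediate.txt")
        with open(source_path) as src, open(intermediate, "w") as dst:
            dst.write(src.read().upper())  # stand-in for real processing
        # Copy the final result to where the caller asked for it.
        shutil.copyfile(intermediate, output_path)
    # work_dir and everything in it are gone at this point.
    return output_path

# Example caller: it decides where the result goes and when to delete it.
caller_dir = tempfile.mkdtemp()
source = os.path.join(caller_dir, "source.txt")
with open(source, "w") as f:
    f.write("mesh data")
result = export_asset(source, os.path.join(caller_dir, "result.txt"))
with open(result) as f:
    content = f.read()
print(content)  # MESH DATA
shutil.rmtree(caller_dir)
```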
In a lot of cases, I don’t bother cleaning up temp files for internal software. Management of temp files takes a non-negligible amount of work, and if I can keep the software simpler by not worrying about it, I will. I’d rather rely on the developers to keep their hard drives in good shape, which they’ll do anyway.
If you’re running something that will generate a lot of temp files (maybe your test runner script), you can also point the tempfile module at a temporary subdirectory and then delete that subdirectory when you’re done. Something like:
import shutil
import tempfile

import nose

oldtemp = tempfile.gettempdir()
# mkdtemp creates the new directory inside the current temp dir
newtemp = tempfile.tempdir = tempfile.mkdtemp()
# Maybe set os.environ TMPDIR/TMP/TEMP as well?
try:
    nose.run()  # Or whatever processing you need to do
finally:
    tempfile.tempdir = oldtemp
    shutil.rmtree(newtemp)
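Following up on the question in that comment, here is one way to wrap the same idea in a context manager that also sets TMPDIR, TEMP and TMP, so child processes that honor those variables (Python’s tempfile does) land in the subdirectory too. scoped_tempdir is a hypothetical name, and whether a given piece of middleware respects these variables is something to check per tool. This sketch saves tempfile.tempdir directly rather than calling gettempdir(), so a previously unset value is restored as None.

```python
import contextlib
import os
import shutil
import tempfile

_TEMP_VARS = ("TMPDIR", "TEMP", "TMP")

@contextlib.contextmanager
def scoped_tempdir():
    """Hypothetical helper: redirect temp files to a fresh subdirectory for
    the duration of the block, then restore settings and delete it."""
    old_tempdir = tempfile.tempdir
    old_env = {name: os.environ.get(name) for name in _TEMP_VARS}
    new_tempdir = tempfile.mkdtemp()
    tempfile.tempdir = new_tempdir  # affects tempfile calls in this process
    for name in _TEMP_VARS:
        os.environ[name] = new_tempdir  # inherited by child processes
    try:
        yield new_tempdir
    finally:
        tempfile.tempdir = old_tempdir
        for name, value in old_env.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value
        shutil.rmtree(new_tempdir, ignore_errors=True)

with scoped_tempdir() as scratch:
    fd, path = tempfile.mkstemp()
    os.close(fd)
    inside = os.path.dirname(path) == scratch
print(inside)  # True
print(os.path.exists(scratch))  # False
```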
And lastly (it’s last because I’m sure some people have a religious objection to it), I’ve created a custom mktemp function that just calls mkstemp and closes the file descriptor. Getting just a path is useful, and in my opinion worth the small overhead of creating the file in the vast majority of cases, especially since we often only need a filename to pass to some external process.
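For reference, such a helper can be as small as this (mktemp_path is a hypothetical name). The standard library’s tempfile.mktemp is documented as deprecated because the name it returns can be claimed by someone else before you create the file; this version sidesteps that by letting mkstemp create the file first.

```python
import os
import tempfile

def mktemp_path(suffix="", prefix="tmp", dir=None):
    """Hypothetical helper: return the path of a new, empty, uniquely named
    temp file. The file already exists on disk when this returns, so the
    name is reserved, unlike with tempfile.mktemp."""
    fd, path = tempfile.mkstemp(suffix=suffix, prefix=prefix, dir=dir)
    os.close(fd)  # we only want the path; the caller reopens it as needed
    return path

# Typical use: hand the path to an external process, then clean up.
path = mktemp_path(suffix=".fbx")
existed = os.path.exists(path)
size = os.path.getsize(path)
print(existed, size)  # True 0
os.remove(path)
```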
What advice do you have for working with temp files (and where is my advice poor)?