Best practices for temp files

On 13 Jan, 2013 By Rob Galanakis With 6 Comments

It seems the more middleware we use, the more we need to work with temporary files. Tech Artists use plenty of middleware, so we usually work with temp files a lot! That makes it important to follow best practices when we do.

Note: I’ll be using Python functions and terminology, but this applies to any language.

It goes without saying, but use the tempfile module. Don’t use environment variables or roll something yourself (I’ve seen both done).

Never hard-code the location of a temporary file. I’ve used middleware that does the equivalent of “os.path.join(gettempdir(), 'myscratch.foo')” to get a path, which means running multiple processes at the same time causes file lock errors! I’ve had to change the temp directory before calling these processes.

Either your directory or your filename should always be generated programmatically: so “mkstemp(dir=os.path.join(gettempdir(), 'myscratchfiles'))” is acceptable, as is “os.path.join(mkdtemp(), 'myfile.foo')”.
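
Here is a minimal sketch of the second pattern, assuming Python 3. scratch_path is a hypothetical helper name, not part of the tempfile module. Because every call gets its own freshly created directory from mkdtemp(), two processes asking for 'myfile.foo' never collide:

```python
import os
import tempfile


def scratch_path(filename):
    """Return a path to `filename` inside a brand-new temp directory.

    Hypothetical helper: mkdtemp() creates a unique directory on every call,
    so a fixed filename can't clash with another process using the same name.
    The caller is responsible for removing the directory afterwards.
    """
    return os.path.join(tempfile.mkdtemp(), filename)


a = scratch_path("myfile.foo")
b = scratch_path("myfile.foo")
print(a != b)  # True: same filename, different directories
```

Note that mkstemp(dir=...) fails if that directory doesn’t exist yet, so the first pattern also needs something like os.makedirs to create 'myscratchfiles' beforehand.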

Try to clean up your files and folders! I say ‘try’ because it isn’t always possible. Sometimes your library may need to return the path to a file it generates. In that case, rather than relying on the caller to clean up the file (which would be a bizarre dependency and an odd thing to document in a docstring!), you’re better off taking the ‘output path’ as an argument to your library and writing the result file there. Your library can then clean up its own intermediate temp files, and the caller is responsible for deleting the result file if it needs to (hopefully it doesn’t), because it knows more about the path.
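
A minimal sketch of that shape, assuming Python 3. build_asset and the ‘uppercase the text’ step are hypothetical stand-ins for whatever your library and middleware actually do:

```python
import os
import shutil
import tempfile


def build_asset(source_text, output_path):
    """Hypothetical library function: the caller chooses output_path.

    Intermediate files live in a private temp directory that this function
    always removes, so only the final result survives.
    """
    workdir = tempfile.mkdtemp()
    try:
        # Stand-in for a middleware step that writes an intermediate file.
        intermediate = os.path.join(workdir, "stage1.txt")
        with open(intermediate, "w") as f:
            f.write(source_text.upper())
        # Put the finished result where the caller asked for it.
        shutil.copyfile(intermediate, output_path)
    finally:
        shutil.rmtree(workdir)  # our intermediates are our problem
    return output_path
```

The caller never sees workdir, and the only file it might have to delete is the one whose path it passed in.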

In a lot of cases, I don’t bother cleaning up temp files for internal software. Management of temp files takes a non-negligible amount of work, and if I can keep the software simpler by not worrying about it, I will. I’d rather rely on the developers to keep their hard drives in good shape, which they’ll do anyway.

If you’re running something that will generate a lot of temp files (maybe your test running script), you can also set your temporary directory to a temporary subdir, and then clean that up. Something like:

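import shutil
import tempfile

import nose  # or whatever test runner / processing you use

oldtemp = tempfile.tempdir  # usually None, i.e. "use the default lookup"
newtemp = tempfile.tempdir = tempfile.mkdtemp()
# tempfile.tempdir only affects this Python process; child processes look at
# the TMPDIR/TMP/TEMP environment variables, so consider setting those too.
try:
    nose.run()  # Or whatever processing you need to do
finally:
    tempfile.tempdir = oldtemp
    shutil.rmtree(newtemp)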

And lastly (it’s last because I’m sure people have some religious objection to it), I’ve actually created a custom mktemp function that just calls mkstemp and closes the file descriptor. Just getting a path is useful and worth the performance overhead in (IMO) the vast majority of cases, especially since we often just need the filename to pass to some external process.
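
A minimal sketch of such a helper, assuming Python 3. The name mktemp and its signature are my own illustration (mirroring mkstemp’s keyword arguments), not a stdlib function; the stdlib’s tempfile.mktemp() is a different, deprecated function that doesn’t create the file at all:

```python
import os
import tempfile


def mktemp(suffix="", prefix="tmp", dir=None):
    """Return the path of a new, empty temp file (hypothetical helper).

    mkstemp() actually creates the file, so the name is reserved before we
    hand it out. We only close the OS-level descriptor it returns; the file
    itself stays on disk and the caller deletes it when done.
    """
    fd, path = tempfile.mkstemp(suffix=suffix, prefix=prefix, dir=dir)
    os.close(fd)
    return path
```

Typical use is path = mktemp(suffix='.fbx'), then passing path on the command line of some external tool.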

What advice do you have for working with temp files (and where is my advice poor)?

6 thoughts on “Best practices for temp files”

Adam says:

January 13, 2013 at 11:39 am

Thanks! I actually never knew about the tempfile module.

Mumm says:

January 13, 2013 at 3:47 pm

I would also recommend keeping temp files around for helping with debugging.
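
One way to make that switchable is a minimal sketch like the following, assuming Python 3; scratch_dir and its keep flag are hypothetical names, and you would wire keep to something like a --debug option:

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def scratch_dir(keep=False):
    """Yield a fresh temp directory, deleting it afterwards unless keep=True."""
    path = tempfile.mkdtemp()
    try:
        yield path
    finally:
        if keep:
            # Leave everything in place and report where it is, for debugging.
            print("Keeping temp files in", path)
        else:
            shutil.rmtree(path)


# Hypothetical usage: pass keep=True when you want to inspect the files.
with scratch_dir(keep=False) as d:
    with open(os.path.join(d, "log.txt"), "w") as f:
        f.write("intermediate data\n")
```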

Roger says:

January 13, 2013 at 10:17 pm

Always use the prefix option when making temp files, giving the name of your script/module/library/function as appropriate. That way, examining the temp directory tells you who is to blame for which files and makes manual cleanup easier, not to mention revealing bugs in your cleanup code when an “ls” shows many files with your prefix!
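
For example, a minimal sketch assuming Python 3, where "myexporter_" is a hypothetical prefix standing in for your own tool’s name:

```python
import os
import tempfile

# Tag the file with the name of the tool that created it.
fd, path = tempfile.mkstemp(prefix="myexporter_", suffix=".tmp")
os.close(fd)
print(os.path.basename(path))  # starts with "myexporter_", ends with ".tmp"

# Later, filtering the temp dir by prefix shows what this tool left behind.
leftovers = [name for name in os.listdir(tempfile.gettempdir())
             if name.startswith("myexporter_")]
```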

Antonio says:

January 14, 2013 at 4:37 am

I think a context manager is a better overall solution:

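import shutil
import tempfile

class cdinto(object):
    def __enter__(self):
        self.path = tempfile.mkdtemp()  # create the temp dir
        return self.path
    def __exit__(self, *exc_info):
        shutil.rmtree(self.path)  # remove the leftovers

and wrapping the main with:
with cdinto() as tmp:
    main()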

Much easier;)

Adam Skutt says:

January 14, 2013 at 6:53 am

There’s lots of questionable advice here. I’m not sure you fully understand temporary file-race issues, and why they are so dangerous.

You should never, ever just generate a temporary file name, unless you’re going to write into a secure directory. A secure directory is one that no one else can write into (except root) and the same holds for all parent directories (or the sticky bit is set). Otherwise, you open yourself up to race attacks.

Temporary filenames in sticky directories (e.g., /tmp) are only safe to use as long as the file itself exists. Once the file is gone, the name isn’t usable either. Suitably paranoid code would check that the directory is sticky before creating the file and using the name.
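
A minimal sketch of that check, assuming Python 3 on a POSIX system; is_sticky_dir is a hypothetical helper name. It only covers the sticky-bit part, not the ownership of parent directories, and it uses os.lstat so a symlink doesn’t pass for the directory it points at:

```python
import os
import stat
import tempfile


def is_sticky_dir(path):
    """Return True if `path` is a real directory with the sticky bit set.

    lstat() does not follow symlinks, so a link pointing at a sticky
    directory fails the check. The sticky bit has no meaning on Windows.
    """
    st = os.lstat(path)
    return stat.S_ISDIR(st.st_mode) and bool(st.st_mode & stat.S_ISVTX)


tmp = tempfile.gettempdir()
print(tmp, is_sticky_dir(tmp))
```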

In general, all of this means you should write code that never looks at, or cares about, the name of a temporary file. If you care about the name, you must be very careful. In those cases, it’s probably easiest to securely create a temporary directory, then create the files in there.
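
A minimal sketch of the ‘secure directory first’ approach, assuming Python 3. mkdtemp() is documented to create the directory so that only the creating user can read, write or search it; "mytool_" and "scene.ma" are hypothetical names:

```python
import os
import tempfile

# Private directory: other local users can't plant files or symlinks in it.
workdir = tempfile.mkdtemp(prefix="mytool_")

# Inside it, a predictable filename is fine, and it is safe to hand the
# path to an external process that reopens the file by name.
scene_path = os.path.join(workdir, "scene.ma")
with open(scene_path, "w") as f:
    f.write("// scratch data\n")

# No group/other permission bits on the directory.
print((os.stat(workdir).st_mode & 0o077) == 0)  # True on POSIX
```

You still have to remove workdir yourself (for example with shutil.rmtree) when you’re done.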

This code, “mkstemp(dir=os.path.join(gettempdir(), 'myscratchfiles'))”, is insecure because someone else could make myscratchfiles into a symlink. It could be used to write a temporary file into an attacker-controlled place, which opens you to attack if you reuse the filename (i.e., close the descriptor and open the same name again). Likewise, your last paragraph is only safe because you use mkstemp and not one of the other functions.

The simplest way to ensure temporary files and directories get cleaned up is to just use tempfile.TemporaryFile where appropriate, or a context manager when not appropriate. If you’re finding management of temporary files difficult, then you’re most likely structuring your code incorrectly.
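
A minimal sketch of both options, assuming Python 3.2 or later for TemporaryDirectory; "export.fbx" is a hypothetical filename:

```python
import os
import tempfile

# TemporaryFile: you never see (or need) a name; the file is gone once closed.
with tempfile.TemporaryFile(mode="w+") as f:
    f.write("intermediate results")
    f.seek(0)
    data = f.read()

# TemporaryDirectory: a private directory for tools that need real paths,
# removed along with everything in it when the block exits.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "export.fbx")
    with open(path, "w") as out:
        out.write("placeholder")
    existed = os.path.exists(path)

print(data)  # intermediate results
print(existed, os.path.exists(workdir))  # True False
```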

chris says:

January 15, 2013 at 4:13 pm

@Adam Skutt
You know what’s also unsafe? Parsing and executing Python through an application like Maya. You guys should quit.
