Posts Tagged ‘threading’

Python Singletons

In my opinion, good python libraries and frameworks should spend effort guiding you towards the ‘pit of success’, rather than trying to keep you from failing. They do this by spending most effort on things related to the critical path- clear interfaces, simple implementations, thorough documentation.

Which is why singletons are, to me, the worst form of framework masturbation in python. You will never be able to stop people from doing something stupid if they’re determined (in pure python). In the case of a singleton, that means instantiating more than one instance of a type. So spending effort on ‘designing’ singletons is not just a waste of effort, but actively harmful. Just provide a clear way to use a single instance, and your system should fail clearly if it detects an actual problem due to multiple instances (as opposed to, trying to detect multiple instances to keep said problem from happening).

The best method for singletons in python, then, is- whatever is simplest!

  1. Some form of module or class state is, to me, the clearest. It requires someone reading or using your code to know nothing more than the most basic python. Just prefix your class def with an underscore, and expose an accessor function to an instance stored on the module (or on the class). The capacity for failure is minimal and the behavior is clear (it requires no behavior modification to the type itself).
  2. Overriding __new__ is pretty bad but OK. It requires someone to understand the subtleties of __new__, which is a useful thing to teach someone but, are singletons really the time and place?
  3. Using a metaclass is a terrible solution. It has a higher likelihood of failure (how many people understand the nuances of metaclasses!?). It is misdirection even for people just reading your code, trying to understand your type’s behavior. Avoid.

The question to ask yourself before doing any of this is, “is a singleton a technical requirement or an architectural preference?” I.e., a single instance of an application event loop (QApplication, etc) I’d consider a technical requirement and make it foolproof (in C?). But technical requirements are few and far between and should be driven by underlying system/OS requirements rather than your code’s design or architecture. If it’s an architectural preference- “there should only be one instance of this manager/window/cache”- there’s absolutely no reason to confuse your code (especially your object’s behavior!) to achieve it. Just use design, documentation, and examples, to show people the right way to use it.

Why GUI’s Lock Up

This is a post for the Tech Artists and new programmers out there. I am going to answer the common question “why do my GUI’s lock up” and, to a lesser extent, “what can I do about it.”

I’m going to tell a story of a mouse click event. It is born when a user clicks her mouse button. This mouse click goes from the hardware, to the OS. The OS determines that the user wants to interact with your GUI, so it sends the mouse click event to your GUI’s process. Here, the mouse click sits in a queue of other similar events, just hanging out. At some point, usually almost immediately, your process (actually the main thread in the process) goes through and, one by one, dispatches the items (messages) in the queue. It determines that our mouse click is over a button and then tells the button it’s been clicked. The button then usually repaints itself so it looks pressed, and then invokes any callbacks that are hooked up. The process (main thread) goes into one such callback you hooked up, that will look at 1000 files on disk. This takes a while. In the meantime, the user is clicking, but the messages are just piling up in the queue. And then someone drags a window over your GUI, because they’re tired of their clicks not doing anything and want to see what’s new on G+. The OS sends a message to your UI that it needs to repaint itself, but that message, too, just sits in the queue. At some point, your OS may even realize your window is not responding, and fade it out and change the title bar. Finally, your on-button-click callback finishes, the process (thread) is done processing our initial mouse click, and then goes back to processing the messages that may have accumulated in the queue, and your UI will refresh and start responding again.

All this happens because the thread that processes messages to draw the UI was also responsible for looking at 1000 files on disk, so it wasn’t around to respond to the paint and click messages. A few pieces of info:

  1. You can’t just ‘update the UI’ from the middle of your code. In addition to being terrible form code-wise, clearing the message queue would just cause other things to block the main thread, and it’d all get into one giant asynchronous mess. Some programs may have their own UI framework that supports this. Don’t trust it. You really just need the main/GUI thread clear as much as possible to respond to events.
  2. Your GUI process has a single ‘main thread.’ A thread roughly corresponds to, and I’m being not nuanced here, the software concept of a hardware CPU core. Your GUI objects can only be created and manipulated by the main thread.

This means, you want to keep your main thread free so it can act on GUI stuff (paint events, mouse clicks) only. The processing, such as your callback that looks at 1000 files, should happen on another thread (a background thread). When the processing is complete, it can tell the GUI thread that it is finished, and the GUI thread can update the UI. Your background thread can also fire events or invoke a callback that will be picked up by the GUI thread, so the GUI can update a progress bar or whatever.
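To make that hand-off concrete, here is a rough sketch in plain Python with no GUI framework at all. A `queue.Queue` stands in for the GUI message queue, and the names `scan_files` and `run_main_loop` are invented for the example. A real framework has its own way of posting work back to the GUI thread, so treat this as the shape of the idea only.

```python
import queue
import threading


def scan_files(paths, post):
    """Runs on a background thread. `post` hands a message to the main thread."""
    for index, path in enumerate(paths, start=1):
        # ... the slow per-file work would go here ...
        post(("progress", index, len(paths)))
    post(("done", len(paths), len(paths)))


def run_main_loop(paths):
    """Stand-in for the GUI thread: it only ever pulls messages and reacts."""
    messages = queue.Queue()
    worker = threading.Thread(target=scan_files, args=(paths, messages.put))
    worker.start()

    handled = []
    while True:
        message = messages.get()   # a real loop would also see paint/click events
        handled.append(message)    # 'update the progress bar' happens here
        if message[0] == "done":
            break
    worker.join()
    return handled
```

The background thread never touches anything the main thread owns. It only posts messages, and the main thread decides what to do with them.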

How you actually do this varies with each UI framework. .NET, including WinForms and WPF, makes it quite easy (look at the BackgroundWorker class, though the Task Parallel Library and Async CTP make that less necessary). Python GUI frameworks are a bit worse off- multithreading in python in general is worse off- so it’ll be different for each one, and probably not as simple as .NET. There’s no excuse for python GUIs to lock up, it just takes a little more effort to get it completely right (callbacks that update a UI, for example, are a bit tricky).

There is one other vital thing to keep in mind- DCC programs generally require you to interact with the API or run all their scripts on the main thread, which as discussed should also be kept clear. Bummer! So the best thing we can do is block while we get our data from the scene, put the processing on a background thread, and report back to the main thread when done, applying the new data back to the scene if necessary. Unfortunately, if your processing interacts with the API in any way, you probably need to put it in the main thread as well. So, right now, your GUIs in DCC apps may need to lock up, by design. There are, in theory, ways to avoid this, but they’re well outside of the scope of what you can handle if you’re learning anything from this article.

Whatever your language and program, those are the essentials of why your GUI locks up.

Note: This info is not nuanced (and is less accurate the lower down things go), may not be terminologically perfect (though it should be vulgarly comprehensible), and is Windows-only, though it should be enough to understand how any higher-level GUI framework (such as Qt) would work on a non-Windows system.

Three options for data correctness

In a previous post, I linked to Rico Mariani’s performance advice for Data Access Layers. On G+, Tyler Good asked:

I just read the posts and the linked blogs, I had a question about some specific implementations. How do you deal with classes that represent another non-[in this case]-Python entity that may be updated outside of Python?

I’m not sure if this sort of case is outside of the scope of what’s being talked about in the articles, but if there’s a better way to do getting on things like p4 paths or elements in a Maya file (that may have been changed by the user since instantiating/loading the object) I’d really like some ideas about that.

You basically have three options and fortunately they line up easily on a scale:

Technique        Correct  Difficulty
Transactions     Always   High
Fetch-on-demand  Usually  Medium
Store in memory  Maybe    Low

Let’s get on the same page first. Let’s consider all three types of interactions- database through a DAL, perforce (or any source control) interaction, and interaction with some host application (Maya, or your engine, or whatever). So what are the three approaches and how do they differ?

Store in Memory

You create a code object with a given state, and you interact with that code object. Every set either pushes changes, or you can push all changes at once. So for example, if you have a tool that works with some Maya nodes, you create the python objects, one for each node, when you start the tool. When you change one of the python objects, it pushes its changes to the Maya node it represents.

This is the simplest to reason about and implement. However, the difficulty quickly becomes managing its correctness. You need to lock people out of making changes (like deleting the maya node a python object refers to), which is pretty much impossible. Or you need to keep the two in sync, which is incredibly difficult (especially since you have any number of systems running concurrently trying to keep things in sync). Or you just ignore the incorrectness that will appear.

It isn’t that this is always bad, more that it is a maintenance nightmare because of all sorts of race conditions and back doors. Not good for critical tools that are editing any sort of useful persistent data. And in my opinion, the difficulties with correctness are not worth the risk. While the system can be easy to reason about, it is only easy to reason about because it is very incomplete and thus deceptively simple. So what is better?

Fetch on Demand

Here, instead of storing objects in two places (your code’s memory, and where they exist authoritatively, like the Maya scene, or a Perforce database), you store them only where they exist authoritatively and create the objects when that data is queried. So instead of working with a list of python objects as with Store in Memory, you’d always query for the list of Maya nodes (and create the python object you need from it).

This can be simple to reason about as well but can also be quite slow, depending on your dependency. If you’re hitting a DB each time, it will be slow. If you need to build complex python objects from hundreds of Maya or Max calls, it will be slow. If you need to query Perforce each time, it will be slow.

I should note that this is really just a correctness improvement upon Store in Memory and the workings are really similar. The querying of data is only superior because it is done more frequently (so it is more likely to be correct). The changing of data is only more likely to be correct because it will have had less time to change since querying.

That said, in many cases the changing of data will be correct enough. In a Maya scene, for example, this will always be correct on the main thread because the underlying Maya nodes will not be modified by another thread. In the case of Perforce, it may not matter if the file has changed (let’s say, if someone has checked in a new revision when your change is to sync a file).
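The difference between the two approaches is easy to see side by side. In this sketch a plain dict named `SCENE` stands in for the authoritative store (a Maya scene, Perforce, a database), and the classes `CachedNode` and `LiveNode` are invented for illustration. None of this is a real Maya API.

```python
# A plain dict stands in for the authoritative store.
SCENE = {"pCube1": {"tx": 0.0}}


class CachedNode:
    """Store in Memory: copies the value once, when the object is created."""

    def __init__(self, name):
        self.name = name
        self.tx = SCENE[name]["tx"]   # goes stale as soon as the scene changes


class LiveNode:
    """Fetch on Demand: keeps only the name and asks the scene every time."""

    def __init__(self, name):
        self.name = name

    @property
    def tx(self):
        return SCENE[self.name]["tx"]

    @tx.setter
    def tx(self, value):
        SCENE[self.name]["tx"] = value

    def exists(self):
        return self.name in SCENE
```

If something else edits the scene after both objects are created, `CachedNode.tx` keeps reporting the old value while `LiveNode.tx` reports the current one. The price is a query on every access, which is exactly the slowness described above.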

Transactions

Transactions should be familiar to anyone who knows about database programming or has read about Software Transactional Memory. I’m going to simplify at the risk of oversimplifying. When you use a transaction, you start the transaction, do some stuff (to a ‘copy’ of the ‘real’ data), and commit the transaction. If the ‘real’ data you are reading or updating has changed, the whole transaction fails, and you can abort the transaction, or keep trying until it succeeds.

Mass simplification but should be enough for our purposes. This is, under the hood, the guaranteed behavior of SCM systems and all databases I know of. The correctness is guaranteed (as long as the implementation is correct, of course). However, it is difficult to implement. It is even difficult to conceptualize in a lot of cases. There are lots of user-feedback implications: an ‘increment’ button should obviously retry a transaction, but what if it’s a spinner? Are you setting an explicit value, or just incrementing? Regardless, where you need correctness in a concurrent environment, you need transactions. The question is, do you need absolute correctness, or is ‘good enough’ good enough?
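One common way to get the read, work on a copy, commit-or-fail behavior is a version check (optimistic concurrency). The sketch below is a toy, not how any particular database or SCM implements it, and `VersionedValue`, `try_commit`, and `increment` are names I made up for the example.

```python
import threading


class VersionedValue:
    """A value whose commits fail if someone else changed it after you read it."""

    def __init__(self, value):
        self._lock = threading.Lock()
        self._value = value
        self._version = 0

    def read(self):
        """Return the current value and the version it was read at."""
        with self._lock:
            return self._value, self._version

    def try_commit(self, new_value, expected_version):
        """Commit only if nothing changed since `expected_version` was read."""
        with self._lock:
            if self._version != expected_version:
                return False          # the 'real' data moved on: transaction fails
            self._value = new_value
            self._version += 1
            return True


def increment(store, max_attempts=10):
    """An 'increment' button: safe to retry, because the intent is relative."""
    for _ in range(max_attempts):
        value, version = store.read()
        if store.try_commit(value + 1, version):
            return True
    return False
```

The spinner question shows up right here. `increment` can simply retry, but a 'set the value to 7' operation that fails has to decide whether overwriting someone else's change is what the user meant.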

Recommendations

Avoid Store in Memory. If you design things this way, break the habit. It is a beginner’s mistake that I still make from time to time. Use Fetch on Demand instead. It should be your most common pattern for designing your tools.

Be careful if you think you need Transactions. Ensure they are where they need to be (database, SCM), but don’t just go around designing everything as if it needs to be transactional. If you have two programs that can edit the same file- is one or the other just winning OK? How likely is that to happen? How will you indicate the failed transaction to the user? I’d suggest designing your tools so transactions are not necessary, and just verify things are correct when they cross an important threshold (checkin, export, etc.). Do your cost-benefit analysis. A highly concurrent system will need transactions, tools that only work with local data will likely not.

It should be clear, but still worth pointing out, you can mix-and-match these patterns inside of your designs.

Hope that clarifies things, Tyler.

Async IO- don’t do naive async!

This post comes from a response here: http://stackoverflow.com/questions/882686/asynchronous-file-copy-move-in-c .  The second-highest rated response includes absolutely terrible advice.  It says, just put the file IO on a background thread.  It shows a complete lack of understanding of how IO works.

When you do IO (reading/writing from/to HDD or network), the software thread goes to the hardware, and says, ‘hey, can you write this info somewhere’ or ‘hey, I need this info, can you get it for me.’  The hardware then takes that request and does something with it, and when it is done it tells the software thread (let’s assume there’s no caching or lazy behavior going on).  This entire time, the software thread is just waiting while the hardware does the work.

What you really want to do is have the software thread run to the hardware and say, ‘hey, can you do this for me, I’ll be back later,’ then run back to the software threadpool (where all the inactive software threads hang out).  And then it should be dispatched to do something else (like run another request down to the hardware, or some processing).  When the hardware is done, the software threads will pick it up and report it as finished- the software threads are never waiting around for hardware to do stuff.

Unfortunately async IO in .NET is cumbersome, so it is often not worth doing truly async IO.  So it still makes sense to do IO on background threads, but it is just inefficient and NOT best practice- hence this post.  Just keep in mind this is a vastly simplified version of what actually goes on, but it is important to understand it at least on some basic level so you can design efficient and fast systems.
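The 'drop off the request and come back later' idea can be sketched in Python with asyncio. To be clear about the assumptions: `asyncio.sleep` here is only a stand-in for the hardware doing the work (it is not real disk IO), and `fake_io_request` and `run_requests` are invented names. The point is just that one thread can have many requests in flight without waiting on any of them.

```python
import asyncio
import threading


async def fake_io_request(request_id, seen_threads):
    """Drop off a request, go away while it is 'in the hardware', come back."""
    seen_threads.add(threading.get_ident())
    await asyncio.sleep(0.01)   # stand-in for the hardware doing the work
    seen_threads.add(threading.get_ident())
    return request_id


async def run_requests(count):
    """Start `count` requests at once and collect the results in order."""
    seen_threads = set()
    results = await asyncio.gather(
        *(fake_io_request(i, seen_threads) for i in range(count))
    )
    return results, seen_threads
```

Calling `asyncio.run(run_requests(50))` has fifty requests outstanding at the same time, and every one of them is started and finished by the same single thread.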

Two keys to effective multithreaded programs

In my last post, I went over a mostly lock-free producer/consumer queue that worked entirely off the ThreadPool. This covers the two most important aspects to effective multithreaded programming: avoid creating new Threads, and work lock-free where possible.

Avoiding the creation of new threads is easy- you just need to schedule tasks on the ThreadPool (through QueueUserWorkItem, or System.Threading.Tasks usage), instead of using Thread.Start or new Thread. The goal is, you want the ThreadPool to manage threads for optimal performance, you don’t want to create other Threads that are going to push the ThreadPool threads off the CPU, and are going to need to be created/destroyed. A long-lived thread is usually a waste because it is inactive for much of the time. A short-lived thread is a waste because allocating and destroying a thread is an expensive operation! So just avoid creating threads- you almost never have to if you design your systems properly.
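The same habit carries over to other languages. As a rough Python analogue (not the .NET ThreadPool itself), `concurrent.futures.ThreadPoolExecutor` lets a small, fixed set of threads work through many tasks. The names `small_task` and `run_on_pool` are made up for this sketch.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def small_task(n):
    """A tiny unit of work; reports which thread ran it."""
    time.sleep(0.001)
    return n * n, threading.get_ident()


def run_on_pool(count, workers=4):
    """Schedule `count` tasks on a pool instead of starting `count` threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = list(pool.map(small_task, range(count)))
    squares = [square for square, _ in outcomes]
    threads_used = {ident for _, ident in outcomes}
    return squares, threads_used
```

Forty tasks go in, and at most four threads are ever created to run them. Nothing is spun up and torn down per task.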

Lock-free programming is much more difficult. The reason lock-free programming is important (in this case) is, you don’t want to block ThreadPool threads. If a ThreadPool thread is blocked for a while, the ThreadPool will create a new thread- which means we’re basically creating new Threads as above, which is bad! Worst case scenario: a Parallel.ForEach loop that involves the launching of a Process/WaitForExit that blocks the thread. You’ll end up with almost every iteration of the ForEach loop having launched its own process, so you’ll have n threads in your main program, with n processes running, with their own threads. You will have a huge mess and everything will be context switching like crazy and performance across the board will suffer!

So this all basically requires breaking sequential work into discrete chunks, and linking them together- and actually this is exactly what Task Based Parallelism is, and you can do it effectively with System.Threading.Tasks in .NET 4.0 (or 3.5 with Reactive Extensions’ System.Threading.dll), and with C# 5.0’s async and await keywords.  However, these are necessarily high-level concepts and systems, and you cannot just throw code into these systems and expect them to work correctly or predictably.  Managing threads effectively, as described here, is the first (and easiest) component of writing effective asynchronous/multithreaded systems.  The more difficult part is writing systems that can function correctly and predictably in an asynchronous environment.  I’ve found, though, that thinking about the (easier) thread management aspect can help inform the (difficult) systems design component.

Completely async file writer

I’ve been doing more and more asynchronous programming lately.  I needed to implement a logging system recently, but wanted to use asynchronous file IO.  There was also the possibility that log calls could come in faster than they could complete, so I needed to synchronize the logging calls, marking them finished when the async IO completed.  I wanted to make this as lock-free as possible.

The idea is, you have an ‘AsyncFileWriter’ field on a class, with a ‘QueueWrite’ method that threads can call to request a file write, passing the string/bytes, and a callback that fires when the IO is complete.

The ‘AsyncFileWriter’ has a Queue on it, and when calls to QueueWrite are made, a request (args and callback) is added to the Queue, and a call to a ‘WriteImpl’ is queued on the ThreadPool.  ‘WriteImpl’ has basically a semaphore, only allowing 1 thread to dequeue a request (args and callback) and begin the file IO.  This ‘semaphore’ is released in the async IO complete callback.  After the semaphore is released, another call to WriteImpl is queued on the ThreadPool, which will just return immediately if the request queue is empty.

The net effect is we have, basically, a producer-consumer queue, where producer threads can also be consumers (or queue up other ThreadPool threads to be consumers).  When a request is processed, the consumer thread that finished processing it queues up another consumer thread.  If a request comes in while the ‘semaphore’ is taken, the thread just drops its request into the request Queue and immediately returns- if the ‘semaphore’ is not taken, the thread dropping off the request then takes the lock and acts as a consumer thread.

I think the pseudocode is more clear than the actual code:

class AsyncFileWriter
   string filename #initialized from constructor
   Queue _requestQueue
   bool _insideWrite

   public void QueueWrite(byte[] bytes, Action callback)
      lock (_requestQueue) _requestQueue.Enqueue(bytes, callback)
      ThreadPool.QueueWork(WriteImpl)

   private void WriteImpl()
      if (_insideWrite)
         return #a write is in flight; its completion will queue WriteImpl again
      lock (_requestQueue)
         #re-check under the lock so only one thread can claim the write,
         #and never set the flag when there is nothing to write
         if (_insideWrite or _requestQueue.IsEmpty)
            return
         request = _requestQueue.Dequeue()
         _insideWrite = true
      fs = new FileStream(filename)
      fs.BeginWrite(request.Bytes, () => OnIOComplete(fs, request.Callback))

   private void OnIOComplete(FileStream fs, Action callback)
      fs.EndWrite()
      fs.Dispose()
      _insideWrite = false
      ThreadPool.QueueWork(WriteImpl) #Keep clearing the queue
      callback()

So you can see that we only take locks when adding or removing items from the queue. If 5 threads enter WriteImpl at the same time, only one should begin a write, and the rest should return. And when the IO is complete, it’ll queue up another ThreadPool request to keep clearing the queue.
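For anyone who wants something runnable, here is a sketch of the same throttling pattern in Python. It is not the original implementation, and it makes one big simplification: the `write` function passed in is a plain blocking call that runs on a pool thread, where the real thing used async file IO. The class name `SerializedWriter` and its methods are invented for this example.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class SerializedWriter:
    """Lets many threads queue writes; at most one write runs at a time."""

    def __init__(self, pool, write):
        self._pool = pool             # a ThreadPoolExecutor shared with the app
        self._write = write           # hypothetical blocking write function
        self._requests = deque()
        self._lock = threading.Lock()
        self._inside_write = False    # plays the role of the 'semaphore'

    def queue_write(self, data, callback):
        with self._lock:
            self._requests.append((data, callback))
        self._pool.submit(self._write_impl)

    def _write_impl(self):
        with self._lock:
            # Claim the write only if nobody else has and there is work to do.
            if self._inside_write or not self._requests:
                return
            data, callback = self._requests.popleft()
            self._inside_write = True
        try:
            self._write(data)
        finally:
            with self._lock:
                self._inside_write = False
            self._pool.submit(self._write_impl)   # keep clearing the queue
        callback()
```

Callers create one writer per file and call `queue_write(data, callback)` from any thread. Requests are written in the order they were queued, and the lock is only held while touching the queue or the flag, never during the write itself. One caveat: the pool must stay alive until the last callback has fired, since each completed write submits one more `_write_impl` call.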

I initially implemented this for async IO as mentioned above, but have abstracted the pattern for anything- it is just basically a way to throttle processing with a ThreadPool-managed producer/consumer queue. I’ll go over this pattern along with some code in a future post.

It is 2011. Start building responsive UIs.

In February, I finally figured out my New Year’s Resolution.  It was to build no more UIs that lock up, ever (except in exceptional circumstances where there is no other choice).  It spawned this post that I’ve finally gotten around to finishing (and some of the topics I have started queuing up posts for).

It is a well known customer-service mantra we have that, any response is better than no response. If you can or cannot help someone- or you are in the midst of doing so- you should still tell them what’s going on. It builds better relationships, communication, and confidence in your team.

So it goes with UIs. Nothing is so frustrating to users as a slow UI. Here’s a quick checklist to see whether your UI behaves as it should:

  1. A UI should always stay responsive and never lock up.
  2. A UI should give feedback about what it is doing, and the more detailed feedback the better.
  3. The user should be able to cancel something taking too long.
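Points 2 and 3 cost very little once the work is off the main thread. As a minimal sketch (the function name `process_items` and its parameters are invented here), the background job takes a `threading.Event` it checks between steps and a `report` callback it calls after each one:

```python
import threading


def process_items(items, cancel, report):
    """Background job that reports progress and stops when asked to.

    `cancel` is a threading.Event the UI sets when the user hits Cancel.
    `report` is called with (done, total) after each item.
    """
    done = 0
    for item in items:
        if cancel.is_set():
            return "cancelled", done
        # ... slow work on `item` would go here ...
        done += 1
        report(done, len(items))
    return "finished", done
```

In a real UI, `report` would post a message to the GUI thread rather than touch widgets directly, and the Cancel button's handler would just call `cancel.set()`.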

The frustrating thing to me is that so many of our UIs aren’t responsive, and the only reason is laziness or ignorance. I’ve written my fair share of unresponsive UIs, but I am smarter and more experienced now and I say, no more! Threading APIs are sophisticated enough that every programmer really needs to know how to write asynchronous applications. Here’s what you need to learn to create a good UI (my advice is specific to .NET/Windows but it should apply to most platforms). I’m going to cover each topic in some detail in later posts (and maybe add/remove/modify topics).

The Windows Message Loop

To understand why your application locks up, you need to understand Windows Messages, the Main Message Loop, and other basic threading concepts. The gist is, your Main thread is what receives messages like user input and is responsible for redrawing, so it needs to be free to act on events. So that means 1) you need it to work asynchronously, so it can act on a message in the queue and go back to doing work, and 2) most processing needs to happen on a background thread. This is a highly discussed topic and much literature is available but I’ll try to give a good summary.

Offloading Work to Background Threads

Offloading work to background threads means more than just putting long-running tasks into a BackgroundWorker.  Understanding tricks for putting as much on the background as possible can make your UIs much more responsive.

System.ComponentModel.BackgroundWorker

This is the most basic way of keeping your UI responsive, and once you understand how to use it, can be incredibly effective and simple. I consider BackgroundWorkers almost trivial to set up after you create a few. They can be limiting, however, and they do incur some architectural overhead and you may have to make code design compromises. They need to be a tool in your toolbox, but you need to go further.

Task Parallel Library (TPL) and Task Parallelism

.NET 4.0 (and 3.5 with Reactive Extensions) has a System.Threading.Tasks namespace filled with all sorts of goodies. The Parallel class (and PLINQ, for LINQ queries) is excellent for data parallelism, so you can easily multithread foreach loops and LINQ queries. Task parallelism is much more sophisticated, however, and not too much has been written about it because it is complex. C# 5.0 is going to be focused on asynchronous programming with the ‘await’ and ‘async’ keywords, which is mostly handled under the hood with the TPL. So you can find tons of good articles about task parallelism by searching for info about the Async CTP (Community Technology Preview).

Immutability

Lots of complexity comes with asynchrony, and especially multithreaded programming. What happens if something fails, or more pertinent to UI programming, the user wants to cancel (what’s the point of having a responsive UI if the user can’t do anything?). Having a situation where we need to be able to roll back/cancel any changes means we need a way to store how things were when the process was started. You can do this by storing state (cloning, essentially), which is difficult to implement and maintain well because it is so unclear (does everything in the object’s references need to clone? What if you have something you can’t clone?), or more preferably, having data that cannot change state. To be a good asynchronous programmer, you need to be good at creating and working with immutable data. If your data is immutable, you are safe- cancellation, undos, exception recovery, etc., is trivial because you have data in a good known state that nothing can change.
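In Python, a frozen dataclass is one easy way to get data that cannot change state. The sketch below uses an invented `ExportSettings` type and `run_export` function. The work builds a new object instead of editing the old one, so cancelling just means handing back the original.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ExportSettings:
    """Hypothetical immutable settings object."""
    path: str
    scale: float = 1.0


def run_export(settings, cancelled):
    """Work on a modified copy; the original is never touched."""
    working = replace(settings, scale=settings.scale * 2)
    if cancelled:
        return settings    # 'rollback' is just returning what we started with
    return working
```

There is no cloning logic and nothing to undo. Any attempt to assign to a field of `settings` raises `FrozenInstanceError`, so no other thread can change it behind your back either.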

Naive Programming and Multithreading

On an almost weekly basis, I run into some example of naive programming regarding threading. Generally they have the following in common:

  1. They use Thread.Start
  2. They show no understanding of CPU-vs-IO bound operations
  3. They show no understanding of how a computer manages threads

Take this pseudocode for some widely-used routines I ran into today (actually our custom File.Copy method uses this internally!):

function CopyFiles(fromFilenames, toFilenames):
   for i = 0 to fromFilenames.Count - 1:
      System.IO.File.Copy(fromFilenames[i], toFilenames[i])

function FastCopyFiles(fromFilenames, toFilenames):
   #Bucket filenames into arrays, one for each core
   bucketedFromFiles = BucketFiles(fromFilenames)  
   bucketedToFiles = BucketFiles(toFilenames)
   for i = 0 to bucketedFromFiles.Count - 1:
      Thread.Start(CopyFiles(bucketedFromFiles[i], bucketedToFiles[i]))
   waitForAllThreadsToFinish() #implemented with some counter system

There are so many things wrong with this. I totally understand the idea- the Thread is waiting during IO, so just new up threads to send more IO, while each thread waits for the IO to complete. Here are the major problems:

  1. Newing up threads is expensive! Each thread requires a 1MB stack (by default) and takes a significant amount of time to create and destroy.
  2. Managing threads is expensive! Each core on your computer can only run 1 thread at a time (basically). There are other programs running on your computer, as well as possibly other threads in your program. Windows shares cores between threads by ‘context switching’, which means a core stops running one thread and starts running another- which requires saving and restoring each thread’s register state, usually leaves the new thread with CPU caches full of someone else’s data, and a host of other stuff. Creating more threads than you have cores means context switches happen more often. More threads will get created in your program when the CLR detects a thread is blocked and there are things to do, or you request one with Thread.Start.
  3. Your threads are doing NOTHING! While each thread is waiting for the IO to complete, it is doing absolutely nothing. It is just killing time, and your performance.

Naive programming involves parallelizing a process, but not making it asynchronous. Parallelization (especially custom algorithms, not using the built-in ThreadPool/Threading.Tasks.Parallel.ForEach/PLINQ/etc) is good, but you NEED to be wary of IO-bound operations (or threads that launch a separate process, etc.).

The correct approach here is to basically have a single thread (well, just let the ThreadPool manage the threads) to begin an asynchronous write operation, and Wait for the tasks to finish. The ideal is that a thread gets a ‘BeginWrite’ request, runs to the HDD, drops off the request, then comes back up and does more work (probably running back to the HDD to drop off another request). As the HDD finishes the requests, a thread (the same or different ones) can pick up the notification and run a callback, signal that the original request has finished, etc. So no threads are sitting idle waiting for the HDD- they are running around frantically doing work. Which is fine- what we want to avoid is 1) creating new threads, 2) context switches, and 3) inactive threads while there’s other CPU work to do (which means wasted resources).

I’ll go more into the explanation/example for the proper way to implement that FastCopyFiles method in a future post (actually probably after I rewrite the one at work). There are already lots of examples of asynchronous IO so you should be able to figure it out yourself. Which you must do if you want to write multithreaded programs. Because you don’t want to be a smart person doing naive programming.
