Archive of articles classified as "threading"


goless now on PyPI

12/06/2014

goless is now available on the Python Package Index: https://pypi.python.org/pypi/goless . You can run pip install goless and get Go-like primitives to use in Python that run atop gevent, PyPy, or Stackless Python. You can write code like:

import goless

channel = goless.chan()

def goroutine():
    # Receive each value and send back its square.
    while True:
        value = channel.recv()
        channel.send(value ** 2)
goless.go(goroutine)

for i in xrange(2, 5):
    channel.send(i)
    squared = channel.recv()
    print('%s squared is %s' % (i, squared))

# Output:
# 2 squared is 4
# 3 squared is 9
# 4 squared is 16

I’ve also ported the goless benchmarks to Go, for some basic comparisons between Go and goless on various Python runtimes (PyPy, CPython) and backends (gevent, stackless): https://goless.readthedocs.org/en/latest/#a-benchmarks

Thanks to Rui Carmo we have more extensive examples of how to use goless (but if you’ve done or read any Go, it should be relatively straightforward). Check them out in the examples folder: https://github.com/rgalanakis/goless/tree/master/examples

And just a compatibility note (which is covered in the docs, and explained in the exception you get if you try to use goless without stackless or gevent available): goless works out of the box with PyPy (using stackless.py in its stdlib) and Stackless Python. It works seamlessly with gevent and CPython 2.7. It works with PyPy and gevent if you use the tip of gevent and PyPy 2.2+. It will support Python 3 as soon as gevent does.

Thanks, and if you use goless, I’m eager to hear your feedback!


goless Benchmarks

9/06/2014

I benchmarked how goless performs under different backends (goless is a library that provides a Go-like concurrency model for Python, on top of stackless, PyPy, or gevent). Here are the results, also available on the goless readthedocs page:

Platform Backend   Benchmark      Time
======== ========= ============== =======
PyPy     stackless chan_async     0.08400
CPython  stackless chan_async     0.18000
PyPy     gevent    chan_async     0.46800
CPython  gevent    chan_async     1.32000
~~~~~~~~ ~~~~~~~~~ ~~~~~~~~~~~~~~ ~~~~~~~
PyPy     stackless chan_buff      0.08000
CPython  stackless chan_buff      0.18000
PyPy     gevent    chan_buff      1.02000
CPython  gevent    chan_buff      1.26000
~~~~~~~~ ~~~~~~~~~ ~~~~~~~~~~~~~~ ~~~~~~~
PyPy     stackless chan_sync      0.04400
CPython  stackless chan_sync      0.18000
PyPy     gevent    chan_sync      0.44800
CPython  gevent    chan_sync      1.26000
~~~~~~~~ ~~~~~~~~~ ~~~~~~~~~~~~~~ ~~~~~~~
PyPy     stackless select         0.06000
CPython  stackless select         0.38000
PyPy     gevent    select         0.60400
CPython  gevent    select         1.94000
~~~~~~~~ ~~~~~~~~~ ~~~~~~~~~~~~~~ ~~~~~~~
PyPy     stackless select_default 0.00800
PyPy     gevent    select_default 0.01200
CPython  stackless select_default 0.19000
CPython  gevent    select_default 0.25000

The trend is that PyPy with its built-in stackless support is fastest, then Stackless Python (2-5x slower), then PyPy with gevent (5-10x slower), and finally CPython with gevent (15-30x slower).

Remember that these are benchmarks of goless itself; goless performance characteristics may be different in real applications. For example, if you have lots of C extensions, CPython/stackless may pull ahead due to stack switching.

If you’re using goless for anything, I’d love to get some benchmarks, and find out how it’s working, so please comment below, or create a GitHub issue.

Disclaimers: It’s possible that goless uses gevent inefficiently, but the backend-specific code is so simple I doubt it (see goless/goless/backends.py). These benchmarks were all done against Python 2.7. Also, I am no benchmark master (especially with PyPy), so there may be problems; look over goless/benchmark.py for the benchmark code. These were done mostly out of curiosity.


goless- Golang semantics in Python

21/04/2014

The goless library https://github.com/rgalanakis/goless provides Go programming language semantics built on top of Stackless Python or gevent.* Here is a Go example using channels and Select converted to goless:

import time
import goless

c1 = goless.chan()
c2 = goless.chan()

def func1():
    time.sleep(1)
    c1.send('one')
goless.go(func1)

def func2():
    time.sleep(2)
    c2.send('two')
goless.go(func2)

for i in range(2):
    case, val = goless.select([goless.rcase(c1), goless.rcase(c2)])
    print(val)

While I am not usually a Go programmer, I am a big fan of its style and patterns. goless provides the familiarity and practicality of Python while better enabling the asynchronous/concurrent programming style of Go. Right now it includes:

  • Synchronous/unbuffered channels (send and recv block waiting for a receiver or sender).
  • Buffered channels (send blocks if the buffer is full, recv blocks if it is empty).
  • Asynchronous channels, which do not exist in Go (send never blocks, recv blocks if the buffer is empty). See the sketch after this list for how the three flavors differ.
  • The select function (more like reflect.Select than Go’s select statement, since Python does not have anonymous blocks).
  • The go function (runs a function in a tasklet/greenlet).
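
Here is a minimal sketch contrasting the three channel flavors. The constructor arguments reflect my recollection of the goless API (a positive size means buffered, and I believe a negative size means asynchronous), so treat them as an assumption and check the docs:

import goless

sync = goless.chan()          # send() blocks until another tasklet recvs
buffered = goless.chan(2)     # send() blocks only once 2 values are waiting
async_chan = goless.chan(-1)  # negative size: send() never blocks (assumption)

buffered.send('a')
buffered.send('b')            # fits in the buffer, does not block
print(buffered.recv())        # prints 'a'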

goless is pretty well documented and tested, but please take a look or give it a try, and tell us what you think here or in GitHub issues. I’m especially interested in adding more Go examples converted to use goless, or replicating other Go features that help create better asynchronous programs.**


*. goless was written at the PyCon 2014 sprints by myself, Carlos Knippschild, and Simon Konig, with help from Kristjan Valur Jonsson and Andrew Francis (sorry for the lack of accents here, I am on an unfamiliar computer). Carlos and I were both laid off while at PyCon- if you have an interesting job opportunity for either of us, please send me an email: rob.galanakis@gmail.com

**. We are close to getting PyPy support working through its stackless.py implementation. There are some lingering issues in the tests (though the examples and other ‘happy path’ code works fine under PyPy). I’ll post again and bump the version when it’s working.


Why I love blogging

29/03/2014

I started to write a post about how I missed multithreading and speed going from C# to Python. I ended up realizing the service we built which inspired the post was poorly designed and took far more effort than it should have. The speed and multithreading of C# made it easier to come up with an inferior design.

The service needed to be super fast, but the legacy usage pattern was the problem. It needed to run as a service, but then we realized it would be running as a normal process. This is what happens when you focus too much on requirements and architecture instead of delivering value quickly: you create waste.

I wouldn’t have realized all of this if I didn’t sit down to write about it.


A story of simplification and abstraction (stackless timeouts)

11/01/2014

Someone was asking me the other day how to implement a timeout in a thread. His initial implementation used two background threads: one to do the work (making requests to a web service and updating a counter), and the other in a loop polling the counter and sleeping. If the first thread stopped updating the counter, the second should report some sort of error.

I helped him simplify the design in a couple ways. First I had him use stackless instead of threads and taught him how threading and microthreads work. Based on that, I suggested that, instead of a counter and loop/sleep, a parent tasklet kick off a child tasklet which does the actual work.* The parent tasklet recvs on a channel with a timeout, and the child tasklet sends on the channel to act as a heartbeat. If the parent’s recv times out, it means the child tasklet hasn’t reported in, and the user can be alerted. This simplified the code considerably; a rough sketch of the design follows.
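
Here is a rough sketch of that design, using the stacklesslib timeout utility mentioned below. items, do_work_item, and tell_user_about_timeout are hypothetical stand-ins, and a real version would also clean up the child after a timeout:

import stackless
import stacklesslib.util

heartbeat = stackless.channel()

def child():
    for item in items:
        do_work_item(item)    # the actual work (hypothetical)
        heartbeat.send(None)  # report in after each item

def parent():
    stackless.tasklet(child)()
    for _ in items:
        try:
            with stacklesslib.util.timeout(200):
                heartbeat.receive()
        except stacklesslib.util.TimeoutError:
            tell_user_about_timeout()
            return

stackless.tasklet(parent)()
stackless.run()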

I then asked a colleague (Kristján Valur) how to do a timeout with stackless, and he told me about the stacklesslib.util.timeout context manager. Doh! It ended up being as simple as:

import stacklesslib.util

try:
    for item in items:
        with stacklesslib.util.timeout(200):
            process_item(item)
except stacklesslib.util.TimeoutError:
    tell_user_about_timeout()

It’s pretty amazing what sort of power you can wield with a good language and framework. It’s so important to have the right abstractions, but you also need to know how to use them. Even with documentation, nothing beats a little help from your friends.


*Instead of a channel, we probably could have used an Event.


A use for killing tasklets

24/11/2013

A few weeks ago, I posted about Killing Tasklets, a feature of Stackless Python that allows you to abort a tasklet at any point. It turns out I had a perfect use case for it just last week, and things went swimmingly well.

We have a client program that controls the state of 3D objects, and a server program that does the rendering. The client calculates a serialized version of the server’s state (based on the client’s state) and sends it to the server. It does this through a publish/subscribe mechanism. The server receives the state and applies it to the current scene, moving objects and the like (of course we have other mechanisms for rebuilding the entire scene; this is just for ‘updating’ the attributes of the current object graph).

This causes a problem when the server takes longer to apply the new state to its scene than the client takes to calculate it (maybe the client is super fast because it is caching everything): the server lags further and further behind the client. So when the server receives the ‘update’ command, it kicks off and stores the tasklet that does the updating. If another ‘update’ comes in while the previous update’s tasklet is still alive, it kills that tasklet and starts a new one, as in the sketch below. This way we get as smooth an update stream as possible (dropping updates would cause more choppiness). This does require that updates are ‘absolute’ rather than relative to other updates, and that they can be aborted without corrupting the scene.
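
Here is a minimal sketch of that kill-and-restart pattern; apply_state is a hypothetical stand-in for the code that applies an update to the scene:

import stackless

_update_tasklet = None

def on_update_received(state):
    global _update_tasklet
    # Abort the previous update if it is still being applied.
    if _update_tasklet is not None and _update_tasklet.alive:
        _update_tasklet.kill()
    _update_tasklet = stackless.tasklet(apply_state)(state)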

Killing tasklets turned this into very straightforward code. In fact, nothing other than the few lines that handle subscriptions on the server knows anything about it at all. This sort of “don’t think about it too much, it just works like you’d expect” promise of tasklet killing is exactly why I like it, and exactly what was fulfilled in my use case.


Killing Tasklets

7/11/2013

Today at work we had a presentation from the venerable Kristján Valur Jónsson about killing tasklets in Stackless Python, a technique he and partner-in-crime Matthías Guðmundsson started using for their work on DUST 514. To someone who’s done some asynchronous programming, this idea sounded blasphemous. It took a little while to stew, but I see the value in it now.

The core idea is really simple. Code invokes the kill method on a tasklet. All kill does is synchronously raise a TaskletExit exception on the tasklet (TaskletExit inherits from BaseException so it is not caught by general error handling, same idea as SystemExit). This bubbles up to the Stackless interpreter, where it is caught and swallowed.
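
Here is a minimal sketch of that behavior:

import stackless

def work():
    try:
        stackless.schedule()  # block until rescheduled
        print('never reached: kill() raises before we resume')
    finally:
        print('finally runs: TaskletExit unwound the stack')

t = stackless.tasklet(work)()
stackless.schedule()  # let the tasklet run up to its schedule() call
t.kill()              # synchronously raises TaskletExit inside the tasklet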

There are details of course, but that’s the gist. There are a few reasons I like this so much.

First, it uses standard Python exception handling. It’s really easy to explain and understand: there’s no question about whether finally blocks get executed (they do). The fact that the killed code runs and dies exactly as I expect makes this at least worth a look.

Second, it can be synchronous. There’s no ‘kill and spin until it’s not alive’ dance. When you tell the tasklet to die, it faithfully obeys. You are Thulsa Doom. You can also kill asynchronously, but the fact that synchronous behavior works in such a straightforward way is a big thing.

Third, it actually cancels the tasklet where it is. You cannot replicate this with some sort of cancellation token or flag, which always has some delay, if it supports cancellation at all. It not only simplifies things but makes them work totally as desired. So even if you are theologically opposed to Tasklet.kill and prefer other disciplined techniques for writing async code, you can’t argue with the superior results.

In the end, you still need to write good code and maintain discipline about shared state and side effects, but I see not only the value in killing tasklets but the superiority of the choice. I hope Kristján (and also Christian Tismer, the other Christian primarily responsible for Stackless) do a more thorough talk about killing tasklets at some conference.


Python Singletons

18/11/2012

In my opinion, good python libraries and frameworks should spend effort guiding you towards the ‘pit of success’, rather than trying to keep you from failing. They do this by spending most effort on things related to the critical path- clear interfaces, simple implementations, thorough documentation.

Which is why singletons are, to me, the worst form of framework masturbation in python. You will never be able to stop people from doing something stupid if they’re determined (in pure python). In the case of a singleton, that means instantiating more than one instance of a type. So spending effort on ‘designing’ singletons is not just a waste of effort, but actively harmful. Just provide a clear way to use a single instance, and your system should fail clearly if it detects an actual problem due to multiple instances (as opposed to trying to detect multiple instances to keep said problem from happening).

The best method for singletons in python, then, is- whatever is simplest!

  1. Some form of module or class state is, to me, the clearest. It requires someone reading or using your code to know nothing more than the most basic python. Just prefix your class def with an underscore, and expose an accessor function to an instance stored on the module (or on the class). The capacity for failure is minimal and the behavior is clear (it requires no behavior modification to the type itself). See the sketch after this list.
  2. Overriding __new__ is pretty bad but OK. It requires someone to understand the subtleties of __new__, which is a useful thing to teach someone, but are singletons really the time and place?
  3. Using a metaclass is a terrible solution. It has a higher likelihood of failure (how many people understand the nuances of metaclasses!?), and it creates misdirection even for people just reading your code and trying to understand your type’s behavior. Avoid.
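
Here is a minimal sketch of option 1, with a hypothetical _Cache class:

class _Cache(object):
    """Private: use get_cache() rather than instantiating this directly."""
    def __init__(self):
        self.items = {}

_cache = None

def get_cache():
    """Return the single shared _Cache, creating it on first use."""
    global _cache
    if _cache is None:
        _cache = _Cache()
    return _cache
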
The question to ask yourself before doing any of this is: is a singleton a technical requirement, or an architectural preference? I.e., a single instance of an application event loop (QApplication, etc.) I’d consider a technical requirement, and I’d make it foolproof (in C?). But technical requirements are few and far between, and should be driven by underlying system/OS requirements rather than your code’s design or architecture. If it’s an architectural preference- “there should only be one instance of this manager/window/cache”- there’s absolutely no reason to confuse your code (especially your object’s behavior!) to achieve it. Just use design, documentation, and examples to show people the right way to use it.

Why GUIs Lock Up

8/04/2012

This is a post for the Tech Artists and new programmers out there. I am going to answer the common question “why do my GUIs lock up” and, to a lesser extent, “what can I do about it.”

I’m going to tell the story of a mouse click event. It is born when a user clicks her mouse button. The click travels from the hardware to the OS. The OS determines that the user wants to interact with your GUI, so it sends the mouse click event to your GUI’s process. There, the mouse click sits in a queue with other similar events, just hanging out.

At some point, usually almost immediately, your process (actually the main thread in the process) goes through and dispatches the items (messages) in the queue one by one. It determines that our mouse click is over a button, and tells the button it’s been clicked. The button usually repaints itself so it looks pressed, and then invokes any callbacks that are hooked up. The process (main thread) enters one such callback you hooked up, which looks at 1000 files on disk. This takes a while.

In the meantime, the user keeps clicking, but those messages just pile up in the queue. Then someone drags a window over your GUI, because they’re tired of their clicks not doing anything and want to see what’s new on G+. The OS sends a message to your UI that it needs to repaint itself, but that message, too, just sits in the queue. At some point, your OS may even realize your window is not responding, fade it out, and change the title bar. Finally your on-button-click callback finishes, the process (thread) finishes processing our initial mouse click, goes back to processing the messages that accumulated in the queue, and your UI refreshes and starts responding again.

All this happens because the thread that processes messages to draw the UI was also responsible for looking at 1000 files on disk, so it wasn’t around to respond to the paint and click messages. A few pieces of info:

  1. You can’t just ‘update the UI’ from the middle of your code. In addition to being terrible form code-wise, clearing the message queue would just cause other things to block the main thread, and it’d all become one giant asynchronous mess. Some programs may have their own UI framework that supports this; don’t trust it. You really just need to keep the main/GUI thread as clear as possible so it can respond to events.
  2. Your GUI process has a single ‘main thread.’ A thread roughly corresponds (and I’m not being nuanced here) to the software concept of a hardware CPU core. Your GUI objects can only be created and manipulated by the main thread.

This means you want to keep your main thread free so it can act on GUI stuff (paint events, mouse clicks) only. The processing, such as your callback that looks at 1000 files, should happen on another thread (a background thread). When the processing is complete, it can tell the GUI thread that it is finished, and the GUI thread can update the UI. Your background thread can also fire events or invoke a callback that the GUI thread picks up, so the GUI can update a progress bar or whatever.
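
Here is a minimal, framework-agnostic sketch of the pattern in Python 2; inspect_file, files, and update_ui are hypothetical stand-ins, and how you hook up the timer tick depends on your GUI framework:

import threading
import Queue  # the 'queue' module in Python 3

results = Queue.Queue()

def on_button_click():
    # Runs on the GUI thread; just kick off the work and return.
    threading.Thread(target=look_at_files).start()

def look_at_files():
    # Runs on the background thread; touches no GUI objects.
    data = [inspect_file(f) for f in files]
    results.put(data)  # hand only the result back

def on_gui_timer_tick():
    # Runs periodically on the GUI thread (every framework has a timer).
    try:
        data = results.get_nowait()
    except Queue.Empty:
        return
    update_ui(data)  # safe: we are on the GUI thread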

How you actually do this varies with each UI framework. .NET, including WinForms and WPF, makes it quite easy (look at the BackgroundWorker class, though the Task Parallel Library and Async CTP make it less necessary). Python GUI frameworks are a bit worse off- multithreading in python in general is worse off- so it’ll be different for each one, and probably not as simple as in .NET. There’s no excuse for python GUIs to lock up; it just takes a little more effort to get completely right (callbacks that update a UI are a bit tricky, for example).

There is one other vital thing to keep in mind: DCC programs generally require you to interact with their API, or run all their scripts, on the main thread, which as discussed should also be kept clear. Bummer! So the best thing we can do is block while we get our data from the scene, put the processing on a background thread, and report back to the main thread when done, applying the new data back to the scene if necessary. Unfortunately, if your processing interacts with the API in any way, you probably need to put it on the main thread as well. So, right now, your GUIs in DCC apps may need to lock up, by design. There are, in theory, ways to avoid this, but they’re well outside the scope of what you can handle if you’re learning anything from this article.

Whatever your language and program, those are the essentials of why your GUI locks up.

Note: This info is not nuanced (and is less accurate the lower down things go), may not be terminologically perfect (though it should be vulgarly comprehensible), and is Windows-only, though it should be enough to understand how any higher-level GUI framework (such as Qt) would work on a non-Windows system.

Three options for data correctness

24/01/2012

In a previous post, I linked to Rico Mariani’s performance advice for Data Access Layers. On G+, Tyler Good asked:

I just read the posts and the linked blogs, I had a question about some specific implementations. How do you deal with classes that represent another non-[in this case]-Python entity that may be updated outside of Python?

I’m not sure if this sort of case is outside of the scope of what’s being talked about in the articles, but if there’s a better way to do getting on things like p4 paths or elements in a Maya file (that may have been changed by the user since instantiating/loading the object) I’d really like some ideas about that.

You basically have three options and fortunately they line up easily on a scale:

Technique       Correct Difficulty
=============== ======= ==========
Transactions    Always  High
Fetch-on-demand Usually Medium
Store in memory Maybe   Low

Let’s get on the same page first. Let’s consider all three types of interactions- database through a DAL, perforce (or any source control) interaction, and interaction with some host application (Maya, or your engine, or whatever). So what are the three approaches and how do they differ?

Store in Memory

You create a code object with a given state, and you interact with that code object. Every set either pushes changes immediately, or you can push all changes at once. So for example, if you have a tool that works with some Maya nodes, you create the python objects, one for each node, when you start the tool. When you change one of the python objects, it pushes its changes to its Maya node.
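
A sketch of the pattern, assuming Maya’s maya.cmds API and a single hypothetical attribute:

import maya.cmds as cmds

class NodeProxy(object):
    def __init__(self, node):
        self.node = node
        self.tx = cmds.getAttr(node + '.translateX')  # snapshot at creation

    def set_tx(self, value):
        self.tx = value  # update the in-memory copy...
        cmds.setAttr(self.node + '.translateX', value)  # ...and push to Maya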

This is the simplest to reason about and implement. However, the difficulty quickly becomes managing its correctness. You need to lock people out of making changes (like deleting the maya node a python object refers to), which is pretty much impossible. Or you need to keep the two in sync, which is incredibly difficult (especially since you have any number of systems running concurrently trying to keep things in sync). Or you just ignore the incorrectness that will appear.

It isn’t that this is always bad, more that it is a maintenance nightmare because of all sorts of race conditions and back doors. Not good for critical tools that are editing any sort of useful persistent data. And in my opinion, the difficulties with correctness are not worth the risk. While the system can seem easy to reason about, that is only because it is very incomplete and thus deceptively simple. So what is better?

Fetch on Demand

Here, instead of storing objects in two places (your code’s memory, and wherever they exist authoritatively, like the Maya scene or a Perforce database), you store them only where they exist authoritatively and create the code objects when that data is queried. So instead of working with a list of python objects as with Store in Memory, you always query for the list of Maya nodes (and create the python objects you need from it).
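
The same hypothetical attribute as above, fetched on demand instead of cached:

import maya.cmds as cmds

def get_tx(node):
    # Query the authoritative source every time; nothing is cached.
    return cmds.getAttr(node + '.translateX')

def set_tx(node, value):
    cmds.setAttr(node + '.translateX', value)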

This can be simple to reason about as well but can also be quite slow, depending on your dependency. If you’re hitting a DB each time, it will be slow. If you need to build complex python objects from hundreds of Maya or Max calls, it will be slow. If you need to query Perforce each time, it will be slow.

I should note that this is really just a correctness improvement upon Store in Memory, and the workings are really similar. The querying of data is only superior because it is done more frequently (so it is more likely to be correct). The changing of data is only more likely to be correct because the data will have had less time to change since being queried.

That said, in many cases the changing of data will be correct enough. In a Maya scene, for example, this will always be correct on the main thread because the underlying Maya nodes will not be modified by another thread. In the case of Perforce, it may not matter if the file has changed (let’s say, if someone has checked in a new revision when your change is to sync a file).

Transactions

Transactions should be familiar to anyone who knows about database programming or has read about Software Transactional Memory. I’m going to simplify at the risk of oversimplifying. When you use a transaction, you start the transaction, do some stuff (to a ‘copy’ of the ‘real’ data), and commit the transaction. If the ‘real’ data you read or updated has changed in the meantime, the whole transaction fails, and you can abort it, or keep retrying until it succeeds.

That’s a mass simplification, but it should be enough for our purposes. This is, under the hood, the guaranteed behavior of SCM systems and all databases I know of. The correctness is guaranteed (as long as the implementation is correct, of course). However, it is difficult to implement, and even difficult to conceptualize in a lot of cases. There are lots of user-feedback implications: an ‘increment’ button should obviously retry a transaction, but what if it’s a spinner? Are you setting an explicit value, or just incrementing? Regardless, where you need correctness in a concurrent environment, you need transactions; a sketch of the usual retry loop is below. The question is: do you need absolute correctness, or is ‘good enough’ good enough?
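
Here is a sketch of that classic retry loop; db, begin_transaction, read, write, and ConflictError are all hypothetical, since every transactional store spells these differently:

def increment_counter(db):
    while True:
        txn = db.begin_transaction()
        try:
            txn.write('counter', txn.read('counter') + 1)
            txn.commit()
            return  # success: our read was still valid at commit time
        except ConflictError:
            continue  # 'counter' changed under us; retry from scratch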

Recommendations

Avoid Store in Memory. If you design things this way, break the habit. It is a beginner’s mistake that I still make from time to time. Use Fetch on Demand instead. It should be your most common pattern for designing your tools.

Be careful if you think you need Transactions. Ensure they are where they need to be (database, SCM), but don’t just go around designing everything as if it needs to be transactional. If you have two programs that can edit the same file, is it OK for one or the other to simply win? How likely is that to happen? How will you indicate a failed transaction to the user? I’d suggest designing your tools so transactions are not necessary, and just verifying things are correct when they cross an important threshold (checkin, export, etc.). Do your cost-benefit analysis. A highly concurrent system will need transactions; tools that only work with local data likely will not.

It should be clear, but still worth pointing out: you can mix and match these patterns inside your designs.

Hope that clarifies things, Tyler.
