Archive for the ‘Uncategorized’ Category

Large initializers/ctors?

With closures (and to some extent with runtime attribute assignments), I find the signatures of my UI types shrink and shrink. A lot of times we have code like this (python, but the same would apply to C#):

class FooControl(Control):
  def __init__(self, value):
    super(FooControl, self).__init__()
    self.value = value
    self._InitButtons()

  def _InitButtons(self):
    self.button = Button('Press Me!', parent=self)
    self.button.clicked.addListener(self._OnButtonClick)

  def _OnButtonClick(self):
    print id(self.button), self.value

However we can easily rewrite this like so:

class FooControl(Control):
  def __init__(self, value):
    super(FooControl, self).__init__()
    btn = Button('Press Me!', parent=self)
    def onClick():
      print value
    btn.clicked.addListener(onClick)

Now this is a trivial example. But I find that many types, UI types in particular, can have most or all of these callback methods (like self._OnButtonClick) removed by turning them into inner functions. And then as you turn them into inner functions in init, you can get rid of stored state (self.value and self.button).

But as we take this to the extreme, we end up with very simple classes (and in fact I could replace FooControl with a function, it doesn’t need to be a class at all), but very long init methods (imagine doing all your sub-control creation, layout, AND all callback functionality, inside of one method!).
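To make the "it doesn't need to be a class at all" point concrete, here is a minimal sketch of FooControl as a plain function. Signal, Control, and Button are hypothetical stand-ins for whatever UI toolkit you use, not a real library. The on_click parameter and the returned button are assumptions added only so the sketch can be exercised, and the code is written for Python 3.

```python
class Signal(object):
    """Hypothetical stand-in for a toolkit's event/signal type."""
    def __init__(self):
        self._listeners = []

    def addListener(self, callback):
        self._listeners.append(callback)

    def emit(self):
        for callback in self._listeners:
            callback()


class Control(object):
    """Hypothetical stand-in for a toolkit's base control."""
    def __init__(self, parent=None):
        self.parent = parent


class Button(Control):
    """Hypothetical stand-in for a toolkit's button."""
    def __init__(self, text, parent=None):
        super(Button, self).__init__(parent)
        self.text = text
        self.clicked = Signal()


def make_foo_control(value, on_click=print):
    """Build the control in a function; no subclass or stored state needed."""
    control = Control()
    btn = Button('Press Me!', parent=control)

    def onClick():
        # 'value' is captured by the closure rather than stored on self.
        on_click(value)

    btn.clicked.addListener(onClick)
    # Returning btn as well is only so the sketch can simulate a click.
    return control, btn
```

Calling make_foo_control(42) and then emitting the button's clicked signal runs the inner function with the captured value. Nothing had to be stored on an instance for that to work.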

I’ve decided I’d rather have a long init method, usually broken up into several inner functions, than a larger signature on the class with layout, callbacks, and stored state. In my mind, it is easier to pull something out into a type attribute later than to remove one, as anything on the type is liable to be used externally. And breaking up your layout into instance methods that can really only be called once (_InitButtons), from the init, adds a cognitive burden for me.

So I can justify this decision to eliminate extra attributes rationally, but what seals the deal is, I’m not unit testing any of this code anyway. So whether it is in one long method, or broken up into several methods, it isn’t getting tested.

I started out as very much in the ‘break into small methods’ camp but have wholesale moved into the ‘one giant __init__ with inner functions’ camp. I’m curious what you all prefer and why?

Three options for data correctness

In a previous post, I linked to Rico Mariani’s performance advice for Data Access Layers. On G+, Tyler Good asked:

I just read the posts and the linked blogs, I had a question about some specific implementations. How do you deal with classes that represent another non-[in this case]-Python entity that may be updated outside of Python?

I’m not sure if this sort of case is outside of the scope of what’s being talked about in the articles, but if there’s a better way to do getting on things like p4 paths or elements in a Maya file (that may have been changed by the user since instantiating/loading the object) I’d really like some ideas about that.

You basically have three options and fortunately they line up easily on a scale:

Technique         Correct   Difficulty
Transactions      Always    High
Fetch-on-demand   Usually   Medium
Store in memory   Maybe     Low

Let’s get on the same page first. Let’s consider all three types of interactions- database through a DAL, perforce (or any source control) interaction, and interaction with some host application (Maya, or your engine, or whatever). So what are the three approaches and how do they differ?

Store in Memory

You create a code object with a given state, and you interact with that code object. Every set either pushes changes, or you can push all changes at once. So for example, if you have a tool that works with some Maya nodes, you create the Python objects, one for each node, when you start the tool. When you change one of the Python objects, it pushes its changes to the corresponding Maya node.

This is the simplest to reason about and implement. However, the difficulty quickly becomes managing its correctness. You need to lock people out of making changes (like deleting the Maya node a Python object refers to), which is pretty much impossible. Or you need to keep the two in sync, which is incredibly difficult (especially since you have any number of systems running concurrently trying to keep things in sync). Or you just ignore the incorrectness that will appear.

It isn’t that this is always bad, more that it is a maintenance nightmare because of all sorts of race conditions and back doors. Not good for critical tools that are editing any sort of useful persistent data. And in my opinion, the difficulties with correctness are not worth the risk. While the system can be easy to reason about, it is only easy to reason about because it is very incomplete and thus deceptively simple. So what is better?

Fetch on Demand

Here, instead of storing objects in two places (your code’s memory, and where they exist authoritatively, like the Maya scene, or a Perforce database), you store them only where they exist authoritatively and create the objects when that data is queried. So instead of working with a list of python objects as with Store in Memory, you’d always query for the list of Maya nodes (and create the python object you need from it).
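A minimal sketch of the difference, assuming a plain dict stands in for the authoritative store (the Maya scene, Perforce, or a database). SCENE, Node, and list_nodes are hypothetical names made up for illustration.

```python
# A plain dict stands in for the authoritative store (assumption for the sketch).
SCENE = {'pCube1': {'tx': 0.0}, 'pSphere1': {'tx': 5.0}}


class Node(object):
    """Lightweight handle built fresh from the authoritative data."""
    def __init__(self, name, attrs):
        self.name = name
        self.attrs = dict(attrs)


def list_nodes():
    """Fetch on Demand: query the store every time and build new objects."""
    return [Node(name, attrs) for name, attrs in sorted(SCENE.items())]


# Store in Memory: build the objects once and keep them around.
cached_nodes = list_nodes()

# Something outside our code changes the scene (the user deletes a node).
del SCENE['pSphere1']

stale_names = [n.name for n in cached_nodes]   # still lists pSphere1
fresh_names = [n.name for n in list_nodes()]   # reflects the deletion
```

The cached list keeps pointing at a node that no longer exists, while the fresh query only ever sees what is really there.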

This can be simple to reason about as well but can also be quite slow, depending on what you are querying. If you’re hitting a DB each time, it will be slow. If you need to build complex python objects from hundreds of Maya or Max calls, it will be slow. If you need to query Perforce each time, it will be slow.

I should note that this is really just a correctness improvement upon Store in Memory and the workings are really similar. The querying of data is only superior because it is done more frequently (so it is more likely to be correct). The changing of data is only more likely to be correct because it will have had less time to change since querying.

That said, in many cases the changing of data will be correct enough. In a Maya scene, for example, this will always be correct on the main thread because the underlying Maya nodes will not be modified by another thread. In the case of Perforce, it may not matter if the file has changed (let’s say, if someone has checked in a new revision when your change is to sync a file).

Transactions

Transactions should be familiar to anyone who knows about database programming or has read about Software Transactional Memory. I’m going to simplify at the risk of oversimplifying. When you use a transaction, you start a transaction, do some stuff (to a ‘copy’ of the ‘real’ data), and commit the transaction. If the ‘real’ data you are reading or updating has changed, the whole transaction fails, and you can abort the transaction, or keep trying until it succeeds.

That is a massive simplification, but it should be enough for our purposes. This is, under the hood, the guaranteed behavior of SCM systems and all databases I know of. The correctness is guaranteed (as long as the implementation is correct, of course). However, it is difficult to implement. It is even difficult to conceptualize in a lot of cases. There are lots of user-feedback implications: an ‘increment’ button should obviously retry a transaction, but what if it’s a spinner? Are you setting an explicit value, or just incrementing? Regardless, where you need correctness in a concurrent environment, you need transactions. The question is, do you need absolute correctness, or is ‘good enough’ good enough?
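The start/commit/retry cycle for the ‘increment’ case can be sketched with a version counter. This is a toy under stated assumptions, not how any particular database or SCM does it. VersionedStore, ConflictError, and the interfere hook are hypothetical, and a real implementation would need the check-and-write in commit to be atomic.

```python
class ConflictError(Exception):
    """Raised when the real data changed since the transaction began."""


class VersionedStore(object):
    """Hypothetical store holding one value plus a version counter."""
    def __init__(self, value):
        self.value = value
        self.version = 0

    def begin(self):
        # Hand back a copy of the data and the version it was read at.
        return self.value, self.version

    def commit(self, new_value, read_version):
        # Fail if anyone committed since we read.
        if read_version != self.version:
            raise ConflictError('data changed since it was read')
        self.value = new_value
        self.version += 1


def increment_with_retry(store, max_attempts=5, interfere=None):
    """Retry the whole transaction until it commits or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        value, version = store.begin()
        if interfere is not None:
            interfere(attempt)   # hook for simulating another writer
        try:
            store.commit(value + 1, version)
            return attempt
        except ConflictError:
            continue
    raise ConflictError('gave up after %d attempts' % max_attempts)
```

Retrying is safe here because "add one" is relative to whatever the current value is. Setting an explicit value is the harder case: a blind retry would silently overwrite the other writer's change, which is exactly the user-feedback question raised above.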

Recommendations

Avoid Store in Memory. If you design things this way, break the habit. It is a beginner’s mistake that I still make from time to time. Use Fetch on Demand instead. It should be your most common pattern for designing your tools.

Be careful if you think you need Transactions. Ensure they are where they need to be (database, SCM), but don’t just go around designing everything as if it needs to be transactional. If you have two programs that can edit the same file- is one or the other just winning OK? How likely is that to happen? How will you indicate the failed transaction to the user? I’d suggest designing your tools so transactions are not necessary, and just verify things are correct when they cross an important threshold (checkin, export, etc.). Do your cost-benefit analysis. A highly concurrent system will need transactions, tools that only work with local data will likely not.

It should be clear, but it is still worth pointing out that you can mix-and-match these patterns inside of your designs.

Hope that clarifies things, Tyler.

Why I will never develop for a big company again

I had this post written up for a long time, and it was much more ranty. But now I’ll just give you some facts (all of which are public) and let you fill in the blanks:

Your bonus is based on “Target Bonus %” x (“Your Performance” + “Company Performance” + “Studio Performance”) x “$ Your Salary”

So if company performance is crap, your bonus can be hurt substantially. I won’t say what a normal target is but it isn’t that high- the CEO’s is only about 100% so you can imagine a normal dev’s is only a small, small fraction of that.

Company sales up from previous year, but a net loss because it pays $682 million+ to buy the studio you just started working for, so your bonus gets hosed. CEO had lots of money in the company that was bought out so he makes out like a bandit.

I don’t know how that shit is legal.

World financial crisis is in full swing and company clearly will not be profitable. So he cancels merit increases but keeps bonuses.

Well no duh- if your max bonus is 10% of your salary, a modest 2% merit increase is 1/5 the size. If your max bonus is 100% of your salary, it is 1/50 of the size. Manipulating to line his pockets.

After a failed acquisition that cost the company $22 million, the expenses were ignored for performance considerations of the executive team, who set the rules for bonuses and have a clause that they can ignore “non-recurring” expenses such as acquisitions (as if acquisitions were “non-recurring” there!).

I will tell you right now. If our CEO did some shit like this he’d be tarred and feathered. It is a great feeling to work in the same building as your CEO. To know that the only reason he doesn’t know your name is because you haven’t figured out a reason to talk to him at the bar. To know his sweat and blood helped build your company and he didn’t hop over as an executive from some multinational baked goods giant.

Taking pride in your work is a great thing, being able to take pride in the place you work feels even greater (at least for me, since I feel like it adds value to my work). I can’t say I’d categorically not work for another large company, but, not one with a CEO like that.

There’s idiomatic, and there’s just being respectful

I work in mixed language environments. Python, C#, C++, and more, can all make their rounds. It isn’t uncommon for someone focused on C++ to have to write something in another language, and it isn’t uncommon that I come across their code at some point in the future.

It is easy to learn a language’s syntax but difficult to learn its idioms. Good luck trying to explain what ‘pythonic’ means to someone who is new to python or programming! So I forgive the transgressor when I see non-idiomatic code.

Usually.

There are some errors I find unforgivable. Errors that indicate a complete lack of understanding of the platform you are writing on. Errors like this (C#):

var foo = new Foo();
if (foo != null) {...}

Creating an instance is probably the most basic operation you can perform in an OO language, and the author clearly did not understand it.

Another unforgivable type of error is when someone tries to fix a bug but does not bother to understand what’s actually going on.

class Foo {
    private bool _somevar;

    ...
}

There was some bug in the code somewhere, I can’t remember what. A developer changed ‘private bool _somevar’ to ‘private bool _somevar = false’ and declared the bug fixed (spoiler: it wasn’t; a bool field in C# already defaults to false, so the change did nothing).

Probably the best example comes from memory management, as the least understood things in programming tend to:

try { someUIControl.SetText(someGiantString); }
catch (OutOfMemoryException) {
    someUIControl.Clear();
    GC.Collect();
    someUIControl.SetText(someGiantString);
}

The only thing this did was change the stack trace. The problem was due to a .NET garbage collection implementation detail- the Large Object Heap and huge strings- and the ‘fixer’ just tried something every authority tells you not to do, which is catch an OOME.

If you’re going to leave your domain to write code in another language- I applaud you. It can show an endeavouring personality! But please have some respect for the language you are writing in- read a book, read a blog, ask for help. It’ll make you a better programmer, I promise.

Thank you, Rico Mariani, for reminding me how bad I was

A little while ago I read two great articles by Rico Mariani, an MS employee who usually blogs about performance in .NET (though, Python being an OO language, the same advice applies there). The articles in question were these:

Performance Guidelines for Properties

Performance and Design Guidelines for Data Access Layers

I’d suggest at least skimming over them. He talks about, for property accessors, not allocating memory, locking, doing IO, having side effects, and being fast. For the DAL article, you should really read it, but the part that was especially relevant is “Whatever you do don’t create an API where each field read/write is remoted to get the value.”
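To illustrate the DAL quote, here is a small sketch contrasting a "chatty" object, where every property read is remoted, with a single explicit fetch. FakeRemote, ChattyAsset, and load_asset are hypothetical names made up for this example. The counter only stands in for real network or database round trips.

```python
class FakeRemote(object):
    """Hypothetical remote data source that counts round trips."""
    def __init__(self, rows):
        self._rows = rows
        self.round_trips = 0

    def fetch_field(self, key, field):
        self.round_trips += 1
        return self._rows[key][field]

    def fetch_row(self, key):
        self.round_trips += 1
        return dict(self._rows[key])


class ChattyAsset(object):
    """What not to do: every property read is a remote call."""
    def __init__(self, remote, key):
        self._remote = remote
        self._key = key

    @property
    def name(self):
        return self._remote.fetch_field(self._key, 'name')

    @property
    def owner(self):
        return self._remote.fetch_field(self._key, 'owner')


def load_asset(remote, key):
    """One explicit call that returns plain local data."""
    return remote.fetch_row(key)
```

Reading name, owner, and name again through ChattyAsset costs three round trips, and nothing at the call site hints that it is doing IO. load_asset costs one, and the cost is visible in the call.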

It was a shocking reminder of my early days programming. Every point mentioned in those two articles, I was hands down guilty of. I don’t mean, I’ve done that sort of thing occasionally. I mean, I designed entire systems around everything you shouldn’t do with regards to properties and DAL design. To be fair, this was years ago, I was new to programming, in way over my head, and didn’t have people to turn to (no one at the studio could have told me what an ORM was or given me these suggestions about properties), so I don’t feel much guilt. And I learned better relatively quickly, well before reading those posts.

I work with a lot of new programmers, and experienced programmers who aren’t focused on higher level languages. The articles, most of all, reminded me how far I’ve come and how lucky I am. The new programmers haven’t had a chance to make the epic mistakes I have. The experienced programmers trained in a world without such useful managed languages, high quality bloggers, and sites like Stack Overflow. I’ve never known that world, and I’ve benefited: I learned best practices and new skills, and found and broke bad habits.

I remember at the time thinking how great some of those systems’ features were, the same features that, as Rico points out, are really terrible ideas. I felt fortunate that I already followed his guidelines, and even more fortunate that few people were around to witness the hideous abuses of them!

Be a deployment Boy Scout

The Boy Scouts have a rule:

Leave your campsite cleaner than you found it.

We know how to apply this rule when writing code but we often overlook it when it comes to installing or deploying that software.  I’ve seen, and committed, some pretty heinous instances of changing a user’s machine, and in every single case- every single case- I’ve discovered in retrospect it was a poor decision.  Note I am only talking about internally deployed software where you have control over the environment (i.e., I’m not discussing game installers and the like!).

At this point, I live by one golden rule:

Never leave persistent state on a user’s machine.  If you must, all state should be stored in a single folder.

Two caveats:

  • “Never”: Some third-party software will not adhere to this, and there are some situations where it cannot be avoided due to third party dependencies, so you may have to adapt.  I apply this rule only to what I have control over.
  • “persistent state”: Anything that sticks around after a process exits or a user logs off, that isn’t under version control.  Examples of persistent state are files, registry entries, and environment variables.  Usages include installation, file association, and settings persistence.

Some examples of things my tools or tools I’ve seen have put in or required:

  • Editing 3rd-party application preferences files or adding files to the application’s preferences folder.
  • Copying over scripts or other files out of version control onto the user’s machine.
  • Installing shell extensions.
  • Setting a user’s source control environment variables (P4PASSWD, P4CLIENT, etc).
  • Mapping a temporary drive (that scripts rely on for an absolute path, of course!).
  • Leaving persistent registry or environment variables for the user’s branch, project, etc.
  • Storing preferences for applications in multiple places.

I consider all of these mortal sins, and red flags and warning flares go up when I see them.

Why you shouldn’t do it!

Games development is chaotic.  Computers go through a lot of change, they install a lot of software (first and third party) and uninstall almost as much.  To make matters worse, things often go wrong, and many people are generally writing software and scripts that need to run independently and not interfere with one another.  You can avoid conflicts by not making any persistent changes to a user’s machine.  As long as everything is local to the process, or in some unique files in a well-defined place (AppData/Local/<company or group>/<app or tool name> on Windows), the risk of conflict is almost none.  By leaving the computer in an unmolested state, apps that do cause persistent changes become noticeable and problems more fixable (and it is easier to clean up after offenders if you have 5 suspicious environment variables rather than 50).

Change also happens in unpredictable ways.  While hard-coding a virtual disk drive seems fine, what happens when you need to run your tools on a machine (an outsourcer’s, for example) that already has a drive with that name?  Setting a persistent environment variable indicating the target branch seems fine, but what happens when 4 different tools each store their own (it will happen if you let it!)?
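As a rough sketch of both ideas (one well-defined folder for state, and process-local rather than persistent environment variables), something like the following would do.  The function names, the TOOLS_BRANCH variable, and the non-Windows fallback path are all assumptions made up for illustration.

```python
import os


def tool_state_dir(company, tool, environ=None):
    """Return the one folder a tool may write persistent state to."""
    environ = os.environ if environ is None else environ
    # LOCALAPPDATA is set on Windows; the fallback is an assumption for other OSes.
    base = environ.get('LOCALAPPDATA') or os.path.join(
        os.path.expanduser('~'), '.local', 'share')
    return os.path.join(base, company, tool)


def child_environment(branch, environ=None):
    """Build a per-process environment instead of setting a persistent variable."""
    env = dict(os.environ if environ is None else environ)
    env['TOOLS_BRANCH'] = branch   # hypothetical variable name
    return env
```

The dict from child_environment can be passed as the env argument when launching a tool with subprocess, so the branch setting disappears when that process exits and nothing is left behind on the machine.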

I’m not going to get into installers.  Don’t do it.  I’ve never seen a reason to do it for internal software.  If your studio does it, I wonder how many people actually understand it or can maintain it.  There’s less and less reason to do anything of the sort nowadays- all your python and .NET applications have no need of a traditional installation.  I’d love to be educated about why some studios use installers for their internal tools, so if you have a success (or horror) story I’d love to hear about it in the comments.

Persistence is a drug- Just say No!

I realize now that persistent settings were a deployment drug.  They didn’t make anything easier.  They were an appealing way to either do things I shouldn’t have been doing, or support workflows I shouldn’t have designed.  And global persistent state like this has the additional unfortunate effect of negatively impacting everything else in the system- because everything, and everyone, views them as the same easy solution, or key to complete power and ease over deployment and bootstrapping.

There are options.  I’ll tell you about them in future posts because I don’t have much time now.  In the meantime, join me in taking the Deployment Boy Scout’s Oath:

On my honor, I will do my best, to do my duty to developers and their computers.  To avoid the use of persistent global state, to seek out better solutions to deployment problems, to keep users’ machines clean and under their control, and to keep my code free of such corrupting influences, always.

Refraktor

Refraktor:  (verb) When you refactor some code and in the process change or mess it up so completely that you need to revert all your changes.

Run/debug your way to brittle software!

While working on pynocle some time ago, I found myself getting away from TDD and going back to the more traditional “run-debug-fix” pattern.  Write code you think is correct, run it to see if it is, if it isn’t, stick a breakpoint and see what’s wrong, change code, repeat until there are no problems.

While this can often be the quickest way to get something working, it ultimately and always comes back to bite.  I’m happy that I’ve gotten to a point with TDD where I notice this behavior and it makes me feel dirty.  Though not always dirty enough to stop it, especially if I’m in a difficult-to-test environment depending on modules I can’t run from pure python.

The problems with run-debug-fix are many.

  1. The code you are writing is difficult enough that you didn’t write it correctly the first time.  So what makes you think you or someone else is going to have an easy time debugging or understanding it in the future?
  2. If the bug was logical, there was obviously some context, state, or situation you had not thought of.  How are you sure you will remember this context or situation when you change the code in the future?  How can you communicate that your code is relying on a certain state somewhere else?
  3. If your design is not testable, you are making it even less testable by adding more implicit logic where you’re fixing the bug.  Implicit logic that is going to be very difficult to test for when you come back later and forget about it.
  4. Most importantly: Every bug you fix or feature you add using run-debug-fix is a doubly negative activity.  -1 for the reasons above and -1 for the missed opportunity to add a test.  It would be better to leave the bug there or delete the offending code entirely.  You are increasing the complexity of your software by supporting another code path that did not previously work or exist, instead of increasing the stability of your software by adding tests.

The amount of time you spend under the debugger is inversely proportional to the quality of your software.

I used to pride myself on being able to quickly debug and fix problems in my or other people’s code.  I now take far more pride in having code that is well tested so that other people can fix problems without spending a long time debugging them.

Validation routines as an intro to unit testing

For the past several weeks I’ve been introducing TDD and a focus on unit testing at work to the TA group.  Well, I introduced it months ago but am only now convincing (forcing) people to do it.  This can be an imposing subject for people who have spent their entire careers scripting inside of Maya.  I think I’ve finally figured out a way to ease people into TDD and unit testing that is both easy to do and demonstrates immediate benefit (and in fact I’ve done it successfully with two people already).

Writing data/content validation.

Writing validation routines for data follows the TDD paradigms to a T.  You just need to explain to people how to write the tests first, and then run those tests from their IDE.  So instead of writing some complex function filled with if checks that tests for different things and needs to be commented to explain all the things it tests for and is constantly breaking, you just write your battery of tests for valid and invalid content.  Then you run them, and keep improving your validation method(s) until the tests pass.  Then, in true TDD style, only go and refactor the validation code itself if you need to- with confidence you aren’t breaking things.  And then when you think of more things to validate, you add more tests, with no risk of regression.
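Here is a minimal sketch of what that looks like with Python's built-in unittest module.  The texture-naming rules and the validate_texture_name function are made up for illustration; your content rules will be different.

```python
import unittest


def validate_texture_name(name):
    """Return a list of problems with a texture file name (empty if valid)."""
    problems = []
    if not name.endswith('.tga'):
        problems.append('must be a .tga file')
    if name != name.lower():
        problems.append('must be lowercase')
    if ' ' in name:
        problems.append('must not contain spaces')
    return problems


class ValidateTextureNameTests(unittest.TestCase):
    def test_valid_name_has_no_problems(self):
        self.assertEqual(validate_texture_name('rock_d.tga'), [])

    def test_wrong_extension_is_reported(self):
        self.assertIn('must be a .tga file', validate_texture_name('rock_d.png'))

    def test_uppercase_is_reported(self):
        self.assertIn('must be lowercase', validate_texture_name('Rock_D.tga'))

    def test_spaces_are_reported(self):
        self.assertIn('must not contain spaces', validate_texture_name('rock d.tga'))
```

In practice the test class gets written first, against a validate_texture_name that returns nothing useful yet, and the function grows until every test passes.  Each new rule starts life as one more test method.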

TDD is tailor made for data/content validation.

Walk them through the first few tests yourself to get them set up, then have them write the rest of the tests for a validation routine.  It will also be very clear if they ‘get it’ or not.

Once they get it, there are some other areas that TDD can be applied to with almost as much ease that I’ll go over next time.

Do you remember what being happy feels like?

As I’m coming up on my 5-month mark at my new job, I was thinking about how happy I am here at CCP.  I’ve been waking up at 7:30 for weeks, braving the frigid Icelandic mornings, because I have been so excited to get into work.

My happiness caused me to reflect upon the utter boredom and frustration I felt at my previous job my last six months or so there.  But then I started remembering further back and wondering, when was the last time I felt like this?  Developing something I felt passionate about, something that people saw value in without me having to spend months convincing them, something I didn’t have to beg to get resources to work on.  In a place where people are overwhelmingly positive and non-hostile, with just enough aggression to make sure we can have lively discussions.

I know CCP has its problems and I’ve come in at, in many ways, a good time.  I know there are people unhappy where I am happy.  I know there are people who can work in the same type of environment I was unhappy in and love work every day.  Different strokes for different folks.

What I’m saying is, I spent a long time working, thinking I was happy, but it was because I forgot what happy was.  Or maybe I wasn’t thinking I was happy, maybe I just forgot to consider it entirely.  I forgot what a healthy work environment was, because I became so invested in the people, and stuff I had made, I lost perspective.  (Perhaps this was why I became so unhappy when I switched teams- I lost my already strained sense of ownership and commitment).

No rant or advice or recommendations or even new or good ideas.  Just wanted to share.

-Rob
