How do you estimate that which you’ve never done?

by Rob Galanakis on 29/01/2015

Have you heard about #noestimates? No? Well I’m sure you can guess what it is anyway. But reading the debates reminded me of a story.

While at Game Developers Conference a few years ago, I was arguing about estimation with a certain project manager, who, despite having no actual development experience, was in charge of development (Icelandic society is notoriously nepotistic).

“So, maybe no estimation works for your small projects, but when you have to do big projects, and you need to ask for budget, and coordinate many departments and offices, and you need to plan all this in advance, what do you do? How would you plan Incarna?”

Incarna was CCP’s expansion that introduced avatar/character-based “gameplay” into EVE Online. What shipped was the ability for your avatar to walk (not run!) around a room. It was massively over budget, behind schedule, and under-delivered. A few months later, 20% of the company was laid off. There’s been no active development on Incarna since 2011, and World of Darkness- which continued to use Incarna’s core technology- was cancelled and the team laid off earlier this year. It was, quite simply, the biggest disaster I’ve seen or heard of in my career.*

A character-based game is also something CCP had never done before. Such games are massively- MASSIVELY- more technologically complex than the “marbles in viscous fluid” EVE flight simulator. CCP did not have the in-house experience, especially in Iceland, where most of the (very smart) engineering team had never worked on character-based games.

So it was pretty hi-larious that a project manager was using Incarna as an example of why estimation is necessary. But cognitive dissonance is nothing new. Anyway, my response was:

“You don’t plan Incarna. You greenlight a month of development. At the end of a month, you see where things are. Do you keep going for another month? If you are happy with the spend and progress, keep going. If not, pull the plug. Once you can make a prediction at the start of a month, and it holds true for that month, and you do this two times in a row, maybe make a prediction for two months and see how it plays out.
You may pass a year this way. Well, a year isn’t a long time for developing a character-based MMO and game engine from scratch. But at the end of the year, you at least have some experience. But you keep going. If your velocity is consistently predictable, you estimate further out. Eventually, if you can get your velocity stable at the same time you’re growing and developing, you have a fighting chance.**
When your velocity isn’t stable, you rein things in and figure it out. If you go through a year of missed month-long predictions, you need to change things drastically (or reboot entirely) if you hope to get something predictable.”
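
The rule of thumb in that answer can be sketched in a few lines of Python. This is only one reading of it: the function name, the doubling step, and the drop back to one month after a miss are assumptions made for illustration, not a formal method.

```python
def next_horizon(horizon_months, recent_hits):
    """Pick the next prediction horizon from recent prediction results.

    recent_hits lists whether each past prediction at the current
    horizon held true, oldest first.
    """
    if len(recent_hits) >= 2 and all(recent_hits[-2:]):
        return horizon_months * 2   # two hits in a row: predict further out
    if recent_hits and not recent_hits[-1]:
        return 1                    # a miss: rein things in to one month
    return horizon_months           # not enough evidence yet, stay put


print(next_horizon(1, [True, True]))   # 2
print(next_horizon(2, [True, False]))  # 1
print(next_horizon(1, [True]))         # 1
```

The point is less the exact numbers than the shape: the horizon only grows after predictions have actually held.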

Nothing really insightful there of course- I’m just parroting what has worked for me and many, many others, from Lean-inspired methodologies (and this one in particular says traditional yearly budget cycles are responsible for many terrible business decisions).

A couple months ago I was asked if a significant new feature could get done by June. It would build on several months of foundation and other features. I responded that I was pretty confident that if we aim for June we would have it by September. My rationale, simply, was that previous similar projects shipped 3 or more months late, and I didn’t have enough experience with the team to give a more accurate estimate.

The best predictor of future behavior is past behavior. You need to create historical data before you can extrapolate and plan.
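
As a rough illustration of extrapolating from history, here is the June-to-September reasoning as arithmetic. The list of past slips and the year are made up for the example; only the "3 or more months late" pattern comes from the story above.

```python
from datetime import date


def add_months(d, months):
    """Shift a first-of-month date by a whole number of months."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


# Hypothetical history: how many months late similar projects shipped.
past_slips = [3, 4, 3]

target = date(2015, 6, 1)  # "aim for June" (year assumed)
earliest_likely = add_months(target, min(past_slips))

print(earliest_likely.month, earliest_likely.year)  # 9 2015
```

With no better data about the team, the smallest observed slip is about the most optimistic estimate you can defend.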

The historical data also needs to be “meaningful.” That is a much more nuanced topic, though.

* It should go without saying that disasters the scale of Incarna are 100% at the hands of management.

** On Star Wars: The Old Republic, management took an interesting strategy of driving velocity into the ground so that while it was terrible individually, it was at least stable. They could then increase the number of people resources and predict, pretty reliably, when it could ship. The game ended up costing about $200 million (I suspect much more, actually), but it wouldn’t have shipped otherwise.


Change should be the ally of quality

by Rob Galanakis on 26/01/2015

In The Beauty of Testing, Steven Sinofsky writes:

…great testers understand one the cardinal rules of software engineering—- change is the enemy of quality.

This is not a cardinal rule. This is an outdated and obsolete mode of thinking. Change is how you discover great UX. Change is how you refactor and reduce technical debt. Change is how you incrementally improve both your product and code quality.

Maybe that’s too obvious, and clearly Sinofsky isn’t arguing for static software. More nuanced (and the rest of the piece provides that nuance) would be “change inevitably introduces bugs, and bugs reduce quality.”

This too I take issue with. Your codebase should be verifiably better after you fix a bug: you’ve found a shortcoming in your automated tests, so you add a test, and maybe refactor some stuff as well. Or, you’ve identified a bad experience, and can change it to be better in a controlled manner. A bug is an opportunity for improvement. Without bugs, it can be very difficult to improve.*

It can be difficult for anyone who hasn’t worked in a codebase with extensive testing to understand this. In most cases, fixing bugs is playing whack-a-mole. Whack-a-mole is unacceptable to me. Every change we make at Cozy is making the code clearer, simpler, better tested. It’s making the product smoother, faster, and more intuitive.
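
For anyone who has not worked this way, a minimal sketch of "fix the bug, keep the test" might look like the following. The `average` function and the bug itself are hypothetical, and the tests are plain functions of the kind a runner such as pytest would collect.

```python
def average(values):
    """Return the mean of values, or 0.0 when there are none."""
    values = list(values)
    if not values:      # the fix: an empty input used to raise ZeroDivisionError
        return 0.0
    return sum(values) / len(values)


def test_average_of_typical_values():
    assert average([2, 4]) == 3.0


def test_average_of_empty_input():
    # Added alongside the fix, so the bug cannot quietly come back.
    assert average([]) == 0.0
```

After the fix, the codebase has one more documented behavior and one more guard against regression than it had before the bug was found.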

Change is necessary; it is up to you to determine if it is a friend or foe.

* If you’re practicing disciplined development and automated testing and not creating many bugs, good job! This post isn’t for you :)


Technical debt metaphors get it so wrong

by Rob Galanakis on 21/01/2015

In my previous post about technical debt, I explained how modern definitions of technical debt are harmful. Now I turn my attention to equally harmful metaphors.

Viktoras Makauskas made the following metaphor in a comment on my last post. This is a pretty perfect stand-in for metaphors I’ve read in other articles that harmfully define technical debt.

Imagine your car gets a strange rattle. You go to your mechanic and he says, “it’s your exhaust pipe holder, you need to replace it, but it’s gonna take a while to order a part and ship it, so just park your car here and come back in a week”. You say “no, I have this weekend trip planned, is there something we can do now?”. They say “yeah, we’ll put a strap on it meanwhile, just drive a little more careful and it should hold, but make sure to come back and do a proper fix”. Mechanic charges you now, and then a bit later.

This seems sensible on first read. But upon closer inspection, it’s quite clear the roles here are totally wrong*:

  • The mechanic is the programmer (the role of the “expert”). Well, a mechanic may or may not see your car ever again. They do not have a vested interest in your choice. A mechanic’s relationship to a car is totally different from a programmer’s relationship to code.
  • “You” are the “business” (the role of the “stakeholder”). The metaphor assumes that if you are irresponsible, it only impacts you (it’s your car, your money, your time). This is a problem. A programmer is impacted by business decisions in a way a mechanic is not impacted by whether you fix your car now or later.

This isn’t a simple language problem. It is a fundamental misunderstanding of roles that is naive to the way software development works. Programmers will be the primary sufferers of technical debt. Eventually the business will suffer with a slower pace of innovation and development and higher turnover. But well before that, programmers will be fixing (and refixing) obscure bugs, will bristle under management that tells them to go faster, will be working extra hours to try to improve things, and will eventually burn out. The business will only suffer once real damage has been done to a programming team, and many have given up.

This is why control of technical debt must be in the hands of programmers. Definitions or metaphors that urge otherwise are actively harmful.

Let me close by pointing out I’m just repeating what Ward Cunningham has already written about the original technical debt metaphor. The article ends with:

A lot of bloggers at least have explained the debt metaphor and confused it, I think, with the idea that you could write code poorly with the intention of doing a good job later and thinking that that was the primary source of debt.
I’m never in favor of writing code poorly, but I am in favor of writing code to reflect your current understanding of a problem even if that understanding is partial.

Thanks Ward.

* There are also a couple other problems with this metaphor. First, what if “you” and the mechanic are the same person, and responsible for both business and implementation? In that case, there’s no need for a metaphor at all. Second, what happens if the exhaust fails? Do you become stranded? Does the car catch fire? What’s presented here is a false choice between a “correct” solution (replacement) or a “sloppy” solution (strapping it on). Why not rent a car? If there’s no responsible-but-relatively-cheap decision (there almost always is!), it’s still never acceptable to make an irresponsible decision.


Building Sphinx documentation for unfriendly code

by Rob Galanakis on 11/01/2015

Some Twitter friends were discussing how to get Sphinx to work with mayapy to build documentation for code that runs in Autodesk Maya. I’ve had to do this sort of thing extensively, for both Maya and editor/game code, and have even run an in-house Read The Docs server to host everything. I’ve learned a number of important lessons, but most relevant here is:

Always generate your documentation using vanilla Python. Never a custom interpreter.

There’s no philosophical reason for this*. I’ve just found it, by far, the path of least resistance. All you have to do is some mocking in

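import sys
import mock

# Mock the parent packages too, or "import pymel.core" fails looking up "pymel".
for mod in ['maya', 'maya.cmds', 'pymel', 'pymel.core']: # and whatever else you need
    sys.modules[mod] = mock.MagicMock()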

(I do not have the code in front of me so this may be slightly wrong. Perhaps an ex-colleague from CCP can check what used to be in our

Now when Sphinx tries to import your module that has import pymel.core as pmc, it will work fine. That is, assuming your modules do not have some nasty side effects or logic on import requiring correctly functioning modules, which you should definitely avoid and is always unnecessary.

When your documentation generation breaks, it’s now a simple matter of adding a string in one place, rather than a several hour debugging session.
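
To see the mocking work outside of Sphinx, here is a small self-contained sketch. It uses the standard library's `unittest.mock` instead of the `mock` package, and the `mytool` module, `install_mocks` helper, and `selected_names` function are hypothetical stand-ins for your own code.

```python
import sys
import types
from unittest import mock

# Parent packages are listed too, since "import pymel.core" also looks up "pymel".
MOCK_MODULES = ["maya", "maya.cmds", "pymel", "pymel.core"]


def install_mocks(names):
    """Register a MagicMock under each module name so imports succeed."""
    for name in names:
        sys.modules[name] = mock.MagicMock()


install_mocks(MOCK_MODULES)

# Hypothetical stand-in for one of your modules that autodoc would import.
SOURCE = '''
import pymel.core as pmc

def selected_names():
    """Return the names of the selected nodes."""
    return [node.name() for node in pmc.selected()]
'''

mytool = types.ModuleType("mytool")
exec(SOURCE, mytool.__dict__)  # the import inside SOURCE resolves to the mocks

print(mytool.selected_names.__doc__)
```

The module imports cleanly under vanilla Python and its docstrings are available, which is all documentation generation needs.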

Don’t say I didn’t warn you!

* If anything, I’m philosophically more inclined to use mayapy. So that should tell you what sort of bogeymen await!


Undefining “technical debt”

by Rob Galanakis on 6/01/2015

For me, technical debt is defined pretty loosely as stuff you don’t like in the code and need to change to keep up velocity. However, I’ve seen lots of articles lately discussing a precise definition of “technical debt.” I would sum them up as:

  • Technical debt is incurred intentionally. Sloppy code or bad architecture is not debt.
  • It is a business decision to incur technical debt.
  • It is a business decision to pay down technical debt.

I hate this characterization of technical debt. I hate it because it’s damaging. It assumes a conversation like this happens:

Manager: “How long to do this feature?”
Programmer: “We can do that feature in 4 weeks properly, or 2 weeks if we take shortcuts that will hurt our velocity in the future.”
Manager: “OK, take a shortcut and get it done ASAP.”
… 2 weeks later …
Manager: “How long to do this feature?”
Programmer: “We must spend 2 weeks paying down our technical debt, then another 2 weeks to do the feature.”
Manager: “That sounds fine.”

Every muscle in my body twinges when I think about this. Quality is not something you can put off to later. The idea that a team would do a sloppy job but have the rigor to repay it later is unbelievable. The closest I’ve seen is rewriting a system after years of shortcuts, which often does not end well. This mentality goes along with “how many bugs you have should be a business decision”. This isn’t OK. Do not write something you do not plan on living with. Do not place the responsibility of doing a good job on the business. I find it sad that a programmer would think such behavior acceptable. This is your life. This is your code. Take some responsibility. Take pride in your work.

Or don’t, and sling garbage while getting paid a pretty penny. Just don’t pretend you’re respecting your craft.

(I just want to take a moment to give credit to the team at Cozy. We recently had a couple weeks of crunch. The team delivered fully tested code the entire time).


Holiday (product) shipping

by Rob Galanakis on 2/01/2015

This was an interesting holiday season, work-wise, for three reasons.

First: My work was closed down from Dec 20th to Jan 4th (except for Customer Support and whichever developer was on firefighting duty, though that is all remote). We shipped two large products on December 17th, which was a bit too close for comfort, but things went OK and it gave us a few days to fix issues.

Second: I was working a couple hours a day while my son napped. I have quite a backlog of pull requests waiting to get in.

Third: On December 31st at about 5pm, we realized our emails hadn’t been going out. Our email service decided to ship 43,000 lines of code the day before, which resulted in a partial outage for some customers (they sent us success responses but things then broke internally).

What lessons did I learn?

First, if you’re going to ship two days before vacation, make sure your work is solid. We had one deployment on Sunday the 21st for some bugs we didn’t want to live with for 2 weeks, but other than that no new work has gone out. We shipped some solid code, thankfully.

Second, if you’re going to work over a holiday, don’t generate work for others. I really want to get the work I’ve been doing out to production, which would require 1) a code review and 2) a deploy of new code. Even if I skipped code review and deployed myself, if shit hit the fan or I introduced some new bug, I’m making work for others. It took a lot of discipline but I’m proud to say that I have fifteen open pull requests and not a single one is reviewed yet. It’ll be a busy Monday and Tuesday but that’s better than messing with people’s vacations.

Third, two weeks is a really long time to shut down. In some ways, shutting down is great, as I’ve written about before. But it sucks not having a good way to get fixes and improvements out to customers. There are a lot of considerations here. I’m not sure what we’ll do next year. It’ll largely be up to the team.

Fourth, you should never, ever ship something directly before a holiday or before you go on vacation. It’s immature and unacceptable. You not only screw over your team when something goes wrong, you screw over everyone depending on your product. They need to jump into action and figure out what’s going on, how to mitigate things, respond to customer complaints, etc. I cannot believe I need to tell anyone this. Don’t ship directly before a holiday.

Anyway, just some thoughts. Happy New Year!


More effective interviews

by Rob Galanakis on 30/12/2014

David Smith makes some interesting points about the length of most interviews:

So mathematically, you will most likely get the highest confidence interval with: 1) Resume screen, 2) Phone interview, 3) In-person interviews 1-3. From the above, this should represent about 50% of the total causes, but should produce 91% of the total effect. Adding additional interview steps after that 91% brings only incremental improvement at best and backslide at worst.

He makes an extremely compelling argument, and I encourage you to read the entire piece. That said, I still prefer a full day of interviews as both the interviewer and interviewee.

The interviewee angle is easy. I enjoy interviews. I like to dig into my potential employer. I want to grill your second-string players. I want to hear how junior people feel treated. I want as much information as possible before making my choice. But I know this is just me, and people who are less comfortable with interviews probably prefer shorter ones. I also admit I don’t think I’ve learned anything in the second half of a day of interviewing that would have made me turn down a job. But I have learned things that helped me in my job once hired.

The benefits of full-day interviews for the interviewers are much more complex. There are several factors:

  • We have diverse backgrounds and expertise, and each group brings a unique perspective. Candidate postmortems are not dominated by the same couple interviewers.
  • I want to give as many people experience interviewing as possible. I consider it an important skill. Limiting things to three in-person interviews means the interviewers are all “musts” and I don’t get to experiment at the periphery with groups or combinations.
  • People want to be a part of the process. I’ve personally felt frustrated when left out of the process, and I know I’ve frustrated others when I’ve left them out.

For a developer role, I want them to meet with at least: founders, ops, lead developer, two developers, myself. We’re at an absolute minimum of 7. That is with a narrow set of views, without inexperienced interviewers, and leaving good people out. What am I supposed to do?

  • For starters, the interview process should be more transparent and collaborative. Ask the candidate if they want a full day, two half days, morning or afternoon, etc.
  • No group lunches. I’ve never gotten useful feedback from a group lunch. Keep it down to one or two people. A candidate just doesn’t want to embarrass themselves, so they just shut up, and side conversations dominate.
  • Avoid solo interviews. I used to hope to solo interview everyone. But over time, I’ve found that pairing on interviews enhances the benefits listed above. There are still times I will want a solo interview, but in general I will pair.
  • Cut the crap. Interviewers should state their name and role. Don’t bother with your history unless asked. Don’t ask questions that are answered by a resume. Instead of “tell us about yourself” how about “tell us what you’re looking for”.
  • Keep a schedule. Some people are very bad at managing time. If someone isn’t done, too bad, keep things moving. They will eventually learn how to keep interviews to their allotted time.

Thanks to David for the insightful post. I’ll continue to keep full-day interviews, but we’ll definitely change some things up.


IKEA instructions are the best

by Rob Galanakis on 27/12/2014

I say that without a hint of sarcasm. I’ve put together a lot of furniture lately, and IKEA instructions are the only instructions that are consistently correct and unambiguous. In dozens of units, I’ve confirmed one case of an ambiguous step. But even in that case, I was able to read ahead and eliminate the ambiguity.

Compare this to almost every other piece of furniture I’ve put together. The drawings are often ambiguous, and even worse, the furniture can be constructed in multiple ways. This is rare with IKEA furniture. You may get to the end and find out you messed up, but at least you find out, because things won’t really fit together. With my son’s crib, though, I had moulding pointing the wrong way with no structural effects. Unacceptable.

Assembling furniture from basic components is necessarily complicated. IKEA does a great job embracing this complexity by supplying extremely concise-yet-precise instructions and products where the construction process is considered in the design. My guess is that most people who have problems with IKEA construction jump in without understanding what they are doing. Fiberboard planks and screws are deceptively simple.

I think of this lesson often with the design of complex systems. Anything that deals with ACH (credits and debits in the US) is necessarily complex. You can only abstract to a certain level. Did you know that an ACH payment can transition from Succeeded to Failed? Attempting to “hide” the complexity of ACH, like we successfully hide the complexity of a file system, is a fool’s errand. Instead of making a payments API that’s simple to use, it would be much better to make one that’s precisely defined, thoroughly tested, and well documented. There are still some problems that require a little bit of RTFM. It’s better to make this complexity front and center in a design, like IKEA furniture, than to gloss over it and end up with client code that is built like second-rate DIY furniture.
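
One way to put that complexity front and center is to spell out every legal state transition rather than hiding them. The sketch below is illustrative only: the state names other than Succeeded and Failed, and the `transition` helper, are assumptions and not taken from any real payments API.

```python
from enum import Enum


class PaymentState(Enum):
    PENDING = "pending"        # hypothetical starting state
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Every legal transition is written down, including the surprising one.
ALLOWED = {
    PaymentState.PENDING: {PaymentState.SUCCEEDED, PaymentState.FAILED},
    PaymentState.SUCCEEDED: {PaymentState.FAILED},  # the case mentioned above
    PaymentState.FAILED: set(),
}


def transition(current, new):
    """Return the new state, or raise if the move is not defined."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.name} -> {new.name}")
    return new


state = transition(PaymentState.PENDING, PaymentState.SUCCEEDED)
state = transition(state, PaymentState.FAILED)
print(state.name)  # FAILED
```

Client code written against a table like this cannot treat Succeeded as final, because the definition itself says otherwise.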


Free Practical Maya Programming with Python eBooks

by Rob Galanakis on 23/12/2014

Merry Christmas and happy holidays everyone,

Last week I asked my publisher if I could make the Practical Maya Programming with Python eBook totally free. I was told some good news and bad news.

The bad news is, they won’t make it free. The good news is, my editor said that Packt often runs free eBook campaigns, and would make the book part of the free campaign whenever they come up. I will blog here when they do (and also please tweet me @techartistsorg if I miss it).

If you can acquire a pirated copy of my book, I encourage you to do so. Packt does not use DRM as far as I know, so just ask a friend who has the book.

Sorry I can’t make it totally free right now, as much as I want to. It sucks to not have full control over something you have personally invested so much in, but I don’t have the energy to fight my publisher on this one (and the fact that they’re DRM-free makes this much less of an issue).

Enjoy, and please leave me a review on Amazon.


We’re not so different, you and I

by Rob Galanakis on 21/12/2014

Ben Sandofsky wrote a post about why QA departments are still necessary, specifically with regards to mobile app development. He makes a good point: mobile apps create a distribution bottleneck that makes very rapid iteration impossible. I agree, and this is a good angle to think about. I would have been happy with an article focused on this.

Ben is clearly a talented guy but this post was insane. In a literal sense. It is a rant for anti-Agile curmudgeons at best, and would leave me questioning the experiences of anyone that thinks this way at worst.

Websites ship embarrassing bugs all the time. They get away with it because they didn’t ship it to all users. You roll-out everything out to 1% of users, and watch your graphs. If things look good, slowly roll out to 100%.
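
For readers unfamiliar with the technique being quoted, the user-bucketing part of a percentage rollout can be sketched as below. The function name, the feature string, and the choice of SHA-256 are assumptions for illustration; a real system also needs the monitoring and process around it.

```python
import hashlib


def in_rollout(user_id, feature, percent):
    """Return True if this user falls inside the first `percent` buckets."""
    key = f"{feature}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return bucket < percent


users = [f"user-{n}" for n in range(1000)]
exposed = [u for u in users if in_rollout(u, "new-checkout", 1)]
print(len(exposed), "of", len(users), "users see the feature")
```

Because the bucket is derived from a hash, a given user stays in or out of the rollout across requests, and raising the percentage only ever adds users.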

The idea that this sort of incremental rollout is ubiquitous amongst web developers is crazy. It requires infrastructure, code designed to support split testing, experienced operations engineers, robust monitoring, a disciplined process, and more. The institutions with this sort of sophistication all have strong automated testing environments. Which brings me to my next issue:

I think automated testing accelerates development, but I haven’t seen a direct correlation between testing and quality. On projects with massive, high quality test coverage, I’ve seen just as many bugs slip through as projects with zero coverage.

This is the software equivalent to climate change denial. Where does this experience come from? I am not sure I’d be able to find a single developer who would corroborate this. Oh, right:

Tell a game developer you don’t need [QA], they’ll tell you you’re nuts.

The game industry is full of these folks who believe what they are doing is such an untestable snowflake. Unsurprisingly, games have historically been the buggiest software around. Never, ever look at game development as an example of how to do QA right. Not just automated testing, but manual QA too.

…a great QA team is far from a bunch of monkeys clicking buttons all day.

Game development has a hallmark technique of hiring masses of QA people and then having massive layoffs at the end of projects. There is an entire website dedicated to tales of horror from QA people. It makes The Daily WTF look like paradise.

Take the unicorn of “two week release cycles.” As you build infrastructure for faster releases, simple code becomes unwieldy. Tasks that should take hours take weeks.

What does this even mean? There are endless apps on two week release cycles. I am confused how building infrastructure for faster iterations ends up adding complexity to simple code or tasks.

Disciplined development is a lost art.

You could make this argument when we moved away from punch cards. But the idea that success in mobile apps is achieved through discipline, but success on the web can be achieved by recklessness, is beyond baseless. It’s downright insulting.

I consider it a tragedy that, when faced with the reality of App Store distribution bottlenecks, Ben’s answer is to go back to the process of yesteryear and throw out the lessons we’ve learned. Why not invent new ways of building in quality? New ways of iterating on apps faster? There are so many interesting problems to solve.

Finally, Ben cautions:

Today, any web developer who wants to stay employed has learned to build apps. If web companies want to remain relevant, they’ll have to do the same.

I have a better warning. Don’t throw away the incredible advances we’ve made over the last decade. Don’t downplay the success and rate of innovation in web development as something that doesn’t apply. Don’t throw away the universal “good idea-edness” of automated testing. Don’t rely on a separate department to enforce quality. Don’t stop looking for ways to make development better.
