Cloud Based Pipelines?

by Rob Galanakis on 18/07/2011

Originally posted on AltDevBlogADay:

The rest of software is moving into The Cloud- so how come we aren’t doing the same with our tools and pipeline?

I love the cloud.  Yes, I know it’s a buzz word for not quite revolutionary concepts, but I love it anyway.  I love it for the practical benefit I get, and I love it for the technological possibilities it brings.  It doesn’t just mean using web apps- it means using amazing applications that run in any browser on any platform, it means not worrying about storing data locally, it means a rich and expanding personal experience based on the connections between your data and everyone else’s.

And then I think about most of the pipelines I’ve seen and I wonder: what have we missed?  Very often, we are building some of the most incredible and expensive games ever with incredibly shitty sets of tools.  Why do we have essentially the same pipelines we’ve had for the past 10+ years? (I recently finished a case study of Dark Angel’s pipeline, from 2001, which is remarkably similar to some I’ve seen recently.)  Game production has changed, but pipelines have not.  We’re releasing games that get downloadable content (or are continuously updated, like an MMO), and the amount of content is ballooning.  Yet we’re still using essentially the same technologies and strategies as we were in 2001.  There’s something to learn by looking at Cloud technologies and concepts, buzzword or not.

Can game pipelines, too, move into the cloud?

The one essential aspect of the cloud is its basis in service-based architectures.  For the sake of simplicity and those unfamiliar, let’s say a service is a local or remote process that has some set of exposed methods that can be called by a client through a common protocol (JSON, XMLRPC, etc.).  All other aspects of cloud technologies require this service-based architecture.  You couldn’t have the characteristic web apps if there was no service behind them.  You couldn’t run the same or similar page on any platform and device if the work was happening on the client instead of the service.  You couldn’t have a backend that automatically scales if the real work was happening in a Rich Client App (RCA) instead of in a service.
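
For the unfamiliar, here is a minimal sketch of what such a service could look like, using Python’s standard-library XML-RPC support (the export_mesh function, its arguments, and the port are invented for illustration):

from xmlrpc.server import SimpleXMLRPCServer

def export_mesh(mesh_data, export_path):
    # Hypothetical method: do the heavy processing here, outside any DCC.
    return True

server = SimpleXMLRPCServer(('localhost', 8000))
server.register_function(export_mesh, 'export_mesh')
server.serve_forever()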

Could we build our pipelines with the same service-based approach (if not the always-there distributed-ness), and would we get similar results?

,--.::::::::::::::::::::::::::::::::::::
    )::::::::::::::::::::::::::::::::..
  _'-. _:::::::::::::::::::::::::::..
 (    ) ),--.::::::::::::::::::::::.
             )-._::::::::::::::::::..
_________________) ::::::::::::::...

Yes, we can.  But let’s consider what a service-based pipeline architecture would look like.  The biggest change is moving nearly all functionality out of DCC apps, which are RCA’s, and into libraries that can be consumed by the services.  This is what I’ve been doing for years; I understand it may be a new thing for many people, but I guarantee you can do it and you’ll be better off because of it, not having to deal with buggy and monolithic DCC apps.  These libraries/services can use headless apps behind the scenes if necessary, to do rendering or other processing (mayabatch.exe, for example).  Avoid it if you can, but you could do it.

The DCC and its UI’s, then, become very simple shells which just call methods on the service, and contain very little functionality of their own.  The service does the processing and calls back to the client (and if the function can be done asynchronously, the user keeps working while the work happens in the background).  The service can communicate with other remote and local services to do the work it needs to do.
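
As a rough sketch of how thin that shell can be, the DCC-side code might boil down to something like this (the service address and the export_mesh method are the same hypothetical ones as above):

from xmlrpc.client import ServerProxy

def export_selected(mesh_data, scene_path):
    # The DCC just hands the data off; all the real work happens in the service.
    service = ServerProxy('http://localhost:8000')
    return service.export_mesh(mesh_data, scene_path)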

Conceptually it is simple, but I promise you, the implementation will be complex.  So the benefits better be worth it.

And they would be.  The first thing you get is better abstraction between systems and components.  We remove ourselves from the hacks and workarounds of programming in a DCC, and can instead concentrate on working in a sensible development environment, not having to worry about debugging in-app or making sure all our libraries work under whatever half-assed and old implementation of python Autodesk provides.  This results in being more deliberate about design decisions- not having a hundred pipeline modules available to you is actually a good thing: it forces you to get your dependencies under control, and you give more consideration to your APIs (I blogged about how server/client systems can be a useful exercise in abstraction).

These abstractions also give greater scalability.  No problem moving your code between versions of your DCC, machine architectures, python/.NET versions, etc.  It doesn’t have the ball and chain of DCC apps, because you’ve taken it all out of the DCC apps.  Compare this flexibility to something like render farms- they usually have very specific functionality and required software, and adding more functionality takes lots of engineering time.  By having ‘normal’ code that can be run on any machine, you can distribute your processing to a farm that can tackle anything, and that doesn’t require such complex systems or specialized skills to manage.  This is the distributed processing capacity of cloud computing (in fact you could probably deploy this code to a cloud provider, if you had good server-fu).

These abstractions also lead to language neutrality.  That’s right, I said it.  I didn’t say it is a good idea, just that it’s possible.  Just the same way the Twitter API has been wrapped in three dozen languages, your services should have an API using a common protocol like JSON, so many services and clients can communicate together.  You’re not stuck using COM or marshalling data or any other number of bullshit techniques I’ve seen people use to glue things together.  Your client can be anything- a DCC, a web app, a mobile app- you could even run it via email if you so desired, with zero change to the pipeline itself- only the client code you need to call it.  And don’t forget that a web page hosted in a library like Qt or .NET could also drive the service.
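
To make that neutrality concrete, here is a hedged sketch of a non-DCC client driving the same pipeline over plain HTTP and JSON; the endpoint, payload fields, and server name are all made up for the example:

import json
import urllib.request

payload = json.dumps({'method': 'export_mesh',
                      'scene': 'chars/hero.ma',
                      'mesh': 'hero_body'}).encode('utf-8')
request = urllib.request.Request('http://pipeline-server:8000/rpc',
                                 data=payload,
                                 headers={'Content-Type': 'application/json'})
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read()))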

This is software engineering as we tech artists and pipeline engineers should have been doing all along.

 _______________
| | _________ |o|
| |___________| |
|     _____     |
| DD |     |   V|
|____|_____|____|

Let’s take a simple pipeline, like a character mesh exporter that includes an automatic LoD creator.  In Maya (or Max, or XSI, whatever), the user just hits ‘export selected’, and it can transfer the mesh data and the Maya filename/mesh name to the Local Service.  It can transfer the mesh data directly as a JSON object, or it can save it to an fbx file first and transfer the name of the fbx file, whatever- the point is that it isn’t data in the DCC, it’s data from the DCC.

At that point, Maya’s work is done and the user can go back to working while everything else happens in the background in other processes and on other machines.  Awesome!  Most (all?) DCC’s are still very single-threaded, so trying to do any real work in background threads is not practical (or stable…).

The Local Service sends the mesh data to some Remote Services to request the generation of some crunched and optimized LoD meshes.  The Local Service can call an Asset Management Service with the scene filename/mesh name, to get the export path of the final mesh file.  The Local Service can then do whatever it needs to do to ‘export’ the content: call some exe files, serialize it, whatever, it just needs to save the exported file to where the Asset Management Service said it should be.

The Remote Services can call back to the Local Service as they finish processing the LoD’s, and the Local Service can save them where they’re supposed to go as well.  All of this without the user having to wait or intervene for anything, and without bogging down his box with expensive, CPU hungry operations.
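
Pulling that walkthrough together, the Local Service’s side of the exchange might look roughly like this; every service name, address, and method here is hypothetical, and error handling is omitted:

from xmlrpc.client import ServerProxy

asset_service = ServerProxy('http://assets:8000')   # Asset Management Service
lod_service = ServerProxy('http://lodfarm:8000')    # Remote LoD generation

def save_mesh(mesh_data, path):
    # Placeholder for whatever serialization the project actually uses.
    with open(path, 'w') as f:
        f.write(str(mesh_data))

def handle_export(mesh_data, scene_name, mesh_name):
    # Ask the Asset Management Service where the final mesh belongs.
    export_path = asset_service.get_export_path(scene_name, mesh_name)
    # Save the full-resolution mesh to that location.
    save_mesh(mesh_data, export_path)
    # Kick off LoD generation on the Remote Services; results come back later.
    lod_service.request_lods(mesh_data, export_path)

def on_lods_finished(lod_meshes, export_path):
    # Called back by the Remote Service as the crunched LoD's finish.
    for level, lod in enumerate(lod_meshes, start=1):
        save_mesh(lod, '%s_lod%d' % (export_path, level))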

  __________
/_________/ |
|         | |
| |====|  | |
| |====|  | |
|   ___   | |
|  | @ |  | |
|   ---   | |
|_________|./

Is this complex?  Yes.  Is it possible for a technically competent team to do?  Absolutely.  Pipelines are the bastard child of game technology, and it shows- we have been doing the same crappy things for a decade.  If we want to minimize the ballooning costs of content development, develop robust pipelines capable of supporting games after ship with updates and DLC, and, let’s face it, work on some inspiring and exciting new technology, we’ll take pipelines to the cloud.


I’m on Google+

by Rob Galanakis on 17/07/2011

I’m on Google+, and it is the FIRST social networking site I’ve ever actively participated in. It seems pretty awesome so far, have I been missing this on Facebook all these years?

Find me here: https://plus.google.com/u/0/112207898076601628221/posts//p/pub


Game Studio Takeover Nightmare Impossible

by Rob Galanakis on 16/07/2011

There’s a sub-genre of reality television that contains shows where experts come into a failing business and implement changes to fix things.  Three of the most well known are Gordon Ramsay’s Kitchen Nightmares, Robert Irvine’s Restaurant Impossible, and Tabatha’s Salon Takeover (totally awesome show, btw).  I’ve wondered what it’d be like to get a games industry version of one of these experts into a studio to see what she could do.  Fortunately, the programs all follow a very obvious (and repetitive) pattern to find and fix the problems- so you can really just do it yourself (most problems the experts find are obvious anyway- the people in charge are just ignorant or in denial).

Follow these steps at your studio and imagine how things would go down.

Part 1: The initial personnel observation
The experts observe how things run without interfering.  They sit down to eat, watch hidden cameras, whatever.

  1. How do the employees get along?  Are they friendly to each other, do they enjoy work, do they hang out, do they do work?
  2. How does management interact with the employees?
  3. How many employees and managers are there, and what’s the ratio?
  4. Is there anything else fishy (nepotism, unqualified people, etc.)?

Part 2: The facilities inspection
The experts tour the facilities and inspect how things look, especially cleanliness.

  1. Do people have the right computer equipment and licenses?
  2. Are the bathrooms and structure in good shape?  AC working well?
  3. Are the employees treated well physically?  Are there drinks and food available?
  4. Where’s the studio located and where would people rather have it?

Part 3: The tragedy and shutdown
The expert makes some minor changes and does a more formal observation, providing minor interventions.  This usually involves some sort of disaster.  The place eventually closes up and the expert begins to work his or her magic.

  1. What tools and processes go right?  What are the worst?  And for everything in the middle, how far toward either side does it fall?
  2. Do you have managers who crack under pressure, or do really obviously wrong things?
  3. Are there people seriously misbehaving?  Are there people seriously crunching?
  4. And the biggest question is: does the studio’s project suck, and what are the major problems with the game (is it not fun, has it taken way too long)?

Part 4: The personnel rebuilding
Relationships are worked on, especially between employees and management.  Lots of training is provided.

  1. What training opportunities exist at your studio?  Are people encouraged to look outside for education?  Is ample opportunity provided internally?
  2. What are your employees’ biggest grievances?  What has changed the most in the past few years and how do your veterans feel about it?
  3. How are you dealing with your poor performers and rewarding your best?
  4. Figure out why the project/game is in the state it’s in, and put a plan in action to fix it and make sure it doesn’t keep happening.

Part 5: The facilities rebuild unveil
New and improved facilities are unveiled to the team.

  1. Your studio should be feeding you.  There’s no reason, financial or otherwise, not to provide developers with at least lunch every day.
  2. You should have enough bathrooms and they should be clean.

Part 6: First day reopening
The business runs for a day, usually with much better results (and generally a couple hiccups).
With the grievances solved, or at least in the open and being worked on, studio culture should be improved and you can concentrate on building a great product.

Part 7: Check-in later
Expert comes back to check up on how things have come along.
Inevitably, some managers will devolve back into madness; or perhaps things were too far along to stop the studio’s shutdown or crappy project.  If you see this happening, you should leave.


I wonder how something like this would fare in the games industry, and who the hell we could find to do it.


Meaningful return values

by Rob Galanakis on 14/07/2011

I consistently see return values messed up in people’s API design.  Fortunately, there are some hard and fast rules to follow that make return values really hard to fuck up.

  1. Only return True/False for a single condition.  It is not an acceptable contract to say ‘Return True on success, False on failure.’  What is acceptable is ‘Return True on success, False if key does not exist.’  You’d still throw exceptions for invalid arguments or state.
  2. Try to avoid returning from methods that mutate a public property on the instance you are calling the method on.  If foo.Frobnicate() mutates foo by changing foo.Bar (which is public), do not return a value- let the caller query foo.Bar.  It makes it more clear that Frobnicate is mutating and not just returning a value the caller can assign (a better design may be to NOT mutate and let the caller assign the result).  Make mutation the clear contract, not a side effect.
  3. Except if you consistently return self/this.  For certain objects, it may make sense to mutate and return the object being mutated.  So you can string together calls, like you do on strings: foo.Frobnicate().Spam().Eggs(), which would mutate foo 3 times.  Obviously if an object is immutable (like a string, or .NET IEnumerable) this is good design, but it can be unclear that the object is being mutated unless it is a core part of the object’s contract.
  4. Do not have a return value if you don’t need it (private methods).  If you have a private method, and no caller is using the return value, don’t have a return value!  It generally means your contract is unclear, or your methods are doing too much, and your implementation needs some work.

Pay close attention to how your return values work when you design your API’s and you’ll have a very easy way to detect code smell.
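
Here is a small sketch of those rules in Python; the class and method names are invented for illustration:

class KeyStore(object):
    def __init__(self):
        self.data = {}  # public property, relevant to rule 2

    def remove(self, key):
        # Rule 1: return True on success, False if the key does not exist.
        if not isinstance(key, str):
            raise TypeError('key must be a string')  # invalid arguments still raise
        return self.data.pop(key, None) is not None

    def clear(self):
        # Rule 2: mutates the public self.data, so it returns nothing;
        # callers query self.data instead of relying on a return value.
        self.data = {}

    def update(self, other):
        # Rule 3: mutates and returns self so calls can be chained,
        # e.g. store.update(a).update(b).
        self.data.update(other)
        return self

    def _log(self, message):
        # Rule 4: private helper whose result nobody uses, so it has no return value.
        print(message)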

Why server programmers don’t need ruthlessness

by Rob Galanakis on 11/07/2011

There were some (expected) disagreements with my post about why tech artists need ruthlessness.  Perhaps I can help explain my opinion by providing another one about something I know little enough about to run the risk of mischaracterization: server programmers, and how they don’t need ruthlessness.

(Please mind the words here- I’m not saying they should NOT have it, just that I don’t think it’s essential)

Server programmers occupy, conceptually, the opposite end of the programmer attribute spectrum from TA’s. They are necessarily highly technical, educated about the lowest levels, always working on highly complex systems, their niche is established, they have a clear area of expertise and control.

This creates a pretty big distance between everyone and server programmers- and this distance allows them to work with more autonomy.  And the autonomy means they don’t have as strong a need for ruthlessness to make decisions.

Tech art overlaps with both Art/Animation and Tools Engineering.  They’re working with specs that are often defined by any number of other parties.  Everyone wants a piece of them but few people understand them (how many times have you heard ‘our server runs automagically’?).  The core work and decisions of a tech artist are infinitely more scattered than those of a server programmer.

I really should have started with this explanation, as I think it better helps illustrate the causes that require ruthlessness to solve.

So the question to further explore is, how far is the expertise of a tech artist regarding tools from the people using them?  Is it near the same distance as between a server programmer and a user?  A server programmer and an applications programmer?  Something else?

It depends highly on what you consider the job of a tech artist, I think.


Condensing tags

by Rob Galanakis on 10/07/2011

Quick note- I’ll be reducing the number of tags used by my blog posts. I have far too many that are too similar right now.


A brief history of an animation export pipeline

by Rob Galanakis on 9/07/2011

I’ve spoken a lot about the animation export pipeline I made at my last job.  I started as a Technical Animator and naturally animation was where I spent a lot of my time early on (also because it is the most complex part of a pipeline).  I saw the pipeline through a number of major overhauls and improvements, and it was where I created and validated many of my technical views on pipeline.  I’ll provide this here because I love reading this type of history and micro-post mortem, and I hope there are other people out there that enjoy it.  Note this is only about a small portion of the animation pipeline- this doesn’t include the rigs, animation tools, or even a lot of the other things that were involved in the export pipeline, such as optimizations, animation sharing, and compiling.

When I started, we had a ‘traditional’ export pipeline- export paths were done by manipulating the path of the file being exported, it was using a third-party exporter for writing the data, and it was converting everything (inside Max) to bones in order to get objects to put into the exporter (and manipulate the bones in the case of additive animations) and then deleting them after the export.  This was inflexible (paths), buggy (3rd party exporter), and slow (creating bones).

One of the first things I did was write a ‘frame stripper’ in python that would remove every other frame from most animations (not locomotion or additives).  It operated on the ascii file spit out by the exporter.

After that came a solution for the paths- see, there were cases where we really couldn’t export animations based on the source path, because the source and game skeletons were named differently.  So I came up with a system where we’d associate some data with a skeleton name: export path, export skeleton name, path to a bunch of useful data, etc.  This same idea eventually became the basis of the database-backed asset management system, but for now it was stored in a MAXScript file that was just fileIn’ed to get the data.  This was a huge win as it put all path information in one place.
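
Conceptually, that per-skeleton data was just a mapping along these lines (shown in Python for illustration; the real thing lived in a MAXScript file, and the keys and paths here are made up):

SKELETON_DATA = {
    'hero_source': {
        'export_path': 'game/animations/hero',
        'export_skeleton': 'hero_game',
        'shared_data_path': 'source/animations/hero/shared',
    },
}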

After that came time to address the intermittent failures we were getting in our exporter.  It was writing out empty files randomly.  We were never able to get a solid repro and the vendor told us no one else had the problem.  So I wrote a custom exporter that wrote out the same ascii files.  This was also a win because it allowed me to move the ‘frame stripping’ into the export phase, rather than running it as a python script after the export.  It also allowed me to read transforms directly from the PuppetShop rig, and avoid the conversion to MaxBones, so things were significantly sped up.  Funny enough, the vendor got back to us 2 weeks after the exporter was really done and well tested (a year from the initial ticket), saying they found and fixed the problem.

Soon after this, I started work on our Asset Management pipeline/database.  I hooked this new system up into the animation export pipeline, and threw out the old maxscript-based system, and we had a unified asset management pipeline for all dynamic content (character art and animations).

Realizing I had the power of C# and .NET in MXS at my fingertips, I created a .NET library of data structures for the animation data that could be exported out to the ascii files.  This was a major turning point- we could have all processing hooked up to the data structures, rather than part of the export pipeline.  So we could strip frames that way, optimize the files, update formats, save them in binary (via a commandline binary<->ascii converter that could be run transparently from the .NET library), save out additional files such as xml animation markup on save, whatever, without adjusting the 3ds Max export code almost at all.  It gave us a flexibility that would have been impossible to try- maybe even impossible to conceptualize- without this abstraction.

This worked great and was what things were built on for a long time.  At some point I realized that this was still not enough of an abstraction.  I built a motion data framework for some animation tools and realized it could be used for the exporter as well.  Basically you have a common motion data structure, and any number of serializers/deserializers.  So you could load BVH into this common format, and save it out to FBX, without ever going through a DCC or writing any code especially for it.  You also have glue that can fill the data structures, and apply the data structures back to the scene.  So you remove the concept of an exporter entirely.  In your DCC you can just have:

motiondata = getMotionData(myRig)
FbxSerializer().serialize(motiondata, 'exported.fbx')

Likewise, if you wanted to batch-export all your BVH mocap to stub out a bunch of animations, so you don’t need to export stubs yourself, you can just have a script:

FbxSerializer().serialize(BvhSerializer().deserialize('mydata.bvh'), 'exported.fbx')
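
As a hedged sketch of the shape such a framework might take (the class and method names here are illustrative, not the actual code):

class MotionData(object):
    # Common, DCC-agnostic container: joint names, frame times, transforms.
    def __init__(self, joints=None, frames=None):
        self.joints = joints or []
        self.frames = frames or []

class BvhSerializer(object):
    def deserialize(self, path):
        # Parse a BVH file into the common MotionData structure.
        raise NotImplementedError

class FbxSerializer(object):
    def serialize(self, motiondata, path):
        # Write the common MotionData structure out to an FBX file.
        raise NotImplementedError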

Unfortunately by the time I had finished the framework, I wasn’t the main person responsible for the animation pipeline and was moving off the Tech Art team, so I never actually hooked up our export format into the system or ported over the features into it- but I did have it working for various other formats and it worked great.

That’s a pretty natural, albeit fast, evolution (all that happened over 2 years and it was rarely my primary focus).  So, where to go from there?  I guess the next step would be to remove the export step entirely, and just hook the same data structures up on a service that can communicate to an animation runtime/game engine, and Maya/DCC.  The same sort of technology as Autodesk’s Skyline, but in a much more flexible and home-brew solution.  From a tools perspective, this may not be incredibly difficult.  The main hiccup is performance due to the still single-threaded nature of DCC apps.  If you could read the scene and send data on a background thread, performance wouldn’t be a problem.  And the beauty extends itself further when creating a service-based pipeline like this, because you could pretty easily hook MotionBuilder (or even 3ds Max) up to the system.

This, though, presents a pretty big leap, and for the time being (until DCC apps improve multithreaded capabilities), I’ll stick with the pipeline in the state it’s in and bring more systems to the same level of abstraction.


Why Tech Artists must have “ruthlessness”

by Rob Galanakis on 7/07/2011

ruthlessness: pitilessness; mercilessness characterized by a lack of pity.

In my GDC2011 IGDA SIG video interview, I told Bill Crosbie that Tech Artists must possess ‘ruthlessness.’  For those of you who want more info, or (like me) hate watching videos, I thought I should give some further explanation.

As I pointed out in my GDC session, TA’s are often highly embedded and less technically competent than ‘true programmers’ (I know many TAs that are better programmers than most programmers- I say this as a generality and expectation).  This results in one major problem- TA solutions are often ‘narrow’.  That is, they are implemented to solve too specific a purpose and under the all-too-often unhelpful and restrictive art zeitgeist.

Smart and forward thinking solutions to problems often require paradigm shifts- we’ve been developing content pipelines the same way for a decade, while content production has changed significantly.  We cannot come up with narrow solutions- we must come up with comprehensive and sophisticated solutions.  This is difficult because there is so much inertia and expectation about doing things the same way they’ve been done.

You cannot fight this inertia without ruthlessness.  It is your job as a TA to uncover the essence of your artists’ problems, but it is also your job to solve them in the way you think is best, not the way art teams necessarily expect.

It takes ruthlessness to intentionally break backwards compatibility so teams must move to newer and better ways of doing things and not rely on legacy tools- just make sure they don’t catch on to the intentionality of it.

It takes ruthlessness to deploy beta pipelines so they can be fixed and improved.  You cannot hold off until things are perfect, you need to get things out into the wild ASAP- just be ready to fix and iterate quickly and make sure people’s problems are addressed.

It takes ruthlessness to force your artists to endure short term pain for long term benefit- just make sure the benefit materializes.

It takes ruthlessness to force your artists to redo or throw away work if the new and better ways require something different- just don’t do this too often or you may be the problem.

It takes ruthlessness to say “no” to small tasks you can do in an afternoon so you can concentrate on larger tasks- just make sure you eventually do these small tasks, as they are one reason TAs are so effective!

It takes ruthlessness to ignore unhelpful criticism when implementing fundamental changes- just make sure you can tell the difference between people who criticize because they don’t want to understand what you’re doing, and those who criticize because they want to be helpful.

It takes ruthlessness to lie in order to ease people’s fears if they will be addressed and you don’t want to explain it all- just make sure enough people actually know the full story so you can get good feedback.

It takes ruthlessness to tell people to ‘suck it down’ if there’s nothing you can do or if it isn’t worth your time to do anything- just make sure they know and believe you care.

It takes ruthlessness to tell people they are wrong and you are right- just make sure that’s the case.

It takes ruthlessness to achieve your vision.

One of the differences between a good TA and a Great TA is this ability to be ruthless.  Great TAs have proven successful, and have a vision, and will stop at nothing to achieve it.  They have a group of people who believe in them and are willing to promote and defend them because they have seen the benefit the vision can bring.  If you strive to be a Great TA, don’t be afraid to show a little ruthlessness.


Goodbye Austin, hello Atlanta and CCP

by Rob Galanakis on 5/07/2011

Earlier today I left Austin for Atlanta, to start at CCP Atlanta while my immigration goes through.  I’ll definitely miss Austin- Atlanta is not my first choice of cities to live in (especially Hotlanta during the summer!), but I hope I enjoy it while I’m there.  If you want to meet up while I’m in town, send me an email: rob.galanakis@gmail.com .

I hope to continue my blogging frequency and am looking forward to writing more code again.


Classic Pipeline Case Study Part III

by Rob Galanakis on 3/07/2011

In Parts I and II, we analyzed the pipeline design for Dark Angel.  Now let’s see what results, if any, that may have had in the final product.

Dark Angel got pretty abysmal reviews.  In particular, it was criticized for the following:

  1. Repetitive, button mashing combat.
  2. Repetitive, boring, linear environments.
  3. Boring puzzles.
  4. Terrible AI.
  5. Terrible camera.
  6. PS2 version looked like a port.

It has the following (relative) positives:

  1. Good looking environment art.
  2. Decent character art.
  3. Decent sound.
  4. Cool combat animations.

There are some things, such as camera or voice acting, that are not much impacted by content pipelines.  And there are design decisions, such as restricting inventory during boss fights, that are just bad design decisions.  But for many of the negatives (and the severest ones), you can pretty easily see how pipeline deficiencies and strengths manifested themselves.

For example, the combat scripting seemed tedious, difficult, and error prone.  I am absolutely not surprised that combat is repetitive, when the overhead required to add variety is simply so great.  I’m also not surprised that it had complex combat animations, given that they seemed to understand and plan for them (specialized root bones).  Likewise, the difficulty in scripting, and the use of it for AI, resulted in abysmal AI.

The well-designed Bundle system meant assets could easily be reused (to a fault, it seems)- the sheer amount of asset reuse suggests that part of the pipeline worked well.  However, the half-baked world builder resulted in half-baked levels and boring puzzles- there just wasn’t enough possibility to iterate, change, or remove.  I imagine once something was done, it was copied, pasted, and became difficult or impossible to iterate on.  I wonder how much the world builder actually worked, and how much was done through text files directly.

The relatively good job on the art side shouldn’t be surprising from a team that obviously understands the technical side of art creation- the appendices and other info were generally useful, and obviously an area of expertise for the engineering team.

The same engineering team, though, obviously didn’t understand artists minds.  There were far too many commandline-only tools that content developers were expected to use.  So you had tools that were difficult to use, and only usable by specialized people.  No surprise you end up with a mediocre Xbox->PS2 port, when you have tools like that.  Artists are better equipped to understand the nuances and difficulties involved with the port.

——–

I think that’s about it.  Now, I’m in no way talking about any absolute truths with regards to the impact of tools on the overall quality of the game.  Dark Angel was a failure and it had nothing to do with the tools.  Countless other games prove the lack of correlation.  What I am saying, though, is that the higher the quality of tools for a feature, the higher quality the feature. I hope there is nothing revolutionary about that statement.  Saying you can get high quality art or design features without good tools means you are relying on luck.  You are hoping you get it right the first time (same as my issues with naming convention driven pipelines).

Applying that statement can be one of the most important factors when doing any high level decision making regarding tools and pipeline.
