Archive of published articles on September, 2011

Back home

pynocle 0.10 released

26/09/2011

My first useful open source project, pynocle, is finally ready for me to talk about.

Get the code via Hg/GoogleCode here: http://code.google.com/p/pynocle/
Browser the pynocle-generated metrics for pynocle here: http://pynocle.googlecode.com/hg/exampleoutput/index.html

pynocle is a series of modules designed to provide the most comprehensive code analysis of python available from a single source.  It is designed to be as dead simple to use as possible- create/configure a pynocle.Monocle object, and run it.  You can get by quite well only knowing 2 methods on a single object.

Right now, pynocle has support for:

  1. Cyclomatic Complexity
  2. Coupling (afferent and efferent)
  3. Google PageRank-like algorithm for measuring coupling (requires numpy)
  4. Source lines of code
  5. Dependency diagrams (requires GraphViz Dot)
  6. Coverage (requires coverage module)
It is intended to run out-of-the-box with minimal work.  Over the coming months, I’m going to add:
  1. More configuration support.  Right now this is truly just an API, which I prefer, but it may make it easier if it can be configured through text.
  2. Runnable from commandline.  I plan to make the whole thing runnable, as well as individual components.
  3. Python easy_install/PyPI support.  Right now, you do it all by hand.
  4. Get it running on Linux.  I am catching a WindowsError in a few places and also am missing the filetype indicator at the top of the files.  I’m not a *nix guy, so if you can help with this, I’d love it (should be simple).
  5. Improve rendering of reports.  Right now, most are in plain text (except dependency images, and coverage html report).  I’d like to make them all some form of HTML.
  6. Add more metrics.  Believe it or not, I’m pretty happy with the current metrics, but I’ll be adding more as time goes on and I get ideas or people ask.
My end goal is to have something comparable to NDepend, but much more limited in scope (both because of the amount of work, and python’s dynamic nature making static analysis more restrictive).
This is my first potentially cool open source project.  If you would like to contribute, great!  Please email me.  If you have any advice for me, I’d love that to!  What’s involved in ensuring this project is successful and adopted?
No Comments

Don’t use global state to manage a local problem

25/09/2011

Just put this up on altdevblogaday: http://altdevblogaday.com/2011/09/25/dont-use-global-state-to-manage-a-local-problem/


I’ve ripped off this title from a common trend on Raymond Chen of MSFT’s blog.  Here are a bunch of posts about it.

I can scream it to the heavens but it doesn’t mean people understand.  Globals are bad.  Well, no shit Sherlock.  I don’t need to write another blog post to say that.  What I want to talk about is, what is a global.

It’s very easy to see this code and face-palm:

global spam = list()
global eggs = dict()
global lastIndex = -1

But I’m going to talk about much more sinister types of globals, ones that mingle with the rest of your code possibly unnoticed. Globals living amongst us. No longer! Read on to find out how to spot these nefarious criminals of the software industry.

Environment Variables

There are two classes of environment variable mutation: acceptable and condemning.  There is no ‘slightly wrong’, there’s only ‘meh, I guess that’s OK’, and ‘you are a terrible human being for doing this.’

  1. Acceptable use would be at the application level, where environment variables can be get or set with care, as something needs to configure global environment.  Acceptable would also be setting persistent environment variables in cases where that is very clearly the intent and it is documented.  Don’t go setting environment variables willy-nilly, most especially persistent ones!
  2. Condemning would be the access of custom environment variables at the library level.  Never, ever access environment variables within a module of library code (except, perhaps, to provide defaults).  Always allow those values to be passed in.  Accessing system environment variables in a library is, sometimes, an Acceptable Use.  No library code should set an environment variable, ever.

Commandline Args

See everything about Environment Variables and multiply by 2.  Then apply the following:
  1. Commandline use is only acceptable at the entry point of an application.  Nothing anywhere else should access the commandline args (except, perhaps to provide defaults).
  2. Nothing should ever mutate the commandline arguments.  Ever!

Singletons

I get slightly (or more than slightly) offended when people call the Singleton a ‘pattern.’  Patterns are generally useful for discussing and analyzing code, and have a positive connotation.  Singletons are awful and should be avoided at all costs.  They’re just a global by another name- if you wouldn’t use a global, don’t use a singleton!  Singletons should only exist:
  1. at the application level (as a global), and only when absolutely necessary, such as an expensive-to-create object that does not have state.  Or:
  2. in extremely performance-critical areas where there is absolutely no other way.  Oh, there’s also:
  3. where you want to write code that is unrefactorable and untestable.
So, if you decide you do need to use a global, remember, treat it as if it weren’t a global and pass it around instead (ie, through dependency injection).  But don’t forget: singletons are globals too!

Module-level/static state

Module-level to you pythonistas, static to your C++/.NET’ers.  It’s true- if you’re modifying state on a static class or module, you’re using globals.  The only place this ever belongs is generally for caching (and even then, I’d urge you to reconsider).  If you’re modifying a module’s state- and then you’re acknowledging what you’re doing by, like, having to call ‘reload’ to ‘fix’ the state, you’re committing a sin against your fellow man.  Remember, this includes stuff like ‘monkeypatching’ class or module-level methods in python.

The Golden Rule

The golden rule that I’ve come up with with globals is, if I can’t predict the implications of modifying state, find a way not to modify state.  If something else you don’t definitely know about is potentially relying on a certain state or value, don’t change it.  Even better, get rid of the situation.  This means, you keep all globals and anything that could be considered a global (access to env vars, singletons, static state, commandline args) out of your libraries, entirely.  The only place you want globals is at the highest level application logic.  This is the only way you can design something where you know all the implications of the globals, and rigorously sticking to this design will improve the portability of your code greatly.

Agree?  Disagree?  Did I miss any pseudonymous globals that you’ve had to wrangle?

No Comments

The skinny on virtual static methods

22/09/2011

Today at work, I saw the following pattern in python:

class Base(object):
    @classmethod
    def Spam(cls):
        raise NotImplementedError()
class Derived(Base):
    @classmethod
    def Spam(cls):
        return 'Eggs'

After thinking it over and looking at the alternatives, I didn’t have an objection. I asked for some verbose documentation explaining the decision and contract, as this is really not good OOP and, some (me) would say, polymorphism through convention- which in dynamically typed languages like python, is how polymorphism works. But it is unfamiliar and generally considered bad practice, especially in statically typed languages.

Of course this sparked some disagreement, but fortunately, this is a topic I’ve read deeply into a number of times- first, when I was thinking about how to implement the pattern and realized the error of my ways.  Second, when I was working in another codebase that was a case study in antipatterns where this was used quite heavily.  I read into it yet again today so I could write this blog post and send it to the persons who disagreed with my judgement of the pattern in statically typed compiled languages.  So read on.

Let me reiterate- I’m not opposed to this pattern per-se in python or other dynamically typed languages.  I think it is a sure sign of code smell, but there are legitimate reasons, especially if you’re working on, say, a Maya plugin and have some restricted design choices, and this can smell better than the alternative, which is a bunch of other shitty code to deal with it.  In fact, it @abc.abstractstaticmethod was added to the abstract base class module in python 3.2, with Guido’s consent.  Which doesn’t mean it is right, and I’d still avoid it, but it’s not terrible.

My problem is with the pattern in statically typed languages.  When you make a method abstract or virtual on an interface of class, you are saying, ‘you can pass around instances of me or derived from me and they will have a method named Foo.’  Instance methods are bound to the instance.  When you define a method as static, you are binding it to the class.  So there are two reasons this doesn’t work- one technical/semantic, the other conceptual.

The technical/semantic cause is that virtual and static are opposite!  Virtual tells the compiler, ‘look up what method to call from the vtable of the object at runtime.’  Static tells it, ‘use this method at this address.’  Virtual can only be known at runtime, while static must be known at compile time.  I mean, you need a class to invoke the static method on!  The only way around this is through either dynamic typing, or some sort of reflection/introspection/codegen- in which case, you aren’t working with static typing, you’re emulating a dynamically typed environment in a statically typed one!

So, clearly the concept of ‘virtual static’ is impossible in static languages like C#, C++, and Java.  It doesn’t stop people from trying!  However, does the fact that the languages don’t support the feature make it a bad idea (the same way that just because python 3 supports it doesn’t necessarily make it a good idea)?

Yes, yes, a thousand times yes.  Let me restate the above: in order for ‘static virtual’ to be supported in a statically typed language, you need to use its dynamic typing faculties.  Look, I’m all for ‘multi-paradigm’ languages (even though some artard on Reddit berated an article I wrote because that phrase doesn’t actually make sense), but we need to be very careful when we start using patterns that go against the foundation of our languages.  Like I said- I’m not fundamentally opposed to the pattern, I am just opposed to using it in statically typed languages.

But that’s like saying, I prefer tomato juice to evolution.  I don’t even know what that means.  You cannot have virtual static methods in a statically typed language.  They are incompatible.

So much for the technical (vtable)/semantic (dynamic) reason (were those 2 reasons, or one reason?).  What about the conceptual one?

Well like I said earlier, virtual or abstract methods are part of a contract that says, I or a subclass will provide an implementation of this method.  Classes and interfaces define contracts, and at their core, nothing else (especially not how they’re laid out in memory!).  So if you’re passing around a type, and you’re saying, this type will have these methods- well, what does that look like?  I can hardly fathom what the class declaration would look like:

class Base {
    public virtual static Base Spam() { return Base(); }
    public virtual string Ham() { return "Base Ham"; }
}
class Derived : staticimplements Base {
    public override static Base Spam() { return Derived(); }
}

Well, shit. We’re saying here that Derived’s type implements the contract of Base’s type- well, Base has a ‘regular’ instance contract, the Ham method.  What happens to this on Derived?  Is Ham part of Base’s contract?  It must be, because otherwise I have no idea what the Spam() method is going to return for Derived.  Alright, so if you ‘staticimplements’ something, you get get all static and instance methods as part of your contract (and this is how python works, too).

So how do we use this?

void Frob(Base obj) { ... }

Wait. Shit. This says we’re passing in an instance of Base, whereas we want to pass in the Type of an object that staticimplements Base. So:

void Frob(BaseType obj) { ... }

So now let’s jump back to our class definitions:

class BaseType : Type { public virtual static Base Spam() { return Base(); }
class Base : staticimplements BaseType { public virtual string Ham() { return "Base Ham"; } }
class Derived : staticimplements Base { public override static Base Spam() { return Derived(); } }

Now we’re getting somewhere. We can define types that inherit from the Type object (that is some class like .NET’s Type class), and we can staticimplements those (and if you staticimplements, that implies you also get all instance methods).

Well shit, wait. If Base inherits from Type, then instances of Base will also get all instance methods from the Type object? Well ok, I can deal with that, we don’t have to use Type- what if Type inherits from RootType, and BaseType inherits from RootType, and RootType is just an empty definition so instances of objects that inherit from BaseType don’t have all of Type’s instance methods?

void Frob(BaseType baseType) {
    Base obj = baseType.Spam();
    //Well, how do we get an instance of BaseType from an instance of Base?  We can't.
    //RootType rt = obj.GetType(); //What good is RootType here?
    //BaseType bt = obj.GetBaseType(); //Wait, so we put a method on the instance that needs 
                                //to be overridden on the subclass to return an instance of the actual type?
}

I’m not going to go any further because I doubt anyone has even read this far. The question of virtual static functions in statically typed languages is pointless- much easier, then, to just throw up your hands and hack together whatever using reflection, dynamic, templating, or any other form of introspection. You can, for sure, come up with workarounds that are often quite specific and span hundreds of lines. I’ve read and gagged at a hundred of them.  But given that it is currently not just technically impossible, but conceptually brain-melting, why would you?

The problem at its core is, I think, that people learn a golden hammer in one language (in this case, python, Delphi, etc.) and try to apply it to another (C#, Java, C++), as well as people coming up with a design and then figuring out how to shoehorn the idea into the language.  Well guess what- not every language can execute any arbitrary patterns or designs well.  Learning python made this patently clear.  The language is so concise, each line (and character!) so meaningful, it is patently clear when I am doing something unpythonic- the meaningless lines of code, the extra characters, the redundant patterns in the code, the roundabout way to achieve something.  Static languages don’t have this concision, they don’t have the same economy, or flexibility.  There is too often a legitimate workaround or over-engineering, so when illegitimate workarounds are devised- well, who even considers it?  I certainly didn’t (hello, class Blah<T> where T : Blah<T>!).

So what are your choices here?  If you really think that this is your only option, your design stinks.  Stinks in that it has a foul code smell.  You have a few options:

  1. Singletons, which is a pretty big copout because you’re really just creating a static class by another name (but to be clear, are still a potential solution),
  2. Just create a new instance and call an instance method.  Suck it up, it really isn’t a big deal, though you basically require generics and a new() constraint for it, or the conceptual opposite:
  3. Dependency injection.  My guess is, if you’ve made it this far (congrats and thanks!), and you disagree, you’re not familiar with the idea of Dependency Injection or Inversion of Control.  I’d encourage you to read up on it, and realize it isn’t nearly as complicated as you may think it is, but far too interesting to get into here.

Good luck!


I’d encourage you to do your own googling and reading about this problem. It’s quite interesting and really highlights conceptual differences between languages, and understanding the problems or benefits to the approach will make you a better programmer and designer in your preferred language.

No Comments

Automation must be a last resort

18/09/2011

A repost from altdevblogaday.  Original post here: http://altdevblogaday.com/2011/09/10/automation-must-be-a-last-resort/  As is usual, the title is more inflammatory than the contents, the contents muddle the issue, and things are far more clear after reading the comments.


As tools programmers and tech artists, we are responsible for the efficiency of our designers and artists.  And most tools programmers and TA’s I’ve worked with take this very seriously, and are generally very clever, so very few things can stand in their way when they are determined to speed up a developer’s workflow.  Most commonly, such speedups are achieved by the automation of repetitive tasks.

But we are also responsible for the quality of our codebase.  “Simplicity” of code and systems is commonly accepted as an ideal all coders should strive for.

Everything should be made as simple as possible, but not simpler.

And here is my problem.  Automation increases complexity and reduces simplicity.

An Example

Consider the following diagram, which could represent a single workflow with many steps.  Each Step represents some unique concept or block of code or logic that exists in a pipeline- for example, exporting the content, format conversion, writing a metadata file, assigning a material, and importing into game.  Right now, the user performs each one manually.

Obviously we can do better- we can half the number of steps if we write some code to automatically, say, launch an exe to process the just-exported content, and we can automatically write the metadata file on import.

 

Once this is in the wild, we realize we can automate the whole thing!  So on export, we do everything, and it even imports the content into game.  Great!  But of course we still need to support some manual intervention for things that don’t ‘fit in.’

There’s a problem here, though.  A big one.  The code has essentially remained the same- so even though the user’s experience is simpler (which is always the goal!), the way we got there was to add more complexity into the codebase.  Because here’s the thing about automation:  Automation relies on inference.  And inferring things in code is notoriously difficult and brittle.  We have basically all the same code we had when we started (though I’m sure we fixed and introduced some new bugs), except we have now effectively doubled the connections between the components, and each connection is brittle.  How much of your automation relies on naming, folder structures, globals (environment variables and singletons are globals too), or any number of circumstances that are now built into your codebase?  Likewise, if you merely added buttons to create automation, the additional complexity there is obvious.  All the old stuff is still in place, you’ve just created another UI and code path on top of it that is either using it or also accessing the same internals.

That is not what we should strive for.

This, instead, is what we should strive for:

 

This isn’t always possible- but I’ve seen enough pipelines to know that it is probably possible on most pipelines at your studio, and definitely possible on some.  It should always be our goal- that every time we want to ‘automate’ what the user does, we instead say “how can I reduce the complexity of the code so nothing needs to worry about this.”  This is how you identify automation that increases complexity versus refactorings that reduce complexity: when your change simplifies the codebase (this is open to interpretation but I’d imagine you can judge this pretty easily), and ‘automates’ previously manual parts of the pipeline, that is no longer automation- you have done an excellent refactoring that has reduced complexity and it is not automation (at least not actually- the users are free to call it what they want).

It isn’t always possible.  More commonly it would be possible but not without a substantial refactoring somewhere (maybe not even your code).  Sometimes, it is just moving the complexity around rather than removing it.

These things are fine! The important thing is that you are now really thinking about your codebase.  The goal isn’t to reduce the complexity of your codebase in a day, it is to ensure you are only adding valuable complexity and that you have identified opportunities to reduce complexity.

Identifying Trends

It’s not very difficult to identify when we are adding excess complexity when automating, or when we are simplifying.

If you have simple configuration needs, such as choosing two options or files, see if you can infer that setup instead from what the user chooses to do (such as providing him two choices, rather than one configurable one).

In contrast to that, prefer upfront configuration to inference if the configuration adds significant power and simplifies the code.

If common use cases no longer fit into the scope of the tool’s effective workflow, refactor the tool.  Do not start adding ‘mini-UI’s that support these additional use cases, or you will end up with a complex and confusing mess.

Always present the minimum possible set of options to the user that allows her to get her job done effectively.

As a corollary, if the code behind your simpler UI becomes significantly more complex when simplifying the actual UI, it is likely your system can be streamlined overall.  The lower the ratio of UI options to code behind, the better.

Conclusions

All too often I see tools programmers and technical artists automating processes by building new layers of code on existing systems.  Coders should always look for ways of simplifying the overall system in code (or moving the complexity out and abstracting it into another system) as a way to achieve a streamlined workflow for the user, rather than building automated workflows and adding complexity and coupling to the existing code.

I intentionally didn’t provide precise examples or anecdotes, but I will gladly provide personal examples and observations in the comments.  Thanks.

Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius – and a lot of courage – to move in the opposite direction.

No Comments

Python software metrics- my first useful OS project?

5/09/2011

I’ve tried to open-source code quite a few times, but the projects have been niche enough that they haven’t been very useful.  Well, I finally have something universally useful.

I’ve take an interest in code metrics recently (as documented on this blog) and I have been quite upset to learn that there are few good tools for measuring them in Python code.  PyLint and PyChecker and the like are not what I’m talking about- I want dependency graphs, measure of cyclomatic complexity, automatic coverage analysis, etc.

So basically what I’m doing is creating a framework that wraps a bunch of existing functionality into an easy-to-use system, and expands or refactors it where necessary.  My goal is to make it a ‘drop in’ system to it will be trivial to get thorough code metrics for your codebase (similar to how simple it is to do in Visual Studio).

Right now I’ve created a SLOC (Source Lines of Code) generater, a wrapper for nose and coverage, and hooked it up to pygenie to measure Cyclomatic Complexity- which is unfortunately going to need a significant refactoring, so I won’t be able to fork it directly.  I’ll be hooking it up into our automated test framework at work this week as well for some battle testing.  I’m 100% sure there’s a good deal of extensibility and configuration adjustments I’ll need to make to support alternative setups.  Next up will be automatic generation of dependency graphs (which doesn’t look easy at all, unfortunately).  And writing tests (this is the first project that I didn’t sort-of-TDD in a while).  Oh, and getting it into Google Code.

Is this something you guys can see hooking into your codebases?  Do you see the value of and want to find out metrics of your codebases?

Oh and it’s tentatively called ‘pynocle’, if you have a better name I’d love to hear it.

4 Comments