Posts Tagged ‘Programming’

Too ignorant to know better

My first big python project last year was yet another feed aggregator (taogreggator). Before I started, I looked around at what other aggregators were available, and wasn’t happy with any of them in terms of features, complexity, or how hard each was to get working.

Of course, 9 months later, that project is dead and I’ve successfully got the python ‘planet’ module up and running at www.tech-artists.org/planet.

Note, this blog post probably reveals what a big programming phony I am ;) Remember though that this sort of thing is well outside my usual domain of expertise.

So what happened? Why did it take so long to realize I was doing something stupid, regroup, and adopt something that actually works?

I was too ignorant to know better. Well, to be fair, I didn’t undertake this project out of hubris or to build something better, I built it mostly as a significant project I could train my python skills with.

I’m not interested in why it failed. There are 100 reasons why it failed, none of them unexpected or interesting. I’m interested in why I undertook it in the first place and took so long to trash it.

1. I didn’t know anything about the web

I still know barely anything, but trying to take an existing package and get it running was incredibly difficult, because I was such a fish out of water. I didn’t even have the vocabulary, and was unfamiliar with everything I was supposed to do and the concepts of how things worked. My own project allowed me to get into it gradually.

2. Too inexperienced to know the challenges ahead of me

It wasn’t actually that difficult to get the app running locally. I even opened up a router port and ran my PC as a server, for remote connections. But I had an Ubuntu server to deploy to, and knew nothing about Linux. I had never created a web app before. So at every step, I thought I was almost there. Every known was an unknown unknown to me, because I had no idea what to expect.

3. Too inexperienced with the commandline and the python environment

I talked about it in my Relearning Python series. When I started out, I didn’t really get how python works, because I came from .NET where I didn’t have to worry about any of that. I have a much, much better understanding now, and the environment is one of the early things I teach any new python programmer, because once you start importing code, or writing complex scripts, you need to know how it works. I didn’t understand the environment so I had a very difficult time getting any third-party systems set up.

4. Pythonic is more than a coding style

When I came to python, I was indoctrinated in the ways of a .NET programmer. It took me a long time to understand that ‘pythonic’ applies to more than just lines of code. It has to do with how you run your entire application. The way I run planet I’d consider entirely pythonic- I have a very thin script that generates and uploads some files. The planet module itself is pythonic- there’s some straightforward documentation, commented ini files, and templates, and you’re supposed to customize things and build a few wrapper scripts to run the stuff you need. This looseness was foreign, as I was more used to a much more data-driven, rigid way of customizing an app. Being data driven is not great in all circumstances, especially when developing frameworks and apps like this, where the programmer is the user. When I saw what I ended up with using planet, I was embarrassed with how confusing my design was (though, to be fair, it had more features planned). Without understanding how I should use modules like planet, I couldn’t use them. Such basic stuff is not covered in a readme.

So, several weeks ago, I finally made an effort to deploy my custom aggregator on an AWS windows server. I still couldn’t get it working. And I was having even more questions about why I did stuff a certain way (I don’t think the code or design is particularly bad, but it made it difficult to use on a server). It was a huge failure. So three days later, after an awful day at work, I regrouped, and spent the entire evening figuring out existing aggregators, and after struggling with various ones, chose ‘planet’, and got pretty much everything working.

The lessons are pretty clear. You need some minimum knowledge to be able to make an informed decision. Attempt something of a very limited scope to give you that knowledge before making your decision. You will have plenty of options to reinvent the wheel when you know what you’re doing. On the other hand, if you’re pursuing a project only for educational purposes, do whatever you want :)

Next time I’m going to follow some tutorial end to end. It was fun hacking away on something way too complex, but I failed to deliver a server to the community, and, tbh, the time could have been better spent.

Python logging best practices

Logging is one of those things that, being orthogonal to actually getting something done, many developers fail to learn the nuances of. So I want to go over a few things I had to learn the hard way:

We are blessed in the python community because we have the wonderful ‘logging’ module in our standard library, so there is no barrier to entry or excuse to not use proper logging mechanisms. There are often reasons to roll your own of something, but that something will probably never be logging. Don’t do it (this goes for all major languages).

The logging module is incredibly flexible. The ‘handlers’ are the key to leveraging the power of the logging module. Handlers can do pretty much whatever you want them to do. Once you get past the most basic logging, you should start reading up on Handlers. Understanding handlers is the key to understanding logging, in my experience.

Root-level configuration should generally only be done by the application, not any library modules. Ie, ‘logging.basicConfig’ should only be (and usually can only be) called very early on. Examples of root-level configuration are setting the format of the logs, setting the logs to print to stdout/stderr, etc. Anything that has to do with global state (and streams are examples of global state), should be handled by the application, never by a library. Rarely should you add a StreamHandler. A FileHandler for a single logger can be useful in some cases (like, if you have a server that is part of a larger application) but should generally be avoided.
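As a rough sketch of that split (the names `mylib`, `do_work`, and `main` are made up for illustration): the library only asks for a named logger and attaches a NullHandler, and the application entry point is the one place that calls logging.basicConfig.

```python
import logging

# --- library code (hypothetical module name: mylib) ---
lib_logger = logging.getLogger("mylib")
# A NullHandler keeps the library quiet unless the application configures logging.
lib_logger.addHandler(logging.NullHandler())


def do_work():
    lib_logger.info("doing work")
    return 42


# --- application entry point ---
def main():
    # Root-level configuration (format, level, output stream) happens once, here.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    return do_work()


result = main()
```

Note that basicConfig does nothing if the root logger already has handlers, which is part of why it only makes sense to call it very early on.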

If you have multiple classes in a file, give them each their own logger. Do not use a single module logger for many classes. Identify the logger by the class name so you know what logger produced what log.

Putting self.logger = logging.getLogger(type(self).__name__) on a base class is a good way to get a unique logger for each subclass, without each subclass having to set up their own logger.
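A minimal sketch of that base class trick, with hypothetical class names (`Base`, `Importer`, `Exporter`):

```python
import logging


class Base(object):
    def __init__(self):
        # type(self) is the concrete subclass, so each subclass
        # ends up with a logger named after itself.
        self.logger = logging.getLogger(type(self).__name__)


class Importer(Base):
    pass


class Exporter(Base):
    pass


importer_name = Importer().logger.name
exporter_name = Exporter().logger.name
```

Here `importer_name` is 'Importer' and `exporter_name` is 'Exporter', and neither subclass had to write any logging setup.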

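logger.<methodname>('spam, eggs, and %s', myvar) should be used instead of logger.<methodname>('spam, eggs, and %s' % myvar), as the logging module only formats the string if the record is actually emitted, which saves a string formatting for every message that gets filtered out.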
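You can see the difference with a small experiment. The `Expensive` class below is a made-up stand-in for anything that is costly to turn into a string; it just counts how many times it gets formatted.

```python
import logging


class Expensive(object):
    """Hypothetical object that counts how often it is converted to a string."""
    calls = 0

    def __str__(self):
        Expensive.calls += 1
        return "expensive"


logger = logging.getLogger("formatting_demo")
logger.setLevel(logging.WARNING)  # DEBUG records are dropped by this logger

value = Expensive()

logger.debug("spam, eggs, and %s", value)   # dropped before any formatting happens
calls_after_lazy = Expensive.calls

logger.debug("spam, eggs, and %s" % value)  # formatted first, then dropped
calls_after_eager = Expensive.calls
```

After the first call the counter is still 0; after the second it is 1, even though neither message was ever logged.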

Make a module with your commonly used log format strings, so each developer doesn’t have to come up with their own, and you achieve some standardization.

Almost never use printing. Use logging, and set your logger(s) up to log to stdout with a StreamHandler while you are debugging. Then you can leave your ‘prints’ in, which will make life easier when you need to go back in to find bugs.
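Something like this is what I mean by setting a logger up for debugging. The helper name `enable_debug_output` and the format string are my own choices for the sketch, not anything from the logging module.

```python
import logging
import sys


def enable_debug_output(logger, stream=None):
    """Attach a temporary StreamHandler (stdout by default) and return it."""
    handler = logging.StreamHandler(sys.stdout if stream is None else stream)
    handler.setFormatter(logging.Formatter("%(name)s %(levelname)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return handler


logger = logging.getLogger("debug_demo")
handler = enable_debug_output(logger)

# This is the 'print' you get to leave in the code.
logger.debug("loaded %d assets", 3)

# When you are done debugging, detach the handler; the debug call stays.
logger.removeHandler(handler)
```

The logger.debug line costs almost nothing once the handler is gone and the level is raised, so there is no cleanup pass to strip out prints later.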

You almost never want to catch, log, and re-raise. Let the caller be responsible for logging and handling the error, at the level it can be handled properly. Imagine if at every level, every exception was logged and re-raised. Your log would be a mess!
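Here is a small sketch of the shape I prefer, with made-up function names (`parse`, `load_setting`, `run`). The inner levels just let the exception travel, and the one caller that can actually do something about it logs it, once.

```python
import logging

logger = logging.getLogger("app")


def parse(text):
    # No try/except: the ValueError already says what went wrong.
    return int(text)


def load_setting(text):
    # Still no logging; this level cannot handle the error either.
    return parse(text) * 2


def run(text, default=0):
    # This caller can handle the failure (by falling back), so it logs it.
    try:
        return load_setting(text)
    except ValueError:
        logger.exception("bad setting %r, falling back to %r", text, default)
        return default


good = run("21")
bad = run("oops")
```

logger.exception records the full traceback, so nothing is lost by not logging in parse or load_setting.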

I consider the levels as follows- DEBUG only for developers, INFO for general internal usage, WARNING for deployment (I don’t know why you’d have your log level set higher than WARNING). Another way of thinking about them is, DEBUG has all information which only developers care about, INFO has little enough information that the stuff there is relevant and enough that problems can be diagnosed by a technical person, and WARNING will just tell you when something goes wrong. I wouldn’t make any more fine-grained levels than this, but it is up to you and your team to figure out where to use what. For example, do you log every server and client send/recv as DEBUG or INFO? It depends, of course.

The more library-like your code, the less you generally log. Your library should be clear, working, and throw meaningful exceptions, so generally your real library-libraries shouldn’t even need to log.

Logging is not a replacement for raising exceptions. Logging is not a way to deal with exceptions, either.

Remember these are guidelines only (and my guidelines). There are always exceptions to these rules (no pun intended).

I have a feeling those of you writing web/server apps are more familiar with logging best practices than those of us writing code in client apps. But these are all things I’ve seen in the real world so I thought them worth giving my two cents about them. What are your logging guidelines?

Tabs vs. Spaces

A friend asked on G+ recently about tabs vs. spaces. A lot of people agreed with what I said so I thought I’d turn it into a proper post.

There’s a good summary here: http://www.jwz.org/doc/tabs-vs-spaces.html. This is also a link Jeff Atwood has in his post on the subject.

So why are spaces preferred over tabs? Tabs have the nice features of being more compact and of letting the display of the code in an IDE be customized (I prefer shorter indents, some prefer larger). Spaces are more verbose in a lot of ways. But I’m not going to go over pros and cons with using them because, frankly, they’re not the reason.

Spaces are preferable to tabs because, like the Zen of Python says, explicit is better than implicit. Explicit in the sense that it is more compatible in more places.

PEP8 tells us to limit lines to 79 columns, because our code may be running on fixed-width terminal windows, and python is a scripting language, so people would be looking at the actual code on those terminal windows. As opposed to compiled code, where you’re generally not going to look at or edit the code on those terminal windows.

Speaking of terminals. There are a lot of times we’re editing code in unfamiliar places. That’s not just something like a terminal window. It is an unfamiliar text editor. It is an editor embedded into some program. It is a diff tool. It is any number of places we may need to write or debug code outside of our primary editor/IDE. Who knows what happens when you hit ‘tab’? How are things configured? Why bother with the ambiguity?

Well nothing is stopping you from requiring tabs for your studio, and breaking python’s PEP rules, and educating and configuring everyone’s editors to use tabs. However, the first time you need to go in and edit some code you find on the internet or download through pip or easy_install, you’re going to screw up and create a syntax error. Not only that but nearly every IDE can be easily configured to use spaces instead of tabs for both indenting and dedenting. And where you’re not sure of the default, or don’t want to configure it, you can just use spaces and backspace.
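To make that syntax error concrete, here is a small sketch that compiles a snippet whose two body lines look identical in an editor with 8-column tabs, but where one is indented with a tab and the other with spaces. This assumes Python 3, which refuses to guess how wide a tab is.

```python
# Line 2 is indented with one tab, line 3 with eight spaces.
source = "if True:\n\tx = 1\n        y = 2\n"

try:
    compile(source, "<mixed indentation>", "exec")
    error_name = None
except SyntaxError as exc:
    # TabError is a subclass of IndentationError, which is a SyntaxError.
    error_name = type(exc).__name__
    error_message = exc.msg
```

Here `error_name` comes out as 'TabError'. Nothing about the code looked wrong on screen, which is exactly the ambiguity I want to avoid.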

So for python, there’s no reason to use tabs. Just don’t do it. You’re using a language that is dependent upon whitespace for code structure. You need to take it seriously and remember you and your code are part of the larger python community. It isn’t about preference, it is about compatibility.

If you aren’t using a whitespace-dependent language, feel free to establish a standard and enforce it. Just never do it with python.

Everything can be a server/client!

We Tech Artists can get intimidated when talking about servers and clients. They remind us of a world of frameworks and protocols we’re not familiar with, run by hardcore server programmers who seem to have a very demanding job. Fortunately, that needn’t be the case, and understanding how to turn anything into a server/client can open limitless possibilities.

You can think of server/client as a way to get two processes to communicate to each other using sockets, that is more flexible than other means of IPC such as COM or .NET marshalling. Your server can be local, or it can be remote, and very little usually has to change. Moreover, you can define much more flexible protocols/mechanisms, so you can communicate across literally any programming language or platform.

The practical reason everything can be a server/client is because we don’t have to understand much of how anything works under the hood. You follow some examples of how to set up a server and client using the framework of your choice (I’m a huge, huge fan of ZeroMQ which has bindings for pretty much everything including python and the CLR). Once you get comfortable, you just design your interface, and implement one on the server and on the client (the client just usually sends data over to the server and returns the response). Actually I really like how WCF recommends you build your server and client types, even though I am not a big fan of the framework. And I do the same for Python even though it’s not strictly necessary ;)

So your server just needs to poll for messages in a loop, and the client sends requests to it, and the server sends back replies. So driving one app with another is as simple as creating a server on the slave and polling in a (usually non-blocking) loop, and having the client send commands to it. You can invert the relationship on a different port and now you have bi-directional communication (hello live-updating in your engine and DCC!).
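To show how little is involved, here is a rough sketch of that request/reply loop. I use ZeroMQ in practice, but to keep this self-contained it uses only the standard library socket module, with one JSON message per line as a made-up protocol. The command names ('add', 'stop') and the function names are all hypothetical.

```python
import json
import socket
import threading


def handle_request(request):
    """The 'interface' the server implements (hypothetical commands)."""
    command = request["command"]
    if command == "add":
        return {"result": sum(request["args"])}
    if command == "stop":
        return {"result": "stopping"}
    return {"error": "unknown command: %s" % command}


def serve(server_sock):
    """Wait for requests in a loop and send back one reply per request."""
    running = True
    while running:
        conn, _addr = server_sock.accept()
        with conn, conn.makefile("r") as reader:
            request = json.loads(reader.readline())
            running = request["command"] != "stop"
            reply = handle_request(request)
            conn.sendall((json.dumps(reply) + "\n").encode("utf-8"))


def send_request(address, request):
    """The client: send one request, return the server's reply."""
    with socket.create_connection(address) as sock, sock.makefile("r") as reader:
        sock.sendall((json.dumps(request) + "\n").encode("utf-8"))
        return json.loads(reader.readline())


server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server_sock.listen(1)
address = server_sock.getsockname()

server_thread = threading.Thread(target=serve, args=(server_sock,))
server_thread.daemon = True
server_thread.start()

add_reply = send_request(address, {"command": "add", "args": [1, 2, 3]})
stop_reply = send_request(address, {"command": "stop"})
server_thread.join()
server_sock.close()
```

In this sketch the server blocks on accept in its own thread. Inside a host application you would more likely poll without blocking from the app's idle or timer callback, but the shape of the loop is the same.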

The real power of this, I’ve found, is that I really have full control over how I want things to work. No more going through shitty COM or .NET interop, no more Windows messaging. I define the interface that declares what functionality I need, and can implement it in a sensible and native way (ie, not COM, etc.).

For example, we use this for:

  • Recreating a Maya scene in our editor, and interactively updating our editing scene by manipulating things in Maya, even though their scene graphs and everything else are nothing alike.
  • Running a headless version of our editor, so we can interact with libraries that only work inside the editor/engine, from any other exe (like regular python, Maya, or MotionBuilder).
  • Having a local server that caches and fetches asset management information, so data between tools is kept in sync for the entire machine and there are no discrepancies per-app.

If we had a need, we could easily extend this so any other programs could talk to each other. In fact this is generally how it’s done when apps talk to each other: I’m not presenting anything new, just trying to convince you it becomes really really easy.

If you’re anything like me, thinking about things in a server/client scenario can give you an entirely new perspective on how you develop tools and pipelines.

Large initializers/ctors?

With closures (and to some extent with runtime attribute assignments), I find the signatures of my UI types shrink and shrink. A lot of times we have code like this (python, but the same would apply to C#):

class FooControl(Control):
  def __init__(self, value):
    # this form of super() needs both the class and the instance
    super(FooControl, self).__init__()
    self.value = value
    self._InitButtons()

  def _InitButtons(self):
    self.button = Button('Press Me!', parent=self)
    self.button.clicked.addListener(self._OnButtonClick)

  def _OnButtonClick(self):
    print('%s %s' % (id(self.button), self.value))

However we can easily rewrite this like so:

class FooControl(Control):
  def __init__(self, value):
    super(FooControl, self).__init__()
    btn = Button('Press Me!', parent=self)
    def onClick():
      # value is captured by the closure, so no self.value is needed
      print(value)
    btn.clicked.addListener(onClick)

Now this is a trivial example. But I find that many types, UI types in particular, can have most or all of these callback methods (like self._OnButtonClick) removed by turning them into inner functions. And then as you turn them into inner functions in init, you can get rid of stored state (self.value and self.button).

But as we take this to the extreme, we end up with very simple classes (and in fact I could replace FooControl with a function, it doesn’t need to be a class at all), but very long init methods (imagine doing all your sub-control creation, layout, AND all callback functionality, inside of one method!).
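For what it’s worth, here is roughly what the function version could look like. Since Control and Button belong to whatever UI toolkit you use, the `Signal`, `Control`, and `Button` classes below are tiny stand-ins I made up so the sketch runs on its own, and the `on_value` parameter is only there so the click can be observed.

```python
class Signal(object):
    """Stand-in for a toolkit event with addListener."""
    def __init__(self):
        self._listeners = []

    def addListener(self, func):
        self._listeners.append(func)

    def emit(self):
        for func in self._listeners:
            func()


class Control(object):
    """Stand-in for a toolkit control that tracks its children."""
    def __init__(self, parent=None):
        self.children = []
        if parent is not None:
            parent.children.append(self)


class Button(Control):
    def __init__(self, text, parent=None):
        super(Button, self).__init__(parent)
        self.text = text
        self.clicked = Signal()


def make_foo_control(value, on_value=print):
    """FooControl as a function: no subclass, no stored state."""
    control = Control()
    btn = Button('Press Me!', parent=control)

    def onClick():
        on_value(value)

    btn.clicked.addListener(onClick)
    return control


clicked_values = []
control = make_foo_control(42, on_value=clicked_values.append)
control.children[0].clicked.emit()  # simulate the user pressing the button
```

After the simulated click, `clicked_values` is [42], and nothing about the button or the value is exposed on the returned control.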

I’ve decided I’d rather have a long init method, usually broken up into several inner functions, rather than a larger signature on the class with layout, callbacks, and stored state. In my mind, it is easier to pull something out into a type attribute, rather than remove it, as anything on the type is liable to be used externally. And breaking up your layout into instance methods that can really only be called once (_InitButtons), from the init, adds a cognitive burden for me.

So I can justify this decision to eliminate extra attributes rationally, but what seals the deal is, I’m not unit testing any of this code anyway. So whether it is in one long method, or broken up into several methods, it isn’t getting tested.

I started out as very much in the ‘break into small methods’ camp but have wholesale moved into the ‘one giant __init__ with inner functions’ camp. I’m curious what you all prefer and why?

Three options for data correctness

In a previous post, I linked to Rico Mariani’s performance advice for Data Access Layers. On G+, Tyler Good asked:

I just read the posts and the linked blogs, I had a question about some specific implementations. How do you deal with classes that represent another non-[in this case]-Python entity that may be updated outside of Python?

I’m not sure if this sort of case is outside of the scope of what’s being talked about in the articles, but if there’s a better way to do getting on things like p4 paths or elements in a Maya file (that may have been changed by the user since instantiating/loading the object) I’d really like some ideas about that.

You basically have three options and fortunately they line up easily on a scale:

Technique        Correct  Difficulty
Transactions     Always   High
Fetch-on-demand  Usually  Medium
Store in memory  Maybe    Low

Let’s get on the same page first. Let’s consider all three types of interactions- database through a DAL, perforce (or any source control) interaction, and interaction with some host application (Maya, or your engine, or whatever). So what are the three approaches and how do they differ?

Store in Memory

You create a code object with a given state, and you interact with that code object. Every set either pushes changes, or you can push all changes at once. So for example, if you have a tool that works with some Maya nodes, you create the python objects, one for each node, when you start the tool. When you change one of the python objects, it pushes its changes to the tool.

This is the simplest to reason about and implement. However, the difficulty quickly becomes managing its correctness. You need to lock people out of making changes (like deleting the maya node a python object refers to), which is pretty much impossible. Or you need to keep the two in sync, which is incredibly difficult (especially since you have any number of systems running concurrently trying to keep things in sync). Or you just ignore the incorrectness that will appear.

It isn’t that this is always bad, more that it is a maintenance nightmare because of all sorts of race conditions and back doors. Not good for critical tools that are editing any sort of useful persistent data. And in my opinion, the difficulties with correctness are not worth the risk. While the system can be easy to reason about, it is only easy to reason about because it is very incomplete and thus deceptively simple. So what is better?

Fetch on Demand

Here, instead of storing objects in two places (your code’s memory, and where they exist authoritatively, like the Maya scene, or a Perforce database), you store them only where they exist authoritatively and create the objects when that data is queried. So instead of working with a list of python objects as with Store in Memory, you’d always query for the list of Maya nodes (and create the python object you need from it).

This can be simple to reason about as well but can also be quite slow, depending on your dependency. If you’re hitting a DB each time, it will be slow. If you need to build complex python objects from hundreds of Maya or Max calls, it will be slow. If you need to query Perforce each time, it will be slow.

I should note that this is really just a correctness improvement upon Store in Memory and the workings are really similar. The querying of data is only superior because it is done more frequently (so it is more likely to be correct). The changing of data is only more likely to be correct because it will have had less time to change since querying.

That said, in many cases the changing of data will be correct enough. In a Maya scene, for example, this will always be correct on the main thread because the underlying Maya nodes will not be modified by another thread. In the case of Perforce, it may not matter if the file has changed (let’s say, if someone has checked in a new revision when your change is to sync a file).
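A toy comparison of the two approaches so far. The `scene` dictionary is a stand-in for whatever holds the data authoritatively (a Maya scene, Perforce, a database), and `Node` and `fetch_nodes` are names I made up for the sketch.

```python
import collections

# Stand-in for the authoritative store.
scene = {"pCube1": {"tx": 0.0}, "pSphere1": {"tx": 5.0}}

Node = collections.namedtuple("Node", ["name", "tx"])


def fetch_nodes():
    """Build fresh python objects from the authoritative data."""
    return [Node(name, attrs["tx"]) for name, attrs in sorted(scene.items())]


# Store in Memory: the tool builds its objects once, at startup, and keeps them.
nodes_at_startup = fetch_nodes()

# Meanwhile the user edits the scene behind the tool's back.
scene["pCube1"]["tx"] = 10.0
del scene["pSphere1"]

# The stored objects are now wrong: one is stale, one refers to a deleted node.
stale_names = [node.name for node in nodes_at_startup]

# Fetch on Demand: query again whenever the data is needed.
fresh_nodes = fetch_nodes()
fresh_names = [node.name for node in fresh_nodes]
```

`stale_names` still lists both nodes, while `fresh_names` only has 'pCube1' with its new tx of 10.0. The fresh list can of course go stale too, it has just had much less time to do so.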

Transactions

Transactions should be familiar to anyone who knows about database programming or has read about Software Transactional Memory. I’m going to simplify at the risk of oversimplifying. When you use a transaction, you start a transaction, do some stuff (to a ‘copy’ of the ‘real’ data), and commit the transaction. If the ‘real’ data you are reading or updating has changed, the whole transaction fails, and you can abort the transaction, or keep trying until it succeeds.

Mass simplification but should be enough for our purposes. This is, under the hood, the guaranteed behavior of SCM systems and all databases I know of. The correctness is guaranteed (as long as the implementation is correct, of course). However, it is difficult to implement. It is even difficult to conceptualize in a lot of cases. There are lots of user-feedback implications: an ‘increment’ button should obviously retry a transaction, but what if it’s a spinner? Are you setting an explicit value, or just incrementing? Regardless, where you need correctness in a concurrent environment, you need transactions. The question is, do you need absolute correctness, or is ‘good enough’ good enough?
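Here is a deliberately tiny sketch of the read, work on a copy, commit, retry cycle, using a version number to detect that the ‘real’ data changed. `Store`, `ConflictError`, and `increment` are hypothetical names, and the `before_commit` hook exists only to simulate another writer; a real database or SCM does all of this for you, and does it safely across threads and processes, which this single-threaded toy does not.

```python
class ConflictError(Exception):
    pass


class Store(object):
    """Hypothetical versioned store standing in for the 'real' data."""
    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def commit(self, new_value, expected_version):
        # Fail the whole transaction if the data changed since it was read.
        if self.version != expected_version:
            raise ConflictError("data changed since it was read")
        self.value = new_value
        self.version += 1


def increment(store, retries=3, before_commit=None):
    """An 'increment' action: it is safe to retry on conflict."""
    for _attempt in range(retries):
        value, version = store.read()   # work on a copy
        if before_commit is not None:
            before_commit(store)        # simulation hook, not part of the pattern
        try:
            store.commit(value + 1, version)
            return store.value
        except ConflictError:
            continue                    # re-read and try again
    raise ConflictError("gave up after %d attempts" % retries)


store = Store(10)
pending_interference = [True]


def other_writer(target):
    # Simulate someone else committing once, between our read and our commit.
    if pending_interference:
        pending_interference.pop()
        value, version = target.read()
        target.commit(value + 100, version)


final_value = increment(store, before_commit=other_writer)
```

The first attempt fails because the other writer got in first, the retry succeeds, and `final_value` is 111. If this had been ‘set the value to 11’ rather than ‘increment’, blindly retrying would have silently thrown away the other writer’s change, which is the user-feedback problem described above.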

Recommendations

Avoid Store in Memory. If you design things this way, break the habit. It is a beginner’s mistake that I still make from time to time. Use Fetch on Demand instead. It should be your most common pattern for designing your tools.

Be careful if you think you need Transactions. Ensure they are where they need to be (database, SCM), but don’t just go around designing everything as if it needs to be transactional. If you have two programs that can edit the same file- is one or the other just winning OK? How likely is that to happen? How will you indicate the failed transaction to the user? I’d suggest designing your tools so transactions are not necessary, and just verify things are correct when they cross an important threshold (checkin, export, etc.). Do your cost-benefit analysis. A highly concurrent system will need transactions, tools that only work with local data will likely not.

It should be clear, but still worth pointing out, you can mix-and-match these patterns inside of your designs.

Hope that clarifies things, Tyler.

There’s idiomatic, and there’s just being respectful

I work in mixed language environments. Python, C#, C++, and more, can all make their rounds. It isn’t uncommon to have someone focused on C++ have to write something in another language, and it isn’t uncommon that I come across their code some point in the future.

It is easy to learn a language’s syntax but difficult to learn its idioms. Good luck trying to explain what ‘pythonic’ means to someone who is new to python or programming! So I forgive the transgressor when I see non-idiomatic code.

Usually.

There are some errors I find unforgivable. Errors that indicate a complete lack of understanding of the platform you are writing on. Errors like this (C#):

var foo = new Foo();
if (foo != null) {...}

Creating an instance is probably the most basic operation you can perform in an OO language, and the author clearly did not understand it.

Another unforgivable type of error is when someone tries to fix a bug but does not bother to understand what’s actually going on.

class Foo {
private bool _somevar;

...
}

There was some bug in the code somewhere, I can’t remember what. A developer changed ‘private bool _somevar’ to ‘private bool _somevar = false’ and declared the bug fixed (spoiler: it wasn’t).

Probably the best example comes from memory management, as the least understood things in programming tend to:

try { someUIControl.SetText(someGiantString); }
catch (OutOfMemoryException) {
someUIControl.Clear();
GC.Collect();
someUIControl.SetText(someGiantString);
}

The only thing this did is change the stack trace. The problem was due to a .NET garbage collection implementation detail- the Large Object Heap and huge strings- and the ‘fixer’ just tried something every authority tells you not to do, which is catch an OOME.

If you’re going to leave your domain to write code in another language- I applaud you. It can show an endeavouring personality! But please have some respect for the language you are writing in- read a book, read a blog, ask for help. It’ll make you a better programmer, I promise.

Thank you, Rico Mariani, for reminding me how bad I was

A little while ago I read two great articles by Rico Mariani, a MS employee who usually blogs about performance in .NET (though python being an OO language the same advice applies there). The articles in question were these:

Performance Guidelines for Properties

Performance and Design Guidelines for Data Access Layers

I’d suggest at least skimming over them. He talks about, for property accessors, not allocating memory, locking, doing IO, having side effects, and being fast. For the DAL article, you should really read it, but the part that was especially relevant is “Whatever you do don’t create an API where each field read/write is remoted to get the value.”

It was a shocking reminder of my early days programming. Every point mentioned in those two articles, I was hands down guilty of. I don’t mean, I’ve done that sort of thing occasionally. I mean, I designed entire systems around everything you shouldn’t do with regards to properties and DAL design. To be fair, this was years ago, I was new to programming, in way over my head, and didn’t have people to turn to (no one at the studio could have told me what an ORM was or given me these suggestions about properties), so I don’t feel much guilt. And I learned better relatively quickly, well before reading those posts.

I work with a lot of new programmers, and experienced programmers who aren’t focused on higher level languages. The articles, most of all, reminded me how far I’ve come and how lucky I am. The new programmers haven’t had a chance to make the epic mistakes I have. The experienced programmers trained in a world without such useful managed languages, high quality bloggers, and sites like Stack Overflow; a world I’ve never known and I’ve benefited by learning best practices and new skills, and finding and breaking bad habits.

I remember at the time thinking how great some of their features were, the same features that, as Rico points out, are really terrible ideas. I felt fortunate that I already followed his guidelines, and even more fortunate that few people were around to witness the hideous abuses of them!

Be a deployment Boy Scout

The Boy Scouts have a rule:

Leave your campsite cleaner than you found it.

We know how to apply this rule when writing code but we often overlook this rule when it comes to installing or deploying that software.  I’ve seen, and committed, some pretty heinous acts of changing a user’s machine, and in every single case- every single case- I’ve discovered in retrospect it was a poor decision.  Note I am only talking about internally deployed software where you have control over the environment (ie, I’m not discussing game installers and the like!).

At this point, I live by one golden rule:

Never leave persistent state on a user’s machine.  If you must, all state should be stored in a single folder.

Two caveats:

  • “Never”: Some third-party software will not adhere to this, and there are some situations where it cannot be avoided due to third party dependencies, so you may have to adapt.  I apply this rule only to what I have control over.
  • “persistent state”: Anything that sticks around after a process exits or a user logs off, that isn’t under version control.  Examples of persistent state are files, registry entries, and environment variables.  Usages include installation, file association, and settings persistence.

Some examples of things my tools or tools I’ve seen have put in or required:

  • Editing 3rd-party application preferences files or adding files to the application’s preferences folder.
  • Copying over scripts or other files out of version control onto the user’s machine.
  • Installing shell extensions.
  • Setting a user’s source control environment variables (P4PASSWD, P4CLIENT, etc).
  • Mapping a temporary drive (that scripts rely on for an absolute path, of course!).
  • Leaving persistent registry or environment variables for the user’s branch, project, etc.
  • Storing preferences for applications in multiple places.

I consider all of these mortal sins, and red flags and warning flares go up when I see them.

Why you shouldn’t do it!

Games development is chaotic.  Computers go through a lot of change, they install a lot of software (first and third party) and uninstall almost as much.  To make matters worse, things often go wrong, and many people are generally writing software and scripts that need to run independently and not interfere with one another.  You can avoid conflicts by not making any persistent changes to a user’s machine.  As long as everything is local to the process, or in some unique files in a well-defined place (AppData/Local/<company or group>/<app or tool name> on Windows), the risk of conflict is almost none.  By leaving the computer in an unmolested state, apps that do cause persistent changes become noticeable and problems more fixable (and it is easier to clean up after offenders if you have 5 suspicious environment variables rather than 50).
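A small sketch of resolving that single folder. The function name `settings_dir` is made up, and the fallback used when LOCALAPPDATA is not set (a non-Windows machine) is purely my assumption, so adjust it to your studio's conventions.

```python
import os


def settings_dir(company, tool, environ=None):
    """Return the one folder where a tool may keep its persistent state."""
    environ = os.environ if environ is None else environ
    # LOCALAPPDATA is set on Windows; the fallback below is an assumption.
    base = environ.get("LOCALAPPDATA") or os.path.join(
        os.path.expanduser("~"), ".local", "share")
    return os.path.join(base, company, tool)


# Example with an explicit environment so the result does not depend on the machine.
example = settings_dir("MyStudio", "exporter", {"LOCALAPPDATA": "LocalAppData"})
```

The function only computes the path; creating the folder is left to the tool, at the moment it actually has something to store.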

Change also happens in unpredictable ways.  While hard-coding a virtual disk drive seems fine, what happens when you need to run your tools on a machine (an outsourcer’s, for example) that already has a drive with that name?  Setting a persistent environment variable indicating the target branch seems fine, but what happens when 4 different tools each store their own (it will happen if you let it!)?

I’m not going to get into installers.  Don’t do it.  I’ve never seen a reason to do it for internal software.  If your studio does it, I wonder how many people actually understand it or can maintain it.  There’s less and less reason to do anything of the sort nowadays- all your python and .NET applications have no need of a traditional installation.  I’d love to be educated about why some studios use installers for their internal tools, so if you have a success (or horror) story I’d love to hear about it in the comments.

Persistence is a drug- Just say No!

I realize now that persistent settings were a deployment drug.  They didn’t make anything easier.  They were an appealing way to either do things I shouldn’t have been doing, or support workflows I shouldn’t have designed.  And global persistent state like this has the additional unfortunate effect of negatively impacting everything else in the system- because everything, and everyone, views them as the same easy solution, or key to complete power and ease over deployment and bootstrapping.

There are options.  I’ll tell you about them in future posts because I don’t have much time now.  In the meantime, join me in taking the Deployment Boy Scout’s Oath:

On my honor, I will do my best, to do my duty to developers and their computers.  To avoid the use of persistent global state, to seek out better solutions to deployment problems, to keep users’ machines clean and under their control, and to keep my code free of such corrupting influences, always.

Run/debug your way to brittle software!

While working on pynocle some time ago, I found myself getting away from TDD and going back to the more traditional “run-debug-fix” pattern.  Write code you think is correct, run it to see if it is, if it isn’t, stick a breakpoint and see what’s wrong, change code, repeat until there are no problems.

While this can often be the quickest way to get something working, it ultimately and always comes back to bite.  I’m happy that I’ve gotten to a point with TDD where I notice this behavior and it makes me feel dirty.  Though not always dirty enough to stop it, especially if I’m in a difficult-to-test environment depending on modules I can’t run from pure python.

The problems with run-debug-fix are many.

  1. The code you are writing is difficult enough that you didn’t write it correctly the first time.  So what makes you think you or someone else is going to have an easy time debugging or understanding it in the future?
  2. If the bug was logical, there was obviously some context, state, or situation you had not thought of.  How are you sure you will remember this context or situation when you change the code in the future?  How can you communicate that your code is relying on a certain state somewhere else?
  3. If your design is not testable, you are making it even less testable by adding more implicit logic where you’re fixing the bug.  Implicit logic that is going to be very difficult to test for when you come back later and forget about it.
  4. Most importantly: Every bug you fix or feature you add using run-debug-fix is a doubly negative activity.  -1 for the reasons above and -1 for the missed opportunity to add a test.  It would be better to leave the bug there or delete the offending code entirely.  You are increasing the complexity of your software by supporting another code path that did not previously work or exist, instead of increasing the stability of your software by adding tests.
The amount of time you spend under the debugger is inversely proportional to the quality of your software.
I used to pride myself on being able to quickly debug and fix problems in my or other people’s code.  I now take far more pride in having code that is well tested so that other people can fix problems without spending a long time debugging them.