Archive of articles classified as "pynocle"


Using code metrics effectively

19/02/2014

As the lead of a team and then a director for a project, I came to rely heavily on code metrics. I talked about using Sonar for code metrics in my previous post. Using metrics gets to the very core of what I believe about software development:

  • Great teams create great software.
  • Most people want to, and can be, great.
  • Systemic problems conspire to cheat people into doing less than their best work.

It’s this last point that is key. Code metrics are a conversation starter. Metrics are a great way to start the conversation that says, “Hey, I notice there may be a problem here, what’s up?” In this post, I’ll go through a few cases where I’ve used metrics effectively in concrete ways. This is personal; each case is different and your conversations will vary.

Helping to recognize bad code

A number of times, I’ve worked with one of those programmers who can do amazing things but write code that is unintelligible to mortal minds. This is usually a difficult situation, because they’ve been told what an amazing job they’ve been doing, but people who have to work with that code know otherwise. Metrics have helped me have conversations with these programmers about how to tell good code apart from bad.

While we may not agree on what good code is, we all know bad code when we see it, don’t we? Well, it turns out we don’t. Or maybe good code becomes bad code when we aren’t looking. I often use cyclomatic complexity (CC) as a way to tell good code from bad. There is almost never good code with a high CC. I help educate programmers about what CC is and how it causes problems, giving ample references for further learning. I find that because metrics have a basis in numbers and science, they can counteract bad behaviors that keep getting reinforced simply because those programmers get their work done. These programmers cannot argue against CC, and without exception they have no desire to. They’re happy to have learned how they can keep themselves honest and write better code.

It’s important to help these programmers change their style. I demonstrate basic strategies for reducing CC. Usually this just means helping them split up monolithic functions or methods. Eventually I segue into more advanced techniques. I’ve seen lightbulbs go off, and people go from writing monolithic procedures to well-designed functions and classes, just because of a conversation based in code metrics and followup mentoring.
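
To make that concrete, here is a small, made-up illustration (the order-processing logic is purely hypothetical, not code from any project I’ve worked on): one monolithic function where every decision adds a path, and the same behavior split into single-purpose functions.

```python
# Hypothetical "before": every decision lives in one function, so the number
# of independent paths (and thus the cyclomatic complexity) climbs quickly.
def process_order(order):
    if not order.items:
        return 0
    total = 0
    for item in order.items:
        if item.on_sale:
            total += item.price * 0.9
        else:
            total += item.price
    if order.customer.is_vip:
        total *= 0.95
    if total > 100:
        order.shipping = 0
    else:
        order.shipping = 10
    return total


# "After": the same behavior split into single-purpose helpers. Each piece has
# a CC of about 2, reads on its own, and can be unit tested in isolation.
def item_price(item):
    return item.price * 0.9 if item.on_sale else item.price


def vip_discount(total, customer):
    return total * 0.95 if customer.is_vip else total


def shipping_cost(total):
    return 0 if total > 100 else 10


def process_order(order):
    if not order.items:
        return 0
    total = vip_discount(sum(item_price(i) for i in order.items), order.customer)
    order.shipping = shipping_cost(total)
    return total
```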

I use CC to keep an eye on progress. If the programmer keeps writing code with high CC, I have to work harder. Maybe we exclusively pair until they can stand on their own feet again. Bad code is a cancer, so I pay attention to the CC alarm.

Writing too much code

A curious thing happens in untested codebases: code grows fast. I think this happens because the code cannot be safely reused, so people copy and paste with abandon (also, the broken windows theory is alive and well). I’ve used lines of code (LoC) growth to see where too much code seems to be getting written. Maybe a new feature should grow by a thousand lines a week (based on your gut feeling), but if it has been growing by 3000 lines a week for the last few weeks, I need to investigate. Maybe I learn about some deficiency in the codebase that caused a bunch of code to be written, maybe I find a team that overlooked an already available solution, maybe I find someone who copied and pasted a bunch of stuff because they didn’t know better.

Likewise, bug fixing and improvements are good, so I expect some growth in core libraries. But why are a hundred lines a week consistently added to some core library? Is someone starting to customize it for a single use case? Is code going into the right place, do people know what the right place is, and how do they find out?

LoC change is my second favorite metric after CC, especially in a mature codebase. It tells me a lot about what sort of development is going on. While I usually can’t pinpoint problems from LoC like I can with CC, it does help start a conversation about the larger codebase: what trends are going on, and why.
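
Our graphs came from Sonar, but the underlying number is easy to approximate yourself. The sketch below (illustrative only, not our actual tooling) sums net lines added per week from git history; plot it per project and the outliers described above stand out quickly.

```python
import subprocess
from collections import defaultdict
from datetime import datetime

def weekly_loc_growth(repo_path=".", since="3 months ago"):
    """Return {week: net lines added} for a git repository."""
    log = subprocess.check_output(
        ["git", "-C", repo_path, "log", "--numstat",
         "--since", since, "--format=%H %ad", "--date=short"],
        text=True)
    growth = defaultdict(int)
    week = None
    for line in log.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) == 3:                    # numstat row: added<TAB>removed<TAB>path
            added, removed, _path = fields
            if week and added.isdigit() and removed.isdigit():  # '-' means binary file
                growth[week] += int(added) - int(removed)
        else:                                   # commit header: "<sha> <yyyy-mm-dd>"
            parts = line.split()
            if len(parts) == 2:
                week = datetime.strptime(parts[1], "%Y-%m-%d").strftime("%Y-W%W")
    return dict(growth)

if __name__ == "__main__":
    for week, net in sorted(weekly_loc_growth().items()):
        print(week, net)
```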

Tests aren’t being written

A good metrics collection and display will give you a very clear overview of which projects or modules have tests and which do not. Test counts, coverage numbers, and how they change can tell you loads, not just about the quality of your code, but about how your programmers are feeling.

If coverage is steadily decreasing, there is some global negative pressure you aren’t seeing. Find out what it is and fix it.

  • Has the team painted themselves into a corner at the end of the release, and are they now cutting quality?
  • Is the team being required to constantly redo work, instead of releasing and getting feedback on what’s been done? Are they frustrated and disillusioned and don’t want to bother writing tests for code that is going to be rewritten?
  • Are people writing new code without tests? Find out why, whether it’s due to a lack of rigor or a lack of training. Work with them to fix either problem.
  • Is someone adding tests to untested modules? Give them a pat on the back (after you check their tests are decent).
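
Sonar will draw these coverage trends for you, but the check itself is trivial to reproduce. As a rough sketch (assuming your CI archives one Cobertura XML report per nightly build, which may not match your setup), you can compare the overall line-rate across builds and flag a decline:

```python
import glob
import xml.etree.ElementTree as ET

def coverage_history(pattern="reports/coverage-*.xml"):
    """Overall line coverage per archived Cobertura report, oldest first."""
    history = []
    for path in sorted(glob.glob(pattern)):
        root = ET.parse(path).getroot()   # Cobertura root element: <coverage line-rate="...">
        history.append((path, float(root.get("line-rate", 0)) * 100))
    return history

if __name__ == "__main__":
    history = coverage_history()
    for path, percent in history:
        print("%s: %.1f%%" % (path, percent))
    if len(history) >= 2 and history[-1][1] < history[0][1]:
        print("Coverage is lower than at the start of the period; time for a conversation.")
```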

Driving across-the-board change

I’ll close with a more direct anecdote.

Last year, we ‘deprecated’ our original codebase and moved new development into less coupled Python packages. I used all of the above techniques, along with a number of (private) metrics, to drive this effort, and most of them went up onto visible information radiators:

  • Job #1 was to reduce the LoC in the old codebase. We had dead code to clean up, so watching that LoC graph drop each day or week was a pleasure. Then it became a matter of ensuring the graph stayed mostly flat.
  • Job #2 was to work primarily in the new codebase. I used LoC to ensure the new code grew steadily; not too fast (would indicate poor reuse), and not too slow relative to the old codebase (would indicate the old codebase is being used for too much new code).
  • Job #3 was to make sure new code was tested. I used test count and coverage, both absolute numbers and of course growth.
  • Job #4 was to make sure new code was good. I used violations (primarily cyclomatic complexity) to know when bad code was submitted.
  • Job #5 was to fix the lowest-hanging debt, whether in the new or old codebase. Sometimes this was breaking up functions that were too long; more often it was merely breaking up gigantic (10k+ line) files into smaller files. I was able to look at the worst violations to see what to fix, and work with the programmers on fixing them.

Aside from deleting dead code, I did only a small portion of the coding work directly. The real work was done by the project’s programmers. Code metrics allowed me to focus my time where it was needed in pairing, training, and mentoring. Metrics allowed the other programmers to see their own progress and the overall progress of the deprecation. Having metrics behind us seemed to give everyone a new view on things; people were not defensive about their code at all, and there was nowhere to hide. It gave the entire effort an air of believability and achievability, and made it seem much less arbitrary than it could have been.

I’ve used metrics a lot, but this was certainly the largest and most visible application. I highly suggest investing in learning about code metrics, and getting something like Sonar up on your own projects.


Using Sonar for static analysis of Python code

15/02/2014

I’ve been doing static analysis for a while, first with C# and then with Python. I’ve even made an aborted attempt at a Python static code quality analyzer (pynocle, I won’t link to it because it’s dead). About a year ago we set up Sonar (http://www.sonarqube.org/) to analyze the Python code on EVE Online. I’m here to report it works really well and we’re quite happy with it. I’ll talk a bit about our setup in this post, and a future post will talk more about code metrics and how to use them.

Basic Info and Setup

Sonar consists of three parts:

  • The Sonar web interface, which is the primary way you interact with the metrics.
  • The database, which stores the metrics (Sonar includes a demonstration DB; production can run on any of the usual SQL DBs).
  • The Sonar Runner, which analyzes your code and sends data to the database. The Runner also pulls configuration from the DB, so you can configure things both locally and through the DB.

It was really simple to set up, even on Windows. The web interface has some annoyances, which I’ll go over later, and sometimes the system has some unintuitive behavior, but everything works pretty well. There are also a bunch of plugins available, such as new widgets for the interface or additional code metrics checks. It has integrations for many other languages. We are using Sonar for both C++ and Python code right now. Not every Sonar metric is supported for Python or C++ (I think only Java has full support), but enough are supported to be very useful. Some metrics that are meaningful in Java, such as lines in a file, are fairly worthless in Python.

The Sonar Runner

I’ll cover the Runner and then the Website. Every night, we have a job that runs the Runner over our codebase as a whole and over each sub-project. Sonar works in terms of “projects”, so each code sub-project and the codebase as a whole have individual Sonar projects (there are some misc projects in there that people manage themselves). This project setup gives higher-level people the higher-level trends, and gives teams information that is more actionable.

One important lesson we learned: configure a project either on the Runner side or on the website, not both. Exclusions are one example: Sonar will respect exclusions from either the Runner or the Web, but not both, so make sure you know where things are configured.
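
For reference, Runner-side configuration lives in a sonar-project.properties file next to the code. A minimal sketch for a Python project looks something like the following (the key, name, and exclusion patterns are placeholders, and exact property names can vary between Sonar and plugin versions):

```properties
# sonar-project.properties -- read by the Sonar Runner (not the web UI)
sonar.projectKey=mycompany:tiny-utility-package
sonar.projectName=Tiny Utility Package
sonar.projectVersion=1.0
sonar.language=py
sonar.sources=.
# Exclusions set here are the ones the Runner respects; decide whether they
# live here or in the web UI, and be consistent.
sonar.exclusions=**/migrations/**,**/vendored/**
```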

We also set up Sonar to collect our Cobertura XML coverage and xUnit XML test result files. Our Jenkins jobs spit these out, and the Runner needs to parse them. This caused a few problems. First, due to the way files and our projects were set up, we needed to do some annoying copying around so the Runner could find the XML files. Second, the files sometimes use relative or incomplete filenames, so parsing could fail because the Python code they pointed to was not found. Third, the parsing errors were only visible if you ran the Runner with DEBUG and VERBOSE, so it took a while to track this problem down. It was a couple days of work to get coverage and test results hooked into Sonar, IIRC, but they are two of the most useful metrics and were essential to integrate, even though we already had them available elsewhere.
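
The workaround was mundane file shuffling, roughly like the sketch below (illustrative only, assuming coverage.py-style Cobertura output and a hypothetical CI path prefix): copy the report into the analyzed project and rewrite the per-class filename attributes so the Runner can match them to source files.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

def prepare_cobertura(report, project_root, prefix_to_strip):
    """Copy a Cobertura report into the project and make its filenames project-relative.

    prefix_to_strip is whatever CI-specific prefix the filenames carry,
    e.g. "C:/jenkins/workspace/build/" -- a placeholder, adjust for your environment.
    """
    dest = Path(project_root) / "coverage.xml"
    shutil.copy(report, dest)

    tree = ET.parse(str(dest))
    for cls in tree.iter("class"):              # Cobertura rows: <class filename="...">
        filename = cls.get("filename", "")
        if filename.startswith(prefix_to_strip):
            cls.set("filename", filename[len(prefix_to_strip):])
    tree.write(str(dest), xml_declaration=True, encoding="utf-8")
    return dest
```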

The Sonar Website

The Website is slick but sometimes limited. The limitations can make you want to abandon Sonar entirely :) For example, you can only view metrics for three fixed time periods; you cannot choose a custom period (in fact you can see the enum value of the time period in the URL!). Or the page templates cannot be configured differently for different projects (i.e., the Homepage for the ‘Entire Codebase’ project must look the same as the Homepage for the ‘Tiny Utility Package’ project). Or sometimes things just don’t make sense.

In the end, Sonar does have a good deal of configuration and features available (such as alerts for when a metric changes too much between runs). And it gets better each release.

The Sonar API

Sonar also has an API that exposes a good deal of metrics (though, in traditional Sonar fashion, it does not expose some things, like project names). We hook up our information radiators to display graphs for important trends, such as LoC and violations growth. This is a huge win; when we set a goal of deleting code or having no new violations, everyone can easily monitor progress.
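
The radiators just poll that API on a schedule. A minimal sketch of the kind of query involved (the /api/resources endpoint and response shape here are from memory of that era’s Sonar and may differ in your version, so check your server’s API docs; the server URL and project key are placeholders):

```python
import json
import urllib.request

SONAR_URL = "http://sonar.example.com"   # placeholder server address

def fetch_metrics(project_key, metrics=("ncloc", "violations", "coverage")):
    """Fetch current metric values for one Sonar project via the old /api/resources API."""
    url = "{}/api/resources?resource={}&metrics={}".format(
        SONAR_URL, project_key, ",".join(metrics))
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    measures = data[0].get("msr", [])    # each resource carries its measures under "msr"
    return {m["key"]: m["val"] for m in measures}

if __name__ == "__main__":
    print(fetch_metrics("mycompany:whole-codebase"))   # placeholder project key
```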

Summary

If you are thinking about getting code metrics set up, I wholeheartedly recommend Sonar. It took a few weeks to build up expertise with it and configure everything how we wanted, and since then it’s been very little maintenance. The main struggle was learning how to use Sonar to have the impact I wanted. When I’ve written code analysis tools, they have been tailored for a purpose, such as finding the methods and functions with the highest cyclomatic complexity. Sonar metrics end up giving you some cruft, and you need to separate the wheat from the chaff. Once you do, there’s no beating its power and expansive feature set.

My next post will go into more details about the positive effects Sonar and the use of code metrics had on our codebase.


Pynocle update

2/10/2011

New pynocle uploaded to Google Code (not PyPI yet). This release includes much better dependency graph rendering, module filename resolution, optimizations (such as only calculating dependency data and filename resolutions once), replacing module imports with AST parsing, and other improvements. However, I took out the ability to run pynocle over more than just a single directory :( Hopefully this can be added back in. I really need to refactor and test some of this code, though, since it’s becoming quite hairy. The python import machinery is the first thing I’ve truly hated about python.

Next up is probably a substantial refactor of the entire system to work from a relational DB (or maybe even a python mapping pickle), rather than the ad-hoc method it uses right now.  This should allow richer metrics and more flexibility in their use.

The big backend problems right now are exactly that lack of relational data (which I can hopefully solve with a rewrite), and the difficulty in testing the module import mechanisms. Trying to test it thoroughly means practically rewriting and testing all of python’s module import resolution, and then creating a sample of test data that can replicate all of the hundreds of edge cases (paths with mixed slashes, relative paths, absolute paths with relative pieces, builtin modules with no files, modules not available on the test machine, modules in god-knows-where in the sys path, etc.). I’ve done myself a disservice by not building enough tests where I can build them, though (post about this in a few days). Anyway, I’ll update when I can.

See current metrics here: http://pynocle.googlecode.com/hg/exampleoutput/index.html

Dependency graph here:
http://pynocle.googlecode.com/hg/exampleoutput/depgraph.png


pynocle 0.10 released

26/09/2011

My first useful open source project, pynocle, is finally ready for me to talk about.

Get the code via Hg/GoogleCode here: http://code.google.com/p/pynocle/
Browse the pynocle-generated metrics for pynocle here: http://pynocle.googlecode.com/hg/exampleoutput/index.html

pynocle is a series of modules designed to provide the most comprehensive code analysis of python available from a single source. It is designed to be as dead simple to use as possible: create/configure a pynocle.Monocle object, and run it. You can get by quite well knowing only 2 methods on a single object.
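
I won’t repeat the project docs here, but the shape of that usage is roughly the following; the constructor arguments and method name in this sketch are illustrative guesses, so check the project page for the real signatures.

```python
import pynocle

# Illustrative only: the argument names and the report-generating method are
# guesses at the API shape, not the documented signatures.
monocle = pynocle.Monocle('MyProject', outputdir='exampleoutput', rootdir='mypackage')
monocle.generate_all()   # run every configured metric and write the reports
```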

Right now, pynocle has support for:

  1. Cyclomatic Complexity
  2. Coupling (afferent and efferent)
  3. Google PageRank-like algorithm for measuring coupling (requires numpy)
  4. Source lines of code
  5. Dependency diagrams (requires GraphViz Dot)
  6. Coverage (requires coverage module)

It is intended to run out-of-the-box with minimal work.  Over the coming months, I’m going to add:

  1. More configuration support.  Right now this is truly just an API, which I prefer, but it may make it easier if it can be configured through text.
  2. Runnable from commandline.  I plan to make the whole thing runnable, as well as individual components.
  3. Python easy_install/PyPI support.  Right now, you do it all by hand.
  4. Get it running on Linux.  I am catching a WindowsError in a few places and also am missing the filetype indicator at the top of the files.  I’m not a *nix guy, so if you can help with this, I’d love it (should be simple).
  5. Improve rendering of reports.  Right now, most are in plain text (except dependency images, and coverage html report).  I’d like to make them all some form of HTML.
  6. Add more metrics.  Believe it or not, I’m pretty happy with the current metrics, but I’ll be adding more as time goes on and I get ideas or people ask.

My end goal is to have something comparable to NDepend, but much more limited in scope (both because of the amount of work, and because python’s dynamic nature makes static analysis more restrictive).

This is my first potentially cool open source project.  If you would like to contribute, great!  Please email me.  If you have any advice for me, I’d love that too!  What’s involved in ensuring this project is successful and adopted?