Blog of Rob Galanakis (@robgalanakis)

Using Sonar for static analysis of Python code

I’ve been doing static analysis for a while, first with C# and then with Python. I’ve even made an aborted attempt at a Python static code quality analyzer (pynocle, I won’t link to it because it’s dead). About a year ago we set up Sonar (http://www.sonarqube.org/) to analyze the Python code on EVE Online. I’m here to report it works really well and we’re quite happy with it. I’ll talk a bit about our setup in this post, and a future post will talk more about code metrics and how to use them.

Basic Info and Setup

Sonar consists of three parts:

  • The Sonar web interface, which is the primary way you interact with the metrics.
  • The database, which stores the metrics (Sonar includes a demonstration DB, production can run on any of the usual SQL DBs).
  • The Sonar Runner, which analyzes your code and sends data to the database. The runner also pulls configuration from the DB, so you can configure it locally and through the DB.

It was really simple to set up, even on Windows. The web interface has some annoyances which I’ll go over later, and sometimes the system has some unintuitive behavior, but everything works pretty well. There are also a bunch of plugins available, such as for new widgets for the interface or other code metrics checks. It has integrations with many other languages. We are using Sonar for both C++ and Python code right now. Not every Sonar metric is supported for Python or C++ (I think only Java has full support), but enough are supported to be very useful. There are also some worthless metrics in Python that are meaningful in Java, such as lines in a file.

The Sonar Runner

I’ll cover the Runner and then the Website. Every night, we have a job that runs the Runner over our codebase as a whole, and each sub-project. Sonar works in terms of “projects” so each code sub-project and the codebase as a whole have individual Sonar projects (there are some misc projects in there people manage themselves). This project setup gives higher-level people the higher-level trends, and gives teams information that is more actionable.

One important lesson we learned was, only configure a project on the runner side, or the web site. An example are exclusions: Sonar will only respect exclusions from the Runner, or the Web, so make sure you know where things are configured.

We also set up Sonar to collect our Cobertura XML coverage and xUnit XML test result files. Our Jenkins jobs spit these out, and the Runner needs to parse them. This caused a few problems. First, due to the way files and our projects were set up, we needed to do some annoying copying around so the Runner could find the XML files. Second, sometimes the files use relative or incomplete filenames, so parsing of the files could fail because the Python code they pointed to was not found. Third, the parsing errors were only visible if you ran the Runner with DEBUG and VERBOSE, so it took a while to track this problem down. It was a couple days of work to get coverage and test results hooked into Sonar, IIRC. Though it was among the most useful two metrics and essential to integrate, even if we already had them available elsewhere.

The Sonar Website

The Website is slick but sometimes limited. The limitations can make you want to abandon Sonar entirely :) Such as the ability to only few metrics for three time periods; you cannot choose a custom period (in fact you can see the enum value of the time period in the URL!). Or that the page templates cannot be configured differently for different projects (ie, the Homepage for the ‘Entire Codebase’ project must look the same as the Homepage for the ‘Tiny Utility Package’ project). Or that sometimes things just don’t make sense.

In the end, Sonar does have a good deal of configuration and features available (such as alerts for when a metric changes too much between runs). And it gets better each release.

The Sonar API

Sonar also has an API that exposes a good deal of metrics (though in traditional Sonar fashion, does not expose some things, like project names). We hook up our information radiators to display graphs for important trends, such as LoC and violations growth. This is a huge win; when we set a goal of deleting code or having no new violations, everyone can easily monitor progress.

Summary

If you are thinking about getting code metrics set up, I wholeheartedly recommend Sonar. It took a few weeks to get it to build up an expertise with it and configure everything how we wanted, and since then it’s been very little maintenance. The main struggle was learning how to use Sonar to have the impact I wanted. When I’ve written code analysis tools, they have been tailored for a purpose, such as methods/functions with the highest cyclomatic complexity. Sonar metrics end up giving you some cruft, and you need to separate the wheat from the chaff. Once you do, there’s no beating its power and expansive feature set.

My next post will go into more details about the positive effects Sonar and the use of code metrics had on our codebase.

3 thoughts on “Using Sonar for static analysis of Python code

  1. Lior Tal says:

    I also recently checked out Sonar (for Java code).

    I personally dislike the client-server architecture it imposes. Many times i just like to have some code analysis locally to see how new code looks like. This complicates the process a bit.

    Are you using Sonar only for code metrics or also for code analysis? if so, how are you handling the biggest issue of introducing code analysis to a “live” project ? (e.g; gajillion errors during the nightly build)

    1. Hi Lior,

      You can use the Sonar Runner to do a “dry run” and show you the differences in metrics (new violations, etc) in your local code vs. what’s on the server. This is really nifty if you want to use it, though we didn’t use it much.

      I really only used Sonar for metrics, and did not worry about style or code analysis. I rely on the IDE for code analysis and style enforcement. I feel style and code issues should always be caught at the development phase, so rely on tools for it. Your file should always be “green.” If teams aren’t achieving this, I find it requires a different approach than the more nebulous concept of ‘good code vs. bad code.’ Code analysis also happens in Sonar, but it’s not very in-depth (especially for Python), and we handle most issues upstream as described. We usually don’t put in many style checks.

      Regarding introducing code analysis to a live project, I will go into this more in my next post (I’ll also add that not having nearly any style or analysis in the metrics reduces the noise). If how we used Sonar in that context is still unclear, tell me and I’ll attempt to clarify.

Leave a Reply