Archive of articles classified as "functional programming"


All languages need first class tuples

6/12/2013

I was doing some work on a Flask app today and built up some chart data in Python that had a list of two-item tuples like [(<iso datetime string>, <value>), ...]. I needed to iterate over this same structure in JavaScript and of course was reminded how great it is that I can unpack these things easily in Python à la “for datestr, value in chartdata” because I was missing it so much in JavaScript. But rather than keep these as tuples and have to play around with non-idiomatic JS, I just turned each tuple into a dictionary, so my chart data became a list of dicts instead of a list of tuples.

I really dislike having to use objects (or names, more precisely) for these micro data structures. Over time I’ve moved more and more away from creating classes for data structures in Python, a ‘best practice’ habit I brought over from C#. It is silly in a dynamically typed language. There’s nothing more clear about:

for entry in chartdata:
    chart.add(entry.x, entry.y)

Than:

for x, y in chartdata:
    chart.add(x, y)

In fact it’s probably less clear, because there is the totally unhelpful variable name “entry.”

At some point- maybe even three items?- it becomes more clear and self-documenting to use names (that is, dicts instead of tuples), but for the very common cases of simple collections used by nearby code, tuples and automatic unpacking can’t be beat!
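To make the comparison concrete, here is a minimal sketch of both shapes in Python. Only the (ISO datetime string, value) shape comes from the post; the sample values and the "date"/"value" key names are made up for illustration.

```python
# Hypothetical chart data: a list of (ISO datetime string, value) tuples.
chartdata = [("2013-12-06T09:00:00", 1.5), ("2013-12-06T10:00:00", 2.25)]

# Tuple form: each item is unpacked right in the loop header.
tuple_total = 0
for datestr, value in chartdata:
    tuple_total += value

# Dict form (the shape handed to JavaScript): every access goes through a name.
chartdata_dicts = [{"date": d, "value": v} for d, v in chartdata]
dict_total = 0
for entry in chartdata_dicts:
    dict_total += entry["value"]
```

Both loops do the same work; the tuple version simply never needs the intermediate name.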


Python Singletons

18/11/2012

In my opinion, good python libraries and frameworks should spend effort guiding you towards the ‘pit of success’, rather than trying to keep you from failing. They do this by spending most effort on things related to the critical path- clear interfaces, simple implementations, thorough documentation.

Which is why singletons are, to me, the worst form of framework masturbation in python. You will never be able to stop people from doing something stupid if they’re determined (in pure python). In the case of a singleton, that means instantiating more than one instance of a type. So spending effort on ‘designing’ singletons is not just a waste of effort, but actively harmful. Just provide a clear way to use a single instance, and your system should fail clearly if it detects an actual problem due to multiple instances (as opposed to, trying to detect multiple instances to keep said problem from happening).

The best method for singletons in python, then, is- whatever is simplest!

  1. Some form of module or class state is, to me, the clearest. It requires someone reading or using your code to know nothing more than the most basic python. Just prefix your class def with an underscore, and expose an accessor function to an instance stored on the module (or on the class). The capacity for failure is minimal and the behavior is clear (it requires no behavior modification to the type itself).
  2. Overriding __new__ is pretty bad but OK. It requires someone to understand the subtleties of __new__, which is a useful thing to teach someone but, are singletons really the time and place?
  3. Using a metaclass is a terrible solution. It has a higher likelihood of failure (how many people understand the nuances of metaclasses!?). Misdirection even for people just reading your code, trying to understand your type’s behavior. Avoid.

The question to ask yourself before doing any of this is, “is a singleton a technical requirement or an architectural preference?” For example, a single instance of an application event loop (QApplication, etc) I’d consider a technical requirement and make it foolproof (in C?). But technical requirements are few and far between and should be driven by underlying system/OS requirements rather than your code’s design or architecture. If it’s an architectural preference- “there should only be one instance of this manager/window/cache”- there’s absolutely no reason to confuse your code (especially your object’s behavior!) to achieve it. Just use design, documentation, and examples to show people the right way to use it.

Passing around complex objects is the opposite of encapsulation

15/02/2012

I see this a lot:

class Foo:
    spam = None
    eggs = None

def frob(foo):
    return sprocket(str(foo.eggs))

f = Foo()
s = frob(f)

It tends to be more sinister, and difficult to see, in verbose examples. But generally it is easily identified by the called method using a single attribute or method from the object passed in (or multiple in longer functions that should be split up ;) ). Sometimes I bring this up and say, “pass in the value directly,” and the ‘why’ clicks right away. Sometimes people (including my older self) say “but taking in a ‘foo’ encapsulates my method!”

I guess.  It certainly hides the detail that `frob` needs only `.eggs` and doesn’t also need `.spam`. But you’ve also coupled the implementation of `frob` to the interface of `Foo`. So you’ve achieved encapsulation by greatly increasing coupling.

Of the two, I’d vastly prefer a method that must take additional parameters if its implementation changes (i.e., if it needs access to `.spam`) over one that increases coupling. High coupling leads to brittle, untestable, and non-reusable code. Changing the interface of a method leads to… what exactly?

Not only that, but the contract of a method is much clearer (to both callers and maintainers) if it takes in meaningful parameters, rather than a single object from which it reads a bunch of properties. It conveys more information for callers, and establishes what it is supposed to do to maintainers (who will not be able to just get or set the attribute of an object that happened to be passed into that method because it was a convenient place to do so).
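A sketch of the decoupled version of the earlier example, assuming a trivial stand-in for `sprocket` (the post never defines it, so the body below is made up):

```python
def sprocket(text):
    # Hypothetical stand-in; the real sprocket is not shown in the post.
    return "sprocket:" + text


class Foo:
    spam = None
    eggs = None


def frob(eggs):
    # frob now states exactly what it needs and knows nothing about Foo.
    return sprocket(str(eggs))


f = Foo()
f.eggs = 42
s = frob(f.eggs)
```

With this signature `frob` can be called, and tested, with any value at all; no `Foo` has to be built first.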

So it is usually vastly preferable to take in the values the function uses, rather than pass around complex objects, and in fact this is a common design paradigm in functional programming. But obviously I’m not just using strings and ints everywhere. So what guidelines do I follow?

  1. Immutable objects are fine to pass around (though prefer the advice about just listing what the function takes as per above).
  2. Mutable objects should never be passed around, as I consider creating an object and passing it to a method that mutates it one of the greatest sins in OOP.

Large initializers/ctors?

26/01/2012

With closures (and to some extent with runtime attribute assignments), I find the signatures of my UI types shrink and shrink. A lot of times we have code like this (python, but the same would apply to C#):

class FooControl(Control):
  def __init__(self, value):
    super(FooControl, self).__init__()
    self.value = value
    self._InitButtons()

  def _InitButtons(self):
    self.button = Button('Press Me!', parent=self)
    self.button.clicked.addListener(self._OnButtonClick)

  def _OnButtonClick(self):
    print(id(self.button), self.value)

However we can easily rewrite this like so:

class FooControl(Control):
  def __init__(self, value):
    super(FooControl, self).__init__()
    btn = Button('Press Me!', parent=self)
    def onClick():
      # value is captured by the closure, so it no longer has to live on self
      print(value)
    btn.clicked.addListener(onClick)

Now this is a trivial example. But I find that many types, UI types in particular, can have most or all of these callback methods (like self._OnButtonClick) removed by turning them into inner functions. And then as you turn them into inner functions in init, you can get rid of stored state (self.value and self.button).

But as we take this to the extreme, we end up with very simple classes (and in fact I could replace FooControl with a function, it doesn’t need to be a class at all), but very long init methods (imagine doing all your sub-control creation, layout, AND all callback functionality, inside of one method!).
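Taken all the way, the class can indeed become a function. The sketch below is only an illustration: Signal, Control, and Button are minimal hypothetical stand-ins for a real UI toolkit, and make_foo_control and its on_output parameter are invented names (the default assumes Python 3's print function).

```python
class Signal(object):
    """Hypothetical stand-in for a toolkit event."""

    def __init__(self):
        self._listeners = []

    def addListener(self, fn):
        self._listeners.append(fn)

    def emit(self):
        for fn in self._listeners:
            fn()


class Control(object):
    """Hypothetical base control that just tracks its children."""

    def __init__(self, parent=None):
        self.children = []
        if parent is not None:
            parent.children.append(self)


class Button(Control):
    def __init__(self, text, parent=None):
        super(Button, self).__init__(parent)
        self.text = text
        self.clicked = Signal()


def make_foo_control(value, on_output=print):
    """Build the control; layout and callbacks all live in this one function."""
    control = Control()
    btn = Button('Press Me!', parent=control)

    def onClick():
        # value comes from the enclosing scope; nothing is stored on the control.
        on_output(value)

    btn.clicked.addListener(onClick)
    return control
```

The returned object is a plain Control with no extra attributes or methods, so there is nothing on it for outside code to grow a dependency on.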

I’ve decided I’d rather have a long init method, usually broken up into several inner functions, than a larger signature on the class with layout, callbacks, and stored state. In my mind, it is easier to pull something out into a type attribute later than to remove one, as anything on the type is liable to be used externally. And breaking up your layout into instance methods that can really only be called once (_InitButtons), from the init, adds a cognitive burden for me.

So I can justify this decision to eliminate extra attributes rationally, but what seals the deal is, I’m not unit testing any of this code anyway. So whether it is in one long method, or broken up into several methods, it isn’t getting tested.

I started out as very much in the ‘break into small methods’ camp but have wholesale moved into the ‘one giant __init__ with inner functions’ camp. I’m curious what you all prefer and why?


Don’t use global state to manage a local problem

25/09/2011

Just put this up on altdevblogaday: http://altdevblogaday.com/2011/09/25/dont-use-global-state-to-manage-a-local-problem/


I’ve ripped off this title from a common trend on Raymond Chen of MSFT’s blog.  Here are a bunch of posts about it.

I can scream it to the heavens but it doesn’t mean people understand.  Globals are bad.  Well, no shit Sherlock.  I don’t need to write another blog post to say that.  What I want to talk about is, what is a global.

It’s very easy to see this code and face-palm:

# module-level (global) mutable state
spam = list()
eggs = dict()
lastIndex = -1

But I’m going to talk about much more sinister types of globals, ones that mingle with the rest of your code possibly unnoticed. Globals living amongst us. No longer! Read on to find out how to spot these nefarious criminals of the software industry.

Environment Variables

There are two classes of environment variable mutation: acceptable and condemning.  There is no ‘slightly wrong’, there’s only ‘meh, I guess that’s OK’, and ‘you are a terrible human being for doing this.’

  1. Acceptable use would be at the application level, where environment variables can be read or set with care, as something needs to configure the global environment.  Acceptable would also be setting persistent environment variables in cases where that is very clearly the intent and it is documented.  Don’t go setting environment variables willy-nilly, most especially persistent ones!
  2. Condemning would be the access of custom environment variables at the library level.  Never, ever access environment variables within a module of library code (except, perhaps, to provide defaults).  Always allow those values to be passed in.  Accessing system environment variables in a library is, sometimes, an Acceptable Use.  No library code should set an environment variable, ever.
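One way to follow these rules, sketched in Python. The variable name MYAPP_ASSET_ROOT and both function names are hypothetical; the point is only where the environment gets read.

```python
import os


def texture_dir(asset_root):
    """Library code: the value is passed in; os.environ is never touched here."""
    return os.path.join(asset_root, "textures")


def main(environ=None):
    """Application entry point: the one place that consults the environment."""
    environ = os.environ if environ is None else environ
    # The environment only supplies a value; a default covers the unset case.
    asset_root = environ.get("MYAPP_ASSET_ROOT", "assets")
    return texture_dir(asset_root)
```

Because `texture_dir` takes a plain argument, it can be exercised in a test or reused in another tool without touching the process environment.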

Commandline Args

See everything about Environment Variables and multiply by 2.  Then apply the following:

  1. Commandline use is only acceptable at the entry point of an application.  Nothing anywhere else should access the commandline args (except, perhaps to provide defaults).
  2. Nothing should ever mutate the commandline arguments.  Ever!
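A minimal sketch of the same idea for commandline args, using the standard argparse module. The exporter, its arguments, and the function names are invented for the example.

```python
import argparse


def export_scene(path, verbose):
    """Library code: receives plain values and never looks at sys.argv."""
    return "exported %s%s" % (path, " (verbose)" if verbose else "")


def main(argv=None):
    """Entry point: the only place the commandline is parsed."""
    parser = argparse.ArgumentParser(description="Hypothetical exporter")
    parser.add_argument("path")
    parser.add_argument("--verbose", action="store_true")
    # argv=None makes argparse read sys.argv[1:]; nothing is mutated.
    args = parser.parse_args(argv)
    return export_scene(args.path, args.verbose)
```

Accepting an optional `argv` list also means the entry point itself can be called from a test without patching anything global.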

Singletons

I get slightly (or more than slightly) offended when people call the Singleton a ‘pattern.’  Patterns are generally useful for discussing and analyzing code, and have a positive connotation.  Singletons are awful and should be avoided at all costs.  They’re just a global by another name- if you wouldn’t use a global, don’t use a singleton!  Singletons should only exist:

  1. at the application level (as a global), and only when absolutely necessary, such as an expensive-to-create object that does not have state.  Or:
  2. in extremely performance-critical areas where there is absolutely no other way.  Oh, there’s also:
  3. where you want to write code that is unrefactorable and untestable.

So, if you decide you do need to use a global, remember, treat it as if it weren’t a global and pass it around instead (i.e., through dependency injection).  But don’t forget: singletons are globals too!

Module-level/static state

Module-level to you pythonistas, static to you C++/.NET’ers.  It’s true- if you’re modifying state on a static class or module, you’re using globals.  Generally the only place this belongs is caching (and even then, I’d urge you to reconsider).  If you’re modifying a module’s state- and then acknowledging what you’re doing by, like, having to call ‘reload’ to ‘fix’ the state- you’re committing a sin against your fellow man.  Remember, this includes stuff like ‘monkeypatching’ class or module-level methods in python.

The Golden Rule

The golden rule that I’ve come up with for globals is: if I can’t predict the implications of modifying state, I find a way not to modify state.  If something else you don’t definitely know about is potentially relying on a certain state or value, don’t change it.  Even better, get rid of the situation.  This means you keep all globals and anything that could be considered a global (access to env vars, singletons, static state, commandline args) out of your libraries, entirely.  The only place you want globals is at the highest level application logic.  This is the only way you can design something where you know all the implications of the globals, and rigorously sticking to this design will improve the portability of your code greatly.

Agree?  Disagree?  Did I miss any pseudonymous globals that you’ve had to wrangle?


WTFunctional: Be Declarative

25/07/2011

Functional programming is one of the most important developments in programming, but one that has been understandably slow to be adopted and understood by many programmers and tech artists.  Over a few posts, I’m going to try to go into the how and why of using a more functional style in your daily programming activities.

First up is demonstrating that functional programming is declarative: it makes your code more expressive and optimized.

Most programmers are used to seeing this:

evens = []
i = 0
while i <= 10:
  if i % 2 == 0:
    evens.append(i)
  i += 1
# evens is now [0, 2, 4, 6, 8, 10]

Less familiar would be:

evens = filter(lambda i: i % 2 == 0, range(0, 11))

The first focuses on the how: increment i from 0 to 10, and append every even item (including 0) to a list.  This is an imperative style.  The second focuses on the what:  for each item from 0 to 10, select all even items.  This is a declarative style, which is an aspect of functional programming.  In this trivial case, the difference is, well, trivial.  But the key differences are:

  1. The declarative style does not specify the enumeration mechanism- it uses the ‘range’ function, rather than incrementing explicitly (as a classic counting loop does).
  2. The declarative style does not specify the filtering mechanism- it uses a ‘filter’ function, rather than an explicit ‘if’ statement.
  3. The declarative style does not specify the storage mechanism- it usually just returns any type that can be enumerated/iterated over, not a concrete type like a list/array/etc.

These differences create three key benefits:

  1. The abstracted enumeration mechanism means the enumeration mechanism can be optimized, and doesn’t have to be considered by the user.
  2. The abstracted filtering means the filtering can be optimized because its implementation is hidden from the user, and its intention is more explicit- this is the declarative part of it.  We’ll see how to read a more complex statement next.
  3. The abstracted storage mechanism grows out of the other two abstractions- there may not be a storage mechanism at all, but possibly just generators- it really depends on what is expedient for the statement.
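The third point can be seen directly in Python 3, where both range and the built-in filter are lazy. This is a small sketch; the variable names are arbitrary.

```python
# Neither range nor filter builds a list here; evens is a lazy iterator.
evens = filter(lambda i: i % 2 == 0, range(0, 11))

# Values are produced only on demand.
first = next(evens)        # pulls just enough of the range to find one match
remaining = list(evens)    # a concrete list is built only now, from what is left
```

The caller decides if and when a list gets created; the filtering expression itself never mentions storage.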

Let’s try out a more concrete example.  In this case, we’ll be doing some complex enumeration- grouping, sorting, and projecting.  We want to get a collection of MyObject from active table rows that are ordered by date and then by ID.

dateAndItemsMap = dict()
for row in myTable.rows:
    if row.isActive:
        if row.date not in dateAndItemsMap:
            dateAndItemsMap[row.date] = list()
        dateAndItemsMap[row.date].append(MyObject(row))
# sort the dates (the dictionary keys), not the lists of items
sortedDates = sorted(dateAndItemsMap.keys())
itemsSortedByDateThenId = list()
for date in sortedDates:
    items = dateAndItemsMap[date]
    items.sort(key=lambda obj: obj.id)
    itemsSortedByDateThenId.extend(items)

Wow, that’s a lot of code!  And not at all clear when reading it.  Let’s read it: Create a dictionary, and for each row, if it is active, make sure the map has a list for the row’s date, and append a new MyObject to the list at row.date in the map.  Then sort the keys, then iterate over the sorted keys, get the sorted list value, and keep extending the result list.  That’s a mouthful, and I think that was pretty brief.

Let’s compare this to the declarative style:

myTable.rows.filter(lambda r: r.isActive).select(lambda r: MyObject(r)).order_by(lambda o: o.date).then_by(lambda o: o.id)

One line?  One stinking line?  Let’s read it: For each row that is active, select a new MyObject, and order those by dates, and then by id.  Notice a) how the explanation expresses what you want, not how you want to get it, and b) the explanation reads very similar to the code.
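Python lists have no fluent filter/select/order_by methods, so the line above is best read as LINQ-style pseudocode. A rough Python equivalent, assuming a hypothetical Row shape and a MyObject that copies id and date from its row, might look like this:

```python
from collections import namedtuple

# Hypothetical stand-ins for the table rows and MyObject from the example.
Row = namedtuple("Row", "id date isActive")


class MyObject(object):
    def __init__(self, row):
        self.id = row.id
        self.date = row.date


rows = [
    Row(2, "2011-07-02", True),
    Row(1, "2011-07-02", True),
    Row(3, "2011-07-01", True),
    Row(4, "2011-06-30", False),
]

# Filter and project with a generator expression, then order by (date, id).
items = sorted(
    (MyObject(r) for r in rows if r.isActive),
    key=lambda o: (o.date, o.id),
)
```

The tuple key plays the role of "order by date, then by id", and the generator expression covers the filter and the select.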

This is why declarative programming rocks, right now.  It is worth its weight in gold to learn how to use LINQ in C#, itertools in python, or whatever declarative querying mechanism your language hopefully has.  Your code will become infinitely clearer.

Being declarative will be even more awesome in the future, when we can ‘prove’ software to be side-effect free (pure), and the compiler or runtime can automatically parallelize it and optimize it.  This is one reason languages like SQL have been so effective- the software/hardware can actually reorder or adjust your query to optimize it, and those algorithms or optimizations can change because the language itself has no notion of how the algorithms for JOIN, GROUP BY, etc. are implemented.

That makes sense, I hope, and it is just one benefit of learning about functional programming.  Next up will probably be closures.
