Blog of Rob Galanakis (@robgalanakis)

All languages need first class tuples

I was doing some work on a Flask app today and built up some chart data in Python: a list of two-item tuples like [(<iso datetime string>, <value>), ...]. I needed to iterate over this same structure in JavaScript, and missing it there reminded me how great it is that Python lets me unpack these things easily, à la “for datestr, value in chartdata”. But rather than keep these as tuples and play around with non-idiomatic JS, I just turned each tuple into a dictionary, so my chart data became a list of dicts instead of a list of tuples.

I really dislike having to use objects (or names, more precisely) for these micro data structures. Over time I’ve moved more and more away from creating classes for data structures in Python, a ‘best practice’ habit I brought over from C#. It is silly in a dynamically typed language. There’s nothing more clear about:

for entry in chartdata:
    chart.add(entry.x, entry.y)

Than:

for x, y in chartdata:
    chart.add(x, y)

In fact it’s probably less clear, because there is the totally unhelpful variable name “entry.”

At some point (maybe even three items?) it becomes clearer and more self-documenting to use names (that is, dicts instead of tuples), but for the very common case of simple collections used by nearby code, tuples and automatic unpacking can’t be beat!
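For concreteness, here is a minimal Python sketch of the two shapes. The sample dates, the `add_point` helper (a stand-in for `chart.add`), and the dict keys `datestr`/`value` are all hypothetical. It also shows what the JavaScript side receives: JSON has no tuple type, so `json.dumps` emits each tuple as a two-element array, while each dict becomes an object with named fields.

```python
import json

# Hypothetical chart data: a list of (iso datetime string, value) tuples.
chartdata = [
    ("2013-05-01T00:00:00", 3),
    ("2013-05-02T00:00:00", 7),
]

added = []

def add_point(x, y):
    """Hypothetical stand-in for chart.add: just records each point."""
    added.append((x, y))

# Tuple unpacking: each two-item tuple is split straight into loop names.
for datestr, value in chartdata:
    add_point(datestr, value)

# The list-of-dicts variant (the key names are assumptions).
chartdicts = [{"datestr": d, "value": v} for d, v in chartdata]

# JSON has no tuple type: tuples serialize as arrays, dicts as objects.
as_arrays = json.dumps(chartdata)
as_objects = json.dumps(chartdicts)
print(as_arrays)
print(as_objects)
```

Either payload carries the same data; the only difference is whether the consumer reads each entry by position or by name.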

14 thoughts on “All languages need first class tuples”

  1. Kwpolska says:

    For God’s sake, you can use a list for the exact same purpose. Or any other list-like structure in any other language.

  2. Adam Skutt says:

    Dicts and tuples aren’t interchangeable structures and it shouldn’t be suggested that they are. If you need a tuple and want the entries to have names, then use namedtuple, not a dict.

    1. Adam: I don’t care about the entries having names. It was basically a list of keys and values (changelist to value, or something like that). Of course they’re not interchangeable in general but in the simple case, as I stated, they certainly are. Returning `(changelist, value)` from a function vs. `{'changelist': changelist, 'value': value}` is basically the same thing.
      Kwpolska: Right, it’s the automatic unpacking that I’m missing. Obviously you can use arrays in JS like Python lists. But you can’t unpack them automatically, like during iteration: `for changelist, value in collection`, you would have to say the equivalent of `for item in collection: changelist = item[0]; value = item[1]`.

  3. Adam Skutt says:

    Well, for starters, unpacking a dictionary doesn’t work the way you think it does.

    However, even if it did, the fact that lookups and mutability between the two are important and shouldn’t be ignored for syntactic convenience.

  4. Adam Skutt says:

    The fact that lookups and mutability between the two are different. My apologies.

    1. Not sure what you mean by “unpacking a dictionary”? Also I’m not sure what you mean by ignoring the lookups and mutability. This is a controlled use case and despite any argument about right and wrong from an ideological level, yes this is really about being convenient (easy to understand, write, and maintain). Certainly lists, tuples, and dicts are different types with different uses, but equally certainly there are times when the right type isn’t black and white?

      The main point of my post is I like being able to write “for changelist, value in collection: use(changelist); use(value)” rather than “for item in collection: use(item[0]); use(item[1])” or “for item in collection: use(item.changelist); use(item.value)”

  5. Adam Skutt says:

    Yes, I get your main point. But then you complained that the items within the tuple don’t have names, and that if one wants names, one should use a dict. But that’s emphatically wrong in Python, because we have namedtuple, which lets us have our cake and eat it.

    You get both:
    import collections
    Point = collections.namedtuple('Point', ['x', 'y', 'z'])
    point = Point(1, 2, 3)
    x, y, z = point                                      # unpack positionally...
    point.x*point.x + point.y*point.y + point.z*point.z  # ...or access by name

    The comment about unpacking a dictionary was meant to show how absurd it is to mention dicts when you’re talking about unpacking, because:

    foo, bar = {'foo': 1, 'bar': 2}

    almost certainly doesn’t do what you expect. A dictionary is not a substitute for the feature you want in any way, shape or form, or even its logical extension (i.e., unnamed record type -> named record type). They’re not interchangeable in any case, much less the “simple case”. To suggest otherwise means you don’t really grasp what’s going on when you’re writing the code you like so much.

    On top of all of that, it’s ideologically poor because data structures should be based on their defining attributes, not syntactic convenience. In other words: select a tuple because it’s the right data structure for the code logically, not because Python provides them with extra syntactic sugar. Reasoning about code is difficult enough even when programmers aren’t choosing data structures out of laziness.

  6. Daniel Watts says:

    I would suggest that deconstruction or pattern matching over common data structures is the killer feature, rather than tuples specifically.

    Tuples are lovely and all, and have nice sugar associated with them in Python, but it’s a generally useful feature to be able to extract values from a data structure and bind them to names at the same time.

    A Clojure-y example might be:

    (for [[pos vel & more] some-sequence]
      (let [{px :x, py :y} pos] (+ px py)))

    Clojure doesn’t have tuples, but allows you to use a vector, map etc. on the left hand side of a binding form. There is a magic keyword (:keys) to reduce clutter when you want to unpack a map using the same word for the keyword in the dict and the name to bind to.

    Haskell does a similar trick, although you need to invoke a language extension to get nice syntax when unpacking records IIRC. Haskell and (functional style) Scala make more use of ADTs, giving a differently sugary symmetry when pattern matching by using the constructor function to pull out values you’re interested in:

    let (Vector x y _) = someVector
        [a,b,c] = someListOfExactlyThreeThings
        (one : two : three : more) = someListOfAtLeastThreeThings
        (foo, bar) = someTuple
    in
        (x+y) -- what a waste of all those names…

    Scala muddies the waters by allowing you to specify deconstructors I believe, but that is odd magic and not the default.

    Err, yes. Distracted. But anyway: all languages should support at least some form of deconstructing or pattern matching! And more languages should support tuples, because they also have useful properties.

  7. Daniel Watts says:

    Oh, and @Adam:

    I use Python only with the greatest reluctance, but even I think you’re going a bit over the top there. I can envisage many cases where you have a collection of dicts that are known to contain at least some keys but have arbitrary quantities of additional data. Why wouldn’t you use them in that case? Or maybe you need the ::shudder:: mutability offered by dicts?

    Granted, if you *could* deconstruct a dict it’d be cleaner:
    for {'x': x, 'y': y} in [{'x': 3, 'y': 7}, {'x': 2, 'y': 4, 'tag': 'lemon'}]: foo(x*x + y*y)

    Anyway, in general I object to “data structures should be based on their defining attributes, not syntactic convenience.” Syntactic convenience is a wonderful attribute, and should be actively encouraged everywhere. If (for example) your language of choice makes working with tuples orders of magnitude easier than working with record types or whatever, such that you use the former even when the latter has more useful properties otherwise, I’d suggest your language is broken. Languages are for humans first and foremost after all!

    Granted you should try not to let your code degrade into unreadable sludge (although frequently code that is convenient to write is also easy to read) or run like molasses, but if doing so comes at a massive convenience cost then a design somewhere has failed.

  8. Thanks for that, Daniel. You’ve narrowed down the important stuff better than I had, and thanks for the information.

    Adam, if you really feel I “don’t grasp what’s going on” when I program, then I really thank you for all the effort you spend trying to educate me. Here I was all this time thinking the reason you comment so much on this blog was because you didn’t like me! Thanks for explaining how dictionary unpacking works, I’ve never even thought of using it but will now make sure I use it every chance I get.

  9. Adam Skutt says:

    Daniel:

    I can envisage many cases where you have a collection of dicts that are known to contain at least some keys but have arbitrary quantities of additional data. Why wouldn’t you use them in that case?

    You would, because they fit that scenario and a tuple does not fit that scenario.

    Anyway, in general I object to “data structures should be based on their defining attributes, not syntactic convenience.”

    Your objection really boils down to the final line, “Languages are for humans first and foremost after all!” and while that might be true, it doesn’t matter since computer programs are for the users, not their authors.

  10. Daniel Watts says:

    Adam:

    Sorry, I should have spent more time on that bit. I’d argue that “computer programs are for the users, not their authors.” is only true for dead programs, unchanging lumps of executable code.

    For every other program, authors are a distinct category of users too. They interact with the program differently, to be sure, as they’re mostly prodding at source code rather than the binary blob or whatever comes out the other end of your build process. But it seems reasonable that the easier it is to comprehend and modify the source, the better it should be for the final product. Faster iteration, fewer trivial bugs, all that good stuff!

    From this point of view syntactic convenience benefits everyone, and *should* be a strong motivator for picking a given data structure or representation over the others, weighed against all the other concerns such as performance, import dependencies, memory overhead, sequence determinism etc etc as dictated by your domain. As I said, it’s one important property among many, but to disregard it entirely because it only affects the people writing and maintaining the code… I don’t buy that. The experience of those people matters too, even if I am biased :)

    It is important to make such decisions with good taste and awareness of your co-authors, of course, as different programmer-users have different tolerances for when concision becomes obfuscation.

    Rob: I actually missed the bit where you mentioned moving away from classes and records and towards lightweight combinations of basic data types, but good on ya. The boilerplate associated with class definition and object instantiation, even in Python, just feels too cumbersome for some purposes. Tuples, lists and dicts can do many lovely things all on their own!

  11. @Rob, I agree with your post, because my own Python has followed the same pattern over the years: fewer classes, and more quick lists of tuples, especially in places where a library is just “talking to itself” and not presenting an API for general use (where tuples are hard to expand later if you want to include more information than was provided in the first version of the API).

    But one place where my habit is slightly different than yours is that I tend to choose the tuple→object upgrade more often than tuple→dict — though I have certainly done both before, and they both work!

    Why do I choose objects? I think it has to do with what the semantics of a dictionary mean to me. Generally, a program is a mix of programmer-chosen data, like the names we choose for our data structures, and external-real-world-chosen data, like the form fields from a web request. When I see a Python object, it suggests to me that the attribute names are fixed and are programmer-chosen and are not going to vary from one run of the code to the next. But when I see a dictionary, I tend to assume on first glance that BOTH its keys and values — not just the values — are an externally driven or externally specified set of values.

    So “foo.bar” looks safe, because we generally create objects with their attributes set up consistently. But “foo['bar']” looks like it needs some safety checks around it, because who knows whether the dict has that key present this time around or not?

    Of course this difference might only reflect my own habits, but since I agree with @Daniel that code readability is critically important, I thought I would mention that object and dictionary use “look different” to me semantically in a way that makes a tuple→namedtuple→object progression make the most sense to my eyes.

    Thanks for bringing the subject up!

    1. Honor to have you comment, Brandon. While I’m a big fan of the namedtuple concept, I wish they were a) builtin and b) didn’t require two statements (defining the ‘class’, instantiating). I guess what I want is something like a ‘Frozen Bundle’.

      I’m sure I’d use them much more often because while what you are talking about (dot-access vs. key-lookup) is definitely preference, I think yours is a good preference :) I just can’t bring myself to create data-only types, because they are either verbose or unclear.

      Always such a tradeoff… I thought there was supposed to be only one way of doing things!
