2017-09-18

Pass by what?

"Passing objects by value" is a horrible oxymoron. As Stallman says for the term "intellectual property", it is a "seductive mirage". It tries to conflate things that were never meant to be conflated, and in fact cannot be conflated consistently.

It is a clash of two philosophies. In one of them, we handle values, which are unchangeable elements of abstract types viewed as mathematical sets, and functions acting on them are ordinary mathematical functions, producing new values without changing the old ones in any way. f(x)=x^2+3x+5 surely doesn't change x. This naturally leads to a functional paradigm, where the very idea of "changing state" is either totally foreign, or at least requires explicit transfer of state (in this case, a namespace containing x) through the called function.
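To make the first philosophy concrete, here is a minimal Python sketch. The function f is the one from the text; increment_x and the dictionary standing in for "a namespace containing x" are illustrative assumptions, just one of many ways to spell explicit state transfer.

```python
def f(x):
    """An ordinary mathematical function: builds a new value, leaves x alone."""
    return x**2 + 3*x + 5


def increment_x(namespace):
    """'Changes' x only by returning a new namespace (hypothetical helper).

    The state goes in as an argument and comes out as the result;
    the namespace that was passed in is not touched.
    """
    return {**namespace, "x": namespace["x"] + 1}


x = 2
y = f(x)                  # y is 15, x is still 2

old = {"x": 2}
new = increment_x(old)    # old is still {"x": 2}, new is {"x": 3}
```

Whoever wants the "changed" state has to take it from the return value; nothing happens behind the caller's back.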

In the other one, we handle objects, which are representations of real-world objects, elements of abstract classes as the blueprints of their specification, and as such absolutely changeable with the passing of time. Functions acting on them (methods) are not mathematical functions, but algorithms, in which the natural elementary step is changing the state of some object (or the construction of a new object---copying an object is a special case of this---but surely explicitly called, never implicit in something like a function call). This naturally leads to the imperative paradigm, where the idea of "changing state" is fundamental and the state is always implicit, in the same way the real world is always implicit. When you ask your son to "go to the shop and buy 10 eggs", you don't pass him the Universe as a parameter (for goodness' sake!), and you expect, as a matter of course, that after such an action is executed (if it doesn't raise a TheyHadNoMoreEggs exception), some external shop will have 10 eggs fewer. And you're perfectly fine with that. You must be, since it's how the real world works.
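The egg errand can be sketched in a few lines of Python. Only the TheyHadNoMoreEggs name comes from the text; the Shop and Son classes, their eggs attribute, and the sell_eggs method are assumptions made up for this illustration.

```python
class TheyHadNoMoreEggs(Exception):
    """Raised when the shop cannot supply the requested number of eggs."""


class Shop:
    def __init__(self, eggs):
        self.eggs = eggs

    def sell_eggs(self, n):
        if n > self.eggs:
            raise TheyHadNoMoreEggs(f"wanted {n}, shop has {self.eggs}")
        self.eggs -= n        # the elementary step: change the shop's state
        return n


class Son:
    def __init__(self):
        self.eggs = 0

    def buy_eggs(self, n, shop):
        # No Universe is passed in; the shop is simply out there and gets changed.
        self.eggs += shop.sell_eggs(n)


shop = Shop(eggs=30)
son = Son()
son.buy_eggs(10, shop)
print(son.eggs, shop.eggs)    # 10 20
```

If the shop holds fewer than 10 eggs, the exception is raised before anything is subtracted, so neither the son nor the shop changes.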

[The right abstraction level is crucial here. If a quantum physicist tells me that's not how the real world really works, I'll tell them they are mixing abstraction levels. Just as the canonical Python implementation is written in C, the canonical real world implementation is written in quantum mechanics. It can be written in different languages, though.]

A long time ago, people knew the difference very well. For instance, the first versions of ALGOL had functions that created new values, and procedures that changed existing ones. The heritage of that thinking is still present in Pascal, which still syntactically separates functions and procedures, although without enforcing the rule that functions mustn't change values, or that procedures mustn't create new ones. Practicality made the conflation possible, even necessary, and a smartass designing BCPL saw they could eat their cake and have it too---for "values" in BCPL are just what fits in a processor word. (Arrays, and strings as a special case of them, cheat, in the sense of actually being pointers.) A function receiving its arguments from the stack can equally well pick up a word representing the value, as a word representing its address. So, why not? If we add one-character tokens for converting a value into an address and back (@ and !), they'll just drown in the syntactic noise of a new language, and we can use one concept to cover both philosophies. The path to hell, as they say, is paved with good intentions. :-]

C, like a "BCPL 3K", built on that. The characters are different (& and * instead of @ and !), and we finally have "true" arrays (including "true" string literals), and we got a few more characters to support them (", \, [, ],... necessitating the use of trigraphs since they weren't available in all the character sets), but the fundamental properties stayed the same. Now everyone understood that a "true" array should push its whole value onto the stack, but that seemed absolutely unacceptable in a performance-obsessed world---and besides, that way would make it impossible to call legacy BCPL code from C, which would kill C in the crib. So, as a compromise, and a token of backward compatibility with BCPL, arrays "convert into pointers at the slightest provocation", in the words of the legendary Kernighan. Then, to actually pass values bigger than one processor word into functions, we got "struct"s, which could contain arrays, but they didn't decay into pointers since backward compatibility wasn't an issue, BCPL having no support for structs at all. So, structs are the first example of an "abstraction leak", when we finally saw that "transferring a value" into an "imperative function" would require non-trivial copying, but it was too late to go back to the old paradigm.

C++ just added a few more plates of good intentions onto the aforementioned path. As the first "implementations" (in fact just a bunch of preprocessor macros) of C++ had "class" as simply an alias for "struct", it seemed obvious that objects, whatever they are, should copy themselves into functions---except, of course, when the functions are methods on those same objects, where that doesn't actually make any sense anymore. :-) So the silly "this"-rule was invented, saying that the object the method was called on wasn't copied, but other objects, passed as arguments to that method, are. Now you have the situation that son.buy_eggs(10, shop) (after a successful execution) produces a world in which your son has 10 eggs, but the shop also has those same 10 eggs, since your son took them from a copy of the shop, which stopped existing as soon as the function stopped executing. You can only hope that he took them by value, not by reference, since otherwise you'd get a segmentation fault when you try to make an omelette. It's horrible, and at the same time a logical consequence of everything before it.
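The duplicated-eggs outcome can be imitated in Python, with one loud caveat: Python never copies an argument on its own, so the copy that C++ makes silently has to be written out by hand with copy.deepcopy. The Shop and Son classes and the buy_eggs_by_value name are hypothetical, and the whole thing is an emulation of the semantics described above, not C++ itself.

```python
import copy


class Shop:
    def __init__(self, eggs):
        self.eggs = eggs


class Son:
    def __init__(self):
        self.eggs = 0

    def buy_eggs_by_value(self, n, shop):
        # Stand-in for the implicit by-value copy of the argument.
        # Note that self (the "this" object) is not copied.
        shop = copy.deepcopy(shop)
        shop.eggs -= n        # only the temporary copy loses eggs
        self.eggs += n
        # the copy becomes garbage as soon as the method returns


shop = Shop(eggs=30)
son = Son()
son.buy_eggs_by_value(10, shop)
print(son.eggs, shop.eggs)    # 10 30
```

The son ends up with 10 eggs and the shop still reports 30: the eggs exist twice.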

Of course, the total outrage when people realized what a swindle they had been sold as "object programming" made Stroustrup quickly patch C++ by introducing "references", so you could write son.buy_eggs(10, shop), and it would actually change the shop. That in turn produced total consternation among C experts, but they were quickly dismissed as old farts who couldn't keep pace with the modern programming philosophies.

Alas, backward compatibility is a bitch. That's why "this" is still a pointer and not a reference (I'm convinced that, if Guido lent Bjarne his time machine, it would be the first embarrassment he'd go about fixing), and much more importantly, the declaration of set<Egg> buy_eggs(int n, Shop s) still means that the poor shop is copied at every call. That was masked by a single-character (to more easily drown it in the syntactic noise, of course) addition of "&" in front of "s" in the declaration (unfortunately, in the meantime ASCII won the encoding war and you couldn't just add new symbols at will, but hey, "&" can be reused for a quasi-similar purpose), and an unwritten rule that passing by value is something yucky that you use only when you have to, and something you don't talk about in high-class OO social circles. :-P

The rest, as they say, is history. "const" grew beside "&" to pacify old farts at least a bit, then copy constructors were added to pacify everyone else, then smart pointers were added to pacify Herb Sutter and Dave Abrahams ("Want speed? Pass by value!", a legendary article that finally asked the long-awaited question of "Hey, are we sure we know what we're doing here?"), but that didn't really work because you can't pacify one Herb Sutter by trivial cosmetic surgeries, so C++11 (yes, in 2011:) finally makes a radical paradigm shift: copy constructors are yucky and inefficient, move constructors are the latest craze. And when you look at it closely, move constructors are a final admission that objects and values are fundamentally different kinds of animals, and it was wrong to ever try to put them in the same category.

What does the mainstream market offer today? Besides C++, which has jumped the shark so many times it's not funny anymore even to mock it? There's Java, which has totally accepted the inevitability of that conclusion, having built-in support for BCPLian "primitive values" that fit into a "word" of its virtual machine, and completely separate support for objects, passed by reference because that's the only sensible way to pass objects. We have Haskell, a functional language mocking the very idea of "changing objects", living in its own mathematical world of values where it's straightforward to write a least fixed point of search over uncountable spaces, but any I/O requires nontrivial category theory. And of course, we have Python: the only mainstream language that's completely on the other side. Every referent is an object. Mathematical "values" (like the number five) are simulated by objects whose only special property is that they have no method that changes them. Copying as an abstract operation is totally meaningless, though wide classes of objects can define what it means for their instances---and it is always explicit, never done simply because you name something, call a function, or do something equally irrelevant. And finally, when your son buys eggs in the shop, a completely realistic scenario happens. Yay!
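The three Python claims (naming and calling never copy, "values" are objects without mutating methods, copying happens only on request) can be checked directly. The helper names restock and add_one are made up for this sketch; copy.copy is the standard library's shallow copy.

```python
import copy


def restock(shelf):
    shelf.append("egg")     # mutates the caller's list; nothing was copied


def add_one(n):
    n += 1                  # int has no mutating method: this rebinds the local name
    return n


shelf = ["egg"]
alias = shelf               # naming an object does not copy it
restock(shelf)              # neither does passing it to a function
assert alias is shelf
assert alias == ["egg", "egg"]

five = 5
assert add_one(five) == 6
assert five == 5            # the object five is exactly as it was

spare = copy.copy(shelf)    # a copy exists only because we asked for one
assert spare == shelf
assert spare is not shelf
```

The `is` checks compare object identity, which is what "the same shop" means here, while `==` only compares contents.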
