Recently I came across this post on www.builderau.com.au. The article describes how the author tried to figure out how to put a private variable on a class in Python and how his efforts were fruitless. And so, like any good programmer, he questions the very need for a feature that isn't in his language of choice. And also, like a good programmer, I *completely* disagree with him. Now before I get into this rant, be aware that I do not know the author in question and I have nothing against him; I simply do not agree with him.
I am a Microsoft programmer as most of you know, and I write daily in C# and VB.net. I have written Delphi, Java, JavaScript, Pascal, and a bit of C++ and Ruby, and have found that every language has its upsides and its downsides. In fact, almost all languages have features that some people consider to be deficiencies. Just look at the arguments out there over static typing versus duck typing, or over case sensitivity. They all boil down to personal preference (I'm sure a lot of people will even argue that), and the people who love dynamically typed languages are just not going to bend to people who want the compiler to check their types, and vice versa.
So, having said that, I still completely disagree with the argument that having private variables on a class in any way limits the "potential" of my code. Unless, by "potential", you mean the potential to completely break in totally unexpected ways. One of the core tenets of object oriented development is encapsulation, and having worked in several systems of varying complexity, I cannot imagine what a system would look like if you had different objects going around modifying other objects' private variables as if they owned them.
The whole purpose of software, and modern programming constructs, is to manage complexity. The encapsulation of data inside of a class limits the number of parts of the application that can interact with said variable, thus decreasing complexity. I would argue that the biggest problem in development is the growing complexity of code, and the feature interactions that come out of this. When Object Oriented programming was first thought up it was so that objects could encapsulate and control their own data. This would reduce complexity because now you could guarantee that certain data could only be accessed and modified by a controlled subset of methods, therefore limiting the number of potential interactions in your code, which in turn reduces complexity.
In *some* instances, being able to get to private variables might be useful, and even in .net you are able to do this using reflection. But just because you are able to doesn't mean you should, and in fact, there really should be very little reason to ever go into a class and modify a private variable. If you have enough knowledge of the class to modify the variable with complete confidence, then you should probably rewrite it to do what you want without the need to delve into its nether regions. The class itself defines a contract through its properties and methods for what it is able to accomplish, and by modifying internal parts of it you are compromising this integrity, which means that no other part of the system can trust that it is fulfilling its contract.
Another thing that I just don't understand is that this argument falls apart even more when you consider that Python isn't compiled. If you are working in a dynamic language then you are almost certainly going to have access to any code that you are calling. So, instead of manipulating the variables in a class directly, why wouldn't you just modify the class to do what you wanted? Or if you weren't able to modify the class, then surely you could just inherit from it and override the parts you don't like.
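To make the "inherit from it and override" option concrete, here is a minimal Python sketch. The `Account` and `NoFeeAccount` classes are hypothetical and invented purely for illustration; they are not from the article being discussed. The idea is only that a subclass changes behavior through a method the base class already exposes, rather than by reaching in and editing `_balance` directly.

```python
class Account:
    """Hypothetical class standing in for code you cannot (or should not) edit."""

    def __init__(self, balance):
        self._balance = balance  # internal state, marked as such by convention

    def fee(self):
        # Part of the class's public behavior; subclasses may override it.
        return 5

    def withdraw(self, amount):
        total = amount + self.fee()
        if total > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= total
        return self._balance


class NoFeeAccount(Account):
    """Changes behavior by overriding a method, not by poking at _balance."""

    def fee(self):
        return 0


account = NoFeeAccount(100)
print(account.withdraw(40))  # 60, since no fee is charged
```

The base class still controls every change to its own state, so its contract holds for both the original and the subclass.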
The author might have a *bit* (this is one tiny tiny bit) more of an argument if he was working in a compiled language, and therefore he could not directly modify the code. Then you might have some reasons why you would want to modify private variables, but in this case you would be flying blind and could accidentally modify something that could have unpredictable consequences. Also, the author says:
"There's a lot to be said for defensive programming, but after a certain point it's reducing the power of our code. I could see the argument towards field safety if you were distributing a library and you wanted to reduce the possible bugs others could write using your code, but for the majority of cases it appears that it's motivated out of a fear that users will tamper with the programmer's perfect code, in nothing less than a malicious attempt to destroy its purity."
I can't help but stare at that sentence in disbelief. I wholeheartedly refuse to believe that a majority of programmers *hide* their variables inside of classes because they don't want some other developer to destroy the "purity" of their code. Seriously, who thinks like this? Does this author really think that having private variables is defensive programming? Far from it. Failing fast is defensive programming, unit testing is defensive programming, Design by Contract is defensive programming, and always assuming tampered data is defensive programming. Encapsulation is not defensive programming, it is just common sense.
The author then goes on to reference AJAX as an example of what happens when people use our code in unexpected ways, but I don't really think that this is even remotely a valid argument. First of all, the XMLHttpRequest object *is* being used in exactly the way that Microsoft envisioned it when it was first introduced in OWA (Outlook Web Access). People have begun to use it in many ways (the first popular example of AJAX that most people point out (Google Maps) wasn't even using the XMLHttpRequest object) that were not anticipated by the Outlook team, but not through direct manipulation of its internals. Its interface has been used properly to create many new ideas. If the XMLHttpRequest object had an externally visible variable to set its state, and then I just started resetting its value to whatever I wanted, I'm pretty sure that my efforts would not go very far.
So, in summary, please encapsulate. If I want to extend I will inherit, rewrite, override, replace, whatever…but please don't start exposing your object internals for all the world to see. I thought we left this kind of nonsense years ago.
P.S. We all have different languages that we like, and we all love to defend our particular languages, but we also have to accept that all languages have deficiencies. We have multiple languages for a reason and so make sure you keep in mind…
"If all you have is a hammer, everything looks like a nail."
"core tenants"
Should be "core tenets". A tenet is a point or an idea. A tenant is someone who rents their home.
Thanks, I got it all fixed up now. Nothing to see here folks… move along.
I think you are confusing two different concepts, encapsulation and access control. Python does not allow you to restrict access to fields or methods via the private keyword. But you still can (and should) encapsulate your class’s internals. You just can’t have the compiler refuse to build should someone try to go around your restrictions.
Saying Python doesn’t allow encapsulation because it doesn’t have a private keyword is like saying it doesn’t allow object types because it doesn’t use static typing. It does, you just have to stop thinking like a C#/Java developer.
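As a rough illustration of encapsulating internals in Python without a private keyword, here is a small sketch. The `Thermostat` class is a hypothetical example, not something from the original discussion. The state lives in an underscore-prefixed attribute and is exposed through properties, so consumers work with a validated interface, yet nothing stops a determined caller from touching `_celsius` directly.

```python
class Thermostat:
    """Hypothetical example: state is hidden behind properties."""

    def __init__(self, celsius):
        self.celsius = celsius  # goes through the validating setter below

    @property
    def celsius(self):
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        if value < -273.15:
            raise ValueError("below absolute zero")
        self._celsius = value

    @property
    def fahrenheit(self):
        # Derived value; callers never need to know how it is stored.
        return self._celsius * 9 / 5 + 32


t = Thermostat(20)
t.celsius = 25
print(t.fahrenheit)  # 77.0

# No access control: this bypasses validation, and Python will not object.
t._celsius = -500
```

Whether that last line counts as a hole in the encapsulation or merely as an open hood is exactly what this thread is arguing about.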
I’m sorry, but I think that you are missing the point of encapsulation in OOP. Many people will throw around incomplete definitions of what encapsulation is, and they will say things like "encapsulation is when you group together methods and data into a single functional group." Well, that is not at all the whole concept of encapsulation in OOP. Encapsulation is all about hiding the details of implementation from the consumer, and therefore access control plays a key part in encapsulation. Since you probably won’t believe me, here is a quote that might convince…
"Encapsulation (Information Hiding). A principle, used when developing an overall program structure, that each component of a program should encapsulate or hide a single design decision… The interface to each module is defined in such a way as to reveal as little as possible about its inner workings." [Oxford, 1986]
– Peter Coad and Edward Yourdon. Object-Oriented Analysis, 2nd ed.
You can hide the implementation details without having the compiler fail should someone try to access a hidden method or instance variable. Access control is a key part of encapsulation as it is typically used in languages like Java or C++/C# (the compiler not allowing the program to do things is a key part of many aspects of those languages), and therefore developers who have been brought up on them often equate something being visible with it being accessible. But notice nowhere in the Coad/Yourdon quote you provided does it use the words "access" or "control". You can hide the implementation details of something without locking it up behind the identifier ‘private’. Unless of course you are paranoid that people will disregard your documentation and intentionally mess up your class’s internals, which is what probably prompted the quote from Nick Gibson’s blog (btw, I am not him in case our similar first names were confusing).
I agree, nowhere in that quote does it use the words access or control. And I agree that as far as implementation details go, you can lock those up without using a private keyword. Encapsulation though is all about hiding state from the consumer of your class. If you cannot hide state then you cannot encapsulate. I don’t care if the compiler throws errors or if the runtime throws errors; if I can get to the internal state of an object and mess with it (without jumping through serious hoops), then the language does not *fully* support encapsulation. If I have to read the documentation to decide whether I should change the value of a property on a class, then I would say that the implementation details are not hidden from me.
McConnell describes information hiding like this… "This (encapsulation) is the information hiding described in Section 6.2 all over again. You know everything about a module that it wants you to know and nothing else."
What he is saying is that encapsulation is all about the consumer only being able to access what the module *wants* you to access. It is in effect the application of information hiding. In which access control plays a huge part. If you are still not convinced, then I think we are just going to have to agree to disagree. 🙂
When we talk about information hiding, we are talking about not presenting something to the consumer. When we talk about access control, we are talking about actively preventing the consumer from doing anything with it. The philosophy behind languages like Java or C# is that those two should be the same. If something is hidden from the consumer, they shouldn’t be able to do anything with it. But that is not an inherent property of encapsulation.
Consider the analogy of the car’s engine (seems like this is a common analogy when discussing encapsulation). The details of how the engine works are encapsulated, if you will, from the driver. They don’t care whether or not it has a fuel-injector or a carburetor or whatever (I am not a car person, so excuse my ignorance if the terminology is wrong). However, there is no lock or other form of access control that prevents the driver from messing around in there (even though if they do it may well void the warranty). But the fact that the driver physically could mess with the engine doesn’t mean they have to concern themselves with its details. The encapsulation of the engine isn’t diminished by the fact that the car’s hood can open.
Similarly, the fact that a consumer could look at a Python class’s internal details and mess with them if they really want to (even though if they do, they can no longer expect the developer to support their code) doesn’t diminish the fact that they don’t have to concern themselves with the class’s details. It is still fully encapsulated.
Python uses the "we’re all consenting adults" philosophy of encapsulation. That is, you give other programmers a hint that certain members shouldn’t be accessed, but if they really want to, you let them.
I blogged about this here: http://www.ginstrom.com/scribbles/2007/09/09/hide-your-privates-but-dont-be-a-prude/
So in this case, I would say that it really is a feature (a conscious design decision) and not a deficiency 🙂
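For readers who have not seen the convention, here is a short sketch of what those "hints" look like in Python. The `Counter` class is a hypothetical example. A single leading underscore is purely a naming convention, while a double leading underscore triggers name mangling, which makes accidental access harder but does not prevent deliberate access.

```python
class Counter:
    """Hypothetical class showing Python's two 'hands off' hints."""

    def __init__(self):
        self._hint = 0       # single underscore: "internal, please don't touch"
        self.__mangled = 0   # double underscore: stored as _Counter__mangled

    def increment(self):
        self._hint += 1
        self.__mangled += 1
        return self.__mangled


c = Counter()
c.increment()

print(c._hint)                  # 1: accessible, the underscore is only a convention
print(hasattr(c, "__mangled"))  # False: the plain name does not exist on the instance
print(c._Counter__mangled)      # 1: still reachable if you really want it
```

Name mangling exists mainly to avoid name clashes in subclasses; it is not an access-control mechanism.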
really good arguments on both sides! Love it.
@Nick Brown "The encapsulation of the engine isn’t diminished by the fact that the car’s hood can open"
Just in relation to C#, for example, a private member variable *can* be modified, but you have to resort to using reflection. I’d say this is similar to popping the hood. Something that consumers of the code wouldn’t do, but those mechanics who are intimate with the inner workings might do when troubleshooting.
@Nick Brown: I am just going to have to say that we are operating on different definitions of encapsulation.
@Ryan Ginstrom: I would posit that saying "We are all consenting adults model of encapsulation" is kinda an oxymoron. But I believe I have already made that clear. 🙂 And I even somewhat agree with parts of what Alex Martelli is saying (he is quite obviously much smarter than I), but I still believe that limiting the potential number of ways in which an object can be interacted with reduces complexity. And for those of us mere mortals, such as myself and most people I know, we need ways in which to minimize complexity in our software. Exposing internal variables will inevitably lead to someone using those variables, and then what happens when I need to modify my class? There are strong arguments for both sides of this.
@secretGeek: I would argue that reflection is a feature of the runtime, not the language. In the C# language spec there is *no* way of getting to private variables. Reflection is a part of the framework (which is shared across all languages) that lets you bypass the language’s encapsulation mechanisms.
I think it’s possible to reduce complexity without resorting to private methods/member variables. When I program in C++ I use the private keyword, but I try to do so sparingly.
I started to write some of my reasons here, but it got too long, so I turned it into a blog post:
http://www.ginstrom.com/scribbles/2007/11/12/three-reasons-to-avoid-private-class-members/
By the way, the live preview feature of your comment box is really cool 🙂
I can’t say I agree with this. The difference really boils down to a difference in utility and philosophy – is the language made stronger by addition of privates (no, because you can always get them anyway), and is it worth making a dedicated programmer jump through hoops to get a private field they really want (not in python-land).
@Eric Ah yes, but you are thinking like a python programmer! See, for me, I *want* a developer to have to jump through hoops to get to my private variables. It will make him think twice about whether he really wants to get to those variables and why they are hidden. If I make it easy for him to get to those variables, then he will use them easily. I do see your point though, in your software you want these internal variables to be easily reached. I do not want this, I want my objects to be encapsulated, and for the consumers of my classes to use well defined interfaces. If I expose these internal variables then I suppose now I have to write unit tests that take into account that these variables can change at any point? That would be some *ugly* code. I’m sorry, but I just don’t think you guys are going to convince me that easy access to internal variables is a good thing. I just think that the downsides are too great in terms of maintainable and scalable software. And yes, my argument is invalid if every person who touches your code is an excellent programmer with knowledge of the system that they are using, but this is rarely the case.
Hi, Justin,
Love this sentence:
"This would reduce complexity because now you could guarantee that certain data could only be accessed and modified by a controlled subset of methods, therefore limiting the number of potential interactions in your code, which in turn reduces complexity."
Frustrated at the lack of mathematical proofs in our field, I recently derived some mathematical laws showing precisely how encapsulation (and information hiding) does exactly what you say; the article’s here:
http://www.edmundkirwan.com/encap/intro.html
And if that’s a little too much, then the real highlight is the first graph here:
http://www.edmundkirwan.com/encap/page5.html
Which illuminates your conjecture nicely.
Regards,
Ed