I was reading Matt Podwysocki’s blog yesterday (which, by the way, is awesome. Go subscribe.) and he had up a post called “Recursing into Recursion – Memoization”. Very good post if you want to learn about memoization. I made a post about a generic memoize function a while back, so memoization is not what I am here to talk about. The thing that piqued my interest was when I got toward the end of the post and saw this:
Don’t forget that proper memoization requires the functions that you are writing to be pure. What that means is that given the same input, the same output will always be calculated and returned.
My first thought was “Oh Matt, pure means that a function has no side effects, while deterministic means that a function always returns the same results for a give set of parameters”. Then, like any good programmer, my second thought was “Actually, you are probably the one who is wrong.” So, I went and looked it up, and not surprisingly I was the one who was wrong.
Here is the definition of a pure function from wikipedia:
- The function always evaluates the same result value given the same argument value(s). The function result value cannot depend on any hidden information or state that may change as program execution proceeds, nor can it depend on any external input from I/O devices.
- Evaluation of the result does not cause any semantically observable side effect or output, such as mutation of mutable objects or output to I/O devices.
And the definition of a deterministic algorithm by the NIST:
An algorithm whose behavior can be completely predicted from the input.
So pure functions are actually a subset of deterministic functions. All pure functions are deterministic functions but not all deterministic functions are pure functions. For example, we could have a function that takes a specific set of parameters, and then returns a deterministic result, yet writes the values out to disk or to a console. By definition it would no longer be pure, but it would be deterministic. For example:
Deterministic and pure:
public int Add(int val1, int val2) { return val1 + val2; }
Deterministic, but not pure:
public int Add(int val1, int val2) { Console.WriteLine(val1 + " " + val2); return val1 + val2; }
Now, deterministic methods can also not consume external state, because that would effect their functioning. But most definitions of deterministic functions deal simply with the fact that you can determine the output based on the input.
As Matt stated in his blog post, in memoization the need for a function to be deterministic is very important. This is clear because of the fact that if we have a setup of inputs and then we cache the set of outputs, we better hope that the function will return the same result for every call with those inputs, otherwise we just changes the methods behavior with our memoization.
Pure functions are important for many reasons, but number one being that it allows us to more easily reason about the behavior of a function. If a function is pure and has no side effects, then we are more likely to be able to reason about the performance of said functions, since we will have less variables to account for. You also cannot effectively make a non-pure function run in parallel without heavily analyzing its behavior. A pure function, on the other hand, should not be depending on anything other than the values you passed into it and the values that is holds. It should also not be changing any shared state, and therefore can be run in parallel without need for locking.
I hope that if you didn’t know what pure methods were before, that you now have a good idea of what they are. But I also hope that if you had any misconceptions like I did, that they are all cleared up now.
You know, it is truly awesome to learn something new, but it is even better to learn that something you thought you knew was wrong.
Loved the article? Hated it? Didn’t even read it?
We’d love to hear from you.
Thanks, Justin. The distinction between pure and deterministic functions is clear to me now.
— Kevin
There is another practical consequence to functions that do not cause any observable side-effects: It’s much easier to reason about their foreseeable failure paths.
For example, the Add function which writes to the console is subject to failure if the console is not available. This is not an exceptional condition in .NET because of the console’s nature (except perhaps in a finalization scenario when the environment is shutting down), but still it is a side effect that makes it difficult to reason about failure paths.
@Sasha I would consider the error paths to be part of the behavior of the method. So being able to reason about the behavior would include this, but I agree that this is a subtle point that people may not pick up on. Thank you for clarifying this for us!
Thanks for the post, it was a good read. So let me ask this question: can a function still be pure if you pass the I/O writing as a parameter?
public int Add(int val1, int val2, ILogger logger)
{
logger.Info(string.Format("{0} {1}", val1, val2));
return val1 + val2;
}
I assume not, but just curious…
@Brian Nope, because the whole idea of a "pure" method is to be able to reason about its behavior. Having it use any kind of external I/O, net access, etc… makes it very hard to reason about the execution of the method.