One of the small helper utilities I was called upon to create at work recently, was a "Retail Price Calculator". This simply takes a cost price, adds VAT (Value Added Tax) and incorporates any markup / multiplier to generate a retail price.
Now, we can look at this problem a number of ways. My first implementation simply required the formula to be stored step-by-step. For example, to enter the formula:
Cost * 1.175(VAT) * 2 + 30.50
Would be stored (in a settings file) as:
Cost * 1.175
Answer * 2
Answer + 30.50
This actually worked very well - the code to process each step is simple, as you have a binary operation for each step. The obvious problem, is that a programmer needs to maintain the formula database - requiring the end user to input a formula in this way is not good!
So, I needed a way of taking a normal, human-readable formula and converting it in a mathematically sound way.
This brings me to the point of the article. I decided that this problem must be fairly common and that if I wrote a small library class to do the job, it would probably come in handy again as a part of my programming toolkit. It has! While I'm sure that there are some very complex, well tested libraries out there to do this, bear in mind that a) The solution had to be lightweight and b) I do programming for fun!
Although I did this in C# and will probably write the article in a "C#" sort of way, the algorithm outlined basically works for any language where you have stacks (or can simulate a stack using another data structure).
As you would expect for this sort of thing, there are
going to be two stages for the algorithm. These
stages are:
1. Parsing - take our original, human readable
("infix") formula and convert it in to a format
that can be easily evaluated. In this case, we
will be using a pair of stacks in what is known
as the "Shunting Yard Algorithm" - you'll see
why later. The stacks put the formula in to
"Reverse Polish" notation - much easier for
a computer to evaluate.
2. Evaluation - take the stack created in step
1 and calculate the end result of the formula. This
may also require replacing variables (such as "VAT")
with values (such as 17.5%).
For basic mathematical operators, we can use the term "BODMAS" to remember precedence. This stands for "Brackets, Of, Divide, Multiply, Add, Subtract". Of is simply another term for multiply, included to make a nice word!
This means that if you have an expression containing brackets, an addition and a multiplication, you always calculate the values in the brackets first. Next, you perform the multiplication and finally the addition. If the addition occurs inside the brackets, it therefore occurs before any multiplications outside the brackets.
I mentioned that Reverse Polish Notation is easier for the
computer to evaluate, but what is it? Well, when you
or I write a mathematical expression, it will normally
be in "infix" form:
1 + 2 * 3
This does not take in to account operator precedence.
In Polish notation, this will look something like
the following:
+ * 2 3 1
An evaluator will initially find the plus. It looks
at the next token, and finds that first, it needs to
do a multiplication, which it does with the numbers
2 and 3. Finally, the last token is the number 1 which
is added to the result of the multiplication. This
is really simple - once you can do the parsing, you
almost get the evaluation for free!
This is, conceptually, the most tricky part to get your head around. Having said that, most programmers should be able to follow it. You should be familiar with the concept of "stacks", "pushing" and "popping" before continuing.
We need two stacks. Let's call the first stack the "Temporary Stack", used for temporary storage of expression operators. The second stack is the "Output Stack", which will end up containing our parsed expression. Note that the second stack could be another data structure of your choice, such as a queue or list. Both of my stacks store "String" data types, and are formally converted to the correct data types at the evaluation phase.
The shunting yard algorithm is very simple (in pseudo-C):
... while(tokens left to parse) { string token = GetNextToken(); // Get next token from original string if(token == a number) OutputStack.Push(token); else { if(Precedence(token) > Precedence(TemporaryStack.LastOperator)) TemporaryStack.Push(token); else { while(Precedence(token) <= Precedence(TemporaryStack.LastOperator)) { lastOperator = TemporaryStack.Pop(); OutputStack.Push(lastOperator); } TemporaryStack.Push(token); } } } while(TemporaryStack is not Empty) { lastOperator = TemporaryStack.Pop(); OutputStack.Push(lastOperator); } ...Get Code
My suggestion would be to write out an expression on paper and manually work it through the algorithm to see what happens. For an extension of this method to include brackets and functions, see this Wikipedia page.
Now you have the formula converted to Reverse Polish notation, the actual evaluation is extremely simple, and goes something like this:
int Evaluate(stack OutputStack) { String token = OutputStack.Pop(); if(token is a value) return token; else { // This sample only works for binary operators! // Recursion only works because the current token // has been popped already. int operand1 = Evaluate(OutputStack); int operand2 = Evaluate(OutputStack); if(token is '*') return (operand1 * operand2); else if (token is '+') return (operand 1 + operand2); else if (token is '-') return (operand 1 - operand2); else if (token is '/') return (operand 1 / operand2); } }Get Code
That's it! We recursively call the evaluation function on each operator token until the recursion comes to a natural conclusion and it is the final value which is returned. Once you get your head around the parsing, the evaluation could not be simpler.
My pseudocode above only works for binary operators and
with no brackets. Once you have the idea, it's relatively
simple to add support for other operators. The biggest
decision is how to store operator details: you'll need
to store left-to-right precedence and how many
operands the operator expects. Remember that while brackets
may be important for parsing, they do not appear in
the final parsed expression and therefore can be ignored
during evaluation.
I would suggest that if your chosen language supports it,
you should wrap all this up in a nice clean class.
Of course, you'll need some support functions - a tokeniser
among other things, but if you know your language, that shouldn't
be a problem. I would suggest that you replace variable names
at parse time, too - that way your parser is powerful
enough that variables can actually contain function expressions!
Also, bear in mind how accurate you want things. I would not
suggest following my lead and having an integer return type,
unless accuracy really doesn't matter. Keep it accurate and then
round at the end of the process if necessary.