Updated benchmarks: Flash 10 Release Player
February 7th, 2010
Since I had my benchmark code already warmed up from Friday and a photoshop doc open with my histograms, I figured I should make good on the updated speed tests that I promised a year ago.
Here is basic math, receiving a huge boost between debug and release modes. The shaded bar is the duration of the call, so shorter means faster execution.

From here down, everything is running 10m iterations. The scale of these graphs is 560ms. Here's the difference in calling a static public variable, versus a local copy of same.

Finally, replacing the Math helper methods with some logic and basic math:








These stack up pretty well with what I saw in the debug player a year ago (although there is an overall speed increase in all the tests).
If anyone is curious, there is a slowdown in calculating powers of a number (i * i * i * i) as you have to reference that variable over and over - for me the break-even between stringing on more 'i's and just calling Math.pow() was around 60.
AS3 Math Optimization – Math.random()
February 5th, 2010
I was messing around with some bitmap code this evening and I came across a performance bottleneck with some repeated calls to Math.random(). Since I haven't posted anything in a while, I decided to do some benchmarking and try out an alternate (psuedo) random number generator.
I chose the XOR algorithm, primarily because AS3 is lightning-quick with bit operations, and the algorithm is only 4 lines of code. Here are two versions I tried:
It's worth noting that this algorithm works from a seed number. For my purposes, picking one random seed and iterating is fine, but this function is predictable enough that you wouldn't want to use this method for crypto or anything where you'd want "very random" data sets. This also could be useful in some situations where you'd want to replay a set of random numbers - such as generating enemies or levels in a game and then creating a replay by saving the seed number.
For my benchmark, I ran 25 sets of 10 million iterations on each function, using the 10.0 release player in Safari.

Interestingly, the logic to gate negative numbers in the signed-int version was enough to push it out quite a bit past the uint version. All told, the uint algorithm runs in just under one fourth the time it takes to call Math.random().
As always, Wikipedia will tell you as much as you want to know about the ins and outs of this algorithm and how it stacks up against other generators. And, just for fun, here's a frequency graph of the output.
AS3 Math Optimization – int is the new floor()
October 22nd, 2008
UPDATE: The benchmarks on this thread were taken in the Flash 9 Debug player (which includes some crazy bloating) - I've posted an updated set of tests here using the Flash 10 Release player, which should give you a better picture of real-world savings for the majority of web users.
In case you missed my earlier post, I've had a raging Flash-on for optimization lately, and today I'm jumping into the Math class. But first! A little warmup with operators.
I've often heard that addition is fastest, followed by subtraction and multiplication, with division being the slowest but I'd never seen any proof so I threw together my own test:

Way to go subtraction, almost 12% faster than addition! Ok, on to the good stuff. From here on down I'm running 1,000,000 iterations per test, the same test code I used previously, and compiling each test individually about five times (or until I'm satisfied with an average time).
Alright - the Math class. First and foremost this class holds some handy constants so my human meat-brain doesn't have to. The downside is that the AVM doesn't play well with static properties of other classes so there's a pretty significant slowdown. Instead, just retrieve the constant from Math once and store it in a locally scoped variable.

While a lot of the Math methods are invaluable (Math.atan2 == ♥), there are also a few that aren't as complex and slow your swf down unnecessarily. A big one that's easy to work around is the pow method – unless you're doing some heavy lifting with it, you'll see a good boost by keying out your formula by hand:


Or you can replace costly comparison calls with a little logic:



Going a step further, you can take advantage of int casting to shear off any significant digits to the right of the decimal.

By substituting int as a super fast round-down, it's an easy jump to true rounding:

UPDATE: A lot of other people have posted that you can recreate Math.ceil() by using int(i)+1, which never rang true to me – it only works on non-integers. I finally came around to a fix using modulus:
(i % 1) ? int(i) + 1 : i;
I don't have a fancy graph for this one but I benched it around 75% faster than Math.ceil.
My last trick is a monstrous, ugly, hack that I'm hiding over on the code page, it's an approximation for computing sine (which means you can also use it for co-sine by adding π/2 radians). Interestingly, this method of calculating the sine of an angle is actually less efficient and less accurate than the built in method, but the lag from calling the static method pushes Math.sin over the top.

Next: I'm hoping to give arrays a little loving in the not too distant future...if I ever get some free time that is. In the meantime – steer clear of unshift!
Further reading:
Joa Ebert on optimization
Speed tests over on OS Flash
Bitwise gems at Polygonal
Loops!
October 3rd, 2008
I've been thinking about AS3 optimization lately so I dusted off the old benchmark code and started speed testing some things. I meant to put together a post filled with exciting optimizations and tricks, but I got bogged down almost immediately in loop code.
When I started getting into it, I was somewhat surprised by how many ways there are to write a basic for loop. Combined with the different number types, that grows into a huge amount of variation in the way people can write code, so it follows that some methods must be better than others. I broke it down into four different structures:
- Undefined variable, count up to ten.
- Variable set to 10, negative increment to zero.
- Variable set to 0, increment to ten.
- Variable set to 10, decrement to zero.

These results are a little odd…let's look at what's going on. First of all, the obvious one: int is fast, uint is slow, and untyped is downright painful. So int it is, now on to technique.
The first and third test appear to be the same code, but there's a pretty large discrepancy between them at runtime. In fact the first test always wins by a visible margin. The difference between the first technique and the other three, is that the first test doesn't need an extra trip to memory to manually assign value to the newly instanced variable.
You may also be wondering about the negative increment in the second test – this is actually two tricks in one (and for a long time, I believed this to be the fastest execution). Even using a negative value, the addition operator takes about ¾ the time of the decrement (j--) operator. On top of that, looping backwards from a high value to zero, means you don't need to run a comparison, you can just cast the variable directly to a boolean.
All the rest of the code should be pretty pedestrian. There is one little snag in the untyped results but I suspect that's coming from weirdness in the JIT when it has to continually recast those values, and it's worth noting that uint really doesn't like the increment/decrement operators. And lastly, if you're wondering, I didn't test while loops here because both while and for loops are compiled into the same code – as far as the player is concerned, there is no difference, so use whichever you prefer.
UPDATE: Ok this is borked. In the middle of the night it came to me (in a vision?) – since the variable is never defined in the first test, maybe the JIT is just jumping the rails and never going through that loop at all? That would certainly explain the speed increase (better than my feeble explanation earlier).
So testing should have been an easy iterator variable - testing a loop that counts to 10, over 1 million iterations should trace out 10 million. Somehow the undefined loop code is so extremely broken that it's actually leaking out into the scope above! When I tested I got back 10...implying that the test code actually broke my benchmark harness. I'm still trying to get to the bottom of this but in the meantime, just go with method #2.

