Soon after the rollout of our new audio engine, a few of our users reported performance problems. We believe we have fixed them, but the cause turned out to be a rather bizarre bug that appears to be a performance problem on Apple's side. So, I thought it might make for an interesting blog post.
The performance problem was not actually caused by audio itself, but by an implementation detail of the audio engine. Basically, there is a background worker thread that periodically monitors the OpenAL state to find out if sounds have finished playing, if more buffers need to be queued for streaming, and to handle other state management. This background thread often doesn't have much work to do. So to be nice to the CPU, we call usleep(), which tells the thread to sleep for a specified amount of time. On other platforms like Windows, Linux, FreeBSD, and yes, even Mac OS X, this yields the thread, lets other threads run, and generally helps reduce CPU utilization.
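To make the pattern concrete, here is a minimal sketch of a polling worker thread that sleeps between passes. It is not our actual engine code: the `AudioWorker` struct, `poll_audio_state()`, `run_audio_worker()`, and the 10 ms interval are all hypothetical names and values chosen for illustration, and the OpenAL queries are replaced by a simple counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical state shared between the app and the audio worker. */
typedef struct {
    atomic_bool running;    /* cleared to ask the worker to exit */
    atomic_int  poll_count; /* number of polling passes completed */
} AudioWorker;

/* Stand-in for the real work (checking whether sounds finished,
 * queuing more stream buffers). Here it only counts passes. */
static void poll_audio_state(AudioWorker *worker) {
    atomic_fetch_add(&worker->poll_count, 1);
}

/* Thread entry point: poll, then sleep so other threads can run. */
static void *audio_worker_main(void *arg) {
    AudioWorker *worker = arg;
    assert(worker != NULL);
    while (atomic_load(&worker->running)) {
        poll_audio_state(worker);
        usleep(10 * 1000); /* assumed 10 ms polling interval */
    }
    return NULL;
}

/* Starts the worker, waits until it has polled at least target_polls
 * times, then stops it. Returns the final poll count, or -1 if the
 * thread could not be created. */
static int run_audio_worker(int target_polls) {
    AudioWorker worker;
    pthread_t thread;
    atomic_init(&worker.running, true);
    atomic_init(&worker.poll_count, 0);
    if (pthread_create(&thread, NULL, audio_worker_main, &worker) != 0) {
        return -1;
    }
    while (atomic_load(&worker.poll_count) < target_polls) {
        usleep(1000); /* the caller also sleeps instead of spinning */
    }
    atomic_store(&worker.running, false);
    pthread_join(thread, NULL);
    return atomic_load(&worker.poll_count);
}
```

Calling `run_audio_worker(3)` runs the loop for at least three passes and then shuts the thread down cleanly. The important part is the `usleep()` call inside the loop, which is the call this post is about.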
But it appears that on iOS, usleep() is consuming a large amount of processor time. Paradoxically, our very attempt to improve performance and lower CPU utilization resulted in the exact opposite effect.
We initially didn't catch this problem in testing because the applications we tested against didn't consume enough resources. As long as you have enough spare CPU, your performance will be fine. But if your app sits in a certain 'sweet spot' of CPU utilization, say around 90%, you will start seeing horrendous stuttering. This is why only some users were directly affected by this problem.
The following picture shows a performance trace captured with Instruments, Apple's profiling tool. In this example, I did a trace of our new featured demo program “Ghosts vs. Monsters.”
The highlighted zone covers a section of normal gameplay (after the game has started and loaded). You'll notice that usleep takes more time than any other function in our entire program. But because Ghosts vs. Monsters is so efficient, we still have plenty of CPU to spare, so we never actually notice any performance problems while playing it. In an app that was maxing out the CPU (not pictured), I saw that usleep was consistently among the top 3 functions, registering about 8% utilization. That means for an app that is already in the 90% utilization range, usleep can push it over the edge.
So we needed to find a workaround for this. The solution we ended up selecting was to lower the pthread priority level so the thread scheduler would invoke this thread less often. But we learned a few other things about pthread scheduling on iOS. First, it seems that the default thread priority is actually the highest (31). Second, lowering the thread priority seemed to make little or no difference until we set the absolute lowest priority value (0). This seems like a bug too.
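As a rough sketch of this workaround, the standard POSIX calls `pthread_getschedparam()` and `pthread_setschedparam()` can change a thread's priority while keeping its scheduling policy. The helper names `set_thread_priority()` and `get_thread_priority()` are hypothetical, not functions from our engine. The value 0 is simply the one that worked for us on iOS; the valid priority range depends on the platform and scheduling policy, so treat it as an assumption and check the return value.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Sets the priority of `thread` while keeping its current scheduling
 * policy. Returns 0 on success, or a nonzero error number. */
static int set_thread_priority(pthread_t thread, int priority) {
    int policy;
    struct sched_param param;
    int err = pthread_getschedparam(thread, &policy, &param);
    if (err != 0) {
        return err;
    }
    param.sched_priority = priority;
    return pthread_setschedparam(thread, policy, &param);
}

/* Reads the current priority of `thread` into *out_priority.
 * Returns 0 on success, or a nonzero error number. */
static int get_thread_priority(pthread_t thread, int *out_priority) {
    int policy;
    struct sched_param param;
    assert(out_priority != NULL);
    int err = pthread_getschedparam(thread, &policy, &param);
    if (err != 0) {
        return err;
    }
    *out_priority = param.sched_priority;
    return 0;
}
```

One way to use this is to call `set_thread_priority(pthread_self(), 0)` at the top of the background thread's entry function, so only that thread is demoted and the rest of the app keeps its default priority.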
But once we did this, performance seems to have been restored. Below is a profile of a high CPU utilization app. It lives in that 'sweet spot' I talked about earlier. Basically, we don't want the background thread to consume a significant chunk of CPU that would push the app beyond 100% utilization and make it start dropping frames. As you can see in the following picture, the CPU utilization remains just below max (we're not seeing stuttering or frame drops). The other thing to notice is that the usleep utilization is now down to 0.8%. For comparison, before this fix, usleep was taking about 8% of the time and the app was exceeding the available CPU.
Even though we worked around this problem, I am hoping Apple might improve this. I have filed Bug #8857757 on the Apple Bug Reporter and mirrored it at Open Radar.