Hi everyone,
A common question that comes up in these forums over and over has to do with recording latency, audio drivers, and device formats. I'm going to provide a brief overview of the different types of devices, how they interface with the computer and Audition, and steps to maximize performance and minimize the latency inherent in computer audio.
First, a few definitions:
Monitoring: listening to existing audio while simultaneously recording new audio.
Sample: A single measurement of the incoming audio signal's level at one instant in time, as digitized by the audio device. Typically, the audio device measures the incoming signal 44,100 or 48,000 times every second.
Buffer Size: The "bucket" where samples are placed before being passed to the destination. An audio application collects a buffer's worth of samples before feeding it to the audio device for playback; an audio device collects a buffer's worth of samples before feeding it to the audio application when recording. Buffers are typically measured in samples (common values being 64, 128, 256, 512, 1024, 2048...) or in milliseconds, which is simply a conversion based on the device sample rate and the buffer size in samples.
Latency: The time that elapses between feeding an input signal into an audio device (through a microphone, keyboard, guitar input, etc.) and when each buffer's worth of that signal reaches the audio application. It also refers to the other direction, where the output signal is sent from the audio application to the audio device for playback. When recording while monitoring, the overall perceived latency can often be double the single-buffer latency, since the signal is buffered once on the way in and again on the way out.
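To make that concrete, here's a quick back-of-the-envelope sketch in Python (the function name and values are just for illustration, and real devices add some driver and converter overhead on top of this):

```python
# Rough latency math for a given buffer size and sample rate.
def buffer_latency_ms(buffer_samples, sample_rate):
    """One-way latency, in milliseconds, contributed by a single buffer."""
    return buffer_samples / sample_rate * 1000

one_way = buffer_latency_ms(512, 44100)   # ~11.6 ms for one buffer at 44.1 kHz
round_trip = 2 * one_way                  # ~23.2 ms when monitoring while recording
print(f"one-way: {one_way:.1f} ms, monitored round trip: {round_trip:.1f} ms")
```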
ASIO, MME, CoreAudio: These are audio driver models, which simply specify the manner in which an audio application and an audio device communicate. Apple Mac systems use CoreAudio almost exclusively, which provides for low buffer sizes and the ability to mix and match different devices (called an Aggregate Device). MME and ASIO are Windows driver models, and provide different methods of communicating between application and device. MME drivers allow the operating system itself to act as a go-between and are generally slower, as they rely on higher buffer sizes and the audio has to pass through multiple processes on the computer before being sent to the audio device. ASIO drivers give an audio application direct communication with the hardware, bypassing the operating system. This allows for much lower latency, at the cost of limiting an application's ability to access multiple devices simultaneously or to share a device channel with another application.
Dropouts: Missing audio data, the result of being unable to process an audio stream fast enough to keep up with the buffer size. Generally, dropouts occur when an audio application cannot process effects and mix tracks together quickly enough to fill the device buffer, or when the audio device sends audio data to the application faster than the application can handle it. (Remember when Lucy and Ethel were working at the chocolate factory and the machine sped up to the point where they were dropping chocolates all over the place? Pretend the chocolates are samples, Lucy and Ethel are the audio application, and the chocolate machine is the audio device/driver, and you'll have a pretty good visualization of how this works.)
Typically, latency is not a problem if you're simply playing back existing audio (you might notice a slight delay between pressing PLAY and hearing audio through your speakers) or recording to disk without monitoring existing tracks, since precise timing is not crucial in these situations. However, when trying to play along with a drum track, sing a harmony against an existing track, or overdub narration to a video, latency becomes a factor, since our ears are far more sensitive to timing than our other senses. If a bass guitar track is not precisely aligned with the drums, it quickly sounds sloppy, so we need to reduce latency as much as possible in these situations. If we simply set our Buffer Size parameter as low as it will go, we're likely to experience dropouts - especially if some tracks are configured with audio effects, which require additional processing and contribute their own latency to the chain. Dropouts during playback are annoying but not destructive; dropouts on the recording stream mean the data is simply lost and your recording will never sound right. Obviously, this is not good.
Latency under 40 ms is generally considered within a reasonable range for recording. Some folks can hear even this and it affects their ability to play, but most people find it unnoticeable or tolerable. We can calculate our approximate desired buffer size with this formula:
(Samples per second / 1000) * Desired latency in milliseconds
So, if we are recording at 44,100 Hz and aiming for 20 ms of latency: 44100 / 1000 * 20 = 882 samples. Most audio devices do not allow arbitrary buffer sizes but offer a fixed set of choices, so we pick from the nearest options. The device I'm using right now offers 512 and 1024 samples as the closest available buffer sizes, so I would select 512 first and see how it performs. If my session has a lot of tracks and/or several effects, I might need to bump this up to 1024 if I experience dropouts.
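If you prefer, here is the same arithmetic as a small Python sketch - the list of available sizes is just an example, so substitute whatever your device actually offers:

```python
def ideal_buffer_samples(sample_rate, target_ms):
    """(Samples per second / 1000) * desired latency in milliseconds."""
    return sample_rate / 1000 * target_ms

def pick_buffer(sample_rate, target_ms, available=(64, 128, 256, 512, 1024, 2048)):
    """Pick the largest available size at or under the latency target;
    if you hit dropouts, step up to the next size manually."""
    ideal = ideal_buffer_samples(sample_rate, target_ms)
    candidates = [size for size in available if size <= ideal]
    return max(candidates) if candidates else min(available)

print(ideal_buffer_samples(44100, 20))  # 882.0 samples
print(pick_buffer(44100, 20))           # 512 - try first, bump to 1024 on dropouts
```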
Now that we hopefully have a pretty firm understanding of what constitutes latency and under what circumstances it is undesirable, let's take a look at how we can reduce it for our needs. You may find that you continue to experience dropouts at a buffer size of 1024 but that raising it to larger options introduces too much latency for your needs. So we need to determine what we can do to reduce our overhead in order to have quality playback and recording at this buffer size.
Effects: A common cause of playback latency is the use of effects. As your audio stream passes through an effect, it takes time for the computer to perform the calculations that modify the signal. Each effect in a chain introduces its own latency before the chunk of audio even reaches the point where the audio application passes it to the audio device and starts to fill the buffer. Audition and other DAWs attempt to address this through "latency compensation" routines, which introduce a bit more delay when you first press play while they process several seconds of audio ahead of time before streaming those chunks to the audio driver. In some cases, however, the effects may be so intensive that the CPU simply cannot do the math fast enough. With Audition, you can "freeze" or pre-render these tracks by clicking the small lightning bolt button visible in the Effects Rack with that track selected. This performs a background render of the track, which automatically updates if you change the track or any effect parameters, so that instead of calculating all those effects on the fly, Audition simply streams back a plain old audio file, which requires far fewer system resources. You may also choose to disable certain effects, or temporarily replace them with lighter alternatives which may not sound exactly like your final mix but adequately simulate the desired effect for the purpose of recording. (You might replace the CPU-intensive Full Reverb effect with the lightweight Studio Reverb effect, for example. Full Reverb is mathematically far more accurate and realistic, but Studio Reverb can provide the quick "body" you might want when monitoring vocals.) You can also simply disable the effects for a track or clip while recording and turn them back on afterward.
Device and Driver Options: Different devices may perform wildly differently at the same buffer size and with the same session. Audio devices designed primarily for gaming, for example, are less likely to perform well at low buffer sizes than those designed for music production. Even if the hardware performs the same, the driver model may be a source of latency. ASIO is almost always faster than MME, though many device manufacturers do not supply an ASIO driver. Third-party, device-agnostic drivers such as ASIO4ALL (www.asio4all.com) let you wrap an MME-only device inside a faux-ASIO shell. The audio application believes it's speaking to an ASIO driver, while ASIO4ALL is streamlined to communicate more quickly with the MME device, and can even let you use inputs and outputs on separate devices, which ASIO would otherwise prevent.
We also now see more USB microphones, which are input-only audio devices that generally use a generic Windows driver and, with few exceptions, rarely offer native ASIO support. USB microphones generally require a higher buffer size, as they are primarily designed for recording in cases where monitoring is unimportant. When attempting to record via a USB microphone while monitoring through a separate audio device, you're more likely to run into issues where the two devices are not synchronized or drift apart over time. (The ugly secret of many device manufacturers is that devices rarely operate at EXACTLY the specified sample rate. The difference between 44,100 and 44,118 Hz is negligible when listening to audio, but when trying to precisely synchronize with a track recorded AT 44,100 Hz, the difference adds up over time, and what sounded in sync for the first minute will be wildly off-beat several minutes later.) You will almost always get better sync and performance with a standard microphone connected to the same device you're using for playback, and for serious recording this is the best practice. If a USB microphone is your only option, I recommend buying a high-quality one and pairing it with an equally high-quality playback device. Match the buffer sizes and sample rates as closely as possible, and consider using a higher buffer size and correcting the latency after recording. (One method is to have a click or clap at the beginning of your session and make sure it is picked up by the USB microphone. After you finish recording, you can visually line up the click in the recorded track with the click in the original track by moving your clip backwards in the timeline. This is not the most efficient method, but this kind of alignment is the reason you see clapboards in behind-the-scenes filmmaking footage.)
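To see how quickly that tiny clock error adds up, here's the arithmetic (44,118 Hz is just the example mismatch from above):

```python
# Drift between a session at 44,100 Hz and a device actually running at 44,118 Hz.
nominal, actual = 44_100, 44_118
minutes = 5
extra_samples = (actual - nominal) * minutes * 60  # 18 extra samples every second
drift_ms = extra_samples / nominal * 1000
print(f"after {minutes} min: {extra_samples} samples = {drift_ms:.0f} ms out of sync")
# after 5 min: 5400 samples = 122 ms out of sync - clearly audible against a drum track
```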
Other Hardware: Other hardware in your computer plays a role in the ability to feed or store audio data quickly. Modern CPUs are fast and, with multiple cores, can spread the load, so the bottleneck for good performance - especially at high sample rates - tends to be your hard drive or storage media. It is highly recommended that you configure your temporary files location, and your session/recording location, on a physical drive that is NOT the one holding your operating system. Audition and other DAWs have absolutely no control over what Windows or OS X may decide to do at any given time, and if your antivirus software or system file indexer decides to start churning away at your hard drive while you're recording your magnum opus, you raise the likelihood of losing some performance. (In fact, it's a good idea to disable all non-essential applications and internet connections while recording to reduce the likelihood of external interference.) If you're going to record multiple tracks at once, buy the fastest hard drive your budget allows. Most cheap drives spin at around 5400 RPM, which is fine for general use but does not allow the fast read, write, and seek operations the drive needs when recording and playing back multiple files simultaneously. 7200 RPM drives perform much better, and even faster options are available. While fragmentation is less of a problem on OS X systems, on Windows you'll want to defragment your drive frequently - this process realigns the blocks of your files so they're grouped together. As you write and delete files, pieces of each tend to be placed in the first location that has room, which creates gaps and splits files up all over the disk. Reading from or writing to these scattered areas takes significantly longer than it needs to and can contribute to glitches in playback or loss of data when recording.
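For a sense of scale, here's a rough estimate of the sustained data rate of a multitrack recording (the track count and bit depth are example values):

```python
# Sustained write rate for simultaneous multitrack recording.
tracks, sample_rate, bytes_per_sample = 16, 48_000, 3  # 16 tracks of 24-bit/48 kHz audio
bytes_per_second = tracks * sample_rate * bytes_per_sample
print(f"{bytes_per_second / 1_000_000:.1f} MB/s sustained")  # ~2.3 MB/s
```

The raw throughput is modest by modern standards; the real punishment for a slow drive is the constant seeking between many files being read and written at once, which is why rotational speed and seek time matter more than the headline transfer rate.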