If you want to use Linux audio for professional work, but have been using Windows up until now, then there are some important differences you need to know about.
This article will help explain the basics of Linux audio, and it will do so by directly comparing each part of the system to the equivalent in Windows. This is to make it easy for musicians and sound engineers who understand Windows well to switch to Linux and get back their fun and freedom.
So sit back, relax, and get ready to learn:
- Exactly what an audio framework is;
- The key differences between the way Linux and Windows handle audio;
- How Windows’ WASAPI and ASIO compare with Linux’s ALSA and JACK.
What is an audio framework?
Operating systems are complex, so even something as seemingly simple as playing back or generating a sound can have multiple layers of applications talking to each other.
In software, a framework can be thought of as the scaffolding that you build the actual application around. It can also be thought of as an application, or set of applications, that does nothing until you add the functionality. The audio framework does nothing until another application tells it to play a sound; then it springs into life.
The diagram below is a simplified view of the current Linux audio framework (often referred to colloquially as the “stack”).
We deliberately left out the fact that the app can talk to ALSA directly, because very few apps do: they would lose all the benefits of using JACK or Pulse Audio.
What are these mysterious things inside the penguin pyramid? Don’t worry, all will soon be revealed.
Why do we need audio frameworks?
Making your computer actually generate sound requires a huge amount of code, and no program that wants sound should have to write it all from scratch.
To solve this problem the operating system provides a sound framework that all the applications can share.
When an application wants to transmit sound out of your speakers, it tells the framework to do the work, and if things go well the sound will pop out of your speakers and into your ears.
This diagram shows what would happen if there was no audio framework.
Windows and Linux have totally different audio frameworks. For professional audio we want:
- Low latency. As small a delay in processing as possible;
- “Bit perfect” output. No changing the volume, converting the sample rate, or dithering the audio without our permission.
Both operating systems are able to provide this, but it is not the default, because most general users are not interested in these features. They probably just want to play games or surf the web.
It’s easier for them if the operating system can deal with all the technical stuff like sample rates without bothering them.
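To put a rough number on the “low latency” requirement, the delay added by an audio buffer can be estimated from its size and the sample rate. The sketch below assumes the commonly quoted formula (frames per period × number of periods ÷ sample rate); the function name and the example settings are illustrative only, and real round-trip latency will also include converter and driver overhead.

```python
def buffer_latency_ms(frames_per_period: int, periods: int, sample_rate_hz: int) -> float:
    """Estimate how long a full audio buffer takes to play out, in milliseconds.

    This is only the buffering delay; hardware and driver overhead are ignored.
    """
    if frames_per_period <= 0 or periods <= 0 or sample_rate_hz <= 0:
        raise ValueError("all arguments must be positive")
    total_frames = frames_per_period * periods
    return 1000.0 * total_frames / sample_rate_hz


if __name__ == "__main__":
    # Hypothetical settings: smaller buffers mean less delay but more CPU pressure.
    for frames in (64, 256, 1024):
        latency = buffer_latency_ms(frames, periods=2, sample_rate_hz=48000)
        print(f"{frames:>5} frames x 2 periods @ 48000 Hz -> {latency:.2f} ms")
```

Small buffers keep the delay down to a few milliseconds, which is what you want when playing a virtual instrument live. Large buffers are fine for watching videos, which is one reason general-purpose defaults tend to be more relaxed.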
What does your audio interface’s hardware driver do in the audio framework?
We have talked about audio frameworks, but what about the actual drivers? You may never have heard about audio frameworks before, but only about drivers.
Hardware drivers are just part of the audio framework. They are treated very differently on Windows and Linux.
For your hardware to talk to the computer it needs a special middleman called a driver. This driver tells the computer exactly what your sound card can do, how many inputs and outputs it has, and other very technical things that you would probably die of boredom trying to understand.
There are generic drivers that can work with many different audio interfaces, but pro audio interfaces tend to have all sorts of special features that will need a custom driver.
If you want good low-latency, crash-free performance, then the interface needs a good driver written specially for the operating system you are using.
From a user’s perspective, without knowing about the audio framework, the biggest change going from Windows to Linux is that there is no need to download audio drivers any more.
In Windows you download ASIO/WASAPI drivers which talk directly to your applications
To use a professional audio interface in Windows you will need to download the drivers from the manufacturer’s website and install them. These will almost certainly include an ASIO driver.
This special type of driver, based on a protocol created by Steinberg, bypasses the operating system as much as possible and enables low-latency, high-quality sound.
There will probably be another driver included for general Windows use. This is for software that does not support ASIO, and it will probably work through Windows Core Audio, which uses WASAPI.
This is the current audio system in Windows 10. It was introduced with Windows Vista and supersedes the older MME (1991), DirectSound (1995), and WDM (1998) audio interfaces.
Windows Core Audio can also bypass the operating system as much as possible and achieve low latency in “exclusive” mode, but ASIO is still the most widely used pro audio standard.
In Linux audio the driver is built into ALSA which talks to JACK or Pulse Audio
In Linux all the hardware drivers are built into the Kernel and supplied out of the box (with a few exceptions, such as proprietary third-party graphics drivers). There is nothing to download.
The audio interface drivers nearly all live somewhere in the Kernel called ALSA (Advanced Linux Sound Architecture).
This magical place does more than just store all the drivers: it also has software and an API (a special language that programs use to speak to each other) to communicate with the software above it.
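One simple way to see ALSA at work is the list of sound cards it publishes as a text file, usually at /proc/asound/cards. The sketch below parses that listing to show which driver each card is using. The sample text and device names are made up for illustration, and the line layout is an assumption based on how the file commonly looks, so treat the pattern as a starting point rather than a guaranteed format.

```python
import re
from pathlib import Path

# Sample text in the layout commonly seen in /proc/asound/cards.
# The device names are invented for illustration.
SAMPLE = """\
 0 [PCH            ]: HDA-Intel - HDA Intel PCH
                      HDA Intel PCH at 0xf7f10000 irq 33
 1 [USB            ]: USB-Audio - Example USB Interface
                      Example USB Interface at usb-0000:00:14.0-2, high speed
"""

# Matches the first line of each entry: index, [id], driver, then card name.
CARD_LINE = re.compile(r"^\s*(\d+)\s+\[(.+?)\s*\]:\s+(\S+)\s+-\s+(.+?)\s*$")


def parse_cards(text):
    """Return one dict per sound card found in the listing text."""
    cards = []
    for line in text.splitlines():
        match = CARD_LINE.match(line)
        if match:
            index, card_id, driver, name = match.groups()
            cards.append(
                {"index": int(index), "id": card_id, "driver": driver, "name": name}
            )
    return cards


def list_cards(path="/proc/asound/cards"):
    """Parse the live listing if the file exists, otherwise return an empty list."""
    listing = Path(path)
    return parse_cards(listing.read_text()) if listing.exists() else []


if __name__ == "__main__":
    # Fall back to the sample text on machines without ALSA (e.g. Windows).
    for card in list_cards() or parse_cards(SAMPLE):
        print(f"card {card['index']}: {card['name']} (driver: {card['driver']})")
```

If your interface shows up in this list after you plug it in, the Kernel has found a driver for it and the layers above ALSA should be able to use it.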
Audio interface compatibility with Linux
The only thing the user really needs to be concerned with is getting an audio interface that has an ALSA sound driver in the Kernel. In the past, this has been a problem, but not so much today.
The great news is that USB Audio Class 2.0 compliant interfaces have become very common. USB Audio Class 2.0 is a universal standard, which means compliant interfaces will work out of the box in Linux.
Manufacturers who make their devices USB Audio Class 2.0 compliant to work on a Mac/iPad also automatically make them work on Linux, and nobody has to write any extra drivers for the Linux Kernel.
Here is a useful list of some USB Audio Class 2.0 compliant audio interfaces, although it is not completely up to date. The Linux musicians forum hardware section is also a good resource.
Pulse Audio and JACK explained
A big difference from Windows is that in Linux there are other layers above the hardware drivers (in ALSA) we need to know about. These layers are called Pulse Audio and JACK.
We need to understand that they do different jobs: one suits using your computer as an audio workstation with DAW software, and the other suits using it as a general-purpose computer for watching YouTube and things like that.
Pulse Audio and JACK are technically “sound servers”: they can send and receive many audio channels to and from different applications.
Any application that wants to be heard must choose whether to communicate with ALSA directly, with Pulse Audio, or with JACK.
Nearly all modern general-purpose applications use Pulse Audio, and the remaining specialized professional audio ones use JACK.
Pulse Audio, the equivalent of Windows WASAPI
Pulse Audio can be thought of as the mainstream standard for normal everyday software. It communicates directly with your hardware’s ALSA driver, and does all the stuff the operating system needs it to do.
Pulse Audio can mix the audio outputs of different software playing together, it can convert sample rates, and it can talk to software in a friendly and easy-to-understand way.
Developers can make software that talks to Pulse Audio and they will know it is supported in all the Linux distributions out of the box.
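To make the mixing and sample rate conversion described above more concrete, here is a toy sketch of both operations on 16-bit samples. It is not how Pulse Audio actually does it (real resamplers are far more sophisticated than linear interpolation), and the function names are invented for this example. It simply shows that once streams are mixed or resampled, the samples reaching the hardware are no longer the ones the application sent, so the output is no longer “bit perfect”.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix(a, b):
    """Sum two streams sample by sample, clipping to the 16-bit range."""
    length = max(len(a), len(b))
    out = []
    for i in range(length):
        # Treat the shorter stream as silence once it runs out.
        total = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
        out.append(max(INT16_MIN, min(INT16_MAX, total)))
    return out


def resample_linear(samples, src_rate, dst_rate):
    """Naive resampler: linear interpolation between neighbouring samples."""
    if not samples:
        return []
    n_out = len(samples) * dst_rate // src_rate
    out = []
    for j in range(n_out):
        pos = j * src_rate / dst_rate  # position in the source stream
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(round(samples[i] * (1 - frac) + nxt * frac))
    return out


if __name__ == "__main__":
    browser = [1000, 2000, 3000, 4000]      # hypothetical stream from one app
    player = [500, -500, 500, -500]         # hypothetical stream from another
    print("mixed:    ", mix(browser, player))
    print("resampled:", resample_linear(browser, 44100, 48000))
```

This is exactly the kind of helpful processing a general user wants and a sound engineer wants to switch off.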
JACK, the equivalent of Windows ASIO
JACK is made for professional audio applications like DAWs. It also communicates directly with your hardware’s ALSA driver.
It is usually not installed in Linux by default, but you can download it from your distribution’s software repository. It is built for low latency.
Only pro audio applications use JACK, so you will still need Pulse Audio for your time-wasting YouTube binges on Firefox or Chrome.
So you now know:
- An audio framework can be thought of as scaffolding that lets applications send and receive sound from your computer;
- We need frameworks so the developer of an application does not have to write their own code just to send and receive sound;
- Our framework needs to be low latency and “bit perfect” for professional use;
- The hardware driver is the middleman between your audio interface and the audio framework in your operating system;
- In Windows you use ASIO drivers for professional audio, and these don’t need an extra sound server on top of them;
- In Linux you use the JACK sound server for professional audio. This sits on top of ALSA, which contains your hardware driver.
Things have changed a lot in the last few years. Much crusty and pointless old software has been consigned to the recycle bin of history.
Pulse Audio became dominant. People hated it because it was full of bugs, but those have now been fixed and it works well. JACK has remained the best and only choice for pro audio on Linux.
If someone tells you online that Linux audio is a fragmented joke, refer them to this article.
Audio on Linux has never been better! Just plug in your USB Audio Class 2.0 compliant audio interface, load up your DAW using JACK, and get jamming and mixing!