Editing audio in Linux

Looking for inexpensive alternatives to high-end, but expensive audio apps …

SpicyMcHaggis

Linux beyond geekdom

Linux has always been a great operating system for programmers. Since the late 90s, however, there's been a big push to make Linux more attractive to people that don't have Mountain Dew addictions. Desktop environments have improved the user interface to the point where large institutions are rolling out Linux workstations, something that never would have happened with twm as the only desktop option. Meanwhile, the day-to-day tasks of business have been made doable by projects like OpenOffice and Mozilla.

Parallel to Linux's furthering of basic usability, there's been a lot of work put into high-end uses for Linux, like professional audio production. These applications require a certain amount of spiffy GUI and big helpings of performance and stability—which Linux is now more than ready to provide. Audio editing is also greatly helped by compatible hardware, which can give a more intuitive user interface and better performance. Unfortunately, Linux has always been plagued with hardware compatibility issues, and this is very evident when trying to do professional audio production.

Given Linux's strengths, weaknesses, history, and ideology, it's interesting to see where Free/Libre and Open Source Software (FOSS) competes well with proprietary software, where it falls behind, and where it provides novel innovation. The FOSS pro-level Digital Audio Workstation (DAW), Ardour, competes with industry-standard apps like ProTools, Logic, Nuendo, and Digital Performer. Audacity, on the other hand, is a more casual FOSS audio editor, but infuses the task with some distinctly geeky scripting facilities. SND, "modeled loosely after Emacs and an old, sorely-missed PDP-10 sound editor named Dpysnd," is a distinctly Linux audio app, complete with an ass-ugly interface, a mountainous learning curve, and the ability to wash your dishes if you know how to ask.

First things first: you don't know Jack

Meet Jack. Jack is where things start to get weird. Most operating systems, err, Windows and OS X, provide an invisible interface to your audio hardware. In these OSes audio applications output to a software mixer, which mixes the signals and streams them to the sound card. This approach is similar to how ESD works. Likewise, Jack is an audio daemon that sits between audio apps and ALSA. Where Jack differs is that, in the proud tradition of Linux, it is infinitely configurable. This allows lower latency and the ability to pipe any output to any input, like a rousing game of Twister, but with data. Ardour requires it. Audacity and SND can use it. Any ALSA program can use it via an ALSA plugin.
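To make the "any output to any input" idea a little more concrete, here is a minimal sketch of a patchbay as a plain connection graph. It is a toy model only: it does not use the real Jack client API, and the port names are made up for illustration.

```python
from collections import defaultdict


class Patchbay:
    """Toy model of a Jack-style connection graph (not the real Jack API)."""

    def __init__(self):
        # Maps each output port name to the set of input ports it feeds.
        self.connections = defaultdict(set)

    def connect(self, output_port, input_port):
        self.connections[output_port].add(input_port)

    def disconnect(self, output_port, input_port):
        self.connections[output_port].discard(input_port)

    def route(self, outputs):
        """Mix one block of samples from each output into its connected inputs.

        `outputs` maps output port names to equal-length lists of samples.
        Returns a dict mapping input port names to the summed block.
        """
        mixed = {}
        for out_port, block in outputs.items():
            for in_port in self.connections.get(out_port, ()):
                if in_port not in mixed:
                    mixed[in_port] = [0.0] * len(block)
                mixed[in_port] = [a + b for a, b in zip(mixed[in_port], block)]
        return mixed


# Hypothetical port names, chosen only for illustration.
bay = Patchbay()
bay.connect("audacity:out", "ardour:track1_in")
bay.connect("snd:out", "ardour:track1_in")
bay.connect("ardour:master_out", "system:playback_1")

result = bay.route({
    "audacity:out": [0.5, 0.25],
    "snd:out": [0.25, 0.25],
    "ardour:master_out": [1.0, 0.0],
})
```

Two applications feeding the same input are simply summed, and nothing stops one application's output from being wired into another application's input, which is the part that a hidden system mixer doesn't let you do.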

Technical details

For what it's worth, I'm looking at all this software on a Pentium 4 3.2GHz with a Creative Labs SB Audigy 2 and 1 GB of RAM. I played with a couple distros, including DeMuDi, but settled on my old favorite Debian testing (Etch). All of the applications I used were taken from the apt repositories, not custom compiled. The kernel (2.6.12) does not have realtime scheduling support built in, a feature that is very popular with computer musicians. More on that later. Additionally, the hard drive is not tuned with hdparm, which is recommended for serious audio work.


Ardour

Ardour attempts to stand next to the big boys of professional audio editing. Trying to compete with the likes of ProTools (which I'll use for comparison, since I'm more familiar with it than other DAWs) requires two main accomplishments. First, Ardour must be able to do everything the proprietary DAWs can do with comparable performance and ease of use. Second, and more speculatively, to become relevant Ardour must achieve market share; for any FOSS to be useful it has to be widely used. Otherwise, Ardour will not receive the community support required to keep such an ambitious project in development.

To get started, I looked at the Ardour manual, which is pretty well done, and should be, given that they hope to make money off it. A noticeable shortcoming was that there's no quick and dirty "here's how to edit and mix" for someone who's not a complete novice. Ideally, the GUI would make a certain amount of this obvious from the start, and Ardour does pretty well with that. Given that I used to use ProTools 5 quite a bit, it took me about 30 seconds to create a couple tracks and start editing. There are some notable novelties in Ardour's GUI, such as its very minimal menu bar, and heavy reliance on contextual menus. Ardour's approach to GUI takes a page from the Linux playbook (as well it should), and provides an interface that is extraordinarily quick and efficient, albeit a little difficult to learn at first (just take a look at the venerable Emacs and vi for other examples). For a Linux geek or an experienced audio producer this sort of interface is welcome.

Operations and performance were quick and snappy on my workstation, and left nothing to be desired. Normalizing a 45 minute interview took about 30 seconds. Throwing on some flange, pitch shifting, and heavy reverb pumped CPU usage up to a not-too-worrisome 10% during playback. As any Linux audio app should, Ardour uses the LADSPA (Linux Audio Developers' Simple Plugin API) plugin architecture. VST plugins may be used, but it can be hit and miss.
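For anyone wondering what "normalizing" actually does to the samples, here is a minimal sketch of peak normalization. I'm assuming the simplest variant, where the whole clip is scaled so its loudest sample hits a target level; Ardour's own implementation may differ in its details, and the function name is mine.

```python
def normalize_peak(samples, target_peak=1.0):
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(x) for x in samples), default=0.0)
    if peak == 0.0:
        # Silence (or an empty clip) has nothing to scale.
        return list(samples)
    gain = target_peak / peak
    return [x * gain for x in samples]


quiet_clip = [0.0, 0.1, -0.25, 0.2]
loud_clip = normalize_peak(quiet_clip)
```

The work is one pass to find the peak and one pass to apply the gain, which is why even a long interview is mostly limited by how fast the file can be read and rewritten.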


The requisite contrived screenshot of a busy session.
Notice the system performance meters in the upper right hand corner.

Ardour's plugin model highlights the generally modular FOSS approach to software. For example, Ardour doesn't offer any mastering and CD burning facilities, but instead easily interfaces with JAMin via Jack. Because FOSS is free, there's no impetus for developers to inhibit the use of third-party or competing software. The result of this modularity is much more capable software; rather than Ardour including a crappy mastering app, it leverages an existing project to do the job better. Importantly, this modularity provides a means to achieve radically more advanced software, since Ardour can interface with experimental and highly specialized software. An appropriate example is interfacing Ardour with Pure Data, a FOSS dataflow programming language that allows manipulation of audio in realtime.

Ardour isn't all sunshine and roses, however. (Here's where I try to get negative about something I am admittedly biased about.) Hardware is undoubtedly of primary concern to audio production, and Ardour, like every other piece of software ever written, doesn't support every piece of hardware ever made. Of concern are three types of hardware: the sound card, control surfaces, and specialized DSP hardware. Since Jack uses ALSA, any sound card that ALSA supports will work. ALSA has its own set of hardware support complications, but in general, support is pretty good and getting better all the time. As far as control surfaces go, MIDI surfaces that send transport controls over MMC are usable in Ardour. This includes a lot of hardware, though some fancier systems use more complicated protocols and aren't supported yet.

Support for dedicated DSP processors is somewhat controversial. A DSP processor is like a graphics card for audio: it can accelerate DSP operations, reducing the load on the main CPU. The problem here is that since DSP cards are such a niche market, the only ones available are proprietary add-ons for proprietary software. They use proprietary protocols, are closed source, and are locked down to work with only one piece of software, e.g. ProTools. In the past, this has been a big issue, since mainboard CPUs could hardly handle the load of 32 tracks of audio, much less DSP in realtime on top of that. Applying an effect to a track required rewriting the track, which took expensive time and slowed down an otherwise fluid editing process. Nowadays, with an ordinary PC, dozens of tracks can be played and processed through filters in realtime without trouble. The argument against DSP hardware goes that with audio data becoming so easy to process, better results will be achieved by upgrading general purpose hardware than by buying specialized hardware acceleration.

This is an interesting attitude, given the press for distributed processing models in the gaming world, what with video acceleration and now physics and AI hardware acceleration. Also interesting is a recent effort to utilize the GPU for DSP hardware acceleration. If such technology were to become commonly available, perhaps Ardour would come to rely on offboard DSP processing. Until then, the proprietary nature of available DSP hardware makes it highly unlikely that Ardour will support DSP cards.

If you can be convinced that commodity hardware will pack enough punch for your editing, Ardour delivers at the best cost of all: free. Free of monetary cost, free of guilt, and free as in freedom. But FOSS zealots have been trying to give away their completely serviceable software for decades without much success, and Ardour is no exception. Ardour may, arguably, be as good as any other DAW out there. However, learning a DAW can represent an enormous investment of time. As long as the industry continues to ignore apps like Ardour, that investment will turn away all but the most intrepid. Until it becomes practical for professionals to learn how to use an Ardour-based system, no one will care how amazing (or not) Ardour is.

Audacity

It's difficult to say where Audacity lies in relation to proprietary audio editors. It's not quite a DAW on par with ProTools or Nuendo. It's too versatile and feature laden to compare to the myriad of sub-US$100 editors out there. Additionally, the fact that Audacity can be scripted using a Lisp derivative called Nyquist confirms that Audacity is rather unlike anything available elsewhere.

For its opening act, Audacity dusts the various budget audio editors out there. With no limit on the number of tracks and up to 32-bit, 96 kHz audio, it's clear that Audacity isn't playing softball. To drive that point home, Audacity supports VST and LADSPA plugins, the latter of which gives a person lots of power and choice at no cost. Its editing performance is zesty, though not quite as speedy as Ardour. In short, it seems to have all the prerequisites of a perfectly capable audio editor. Its lack of a mixer, busses, and other useful session management tools makes it not quite a DAW, but it's fine for smaller projects.

A lot of focus seems to have been given to Audacity's GUI. Where Ardour seeks to make the GUI efficient, Audacity has sought to make it user friendly and easy to learn. Upon opening Audacity the transport controls are front, center, and almost overly large, allowing a novice to start recording with about the same ease as a handheld tape recorder. The rest of the GUI is clear, uncluttered, and unintimidating, allowing a shallow learning curve, perhaps at the expense of efficiency.

Audacity's user friendliness extends to its technical abilities. It allows one to open mp3, Ogg Vorbis, and various lossless audio files directly, as well as saving to those formats. Its suave looks are furthered by its snazzy looking spectrograms. And it comes packaged with a full complement of filters. Considering its Linux, OS X, and Windows support, Audacity is an audio editor that has a firm grasp on usability and appeal to the novice user.


Look at all them spectral components... purrrty.

But to title Audacity good for amateurs and to leave it at that would be an injustice. Audacity does have some features that would appeal to more hardcore types. In particular, I'm speaking of its use of Nyquist as a scripting language. Nyquist is an audio synthesis language based on Lisp and written by Roger B. Dannenberg. Using Nyquist, a user can write his or her own plugins, not unlike using Script-Fu and Perl-Fu in The GIMP. To do this, you place an ASCII file with the extension ".ny" in /usr/share/audacity/plug-ins (or elsewhere on Windows or OS X). For example, one of the plugins that comes with Audacity looks like this:

;nyquist plug-in
;version 1
;type process
;name "Delay..."
;action "Performing Delay Effect..."
;info "Demo effect for Nyquist by Roger Dannenberg.
;This effect creates a fixed number of echos."
;control decay "Decay amount" int "dB" 6 0 24
;control delay "Delay time" real "seconds" 0.5 0.0 5.0
;control count "Number of echos" int "times" 5 1 30

;; Note: this effect will use up memory proportional to
;; delay * count, since that many samples must be buffered
;; before the first block is freed.

(defun delays (s decay delay count)
  (if (= count 0)
      (cue s)
      (sim (cue s)
           (loud decay (at delay (delays s decay delay (- count 1)))))))
(stretch-abs 1 (delays s (- 0 decay) delay count))

How cool is that? Nyquist could be used to help with repetitive tasks, or to create plugins the like of which have never been heard since Thomas Dolby.
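If the Lisp is hard to read, here is my reading of what the recursion in that plug-in computes, sketched in Python. This is not how Nyquist evaluates it internally (Nyquist works on lazily computed sounds, not lists), and the sample_rate parameter and helper names are my own assumptions. The idea is just that echo number k arrives k delays late and k times the decay (in dB) quieter.

```python
def db_to_linear(db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def delays(samples, decay_db, delay_seconds, count, sample_rate=44100):
    """Return the dry signal plus `count` echoes.

    Echo k starts k * delay_seconds after the original and is
    k * decay_db quieter, mirroring the recursive Nyquist definition.
    """
    delay_frames = int(round(delay_seconds * sample_rate))
    out = [0.0] * (len(samples) + count * delay_frames)
    for k in range(count + 1):  # k = 0 is the unprocessed signal
        gain = db_to_linear(-decay_db * k)
        offset = k * delay_frames
        for i, x in enumerate(samples):
            out[offset + i] += gain * x
    return out


# A single click, two echoes, 20 dB decay, echoes 2 frames apart.
# The tiny sample rate is only there to keep the result readable.
echoed = delays([1.0], decay_db=20, delay_seconds=0.5, count=2, sample_rate=4)
```

With those toy settings the click shows up at full level, then at roughly a tenth of that two frames later, then at roughly a hundredth two frames after that. The Nyquist version gets the same effect in four lines because `at` and `loud` nest, so each level of recursion inherits the shift and attenuation of the level above it.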

A few months ago I was playing around with Nyquist apart from Audacity. The impression I got was that it is amazingly elegant, easy to use, intuitive, and fun, but dead. The mailing list gets about one post a month on average. Don't let that discourage you, though. With Audacity's new support and the Audacity-Nyquist mailing list, it's quite possible that Nyquist is experiencing a renaissance. It appears that despite my personal experience there are quite a number of people developing plugins with Nyquist.

Nyquist could serve to open up a lot of possibilities for making Audacity more interesting to experienced users. For casual editing Audacity is a pretty good meeting of capability and ease of use. Of course this achievement entails some compromises, and Audacity lacks the right features, performance, and user interface to be used for large production projects. Audacity makes up for this with some innovative features that can't be found in similarly accessible editors.

SND

When a sound editor seeks to emulate Emacs you know things are getting seriously odd. The question with SND is not whether it's odd, but whether that's a good thing or a bad thing. In order to answer this question I'd like Emacs and vi users to put aside their differences and realize that, compared to, say, MS Office, Emacs and vi have an awful lot in common. That being agreed upon, let's look at how SND, an audio editor with Mountain Dew running in its veins, can appeal to an audience wider than the rare programmer-gone-audio-producer.

First impressions are usually made by the GUI, barring a really horrendous installation procedure. SND's initial GUI is, shall we say, minimal.


You call that an interface?

Since the initial GUI is so, well, nonexistent, the obvious next step is to open a file. I was surprised that SND opened a 1GB wav file in less than a second, while other editors generally take a minute or two. In general, the minimal GUI does its utmost to get out of the way and display information quickly, efficiently, and with a marked lack of vector graphics zooming here and zagging there. However, where other decent editors can show spectrograms of the waveform, SND can show the spectral information in 3D using OpenGL. The general attitude of SND seems to be "oh, yeah, I can do that, and I can do it in 3D, and in realtime, and would you like a Mountain Dew with that?"


Now that's a spectrogram.
From Snd Tutorial by Dave Phillips

But eye candy, or lack thereof, is only first impressions—let's talk about second impressions. The interface, as you may have guessed from my mention of SND's desire to be like Emacs, is difficult to master. Likewise, it feels like a kid glove once you know it. To open a file, use C-x C-f. To move the cursor forward one sample use C-f, backwards is C-b. You get the picture.

SND's interface is indicative of a larger theme in SND, that being that it is a somewhat obscure tool that serves a rather obscure niche. SND can work well as a basic sound file editor, and once one is used to its interface, it can be much quicker and easier to use than something like Audacity. Using SND as a simple sound file editor, however, would be wasting its vast power and flexibility. On the other hand, SND, like Audacity, doesn't include many of the features that are common to DAWs, so it doesn't make sense for the large audio production projects that Ardour would excel at.

What SND does have going for it is its scriptability and its integration with Common Lisp Music (CLM). Using Scheme, a user can control any part of SND, as one can use Lisp to script any function of Emacs. Furthermore, by invoking CLM, a user can synthesize sounds using one of the most advanced computer music languages available. The attraction of programming SND is twofold: on the one hand, it can make repetitive tasks much easier. On the other hand, it allows algorithmic, even intelligent (Lisp is known for AI applications) manipulation of audio.

Recently, SND added support for realtime scheduling (RTS), which was a big deal to some people like me. Basically, as I understand it, what RTS does is guarantee that audio will be processed in a very short amount of time, allowing one to use said audio for live performance. This is in contrast to systems where audio either needs to be rendered, which can take hours, or has no guaranteed latency (Nyquist). With this new addition to SND, not only can Scheme be used to script and access CLM, but it can be done on the fly. This elevates SND from an audio editor that can do an awful lot with an interface that (only) appeals to Linux geeks, to a tool for live DJ-ing, composition, and other music making activities. The fact that SND is tied into CLM creates a world of possibilities for live performance using advanced AI techniques.
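A bit of back-of-the-envelope arithmetic shows why a timing guarantee matters for live use. The sketch below assumes audio is handed to the application in fixed-size buffers, which is how most audio systems work; the buffer sizes and the 48000 Hz rate are hypothetical settings, not measurements from my machine.

```python
def buffer_deadline_ms(frames_per_buffer, sample_rate):
    """Time available to process one buffer before the next one is due."""
    return 1000.0 * frames_per_buffer / sample_rate


# Hypothetical settings, not measurements from the test machine.
for frames in (64, 256, 1024):
    deadline = buffer_deadline_ms(frames, 48000)
    print(f"{frames:5d} frames at 48000 Hz -> {deadline:.2f} ms")
```

Small buffers mean low latency but a deadline of only a millisecond or so per buffer. If the scheduler lets some other process hog the CPU past that deadline, the result is an audible dropout, which is tolerable in an editor and much less so on stage.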

As an audio editor, SND delivers to the geek crowd that likes obscure shortcut keys. For anyone else (i.e. almost everyone), Audacity has a similar box of tricks and an enormously friendlier interface. As far as heavy duty traditional audio production, SND takes another strike, with its complete lack of tools to manage a large session, though you may find its keyboard interface useful for large projects. However, if you're interested in an audio editor that's also a barely disguised interface to a powerful audio programming language (and who isn't?), SND ain't but another word for Hennessey.

What makes FOSS special when it comes to audio editing?

Ardour is relatively easy to compare to proprietary DAW offerings because of its developers' intention to make it similar to proprietary DAWs. Audacity, and even more so SND, are difficult to compare because they don't have proprietary analogues. Editing audio in Linux may be becoming a significantly different occupation than editing using proprietary platforms.

When editing in a proprietary environment, a choice of software to a large extent determines possible choices of hardware and vice versa. Once a platform has been chosen it defines what can be done and how it will be done, because the code is finished and closed. To a great many people, indeed the vast majority, this is the preferred method to work because it is convenient. A closed system is built to perform a closed set of tasks, and (hopefully) it performs those tasks with alacrity. If all I want to do is edit an interview, ProTools provides a solution out of the box. The interface is familiar and installation requires no expertise, so I can focus on producing content and nothing else.

An open system is fundamentally different. FOSS is, almost by definition, a work in progress. If Ardour doesn't have a feature I need, I can code it myself. With this possibility, the software no longer defines what I can do; it's just a point of departure. Of course, to realize this ideal, expertise is required beyond the scope of audio editing. Of more interest to the nonprogrammer is the way that the openness of software can influence its development; the openness of FOSS applications goes beyond their code base to their design and architecture. Jack's endless configurability, Ardour's modularity, and Audacity and SND's scriptability are all examples of open architecture that allow both usable and infinitely extendable applications. A user can edit sound files in SND with their mouse just fine, but as they learn more the software will not hold them back.

On proprietary platforms, eventually you'll run into "you can't do that." On open platforms, you'll run into "you have to learn more to do that." This difference so permeates Ardour, Audacity and SND that it's difficult to compare them to more traditional applications. Yes, they can each perform well for editing audio projects of varying scales. On the other hand, they may, depending on what version you're using, be buggy, unstable, or lacking features. If you want to use their more advanced features, it may be necessary for you to do a little research. If all you need is to produce audio, proprietary solutions work out of the box in a way that FOSS can't always compete with. Open code influences the architecture of applications, and ultimately the ways and kinds of work done with them.

If you're a professional audio producer, it may not be attractive to you to learn Ardour, since it's highly unlikely you'll ever need to know it for a job. If you're looking at setting up a personal studio, Ardour's lack of a price tag may be attractive and worth the learning curve. People looking to start learning audio production could use Ardour to learn the fundamentals, and if they're not concerned about having a marketable skill-set, it may be good as a long term solution.

The argument in favor of Audacity is a little easier, since it's designed to be easy to learn. It's a great solution for easy edits, mixing and crossfading mp3s, and doing field recordings with a laptop. Audacity can also serve as an introduction to basic tasks of editing, though you may quickly find yourself wanting a mixer like Ardour's.

SND provides a lot of artistic potential and loads of geek-cred, with an appropriately high requirement of expertise. SND doesn't really have too many facilities for regular editing that aren't supplied by Ardour or Audacity. On the other hand, SND does have a totally unique set of abilities that could be very interesting and fun to learn, though perhaps not too practical outside of art and academia.
