Johannes Goebel - Listening and Hearing

Translated version (2015) of the original Zuhören und Hören – auch im Zeitalter audiovisueller Medien (1993), published in Johannes Goebel, Computer : Musik : Ästhetik (Schott, Mainz, 2006). The original German title alluded to “listening and hearing” being of importance also in the age of audio-visual media. It should also be mentioned that in German, hearing (hören) and listening (zuhören) are differentiated only by the prefix “zu” (to).


Considering differences between hearing and listening may lead to a particular perspective on the developments in media technology over the past 100 years. The technical reproduction of acoustic events, electronic processing, and synthesis of sounds affect all of us: listeners, composers, sound artists, musicians, producers, and media companies.

The differentiation between hearing and listening that I suggest may seem arbitrary. It may not align with the colloquial use of these words, nor is it scientifically “clean.” At the center of my considerations are two different attitudes in how we deal with what our ears absorb. I have assigned these two attitudes to hearing and listening.

The following thoughts about hearing and listening may allow perspectives on forms and contexts of acoustic events — how we perceive such events; which acoustic materials are used by an artist; and which consequences in perception are inherent to just the sound itself. And the specific acoustic conditions of media technology and media industry may be looked at under this perspective as well. And on a very personal level, the differentiation of hearing and listening may offer the opportunity to review one’s own habits.


I. Hearing and Listening: The Hearer and the Listener

I have a question and someone lends me her ear. I speak, another person listens. Every now and then she says something, which has to do with what I just said. She adds more. I listen, and then I continue. Both of us continue. The point of departure was my initial concern, but the destination is still unknown.

I would have noticed if the other person were not inclined towards what I said, if she was not listening. I would have noticed if she only heard me, if she had stopped listening.

If she were only to hear me, she would be absent to me. She would be centered on herself and would pick up a key word from what I said to immediately talk about herself. And then it would be my turn again, following my own thread, and then she again, following hers. So we would hear and speak. And when we went our own ways, each one of us would — after a while, when the noise of the words had faded away — ask ourselves, “Wait — what did we talk about?”

Listening means reciprocity, hearing is always one-sided.

There is no listening without hearing. Listening starts after hearing.

Hearing starts immediately, when some acoustical event meets our ear. The ear as sensory organ begins processing the signal right away. It orders and groups the frequencies bombarding it, transforms them into signals appropriate for the nerve pathways, and sends them to the brain. The physiological basis for this chain of processing embodies what humanity adapted to and what humanity adopted as being important in what we call the “acoustical domain.” Embodied is to be understood quite literally: The ear as organ is the corporeal manifestation of what humanity deemed important in the acoustical domain, of what served it best. The bandwidth for loudness perception, the preference given to certain ranges of frequencies and the different abilities to discriminate within such frequency bands, the ability to differentiate different sound colors (timbre) — for all this we bring the prerequisites to differentiate with us when we are born.

As we grow older, we lose the capability to hear the very high frequencies, and we can become hard of hearing. Together we listen to the night and I say, “Do you hear the cricket over there in the grass?” but you can no longer hear its song.

The ability to listen, however, begins only beyond the ear. We may hope that we become better listeners with increasing age. A sudden, instantaneous hearing loss may show that it can also work in the opposite direction: For an individual, the physiological capability to hear may break down because what happens beyond the ear, the listening and what it evokes, becomes unbearable. We quite often dull and deaden our senses, for just a moment or forever, since we cannot find a way to deal with the received impressions.

Hearing refers to a small and limited window of time. All sounds meet the ear simultaneously in a “single wave.” In a given situation, traffic noise, bird songs, jackhammers, espresso machines, and people talking are combined into one single wave of air. The bird song does not reach the ear separately from the person talking next to me — they are riding with each other in one stream of air molecules as they reach the ear. And the hearing person filters, from this one wave that combines all those simultaneously sounding events, the one event of specific interest in that moment.

When we say that we hear something, we reference a very small time-window, which allows us to come to a quick interpretation of what we hear. Is the sound coming from a mouse or from a burglar? Since we can’t always decide right away what it might be, we hear a little more. We may not have enough time to listen if we have to react quickly.

Hearing aims at a very quick assessment, since we can hopefully react in time if need be. I hear an aggressively barking dog behind me and at the same time a car coming down the street at high speed in front of me. If I step back, I may get bitten; if I run forward, the car will hit me. So I stand still.

Hearing may even bring events that have already passed back to the level of consciousness after we have heard them. We hear a certain sound, and we can search in our memory and determine the situation from which we remember it. Or we can revive a complex situation in front of our inner eye and ear, and then we can say, “Oh yes, there, in the distance, I did hear a train go by.”

Listening moves within another frame of time and memory. Listening has hearing as a prerequisite, but it has left the specific conditions of hearing behind and found its own terms. Listening extends over a longer period of time in a conscious, focused, and inclined way. Together we listen to an orchestra piece. I ask, “Do you hear the French horn?” You say no and you start to search for the sound color, the timbre, of the French horn. And after a while of listening, you say, “But the melody sounds so sad.”

When we say we hear something in particular, then we say that we perceive something with a specific acoustic property. We can point towards the perceived.

When we say we listen to something specific, we say we perceive a context. We can point towards the “how of the perceived” as well as towards the “how of our perception.”

When we listen, we always embrace a longer period of time, which extends beyond the identification and short-term interpretation of an acoustic event. When listening, our perception is embedded in a constantly changing context, in which not only the acoustic events change, but our perception itself changes, the attitude towards the perceived. Even a steady acoustic event without noticeable changes can change as we listen to it as a result of a change in our perception.

Hearing can very well be trained, sharpened, and extended, though not in the moment of hearing itself. Ear training as part of musical studies focuses on the quick identification of certain sound constellations as well as on pattern recognition and the ability to identify something one heard in retrospect by “reviving” it to analyze it outside of the actual act of hearing.

The difference between hearing and listening is not a qualitative difference.

Hearing is based on patterns; it does not change with what was just heard. We hear a couple of sounds and say “9th Symphony” or “I can’t get no.”

Listening creates patterns. It changes with what was just heard. We listen to a waterfall or to a piece of music we have often listened to before, and we can nonetheless listen to it anew – everything is the same and yet again different.

Hearing references the same; listening references the other.


II. The Ear and the Loudspeaker

We are in a concert of a large symphony orchestra. At a given moment, the following sounds can be heard simultaneously: violins playing a pitch, which one can describe as the string of each violin moving 440 times per second back and forth, excited to that oscillation by the violin bow; oboes playing the same frequency, which is identified as the “standard musical pitch,” sanctified per ISO 16:1975 (the air column in these wind instruments is excited to move back and forth with this frequency through the corresponding opening and closing of the two reeds, which the oboe player controls with the lips, in conjunction with the length of the air column inside the oboe, determined by which holes are opened and closed by the fingers); flutes playing a frequency that moves back and forth 554 times per second; and trumpets playing in the same window of time three sounds, a medium-length one, a quite short one, and a long tone. The first of these three sounds lasts 0.8 seconds, and in this time the trumpets create 527 back-and-forth movements of air; the second pitch lasts 0.2 seconds with a total of 132 movements; and the third lasts one second and creates 880 movements in this time. A musician would say: “You hear an A major triad, with the trumpets playing the interval of a fourth to the higher octave in a dotted rhythm.”
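
The musician’s shorthand can be checked against these numbers. A minimal sketch in Python (my own illustration, not from the original text, assuming equal temperament with A4 = 440 Hz; the pitch names are my annotation):

    # Frequency is simply back-and-forth movements per second (Hz).
    # The three trumpet tones from the description above:
    trumpet_tones = [(527, 0.8), (132, 0.2), (880, 1.0)]  # (air movements, duration in seconds)

    for movements, duration in trumpet_tones:
        frequency = movements / duration
        print(f"{movements} movements in {duration} s -> {frequency:.1f} Hz")

    # Prints roughly 658.8 Hz, 660.0 Hz, 880.0 Hz: the first two tones are E5
    # (about 659.3 Hz in equal temperament), the third is A5, i.e. the interval
    # of a fourth up to the higher octave, matching the musician's description.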

All different frequencies, which the individual instruments of the orchestra create individually, meet in the air, superimpose and join, and reach our ears in one “wave.” Our eardrums move back and forth according to this one complex waveform, which is a compound of all individual frequencies. I clearly remember my amazement when I learned that the eardrum captures in just one movement all acoustic events, which happen simultaneously around me, and that we can still, for instance, tell apart which instruments play together at the same time.
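
That “one wave” can be sketched quite compactly: the air, like the eardrum, carries only one pressure value at each instant. A minimal Python sketch of this superposition (my own illustration, not from the text, assuming pure sine tones at the frequencies named above):

    import numpy as np

    sr = 44100                               # measurement points ("samples") per second
    t = np.arange(sr) / sr                   # one second of time points
    violins = np.sin(2 * np.pi * 440 * t)    # violins and oboes at 440 Hz
    flutes = np.sin(2 * np.pi * 554 * t)     # flutes at 554 Hz
    trumpets = np.sin(2 * np.pi * 659 * t)   # trumpets at roughly 659 Hz

    # The air can carry only one pressure value per instant: the sum.
    one_wave = violins + flutes + trumpets
    print(one_wave.shape)                    # (44100,) -- a single compound waveform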

The reverse case illustrates this phenomenon readily. Say you have a small radio with just one loudspeaker. (Well, this text was written in the early nineties of the last century — so for today, imagine you have your cell phone on your right ear with just the one speaker feeding the sounds into that ear.) This speaker is basically just a “piece of cardboard,” a membrane, that can be moved back and forth. A complex sequence of changing air pressure can be produced in this way. If the membrane is moved forward, it compresses the air in front of it; and when the membrane gets pulled back, a lower pressure of air is created. These changing air pressures, created by the movement of the loudspeaker membrane, reach our eardrum, which likewise is a membrane. And then the “miracle” takes place: out of this one sequence of changing air pressures we can differentiate different instruments, we can tell traffic noise apart from music played in the same moment, we can focus on what a specific person says in the middle of a party even though the person might be talking in a group of other people away from where we stand.

For now, back to before the miracle. This one movement of varying air pressure (for stereo, two such waves, one for each speaker or ear) is directly engraved in a mechanical way into a vinyl record, or it is recorded in changing magnetic fields on a tape, or a digital measurement of the wave is stored on a CD or in a computer. All this is quite simple, since the complex movement of the one wave holding all acoustic events is, at any given point in time, at a clearly defined point of high or low pressure within the range between maximum and minimum pressure. A microphone is also just a membrane like the eardrum, and it converts the air changes to changes in electricity, which are then used to store the wave in the ways described above. And if one wants to listen to the recorded sound, one simply goes the other way. The loudspeaker membrane is driven to move back and forth by the changes in electricity, and thus it again creates changing pressures in the air, and we can hear what was recorded previously.
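
Reduced to the digital case, this chain can be sketched schematically (my own illustration, assuming CD-style parameters of 44,100 measurements per second at 16 bits; real converters involve filtering and other stages omitted here):

    import numpy as np

    sr = 44100
    t = np.arange(sr) / sr
    wave = 0.5 * np.sin(2 * np.pi * 440 * t)    # the one compound air-pressure wave

    # "Microphone + measurement": store the wave as a sequence of 16-bit numbers.
    stored = np.round(wave * 32767).astype(np.int16)

    # "Loudspeaker": drive the membrane back through the same sequence of values.
    played_back = stored.astype(np.float64) / 32767

    print(np.max(np.abs(wave - played_back)))   # the measurement error is minute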

The changes in the primary medium for acoustic frequencies — namely the air — are captured in another medium (like in vinyl or in digital form), to then be converted again into air movements changing over time, which then reach our ears.

From this the requirement for an ideal loudspeaker can be easily deduced: the speaker is to generate the air pressure wave absolutely neutrally without adding or taking away any detail; speech, music, and sounds in general are to sound “like the original.”

So the best loudspeaker is the loudspeaker that one does not hear as such.

In the previous section, I described the scenario of listening together to an orchestra piece, and I asked if you heard the French horn. I did not specify if we listened to a recording over loudspeakers or if we were in a live concert. If we were listening over loudspeakers, it would be quite difficult to pick out the horn if one had not known its sound color (timbre) before and had not created a hearing pattern for it. If we were sitting in a live concert, and I asked if you heard the French horn, and you had no idea what a French horn was, you would most likely ask, “What does it look like?” And I would say, “The player of this brass instrument sits in front of the third percussion player, next to the trombones, and he has his right hand in the bell of the instrument.”

“Oh, yes, I can see him — and now I can also hear the horn.”

Hearing and seeing are closely intertwined in their complementary faculties. A few thoughts about the difference between hearing and seeing may make the changes that stem from the advent of the loudspeaker evident.

The field of vision is limited by the direction of our nose. The field of hearing is horizontally and vertically unlimited and surrounds us like a sphere. Intentionally steered, selective seeing is in greater demand than consciously selective hearing. We most likely say “Look over there” more often than “Do you hear that?”

We can estimate sizes and distances more precisely with our eyes than with our ears. But when we enter a large hall or a small room, the architectural volume of the space and what kind of surfaces are predominately in the space can be immediately identified by the ear as a general impression: long reverberation, echoes muffled, resonances like in a tunnel, dry, dampened, etc. The ears allow an immediate qualitative spatial orientation, while the eyes have to be moved around to gain a more exact picture of sizes, form, and materials.

The ear is directly bound to time; going back is only possible in our recollection. The eye can wander back and forth when objects do not move (when time stands still, when time seems to stand still in the objects). Reaffirming an impression our senses perceived is done less often with our ears than with our eyes.

The eyes can be closed, a deeply rooted reflex protecting them. The eye may be used to determine the border between life and death; there is nothing similar for the ear.

Getting into the heads of people from the outside can be done more easily through the ears, but it happens more often and in more refined ways through the eyes. The ears monitor the area which is not covered by the eyes, and because of that they stand in a more direct connection to physical reactions. (“Stop! A car!” and not, “If you would care to look to the right, you will see a car approaching.”) The eyes have to let everything pass more unfiltered. While sleeping, the ear keeps watch more than the eyes; it monitors the surroundings of the sleeper; it differentiates in sleep known from unknown sounds, more important ones from less important ones, in much finer differentiations than while being awake, when the eyes steer the detailed selection of important events and can clarify what is being heard. The ear has to be less protected, because the eyes have depth perception only in a limited viewing angle and so cannot grasp the space, the surrounding area, at once as a whole.

Before the invention of the loudspeaker, humans could only create acoustic events with an intensity whose reach was similar to that of the eyes. The capability of the ear to “hear around corners” was utilized, as with military trumpet signals or with drumming languages, which did not only exist in Africa, but also, for instance, in Germany, where charcoal burners in the mountains used a kind of Morse code for communication between valleys by beating code on wooden planks hung from trees. The impossibility of simply closing the ears was exploited quite extensively, for instance, when marching to drum beats. But an acoustical event could only be produced with the loudness which could be reached in a purely mechanical way; whispering with the volume of a big drum was not possible. Only the loudspeaker, an instrument that can amplify electrical frequencies and convert them into mechanical ones in the air, enabled the full utilization, the full exploitation, of the ear, which cannot close itself. (One could say that the loudspeaker is the first step toward electronic implants that are directly connected to the nerves.)

With the loudspeaker and the technical reproducibility of sounds, the ears were separated from the eyes. Now the producer of sounds no longer has to meet the listener. The social context of the complementary abilities of eyes and ears was dissolved.

I am describing here only the separation of the ears from the eyes. The eyes were certainly separated from the ears long ago: in writing, painting, photography, and film. Since these media are positioned in a different frame of time, one that does not necessarily need to be reconstituted in a linear way to recognize something that gives an indication of the original, the separation of the ears from the eyes is much farther reaching. Even writing — which equally takes an acoustic event, the spoken words, out of time and space, out of the immediate social context of speaker and listener being in the same location at the same time — proves to be different. Since music has nothing similar to the semantic binding force of language (however contested and ambiguous this binding force in language might be), and since language takes place in much smaller and syntactically tighter modules, written language is much more robust for non-sequential deciphering than music. This is also true for pictures and films. If I watch a film with just one frame per second, I will be able to create “a picture” of what it might be about, and there are certainly limitations to interchanging sequences, words, paragraphs, or chapters. But this is not comparable to the conditions of music, which depends absolutely on the waveforms being “in place and time.”

The loudspeaker created a qualitatively totally changed environment for the ear and for the human attached to the ear. Even a film projector or video electronics do not possess the generality of the loudspeaker. Since the ear’s perceptive faculty embraces a spherical area and is specialized only in time-based processes, two speakers in headphones or four to six speakers at a rather large distance to the ears may suffice to synthesize (almost) any acoustical space, to “depict” such a space. In contrast to the ears being absolutely focused and specialized in relative changes over time (acoustic frequencies and amplitudes in their changing patterns), the eye is far more specialized in perceiving shapes and distances in their absolute dimensions and sizes. The temporal resolution of the eye in regard to movement is much coarser than that of the ear (as we know from film and video, where already 24 to 30 frames per second evoke a sense of movement for us).

The loudspeaker can make acoustic events smaller and larger, without their proportions and resolution changing. In the visual domain we do not have (yet) a similar tool, which matches the resolution and spatial orientation of the eye.

Now one might assume that the separation of the ears from the eyes supports listening. Now we can hear or listen to everything our heart may desire — independent of what our eyes are busy with and where our body happens to be. But the recent decades show that only the quantity of acoustical events, from which we can choose, has increased. We hear more and more different things, but obviously that has no influence on our ability to listen.


III. Hearing and Listening: The Acoustic Event

In the first section, hearing and listening were juxtaposed as activities, which can be differentiated. In the second section, the acoustic process that reaches the ear was described: how hearing and seeing are complementary in relation to space and time, and how the loudspeaker enabled the separation of the ear from the eye.

In the following, I would like to focus on the acoustic events we humans create, and the consequences a differentiation of hearing and listening can have for the creation, design, and composition of such acoustic events.

I was not able to find a comprehensive term other than “acoustic event,” even though it sounds pale, neutral, or matter-of-fact. I do not want to use “acoustic event” in such a limited way. But I also do not want to reference only pieces of music or sound art. Equally inappropriate would be to speak of “acoustic objects” or “acoustic items.” “Object” or “item” is contradictory to “acoustic,” since acoustic implies (quick, directly perceivable) changes in the passing of time; objects exist fundamentally only in time as well, but they can be turned back and forth. A picture can be hung upside down — playing a sound backward does not correlate to that in any way. Perhaps one could say that in our memory an acoustic event turns into a mental object. But such reflections lead beyond what I would like to concentrate on right now.

First, I would like to look into how the technical reproducibility of acoustic events influences hearing and listening. For this I will narrow the focus to “music.”

Allow me to remind you that a notated composition can only become music when it sounds, when it gets interpreted. A written score is not music, but a “set of explicit and implicit instructions to produce music.” The technical reproduction of a score has nothing in common with the technical recording and playback of the sound of music.

The technical reproduction of the interpretation of a piece of music seems to allow repeating a bygone moment again and again. Each time we hear the recording, we expose our ears to exactly identical acoustical wave patterns.

When a recording is played back, it is absolutely pointless to debate whether the piece is the same or only similar each time we play it, how we as listeners have gotten older since we listened to it the last time, and whether the situation in which the recording is played may always be a different one. We know that a repetition identical with that which is repeated exists only under scientific axioms. In all other areas of our lives it is understood that a repetition in such a strict sense contradicts everything we experience and that such repetitions are not possible. (See for instance Repetition by S. Kierkegaard.) In music it is understood as well that a repetition, like the two dots in classical notation indicating that the just-played section is to be repeated, has nothing to do with “the same”: the repetition creates a new context in which the previously heard — that which is now repeated — is a constituent; the repetition is not identical with that which is repeated and is not meant to be identical with it, but creates a new context for it. Based on this, it is clear that there are absolutely no symmetrical proportions in music or in time in general. A construction of symmetry on paper or in wood or stone or steel or concrete is of a totally different quantity and quality than a repetition that gets unlocked through listening, in perceived and experiential time. Time and symmetry exclude each other, be it in music or more generally in any temporal process.

Playing back a music recording is clearly different from reading a book another time, from looking at a picture again, or from re-watching a movie. When playing back a piece of music, time is an integral constituent of the piece. If I change the time reference, by, for instance, changing the tempo technically without changing the pitch, or through stopping the playback and starting it again after a pause, I change the interpretation of the piece in substantial ways. Varying the speed of reading a book does not correlate to such an intervention as when playing back music at varying speed; and even slowing down or stopping and continuing the playback of a movie is a different situation. (I am talking about traditional movies with a narrative, not experimental films, which inherently establish a different relationship to time.) In our mind, we can compensate for a slowing down or acceleration, since we have a constant frame of reference in the visual domain in front of our eyes — the printed letters, the pictures. (And professional musicians can also read notated music or remember the sound of a piece and slow it down or accelerate it just in their mind, in their imagination.) If the movie is played slower — and let’s assume for the sake of argument that we turned the sound down — our eyes can wander around in the picture and discover more details.

Slowing down a music piece while keeping pitch and timbre constant through signal processing is not at all equivalent to slowing down the speed of a movie, because we have no constant frame of reference as we do with “frozen images” (which film basically consists of) and slowed-down visual cues. In such slower playback of music, agogic fluctuations (when small accelerations and decelerations deviate from an underlying tempo) or grace notes (quick notes played before the main note they are associated with) are stretched or accelerated in strictly mechanical proportions, which is absolutely not the same as if we changed the tempo of the piece when playing it live. When we play a piece live on an instrument, such minute changes in tempo would not at all be strictly proportionally scaled to the changed tempo. It would be a different interpretation and not just slower or faster while maintaining strict temporal proportions. Maybe the grace note would still be of the same duration as when playing the piece faster, but the minute details of how such small agogic changes or a wider-reaching acceleration or slowing down are realized change fundamentally with a change of the underlying tempo. Our only alternative is to play the piece again, now slower or faster.
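
As an illustration of such strictly proportional stretching, here is a sketch using the librosa library’s off-the-shelf time-stretcher (my own example, not the author’s; the file name is hypothetical). Every duration in the recording, grace notes and agogic fluctuations included, is scaled by exactly the same mechanical factor, which is precisely what a live interpretation would not do:

    import librosa
    import soundfile as sf

    # Load a recording (hypothetical file name), keeping its original sample rate.
    y, sr = librosa.load("recording.wav", sr=None)

    # Halve the tempo while keeping pitch and timbre: every duration in the
    # piece is stretched by exactly the same factor, in strict proportion.
    y_slow = librosa.effects.time_stretch(y, rate=0.5)

    sf.write("recording_half_tempo.wav", y_slow, sr)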

Let’s imagine we had technology that allowed us to play a movie faster or slower with the sound equally slowed down or accelerated (without changing the timbres, the colors of sound); what would happen if we played the movie at twice the speed or stopped it? At the latest when we stop the movie, it becomes obvious that sound cannot be separated from time. Surely that is true for anything in our world. But with the ear it is immediately obvious on the surface: As described in the previous section, the ear is specialized in processing complex waveforms for our perception in a relatively small window of time. If this frame of micro and macro movements is changed, we do not “understand” it anymore, or we perceive something totally different and new. The original context has been changed radically.

Sounds share a temporal property with heartbeats and breathing. All three are linked in a relatively narrow window of time to a “before” and “now” and an expected “next,” and they are directly perceived as such. Outside of this relationship between “before” and “now” and “next,” in which we perceive them directly, they transform into something very different.

A “technical reproduction” of sounds is only possible outside of their proper medium of “air moved in time.” The sounds are engraved in vinyl, transposed into space through magnetized particles on tape, or measured in digital snapshots. In this, the reproduction of sound is different from all other technical reproductions of sensorial impressions. Leaving aside digitization: images, pictures, sculptures, films, smells, and flavors can be captured and reproduced in their proper or extremely closely related medium (even though they may need to emanate in time again to be perceivable). But sounds cannot be stored as sounds, only in a medium taken out of time; they need to be converted again from a static medium into air and time to become accessible — and then they are gone again as quickly as they came.

To put it in a pointed way: Sounds and music cannot be reproduced, they can only be preserved outside of their proper medium. Or to put it in a more general form: Time cannot be reproduced.

(An aside: However relative time passes in different contexts, time always eludes the grasp of reproducibility. Even the dream of Hans Moravec, director of the Institute for Robotics at Carnegie-Mellon University, that in the near future humans will not need their bodies anymore, but will live unendingly in computers as technoid entities, is founded on the premise of infinity being based on the irreversibility of time. Only the thought of something outside of our time, which negates, contains, and elevates (sublates) time in itself, can transcend the change in time. And a reproducibility of transcendence is obviously nonsensical.)

For a discussion of a fundamental difference between a hearer and a listener, it does not matter if an acoustical event is created “live” by a sound producer or if it is generated anew over loudspeakers from technically reproducible storage. An acoustical event can only be perceived in its movement over and through time, independent of how the sounds are set into motion in the air.

The specific conditions for hearing and listening are certainly strongly co-determined by the acoustical environment in which the sounds sound, and by the social context in which we hear or listen. The previously discussed separation of the ears from the eyes, when one listens only over loudspeakers or headphones, certainly has a great influence on how one as listener relates to that which one hears, and on whether listening is supported or made more difficult. Under good circumstances, listening to a technical reproduction may allow for greater concentration; but it may very well also be that the lack of a shared social space of performers and listeners is detrimental to hearing and listening. But, as I will try to make clear, the difference between a live event and a technical reproduction is superseded by the difference between hearing, which is based on the same, and listening, which aims for the other. The acoustical event itself and the attitude of the hearer or listener are of greater importance than the differentiation between live and canned. Because sound is always live.

Hearing relates to the same, listening aims for the other. Hearing is based on patterns; listening creates patterns.

This implies that the acoustic events for hearing and listening may also have different properties. An acoustic event an author intended to be listened to may certainly never find a listener but only meet a hearer. And an event designed for hearing may indeed never be listened to.

Again, a situation of an exchange between two people: An old person we have known for a long time talks about an event from his childhood or tells us about his fundamental political convictions. We know it almost verbatim; we have heard it countless times. We do not listen anymore. With just the first introductory words, we already knew exactly what to expect. We accept it, and we do not listen anymore, the words no longer reach us, and we tell ourselves, “Well, I am not here to listen, but just to be here.” And repeating the speech again and again is indeed a signal that the speaker has removed himself from time. The reason for him speaking has nothing to do with his words. The speaker certainly is longing for an interested listener; but he has given up being part of the present; and the reasons for this cannot be directly understood from what he says. So he is repeating his patterns. For me as hearer, a brief snippet is sufficient to identify the story as a pattern, and then I can drift away, every now and then catching a word or two to see if the next story has already begun.

The same is true for background music in shopping malls or in a doctor’s waiting room. It is designed for non-listening. The music is to create a certain atmosphere with the largest common denominator. The volume level is not to invite listening, and solo voices are to be avoided — this is especially true for music at the workplace, as attention is not to be distracted from the task that is to be accomplished. Small snippets are to have an immediate effect. The fragment of an “Oldie but Goodie” melody, a certain chord sequence, or a specific relationship between melody and accompaniment is sufficient. The arrangements of such music are aimed at continuous non-listening.

Music for commercials, as well, is explicitly designed to be heard and not to be listened to. Either a known acoustical pattern — for instance the beginning of Für Elise or of Yesterday — is bonded to a new image of a product, or a new pattern is created, like a specific birdcall combined with sounds from a waterfall, which morph into the sounds of beer being poured into a glass. This new pattern — be it specially composed music or the birdcall put in a new context — has to be very closely tied to a generally known pattern in the cultural background of the target audience while at the same time having an identifiable profile of its own. After a few exposures to the commercial, the first second will be enough to identify the acoustic event and, with that, the product.

After these brief descriptions of music specifically designed for non-listening, a brief interlude on those events which we choose for ourselves because we do not want to listen to them.

Many people listen to music they selected themselves while they are working. When we select our own background music, we choose music with patterns we are used to. As a result we have radio stations (or now, streaming music services) that cater to a rather limited group of patterns, like Country, Sixties, Eighties, or Classical (currently with Pandora and its competitors as some of the ultimate pattern matchers), which allow us to eliminate any surprises while offering an unforeseeable variety quite strictly within our chosen pattern. We are highly sensitive when the boundaries of our chosen patterns are violated. (Thumbs up or thumbs down in Pandora.) For instance, if we choose “classical music,” it is self-evident that only classical music should be played which already at the time of its composition fulfilled the requirement of creating patterns without disruptions. A classical composition from the same period which indeed has ruptures in its music, as compositionally intended, has to be so well known that one can quickly “overlook” these heavy moments in the process of hearing. The multitude of different music streams to select from has one common goal: There needs to be pattern consistency within the stream we chose. From the perspective of the provider, this is seen as meeting the target audience; for the hearer it means being safe from surprises. We have to be safe from any changes which might switch our hearing to listening.

For sure there was always music with the important function of being an entertaining background. And I have to repeat that the difference between hearing and listening is not a priori a difference in quality. But the disintegration of the social context of hearers or listeners on the one side and musicians on the other as a result of the loudspeaker does change the quantity of acoustic events available and accessible. And this larger quantity certainly serves more hearing, since the ability to listen does not automatically increase by hearing from a larger reservoir. (On the other hand, a larger quantity of listening may indeed increase the range to hear, since it, as mentioned, creates new patterns, which may become integrated into hearing.)


Let me now come to those events, which are made for listening.

Listening directs itself toward the other.

In the first section, I circumscribed the attitude of listening from the perspective of the perceiving person. From this perspective, we cannot directly deduce required properties of the acoustic event we are listening to, for instance in the sense of “being new,” being “unheard of,” or “never having been used before.” One condition, however, is indispensable for an event to become worthwhile to listen to: It needs a depth which enables us to discover the other. This depth or multilayered perspective is meant in terms of acoustic and psycho-acoustic complexity and then — in the conscious shaping of acoustic events in music — also content-wise. In the process of artistic shaping, this reciprocal relationship between the shaping of the material perceivable to our senses and the cultural meaning of the shaped material is the pivot around which everything turns.

This depth allows a listener different perspectives, views, and insights. Parallels can be drawn to the realm of seeing. Landscapes, for example, have an inherent perspective of depth for us; the eyes can move and wander between close up front and far away; we can change our focus. When looking at a painting, an interplay of depth can be discovered between shapes, perception, and interpretation. The contexts of the painting are discovered and developed by repeatedly spending time with it, while letting the eyes wander back and forth over the painting. Equally, a piece of music may open ever-new perspectives each time one hears it. This starts with the acoustic formations, which offer ever-new threads for focusing one’s attention. Traditionally this is accomplished through part writing (how lines are shaped individually and in context with the others) and through instrumentation (which instruments play which lines). In electronic music this may already start with the definition of timbres, of the spectral components of sounds. And this compositional process continues to the shaping of the form or the forming of the shape, of melodies with accompaniment or without, of chord or noise shapes, progressions or cuts, to repetitions, variations, larger and smaller sections … in short, the process of past and present, the game of memory and anticipation — or to put it into the words used here so far, the play with the same and the other.

The manner in which the depth of a music piece is shaped is determined by the interplay between craft and speculation, between imagination and execution. Analytical psychological (not psychoanalytical) criteria are only possible a posteriori and are as unimportant for making the music as they are for listening to it. Since the context established by the composer and musicians, by the music and the listener, co-determines the potential for listening, it is not possible to define from, say, an information-theoretical systems perspective how many changes and alterations will yield an optimum in “listenability.”

Far more important is the analysis of the used, composed, and sounding materials from a crafts perspective, to detect those parts of the piece where the context that the piece establishes in and by itself, in relation to the context of the listener, changes or breaks. For such reasons we can still discern different qualities in compositions from centuries past. If, for instance, in a symphony from the 19th century a certain change in a harmonic progression takes place, and this change was revolutionary back then but has become a well-used pattern by today, we can still experience this change as exceptional within the context the symphony establishes by itself. (This certainly requires experience in the specific cultural tradition.)

A merely quantitative change of events does not automatically yield depth perception. As we know from the music history of the 20th century, deterministic as well as chance-based music can result in the same degree of “un-listenability” or boredom. Similarly, repetitive music can attract listening or deliver an acoustical wallpaper as backdrop.

The same and the other are determined by the specific scale of temporal resolution which is referenced, shaped, and perceived. Such a scale can be frequencies of pitch or of vibrato, beats per minute, drones, or non-metric sound fields; or a canon, a movement of a symphony, or an opera. They all have different temporal frames, which can have inherently different determinants for the same and for the other.

The same and the other are to be gauged within the particular temporal scale, on which a composer or musician works or on which we hear or listen, respectively. A dimension of depth cannot be directly linked with structural complexity. Also here it is true that The Same can change to become The Other and that The Other can become The Same.

As hearing relates to the same and listening relates to the other, the acoustic events which are intentionally shaped for listening already have to make the same and the other the objects of their sounding. There is no global determination of where the differentiation between the same and the other starts or ends. Exactly here is the essential realm of compositional work; musical-acoustical worlds which ask to be listened to arise out of this challenge.


IV. Hearing and Listening in the Age of Media Technology

When we speak of someone being gifted as a medium, we do not mean a gifted media artist — parallel to someone being gifted as a musician. We mean someone like the Pythia, the oracle of Delphi, who sat at a chasm in the rocks from which vapors rose and who divined the future. She was a medium, the carrier of something Other. The same way a piece of paper with letters written on it is a carrier, a medium, of something, which is brought to life for us when we read it.

Our senses are specialized for specific forms of energy, which they can perceive. The term media goes beyond what can be described in terms of the natural sciences. Insofar as media are referred to as carriers of something, implicit in the term medium is the tension between its physical appearance and the content it carries, its meaning and interpretation. And in our culture it emerged as meaningful to differentiate between the media and the content they carry (which, by the way, has nothing to do with the question of “form and content,” a constant topic in aesthetics — the media are not form as juxtaposed to content, but they enable form, among other things).

There were always media. I would like to limit them for this discussion to those which we can perceive with the senses that need a medium: hearing, seeing, smelling, tasting. The sense of touch does not need a medium. Without media, physical touch would be the only means of communication.

When we speak of media technology we are referring to media that get their energy from electrical sockets. To be able to perceive what such media transmit, it has to be converted again into those media, which our senses can perceive.

At this point, though, we should not forget the development of bypassing our sensory organs by feeding electrical impulses directly to our nerves. And indeed, once all sensory organs can be bypassed, only physical touch may be left as delivering the only direct impressions. I see a group of people who perceive their sensory inputs through implanted receivers. Their skin and their nerves, which convey the signals of their bodies, are still needed for reasons of survival. So they huddle together closely like a herd of frightened sheep in order to perceive the others and themselves “for real.” This image is depressing. So I would not want to expand on it.

Up to the invention of electrical energy, we always worked with those materials, which could evoke the corresponding sensory perceptions: we played instruments, which moved our eardrum directly via the air; we mixed paints, which met our eyes directly; we invented weapons, which penetrated the body physically. We wrote and printed with ink on paper. We mixed fragrances, which indulged our senses; we refined cooking recipes, generated warmth with fire, and created sculptures, which we could walk around to view them from all sides. We molded the materials, modulated the carriers, which were suitable to our sense organs.

With the invention of electrical energy, our sense organs were not altered yet — but the level on which we can shape sensory perceptions indeed changed.

I would like to reiterate that I limit myself in this context to the senses which we can experience directly in our short-term memory. I do not speak of extrasensory perception, nor of long-term chemical changes taking place in our bodies which all of a sudden become perceivable. I am not speaking of microwaves, radioactive exposure, or homeopathic doses. And I do not speak of perceptions which we generate inside ourselves. I speak of perceptions which reach us through our senses via media which we can shape.

In the 20th century, media technology exploded through electricity. The traditional technologies of reproduction and production for shaping sensory impressions — like books, pictures, theater, sculptures — were overlaid with electronic technologies, which can convert what can be perceived by the senses into an electrical signal and then convert it back again to be perceived by the senses.

Electricity made it possible for converted signals for our senses to be brought to any point on the planet at the speed of light. As soon as the electrical, respectively electronic, media of reproduction were invented, they were used as media for production. A reproduction is not needed in order to shape something with electricity; it is possible to shape directly in the electrical medium.

The past ten years (note: this text was written in 1992/1993, so I am referencing the 1980’s) brought a revolution in the consumer market through the possibility to digitize the full range of what the ears can perceive. This happened so quickly that no one could really comprehend what was happening: audio CDs, CDs with image and sound, digital radio transmission, and the new digital audio tools became cheaper and cheaper.

In the next five years (a prediction for the 1990’s), the same development will have taken place in the visual domain. The digital tools for image processing (for the mass market) will have come to be on par with those of audio.

And so we speak today of media technology in a moment when the eye and the ears can equally be confronted with sensory impressions, which are shaped in the common digital medium. Our senses cannot directly perceive this medium, and our hands cannot shape it directly. The digital medium can be called a meta-medium. Sensory perceptions can be abstractly coded in this meta-medium, and they can undergo abstracted processes and operations, detached from the sensory perceivable media. To control these processes with our senses, we need tools for translations in order to work in the un-perceivable medium. These may be programming languages, graphic interfaces, sequencers for visual animations or sounds, joysticks, data gloves, or graphic tablets.
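
To make the abstraction of the meta-medium concrete: a few letters, an image pixel, and a fragment of a tone all end up as the same kind of byte sequence, which no sense can perceive directly. A small sketch (my own illustration, not from the text):

    import numpy as np

    text = "hearing".encode("utf-8")                      # letters as bytes
    pixel = bytes([30, 60, 200])                          # one RGB image pixel as bytes
    tone = (np.sin(2 * np.pi * 440 * np.arange(100) / 44100)
            * 32767).astype("<i2").tobytes()              # 100 samples of a 440 Hz tone as bytes

    # Without a decoder, the three are indistinguishable streams of bytes:
    for name, blob in [("text", text), ("pixel", pixel), ("tone", tone)]:
        print(name, blob[:8])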

Up to now (1993), media technology on a commercially exploitable scale is limited to eyes and ears. In experimental laboratories, other systems are being developed which also mediate tactile and force-feedback perceptions. But hearing and seeing are a much better fit for commercial exploitation, since they do not need a direct mechanical connection; they can be brought to the senses via “non-touch” media. For the time being, tactile and force-feedback computer systems will most likely be used only for special applications.

The electrical media for reproduction immediately became media for production. Soon after the invention of the loudspeaker, electrically generated (synthesized) sounds for music were created (Thaddeus Cahill, 1897). After the invention of optical sound on film, a first composition of recorded sounds (without images) was cut and spliced (Ruttmann, 1930). Soon after the construction of the first general-purpose digital computers, they were used for artificial images and sounds (1950s).

Digitized recordings of what we can perceive with eyes and ears are no longer a requirement for creating in the digital realm something our eye and ear can perceive. The symbolic formulation in the computer, in the meta-medium, is sufficient. But only the conversion into a medium our senses can perceive allows control through experience.

If the separation of the ears from the eyes as a result of the loudspeaker was seemingly reversed through the use of sound in film and television, digital encoding seems to enable again a holistic experience. All information is present in the same medium, in the digital meta-medium.

The technology seems to suggest that the common foundation in digital encoding may also be seen as commonality on the content level. Words, numbers, data, graphics, images, sounds — all can be linked, at first only in the digital medium, but why not also for our senses? It can be deduced directly from the juxtaposition of eyes and ears in the previous chapter that the connection in the meta-medium, in which everything is available in bits, has nothing to do with an automatic structural or content-based connection between them, since the senses are complementary and are subject to different temporal conditions. One can observe in many multimedia artworks reduplications between the two sensory areas (a large dark blue circle moving over the screen from bottom left to top right, accompanied by a low ascending tone; or maps of the stars transformed into shorter and longer bleeps depending on their brightness), which do not acknowledge the conditions and different potentials for differentiation of each sense and pass over them in a superficial approach. Certainly this existed already in non-digital times. But the existence of a uniform meta-medium seduces one into speculations that do not take into account a crafts-based concept of the material that meets our senses to be perceived.

“Multimedia” is a buzzword everywhere (1993), used now for the digital connection of images, sounds, and texts, which can be sequenced on a timeline and which the end-user can control within a predefined branching network. Also in this scenario, eye and ear can only be meaningfully employed if the specific conditions of the different media of delivery to the senses are consciously incorporated. Up to now (1993), fragmentations of units of meaning are often in the foreground, where the point of departure is that the simultaneous presentation of images, sound, and text already implies a benefit.

Another current (1993) buzzword in media technology is telepresence. Our senses are connected with other people, independent of the physical location of each person connected. We have an exchange via eyes, ears, and touch through the electronically established connection. Brave new world? No, most definitely a coming reality. Not that this would be something fundamentally new. We have used the telephone for a long time already. The ability to listen most likely did not improve through using the telephone. Besides the exchange of information, the telephone serves mostly to reinforce emotions in the moment, the confirmation of the same. What has been gained through the telephone is a larger and more rapid throughput of information; the advantage does not lie in the area of listening in the sense of a creative restructuring.

One could take the position that all developments of media technology of the past one hundred years have already been played out in paradigmatic settings with which we have experience. From such a perspective, interesting insights may be obtained. The gain of media technology has been in the area of an expanded potential for the distribution of information. And then — if we would like to — we may discover that the increased quality in technical reproduction and production has not resulted in a greater clarity or potential for the differentiation of our senses or of the sense we create through our senses.

Hearing is based on the same, listening aims for the other.

With media technology and the resulting availability of almost arbitrary changes of temporal sequences, “the Same” and “the Other” may have changed positions in everyday life. The quantity of the Other turns into the Same, and the Same allegedly becomes part of our nature, a necessity we believe we cannot live without.

Listening cannot be a goal of media technology. A gentle feeling of fondness, an affective bowing to another person, lending my ear to what someone else says or plays — all these are properties not exploitable by market forces. Listening and looking closely are human qualities which require life-long fostering; they cannot be occupied, and they cannot be shed.

But on the other hand, listening and looking closely do very well have a place in the environments shaped by media technology — namely the place which we allow them to have or which we allow others to create. And this has not changed over the history of humanity.

And there may be artists who create something other with media technology — not as an anticipation of the new and coming Same, but as events that invite us to listen and to look closely.


Thank you to Argeo Ascani, Kaitlyn Zafonte, and Nathan Wooley for having worked through the translation with their expertise and their native English minds. All inconsistencies and potholes go to the account of the author.