Wednesday, 31 October 2012

Week 5 - The Human Ear

The human ear is rather important with regard to audio, as it's what allows us to sense sound at all. It comprises multiple parts, which I'll go into brief detail about in this post.



The ear itself is as complicated as you might expect for anything related to biology, but it can be broken down into three major areas: the outer ear, the middle ear and the inner ear.

Outer Ear

The outer ear is the part visible from the outside, with the most prominent feature being the pinna. The pinna acts as a funnel which focuses sound down into the auditory canal, essentially amplifying it. The folds of the pinna are not an accident: their shape filters the incoming sound, allowing the frequencies that humans are most concerned with (the ranges in which human speech lies) to pass into the ear canal more easily.

The main purpose of the ear canal is actually to provide defensive measures against infection, but it also acts as a passageway for sound to travel through in order to strike the eardrum. Much like a digital sound system, the brain can't comprehend sound directly from sound waves. Like digital systems, it needs sound to be converted into a form it can process, although unlike digital systems this form is still analogue. The eardrum, also known as the tympanic membrane, is the only part of the ear that deals with actual sound waves: it is vibrated by them. This converts the sound into a form that the middle ear then has to deal with.

The Middle Ear

The middle ear consists of a mechanism made up of the three smallest bones in the human body: the malleus (hammer), incus (anvil) and stapes (stirrup). The purpose of this mechanism, along with the tympanic membrane, is to convert sound waves into a mechanical form which is then passed on to the inner ear.

So why is this step even necessary? It's due to the nature of the inner ear. The inner ear is filled with fluid, while the middle and outer ears are usually filled with air. A sound travelling through air is largely reflected off the surface of something much denser, such as a liquid, so very little of the sound makes it into the liquid itself. The middle ear circumvents this by directly stimulating the inner ear with mechanical vibrations.

The Inner Ear

The inner ear consists of the cochlea, whose task it is to turn the mechanical vibrations from the middle ear into electrical signals that are sent to the brain to be processed. It also needs to transmit different frequencies along different parts of the auditory nerve so that the brain can quickly discern frequencies.

The basilar membrane, which resides inside the cochlea


The main component that achieves this effect is the basilar membrane. It is a non-uniform membrane which varies from base to apex: the base is narrower and stiffer than the apex. This allows vibrations of differing frequencies to vibrate the membrane in different ways, which in turn stimulates different arrays of tiny hair cells and therefore different nerve endings. The base vibrates more strongly at higher frequencies. At lower frequencies the whole membrane vibrates, but the peak of the vibration occurs towards the apex, allowing the actual frequency to be determined.

Below is a series of images roughly demonstrating the response of the membrane to different frequencies. The coloured shape represents the size of the vibrations at given points.




This is the physical limiter on the range of frequencies that we can hear. If the frequency is too high for any part of the membrane to vibrate, or too low for a clear peak to be determined, then we simply won't hear the sound.

Picking out multiple frequencies from a given wave is easy, as the brain can discern where each of the peaks resides on the membrane.

The electrical signals generated from this are then processed by the brain as sound.


http://library.thinkquest.org/05aug/00386/hearing/index.htm
http://openlearn.open.ac.uk/mod/oucontent/view.php?id=398672&section=2.1
http://openlearn.open.ac.uk/mod/oucontent/view.php?id=398672&section=3.3

Wednesday, 24 October 2012

Week 4 - Goldwave

This entry will be discussing the use of Goldwave and its various features.

To start off, here is an image of the UI with a sound file loaded into it.


Loading a file into GoldWave is as you would expect for most software in Windows: simply click File -> Open and navigate to the appropriate file. The waveform is displayed in the center in green.

The main display of the sound is little more than a simple chart with an X and a Y axis. In this case, the X axis represents time in hours, minutes and seconds, and the Y axis is a normalized scale ranging from -1.0 to 1.0. In this situation, normalization refers to mapping values from one scale onto another, in this case onto the range -1 to 1 (http://mathforum.org/library/drmath/view/60433.html). You can change what the X and Y axes measure by clicking Options -> Window. Changes that you make to the Y axis will not alter the waveform (even in the display), only the units that the amplitude is measured in.
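As a rough illustration of that kind of normalization (a minimal sketch of the idea, not how GoldWave works internally), signed 16-bit samples can be mapped onto the -1.0 to 1.0 range like this:

# Minimal sketch: map signed 16-bit PCM samples (-32768..32767) onto a
# normalized -1.0 to 1.0 scale by dividing by the largest possible magnitude.
def normalize_16bit(samples):
    return [s / 32768.0 for s in samples]

raw = [0, 16384, -16384, 32767, -32768]  # example sample values
print(normalize_16bit(raw))              # [0.0, 0.5, -0.5, ~0.99997, -1.0]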

This is the control window, which is used for playback purposes. The green play button plays the entire sound, starting from the current position of the cursor. The yellow play button plays the currently selected portion of the waveform from beginning to end, regardless of the position of the cursor.

This is simply showing what a selected portion looks like. To highlight a part of the sound, simply left-click and drag over the portion you wish to focus on. The yellow playback button will only concern itself with this portion of the file until you de-select it. Selecting is also necessary if you want any edits, from simple ones (cut, paste) to more complicated ones (altering amplitude, filters), to affect only a given part of the sound instead of the entire thing.

You can also save the selection through File -> Save Selection As.

This portion is for control and monitoring of the sound. The top bar is Volume, which is self-explanatory. The second is Balance, which determines how the volume is split between left and right; center is balanced. The third bar is Speed, which is again self-explanatory. These all alter the sound while it's playing back, without actually editing the sound itself.

The final bar is the VU meter (Volume Unit meter). This shows how loud the sound is at the present moment; the bar pushes along a line that lingers briefly before falling back to the left, indicating the peak loudness of the sound. Red indicators at the end of the bar will light up if a wave's amplitude is so large that clipping occurs (which means information is lost).


This selection of buttons in the main toolbar controls zoom levels. The left-most is "View All", which fits the entire sound into the available window space. The second button, "View Selection", zooms until the current selection fits within the available window space. "Previous Zoom" just returns to the last zoom setting. "Zoom In" and "Zoom Out" are self-explanatory. "Zoom 1:1" zooms in so that each pixel on the screen represents one sample. (http://hbkurs.gymnasium.sundsvall.se/HB_05_Le_mum1210/lj87togr/Projekt%20X/GoldWave/GoldWave.htm#Zoom 1:1).

Effects of Inverting and Reversing


The Effect menu has a number of functions in it, but this section will only concern itself with two of them. A particular part of the waveform has been selected for this purpose.

After applying the invert effect (Effect -> Invert), the wave now looks like this:

Visually, the wave is quite different. However, it doesn't actually sound any different. This is because all Invert does is flip the wave's polarity, shifting its phase by 180 degrees. The order of the fluctuations is still the same, so the resultant sound is completely unaltered.

After applying the reverse effect (Effect -> Reverse):


Now of course the sound is completely altered, as the fluctuations occur in the reverse order to the original wave.
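As a rough sketch of what these two effects do to the underlying samples (assuming the sound is just an array of normalized sample values; this is not GoldWave's actual implementation):

# Rough sketch of Invert and Reverse as operations on a list of samples.
def invert(samples):
    # Flip the polarity of every sample (the 180-degree phase flip).
    return [-s for s in samples]

def reverse(samples):
    # Play the samples back in the opposite order.
    return samples[::-1]

wave = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
print(invert(wave))   # polarity flipped; sounds identical to the original
print(reverse(wave))  # time-reversed; sounds completely different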

Attenuation

For this section I'll be editing the part of the waveform pictured below. The sample is the word "equality".


It was chosen because there is a very clear difference in the amplitudes of the various sections; the middle is far larger than either side.

The middle can be brought in line with the other parts through the Effect menu, namely Effect -> Volume -> Change Volume. So, selecting that part of the wave and then bringing up the menu:


Some relevant values:
  • -6dB reduces the sound to 50.12% of its original level
  • Increasing it by 3dB raises it to 141.25%
  • Increasing it by 6dB raises it to 199.53%
It's close enough to conclude that a 6dB rule applies here, where a sound's level doubles with every 6dB gain and halves with every 6dB reduction.
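Those figures match the standard conversion from decibels to an amplitude ratio (a quick check, assuming GoldWave's Change Volume values follow the usual 20 * log10 amplitude convention):

# Quick check of the percentages above, assuming the dB values are amplitude
# ratios following the usual 20 * log10 convention.
def db_to_amplitude_ratio(db):
    return 10 ** (db / 20.0)

for db in (-6, 3, 6):
    print(f"{db:+} dB -> {db_to_amplitude_ratio(db) * 100:.2f}% of the original")
# -6 dB -> 50.12%, +3 dB -> 141.25%, +6 dB -> 199.53%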

So let's reduce that part of the wave until it roughly matches the level of the rest of the waveform.


There is an audible difference, but it's only slight. This could be useful for making sure a given waveform has a consistent loudness level, which might be necessary when mixing it with other sounds.

So how would I go about bringing this sound to the maximum loudness possible while avoiding clipping? I could manually adjust the volume over and over until I get it right, or I could just go to Effect -> Volume -> Maximize Volume. This raises the volume of the selection until its highest peak hits the maximum level. The result is the loudest this sound can get before information would be lost to clipping.
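In terms of the samples, maximizing boils down to scaling everything by the same factor so that the largest peak just reaches full scale (a rough sketch of the idea, not GoldWave's code):

# Rough sketch of peak normalization: scale every sample by the same factor so
# the largest absolute value just reaches full scale (1.0) without clipping.
def maximize(samples):
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return samples[:]   # silence stays silent
    scale = 1.0 / peak
    return [s * scale for s in samples]

quiet = [0.1, -0.25, 0.4, -0.2]
print(maximize(quiet))      # [0.25, -0.625, 1.0, -0.5]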

The result:





Wednesday, 17 October 2012

Week 3 - Storing Sound Digitally



This entry will discuss how to store sound waves digitally.

Storing a sound on a computer involves multiple steps. First, the sound has to be transformed into an electrical signal, which can be done with a microphone. Next, this analog electrical signal needs to be transformed into something that a computer can store.

The below image is just a simple representation of how sound is stored in a digital form.


The reason that such a conversion needs to happen is that an analog signal is continuous. A computer can only store binary information, which cannot represent continuous data (you would need an infinite number of bits to do such a thing). So in order to store it, a continuous signal must be turned into a discontinuous form that a computer can store: a digital form.

This is achieved by essentially taking snapshots of the sound wave as represented by the analog electrical signal. There are a number of factors that determine the precision of those snapshots, however, which this section will now go over.

Sample Rate

The sample rate is the number of samples of an analog signal captured per second, expressed in Hz. The greater the sample rate, the more precisely the wave is captured. A higher sample rate also requires more space to store each second of audio; the number of bits per unit of time (usually per second) is known as the bit rate. Sample rate isn't the only thing that affects bit rate, but I will go over that later.


In this example, the analog wave (blue) is sampled at a regular rate. Each sample is marked with a red square, representing a value that is stored digitally. These values are the only thing that gets stored in the computer; anything that takes place between the points is lost. The green lines represent a rough figure of the wave that would be reconstructed from the samples taken. As can be seen, a fair amount of precision is lost: for example, the deepest part of the first valley is cut off. Notice how the samples capture the slightly longer wave more accurately.


In this second example, the sample rate is doubled. While some information is still lost, the resultant wave in green still resembles the original wave much more closely than at the previous sample rate. The cost of course is twice the number of dots, which means twice the amount of data required to store it.


Finally, this example is half the sample rate of the first example. The first half of the sound is almost completely lost, but the longer wave is captured at least in part.

As can be seen, lower-frequency waves can be captured with a lower sample rate. Lower sample rates are, however, completely incapable of representing higher frequencies, so capturing a wider range of frequencies requires a higher sample rate. There is in fact a minimum sample rate required to capture a given frequency accurately.

Sample Rate (kHz) Maximum Frequency (kHz)
8 3.6
11.025 5
22.05 10
32 14.5
44 20
48 21.8
64 29.1
88.2 40
96 43.6

Source: http://wiki.audacityteam.org/wiki/Sample_Rates

The above table shows a trend: the sample rate is always just over twice the maximum frequency it can capture accurately. Human hearing has a range of roughly 20 Hz to 20 kHz, which means the minimum sample rate required to capture the sounds humans can hear is around 44 kHz. Anything higher than that gives an almost indistinguishable increase in quality for humans; anything lower and the difference will be noticeable.
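As a quick sketch of that relationship (using the rule of thumb that the usable maximum frequency sits a little under half the sample rate, per the table above):

# Rule of thumb from the table: the highest frequency a given sample rate can
# capture is a bit under half that rate (the Nyquist limit).
def approx_max_frequency(sample_rate_hz):
    return sample_rate_hz / 2.0

for rate in (8000, 22050, 44100, 96000):
    print(f"{rate} Hz sampling -> at most ~{approx_max_frequency(rate):.0f} Hz captured")
# 44100 Hz sampling comfortably covers the ~20000 Hz upper limit of human hearing.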

Quantization

The amount of data stored per sample is the other factor that affects precision, as well as the size of the resulting file. This is also known as bit depth, word size or resolution.

The value of a sample at any given point is a representation of the level of the analog signal it captured. This is usually an electrical signal with some range, say for example from -5 to 5 Volts. An analog signal is continuous, however, so the actual range of possible values it could take is infinite. As a computer cannot store an infinite range of values, each value has to be rounded to something the computer can store. As such, the range of -5 to 5 Volts can be split up into a number of steps. The number of steps that can be represented per sample depends on the number of bits you're willing to dedicate to each sample. The higher the number of bits, the greater the precision, and of course the larger the size of each sample in bits.


This image illustrates the lack of precision that can be caused by low quantization. Here, each value that can be stored in a sample is listed along the left side of the chart, and the red line represents the samples. The actual analog value of the third red dot could be something like 3.4092, but this can't be stored in the illustrated system, so it is instead rounded down to 3. This would audibly alter the sound, so to get a more accurate capture a larger number of steps would be required.

The number of steps it is possible to store is directly related to the number of bits per sample.

8-bit quantization can represent 256 voltage levels.
16-bit quantization can represent 65,536 voltage levels.
24-bit quantization can represent 16,777,216 voltage levels.
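The number of levels is just 2 raised to the bit depth, and quantizing amounts to rounding each incoming value to the nearest storable level (a small sketch of the idea; the illustration above rounds down instead, but the principle is the same):

# Small sketch: the number of levels for a given bit depth, and rounding an
# analog value (here assumed to lie in a -5..5 V range) to the nearest level.
def levels(bits):
    return 2 ** bits

def quantize(value, bits, lo=-5.0, hi=5.0):
    steps = levels(bits) - 1
    index = round((value - lo) / (hi - lo) * steps)  # nearest step index
    return lo + index * (hi - lo) / steps            # back to a voltage

print(levels(8), levels(16), levels(24))  # 256 65536 16777216
print(quantize(3.4092, 3))                # coarse 3-bit quantization: ~3.571
print(quantize(3.4092, 16))               # 16-bit quantization: ~3.4092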

Bit Rate

The bit rate is just the number of bits per second that a given piece of digital audio uses. It's worked out from the sample rate and the quantization, as well as the number of channels (for example, stereo sound has 2 channels).


Bit Rate = sample rate * bit depth * channels

A 44.1kHz stereo sound with 16-bit quantization is thus:

44100 * 16 * 2 = 1,411,200 bits per second.

The file size of a piece of audio can be worked out from this if you also add in the duration.

Filesize = sample rate * bit depth * channels * seconds

If the above example were 3 minutes long, then:

Filesize = 1,411,200 * 180
Filesize = 254,016,000 bits, or 31,752,000 bytes, which is around 30MB.
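The same arithmetic as a small sketch:

# Uncompressed audio size from sample rate, bit depth, channels and duration.
def bit_rate(sample_rate, bit_depth, channels):
    return sample_rate * bit_depth * channels            # bits per second

def file_size_bytes(sample_rate, bit_depth, channels, seconds):
    return bit_rate(sample_rate, bit_depth, channels) * seconds // 8

print(bit_rate(44100, 16, 2))              # 1411200 bits per second
print(file_size_bytes(44100, 16, 2, 180))  # 31752000 bytes, around 30MB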

Of course this is only valid for uncompressed audio. The higher the level of compression, the lower the bitrate but potentially also the lower the quality of the sound upon playback.

Source: http://www.dolphinmusic.co.uk/article/120-what-does-the-bit-depth-and-sample-rate-refer-to-.html

Thursday, 11 October 2012

Week 2 - Wavespeed, Interference and Loudness

This entry concerns itself with the findings of researching waves: in particular, wave properties, how waves can interact with each other, and how they are then measured and interpreted.

Calculating Waves


Most arithmetic involved in waves concerns itself with 3 main properties.

Wavelength, represented by λ (lambda)
Speed, represented by c
Frequency, represented by f

Any property can be worked out by knowing the values for the two other properties via the formula:

λ = c / f


This means that you can calculate the speed of a wave by multiplying the wavelength by the frequency; however, this can lead to the incorrect assumption that altering the frequency affects the speed of the wave.

Assume a 1kHz sound wave is travelling through air at 20°C at sea level, which carries sound at 343m/s. The wavelength of a 1kHz sound wave under these conditions is 34.3cm, or 0.343m.


Speed = Wavelength * Frequency
Speed = 0.343 * 1000
Speed = 343m/s

Assume the above conditions, but for a 10Hz wave. The wavelength of a 10Hz sound wave is 34.3m

Speed = 34.3 * 10
Speed = 343m/s

As you cannot change the frequency without also changing the wavelength, you cannot alter the speed of a wave by changing its frequency.
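A quick numerical check of the same point (assuming 343m/s in air):

# Quick check: in a given medium the speed stays fixed, so wavelength and
# frequency simply trade off against each other (speed = wavelength * frequency).
SPEED_IN_AIR = 343.0  # m/s at 20 degrees C, sea level

for frequency in (10, 1000, 20000):
    wavelength = SPEED_IN_AIR / frequency
    print(f"{frequency} Hz -> wavelength {wavelength:.3f} m, "
          f"speed check: {wavelength * frequency:.0f} m/s")
# Every line reports the same 343 m/s, whatever the frequency.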

The speed of sound itself is dependent on the material through which it is travelling. Here is a table of figures, taken from http://www.engineeringtoolbox.com/sound-speed-solids-d_713.html


Medium    Velocity (m/s)    Velocity (ft/s)
Aluminum 6420 21063
Brass 3475 11400
Brick 4176 13700
Concrete 3200 - 3600 10500 - 11800
Copper 3901 12800
Cork 366 - 518 1200 - 1700
Diamond 12000 39400
Glass 3962 13000
Glass, Pyrex 5640 18500
Gold 3240 10630
Hardwood 3962 13000
Iron 5130 16830
Lead 1158 3800
Lucite 2680 8790
Rubber 40 - 150 130 - 492
Steel 6100 20000
Water 1433 4700
Wood (hard) 3960 13000
Wood 3300 - 3600 10820 - 11810

Mathematically, what effect does this have on the wavelength of a given frequency? Let's try the 1kHz sound wave again, this time travelling through a typical hardwood.

Wavelength =  3960 / 1000
Wavelength = 3.96m

For the same frequency, then, the wavelength is longer if the medium through which the sound is travelling propagates it more quickly. The inverse is also true: if a wave is travelling through a medium that propagates it more slowly, the wavelength for the same frequency is shorter.

The following is a link to an experiment which supports this notion: http://www.ap.smu.ca/demonstrations/index.php?option=com_content&view=article&id=147&Itemid=85

Interference


When two waves propagating through the same medium pass through each other, interference occurs. The effect of the interference depends on which parts of each wave's phase are meeting in a given area. All of the following cases assume the waves are of the same frequency and amplitude.

If the waves are exactly in phase with each other, the amplitude of the resultant wave is doubled.
The blue wave is two sound waves of precisely the same amplitude, wavelength and frequency occupying the same space and in the same phase. The green wave is the resultant wave from the interference.

If one wave meets another halfway through its phase, the waves effectively cancel each other out.



This is because the resultant wave is the sum of the amplitudes of the waves superimposed on each other during interference. The interference is said to be constructive if the amplitude of the resultant wave is higher than that of the interfering waves, and destructive if the resultant wave's amplitude is lower.
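A minimal sketch of that superposition, summing two identical sine waves once in phase and once half a cycle apart:

import math

# Minimal sketch of superposition: add two identical sine waves, first in
# phase (constructive) and then half a cycle apart (destructive).
def combined_peak(phase_offset):
    peak = 0.0
    for i in range(1000):
        t = i / 1000.0
        a = math.sin(2 * math.pi * t)
        b = math.sin(2 * math.pi * t + phase_offset)
        peak = max(peak, abs(a + b))
    return peak

print(combined_peak(0.0))      # ~2.0: amplitudes add, the sound is louder
print(combined_peak(math.pi))  # ~0.0: amplitudes cancel, effectively silence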

As the type of interference depends on the particular points in their phases at which the waves happen to meet, take a scenario where you have two speakers emitting the same sound. If you walk away from the speakers you will notice the sound get louder where the peaks of the waves meet, and quieter as you move into an area where peaks meet valleys. If you continue to move, you'll enter another area where the phases of the waves line up and the sound will be louder again. I confirmed the effect for myself by moving around the lab while a 1kHz tone was played through two speakers.

One bizarre effect of this phenomenon is that if you are standing where the phases are out of line with each other and thus can't hear anything, covering one speaker removes the interference and actually causes the sound to get louder.

Amplitude and Loudness


Amplitude, typically the vertical scale when a sound wave is illustrated, is the magnitude of the change in a particle's position, at least when talking about sound. It can be measured as either peak amplitude (zero to the wave's peak) or peak-to-peak (crest to trough).

A higher amplitude is associated with a louder sound, but this isn't entirely straightforward, as this section will explain. Amplitude is a measure of the amount of force applied over an area; it is related to intensity, and both are related to the power of a sound.

Amplitude can be measured in Newtons per square meter (N/m2)
Intensity can be measured in Watts per square meter (W/m2)
Power can be measured in Watts


Intensity is related to the distance from the source via the inverse square law. As a wave spreads out from its source, its energy is spread over a wider area, so its watts per square meter will be lower. A more powerful initial sound will travel further than a less powerful one before becoming inaudible.
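A small sketch of that relationship, assuming an idealised point source spreading its power evenly over a sphere:

import math

# Idealised point source: power spreads over a sphere of area 4*pi*r^2,
# so intensity falls with the square of the distance.
def intensity(power_watts, distance_m):
    return power_watts / (4 * math.pi * distance_m ** 2)

for r in (1, 2, 4, 8):
    print(f"{r} m: {intensity(1.0, r):.4f} W/m2")
# Doubling the distance cuts the intensity to a quarter each time.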

Loudness is a matter of perception. Firstly, doubling the power of a wave isn't nearly enough to double its perceived volume, given the sheer range of powers that the human ear can detect. The threshold of pain requires a sound that is around 1 billion times more powerful than a whisper, and a whisper itself is about 1,000 times more powerful than the threshold of human hearing.

The below table was extracted from http://www.sengpielaudio.com/TableOfSoundPressureLevels.htm, in order to support the above paragraphs.

Table of sound levels L (loudness) and corresponding sound pressure and sound intensity

Sound source (noise), examples with distance    Sound pressure level Lp (dB SPL)    Sound pressure p (N/m2 = Pa, a sound field quantity)    Sound intensity I (W/m2, a sound energy quantity)
Jet aircraft, 50 m away 140    200    100
Threshold of pain 130    63.2    10
Threshold of discomfort 120    20    1
Chainsaw, 1 m distance 110    6.3    0.1
Disco, 1 m from speaker 100    2    0.01
Diesel truck, 10 m away   90    0.63    0.001
Kerbside of busy road, 5 m   80    0.2    0.000 1
Vacuum cleaner, distance 1 m   70    0.063    0.000 01
Conversational speech, 1 m   60    0.02    0.000 001
Average home   50    0.006 3    0.000 000 1
Quiet library   40    0.002    0.000 000 01
Quiet bedroom at night   30    0.000 63    0.000 000 001
Background in TV studio   20    0.000 2    0.000 000 000 1
Rustling leaves in the distance   10    0.000 063    0.000 000 000 01
Threshold of hearing     0    0.000 02    0.000 000 000 001

Dealing with such a wide range of figures isn't easy for the human mind to cope with, so instead of expressing sound levels as raw intensities they are expressed in decibels (sound pressure level), which is a logarithmic scale. A logarithm is just a way of expressing large numbers by writing how many powers of a given base a number is. For example, the base-10 logarithm of 1,000 is 3, and a power level of 1,000,000,000 can be expressed as 9.

Decibels are a measurement of the ratio between two sounds, and can be worked out with the following formula:


dB = 10 * log10(Power1 / Power2)

A sound of power 1000W / 1W produces:

dB = 10 * log10(1000 / 1)
dB = 10 * log10(1000)
dB = 10 * 3 = 30

If we double the power..

dB = 10 * log10(2000 / 1)
dB = 33.01 rounded

http://www.indiana.edu/~emusic/acoustics/amplitude.htm

So doubling the power raises the sound by 3dB. Likewise, a 6dB rise corresponds to 4 times the power. As it happens, 6dB appears to be the amount you need to increase a sound by in order to double its perceived volume (http://www.practicalpc.co.uk/computing/sound/dBeasy.htm).
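The same arithmetic as a quick sketch:

import math

# Decibels as a ratio of two power levels: dB = 10 * log10(Power1 / Power2).
def decibels(power, reference=1.0):
    return 10 * math.log10(power / reference)

print(decibels(1000))  # 30.0  -> 1000x the reference power
print(decibels(2000))  # ~33.0 -> doubling the power adds about 3dB
print(decibels(4000))  # ~36.0 -> quadrupling the power adds about 6dB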

There are two other factors that affect perceived loudness, however: time and frequency.

The human ear averages a sound's level over a 600-1000ms window. Anything longer than that will not appear to get any louder, but sounds shorter than 600ms will appear to be quieter, even though their dB level is the same.

The other factor is frequency. Human ears can, on average, detect sounds between 20Hz and 20,000Hz. They do not, however, hear all of these frequencies at equal perceived loudness. Rather than sounds suddenly cutting to silence when their frequency wanders off either end of the scale, they appear to fade out.

This can be demonstrated on an equal-loudness graph, where the sound pressure level (dB) required to achieve equal perceived loudness across a range of frequencies is plotted. The lower the dB, the more sensitive the human ear is to that frequency.


The chart indicates that human ears are most sensitive to frequencies in the 2-4kHz range.


Decibel calculator for power levels: http://www.radio-electronics.com/info/formulae/decibels/dB-decibel-calculator.php