Audio Visual Image Blog: 2012

Tuesday, 11 December 2012

Week 10 - The Sad Duck

Well, swan, really. Maybe that's why it's sad? It's the ugly duckling, er, duck. This is my video for the final lab of the module, where I had to edit some raw footage together to convey a narrative and some emotion. As per the specification, the clip is exactly one minute long.

On this project I used Windows Movie Maker to edit the video and GoldWave to edit the sound.

For this I went for a sadder sort of video, with loneliness as the main theme. The idea came when I saw the opening shot of the movie, with the cameraman staring at this lonely swan. Initially I wanted to make something dramatic and dark about the swan, but didn't believe that would pan out. Using that clip as the starting point, I looked through the rest of the raw footage for other shots of this solitary swan. It's rather fitting that in one shot the swan lowers its head as it steps forward; such a depressed little fowl.

The pitch editor. This can be found by opening the Effects menu and selecting Pitch

Now, none of the music as supplied really fit this theme, so I needed to edit it. I listened through the music samples to pick out any part that I thought would sound good after shifting the pitch down a few semitones. I think I found a rather fitting piece, though I can't help but feel I could have made it fit the video better with more editing. In particular, the music seems to wander off at the end of the clip, but I couldn't work out a good way of cutting it without breaking the rhythm.
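Shifting a piece down by a number of semitones multiplies every frequency in it by a fixed ratio. Here is a minimal sketch of that arithmetic, assuming standard 12-tone equal temperament. The function name `semitone_ratio` is my own and is not a GoldWave setting.

```python
def semitone_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift of `semitones` in 12-tone equal temperament."""
    return 2 ** (semitones / 12)


# Shifting down three semitones, an illustrative choice for "a few"
ratio = semitone_ratio(-3)
print(f"ratio: {ratio:.4f}")              # ratio: 0.8409
print(f"440 Hz -> {440 * ratio:.1f} Hz")  # 440 Hz -> 370.0 Hz
```

Twelve semitones down gives a ratio of exactly 0.5, which is one octave lower.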

The final shot was a complete coincidence. While looking through the footage I found a clip of ducks quacking away that falls into dead silence as soon as the swan comes into shot, which was too perfect to miss. A fitting ending, I thought: the ducks all swim away from the swan, and the poor thing is left so lonely. This is why I left the original sound in for this portion, and I think it fits.

Conclusion

What can be taken away from this exercise is that from footage like this, where there may not really be a whole lot going on, narrative and meaning can be created through selective editing and with the right audio track.

Sunday, 9 December 2012

Week 9 - Convolution Matrices

This entry will be looking at how to manipulate images with generic filters, using what is known as a convolution matrix.

Convolution Matrix

A convolution matrix is a grid of numeric values. When a convolution matrix is applied to a given pixel, the result depends on the values in the matrix. In essence, this is all a matrix looks like:

The center is a different color because it corresponds to the current pixel the matrix is being applied to; the surrounding cells correspond to that pixel's neighbours. In this case the matrix loosely means "take 100% of the value of the center pixel and apply it to the current pixel", so this matrix would do absolutely nothing if applied to an image.

Consider a bitmap as a series of values on a grid, with each value representing a pixel. For the purposes of demonstrating what a convolution matrix does I'll use a 5x5 grid of pixels which will be grayscale for simplicity.

For the purposes of this entry, 10 will be the maximum value (white) and 0 the minimum (black). This is for simplicity; a real bitmap uses considerably more values to represent grayscale (typically 256 levels, 0 to 255, or more).

Blurring

Consider the following example bitmap:

Now, let's consider we're to apply the following matrix to it:

This matrix translates roughly to "take the values of the center pixel and all pixels in the immediate vicinity and apply them to the center pixel". So let's apply it to the above bitmap, starting with the top-left pixel, and see what happens.

Now, this pixel sits in the corner, so it only has 3 neighbours and the entire matrix can't be applied to it. However, we can still take the below, below-right and right pixels along with the center pixel itself.

center = 0

below = 10

below_right = 0

right = 10

new_pixel = center + below + below_right + right

new_pixel = 0 + 10 + 0 + 10

new_pixel = 20

Well, that isn't right. The limit is 10, so the pixel would just cap out at 10, or white in this case. Apply this to every pixel in the image and the result would be the same: they'd all turn white. There is an extra step we have to take here, and that is to divide by the number of entries in the sum (more generally, by the sum of the matrix values used), in other words to normalise it.

new_pixel = (center + below + below_right + right) / 4

new_pixel = 20 / 4

new_pixel = 5

It turned grey! This is because the white pixels essentially bled into it, as their values influenced the center pixel when the matrix was applied to it. If we then apply it to the next pixel to the right...

new_pixel = (center + left + below_left + below + below_right + right) / 6

new_pixel = (10 + 0 + 10 + 0 + 10 + 10) / 6

new_pixel = 40 / 6

new_pixel ≈ 6.67

Note that left uses the original value of 0, not the 5 we just worked out: the matrix always reads from the original image, and the new values go into a separate copy. Even so, the black pixel has bled a little into the white in return! Of course, this bitmap can only be made up of whole numbers between 0 and 10, so in this instance we would round that up to 7. Continuing on, we apply this to the rest of the image.

The above was worked out manually and it doesn't look entirely right to me. I'm going to chalk that up to human error, either in the calculation of a given pixel (which would throw off the rest, especially if an already-blurred value were fed into a later calculation instead of the original) or possibly a rounding error on my part. Regardless, I can check by applying the above matrix to an actual 5x5 bitmap with nothing but a diagonal black line, using GIMP. This gives the following result, scaled up for visibility:

Much tidier than my attempt, although, like mine, the corner pixels ended up the darkest. The rough result is the same, however: the values of the pixels have bled into their neighbouring pixels, resulting in a blurring effect.
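Here is a minimal Python sketch of the same blur. It assumes that neighbours falling outside the image are simply skipped; GIMP offers its own border options, which this doesn't try to reproduce. The names `convolve` and `bitmap` are my own. The sketch reads every neighbour from the original image, so the second pixel comes out at 40 / 6 ≈ 6.67, because its left neighbour still counts as its original value of 0.

```python
def convolve(image, kernel, normalise=True):
    """Apply a 3x3 convolution matrix to a grayscale image given as a list of rows.

    Every output pixel is computed from the ORIGINAL image, so new values never feed
    into later calculations. Neighbours outside the image are skipped, and with
    normalise=True the sum is divided by the total weight of the entries actually used.
    The matrix is applied as written (not flipped), which makes no difference for
    symmetric matrices like the blur below.
    """
    height, width = len(image), len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            weight_sum = 0.0
            for ky in range(3):
                for kx in range(3):
                    ny, nx = y + ky - 1, x + kx - 1
                    if 0 <= ny < height and 0 <= nx < width:
                        total += kernel[ky][kx] * image[ny][nx]
                        weight_sum += kernel[ky][kx]
            if normalise and weight_sum != 0:
                total /= weight_sum
            result[y][x] = total
    return result


# 5x5 white (10) bitmap with a black (0) diagonal line starting at the top-left corner
bitmap = [[0 if x == y else 10 for x in range(5)] for y in range(5)]
blur = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

print(convolve(bitmap, blur)[0][0])                   # 5.0
print(round(convolve(bitmap, blur)[0][1], 2))         # 6.67
print(convolve(bitmap, blur, normalise=False)[0][0])  # 20.0, well past the limit of 10
```

Passing the "do nothing" matrix (a 1 in the center, 0 everywhere else) returns the bitmap unchanged.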

Let's apply this to an actual image. For this we'll use the example image given for the lab.

Once we apply the matrix used in this section to it:

Sure enough, the image is blurred. Re-applying this filter multiple times would increase the blurring effect.

Convolution Matrix Filter in GIMP

The convolution matrix can be found in the filter menu under "Generic Filters". The path to it is pictured in the below screenshot.

Clicking on that will take you to the following screen.

Note that I've highlighted the Normalise checkbox in green. There is a reason for this: you need to make sure that it's checked. If you don't, observe what happens with the earlier blur matrix.

This proves what I said earlier: failing to normalise causes the pixels to turn white.

Edge Detect

So we blurred the bitmap earlier; let's try to get that edge back. This can sort of be achieved with the following matrix:

Here we have negative values. Each pixel is multiplied by its matrix value before being added to the sum, so a value of -1 means that pixel is subtracted (a pixel of 255 would contribute -255 to the sum). Applying this to our earlier result in the actual bitmap gives us the following result:

The edge is a little better defined. Repeating the filter again gives the following result:

This is as close to the original edge as this filter will get us. It is thicker than the original edge, but it is inarguably an edge. This filter is well suited to finding a diagonal edge like this one, however. If we use a slightly different edge-detecting matrix:

As above, this was applied twice for consistency. Interestingly, in this case it has actually flipped the line. Finally, let's use a convolution matrix better suited to straight lines and see what it does.

Nothing like the original edge.

The conclusion to take away from this section would appear to be that you will need to figure out which edge-detecting matrix will produce the best results for your image.
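The matrices used above were only shown in screenshots. As a stand-in, this sketch uses the standard Sobel kernels to show why the choice of matrix matters. Negative weights subtract their pixels, and the result is clamped to the 0 to 10 scale used in this entry. There is no normalising, because each kernel's weights add up to 0. The test image and the function names are hypothetical.

```python
def weighted_sum(image, kernel, y, x):
    """Weighted sum of the 3x3 neighbourhood around (y, x); negative weights subtract."""
    return sum(
        kernel[ky][kx] * image[y + ky - 1][x + kx - 1]
        for ky in range(3)
        for kx in range(3)
    )


def clamp(value, low=0, high=10):
    """Keep a result inside the 0 (black) to 10 (white) range."""
    return max(low, min(high, value))


# Hypothetical 5x5 bitmap: black (0) in the left two columns, white (10) on the right
image = [[0, 0, 10, 10, 10] for _ in range(5)]

vertical_edges = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # Sobel, x direction
horizontal_edges = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # Sobel, y direction

# Only interior pixels of the middle row, so every neighbour exists
for name, kernel in [("vertical", vertical_edges), ("horizontal", horizontal_edges)]:
    row = [clamp(weighted_sum(image, kernel, 2, x)) for x in range(1, 4)]
    print(name, row)
# vertical [10, 10, 0]
# horizontal [0, 0, 0]
```

The kernel aimed at vertical edges lights up a two-pixel-wide band along the boundary, while the horizontal one finds nothing at all in the same image.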

Conclusion

Digital images can be manipulated in a number of ways through mathematics. Blurring, sharpening, edge detecting and embossing can be achieved by applying a convolution matrix to each pixel.

Image processing programming:

http://archive.gamedev.net/archive/reference/programming/features/imageproc/page2.html

Gimp Manual on the convolution matrix:

http://gimp.open-source-solution.org/manual/plug-in-convmatrix.html

Week 8 - Speechtone.wav

This week's lab required us to edit a waveform to improve it as best we can. This entry talks about the process I took to try to improve its quality.

Let's first look at the unedited waveform:

You don't really need to play the sound to realise something isn't right there. The bulk of the waveform sits at a consistent amplitude, with the odd spike. Sounds don't normally look like this, not unless there's a constant tone.

Removing the tone

In this case, of course, there is: a continuous tone with faint hints of what might be speech buried in it. I had no idea what the frequency of the tone was, so my first approach was to see how it looks in the Spectrum Filter.

There is a clear spike between 400Hz and 500Hz. As in the previous exercise, I created anchors in the Spectrum Filter and pulled the level down where I believed the pure tone was.
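GoldWave's Spectrum Filter lets you draw the cut by hand. For comparison, here is a minimal sketch of a different technique: a standard second-order notch filter using the coefficients from Robert Bristow-Johnson's Audio EQ Cookbook. The 450Hz center is an assumption, since the post only places the spike between 400Hz and 500Hz. `notch` and `rms` are hypothetical helper names.

```python
import math


def notch(samples, sample_rate, freq_hz, q=5.0):
    """Second-order IIR notch centred on freq_hz (Audio EQ Cookbook coefficients)."""
    w0 = 2 * math.pi * freq_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    # Coefficients divided through by a0 so the difference equation needs no division
    b0, b1, b2 = 1 / a0, -2 * math.cos(w0) / a0, 1 / a0
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


rate = 8000
hum = [math.sin(2 * math.pi * 450 * n / rate) for n in range(rate)]
other = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate)]
# Skip the first half second so the filter's start-up transient doesn't count
print("450 Hz after notch: ", round(rms(notch(hum, rate, 450)[4000:]), 4))
print("1000 Hz after notch:", round(rms(notch(other, rate, 450)[4000:]), 3))
```

The 450Hz tone all but disappears, while the 1000Hz tone passes almost untouched (an unfiltered unit sine has an RMS of about 0.707). A higher `q` makes the notch narrower, so less of the nearby speech is lost.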

After some tweaking with the positions of the anchors I can no longer hear the pure tone on the preview. After applying the filter the waveform now appears as follows:

Looks much better, at least from this distance. Playback sounds a little low quality, however, not to mention quiet, so let's raise the volume.

Removing Noise

The spectrogram of the result shows that there's a fair degree of noise in the sound. To reduce it, I used the equaliser filter to let through the frequencies most common in human speech (roughly 300 to 3000Hz) while cutting the rest.
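The equaliser settings themselves were only shown in a screenshot. To illustrate the idea of keeping roughly 300 to 3000Hz, here is a crude band-pass sketch built from a one-pole high-pass followed by a one-pole low-pass. This is a swapped-in technique, and its 6dB-per-octave slopes are far gentler than an equaliser's bands. The function and variable names are hypothetical.

```python
import math


def band_pass(samples, sample_rate, low_hz=300.0, high_hz=3000.0):
    """One-pole high-pass at low_hz followed by a one-pole low-pass at high_hz."""
    dt = 1.0 / sample_rate
    rc_high = 1.0 / (2 * math.pi * low_hz)
    rc_low = 1.0 / (2 * math.pi * high_hz)
    hp_coeff = rc_high / (rc_high + dt)
    lp_coeff = dt / (rc_low + dt)
    out = []
    prev_x = prev_hp = lp = 0.0
    for x in samples:
        hp = hp_coeff * (prev_hp + x - prev_x)  # high-pass: keeps changes, drops slow rumble
        prev_x, prev_hp = x, hp
        lp += lp_coeff * (hp - lp)              # low-pass: smooths away fast hiss
        out.append(lp)
    return out


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def gain_at(freq_hz, sample_rate=16000):
    """Output RMS over input RMS for a one-second sine, ignoring the first half second."""
    tone = [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(sample_rate)]
    filtered = band_pass(tone, sample_rate)
    half = sample_rate // 2
    return rms(filtered[half:]) / rms(tone[half:])


for freq in (50, 1000, 7000):
    print(f"{freq} Hz: gain {gain_at(freq):.2f}")
```

A 1000Hz tone inside the speech band keeps most of its level, while 50Hz rumble and 7000Hz hiss are both pulled down noticeably.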

The equaliser has certainly reduced some of the noise, as can be seen by comparing the spectrograms. However, it also made the speech sound a little tinny, so I went with slightly different settings to achieve what I think is a higher quality sound.

Reverb - Making it sound like it's in a church hall

Making the speech sound like it was recorded in a church hall is relatively easy in GoldWave. All you do is apply a reverb through Effects -> Reverb in the top menu.

I elected to use the above settings, starting with the Concert Hall preset and reducing the reverb time slightly, as a concert hall tends to be larger than a church hall.
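The post doesn't describe GoldWave's reverb algorithm, and real reverbs combine many delays and filters. As a deliberately minimal stand-in, a single feedback comb filter shows what the reverb time controls. The feedback gain is chosen so the echoes die away by 60dB (to 1/1000 of the original amplitude) over the reverb time. The names and default values are hypothetical.

```python
def comb_reverb(samples, sample_rate, delay_s=0.05, reverb_time_s=1.0):
    """Feedback comb filter y[n] = x[n] + g * y[n - D].

    g is picked so that after reverb_time_s seconds the echoes have fallen by 60 dB.
    The output is extended by reverb_time_s seconds so the tail isn't cut off.
    """
    delay = round(delay_s * sample_rate)
    g = 10 ** (-3 * (delay / sample_rate) / reverb_time_s)
    out = list(samples) + [0.0] * round(reverb_time_s * sample_rate)
    for n in range(delay, len(out)):
        out[n] += g * out[n - delay]  # out[n - delay] already holds the output y[n - D]
    return out


impulse = [1.0]  # a single click
long_tail = comb_reverb(impulse, 8000, reverb_time_s=1.0)
short_tail = comb_reverb(impulse, 8000, reverb_time_s=0.5)
print(long_tail[4000], short_tail[4000])  # half a second in: the shorter reverb is far quieter
```

Shortening the reverb time lowers the feedback gain, so each echo is quieter and the tail dies away sooner.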

Monday, 26 November 2012

Week 7 - Spectrum Filtering

This entry will look at how a sound can be manipulated by altering its spectrum.

First, let's create a sine wave. This can be done through the expression evaluator as done in the previous blog entry.

In the image above, the sine wave is displayed at a very close zoom level, close enough to make out the individual samples quite clearly. Although the waveform is jagged, it's very consistent, and there are enough samples to faithfully recreate a sine wave at 1kHz. This will be important later, as it's the waveform we intend to recover from noise.
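What the expression evaluator does here can be sketched in a few lines of Python: evaluate sin(2πft) at each sample time. The 44.1kHz sample rate and the 10ms duration are assumptions, not settings taken from the post.

```python
import math


def sine_wave(freq_hz, sample_rate=44100, duration_s=0.01, amplitude=1.0):
    """Samples of amplitude * sin(2*pi*f*t), taken at t = n / sample_rate."""
    n_samples = int(sample_rate * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


samples = sine_wave(1000)
print(len(samples))   # 441 samples in 10 ms
print(44100 / 1000)   # 44.1 samples per cycle of the 1kHz wave
```

About 44 samples per cycle is what makes the jagged plot still trace out a recognisable sine curve.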

To that end, we'll need some noise! You can generate white noise in GoldWave by once again using the expression evaluator, only this time choosing "white noise" from the presets. It will add an expression like "rand(2)-1" to the expression field.
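Assuming rand(2) returns a uniform random value between 0 and 2, the expression rand(2)-1 produces uniform values between -1 and 1. Here is the same idea in Python. The function name is hypothetical, and the seed only makes runs repeatable.

```python
import random


def white_noise(n_samples, seed=None):
    """Uniform white noise in [-1, 1), mirroring the idea behind rand(2)-1."""
    rng = random.Random(seed)
    return [rng.random() * 2 - 1 for _ in range(n_samples)]


noise = white_noise(44100, seed=1)
print(min(noise) >= -1.0, max(noise) < 1.0)  # True True
print(round(sum(noise) / len(noise), 2))     # average close to 0
```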

Pictured below is the white noise, as well as how it looks in the Spectrum Filter.

The X axis of the graph is logarithmic, of course, so what's seen is a roughly even spread across frequencies for the generated wave.

Mixing

Now to mix this with the 1kHz waveform we generated earlier. We do this by going to Edit -> Mix.

Note that you'll need two wave forms open at once in order for Mix to be selectable. The result of mixing the waves, as well as the resultant view in the Spectrum Filter can be seen below.
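At its simplest, mixing adds the two waves together sample by sample. This is a minimal sketch, assuming that anything beyond the -1.0 to 1.0 range clips. GoldWave's Mix dialog has its own options, which this doesn't reproduce, and `mix` and `volume_b` are hypothetical names.

```python
def mix(a, b, volume_b=1.0):
    """Add wave b (scaled by volume_b) to wave a, clipping the result to [-1, 1]."""
    length = max(len(a), len(b))
    mixed = []
    for n in range(length):
        sample_a = a[n] if n < len(a) else 0.0
        sample_b = b[n] if n < len(b) else 0.0
        mixed.append(max(-1.0, min(1.0, sample_a + volume_b * sample_b)))
    return mixed


print(mix([0.5, 0.5], [0.25]))  # [0.75, 0.5]
print(mix([0.8], [0.8]))        # [1.0], clipped
```

Because full-scale noise plus a full-scale sine can exceed 1.0, lowering one of the volumes before mixing avoids clipping.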

There is a clear spike in the 1kHz region, for fairly obvious reasons. It's also quite obvious in the spectrogram below:

The 1kHz pure tone stands out as a bright green line among the blue white noise.

Restoring the tone - Equalizer

It's possible to largely restore the original tone; the rest of this entry looks at methods by which this might be achieved.

This first attempt uses an equalizer to do the job. All sliders have been turned down to -24 apart from the 1kHz slider. The result can be clearly seen in the spectrogram on the far right: the white noise has gone from blue to purple, indicating that its intensity is much lower than it was. It's still there, however, and as the equalizer doesn't cover precise ranges, the noise in the immediate vicinity of the pure tone is still quite prominent.

When we look at the waveform up close, while it does resemble a sine curve overall, there is a lot of jitteriness in it. These variations are not consistent at all, so the white noise will still be quite audible.

Restoring the tone - Spectrum Filter

The spectrum filter can also be used for this task. This is achieved by altering the yellow line in the filter, which can boost or reduce areas of the spectrum. Clicking on the line will create an anchor, which allows for specific parts of the spectrum to be altered in different ways.

In this case, anchors are placed and dragged to form something like a triangle centered on the tone that's to be recovered from the noise.

The result is quite clear in the spectrum; the noise isn't even visible any more. Let's see what effect this has had on the waveform up close.

There is still some jitteriness in the waveform, but it's much more uniform than it was after the equalizer.
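The triangular shape drawn in the Spectrum Filter can be imitated by transforming the signal into the frequency domain, scaling each frequency by a triangular gain and transforming back. This sketch uses a naive discrete Fourier transform for clarity; real tools use a fast Fourier transform and process the sound in overlapping blocks. The 200Hz half-width, the sample rate and the noise level are assumptions, and the function names are hypothetical.

```python
import cmath
import math
import random


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def triangle_filter(x, sample_rate, center_hz, half_width_hz):
    """Gain of 1 at center_hz, falling linearly to 0 at center_hz +/- half_width_hz."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        freq = min(k, n - k) * sample_rate / n  # bins above n/2 mirror the negative frequencies
        gain = max(0.0, 1.0 - abs(freq - center_hz) / half_width_hz)
        spectrum[k] *= gain
    return idft(spectrum)


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


rate, n = 8000, 400
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(n)]
rng = random.Random(1)
noisy = [s + rng.uniform(-0.5, 0.5) for s in tone]
cleaned = triangle_filter(noisy, rate, 1000, 200)

print("error before:", round(rms([a - b for a, b in zip(noisy, tone)]), 3))
print("error after: ", round(rms([a - b for a, b in zip(cleaned, tone)]), 3))
```

Only the noise that falls inside the triangle survives, so the remaining error is a small fraction of the original. That residue is the kind of leftover jitter visible in the close-up.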

Conclusion

There are a number of ways a digital wave can be manipulated, either to pick out particular tones or, conversely, to remove frequencies from the spectrum. This can be used to remove unwanted noise, or frequencies that may not be considered relevant or needed for a given waveform.

Week 6 - Harmonics

This post will be exploring the effects of applying harmonics to a given waveform, as well as generating wave forms from mathematics.

For the purpose of this experiment we'll work on a very small waveform, only a few cycles long. That's far too short for anyone to hear, so the sample must be played on a loop. You can do this by clicking on the button indicated below in the control window.

This will open up the window displayed below. Set it as pictured and it'll loop away on playback with the yellow playback button.

Creating a Waveform

Now that we've ensured we'll be able to appreciate any waveform we've created, let's make a waveform. To do this, we will need to use the expression evaluator. You can find it in the menu pictured below.

Once in it, you can just put an equation into the expression field and make a wave based on that, but you can also choose from a list of presets that will generate a formula for you. For our purposes we want a sine wave at 500Hz. Clicking on it in the presets will give us the result below.

With that entered, click OK and the waveform will be generated. The resultant wave is pictured below.

The expression evaluator can be used to add multiple harmonics of a wave. The values entered into the expression field in the below screenshot add the 3rd, 5th, 7th and 9th harmonics to the same sine wave we created before.

The resultant waveform is pictured below.

What was a collection of sine waves is starting to strongly resemble a square wave.
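The square-wave shape comes from the Fourier series of a square wave, which contains only odd harmonics, each with amplitude 1/n. The exact amplitudes used in the screenshot aren't stated. This sketch therefore assumes the 1/n weighting and scales by 4/π, so that a perfect square wave would sit at 1.0. The function names are hypothetical.

```python
import math


def harmonic_sum(t, fundamental_hz, harmonics):
    """Sum of sin(2*pi*n*f*t) / n over the given harmonic numbers n."""
    return sum(math.sin(2 * math.pi * n * fundamental_hz * t) / n for n in harmonics)


def square_approx(t, fundamental_hz=500.0, harmonics=(1, 3, 5, 7, 9)):
    """Fundamental plus the 3rd, 5th, 7th and 9th harmonics, scaled by 4/pi."""
    return 4 / math.pi * harmonic_sum(t, fundamental_hz, harmonics)


period = 1 / 500.0
for fraction in (1 / 8, 1 / 4, 3 / 8):
    t = fraction * period
    pure = math.sin(2 * math.pi * 500.0 * t)
    print(f"t = {fraction:.3f} T: sine alone {pure:.2f}, with harmonics {square_approx(t):.2f}")
```

Across the top half of the cycle the sum stays close to 1, while the lone sine sags to about 0.71 away from its peak. That flattening is what makes the shape look square.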

The wave's shape differs depending on the harmonics used. This next wave form is built up from the 2nd, 3rd, 4th and 5th harmonics of the original 500Hz sine wave.

The result strongly resembles a sawtooth waveform.
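Adding every harmonic, even and odd, with amplitude 1/n gives the Fourier series of a sawtooth. For 0 < x < 2π, the infinite sum of sin(nx)/n equals (π - x)/2, a straight falling ramp. This sketch compares the first five terms (the fundamental plus the 2nd to 5th harmonics) with that ramp, again assuming 1/n amplitudes. The function names are hypothetical.

```python
import math


def partial_sawtooth(x, harmonics=(1, 2, 3, 4, 5)):
    """Sum of sin(n * x) / n, where x is the phase of the fundamental in radians."""
    return sum(math.sin(n * x) / n for n in harmonics)


def ideal_sawtooth(x):
    """The limit of the full series for 0 < x < 2*pi: a straight falling ramp."""
    return (math.pi - x) / 2


for x in (math.pi / 4, math.pi / 2, math.pi, 3 * math.pi / 2):
    print(f"x = {x:.2f}: five terms {partial_sawtooth(x):+.3f}, ideal {ideal_sawtooth(x):+.3f}")
```

With only five terms the ramp is already recognisable. Adding more harmonics would straighten it further and sharpen the drop at the end of each cycle.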

Wednesday, 31 October 2012

Week 5 - The Human Ear

The human ear is somewhat important with regard to audio, as it's what allows us to sense sound at all. It is made up of multiple parts, which I'll go into briefly in this post.

The ear itself is as complicated as you might expect of anything biological, but it can be broken down into three major areas: the inner ear, middle ear and outer ear.

Outer Ear

The outer ear is the part visible from the outside, its most prominent feature being the pinna. The pinna acts as a funnel, focusing sound down into the auditory canal and essentially amplifying it. The shape of the pinna is not an accident: it filters the incoming sound, allowing the frequencies humans are most concerned with (the range in which human speech lies) to pass into the ear canal more easily.

The main purpose of the ear canal is actually to provide defences against infection, but it also acts as a passageway for sound to travel through to strike the eardrum. Much like a digital sound system, the brain can't make sense of sound waves directly; it needs the sound converted into a form it can process, although unlike in a digital system this form is still analogue. The eardrum, also known as the tympanic membrane, is the only part of the ear that deals with actual sound waves: it is vibrated by them, which converts the sound into a form that the middle ear then deals with.

The Middle Ear

The middle ear consists of a mechanism made up of the three smallest bones in the human body: the malleus (hammer), incus (anvil) and stapes (stirrup). The purpose of this mechanism, along with the tympanic membrane, is to convert sound waves into a mechanical form, which is then passed on to the inner ear.

So why is this step even necessary? It's due to the nature of the inner ear. The inner ear is filled with fluid, while the middle and outer ears are usually filled with air. A sound travelling through air is largely reflected off the surface of something much denser, such as a liquid, so very little of the sound makes it into the liquid itself. The middle ear circumvents this by directly stimulating the inner ear with mechanical vibrations.

The Inner Ear

The inner ear consists of the cochlea, whose task is to turn the mechanical vibrations from the middle ear into electrical signals that are sent to the brain to be processed. It also needs to send different frequencies along different nerve fibres so that the brain can quickly tell frequencies apart.

The basilar membrane, which resides inside the cochlea

The main component that achieves this is the basilar membrane. It is a non-uniform membrane which varies in two ways from base to apex: the base is narrower and stiffer than the apex. This allows vibrations of differing frequencies to vibrate the membrane in different ways, which in turn stimulates different arrays of tiny hair cells, which in turn stimulate different nerve endings. The base vibrates more strongly at higher frequencies. At lower frequencies the whole membrane vibrates, but the peak of the vibration occurs towards the apex, allowing the actual frequency to be determined.

Below is a series of images roughly demonstrating the response of the membrane to different frequencies. The coloured shape represents the size of the vibrations at given points.

This is the physical limit on the range of frequencies that we can hear. If the frequency is too high for any part of the membrane to vibrate, or too low to determine where the peak is, then we simply won't hear the sound.

Picking out multiple frequencies from a given wave is easy, as the brain can discern where each of the peaks lies on the membrane.

The electrical signals generated from this are then processed by the brain as sound.

http://library.thinkquest.org/05aug/00386/hearing/index.htm
http://openlearn.open.ac.uk/mod/oucontent/view.php?id=398672&section=2.1
http://openlearn.open.ac.uk/mod/oucontent/view.php?id=398672&section=3.3

Wednesday, 24 October 2012

Week 4 - Goldwave

This entry will be discussing the use of Goldwave and its various features.

To start off, here is an image of the UI with a sound file loaded into it.

Loading a file into GoldWave works as you would expect for most Windows software: click File -> Open and navigate to the appropriate file. The waveform is displayed in green in the center.

The main display of the sound is little more than a simple chart with an X and a Y axis. Here the X axis represents time in hours, minutes and seconds, and the Y axis is a normalized scale ranging from -1.0 to 1.0. In this situation, normalization refers to mapping values from one scale onto another, in this case onto the range -1 to 1 (http://mathforum.org/library/drmath/view/60433.html). You can change the units the X and Y axes are measured in through Options -> Window. Changes made to the Y axis will not alter the waveform (even in the display), only the units that the amplitude is measured in.
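As a concrete example of that kind of normalization, assume the file stores 16-bit PCM samples (integers from -32768 to 32767). Dividing by 32768 then maps them onto the -1.0 to 1.0 scale. This is a common convention rather than something the post states about GoldWave's internals, and the function name is hypothetical.

```python
def normalise_16bit(sample: int) -> float:
    """Map a signed 16-bit PCM sample onto the -1.0 to 1.0 display scale."""
    return sample / 32768


for raw in (-32768, 0, 16384, 32767):
    print(raw, "->", normalise_16bit(raw))
# The largest positive sample lands just below 1.0 (0.999969482421875)
```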

This is the control window, which is used for playback purposes. The green play button will play the entire sound from the current position of the cursor. The yellow play button will play from the beginning to the end of the currently selected portion of the waveform, regardless of the position of the cursor.

This is simply showing what a selected portion looks like. To highlight part of the sound, left click and drag over the portion you wish to focus on. The yellow playback button will only concern itself with this portion of the file until you deselect it. Selecting is also necessary if you want any edit, from simple ones (cut, paste) to more complicated ones (altering amplitude, filters), to affect only a given part of the sound instead of the entire thing.

You can also save the selection through File -> Save Selection As.

This portion is for control and monitoring of the sound. The top bar is Volume, which is self-explanatory. The second is Balance, which is to say how the volume is split between the left and right channels; center is balanced. The third bar is Speed, which is again self-explanatory. These can all alter the sound while it's playing back, without actually editing the sound itself.

The final bar is the VU meter (Volume Unit meter). It shows the loudness of the sound at present. As the bar rises it pushes along a peak line, which waits for a moment before descending back to the left, indicating the peak loudness of the sound. Red indicators are at the left of the bar and will light up if a wave's amplitude is so large that clipping occurs (which means information is lost).

This selection of buttons in the main toolbar controls the zoom level. The left-most is "View All", which fits the entire sound into the available window space. The second button, "View Selection", zooms until the current selection fits within the available window space. "Previous Zoom" returns to the last zoom setting. Zoom In and Zoom Out are self-explanatory. "Zoom 1:1" zooms the view so that each pixel on the screen represents 1 sample (http://hbkurs.gymnasium.sundsvall.se/HB_05_Le_mum1210/lj87togr/Projekt%20X/GoldWave/GoldWave.htm#Zoom 1:1).

Effects of Inverting and Reversing

The Effect menu has a number of functions in it, but this section only concerns itself with two of them. A particular part of the waveform is selected for this purpose.

After applying the invert effect (Effect -> Invert), the wave now looks like this:

Visually, the wave is quite different. However, it doesn't actually sound any different. This is because all Invert does is negate every sample, flipping the wave upside down, which amounts to shifting the phase of every component by 180 degrees. The frequencies and the order in which they occur are unchanged, so to our ears the resulting sound is the same.

After applying the reverse effect (Effect -> Reverse):

Now, of course, the sound is completely altered, as the fluctuations in frequency now occur in the reverse order to the original wave.
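Both effects are simple operations on the list of samples. Here is a sketch, assuming the selection is held as a list of floats; the function names are hypothetical.

```python
def invert(samples):
    """Negate every sample: the wave flips upside down, frequencies and timing unchanged."""
    return [-s for s in samples]


def reverse(samples):
    """Play the samples in the opposite order, so the sound runs backwards."""
    return samples[::-1]


wave = [0.2, 0.5, 1.0, -0.25]
print(invert(wave))   # [-0.2, -0.5, -1.0, 0.25]
print(reverse(wave))  # [-0.25, 1.0, 0.5, 0.2]
```

Negating a signal leaves the magnitude of every frequency component unchanged, which is why the inverted selection sounds identical. Reversing keeps the same set of samples but changes when each one is heard.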

Attenuation

For this section I'll be editing the below pictured part of the wave form. The sample is the word "equality".

I chose it because there is a very clear difference in the amplitudes of its various sections: the middle is far larger than either side.

It can be altered to match the level of the other parts through the Effect menu, namely Effect -> Volume -> Change Volume. So, select that part of the wave and bring up the menu:

Some relevant values:

-6dB reduces the amplitude to 50.12%
Increasing it by 3dB raises the amplitude to 141.25%
Increasing it by 6dB raises the amplitude to 199.53%

That's close enough to conclude that a 6dB rule applies here: a sound's amplitude roughly doubles with every 6dB of gain and halves with every 6dB of reduction.
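Those percentages match the standard amplitude formula, ratio = 10^(dB/20). The formula also shows where the 6dB rule comes from: an exact doubling is 20·log10(2) ≈ 6.02dB. Here is a quick check in Python; the function name is hypothetical.

```python
import math


def db_to_amplitude_ratio(db: float) -> float:
    """Amplitude ratio for a gain of `db` decibels."""
    return 10 ** (db / 20)


for db in (-6, 3, 6):
    print(f"{db:+d} dB -> {db_to_amplitude_ratio(db) * 100:.2f}%")
print(f"doubling = {20 * math.log10(2):.2f} dB")
# -6 dB -> 50.12%
# +3 dB -> 141.25%
# +6 dB -> 199.53%
# doubling = 6.02 dB
```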

So let's reduce that part of the wave to match the rest of the waveform's level closely enough.

There is an audible difference, but it's only slight. This could be useful for making sure a given wave form has a consistent loudness level, which might be necessary when mixing it with other sounds.

So how would I go about bringing this sound to the maximum loudness possible while avoiding clipping? I could manually adjust the volume over and over until I get it right, or I could just go to Effect -> Volume -> Maximize Volume. This raises the volume of the selection until the highest-amplitude peak hits the maximum. The result is the loudest this sound can get before information would be lost to clipping.

The result:
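Maximize Volume amounts to peak normalization: find the largest absolute sample in the selection and scale everything by the gain that brings it to full scale. This is a minimal sketch; GoldWave may offer extra options, and `maximise` and `peak` are hypothetical names.

```python
def maximise(samples, peak=1.0):
    """Scale samples so the largest absolute value becomes `peak`, with no clipping."""
    loudest = max((abs(s) for s in samples), default=0.0)
    if loudest == 0.0:
        return list(samples)  # silence: there is nothing to scale
    gain = peak / loudest
    return [s * gain for s in samples]


print(maximise([0.1, -0.25, 0.2]))  # [0.4, -1.0, 0.8]
```

Because a single gain is applied to every sample, the shape of the wave is preserved and only its overall loudness changes.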