LABORATORY 2019
INTRODUCTION TO SOUND PROCESSING IN COCHLEAR IMPLANTS

Introduction:
Building an implantable bionic device is really only part of the work that must be done in order for patients to gain benefit from the device. The question of what to do with the implant once it is implanted is extraordinarily important. This is true for all implantable bionics, irrespective of their application. In this introductory laboratory, we will explore sound processing for an auditory neuroprosthesis such as the cochlear implant.

For many years, multi-channel cochlear implants for the hearing impaired have remained largely the same. Today they are more sophisticated, with features such as reverse telemetry and mechanisms to record the neural responses they elicit, but they retain the same number of stimulation channels as they had two decades ago. Electrically, their stimulus waveforms are almost unchanged.

In stark contrast to the electrical behaviour of cochlear implants, the way this electrical behaviour is utilised has changed significantly over the same period. Indeed, the same 20-year-old implanted device can now be instructed to give electrical stimulation that delivers information to the auditory system with truly remarkable results. Where once the ability to converse over the telephone using a cochlear implant was rare, it is now commonplace. These changes are largely to do with the sound processing that happens within the speech processor. An introduction to this vast field of research is the subject of this laboratory – that is, how does one take sound and turn it into electrical stimulation patterns? This week, Dr Swanson gave us some really good insights into how a normal ear processes sound and how a cochlear implant tries to mimic the various elements of that processing (i.e. filtering, loudness, etc.). In this lab we will endeavour to conduct some of that processing.
Understanding the concepts of sound processing, as well as the different strategies used to do so, will significantly help you complete the assignment.

Part A: Exploring Frequencies

Sampling and the Nyquist Rate:
The healthy, young, human ear can resolve frequencies within a range of roughly 20 to 20,000 cycles per second (Hertz or Hz). As we grow older, our ability to hear the extremes of this range diminishes.

Students of mathematics will recall the so-called 'Fourier transform', which allows for the transformation of time-domain signals such as a '.WAV' sound recording which, if plotted, would typically be expressed in terms of amplitude and time. A frequency-domain plot will show how much energy within a signal (e.g. the same '.WAV' file as before) comes from a particular frequency or frequencies.

For a pure frequency (e.g. a sinusoid at a particular, constant frequency), the frequency-domain plot will show only one frequency as the contributor to the signal. A sound such as the word "Hello" will contain many frequencies simultaneously and is a combination of multiple pure frequencies occurring at once.

Procedure:
Open Matlab and write a script to produce a pure, 5 Hz signal that lasts for two seconds. Note, while the code has been mostly provided to you in this lab, it is imperative that you understand how each piece of code functions.

% signal frequency (Hz)
frequency = 5;
% duration of the signal
duration = 2;
% Sampling frequency (the number of datapoints per second)
fs = 10;
% Set up a time vector with the correct number of data points
time = [0:(duration*fs)-1];
% x is some signal; in this case, a cosine at 'frequency' Hz (5 Hz)
x = cos(frequency*2*pi*time/fs);
plot(time/fs, x, '-x'), axis([0 2 -1 1]);
title('TIME DOMAIN PLOT OF PURE FREQUENCY SIGNAL');
Save this plot for safe keeping. You should do this with all plots you generate within this lab
(once you’ve finalised their form).
WARNING: You will not be warned when overwriting existing files! (except for right now).
1. Does the plot look like a sinusoid? If not, what can be changed to make it look right?
2. Why is fs = 10?
3. Plot, save and upload the same sinusoid when fs = 5, 9, 11, and 50.
As always, save all your plots, and note all your observations in your lab book.
What you have observed in the exercise above highlights the Nyquist rate – that is, the rate at which one must sample in order to unambiguously resolve all elements of the signal (cycles in this case). The Nyquist rate is twice the frequency of a given signal (or the highest frequency within the signal). Note that while the sinusoidal shape is lost, each cycle was captured in the fs = 10 plot. In the fs = 5 plot, all cycles were lost. Sampling at the Nyquist rate doesn't guarantee a reconstruction of the signal; it only guarantees that you won't miss the number of cycles. In fact, unless the sampling frequency is infinite, any attempt at reconstructing an original, non-trivial waveform will result only in an approximation of that signal. In practice, it is beneficial to sample at a rate at least 2.5 times the highest frequency of interest to ensure adequate reconstruction.
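The aliasing you observed in the exercise above can also be reproduced outside Matlab. The following is a small Python sketch (an illustrative stand-in for the lab's Matlab code, not part of it), showing how sampling a 5 Hz cosine at its own frequency collapses every cycle onto the same point:

```python
import math

def sample_cosine(freq, fs, duration):
    """Sample cos(2*pi*freq*t) at fs samples/second for `duration` seconds."""
    n = int(duration * fs)
    return [math.cos(2 * math.pi * freq * k / fs) for k in range(n)]

# Sampled at fs = 5 Hz, every sample of the 5 Hz cosine lands at the same
# phase (cos(2*pi*k) = 1), so the tone aliases to a flat, constant signal.
aliased = sample_cosine(5, 5, 2)

# Sampled at fs = 50 Hz (well above the 10 Hz Nyquist rate), ten samples
# per cycle are captured and the waveform is clearly resolved.
resolved = sample_cosine(5, 50, 2)
```

Plotting `aliased` gives a constant line at 1.0 – every cycle has been lost – while `resolved` traces out the full sinusoid, mirroring the fs = 5 and fs = 50 plots you saved earlier.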
As noted above, a normally-hearing human can hear frequencies between (roughly) 20 and 20,000 Hz. In order to be able to capture the 20 kHz frequencies, the sound must be sampled at (at least) 40 kHz. Ever wonder why compressed music files (e.g. 'MP3s') are often sampled at 44.1 kHz? However, speech signals tend to contain frequencies only up to ~8,000 Hz. This is why the cochlear implant filter bank does not need to cover all frequencies of human hearing.
If the sinusoid we made above were fed into a speaker, we would not be able to hear it because, at 5 Hz, it is infrasonic (below the audible range). We will now make a sound that we (hopefully) can hear.
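The procedure that follows asks you to synthesise a 4 kHz tone, save it as a '.WAV' file, and find its spectral peak. As a preview, here is a hedged Python sketch of the same pipeline (standard library only; the file name tone4k.wav and helper names are our own, and the direct DFT merely stands in for Matlab's fft):

```python
import cmath
import math
import struct
import wave

FS = 44100   # sampling frequency (Hz)
FREQ = 4000  # tone frequency (Hz)

def write_tone(path, freq=FREQ, fs=FS, duration=2.0):
    """Write a 16-bit mono .WAV file containing a pure sine tone."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(fs)
        w.writeframes(b"".join(
            struct.pack("<h", int(32767 * math.sin(2 * math.pi * freq * k / fs)))
            for k in range(int(duration * fs))))

def peak_frequency(x, fs):
    """Return the frequency (Hz) with the largest DFT magnitude below fs/2.
    A direct O(N^2) DFT -- fine for short signals; a real FFT is far faster."""
    n = len(x)
    best_k, best_mag = 0, -1.0
    for k in range(n // 2):          # only the Nyquist range is of interest
        w = cmath.exp(-2j * math.pi * k / n)   # twiddle factor for bin k
        s, p = 0j, 1 + 0j
        for sample in x:
            s += sample * p
            p *= w
        if abs(s) > best_mag:
            best_k, best_mag = k, abs(s)
    return best_k * fs / n   # cycles/sample * samples/second = Hz

write_tone("tone4k.wav")
# A 50 ms slice keeps the direct DFT quick (2205 samples, 20 Hz bin spacing):
x = [math.sin(2 * math.pi * FREQ * k / FS) for k in range(int(0.05 * FS))]
peak = peak_frequency(x, FS)   # 4000.0 Hz
```

Note the unit cancellation in the return statement of peak_frequency – it is exactly the fs-scaling argument made for the Matlab FFT plot below.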
Procedure:
1. Modify your script above to produce a signal of two seconds duration at a pure frequency of 4 kHz. Sample it at 44 kHz. Plot and save the first 0.005 s.
2. Save the sound as a '.WAV' file.
HINT: audiowrite
3. Play the sound in a media player or equivalent – try not to annoy your neighbours.
4. Find the frequency spectrum by taking the absolute value of the Fourier transform of the sound vector x and plot it.
HINT: fft
The fast Fourier transform (FFT) produces real and imaginary components. As such, the magnitude of the signal (the square root of the sum of the squares of the real and imaginary parts of the FFT) is capable of being plotted on a simple 2D plot. Plotting as-is will produce a plot with units of "FFT magnitude" and "sample number". Since "sample number" isn't particularly meaningful, this should be changed to something more sensible. What the FFT really contains is the contribution each frequency makes towards the overall sound. Therefore, scaling the horizontal axis to read "frequency" instead of "sample number" makes a great deal more sense. The sampling frequency, 'fs', has units of samples per second. Each element of the FFT has units of cycles per sample. Multiplying these causes "samples" to cancel, leaving behind cycles per second, or Hz.

fft_x = abs(fft(x));
frequency_range = fs * [0 : length(x)-1]/length(x);
plot(frequency_range, fft_x);
title('FREQUENCY DOMAIN PLOT OF PURE FREQUENCY SIGNAL');
xlabel('Frequency (Hz)');
ylabel('FFT Magnitude');

Finally, the only part of the frequency range that is of interest is the Nyquist range, so we truncate the plot at half the sampling frequency and set the maximum amplitude to the maximum of fft_x.

axis([0 fs/2 0 max(fft_x)]);

5. Plot and save the frequency spectrum.

Questions/Post Processing:
1. What is the peak frequency of the sound?
2. Why are there no other frequencies present?

Part B: Formants

What is a formant?
The Fourier analysis illustrated in the sections above is problematic for capturing the relevant features of speech because it not only requires a relatively large sample, but it also fails to find the time-varying parameters that make up the subtleties of speech. Therefore, simply extracting the peak frequencies of a given signal provides insufficient information to represent the original sound unless it is a pure sound, as we have seen above. Another way is needed.

If one were to construct a model of the human speech system, the logical approach would be to find a source of vibration to represent the glottis – an electronic buzzer for instance – and place this buzzer at one end of a tube to represent the vocal tract. The tube would ideally be of a soft and bendable material so that its geometry could be subtly changed to represent movements of the mouth, tongue, and lips. The buzzer would ideally have the capacity to change its intensity and frequency to replicate the glottis' capabilities.

As the buzzer "buzzes" and the sound is emitted from the tube, there are resonant frequencies which characterise the sound. These resonant frequencies are referred to as formants. We will discover the formants of a sound by way of a filtering technique called linear prediction which, conveniently, is contained within the lpc() function available in Matlab. It suffices for the purposes of this laboratory to use this function as a tool to find the formants and not delve further into the issue. Should you wish to explore speech analysis in more detail, a more thorough study of linear prediction methods would be a good place to start.

The vowel sound "a" (as in "say") is shown in the figure above. In the upper panel, the time-domain signal is shown.
In the lower panel, the frequency-domain signal (FFT) is shown (the jagged signal with many peaks and troughs) along with a smooth linear prediction coefficient filter plot with some distinctive peaks at 350.4, 951.3, 2626.8, 3253.7, 4423.7, etc. and another at about 9 kHz. These peaks (some of which are rather subtle) are known as the formant frequencies and are referred to as f1, f2, etc.

The essence of the sound can be extracted from these formants, and it is these formants that form the basis of the speech processing strategies that have allowed cochlear implants to become so successful.

Recall that cochlear implants are 'tonotopically mapped' – that is, the position of each electrode within the cochlea corresponds to a particular frequency range. In the frequency spectrum on the previous page, the horizontal axis could be divided into several ranges; each range (with some overlap between them) would represent the location, and therefore the frequency range, covered by a single electrode within the cochlea. While modern speech processing techniques are much more sophisticated than what we will explore here, the basics are:
1. formants correspond to electrodes;
2. by determining the formants of a given sound, the formants may be 'mapped' to individual electrodes for stimulation;
3. the charge to be delivered by the stimulus is determined by the amplitude of the envelope of the original signal, as in the CIS strategy, or by the energy of the signal passed by each filter bank when choosing a subset of banks (SPEAK/ACE).

Procedure:
1. In the laboratories section, there are several sounds available for you to use in ".wav" format. Choose one (or more) of these for use in exploring formants in the next steps of this laboratory.
2. Within the same section, in another folder, there are some .m files. These are the files necessary to avoid the need for specialised toolboxes to do the sound processing.
These files are distributed under the GNU license, and the authors are credited within the code. The files are contained within a zip file – unzip the file into the working directory you're using for Matlab.
3. Build a new ".m" file to plot the sound in the time-domain and its FFT. To read a '.wav' sound file, use the following.
HINTS:
% clear memory in case matrices are already defined – very important!!!
clear;
% read the .wav file; x = waveform vector; fs = sampling frequency
[x, fs] = audioread('filename.wav');
% Make sure that the file is in the correct format before proceeding
[n, nChan] = size(x);
if nChan > 1
    fprintf('The type of the wave file must be mono (not stereo)\n');
    fprintf('Converting matrix to mono now ...\n');
    x(:,2) = [];   % discard the second channel, leaving a mono signal
end

Questions/Post Processing:
Save the plot.

PART C: Epochs
The plot generated in the last section shows the time-domain plot for the entire sound. As sound is a dynamic flow of changing frequencies over time, we cannot represent an entire sound all at once with the formants and expect to glean much (or any) useful information from it. As such, we need to consider small time-steps, or epochs, or frames of a given sound, find the formants of each epoch, and deliver the important features of the sound within each epoch to a speaker or a cochlear implant in a step-wise fashion.

The sounds available are typically two seconds in duration. If we (somewhat arbitrarily) choose 20 ms to be our epoch, then we will have 100 epochs within our two-second sound. The objective of this section of the laboratory is to divide the sound into 100 epochs, determine the formant frequencies for each epoch, and reconstruct the sound using only formant frequencies. Note that for a two-second sound sampled at 44 kHz, each epoch will have 880 samples.
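The epoch arithmetic above (100 epochs of 880 samples) can be checked with a short Python sketch (illustrative only; the helper name is ours, not part of the lab code):

```python
def split_into_epochs(x, n_epochs):
    """Divide a sample vector into n_epochs equal-length frames, discarding
    any leftover samples (this mirrors the floor() in the Matlab code)."""
    points_per_epoch = len(x) // n_epochs
    return [x[i * points_per_epoch:(i + 1) * points_per_epoch]
            for i in range(n_epochs)]

# Two seconds sampled at 44 kHz, split into 100 epochs of 20 ms each:
epochs = split_into_epochs(list(range(2 * 44000)), 100)
# len(epochs) == 100, and each epoch holds 880 samples
```

The floor division means any trailing samples that do not fill a whole epoch are simply dropped, exactly as in the Matlab version.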
As all this is duration-dependent, the number of samples per epoch might differ between the different sounds.

Building upon the .m file you started above, break up the sound into 100 epochs by adding the following to your .m file:

% How many epochs?
N = 100;
% How many datapoints in each epoch?
points_per_epoch = length(x)/N;
% Ditch any decimals if length(x)/N is not an integer!
points_per_epoch = floor(points_per_epoch);
for i = 1:N
    % get an epoch
    epoch(:,i) = x(((i-1)*points_per_epoch)+1 : i*points_per_epoch);
end

PART D: Formants of the Epochs
Now that we have 100 epochs, we need to find the formants of each epoch so that these can be used in the formation of electrical stimulation patterns to be delivered (preferably in real time) to the implant and be recognised as part of the longer sound.

The details of the following code are beyond the scope of this laboratory, but suffice to say that it will fit a curve to the spectral peaks of the sound contained within each epoch, and find the formants. To do this, add the following to your .m file.

% Find the formants
%
% The following is a 'rule of thumb' for formant estimation
number_of_coefficients = floor(2 + fs/1000);
for i = 1:N
    % look at each epoch and, if it contains all zeros, disregard it
    workingepoch = epoch(:,i);
    if max(workingepoch) == 0
        ffreq(i,1:5) = 0;
    else
        % apply a Hamming window to get rid of high frequency artifacts
        workingepochham = workingepoch .* hamming(length(workingepoch));
        % find linear prediction coefficients
        spectral_peaks = lpc(workingepochham, number_of_coefficients);
        r = roots(spectral_peaks);   % find roots of this polynomial
        r = r(imag(r) > 0.01);       % only look for + roots up to fs/2
        % convert the complex roots to Hz and sort from low to high
        formants = sort(atan2(imag(r), real(r)) * fs/(2*pi));
        % print the first five formants (the rest are still stored in formants)
        for j = 1:5
            ffreq(i,j) = formants(j);   % save for later
            fprintf('Epoch(%d), Formant(%d) = %.1f Hz\n', i, j, formants(j));
        end
    end
end

After debugging, one may wish to stop the formants from being printed. It is safe to comment out the fprintf line above by placing a % symbol at the start of the line.

PART E: Loudness of the Epochs
At this point, the formants for each epoch are stored in the matrix ffreq. It is now possible to generate sounds based upon these pure formant frequencies, but first we need to know how loud each epoch should be. There are numerous ways to do this, some more sophisticated than others, but for the purposes of this laboratory we will determine the loudness based on the loudest sound within a given epoch of the original sound. Modify the .m file to include the following. The appropriate location to place this bit of code within the .m file is left to you to determine.

amplitude(i) = max(workingepochham);

PART F: Building the sounds
The formants we have found above represent pure frequencies. Using just one of the formants to build a sound will of course present itself as an epoch of a pure frequency followed by another epoch of another pure frequency – and so on until the end of the sound. Building this sound is rather simple, recalling that N is the number of epochs in the sound:

for i = 1:N
    % time of each sample
    time = (0:length(workingepochham)-1)/fs;   % sampling times
    % Construct a signal using f1 only to represent this epoch
    sound1(i,:) = amplitude(i) * sin(2 * pi * ffreq(i,1) * time);
end

At this point we have N pure frequency sounds. Adding these together in series will approximate the original sound using only the first formant frequency. This can be done as follows:

% Add all the sounds together in series to build a new sound, 2 s long,
% based only on the pure formant frequencies
wave1 = [];   % an empty matrix that we'll append to below
% Piece together all the epochs into one sound
for j = 1:N
    wave1 = [wave1 sound1(j,:)];
end
% Save the sound to disk for subsequent playing
audiowrite('f1.wav', wave1', fs);   % wave1' = transpose of wave1 (rows vs cols)

Play the sound in a media player. Depending upon the nature of the original sound (e.g. whether or not it contained vowel or consonant sounds, etc.) the approximated sound using only the first formant may sound terrible, and bear little resemblance to the original. However, if we were to add another formant to the sound – that is, another pure frequency to each epoch – the sound may improve. Adding a second formant to the sound is rather simple:

sound2(i,:) = amplitude(i) * (sin(2*pi*ffreq(i,1)*time) + sin(2*pi*ffreq(i,2)*time));

This produces a sound containing the first two formants. Modify the code in this section to build five sounds (sound1, sound2, ... sound5) and save these sounds as f1.wav, f1f2.wav, ... f1f2f3f4f5.wav.

Questions/Post Processing:
1. Play the sounds starting with the f1 sound, and working up to the f1f2f3f4f5 sound.
2. What do you observe as you increase the number of formants?
3. How many formants are needed before the sound is decipherable?
4. Sounds unintelligible even with all 5 formants? Try doubling the number of epochs.
5. Use a microphone and the in-built Windows sound recorder (or your mobile phone) to record your own two-second sound bites. Try your own sounds with different vowels and sharp consonants (e.g. "bye-bye"), and make observations on these sounds and the processed sounds that come about as a result of the above.
6. Contribute sounds to the course. If you speak other languages, please contribute two-second sound bites in these languages. Note that tonal languages such as Mandarin Chinese are particularly problematic for cochlear implant sound processing. Can you comment on why this is the case?

PART G: Electrical Stimulation
An electrode array comprising four electrodes, implanted into a deaf person's cochlea, has the capacity to restore useful auditory sensations. From what has been discovered above, we have most of the necessary information to send electrical stimulation to the implant.

Recalling the mapping of electrode position to frequency (or a range of frequencies) in the cochlea, we can expect that this mapping varies from patient to patient based upon the depth of insertion of the electrode, and several other factors. Also, the sensation of some frequencies will be elicited from more than one electrode (perhaps those associated with positions within the cochlea that lie between two electrodes). For a particular patient, let us assume that some experimentation has determined that:
● stimulation of electrode 1 elicits sounds in the range of 50 to 750 Hz
● stimulation of electrode 2 elicits sounds in the range of 500 to 1700 Hz
● stimulation of electrode 3 elicits sounds in the range of 1500 to 4000 Hz
● stimulation of electrode 4 elicits sounds in the range of 3500 to 11,000 Hz

Consider the sounds generated above, each containing epochs of 1, 2, 3, or 4 frequencies that represent the formants of the sound. These formant frequencies can be used to determine which one, two, three or four electrodes should be used in the delivery of electrical stimulation. While there are several approaches that can be taken, one is to fix a pulse width and vary only the stimulus amplitude in order to convey the sound.

Questions/Post Processing:
1. From what you now know about formant frequencies and sound processing, write a Matlab script that takes a two-second .wav file, breaks it up into 100 epochs, finds the formant frequencies of each epoch, and determines which electrodes to use to stimulate the patient whose frequency data is described above. The output of the file should be a list of stimulation parameters of the form:
PULSEWIDTH, INTER-PHASE DELAY, INTER-STIMULUS DELAY
ELECTRODE, AMPLITUDE
ELECTRODE, AMPLITUDE
…
ELECTRODE, AMPLITUDE
2. From the list of stimulation parameters, produce an "electrodogram" (time on the vertical axis, electrode on the horizontal axis, amplitude colour encoded). This will require some programming, but the extensive hints shown here should help.
HINTS: Produce a matrix of zeros and call it something (e.g. stimulus):
stimulus = zeros(N, Etrodes);
where N is the number of epochs and Etrodes is the number of electrodes.
Using if or switch/case statements, determine which electrode(s) will deliver the sound to the cochlear implant electrodes during each epoch. For instance, if the formant frequency f1 is 572 Hz, then the first electrode will be stimulated. Remember that each electrode covers a range of frequencies; more than one formant could require stimulus from the same electrode, and a given formant may require representation on more than one electrode. If f1 = 572 Hz in epoch 10, the following change would need to be made to the stimulus matrix:
stimulus(10, 1) = amplitude(10);
If f2 in epoch 10 was 720 Hz, then the same electrode would be stimulated. If f3 in epoch 10 was 1511 Hz, then:
stimulus(10, 2) = amplitude(10);
And so on, up to the fourth formant. Some epochs will have one, some will have two, others three or four electrodes that will need to be stimulated in order to generate the sound.
The command to produce the electrodogram plot is imagesc. For the above example, with time on the vertical axis and electrode on the horizontal axis:
imagesc([1:Etrodes], [1,N], stimulus);
should do the trick. Remember to label your axes and save your plot in your notes.
colorbar;
adds an elegant touch to the plot.

SHOW THE DEMONSTRATION OF YOUR CODE TO A STAFF MEMBER BEFORE YOU LEAVE

References:
Atal, B. S. and Hanauer, S. L. (1971) "Speech Analysis and Synthesis by Linear Prediction of the Speech Wave", J. Acoust. Soc. Am., 50, 637-655.
Markel, J. D. and Gray, A. H. (1976) Linear Prediction of Speech, Springer-Verlag, Berlin.
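Returning to the Part G hints: the overlapping band-to-electrode lookup can be sketched in Python (the bands come from the assumed patient map above; the helper name is ours, and a real map would be patient-specific):

```python
# Assumed 4-electrode patient map from Part G: (lower, upper) band edges in Hz.
ELECTRODE_BANDS = [(50, 750), (500, 1700), (1500, 4000), (3500, 11000)]

def electrodes_for_formant(freq_hz):
    """Return the 1-based electrodes whose frequency bands contain freq_hz.
    Because the bands overlap, one formant can map to more than one electrode."""
    return [i + 1 for i, (lo, hi) in enumerate(ELECTRODE_BANDS)
            if lo <= freq_hz <= hi]

# 572 Hz sits in the overlap of electrodes 1 and 2:
assert electrodes_for_formant(572) == [1, 2]
# 1511 Hz sits in the overlap of electrodes 2 and 3:
assert electrodes_for_formant(1511) == [2, 3]
# 2000 Hz falls only within electrode 3's band:
assert electrodes_for_formant(2000) == [3]
```

In the assignment script, one straightforward approach is to loop over the five formants of each epoch and, for every electrode this lookup returns, write amplitude(i) into the corresponding entry of the stimulus matrix.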