Free Essay

Tonality Estimation Using Wavelet Packet Analysis

In:

Submitted By vib5252
Words 17208
Pages 69
UNIVERSITY OF MIAMI

TONALITY ESTIMATION USING WAVELET PACKET ANALYSIS

By Vaibhav Chhabra

A Research Project Submitted to the Faculty of the University of Miami in partial fulfillment of the requirements for the degree of Master of Science

Coral Gables, Florida May 2005

UNIVERSITY OF MIAMI

A research project submitted in partial fulfillment of the requirements for the degree of Master of Science

TONALITY ESTIMATION USING WAVELET PACKET ANALYSIS

Vaibhav Chhabra

Approved:

________________ Ken Pohlmann Professor of Music Engineering

_________________ Dr. Edward Asmus Associate Dean of Graduate Studies

________________ Colby Leider Assistant Professor of Music Engineering

_________________ Dr. Paul Mermelstein Professor of Electrical Engineering

DEDICATION They say that one’s experience is what defines an individual. After all, you are what you are because of your experiences. On that note I would like to dedicate this work to all those who have contributed to my experience in this journey. For what I have learned has laid the foundation for what I will learn. I would also like to thank my family who has always been supportive of me, my brother Ruchir who is a natural send-master, Papa and Ma thanks for keeping the faith. All the Chacha’s, Chachi’s and cousins, thank you all for the support. Next on my thank you list are my Tae Kwon Do buddies. Sensei Jeff thanks for all of your advice, some day I’ll be teacher like you. Rico, training with you was an honor (congratulations on your black-belt). Nat (Ryu) sparring with you was almost like dancing, I’m sure you’ll be an awesome martial artist. Sensei Chikaco, Sensei Gerard, Sensei Kat, Lora, Lila, Becky, Erwin thank you all, I’ll miss you guys. Last and definitely not the least are my friends in Miami. Jon (silent warrior) for teaching me all his DSP jutsu’s, Vishu(2nd best table-tennis player at UM), Marc (water), Lindsey (Defense-Master), Rian (Gentoo), Becky(Mini Marina), Jess, Jose, Doug (me and my beer), Drew (Send-Master), Neri (p3), Tiina(Jo-master), Rob.R, Rob.B (adidasw), Kai(thanks for taking care of my knee), Joe Abbati (Rathskellar) ...

Rene Descartes - “I think therefore I exist” Vaibhav Chhabra (Meno) - “I exist therefore I oscillate” SEND WITH RESPECT ^_^

ACKNOWLEDGMENT I would like to especially thank my advisor and mentor, Ken Pohlmann, who has generously given of his precious time and provided me with several great opportunities during my time at UM. I would also like to thank my thesis committee for being patient in spending time with me and guiding me at time when I needed them the most. I hope we keep in touch and have opportunities to work again.

Table of Contents
Chapter 1: Introduction......................................................................................................................................................................1 1.1 Masking.................................................................................................................................................................1
1.1.1 Threshold of Masking...............................................................................................................................2

1.2 Tonality..................................................................................................................................................................3
1.2.1 Common Tonality Classification Methods................................................................................................4 1.2.2 Typical Psychoacoustic Model....................................................................................................................4 1.2.3 Psychoacoustic Model 1...............................................................................................................................5 1.2.4 Psychoacoustic Model 2...............................................................................................................................8

Chapter 2: Signal Representation................................................................................................................................................12 2.1 Periodic Signals...............................................................................................................................................12 2.2 Fourier Series....................................................................................................................................................13 2.3 Fourier Transform...........................................................................................................................................14
2.3.1 Fourier Transform Derivation...................................................................................................................15 2.3.2 Dirac Delta Function...................................................................................................................................17 2.3.3 Fourier Coefficients.....................................................................................................................................18 2.3.4 Fourier Coefficients Derivation................................................................................................................19

2.4 Hilbert Transform...........................................................................................................................................22
2.4.1 Analytic Signal.............................................................................................................................................22

2.4.2 Hilbert Transform Theory....................................................................................................................23 2.4.3 Phase Rotation..........................................................................................................................................24
2.4.3 Complex Envelope......................................................................................................................................26 2.4.4 Advantages of the Complex Envelope....................................................................................................27

2.5 Summary.............................................................................................................................................................30 Chapter 3: Time to Frequency Mapping.................................................................................................................................31 3.1 Quadrature Mirror Filter (QMF)...............................................................................................................31 3.2 Aliasing and Imaging.....................................................................................................................................32 3.3 Distortion Transfer Function.......................................................................................................................34

i

3.4 Polyphase Decomposition............................................................................................................................34
3.4.1 Perfect Reconstruction................................................................................................................................35

3.5 Paraunitary Property......................................................................................................................................36
3.5.1 Unitary Matrix..............................................................................................................................................36

3.6 Summary.............................................................................................................................................................37
3.6.1 Advantages of Paraunitary Filter Banks..................................................................................................37

Chapter 4: Short-Time Fourier Transform (STFT)............................................................................................................38 4.1 Analysis of the STFT equation..................................................................................................................39 4.2 STFT as a Bank of Filters............................................................................................................................40 4.3 Effects of Windowing....................................................................................................................................41
4.3.1 Choice of the Best Window.......................................................................................................................42

4.4 Summary.............................................................................................................................................................43 Chapter 5: The Wavelet Transform..........................................................................................................................................44 5.1 Weakness of the STFT..................................................................................................................................44 5.2 STFT to Wavelets...........................................................................................................................................45
5.2.1 Modifications on the STFT........................................................................................................................46

5.3 Inverse Wavelet Transform.........................................................................................................................48 5.4 Orthonormal Basis..........................................................................................................................................49 5.5 Wavelet Packet Analysis..............................................................................................................................50
5.5.1 Discrete Wavelet Transform......................................................................................................................51

5.6 Wavelet Packet Tree Representation.......................................................................................................52
5.6.1 Energy Representation................................................................................................................................52 5.6.2 Index Representation...................................................................................................................................53 5.6.3 Filterbank Representation..........................................................................................................................54

Chapter 6: Analysis and Results..................................................................................................................................................56 6.1 Detection Scheme...........................................................................................................................................57
6.1.1 Frequency Breakdown................................................................................................................................59 6.1.2 Detector Pseudocode Methodology.........................................................................................................60 6.1.3 Detection Process.........................................................................................................................................61

6.2 Node Reconstruction......................................................................................................................................63

ii

6.3 Tonality Estimation........................................................................................................................................64
6.3.1 Auto-Correlation Function.........................................................................................................................65 6.3.2 Auto-Covariance..........................................................................................................................................67 6.3.3 Type-I Analysis............................................................................................................................................68 6.3.4 Type-II Analysis..........................................................................................................................................69

6.4 Tonality Index (Time-Domain)..................................................................................................................73 6.5 Tonality Index (Frequency-Domain).......................................................................................................75
6.5.1 Comparison with Model 2..........................................................................................................................76

Chapter 7: Conclusions and Recommendations.....................................................................................................................81 References............................................................................................................................................................................................83 Appendix...............................................................................................................................................................................................85

iii

List of Figures:
Figure 1.1: General block diagram of a perceptual coder........................................................................................3 Figure 1.2: General block diagram of a psychoacoustic model............................................................................5 Figure 1.3: Tonal components identified in Model 1..................................................................................................6 Figure 1.4: Maskers Decimation in Model 1...................................................................................................................7 Figure 1.5: Block diagram of MPEG Psychoacoustic Model 1.............................................................................8 Figure 1.6: Example of a predicted masking threshold for a masker...............................................................10 Figure 1.7: General block diagram of Model 2............................................................................................................10 Figure 2.1: A continuous-time signal.................................................................................................................................12 Figure 2.2: Continuous-time sinusoidal signal.............................................................................................................13 Figure 2.3: Discrete-time unit impulse (sample).........................................................................................................17 Figure 2.4: The Dirac Delta Function................................................................................................................................18 Figure 2.5: Dirac Delta in Time-Domain.........................................................................................................................18 Figure 2.6: Dirac Delta in Frequency-Domain.............................................................................................................18 Figure 2.7: Frequency Response of Rectangular Pulse............................................................................................19 Figure 2.8: Periodic Square Wave.......................................................................................................................................20 Figure 2.9: Fourier Series Coefficients for a Periodic Square Wave...............................................................21 Figure 2.10: Cosine Wave Properties.................................................................................................................................24 Figure 2.11: Sine Wave Properties......................................................................................................................................24 Figure 2.12: Rotating Phasors to create a sine wave out of a cosine................................................................25 Figure 2.13: Hilbert Transform shifts the phase of positive frequencies by -90° and negative frequencies by +90°......................................................................................................................................................................26 Figure 2.14: Spectral Properties of the Complex Exponential............................................................................26 Figure 2.15: Spectral Properties of the s(t).....................................................................................................................28 Figure 2.16: The Modulated Signal and its Envelope...............................................................................................28 Figure 2.17: Frequency Domain Representation of Complex Envelope and Analytic Signal..........30 Figure 3.1: QMF filter-bank....................................................................................................................................................31 Figure 3.2: Aliasing......................................................................................................................................................................32 Figure 4.1: FFT Block Diagram............................................................................................................................................38 Figure 4.2: STFT Represented in terms of a Linear System.................................................................................40 Figure 4.3: Rearranged STFT Representation in terms of a Linear System................................................41 Figure 4.4: STFT viewed as a Filter-Bank......................................................................................................................41 Figure 4.5: Fourier Transform of 512 (left) and 2048 (right) Samples...........................................................42

iv

Figure 5.1: (a) high-frequency signal, (b) low-frequency signal x(t) modulated by the windowed function v(t)......................................................................................................................................................................................44 Figure 5.2: Fundamental difference between the STFT (a) and the wavelet transform (b)................47 Figure 5.3: Amplitude, scale and translation plot of a continuous wavelet transform Robi, P., “The Story of Wavelets”, Rowan University ©.......................................................................................48 Figure 5.4: 3-level Wavelet decomposition tree..........................................................................................................51 Figure 5.5: (left) Frequency response obtained by scaling, (right) Filterbank representation of discrete wavelet transform..............................................................................................................52 Figure 5.6: Depth Level-3 Energy Tree of 1kHz Signal.........................................................................................53 Figure 5.7: Depth Level-3 Index Tree of 1kHz Signal.............................................................................................54 Figure 5.8: Filter-bank Representation of Depth Level-3 Wavelet Packet Decomposition Tree....54 Figure 5.9: Filter-bank Representation of Depth Level-3 Wavelet Packet Decomposition Tree....55 Figure 5.10: Discrete wavelet packet tree (analysis stage)....................................................................................55 Figure 6.1: General block diagram of the proposed model...................................................................................57 Figure 6.2: level-1 Wavelet Packet Decomposition of a signal having multiple tone (4kHz, 10kHz, 15kHz)................................................................................................................................................................................58 Figure 6.3: level-3 Wavelet Packet Decomposition of multiple tones (4kHz, 10kHz, 15kHz)........58 Figure 6.4: level-3 Wavelet Packet Index Tree and the Coefficients of the Terminal Nodes...........60 Figure 6.5: level-2 Wavelet Packet Energy Tree Detector Code Pointers....................................................60 Figure 6.6: level-2 Wavelet Index Tree used to trace the Nodes that are sent to the tonality analyzer...............................................................................................................................................................................................61 Figure 6.7: level-2 Wavelet Packet Energy Tree Detector Stage-I: nodes (4), (5) and (6) are analyzed first....................................................................................................................................................................................62 Figure 6.8: level-4 Wavelet Packet Energy Tree Detector Stage-II: nodes (4), (5) and (6) are analyzed first; green lines represent the nodes that are going to be analyzed by the tonality analyzer...............................................................................................................................................................................................62 Figure 6.9: level-4 Wavelet Packet Energy Tree Detector Stage-III: nodes (3) is analyzed; green lines represent the nodes that are going to be analyzed by the tonality analyzer......................................63 Figure 6.10 Wavelet Energy Tree: The white-arrows showing the two nodes used to calculate our tonality.................................................................................................................................................................................................64 Figure 6.11: Auto-correlation Function of a Pure Tone..........................................................................................65 Figure 6.12: Auto-correlation Function of White Noise.........................................................................................66 Figure 6.13: Auto-correlation Function of Band limited Noise (0-22kHz)..................................................66

v

Figure 6.14: Energy Tree, where the blue lines represent the nodes from which the tonality value is calculated.......................................................................................................................................................................................68 Figure 6.15: A 4kHz tone with selected path (red arrows) and nodes used to calculate tonality value (blue lines) [left figure]; Difference of the max values of auto-covariance [right figure]......68 Figure 6.16: A 4kHz tone with -0.9dB white-noise added, selected path (red arrows) and nodes used to calculate tonality value (blue lines) [left figure]; Difference of the max values of autocovariance [right figure]............................................................................................................................................................69 Figure 6.17: A Snare Crash......................................................................................................................................................70 Figure 6.18a: Auto-Covariance of White-Noise..........................................................................................................70 Figure 6.18b: Auto-Covariance of Band-limited 0-22kHz Noise......................................................................71 Figure 6.18c: Auto-Covariance of Pure-Tone (1kHz)..............................................................................................71 Figure 6.19: A Snare crash analysis(a) Wavelet Tree, (b) Auto-Covariance..............................................72 Figure 6.20: Snare Crash (Last Frame) Auto-Covariance......................................................................................72 Figure 6.21: Tonality Index (Time-Domain) with Input Signal consisting of 1kHz tone then Bandlimited Noise (0-22kHz) of power -20dB............................................................................................................73 Figure 6.22: Time-Domain plot of test signal (1kHz tone then Bandlimited Noise (0-22kHz) of power -20dB)...................................................................................................................................................................................74 Figure 6.23: Tonality Index (Time-Domain) with Input Signal consisting of white noise (power 20dB) followed by a 1kHz tone and then Bandlimited Noise (0-22kHz; power -0.9dB)....................74 Figure 6.24: Time-Domain plot of test signal of white noise (power -20dB) followed by a 1kHz tone and then Bandlimited Noise (0-22kHz; power -0.9dB)................................................................................75 Figure 6.25: Frequency Map of Wavelet Tree: The red arrows represent the generated path which consist of an array of nodes from which the last node value are taken (blue lines) to map................76 Figure 6.26a: Tonality Index – Model 2 (1kHz)..........................................................................................................77 Figure 6.26b: Tonality Index – Proposed Model (1kHz)........................................................................................77 Figure 6.27a: Tonality Index – Model 2 (4kHz)..........................................................................................................78 Figure 6.27b: Tonality Index – Proposed Model (4kHz)........................................................................................79 Figure 6.28a: Tonality Index – Model 2 (6kHz) .........................................................................................................79 Figure 6.28b: Tonality Index – Proposed Model (6kHz)........................................................................................80

vi

VAIBHAV, CHHABRA Tonality Estimation using Wavelets Packet Analysis

(M.S., Music Engineering Technology) (May 2005)

Abstract of a Master’s Research Project at the University of Miami. Research project supervised by Professor Ken Pohlmann. No. of pages in text: 124 Abstract: Perceptual audio coding is a novel approach to compress audio by taking advantage of models of the human auditory system also known as psychoacoustic models. The quality and efficiency of the encoding process depends highly on how these models accurately characterize the nature of the audio signal, in particular its tonality attributes. This paper explores various analysis techniques using wavelet packet tree decomposition to accurately estimate tonality by exploiting energy and statistical information. More specifically, the tonality estimation is based on the correlation information of the nodes and uses wavelets such as Haar and Daubechies 1 for decomposing the signal.

vii

Chapter 1: Introduction In recent years, several advancements have been made in the field of audio coding. One must realize that no matter how many advancements we make, the ultimate receiver of the analysis-coding-transmission-decoding-synthesis chain is the human auditory system. In fact, all perceptual audio coders or lossy audio coders solely rely on the exploitation of this system. A model based on this system, also known as the psychoacoustic model, exploits properties and tolerances of the human auditory system to remove irrelevant components of the audio signal which are those components that do not contribute to the auditory impression of the acoustic stimulus. Thus, these irrelevant components of the audio signal may be removed from the initial stages of the signal communication chain (analysis/coding) resulting in more information capacity, which can be used to code relevant audio components. This operation is called irrelevancy reduction and is based on the concept of masking. 1.1 Masking Masking refers to the total and relative inaudibility of one sound component due to the presence of another one [Ferreira, A.J.S, 1995], with particular relation to amplitude, frequency, time [Zwicker, E, Fastl, H, 1990] and space [Blauert, J, 1993]. There are two types of masking: frequency masking (simultaneous masking) and temporal masking. Frequency masking can also be looked as the excitation in the cochlea’s basilar membrane that prevents the detection of a weaker sound being excited in the same area in the basilar membrane, whereas temporal masking takes place without the presence of the masker and maskee. One might think of it as the auditory path delay

1

between auditory neuron to the brain or the time taken by the individual to give meaning to the auditory information. Among the usually considered aspects of masking, simultaneous masking is, by far, the most important source of irrelevancy reduction. It is because of this that most perceptual audio coders involve time to frequency domain mapping in the form of subband or a transform filterbank. 1.1.1 Threshold of Masking In the context of frequency domain audio coders, the masker is the input audio signal, containing coherent (tone-like) or incoherent (noise-like) components, and the maskee is the quantization noise. It should be noted that the ultimate goal of a perceptual coder is to generate a good estimate for the profile of the quantization noise that does not cause noticeable impairments when actually added to the original signal [Ferreira, A.J.S, 1995]. In other words, the noise profile, also called Threshold of Masking [Johnston, J.D,1988], should be optimally shaped in frequency, time and space in such a way that the quantization noise can be efficiently masked. Several studies [Moore, C.J.B, 1982][Hellman, R.P, Harvard University] reveal that the threshold of masking may vary substantially as a function of noise-like or tone-like nature of audio signals. As a consequence, this aspect has a significant influence on the quality and efficiency of the audio encoding process [Ferreira, A.J.S, 1995], as shown in Figure 1.1

2

digital in

Analysis filterbank

Quantization and coding

serial bitstream multiplexing

bit stream

Calculation of masking threshold based on psychoacoustics

Figure 1.1: General block diagram of a perceptual coder Introduction to Digital Audio Coding and Standards by Marina Bosi, Richard E. Goldberg

1.2 Tonality One of the key components in the psychoacoustic model is the calculating of tonality, whose values are used to calculate the Signal to Mask ratio which determines the absolute masking threshold of the input signal. Different values for masking have been reported in [Hellman, R.P, Harvard University 02138] for tone masking noise versus noise masking tone. In [Hellman, R.P, Harvard University 02138] and [Zwicker, E, Fastl, H, 1990] it is clear that a narrow band of noise masks a tone much more effectively than a tone masking it. In fact, the masking effects of tone and noise of equal intensity vary by 20dB. It is interesting to note that bandlimited noise with constant SPL and varying bandwidth flattens the masking function whereas increasing the SPL and keeping the bandwidth constant narrows the masking function [Hellman, R.P, Harvard University 02138]. In particular, a signal can be quantized using more or less bits according to its tonality properties, which emphasizes on the importance of accurately estimating tonality leading to improved bit allocation.

3

1.2.1 Common Tonality Classification Methods Tonality in most audio coders is generally evaluated by taking a short segment of audio samples (eg. 512 or 1024 samples) and making a spectral analysis, using for example a FFT (fast Fourier transform). For example power and phase evolution of each spectral component are examined, making it possible to infer the tonal behavior of the signal at different regions of the spectrum. An average tonality measure for the whole analyzed signal segment can be computed using the Spectral Flatness Measure (SFM). The SFM is defined as the ratio of the geometric mean of the power spectrum to the arithmetic mean of the power spectrum [Ferreira, A.J.S, 1995]. Once calculated, its values are converted to dB values with the reference set to SFMdBmax=-60dB as an estimate for tone-like signals. The SFMdB is finally converted to a tonality coefficient α whose values range from [0,1]. The lower values indicate a global noise-like behavior and the higher value a global tone-like behavior. This particular method is used in psychoacoustic model-2 MPEG-AUDIO standard to classify tonality. 1.2.2 Typical Psychoacoustic Model A typical psychoacoustic model consists of modeling a cochlea filter during its initial stages which, models the energy or phase information based on the ear. This is accomplished by applying the spreading function. A spreading function is a function that models the spreading of masking curves thus, modeling the energy excitation along the basilar membrane. This information is then passed to the tonality estimator which determines the relevant and irrelevant components of the signal and helps in the estimation of the masking threshold which eventually gives rise to the absolute threshold as shown in Figure 1.2

4

Cochlear Filter Modeling

Tonality Estimation

Threshold Estimation

Absolute Threshold

Figure 1.2: General block diagram of a psychoacoustic model Johnston, J.D

1.2.3 Psychoacoustic Model-1 The psychoacoustic model 1 in the MPEG standard performs an FFT to the input data which is windowed using a Hanning window of length 512 samples for layer I and 1024 for layer II and III. An overlap of N/16 is done between the adjacent frames, where N is the number of samples in that frame. After applying the FFT, the signal level for each spectral line k is calculated

Lk = 96dB + 10 log10 (4 / N 2 X [k ] 2 8 / 3) for k = 0,...N/2-1

(1.1)

where 1/N2 factor comes from the Parseval’s theorem and takes in account the positive frequencies components of the spectrum. The other factor of 2 deals with the scaling of the amplitude of the spectral components to 1 from ½, followed by the 8/3 factor which is due to the reduction in gain of the Hanning window [Bosi, M., Goldberg, R., 2003]. Once the spectral components are calculated the sound pressure level in each subband is calculated Lsb[m] corresponding to maximum amplitude FFT spectral line. The FFT spectral line is chosen in such a way that it corresponds to the maximum scale factor (scf).
Lsb [m] = max{Lk ,20 log10 ( scf max [m]32,768) − 10dB}

(1.2)

Having calculated the sound pressure level, we next compute the mask threshold in order to calculate the signal to mask ratio (SMR) which leads us to the tonality estimation

5

process where the model identifies peaks that have 7dB more energy than it neighboring spectral lines as it tonal components [MPEG Standard, ISO11172-3],

Lk − Lk + j ≥ 7dB j is the index that varies in central frequency.

(1.3)

This is based on the assumption that local maximum within a critical band represents a tonal component, as shown in Figure 1.3.

Figure 1.3: Tonal components identified in Model 1 Fabien A. P. Petitcolas University of Cambridge, England.

if Lk represents a tonal component then adjacent spectral components centered at k are added to define a tonal masker LT, the other components are summed to gives us the noise maskers LN. Based on this information the spread of the masking curves are defined by applying the spreading function [MPEG Standard, IS011172-3]. Having defined the tonal and non-tonal components, the number of maskers is reduced prior to computing the global masking threshold by eliminating maskers whose levels are below the level of the threshold in quiet. Also, maskers extremely close to

6

stronger maskers are eliminated. If two or more components are separated in frequency by less than 0.5 bark then only components with the highest power is retained [Johnston, J.D, Brandenburg, K, 1990, MPEG Standard, IS011172-3, Bosi, M., Goldberg, R., 2003].

Figure 1.4: Maskers Decimation in Model 1 Fabien A. P. Petitcolas University of Cambridge, England.

Based on this information the individual masking thresholds are calculated and summed along with the power of the threshold in quiet to give us the global masking threshold, which leads to the calculation of the SMRs in each sub-band. This is done by taking the difference between the maximum sound pressure level along with the minimum global masking level of that sub-band. A general block diagram of the whole process is shown in Figure 1.5

7

FFT Compute SPL in each sub-band Estimate Tonal and Non-Tonal Compare to Loudness Curve Masker Decimation Masking Thresholds Global Masking Threshold SMR Calculated

Figure 1.5: Block diagram of MPEG Psychoacoustic Model 1 MPEG Standard ISO11172-3

1.2.4 Psychoacoustic Model-2

The psychoacoustic model 2 in the MPEG standard also performs an FFT to the Hann windowed input block, of size 1024 for all layers, however for layers II and III the model computes two FFTs for each frame. This model uses the output of the FFT analysis to calculate the masking curves and its associated signal to mask ratio for the coder sub-bands [Bosi, M., Goldberg, R., 2003, MPEG Standard, ISO11172-3] For the SPL calculation the model groups frequency lines into “threshold calculation partitions” whose widths are roughly 1/3 of a critical band. For a sampling rate of 44.1 kHz one single masker SPL is derived by summing the energies in each partition [MPEG ISO11172-3, Annex D, Table 3-D.3b]. The total masking energy of the input audio frame is then calculated by convolving a spreading function with each of the maskers in the signal, which leads us to the tonality calculation.

8

The tonality index in the model revolves around the core concept of how predictable a signal is from the prior two frames [Brandenburg, K, Johnston, J.D 1990]. For each frame ‘m’ and for each frequency line ‘k’, the signal amplitude, Am[k], and phase, ϕm[k], are predicted by linear extrapolation from the prior values as follows: [Bosi, M., Goldberg, R., 2003, MPEG Standard, ISO11172-3] ′ Am [k ] = Am [k ] + {Am −1 [k ] − Am − 2 [k ]} ′ θ m [k ] = θ m −1 [k ] + {θ m −1 [k ] − θ m − 2 [k ]} (1.4)

′ ′ where Am [k ] and θ m [k ] represent the predicted values. These values are then mapped into an “unpredictability measure” defined as:
C m [k ] = ′ ′ ′ ′ {Am [k ] cos θ m [k ] − Am [k ] cos θ m [k ]}2 + {Am [k ] sin θ m [k ] − Am [k ] sin θ m [k ]}2 ′ Am [k ] + Am [k ]

(1.5)

where Cm [k ] is equal to zero when the current value is exactly predicted and its equal to one when the power of either the predicted or actual signal is dramatically higher than the previous frames [Bosi, M., Goldberg, R., 2003, MPEG Standard, ISO11172-3]. This unpredictability measure is then weighted with the energy in each partition, thus giving us the partitioned unpredictability measure, which is then convolved with the spreading function. The result of this convolution is normalized with the normalizing coefficient (normb) derived from the spreading function normb = 1 b max bb = 0

∑ sprdngf (bval

(1.6) bb , bvalb )

and then mapped onto a tonality index which is a function of the partition number whose values vary from zero to one. t bb = −0.299 − 0.43 log e (cbb ) (1.7)

9

The tonality index is then used calculate the masking down-shift ∆(z) in dB which dependents of the tonal characteristics determined by the tonality index tbb . This downshift value is different for tonal and non-tonal signals as shown in Figure 1.6. ∆tone masking noise = 14.5 + z dB [‘z’ is the bark value from 0-24] ∆noise masking tone = C dB [C varies from 3-6dB] (1.8)

Figure 1.6: Example of a predicted masking threshold for a masker Bosi, M., Goldberg, R., “Introduction to Digital Audio Coding and Standards”

Once the global masking threshold is calculated it is eventually used for calculating SMR values in each partition by comparing itself to the threshold in quiet and then taking its maximum. A general block diagram of the whole process is shown in Figure 1.7
FFT
Unpredictability Levels (Cw) Spread Unpredictability Level (Ctb)

Signal Levels (eb)

Tonality Indices (tbb)

Spread Signal Levels (ecb,,en,nbb,nbw)

Masking Levels (thrw) SMR per Sub-band

Figure 1.7: General block diagram of Model 2 Bosi, M., Goldberg, R., “Introduction to Digital Audio Coding and Standards”

10

In general, these two psychoacoustic models are very similar in their masking threshold calculations but vary in their tonality classification scheme. The basic problem that arises in the classification of tonality in current perceptual models is the analysis tool that is used to analyze the frequency content of the incoming audio segment data, also know as the Fourier transform. Its design necessitates the trade-off between time domain and frequency domain resolution. The more the frequency resolution the more spectral components are used resulting in a masking function that can be estimated with better accuracy. On the other hand, a higher spectral resolution yields lower time resolution. The solution to this problem would be to replace the tools used to analyze the audio segment data, with tool that has better flexibility in adapting to the signals coherent state. These requirements are met by a variant of the short-time Fourier transform also known as the Wavelet transform. The purpose of this thesis is to explore tonality estimation using wavelet packet analysis (wavelet transform in a tree structure) based on the coherence and energy distribution of the input audio segment, which could be eventually used in a psychoacoustic model to increase coding efficiency and quality. The first theory section is presented in Chapters 2 and 3 which develops relevant theory used in the later chapters. The second section, Chapters 4 and 5, discusses Fourier analysis and introduces Wavelet theory, which leads us to Chapters 6 and 7 that focuses on the tone-detector, tonality analyzer and experiments that compliment their performance.

11

Chapter 2: Signal Representation

To understand the basic concept of Fourier analysis we must understand how it is used to represent a signal. A periodic signal can oscillate with a time period T and frequency f. We proceed to our first step in the analysis of complex exponentials and sinusoidal signal and see how both of them are related to each other. Once this gap has been bridged we can further draw conclusions on how the Fourier series leads to the Fourier transform and also state certain interesting characteristics of the Fourier transform.
2.1 Periodic Signals

A signal can be classified as periodic or aperiodic. A periodic continuous-time signal x(t) has the property that there is a positive value of T for which [Oppenheim, A.V & Willsky, A.S, 1996]

x(t ) = x(t + T ) in other words if the signal were shifted to the left it would repeat itself with a fundamental period T as shown in Figure 2.1.

(2.1)

Figure 2.1: A continuous-time signal (Oppenheim, A.V & Willsky, A.S, Signal and Systems 2nd edition)

It is this property that exponentials share, specifically, x(t ) = e jω0t .This can be easily shown when we equate the above equation with

12

x(t + T ) = e jω0 ( t +T )

(2.2)

e jω0t = e jω0t e jω0T e jω0T = 1

(2.3)

Based on this result, we can conclude that a complex exponential is periodic for any value of T if ω= 0 and if ω ≠ 0, then it has fundamental period of To (the smallest positive value) equal to


ωo

similarly, the sinusoidal signal x(t ) = A cos(ω o t + φ ) is periodic with

a fundamental period of To =



ωo

, as shown in Figure 2.2

Figure 2.2: Continuous-time sinusoidal signal (Oppenheim, A.V & Willsky, A.S, Signal and Systems 2nd edition)

2.2 Fourier Series

Sinusoidal waves and complex exponentials are periodic signals with fundamental period To =


ωo

and fundamental frequency ω 0 =

2π . We can therefore extend our To

expression of complex exponential by associating the signal with a set of harmonic related exponentials.

13

φ k (t ) = e jkω t = e
0

jk (

2π )t T

, k = 0,±1, ±2,...

(2.4)

Each of these signals are multiples of the fundamental frequency ωo hence being periodic too. When k = +1 and k = -1 both signals have a fundamental frequency equal to ωo which are collectively referred to as the first harmonic components. Similarly, when k = +2 and k = -2 they refer to the second harmonic components. More generally, the components for k = +N and k = -N are referred to as the Nth harmonic components. A complex sinusoidal signal can be represented as a linear combination of harmonically related complex exponentials of the form:

x(t ) = ∑ C n e
−∞

+∞

jkω0 t

= ∑ Cn e
−∞

+∞

jk (

2π )t T

(2.5)

This representation is also known as the Fourier series representation. Note that the complex exponential can be written as sines and cosines x(t ) = e jwot = cos(ω 0 t ) + j sin(ω 0 t ) , therefore making the Fourier series: x(t ) =

n = −∞



+∞

An cos(ω 0 t ) + j ∑ Bn sin(ω 0 t ) n = −∞

+∞

(2.6)

Where An are the coefficients of the cosines and Bn the coefficients of the sines C n = An + jBn and C − n = An − jBn
2.3 Fourier Transform

Before we derive the Fourier transform from the Fourier series let’s understand what a transform is and why we need it. A transform is a mathematical operation that takes a function or sequence and maps it into another one. In our case the Fourier transform maps a time domain function or sequence in to the frequency domain. Transforms are useful; they may give us additional or hidden information about the 14

original function. Most of the time transform equations are easier to solve than the original equation. They may require less storage space and hence be used for data compression or reduction. Operations such as convolution are easier to apply on a transformed function, rather than the original function. Fourier said “An arbitrary function, continuous or with discontinuities, defined in a finite interval by an arbitrarily capricious graph can always be expressed as a sum of sinusoids”. This is seen in the Fourier series. It is by manipulating this series we can derive the Fourier transform.
2.3.1 Fourier-Transform Derivation

This can be shown by multiplying both sides eq. (2.5) with e − jnwot to obtain [Oppenheim, A.V & Willsky, A.S, 1996]

x(t )e

− jnω0t

=

k = −∞

∑C e k +∞

jkω0t

e − jnω0t

(2.7)

Integrating both sides from 0 to T =
T T +∞



ω0

, we have

∫ x(t )e
0

− jnω0t

dt = ∫

0 k = −∞

∑C e k jkω0t

e − jnω0t dt

Here, T is the fundamental period of x(t), and consequently, we are integrating over one period. Now interchanging the order of integration and summation yields:
T +∞ ⎡T ⎤ x(t )e − jnω0t dt = ∑ C k ⎢ ∫ e j ( k − n )ω0t dt ⎥ ∫ k = −∞ 0 ⎣0 ⎦

(2.8)

The evaluation of the bracketed integral is straightforward. Rewriting this integral using Euler’s formula, we get:

15

T

− jnω t ∫ x(t )e 0 dt = ∫ cos(k − n)ω0tdt + j ∫ sin(k − n)ω0tdt 0 0 0

T

T

(2.9)

Since the integral may be viewed as measuring the total area under the functions over the interval and we are integrating over an interval (of length T), we see that for k≠n, both the integrals on the right-hand of eq. (2.9) are zero. For k=n, the integrand on the lefthand side of eq (2.9) equals 1, and thus, the integral equals T (the right-hand side). We therefore have:
T

∫e
0

j ( k − n )ω0 t

⎧T , k = n dt = ⎨ ⎩0, k ≠ n

and consequently, the right-hand side of eq (2.8) reduces to Cn giving:

1 C n = ∫ x(t )e − jnω0t dt T0

T

(2.10)

Note, that this equation looks very similar to the Fourier Transform:

F (ω ) =

+∞

−∞

∫ f (t )e

− jωt

dt

(2.11)

Here, we have written an equivalent expression for the Fourier series in terms of the fundamental frequency ωo and the fundamental period T. Equation (2.5) is referred to as the synthesis equation and eq. (2.12) as the analysis equation. The set of coefficients of
{Cn} are often called the Fourier series coefficients or the spectral

coefficients of x(t).

These complex coefficients measure the portion of the signal x(t) that is at each harmonic of the fundamental component. It’s interesting to note that when n=0 then eq. (2.9) becomes:

1 C0 = ∫ x(t )dt T0

T

(2.12)

16

This is a simple average value of x(t) over one period.
2.3.2 Dirac Delta Function

One of the best ways to understand Fourier analysis is to analyze a square pulse train. A square pulse can be viewed as a magnified version of a Dirac delta function δ(t) which is defined in the continuous domain. The equivalent version of the Dirac delta in the discrete domain is known as the Unit Step or Kronecker Delta, as shown in Figure 2.3

δ [ n] = ⎨

⎧0, n ≠ 0 ⎩1, n = 0

(2.13)

Figure 2.3: Discrete-time unit impulse (sample) (Oppenheim, A.V & Willsky, A.S, Signal and Systems 2nd edition)

The Dirac delta function δ(t) is zero for t ≠ zero, but is infinite at t = 0 in such a way that its integral is unity. This function is one that is infinitesimally narrow, infinitely tall, yet integrates to unity. Perhaps the simplest way to visualize this is as a rectangular pulse from a −

ε
2

to a +

ε
2

with a height of

1

ε

as shown in Figure 2.4. As we take the limit of

this, lim ε → 0 we see that the width tends to zero and the height tends to infinity as the ε →0

total area remains constant at one as shown in Figure 2.5. The impulse function is often written as δ(t) [Selik, M. & Baraniuk, R]. 1 = ∫ δ (t)dt
−∞ +∞

(2.14)

17

1/ ε

−ε / 2

ε /2

Figure 2.4: The Dirac Delta Function (Selik, M. & Baraniuk, R., The Impulse Function, Connexions)

Since it is quite difficult to draw something that is infinitely tall, we represent the Dirac with an arrow centered at the point it is applied.
2.3.3 Fourier Coefficients

The relationship between time and frequency representation are mutual. A sharp spike in the time domain, represented by a unit dirac delta function, is represented as a superposition of all frequencies with equal amplitudes in the frequency domain and vice versa in the time domain [Calvert, J.B]. This is shown in Figures 2.5, 2.6 below

Figure 2.5: Dirac Delta in Time-Domain (Calvert, J.B., Time and Frequency)

Figure 2.6: Dirac Delta in Frequency-Domain (Calvert, J.B., Time and Frequency)

The use of complex exponential in the Fourier transform is very convenient, since complex coefficients generated by it can be expressed using magnitude and phase. As 18

mentioned earlier, analyzing the square pulse in the frequency domain yields more insight into this relationship. When we have a signal of certain duration, such as a rectangular pulse, the frequency representation is no longer like that of the dirac delta. Interestingly, the frequency response of a rectangular pulse is a sinc function whose central lobe’s width is inversely proportional to the width of the rectangular pulse. This is seen in Figure 2.7
⎧1 ⎪ sin c = ⎨ sin x ⎪ x ⎩ for x = 0 otherwise, (2.15)

Figure 2.7: Frequency Response of Rectangular Pulse (Calvert, J.B., Time and Frequency)

2.3.4 Derivation of Fourier Coefficients

To confirm the above relationship and see the mathematical beauty behind it, let’s consider a periodic square wave over one period as shown in Figure 2.8[Oppenheim, A.V & Willsky, A.S, 1996] ⎧1, t < T1 ⎪ x(t ) = ⎨ ⎪0, T1 < t < T / 2 ⎩

(2.16)

19

x(t)

-T0

-T0/2

T0/2

T0

t

Figure 2.8: Periodic Square Wave (Oppenheim, A.V & Willsky, A.S, Signal and Systems 2nd edition)

This signal is periodic with fundamental period T and fundamental frequency ω 0 =

2π . T

Due to its periodic nature, let’s analyze the pulse centered at t=0, where –T/2 ≤ t < T/2. It is this interval over which the integration is performed. Using these limits of integration we have n=0 and therefore eq. (2.10) becomes

2T 1 1 C 0 = ∫ dt = 1 T −T1 T

T

(2.17)

As mentioned earlier C0 is interpreted as a dc or constant component, which in this case equals the fraction of each rectangular pulse during which x(t)=1. For n ≠ 0, eq. (2.10) becomes:

1 1 − jω 0 t 1 Ck = ∫ e dt = − e − jkw0t T −T1 jkω 0T
This can be rewritten as

T

T1 −T1

Ck =

⎡ e jω0T1 − e − jω0T1 ⎤ ⎢ ⎥ Kω 0 T ⎣ 2j ⎦ 2

(2.18)

Noting that the term in the brackets is sin(kωoT1), we can therefore express the Fourier coefficients as:

20

Ck =

2 sin( kω 0T1 ) sin( kω 0T1 ) = , where ω0T = 2π kω 0T kπ

(2.19)

In Figure 2.9 the coefficients are plotted for a fixed T1 and several values of T. Although our time domain signals are real, the frequency domain representations may be complex (Ck coefficients). For this specific example, the Fourier coefficients are real and consequently, they can be depicted graphically with only a single graph. So, as we change the interval length of the square wave T, we also in turn change the width of the rectangular pulse width which affects the width of the center lobe of the sinc function. The narrower the width of the rectangular pulse the wider the width of the center lobe of the sinc function becomes, since area under the region has to be conserved.

Figure 2.9: Fourier Series Coefficients for a Periodic Square Wave: (a) T0=4T1; (b) T0=8T1; (c) T0=16T1 (Oppenheim, A.V & Willsky, A.S, Signal and Systems 2nd edition)

21

2.4 Hilbert Transform

The complex exponential is a vital component of the Fourier and the short time Fourier transform. It acts as a kernel and extracts phase and magnitude information from the analyzed signal, therefore it is important to get a good perspective of the exponential term of the Fourier transform eq. (2.11) and how it is related to the phase. The ability of the complex exponential to act as a modulator and frequency shifter helps to understand the filter-bank structure of the short-time Fourier transform (STFT). In eq. (2.11) we have:
F (ω ) =
+∞

−∞

∫ f (t )e

− jωt

dt where:

e − jωt = cos(ωt ) + j sin(ωt )

(2.20)

This eq. (2.20) can be interpreted as an analytic signal of a cosine.
2.4.1 Analytic Signal

An analytic signal is a complex signal created by taking a signal and then adding in quadrature its Hilbert Transform. It is also called the pre-envelope of the real signal [Langton, C, Signal Processing & Simulation Newsletter]. It can be defined as: g + (t ) = g (t ) + j g (t )


(2.21)

Substituting cosωt for g(t) in eq. (2.21) we get: g + (t ) = cos(ωt ) + j sin(ωt ) = e jωt Before we go any further let’s understand the Hilbert transform and how it is related to the analytic equation.

22

2.4.2 Hilbert Transform Theory

The Hilbert transform is related to the Fourier series, which is a representation of a signal as a summation of sines and cosines eq. (2.2.3). By analyzing the building blocks of the Fourier series we can understand the Hilbert transform. In general, the Hilbert transform acts as a filter that changes the phase of the spectral components depending on the sign of their frequency. It only effects the phase of the signal and has no effect

on the amplitude [Langton, C, Signal Processing & Simulation Newsletter]. Let’s take a look at how we have come to this conclusion. Recall that the Fourier series can be written as: x(t ) =

n = −∞

∑A

+∞

n

cos(ω 0 t ) + j ∑ Bn sin(ω 0 t ) n = −∞

+∞

(2.22)

where:

Cn = An + jBn and C− n = An − jBn

(2.23)

An and Bn are the spectral coefficients of cosine and sine waves. The phase of the signal

is calculated by

ϕ = tan −1

Bn An

(2.24)

Cosine waves are 90° out of phase compared to sine waves and vise versa. So if a wave is strictly in terms of cosines then the Bn component of eq. (2.6) is zero, therefore the phase of the signal is zero. One way to look at the phase is the angle between real and imaginary axis, which implies the spectral components of the signal to lie on the real axis as shown in Figure 2.10

23

v(t) t

A/2

Q-

[V]

[V]

Q+
A/2

A/2

A/2

f

Real

-f

+f

Frequency

Spectral Amplitude

Magnitude Spectrum

Figure 2.10: Cosine Wave Properties (Langton, C., Hilbert Transform, Analytic Signal and the Complex Envelop, Signal Processing & Simulation Newsletter)

Similarly, the sine terms have its An component of eq. (2.6) as zero, therefore the phase of the signal is 90°. In other words the phase of the sine terms is not symmetric where it has a +90 for positive frequencies and -90 for negative frequencies. This symmetrical concept is clearly presented by the variable Q in Figure 2.11
[V] v( t ) t [V]
A/2 A/2

QA/2 A/2

f

Q+
Real

-f

+f

Frequency

Magnitude Spectrum Spectral Amplitude

Figure 2.11: Sine Wave Properties (Langton, C., Hilbert Transform, Analytic Signal and the Complex Envelop, Signal Processing & Simulation Newsletter)

2.4.3 Phase Rotation

In the above section we described important characteristics of sin and cosine terms in the spectral domain. The term Q is directly related to the phase of the signal. So, if we were to turn the cosine into a sine, we need to rotate the negative frequency (-Q) component of the cosine by +90° and the positive frequency component (+Q) by -90° In

24

other words we need to multiply the –Q component by j and the +Q component by –j as shown in Figure 2.12 [Langton, C, Signal Processing & Simulation Newsletter]
[V]

+90°

Q-

A/2

-90°
Real

A/2

Q+

Figure 2.12: Rotating Phasors to create a sine wave out of a cosine (Langton, C., Hilbert Transform, Analytic Signal and the Complex Envelop, Signal Processing & Simulation Newsletter)

Therefore for any signal g(t) its Hilbert Transform is:
G( f ) =
^

− j for f > 0 j for f < 0

(2.25)

(The hat over G(f) is a typical way of representing a time domain signal as a Hilbert Transform) For example, applying the Hilbert transform on a cosine term gives us a sine term. Applying it again gives us a negative cosine term and further application gives us a negative sine term and then at last our original cosine. cos ωt → sin ωt → − cos ωt → − sin ωt → cos ωt

For this reason Hilbert transform is also called a “quadrature filter” [Langton, C, Signal Processing & Simulation Newsletter]. As seen in the following Figure 2.13

25

[φ]

+90°

-90°

Real

Figure 2.13: Hilbert Transform shifts the phase of positive frequencies by -90° and negative frequencies by +90° (Langton, C., Hilbert Transform, Analytic Signal and the Complex Envelop, Signal Processing & Simulation Newsletter)

2.4.3 Complex Envelope

Based on our knowledge of the analytic signal and the Hilbert transform we can now analyze the complex exponential. The analytic signal of a cosine, knowing that its Hilbert transform is a sine, is given by: g + (t ) =cos(ωt ) + j sin(ωt ) =e jωt (2.26)

We know that the spectral components of a cosine term lie on the real axis and the spectral components of a sine term are asymmetrical in nature and lie on the imaginary axis (sec. 2.4.2). It is interesting to note that the analytic signal of a cosine (complex exponential) has its spectral components all in the positive domain of the real axis even thought it consists of both cosine and sine terms, as shown in Figure 2.14

v(t) s q r t (2) *

[V]

s q r t (2 ) *

t

f

Real

.

Figure 2.14: Spectral Properties of the Complex Exponential (Langton, C., Hilbert Transform, Analytic Signal and the Complex Envelop, Signal Processing & Simulation Newsletter)

26

We now can define the complex envelope as: ~ g + ( t ) = g ( t ) e j 2 πf c t
~

(2.27)

where, g (t ) is the complex envelope of the signal g (t ) . Rewriting this equation (eq. 2.28) and taking its Fourier transform (eq. 2.29) reveals the complex envelope is just a frequency shifted version of the analytic signal [Langton, C, Signal Processing & Simulation Newsletter]: ~ g (t )= g + (t )e − j 2πf ct (2.28)

⎧2G( f − f c ) for f > 0 ~ ⎪ G ( f ) = ⎨ G( 0) for f = 0 ⎪ 0 for f < 0 ⎩ It is this feature that is used in linear system theory, where e-jωt acts as a

(2.29)

modulator. We will see its application in the coming sections of the short-time Fourier transform. One might ask why do we need this representation, is there an advantage? Here is an example which shows the advantages of complex envelopes.
2.4.4 Advantages of the Complex Envelope

To illustrate the advantages of the complex envelope let us consider an example where the signal s(t) is a base-band signal: s (t ) = 4 cos 2t − 6 sin 3t (Note: for simplification purposes the 2π factor is omitted) with phase and magnitude properties are shown in Figure 2.15. (2.30)

27

3 2 2 3

-3 -2

Figure 2.15: Spectral Properties of the s(t) (Langton, C., Hilbert Transform, Analytic Signal and the Complex Envelop, Signal Processing & Simulation Newsletter)

Now lets multiply the following signal with cos(100t) to modulate and make it a bandpass signal to give us: g (t ) = s (t ) cos100t g (t ) = 4 cos 2t cos100t − 6 sin 3t cos100t
^ ^

(2.31)

It is important to note that the envelope of the modulated signal is the information signal. In Figure 2.16 the solid line represents this information signal.

Figure 2.16: The Modulated Signal and its Envelope (Langton, C., Hilbert Transform, Analytic Signal and the Complex Envelop, Signal Processing & Simulation Newsletter)

After simplifying the signal using trigonometric identities we will take the Hilbert Transform of g (t ) and create its analytic signal. The steps are as follows: The trigonometric identities used to simplify eq. (2.31) are: sin A cos B = sin( A + B ) + sin( A − B ) 2 cos( A + B ) + cos( A − B ) 2 28
^

cos A cos B =

Using these identities we get: g (t ) = 2 cos(2 + 100)t + 2 cos(2 − 100)t − 3 sin(3 + 100)t − 3 sin(3 − 100)t Applying the Hilbert Transform to each term gives us: g ( t ) = 2 sin( 2 + 100 )t + 2 sin( 2 − 100)t + 3 cos(3 + 100 )t + 3 cos(3 − 100)t
^

^

(2.32)

(2.33)

Now we will create an analytic signal by adding the original signal eq. (2.32) with its Hilbert Transform eq. (2.33):
) g + (t ) = g (t ) + jg (t ) g + (t ) = 2 cos(2 + 100)t + 2 cos(2 − 100)t − 3 sin(3 + 100)t − 3 sin(3 − 100)t + j (2 sin( 2 + 100)t + 2 sin( 2 − 100)t + 3 cos(3 + 100)t + 3 cos(3 − 100)t )

(2.34)

Rearranging eq. (2.33) using the Euler representation gives us: g + (t ) = (4 cos 2t − 6 sin 3t )e j 100 t

(2.35)

It is interesting to see that eq. (2.4.12) and eq. (2.4.7) are similar except for the modulator (ejωt). On analyzing the complex envelope s(t) and the analytic signal eq. (2.35) in the frequency domain it becomes clear why this representation is advantageous when the analytic signal is viewed as a pass-band signal. Now taking the Fourier Transform of eq. (2.35) and (2.30), It is clear that the analytic signal based on the sampling theorem needs a higher bandwidth than the complex envelope. So, with this method of coding it is easier to separate the information signal s(t) from the carrier. This concept is used in time to frequency transforms when STFT is represented as a bank of filters or pass-bands.

29

6 4 4

6

2 3 Frequency Frequency

102 103

Spectrum of the Complex Envelope s(t)

Spectrum of the Analytic Signal

Figure 2.17: Frequency Domain Representation of Complex Envelope and Analytic Signal (Langton, C., Hilbert Transform, Analytic Signal and the Complex Envelop, Signal Processing & Simulation Newsletter)

2.5 Summary

In this chapter we have seen the basic building blocks that are required to represent a signal. We have seen that a complex function can be represented through its simple building blocks, also known as the basis function. Using these building blocks we form a compressed representation of a complex function. One generalized view of a complex function is of the form: Complex Function = ∑ (weight )i • (Simple Function )i i In our case sinusoids (complex exponential) are the building blocks of the Fourier transform, where for each frequency of the complex exponential the sinusoids at that frequency are compared to the signal. Based on the analysis the frequency correlation are determined. The spectral coefficients are high if the correlation is high and vice versa. Along with this we have also seen how the complex exponential term acts as a modulator (frequency shifter) and how it affects the signal as a complex envelope. This gives us insight into how the Fourier transform can be viewed as linear system. Taking this knowledge we shall proceed to the next chapter where we will discuss the short-time Fourier transform and the conditions that gave rise to such a concept.

30

Chapter 3: Time to Frequency Mapping 3.1 Quadrature Mirror Filter (QMF)

The QMF filter-bank gained its popularity with the introduction of decimators and expanders in its structure. The system was introduced in the mid seventies [Croisier, et al., 1976] and has been since studied by researchers. A simple QMF filterbank consists of two banks, typically low-pass and high-pass bandlimted to a total width of π . The input signal x(t) is then filtered by H0(z) and H1(z) which are further decimated by a factor of 2 (down-sampled by factor 2, where odd samples are removed and even ones are kept) resulting in v0 (n) and v1 (n) . These decimated signals are then sent through expanders (up-sampled by the same factor as decimated, in our case 2) which are passed down to filters F0(z) and F1(z) whose purpose is to cancel all types of distortions.

x (n)

x0 ( n )

v0 ( n)
2 2

y0 ( n )
F0 ( z )

H 0 ( z) x1 ( n)

v1 (n)
2 2

y1 (n)

H1 ( z )
Analysis bank

F1 ( z )
Synthesis bank

x ( n)

^

Figure 3.1: QMF filter-bank

There are four types of distortions caused by the filterbank structures. They are aliasing, amplitude distortion, phase distortion and quantization effects. It was found that in the case of M channel filter banks, the conditions for alias cancellation and perfect reconstruction are much more complicated. This was the reason pseudo QMF techniques were introduced [Nussbaumer, 1981], as means of approximating alias cancellation. Vetterli and Vaidyanathan later showed that the use of polyphase components leads to considerable calculation simplification in filter-bank theory. A technique for the design of

31

M channel perfect reconstruction systems was developed [Vaidyanathan, 1987a,b], based on polyphase matrixes with the so-called paraunitary property. This same property also finds application in the theory of orthonormal wavelet transforms [Vaidyanathan, P.P,
1993].

3.2 Aliasing and Imaging

In theory, a perfect ideal filter is one that has high stop-band attenuation which has the least aliasing effects. Aliasing can be defined in a broad sense as frequency confusion cause by decimation of a signal. The decimation (down-sampling) process causes overlap between the adjacent sub-bands, as shown in Figure 3.2 H 0 (z)
H 1 (z)

overlap
0 ω

π
2

π

Figure 3.2: Aliasing

When substantial energy for a bandwidth exceeds the ideal pass-band region, aliasing has greater effect on the integrity of the signal. In principle it is possible to choose filters that do not overlap, but this causes severe attenuation in the region of no overlap. Boosting frequencies in that region will result in severe amplification of noise (coding noise, channel noise, filter roundoff noise). A solution to this problem might be increasing the filter order but this can be expensive computationally. The overlapping response is therefore more practical. Even though this causes aliasing, the effect can be cancelled by carefully designing the synthesis filters [Vaidyanathan, P.P, 1993]. Let’s examine Figure 3.1 in the z-domain to get a better understanding of the process.

The input signal x(n) can be expressed as: 32

X k ( z) = H k ( z) X ( z)

(3.1)

where k = 0,1. The z-transform of the decimated signal v k (n) can be expressed as:
Vk ( z ) =
1 1 − ⎤ 1⎡ X k ( z 2 ) + X k ( z 2 )⎥ ⎢ 2⎣ ⎦

(3.2)

for k = 0,1. Upsampling the decimated signal yields: 1 Yk ( z ) = Vk ( z 2 ) = [ X k ( z ) + X k (− z )] 2 The reconstructed signal is (3.3)

X ( z ) = F0 ( z )Y0 ( z ) + F1 ( z )Y1 ( z ) Substituting eq. (3.2.3) in eq. (3.2.4) we obtain: X ( z) =
^

^

(3.4)

1 [F0 ( z ) H 0 ( z ) + F1 ( z ) H 1 ( z )]X ( z ) + 1 [F0 ( z ) H 0 (− z ) + F1 ( z ) H 1 (− z )]X (− z ) 2 2

In matrix form
^ ⎡ H 0 ( z ) H 1 ( z ) ⎤ ⎡ F0 ( z )⎤ 2 X ( z ) = [ X ( z ) X (− z )]⎢ − z ) H (− ⎥ ⎢ F z ) ⎥ ⎣ H 0 (4244z )⎦ ⎣4 ( 4⎦ 14 4 1 4 11 3 3 2

(3.5)

H ( z)

f ( z)

Here the matrix H(z) is know as the alias component matrix. It can be noted that X (− z ) = X (e j (ω −π ) ) takes into account aliasing due to decimation and imaging due to expanders. So, it is clear that we can cancel aliasing by choosing the filters such that the quantity H 0 (− z ) F0 ( z ) + H 1 (− z ) F1 ( z ) is zero. Implying that F0 ( z ) = H 1 (− z ) , F1 ( z ) = − H 0 (− z ) in order to satisfy the above condition.

33

3.3 Distortion Transfer Function

Apart from aliasing, amplitude distortion and phase distortion are related to the distortion transfer function T(z). The distortion transfer function or ‘overall’ transfer function is defined as: T ( z) = 1 [H 0 ( z ) F0 ( z ) + H 1 ( z ) F1 ( z )] 2 (3.6)

and is related to the input signal: X ( z) = T ( z) X ( z) Expressing T(z) in terms of magnitude and phase: T ( e jω ) = T ( e jω ) e jφ ( ω ) we can represent eq. (3.7): X (e jω ) = T (e jω ) e jφ (ω ) X (e jω )
^ ^

(3.7)

(3.8)

(3.9)

If T (e jω ) ≠ 0 then we have amplitude distortion. Similarly if T(z) does not have linear

phase, X(z) suffers from phase distortion.
3.4 Polyphase Representation

Examining the matrix representation eq. (3.5) we can in principle cancel aliasing by solving for the synthesis filters from f ( z ) = H −1 ( z )t ( z ) , where t ( z ) = H ( z ) f ( z ) but this results in calculating f(z) explicitly as: f ( z) = AdjH ( z ) t ( z) det H ( z ) (3.10)

This is possible unless the determinant is not equal to zero. Also the zeros of the quantity det H(z) are related to the analysis filters Hk(z) in a very complicated manner, thus making it difficult to ensure if they are inside the unit circle which is necessary for
34

stability of Fk(z) [Vaidyanathan, P.P, 1993]. It is for this reason that the polyphase representation comes in handy.
3.4.1 Perfect Reconstruction

We now can express Hk in eq. (3.1) as an M channel filter-bank, where: H k ( z) =
M −1 l =0

∑z

−l

E kl ( z M )

(3.11)

Similarly, the synthesis filters Fk can also be expressed as: Fk ( z ) =
M −1 l =0

∑z

− ( M −1−l )

Rlk ( z M )

(3.12)

Examining eq. (3.11) and eq. (3.12) we can generate the matrix E(z) and R(z), which are related to each other as:

R( z ) E ( z ) = I

(3.13)

Our aim is to obtain the reconstructed signal unchanged, and this is only possible if each matrix nullifies the effect of the other. In other words the product of the two polyphase matrix equals an identity matrix. The condition still holds if we replace eq. (3.13) with: R ( z ) E ( z ) = cz − m0 I (3.14)

Since, the output of the filter-bank structure is just the delayed version of the input, we can make this kind of modification to the equation. A matrix representation of this case eq. (3.14) is:
I ⎤ ⎡0 R ( z ) E ( z ) = cz − m0 ⎢ −1 M − r ⎥ ⎣z I r 0 ⎦

The reconstructed signal is of form x(n) = cx(n − n0 ) , where n0 = Mm0 + r + M-1 for some integer ‘r’ with 0 ≤ r ≤ M-1 and m0.

^

35

3.5 Paraunitary Property

In the above section we expressed an M channel filter in terms of polyphase matrices E(z) and R(z). It should be noted that if these filters are FIR and the filter-bank has perfect reconstruction property, then the polyphase matrix E(z) has to satisfy the condition that the determinant of E(z) must be a delay [Vaidyanathan, P.P, 1993] where: det E ( z ) = αz − k , α ≠ 0, K= integer (3.15)

A causal transfer matrix H(z) is said to be lossless or paraunitary if (a) each entry Hkm(z) is stable and (b) H(ejω) is unitary. Before we examine the paraunitary property we must understand what a unitary matrix is.
3.5.1 Unitary Matrix
A complex matrix A is said to be unitary if A§A = I, for example:

A =

1 ⎡1 − i ⎤ ⎢ ⎥ 2 ⎣i − 1⎦ 1 ⎡− 1 i ⎤ ⎢ ⎥ 2 ⎣− i 1⎦

A* =

1 ⎡1 i ⎤ ⎢ ⎥ 2 ⎣− i − 1⎦

(3.16)

A‐1 =

A§ = A*T =

1 ⎡1 − i ⎤ ⎢ ⎥ 2 ⎣i − 1⎦

(3.17)

then A§ = A‐1 This property complements the paraunitary property which states that a matrix function H(z) is said to be paraunitary if it is unitary for all values of the parameter z H T ( z −1 ) H ( z ) = I , for all z ≠ 0

(3.18)

eq. (3.18) define a lossless system to be causal, stable and paraunitary.

36

3.6 Summary 3.6.1 Advantages of Paraunitary Filter Banks

The advantages of applying the paraunitary property to E(z) is that no matrix inversion is involved in the design. The synthesis filters are FIR and have the same length as the analysis filters, and can be obtained by time-reversal and conjugation of the analysis filter coefficients. If the paraunitary matrix E(z) is implemented as a cascade structure then the perfect reconstruction property still holds in spite of multiplier quantization. The cascade paraunitary structure also ensures that the computational complexity is low. Last but not the least the filter banks with paraunitary E(z) can be used to generate an orthonormal basis for wavelet transforms. The orthonormal basis property is discussed in Chapter 6 which leads to orthonormal basis tree structured filter banks, also know as wavelet packets.

37

Chapter 4: Short-Time Fourier Transform

In section 2.3.3 we observed the relationship between time and frequency representation. We observed that if an impulse in the time domain was viewed as a very narrow rectangular pulse then its frequency representation would be a sinc function whose central lobe is affected by the width of the impulse in the time domain. This in turn reflects on the localization property of the Fourier transform which rejects the notion of “frequency that varies with time.” According to Fourier analysis, a single frequency is always associated with infinite time duration, as shown in Figure 4.1. To deal with this time localization problem, the sampled signal can be windowed. The basic mechanics of the discrete Fourier transform is to multiply the analyzed signal with an impulse train of a certain sampling frequency, abiding by the sampling theorem. Assuming the sampled signal is harmonic over N samples we then window the digitized signal. Starting with the fundamental frequency we multiply the signal by a complex exponential and perform summation (calculating the area under the curve) of the result. Recall that the Fourier transform equation is the summation of f (t )e − j 2πft over an interval. This term can be interpreted as the block diagram, in Figure 4.1, where the cosine and sin multipliers are part of the complex exponential. f (t ) cos( 2πt )

cos(2πt )

f (t )

f (t ) sin( 2πt )

sin(2πt )

Figure 4.1: FFT Block Diagram

38

When the cosine and sine multipliers are multiplied, the area of the resultant

signal is considered. If the resulting area is zero then there is no correlation. Harmonic frequencies of sine and cosines are multiplied and Fourier coefficients are thus derived. Though this approach may seem trivial, the discrete Fourier transform has drawbacks based on it formulation. For it work flawlessly it must have a sample space ranging from negative infinity to positive infinity, which is not practical. So, in order to tackle this problem, a discrete set of samples was windowed and then applied to the Fourier transform. The window is then shifted in uniform amounts and the above computation is repeated. This is also known as the short-time Fourier transform.
4.1 Analysis of the STFT Equation

The short-time Fourier transform consist of three main components. The signal to be analyzed x(n), the window function v(n), and the Fourier transform kernel or the basis function e − jωn . First the signal is multiplied with the window signal, which is typically of a finite duration. After this is done the kernel is applied to the product x(n)v(n) to calculate the Fourier transform. The window is then shifted and the process is repeated again. X STFT (e jω , m) =

n = −∞

∑ x(n)v(n − m)e



− jωn

(4.1)

The function X STFT (e jω , m) has two variables ω and m. The frequency variable ω is

continuous and ranges from ‐π ≤ ω < π. The shift variable m is typically an integer multiple of some fixed integer. Essentially the window captures features of the signal around m and helps to localize time domain data.

39

4.2 STFT as a Bank of Filters

Since most signal processing is done using linear time-invariant signals, it is beneficial to explore this representation of STFT. Furthermore, this interpretation helps us to generalize the STFT to obtain more flexibility [Vaidyanathan, P.P, 1993]. In section 2.4.3 we discussed the complex envelope and showed how the complex exponential acts as a modulator that performs a frequency-shift. More specifically, it shifts the Fourier transform towards the left by f c . The STFT can be looked as a bank of band-pass filters. The basic block diagram of a single frequency channel as shown in Figure 4.2 x(n)

∑ s(n)
−∞



y (n) = X STFT (e jω0 , m)

v ( n − m ) e − jω 0 n

Figure 4.2: STFT Represented in terms of a Linear System

To gain further insight, of this let’s modify the eq. (4.1) by multiplying it with e − jωm : X STFT (e jω , m) = e − jωm

n = −∞

∑ x(n)v(n − m)e



− jω ( m − n )

(4.2)

This equation represents an LTI system as shown in Figure 4.3, where m represents the center of the STFT window. Although k is not mentioned in the above equation it is related to m in such a way that m is the integer multiple of k. So, if the window were to shift it would be from v(n), v(n-k), v(n-2k) and so on. In this example let k = 1, so the output is constant like a traditional Fourier transform. The impulse response of the LTI system is a band-pass filter of the form v(−n)e jω0 n whose frequency representation is V (e − j (ω −ω0 ) ) . The output sequence t0(n) is therefore a band-pass filter,

40

whose pass-band is centered around ω0. The modulator acts as a frequency shifter which re-centers the frequency response around zero [Vaidyanathan, P.P, 1993]. x(n) t 0 ( n)

v ( − n ) e jω 0 n e − jω0 n (modulator)

y0 ( n) = X STFT (e jω0 , n)

Figure 4.3: Rearranged STFT Representation in terms of a Linear System

Examining this in the frequency domain we see that the STFT reduces to a filterbank with M band-pass filters with response H k (e jω ) = V (e− j (ω −ωk ) ) , as shown in Figure 4.4. The pass-band of H k (e jω ) is centered around ωk, where k = 0, 1, 2 ..., M-1 ω0
H0
H1



H2
●●●

H M −1

H0

ω1

ω2

ωM −1

ω

Figure 4.4: STFT viewed as a Filter-Bank

4.3 Effects of Windowing

Unlike the traditional Fourier transform the STFT is uniquely defined based on the type of window chosen v(n). The choice of window governs the tradeoff between time localization and frequency resolution. It is interesting to see that as the window function gets wider the frequency information gets more localized and vice versa. Figure 4.5 shows how a small window of 512 samples has wider lobes compared to the larger window of 2048 samples. This confirms our previous statement that wider windows have better frequency resolution or better information localization in the frequency domain.

41

Figure 4.5: Fourier Transform of 512 (left) and 2048 (right) Samples

4.3.1 Choice of the Best Window

Earlier we know that a narrow window in the time domain leads to a broader frequency transform and vice versa. To make this concept more precise the rms (rootmean squared) duration of a signal was introduced [Gabor, 1946], [Papoullis 1977a]. The two non-negative quantities Dt and Df are defined as the duration energy in the time and frequency domain: D 2t = 1 2 2 t v (t )dt E −∫ ∞ 1 2 2 ∫∞Ω V ( jΩ) dΩ 2πE −
+∞ +∞

(4.3)

D2 f =

(4.4)

Where E is the window energy, that is E = ∫ v 2 (t )dt and v(t) is the signal in the time domain. Interesting enough, the rms duration of a triangular waveform is smaller than that of a rectangular one even though they both share the same duration. This is because of the t2 factor in the definition of D2t increases for non-zero values of v(t). It turns out based on the uncertainty principle the product of Dt D f cannot be arbitrarily

42

small. Dt D f ≥ 0.5 if and only if v(t ) = Ae −αt , α > 0 . Therefore, the optimal window would take the form of a Gaussian waveform, with its ideal length being infinity [Vaidyanathan, P.P, 1993].
4.4 Summary

2

In this chapter we have seen the transition of the traditional Fourier transform to the short-time Fourier transform. We have come to realize that the windowing function is what uniquely defines the STFT, which underlines its weakness. The STFT is the result of the evolution of the Fourier transform in order to gain better flexibility in localizing time and frequency. In the next chapter we shall explore a new way of analyzing a time signal in order to gain better frequency resolution.

43

Chapter 5: The Wavelet Transform

The short-time Fourier transform is a convenient way to analyze the frequency information of a signal. However, we know that audio signals and most of the signals that exist in the real world are very dynamic in nature. Information can be hidden by means of modulation. To get a better understanding of the weakness of the STFT, let’s consider Figure 5.1 which shows two cases.

v(t )

t x(t )v(t )

t

(a)

(b)

Figure 5.1: (a) high-frequency signal, (b) low-frequency signal x(t) modulated by the windowed function v(t)

5.1 Weakness of the STFT

In the first case, x(t) is a high-frequency signal and v(t) is the window function. It is apparent from part (a) of the above figure that the window function captures many cycles of the input signal x(t) as compared to part (b). Thus, the accuracy of the estimated Fourier transform is poor at low frequencies, and improves as the frequency increases [Vaidyanathan, P.P , 1993]. To gain more information about the signal it would be appropriate to have a window whose width adjusts with the frequency of the input signal. An ideal filter bank structure would have narrow bandwidths (wider windows) at low frequencies and wider bandwidths (shorter windows) at high frequencies. Keeping this

44

concept in mind one can tackle this problem by replacing the window function v(t) with a function of both frequency and time, so that the time domain window gets wider (narrow bandwidth) as frequency decreases and vice versa. This way the window function captures the same amount of zero crossings of the input signal irrespective to the change in frequency. Furthermore, as the window gets wider, it is also desirable to have wider step sizes for moving the window. This also means that the decimation ratio also increases as you go higher in frequency.
5.2 STFT to Wavelets

When the STFT is viewed as a bank of filters it consists of band-pass filters of equal bandwidth, which are obtained by modulating component e − jωm . It is this that restricts the time resolution of the STFT. To overcome this, one must abandon this modulation scheme and replace it with a function of both frequency and time, Thus obtaining filters hk (t ) where a is greater than one and k is an integer: hk (t ) = a − k / 2 h(a − k t ) Here k plays the role of frequency, thus frequency scaling the response rather than frequency shifting it (STFT case). The scale factor a − k / 2 in eq. (5.2.1) is meant to ensure


(5.1)

that the energy

−∞

∫h

k

(t ) dt is independent of k. An equivalent representation of eq. (5.1)

2

in the frequency domain can be written as: H k ( jΩ) = a k / 2 H ( ja k Ω) (5.2)

We know the ear can be viewed as a set of non-linear band-pass filters whose frequency resolution decreases with the increase in frequency. Based on this analogy let’s assume H ( jΩ) as a band-pass filter with cutoff frequencies α and β. Since discrete systems are

45

efficiently described in powers of two let’s assume a = 2 and β = 2α. We can define the center frequency to be the geometric mean of the two cutoff edges, that is [Vaidyanathan, P.P, 1993]:
Ω k = 2 − k αβ = α 2 − k 2

(5.3)

5.2.1 Modification of the STFT

If we consider the continuous version of eq. (4.1) we get: X STFT ( jΩ k ,τ ) = e −Ω kτ ∫ x(t )hk (τ − t )dt here e − Ω kτ is the kernel, Ωk is the frequency (analog domain) and hk are the filters. Keeping this equation structure we substitute eq. (5.1) in eq. (5.4) and get: a − k / 2 e − jΩ k τ


(5.4)

−∞

∫ x(t )h(a

−k

(τ − t ))dt (5.5)

Now we know from our earlier discussion that the bandwidth of Hk(jΩ) gets smaller as k (frequency) increases. With this varying bandwidth the windowing (time domain) is also affected. As the window size varies one must account for the step sizes, so we replace the continuous variable τ with na k T , where n is an integer. This means that the step size for window movement is a k T and it increases with k. In other words the window movement increases as the center-frequency Ωk of the filter decreases. eq. (5.5) takes care of the first case of changing bandwidths. Removing the kernel and taking account of the step size ( na k T ) as frequency resolution (bandwidth) changes we can modify eq. (5.5) and get:


X DWT (k , n) =

−∞

∫ x(t )h (na T − t )dt k k

(5.6)

46

Note the above equation represents the convolution between x(t) and hk(t) evaluated at a discrete set of points nakT. In other words, the output of the convolution is sampled with spacing akT [Vaidyanathan, P.P, 1993]. To summarize the fundamental differences between the STFT and Wavelet transform one can look at the time-frequency plot. The STFT has uniform time and frequency spacing whereas the wavelet transform, the frequency spacing gets smaller at lower frequencies, and the corresponding time spacing get larger [Vaidyanathan, P.P, 1993], as shown in Figure 5.2 t (time) t (time)

4T 3T 2T T 3T 2T T

Ω1 2Ω1 ...

Ω (frequency)

Ω0/4

Ω0/2 ...

Ω (frequency)

(a)

(b)

Figure 5.2: Fundamental difference between the STFT (a) and the wavelet transform (b) Vaidyanathan, P.P, “Multirate Systems and Filter Banks”

The beauty of wavelets is that the wavelet transform is not explicitly implemented by a moving window because there is in reality no unique window, as seen in equation (5.6). The system is in essence a filter bank, and is somewhat analogous to the family of windows. Another general form of the wavelet transform is:
X CWT ( p, q ) = 1 p


−∞

∫ f⎜ ⎜ ⎝

⎛t −q⎞ ⎟dt p ⎟ ⎠

(5.7)

47

here p and q are real-valued continuous variables, where p = a k , q = a k Tn and f(t) = h(t). This is known as the continuous wavelet transform. The variable p can also be considered as a scaling function, where the scale factor of a wavelet is inversely related to the frequency, in other words the larger the scale of the wavelet the lower the frequency of the wavelet, the narrower the bandwidth and vice versa. The variable q can be looked upon as the translation parameter which is responsible for the shifting of the wavelet. It’s step size movement increases as the value of k increases. This can be seen in Figure 5.3

Figure 5.3: Amplitude, scale and translation plot of a continuous wavelet transform Robi, P., “The Story of Wavelets”, Rowan University ©

5.3 Inversion of the Wavelet Transform

The original signal x(t) is reconstructed from the wavelet coefficients. The reconstruction of XDWT depends on the filter h(t), and the parameters a and T which completely characterizes the transformation. Changing a will change the spacing of the

48

band-pass filters and the frequency resolution of filter-banks as described earlier in section 5.2. If the inverse transform exists it appears as: x(t ) = ∑∑ X DWT (k , n)ψ kn (t ) (inverse DWT) k n

(5.8)

5.4 Orthonormal Basis

A subset {v1,.....vk} of a vector space V, with the inner product is called orthonormal if = 0 when i ≠ j. That is when the vectors are mutually perpendicular. Moreover, they are all required to have length one: = 1. An orthonormal set must be linearly independent (linear combination of functions cannot be expressed equal to zero) so that it is a vector space basis for the space it spans. Such a basis is called a orthonormal basis [Eric W. Weisstein et al]. Of particular interest is the case where {ψ kn (t ) } is a set of orthonormal funtions, where the integral of the basis {ψ kn (t ) } and its conjugate is equal to unity.


−∞

∫ψ

*

kn

(t )ψ lm (t )dt = δ (k − l )δ (n − m)

(5.9)

applying the orthogonality property to eq. (5.6):


X DWT (k , n) = we conclude:

−∞

∫ x(t )ψ

*

kn

(t )dt

ψ kn (t ) = a − k / 2 h * (nT − a − k t )
= h * k (a k nT − t )

(5.10) (5.11)

But we have ψ (t ) = f (t ) so that, in the orthonormal case, f (t ) = h * (−t ) thus

49

f k (t ) = h * k (−t ) which is very similar to the perfect reconstruction paraunitary QMF banks.
5.5 Wavelet Packet Analysis

Wavelet Packets are smooth versions of Walsh functions [Coifman, R, Ronald & Wickerhauser, V, Mladen]. Walsh functions consist of trains of square pulses with -1 and +1 states, such that transitions may only occur at fixed intervals of a unit time step, the initial state is always +1. It is a generalization of the wavelet decomposition that offers a rich range of possibilities to analyze a signal. The wavelet packet analysis is a tree structured filter bank that splits the signal in two sub-bands, and after decimating, each sub-band is again split into two and decimated. The sub-bands are then recombined, two at a time, by use of two-channel synthesis banks. Each node in the tree structure represents a subspace of the original signal. Each subspace is the orthogonal direct sum (direct sum of two subspaces) of its two children nodes. The leaves of every connected subtree give an orthonormal basis. This procedure permits the segmentation of acoustic signals into those dyadic windows best adapted to the local frequency content [Coifman, R, Ronald & Wickerhauser, V, Mladen]. The low-pass sub-band of the decomposition is known as approximation and the high-pass sub-band of the decomposition is known as detail. For an n-level decomposition, there are n+1 possible ways to decompose the signal [Matlab documentation, wavelet packet analysis]. A three level wavelet decomposition tree is shown in Figure 5.4 where, the signal S can be reconstructed by adding the approximation and its previous details.

50

S = A1+D1 = A2+D2+ D1 = A3+D3+ D2+D1
S

A1

D1

A1

D2

A1

D3

Figure 5.4: 3-level Wavelet decomposition tree

5.5.1 Discrete Wavelet Transform

The information achieved from the continuous wavelet transform (CWT) is often redundant in nature. In fact CWT computed by computers is actually discretized versions of itself. X DWT (k , n) = 2 −k / 2 ∑ x(t )hk (na k T − t )dt ,
−∞ ∞

k, n are integers

(5.12)

An elegant way to represent the non-redundant data in the time-frequency plane would be to sample the plane on a dyadic (octave) grid. This representation is ideal since it maps the frequency spectrum on a logarithmic scale similar to that of the ear. The dyadic sampling of the time-frequency plane can be achieved by series of up/down sampling operations. This approach gives us a multi-resolution representation as seen in Figure 5.5

51

Magnitude x(t) H0 H1

Sample at nT Sample at 2nT Sample at 4nT

XDWT(0,n) XDWT(1,n) XDWT(2,n)

2

2

1 α β=2α

H2

α/4

α/2

Ω (frequency)

Wavelet Coefficients

Figure 5.5: (left) Frequency response obtained by scaling, (right) Filterbank representation of discrete wavelet transform

5.6 Wavelet Packet Tree Representation

The wavelet packet tree can be represented in many ways. Among them the most important to us are the Energy Representation, Index Representation, and the Depth Representation.
5.6.1 Energy Representation

The energy representation of the wavelet decomposition tree displays the energy of each node in the tree. The frequency response of the wavelet plays an important role on how the energy is distributed along the wavelet decomposition tree which is determined by the cut-off frequency of the wavelet. The pseudocode methodology of calculating the energy of the wavelet tree is as follows: 1) get all coefficients of the parent node 2) calculate the total energy of the parent node
ETotal = ∑ C n , where Cn are the wavelet coefficients
2 n

(5.13)

3) get terminal nodes 4) calculate the energy of the terminal nodes
ETer min alNodes = ∑ C n , where Cn are the wavelet coefficients of the terminal nodes
2 n

52

5) Represent the energy in terms of a percentage
ETer min alNode = 100 * ETer min alNode ETotal

Below is the energy representation of a 1kHz signal, analyzed using a Haar wavelet at depth level 3.

Figure 5.6: Depth Level-3 Energy Tree of 1kHz Signal

5.6.2 Index Representation

The index representation represents the index value of the each node in the tree. The values progress from left to right in an ascending order. It is important to note that the order of the index node does not change even if one of the nodes is not decomposed in to its children. Figure 5.7 is an index representation of a 1kHz signal with node 5 not decomposed.

53

Figure 5.7: Depth Level-3 Index Tree of 1kHz Signal

This type of representation is important in terms of coding the detection algorithm which relies on keeping track of the terminal nodes (7,8,9,10,5,13,14).
5.6.3 Filterbank Representation

One can also look at the wavelet tree decomposition as a tree of filter banks where each parent node is split in to its low-pass and high-pass. These nodes in turn can also act as parent nodes and so on. Figure 5.8 shows us the mapping of the wavelet tree in Figure 5.9
Magnitude

AA2 A1

DA2

AD2 D1

DD2

AAA3 DAA3 ADA3 DDA3 AAD3 DAD3 ADD3 DDD3

Frequency Figure 5.8: Filter-bank Representation of Depth Level-3 Wavelet Packet Decomposition Tree

54

S

A1

D1

AA2

DA2

AD2

DD2

AAA3

DAA3

ADA3

DDA3

AAD3

DAD3

ADD3

DDD3

Figure 5.9: Filter-bank Representation of Depth Level-3 Wavelet Packet Decomposition Tree

The discrete wavelet packet tree obtains it multi-resolution representation by sampling the time-frequency plane on a dyadic (octave) grid. This is done by down sampling by a factor of two in the analysis stage and up-sampling by a factor of two in the reconstruction stage. Figure 5.10 is a discrete wavelet packet tree in its analysis stage. Note that the amount of samples decreases as the depth of the wavelet tree increases, also the bandwidth of the filters decreases by half during each decomposition.
Length: 512 B: 0 ~ π

|H(jw)|

g[n]
Length: 256 B: π/2 ~ π Hz

h[n]
2

2 d1: Level 1 DWT Coeff.

a1

Length: 256 B: 0 ~ π/2 Hz

-π/2

π/2 |G(jw)|

w

g[n]
2

h[n] a2 2
Length: 128 B: 0 ~ π /4 Hz

Length: 128 B: π/4 ~ π/2 Hz



-π/2

π/2

π

w

d2: Level 2 DWT Coeff.
Length: 64 B: π/8 ~ π/4 Hz

g[n]
2

h[n]
2
Length: 64 B: 0 ~ π/8 Hz

d3: Level 3 DWT Coeff.

…a3…. Level 3 approximation
Coefficients

Figure 5.10: Discrete wavelet packet tree (analysis stage)

55

Chapter 6: Analysis and Results

To estimate tonality it is important to understand the masking phenomenon. In regards to masking we are more concerned with instantaneous masking or frequency masking. Data suggest that a pure tone and narrow-band noise of equal intensity and equal loudness have different masking ability. Narrow band noise in particular is complicated by the fact that a reduction in bandwidth is accompanied by a decrease in the rate of intensity fluctuations [Stevens, 1956]. Bos and de Boer [1966] point out that the slow rate of intensity fluctuations inherent in the structure of narrow-band noise increases its ability to mask. Young and Wenner also indicated that the 20-dB difference between the masking effect of a tone and a narrow critical band noise disappears when the pure tone is replaced by a tone that is frequency-modulated at a rate of 25 per Hz. More research is needed to see how frequency-modulated tones can be used as partial maskers [Hellman, R.P, Harvard University, Cambride, Massachusetts 02138]. In previous work [Johnston, J.D, 1988] of tonality estimation spectral flatness measure was used to interpolate between the masking threshold formulas [Hellman, R.P, Harvard University, Cambride, Massachusetts 02138] and [Scharf, B, 1970]. The problem arises with the notion of global tonality [Johnston, J.D, 1990]. Signals such as speech have “tonal” parts and “noisy” parts of considerable energy at high frequencies. The resultant unpredictability measure will not show the parts of the signal that are very tonal (due to the fixed block size of the transform). Tonality by definition, in terms of perceptual coding, estimates the amount of masking a signal can achieve based on its type (tonal or noisy). The Tonality Index on the other hand is a global value that characterizes a signal’s tonality based on its correlation information. The wavelet packet

56

analysis is suitable for this task because one can control the bandwidth of the frequency bands thus making it possible to detect transients (change in frequency, attack regions) more accurately. Our proposed model uses this analysis tool to estimate tonality and uses Fourier analysis to determine signal levels (SPL) and spread signal levels. A general block diagram is shown in Figure 6.1
Wavelet Packet Analysis
Detector

FFT Signal Levels (eb)

Node Reconstruction Tonality

Estimator
Node Frequency Mapping

Spread Signal Levels (ecb,,en,nbb,nbw)

Tonality Indices (tbb)
Masking Levels (thrw) SMR per Sub-band

Figure 6.1: General block diagram of the proposed model

6.1 Detection Scheme

The proposed detection scheme relies on the flow of energy, which is analyzed using the wavelet tree decomposition. Each audio frame is considered to have an energy value of 100 and is decomposed to the first level as shown in Figure 6.2. The energy ratios of the child nodes (low-end and high-end) are then calculated and compared to the

57

parent node, which are compared to a threshold ratio (in our case 1.0 ≤ ratio num_crit_band_samples(i) [blah,i]=max(bits); x(i) = x(i) + 1; bits(i) = bits(i) - 1; bitsleft=bitsleft-num_crit_band_samples(i); end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % P_ENCODE % %%%%%%%%%%%%%%%%%%%%%%%%%%%%% function [Quantized_Gain,quantized_words]=p_encode(x2,Fs,framelength,bit_alloc,scalebits) for i=1:floor(fftbark(framelength/2,framelength/2,Fs))+1 indices = find((floor(fftbark([1:framelength/2],framelength/2,Fs))+1)==i); Gain(i) = 2^(ceil(log2((max(abs(x2(indices(1):indices(end))+1e-10)))))); if Gain(i) < 1 Gain(i) = 1; end x2(indices(1):indices(end)) = x2(indices(1):indices(end)) / (Gain(i)+1e-10); Quantized_Gain(i) = log2(Gain(i)); end for i=1:length(x2) quantized_words(i) = midtread_quantizer(x2(i), max(bit_alloc(floor(fftbark(i,framelength/2,Fs))+1),0)+1e-10); % 03/20/03 end %%%%%%%%%%%%%%%%%%%%%%%%%%%%% % MIDTREAD_QUANTIZER % %%%%%%%%%%%%%%%%%%%%%%%%%%%%% function [ret_value] = midtread_quantizer(x,R) Q = 2 / (2^R - 1);

90

q = quant(x,Q); s = q1) & check_flag==0 clear gNodes HE_N=Pnode(HE_N); temp=HE_N; rcfs_nodes=[temp, rcfs_nodes]; if(HE_N==1)|(HE_N==2) rcfs_nodes=[rcfs_nodes, tn(path)]; end gNodes=rcfs_nodes(:); gNodes=gNodes'; count_gn=length(gNodes); end % end while-III % send the nodes for reconstruction [rcfs,count,gNodes]=reconsCoef(wpt,gNodes,count_gn,tn); % generate tonality index with frequency mapping

110

tIndex_f=tIndex_f+tin; % clear node list rcfs_nodes=[]; check_flag=0; else count=count_gn; container_diff_sl_rcfs10(frame_count,1) = 0; container_diff_sl_rcfs10(frame_count,2) = 0; diff(frame_count,:)=container_diff_sl_rcfs10(frame_count,:)*1000; disp('rcfs is empty') end end %end for

4) Node Reconstruction
% Vaibhav Chhabra % Thesis - Node Reconstruction Fucntion % Last Modified: 04/12/2005 % % This program gets the nodes from the wavelet analysis function and % reconstructs them using the inverse discrete wavelet transform. It then % sends the array of reconstructed nodes to the tonality estimator function [rcfs,count,gNodes]=reconsCoef(wpt,gNodes,count,tn) global length_block rcfs=zeros(count,length_block); % Initializing the array for i=1:length(gNodes) rcfs_str = ['rcfs',int2str(gNodes(i)),' = wprcoef(wpt, [gNodes(i)]);']; eval(rcfs_str); rcfs_store_str=['rcfs(',int2str(i) ',:) = rcfs',int2str(gNodes(i)),';']; eval(rcfs_store_str); end % check if gNodes is empty check=isempty(gNodes); if (check==0) [diff,count,gNodes]=ACFALG(rcfs,count,gNodes,tn); end

5) Tonality Estimation
% Vaibhav Chhabra % Thesis - Tonality Estimator Fucntion % Last Modified: 04/12/2005 % % This program get the reconstructed nodes and analyzes them based on the % auto-covariance peaks function [diff,count,gNodes] = ACFALG(rcfs,count,gNodes,tn) global frame_count frames diff f check_flag=0;

111

check=isempty(rcfs); if (check==1) & (check_flag==0) container_diff_sl_rcfs10(frame_count,1) = 0; container_diff_sl_rcfs10(frame_count,2) = 0; diff(frame_count,:)=container_diff_sl_rcfs10(frame_count,:)*1000; disp('rcfs is empty') check_flag=1; end if (check==0) & (check_flag==0) % Calculate Autocorrelation Function ACFrcfs=zeros(count,21); for acf_count=1:count ACFrcfs(acf_count,:) = autocorr(rcfs(acf_count,:)); end % Calculate Autocovariance varACF=zeros(count,41); for var_count=1:count varACF(var_count,:) = xcov(ACFrcfs(var_count,:)); end % Uncomment this if you want to plot the auto-covariance of the ACF % for p=1:count % % if p==1 % plot(varACF(p,:));hold on % end % % if p==2 % plot(varACF(p,:),'g') % end % % if p==3 % plot(varACF(p,:),'r') % end % % if p==4 % plot(varACF(p,:),'y') % end % % if p==5 % plot(varACF(p,:),'c') % end % % if p==6 % plot(varACF(p,:),'*k') % end % % if p==7 % plot(varACF(p,:),'om') % end % end % hold off;title('AutoCovariance of ACF');xlabel('0-21 lags of ACF');ylabel('AC')

112

% legend=sprintf('Legend: blue-rcfs10, green-rcfs20, red-rcfs30, yellow-rcfs40, cyan-rcfs50, black*-rcfs60, purpleo-rcfs70'); % disp(legend); % Calculate AC difference if (var_count==1) & check_flag==0 max_var_rcfs10=max(varACF(1,1:10)); diff_sl_rcfs10=abs(max_var_rcfs10); container_diff_sl_rcfs10(frame_count,1) = diff_sl_rcfs10; container_diff_sl_rcfs10(frame_count,2) = count; diff(frame_count,:)=container_diff_sl_rcfs10(frame_count,:)*1000; check_flag=1; disp('Almost all the energy is in the Low-end'); gNodes; else if (max(varACF(1,:)) > max(varACF(2,:))) & check_flag==0 diff_rcfs1020=abs((max(varACF(1,:))-max(varACF(2,:)))); disp('type I Analysis - Peak Difference of AC-ACF'); gNodes; container_diff_rcfs1020(frame_count,1) = diff_rcfs1020; container_diff_rcfs1020(frame_count,2) = count; diff(frame_count,:)=container_diff_rcfs1020(frame_count,:)*1000; check_flag=1; end if (var_count>=4 & check_flag==0) min_var_rcfs10=min(varACF(1,(1:10))); if(min_var_rcfs10 max(varACF(1,:))) & check_flag==0 max_var_rcfs10=max(varACF(1,(1:10))); min_var_rcfs10=min(varACF(1,(1:10))); diff_sl_rcfs10=(max_var_rcfs10-min_var_rcfs10); disp('type II Analysis - Side Lobe Peaks of AC-ACF-rcfs10'); gNodes; container_diff_sl_rcfs10(frame_count,1) = diff_sl_rcfs10; container_diff_sl_rcfs10(frame_count,2) = count; diff(frame_count,:)=container_diff_sl_rcfs10(frame_count,:)*1000; check_flag=1; end

113

if (var_count==4) & check_flag==0 if (max(varACF(4,:)) > max(varACF(3,:))) max_var_rcfs40=max(varACF(4,:)); min_var_rcfs30=min(varACF(3,:)); diff_sl_rcfs4030=(max_var_rcfs40-min_var_rcfs30); disp('type III Analysis - noise like characteristics varACF40 and varACF30 peaks compared'); gNodes; container_diff_sl_rcfs4030(frame_count,1) = diff_sl_rcfs4030; container_diff_sl_rcfs4030(frame_count,2) = count; diff(frame_count,:)=container_diff_sl_rcfs4030(frame_count,:)*1000; check_flag=1; end end end end % if for check==0 % send gNodes for frequency mapping fTable(gNodes,diff,f);

6) Frequency Mapping
% Vaibhav Chhabra % Thesis - Tonality Estimator Fucntion % Last Modified: 04/12/2005 % % This program get the last node "generated nodes array" and maps them on a % frequency axis. It then stores them in an array corresponding to its % frame value (frame_count) function tin=fTable(gNodes,diff,store_gNodes,f) global frame_count tin tin=zeros(1,22050); i=gNodes(end); % Stupid way of mapping to the frequency axis if (gNodes(end)==1) for j=1:11024 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count); end end end if (gNodes(end)==2) for j=11024:22049 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count); end

114

end end if (gNodes(end)==3) for j=1:5512 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count); end end end if (gNodes(end)==4) for j=5512:11024 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count); end end end if (gNodes(end)==5) for j=11024:16538 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count); end end end if (gNodes(end)==6) for j=16538:22049 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count); end end end if (gNodes(end)==7) for j=1:2756 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count); end end end if (gNodes(end)==8) for j=2756:5512 if (round(diff(frame_count))==0)

115

tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==9) for j=5512:8268 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==10) for j=8268:11024 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==11) for j=11024:13780 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==12) for j=13780:16536 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==13) for j=16536:19292 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==14) for j=19292:22049 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else

116

tin(j)=diff(frame_count);end end end if (gNodes(end)==15) for j=1:1378 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==16) for j=1378:2756 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==17) for j=2756:4134 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==18) for j=4134:5512 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==19) for j=5512:6890 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==20) for j=6890:8268 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end

117

end if (gNodes(end)==21) for j=8268:9646 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==22) for j=9646:11024 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==23) for j=11024:12402 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==24) for j=12402:13780 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==25) for j=13780:15158 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==26) for j=15158:16536 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end

118

if (gNodes(end)==27) for j=16536:17914 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==28) for j=17914:19292 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==29) for j=19292:20670 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==30) for j=20670:22049 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==31) for j=1:689 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==32) for j=689:1378 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==33) for j=1378:2067

119

if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==34) for j=2067:2756 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==35) for j=2756:3445 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==36) for j=3445:4134 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==37) for j=4134:4823 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==38) for j=4823:5512 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==39) for j=5512:6201 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count));

120

else tin(j)=diff(frame_count);end end end if (gNodes(end)==40) for j=6201:6890 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==41) for j=6890:7579 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==42) for j=7579:8268 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==43) for j=8268:8957 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==44) for j=8957:9646 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==45) for j=9646:10335 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end

121

end end if (gNodes(end)==46) for j=10335:11024 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==47) for j=11024:11713 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==48) for j=11713:12402 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==49) for j=12402:13091 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==50) for j=13091:13780 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==51) for j=13780:14469 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end

122

if (gNodes(end)==52) for j=14469:15158 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==53) for j=15158:15847 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==54) for j=15847:16536 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==55) for j=16536:17225 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==56) for j=17225:17914 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==57) for j=17914:18603 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==58)

123

for j=18603:19292 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==59) for j=19292:19981 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==60) for j=19981:20670 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==61) for j=20670:21359 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end if (gNodes(end)==62) for j=21359:22049 if (round(diff(frame_count))==0) tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count);end end end

124

Similar Documents

Free Essay

Dsp Textbook

...Digital Image Processing Second Edition Rafael C. Gonzalez University of Tennessee Richard E. Woods MedData Interactive Prentice Hall Upper Saddle River, New Jersey 07458 Library of Congress Cataloging-in-Pubblication Data Gonzalez, Rafael C. Digital Image Processing / Richard E. Woods p. cm. Includes bibliographical references ISBN 0-201-18075-8 1. Digital Imaging. 2. Digital Techniques. I. Title. TA1632.G66 621.3—dc21 2001 2001035846 CIP Vice-President and Editorial Director, ECS: Marcia J. Horton Publisher: Tom Robbins Associate Editor: Alice Dworkin Editorial Assistant: Jody McDonnell Vice President and Director of Production and Manufacturing, ESM: David W. Riccardi Executive Managing Editor: Vince O’Brien Managing Editor: David A. George Production Editor: Rose Kernan Composition: Prepare, Inc. Director of Creative Services: Paul Belfanti Creative Director: Carole Anson Art Director and Cover Designer: Heather Scott Art Editor: Greg Dulles Manufacturing Manager: Trudy Pisciotti Manufacturing Buyer: Lisa McDowell Senior Marketing Manager: Jennie Burger © 2002 by Prentice-Hall, Inc. Upper Saddle River, New Jersey 07458 All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher. The author and publisher of this book have used their best efforts in preparing this book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness...

Words: 66542 - Pages: 267