EEL5840: Machine Intelligence

Introduction to feedforward neural networks

1. Problem statement and historical context
A. Learning framework
Figure 1 below illustrates the basic framework that we will see in artificial neural network learning. We assume that we want to learn a classification task G with n inputs and m outputs, where,

$y = G(x)$,     (1)

$x = [x_1\ x_2\ \dots\ x_n]^T$ and $y = [y_1\ y_2\ \dots\ y_m]^T$.     (2)

In order to do this modeling, let us assume a model Γ with trainable parameter vector w, such that,

$z = \Gamma(x, w)$     (3)

where,

$z = [z_1\ z_2\ \dots\ z_m]^T$.     (4)

Now, we want to minimize the error between the desired outputs y and the model outputs z for all possible inputs x. That is, we want to find the parameter vector w* so that,

$E(w^*) \le E(w), \quad \forall w$,     (5)

where E(w) denotes the error between G and Γ for model parameter vector w. Ideally, E(w) is given by,

$E(w) = \int_x \| y - z \|^2 \, p(x) \, dx$     (6)

where p(x) denotes the probability density function over the input space x. Note that E(w) in equation (6) is dependent on w through z [see equation (3)]. Now, in general, we cannot compute equation (6) directly; therefore, we typically compute E(w) for a training data set of input/output data,

$\{ (x_i, y_i) \}, \quad i \in \{1, 2, \dots, p\}$,     (7)

where x_i is the n-dimensional input vector,

$x_i = [x_{i1}\ x_{i2}\ \dots\ x_{in}]^T$     (8)

corresponding to the ith training pattern, and y_i is the m-dimensional output vector,

$y_i = [y_{i1}\ y_{i2}\ \dots\ y_{im}]^T$     (9)

[Figure 1: the learning framework. Inputs x1, ..., xn feed both the unknown mapping G, which produces the desired outputs y1, ..., ym, and the trainable model Γ, which produces the model outputs z1, ..., zm.]

corresponding to the i th training pattern, i ∈ { 1, 2, …, p } . For (7), we can define the computable error function E ( w ) ,
$E(w) = \frac{1}{2} \sum_{i=1}^{p} \| y_i - z_i \|^2 = \frac{1}{2} \sum_{i=1}^{p} \sum_{j=1}^{m} (y_{ij} - z_{ij})^2$     (10)

where,

$z_i \equiv \Gamma(x_i, w)$.     (11)

If the data set is well distributed over possible inputs, equation (10) gives a good approximation of the error measure in (6).
As we shall see shortly, artificial neural networks are one type of parametric model Γ for which we can minimize the error measure in equation (10) over a given training data set. Simply put, artificial neural networks are nonlinear function approximators, with adjustable (i.e. trainable) parameters w , that allow us to model functional mappings, including classification tasks, between inputs and outputs.
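As a concrete illustration of the error measure in equation (10), the short sketch below (in Python, with NumPy) computes E(w) for an arbitrary parametric model over a small training set; the model gamma and the data values are hypothetical stand-ins, not part of the original notes.

    import numpy as np

    def error_measure(gamma, w, X, Y):
        """E(w) of equation (10): half the summed squared error between
        desired outputs Y and model outputs gamma(x, w)."""
        Z = np.array([gamma(x, w) for x in X])   # model outputs, one row per pattern
        return 0.5 * np.sum((Y - Z) ** 2)

    # Hypothetical example: a linear model z = W x on a 2-input, 1-output task.
    gamma = lambda x, w: w @ x
    X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])   # p = 3 input patterns
    Y = np.array([[1.0], [1.0], [2.0]])                  # desired outputs
    w = np.array([[0.9, 1.1]])                           # trial parameter vector
    print(error_measure(gamma, w, X, Y))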
B. Biological inspiration

So why are artificial neural networks called artificial neural networks? These models are referred to as neural networks because their structure and function are loosely based on biological neural networks, such as the human brain. Our brains consist of basic cells, called neurons, connected together in massive and parallel fashion. An individual neuron receives electrical signals from dendrites, connected from other neurons, and passes on electrical signals through the neuron's output, the axon, as depicted (crudely) in Figure 2 below.

[Figure 2: a biological neuron, with dendrites carrying incoming signals and the axon carrying the output signal.]

A neuron's transfer function can be roughly approximated by a threshold function, as illustrated in Figure 3 below. In other words, a neuron's axon fires if the net stimulus from all the incoming dendrites is above some threshold. Learning in our brain occurs through adjustment of the strength of connection between neurons (at the axon-dendrite junction). [Note, this description is a gross simplification of what really goes on in a brain; nevertheless, this brief summary is adequate for our purposes.]

[Figure 3: a neuron's output as a threshold function of the net stimulus from the dendrites.]


Now, artificial neural networks attempt to crudely emulate biological neural networks in the following important ways:
1. Simple basic units are the building blocks of artificial neural networks. It is important to note that artificial "neurons" are much, much simpler than their biological counterparts.
2. Individual units are connected massively and in parallel.
3. Individual units have threshold-type activation functions.
4. Learning in artificial neural networks occurs by adjusting the strength of connection between individual units. These parameters are known as the weights of the neural network.
We point out that artificial neural networks are much, much, much simpler than complex biological neural networks (like the human brain). According to the Encyclopedia Britannica, the average human brain consists of approximately 10^10 individual neurons with approximately 10^12 connections. Even very complicated artificial neural networks typically do not have more than 10^4 to 10^5 connections between, at most, 10^4 individual basic units.
As of September, 2001, an INSPEC database search generated over 45,000 hits with the keyword “neural network.” Considering that neural network research did not really take off until 1986, with the publication of the backpropagation training algorithm, we see that research in artificial neural networks has exploded over the past 15 years and is still quite active today. We will try to cover some of the highlights of that research. First, however, we will formalize our discussion above, clearly defining what a neural network is, and how we can train artificial neural networks to model input/output data; that is, how learning occurs in artificial neural networks.

2. What makes a neural network a neural network?
A. Basic building blocks of neural networks
Figure 4 below illustrates the basic building block of artificial neural networks; the unit’s basic function is intended to roughly approximate the behavior of biological neurons, although biological neurons tend to be orders-of-magnitude more complex than these artificial units.
In Figure 4,

$\tilde{\varphi} \equiv [\varphi_0\ \varphi_1\ \dots\ \varphi_q]^T$     (12)

represents a vector of scalar inputs to the unit, where the φ i variables are either neural network inputs x j , or the outputs from previous units, including the bias unit φ 0 , which is fixed at a constant value (typically 1).
Also,
$w \equiv [\omega_0\ \omega_1\ \dots\ \omega_q]^T$     (13)

represents the input weights of the unit, indicating the strength of connection from the unit inputs φi; as we shall see later, these are the trainable parameters of the neural network. Finally, γ represents the (typically nonlinear) activation function of the unit, and ψ represents the scalar output of the unit, where,

$\psi \equiv \gamma(w \cdot \tilde{\varphi}) = \gamma\left( \sum_{i=0}^{q} \omega_i \varphi_i \right)$     (14)

[Figure 4: a single unit. The inputs φ0 = 1, φ1, ..., φq are weighted by ω0, ω1, ..., ωq, summed, and passed through the activation function γ to produce the output ψ.]

Thus, a unit in an artificial neural network sums up its total input and passes that sum through some (in general) nonlinear activation function.
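As a minimal sketch of equation (14): prepend the bias input φ0 = 1, form the weighted sum, and apply the activation function. The names are illustrative only; the example call uses a simple threshold activation like that of Figure 3.

    import numpy as np

    def unit_output(weights, inputs, gamma):
        """psi = gamma(sum_i omega_i * phi_i), equation (14), with phi_0 = 1 as the bias input."""
        phi = np.concatenate(([1.0], inputs))   # prepend the bias input phi_0 = 1
        net = np.dot(weights, phi)              # net input to the unit
        return gamma(net)                       # activation function

    # Example with a threshold activation and theta = 0.
    step = lambda u: 1.0 if u >= 0.0 else 0.0
    print(unit_output(np.array([-0.5, 1.0, 1.0]), np.array([0.0, 1.0]), step))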
B. Perceptrons
A simple perceptron is the simplest possible neural network, consisting of only a single unit. As shown in Figure 6, the output unit's activation function is the threshold function,

$\gamma_t(u) = \begin{cases} 1 & u \ge \theta \\ 0 & u < \theta \end{cases}$     (15)

which we plot in Figure 5. The output z of the perceptron is thus given by,

$z = \begin{cases} 1 & w \cdot x \ge 0 \\ 0 & w \cdot x < 0 \end{cases}$     (16)

[Figure 5: the threshold function γt(u), equal to 0 for u < θ and 1 for u ≥ θ.]

where,

$x = [1\ x_1\ \dots\ x_n]^T$     (17)

and,

$w = [\omega_0\ \omega_1\ \dots\ \omega_n]^T$.     (18)

A perceptron like that pictured in Figure 6 is capable of learning a certain set of decision boundaries, specifically those that are linearly separable. The property of linear separability is best understood geometrically.
Consider the two, two-input Boolean functions depicted in Figure 7 — namely, the OR and the XOR functions (filled circles represent 0, while hollow circles represent 1). The OR function can be represented (and learned) by a two-input perceptron, because a straight line can completely separate the two classes. In other words, the two classes are linearly separable. On the other hand, the XOR function cannot be represented (or learned) by a two-input perceptron because a straight line cannot completely separate one class from the other. For three inputs and above, whether or not a Boolean function is representable by a simple perceptron depends on whether or not a plane (or a hyperplane) can completely separate the two classes.

[Figure 6: a simple perceptron with inputs x1, ..., xn, a bias unit fixed at 1, weights ω0, ω1, ..., ωn, and a single threshold output unit producing z.]

[Figure 7: the OR function (left) and the XOR function (right) plotted in the (x1, x2) unit square. For the OR function, the separating line shown corresponds to ω0 = -0.5, ω1 = 1, ω2 = 1; no straight line separates the XOR classes.]
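As a small sketch of this point, the unit below uses the OR weights shown in Figure 7 (ω0 = -0.5, ω1 = ω2 = 1); the XOR targets are listed in the final comment only to emphasize that no single set of perceptron weights can produce them.

    import numpy as np

    def perceptron(w, x):
        """Threshold unit of equation (16): output 1 if w . [1, x] >= 0, else 0."""
        return 1 if np.dot(w, np.concatenate(([1.0], x))) >= 0 else 0

    w_or = np.array([-0.5, 1.0, 1.0])   # weights from Figure 7
    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, perceptron(w_or, np.array(x, dtype=float)))
    # Prints 0, 1, 1, 1: the OR function. The XOR targets 0, 1, 1, 0 are not
    # linearly separable, so no choice of (w0, w1, w2) makes this unit produce them.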
The algorithm for learning a linearly separable Boolean function is known as the perceptron learning rule, which is guaranteed to converge for linearly separable functions. Since this training algorithm does not generalize to more complicated neural networks, discussed below, we refer the interested reader to [2] for further details.

C. Activation function
In biological neurons, the activation function can be roughly approximated as a threshold function [equation (15)], as in the case of the simple perceptron above. In artificial neural networks that are more complicated than simple perceptrons, we typically emulate this biological behavior through nonlinear functions that are similar to the threshold function, but are, at the same time, continuous and differentiable. [As we will see later, differentiability is an important and necessary property for training neural networks more complicated than simple perceptrons.] Thus, two common activation functions used in artificial neural networks are the sigmoid function,

$\gamma(u) = \frac{1}{1 + e^{-u}}$     (19)

or the hyperbolic tangent function,

$\gamma(u) = \frac{e^u - e^{-u}}{e^u + e^{-u}}$     (20)

These two functions are plotted in Figure 8 below. Note that the two functions closely resemble the threshold function in Figure 5 and differ from each other only in their respective output ranges; the sigmoid function's range is [0, 1], while the hyperbolic tangent function's range is [-1, 1]. In some cases, when a system output does not have a predefined range, its corresponding output unit may use a linear activation function,

[Figure 8: the hyperbolic tangent function (left) and the sigmoid function (right), plotted for u between -10 and 10.]

$\gamma(u) = u$     (21)

From Figure 8, the role of the bias unit φ 0 should now be a little clearer; its role is essentially equivalent to the threshold parameter θ in Figure 5, allowing the unit output ψ to be shifted along the horizontal axis.
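Because the derivatives of these activation functions are needed by the training algorithms discussed later, a short sketch of equations (19)-(21) and their derivatives is given below; the closed forms γ'(u) = γ(u)(1 - γ(u)) for the sigmoid and 1 - γ(u)^2 for the hyperbolic tangent are standard identities rather than results stated in these notes.

    import numpy as np

    def sigmoid(u):                 # equation (19), range [0, 1]
        return 1.0 / (1.0 + np.exp(-u))

    def sigmoid_prime(u):           # derivative, used later by backpropagation
        s = sigmoid(u)
        return s * (1.0 - s)

    def tanh_act(u):                # equation (20), range [-1, 1]
        return np.tanh(u)

    def tanh_prime(u):
        return 1.0 - np.tanh(u) ** 2

    def linear(u):                  # equation (21), for unbounded outputs
        return u

    def linear_prime(u):
        return np.ones_like(u)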
D. Neural network architectures
Figures 9 and 10 show typical arrangements of units in artificial neural networks. In both figures, all connections are feedforward and layered; such neural networks are commonly referred to as feedforward multilayer perceptrons (MLPs). Note that units that are not part of either the input or output layer of the neural network are referred to as hidden units, in part since their output activations cannot be directly observed from the outputs of the neural network. Note also that each unit in the neural network receives as input a connection from the bias unit.
The neural networks in Figures 9 and 10 are typical of many neural networks in use today in that they arrange the hidden units in layers, fully connected between consecutive layers. For example, ALVINN, a neural network that learned how to autonomously steer an automobile on real roads by mapping coarse camera images of the road ahead to corresponding steering directions [3], used a single-hidden-layer architecture to achieve its goal (see Figure 11 below).
MLPs are, however, not the only appropriate or allowable neural network architecture. For example, it is frequently advantageous to have direct input-output connections; such connections, which jump hidden-unit layers, are sometimes referred to as shortcut connections. Furthermore, hidden units do not necessarily have to be arranged in layers; later in the course, we will, for example, study the cascade learning architecture, an adaptive architecture that arranges hidden units in a particular, non-layered manner. We will say more about neural network architectures later within the context of specific, successful neural network applications.
Finally, we point out that there also exist neural networks that allow cyclic connections; that is, connections from any unit in the neural network to any other unit, including self-connections. These recurrent neural networks present additional challenges and will be studied later in the course; for now, however, we will confine our studies to feedforward (acyclic) neural networks only.
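As a minimal sketch of the layered, fully connected, feedforward structure described above (with a bias unit feeding every layer), the function below propagates an input vector through an arbitrary stack of weight matrices; the layout chosen here (one weight matrix per layer, bias weight in column 0) is an assumption made for illustration.

    import numpy as np

    def mlp_forward(weight_matrices, x, hidden_act, output_act):
        """Forward pass through a layered feedforward MLP.

        weight_matrices: list of arrays; layer l maps its (bias-augmented)
        input of size n_l + 1 to n_{l+1} outputs, so each array has shape
        (n_{l+1}, n_l + 1) with the bias weight in column 0."""
        h = x
        for l, W in enumerate(weight_matrices):
            h_aug = np.concatenate(([1.0], h))            # bias unit fixed at 1
            net = W @ h_aug                               # net inputs for this layer
            act = output_act if l == len(weight_matrices) - 1 else hidden_act
            h = act(net)
        return h

    # Hypothetical 1-4-1 network with sigmoidal hidden units and a linear output.
    rng = np.random.default_rng(0)
    weights = [rng.normal(scale=0.1, size=(4, 2)), rng.normal(scale=0.1, size=(1, 5))]
    sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))
    print(mlp_forward(weights, np.array([0.3]), sigmoid, lambda u: u))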
[Figure 9: a single-hidden-layer feedforward MLP. Signal flows forward from the input layer (x1, x2, ..., xn, plus a bias unit fixed at 1) through a hidden-unit layer (also fed by a bias unit) to the output layer (z1, z2, ..., zm).]

[Figure 10: a two-hidden-layer feedforward MLP. Signal flows forward from the input layer through hidden-unit layer #1 and hidden-unit layer #2 (each also fed by a bias unit) to the output layer (z1, z2, ..., zm).]

E. Simple example
Consider the simple, single-input, single-output neural network shown in Figure 12 below. Assuming sigmoidal hidden-unit and linear output-unit activation functions [equations (19) and (21), respectively], what values of the weights {ω1, ω2, ..., ω7} will approximate the function f(x) in Figure 12?
To answer this question, let us first express f(x) in terms of threshold activation functions [equation (15)]:

$f(x) = c\,[\gamma_t(x - a) - \gamma_t(x - b)]$     (22)

$f(x) = c\,\gamma_t(x - a) - c\,\gamma_t(x - b)$     (23)

Recognizing that the threshold function can be approximated arbitrarily well by a sigmoid function [equation (19)],

$\gamma_t(u) \to \gamma(ku)$ as $k \to \infty$     (24)

we can rewrite (23) in terms of sigmoidal activation functions,
$f(x) \approx c\,\gamma[k(x - a)] - c\,\gamma[k(x - b)]$ for large $k$.     (25)

[Figure 11: ALVINN, a neural network for autonomous steering. A 30x32 sensor input retina feeds 4 hidden units, which feed 30 output units spanning steering directions from sharp left through straight ahead to sharp right.]

[Figure 12: left, the single-input, single-output network with weights ω1, ..., ω7, two hidden units, a bias unit, and output z; right, the target function f(x), a pulse of height c between x = a and x = b.]

Now, let us write down an expression for z, the output of the neural network. From Figure 12,

$z = \omega_5 + \omega_6\,\gamma(\omega_1 + \omega_2 x) + \omega_7\,\gamma(\omega_3 + \omega_4 x)$     (26)

Comparing (25) and (26), we arrive at two possible sets of weight values for approximating f(x) with z:

weights:   ω1     ω2    ω3     ω4    ω5    ω6    ω7
set #1:   -kb     k    -ka     k     0    -c     c
set #2:   -ka     k    -kb     k     0     c    -c
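A quick numerical sanity check of weight set #1, under assumed values a = 0.3, b = 0.7, c = 2 and a moderately large k; this is only an illustration of equations (25) and (26), not part of the original notes.

    import numpy as np

    sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

    a, b, c, k = 0.3, 0.7, 2.0, 50.0                            # assumed pulse parameters
    w1, w2, w3, w4, w5, w6, w7 = -k*b, k, -k*a, k, 0.0, -c, c   # weight set #1

    z = lambda x: w5 + w6 * sigmoid(w1 + w2 * x) + w7 * sigmoid(w3 + w4 * x)   # eq. (26)
    f = lambda x: c * ((x >= a) & (x < b))                                     # pulse of Figure 12

    x = np.linspace(0.0, 1.0, 11)
    print(np.round(z(x), 3))   # approximately c inside (a, b), approximately 0 outside
    print(f(x))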

3. Some theoretical properties of neural networks
A. Single-input functions
From the example in Section 2(E), we can conclude that a single-hidden-layer neural network can model any single-input function arbitrarily well with a sufficient number of hidden units, since any one-dimensional function can be expressed as the sum of localized "bumps." It is important to note, however, that typically, a neural network does not actually approximate functions as the sum of localized bumps. Consider, for example, Figure 13. Here, we used a three-hidden-unit neural network to approximate a scaled sine wave. Note that even with only three hidden units, the maximum neural network error is less than 0.01.

[Figure 13: left, the scaled sine wave f(x) and its three-hidden-unit approximation over x from 0 to 1000; right, the corresponding neural network error, which remains smaller than 0.01 in magnitude.]
B. Multi-input functions
Now, does this universal function approximator property for single-hidden-layer neural networks hold for multi-dimensional functions? No, because the creation of localized peaks in multiple dimensions requires an additional hidden layer. Consider, for example, Figure 14 below, where we used a four-hidden-unit network to create a localized peak. Note, however, that unlike in the single-dimensional example, secondary ridges are also present. Thus, an additional sigmoidal hidden unit in a second layer is required to suppress the secondary ridges, but, at the same time, preserve the localized peak. This ad hoc "proof" indicates that any multi-input function can be modeled arbitrarily well by a two-hidden-layer neural network, as long as a sufficient number of hidden units are present in each layer. A formal proof of this is given by Cybenko [1].
[Figure 14: a localized peak over the (x1, x2) plane created by a four-hidden-unit network; secondary ridges are visible away from the peak.]

4. Neural network training
There are three basic steps in applying neural networks to real problems:
1. Collect input/output training data of the form:
$\{ (x_i, y_i) \}, \quad i \in \{1, 2, \dots, p\}$,     (27)

where x_i is the n-dimensional input vector,

$x_i = [x_{i1}\ x_{i2}\ \dots\ x_{in}]^T$     (28)

corresponding to the ith training pattern, and y_i is the m-dimensional output vector,

$y_i = [y_{i1}\ y_{i2}\ \dots\ y_{im}]^T$     (29)

corresponding to the ith training pattern, i ∈ {1, 2, ..., p}.
2. Select an appropriate neural network architecture. Generally, this involves selecting the number of hidden layers and the number of hidden units in each layer. For notational convenience, let,

$z = \Gamma(w, x)$     (30)

denote the m-dimensional output vector z for the neural network Γ, with q-dimensional weight vector w,

$w = [\omega_1\ \omega_2\ \dots\ \omega_q]^T$     (31)

and input vector x. Thus,

$z_i = \Gamma(w, x_i)$     (32)

denotes the neural network outputs z i corresponding to the input vector for the i th training pattern.
3. Train the weights of the neural network to minimize the error measure,
$E = \frac{1}{2} \sum_{i=1}^{p} \| y_i - z_i \|^2 = \frac{1}{2} \sum_{i=1}^{p} \sum_{j=1}^{m} (y_{ij} - z_{ij})^2$     (33)

which measures the difference between the neural network outputs z i and the training data outputs y i .
This error minimization is also frequently referred to as learning.
Steps 1 and 2 above are quite application specific and will be discussed a little later. Here, we will begin to investigate Step 3 — namely, the training of the neural network parameters (weights) from input/output training data.
A. Gradient descent
Note that since z i (as defined in equation (32) above) is a function of the weights w of the neural network, E is implicitly a function of those weights as well. That is, E changes as a function of w . Therefore, our goal is to find that set of weights w∗ which minimizes E over a given training data set.
The first algorithm that we will study for neural network training is based on a method known as gradient descent. To understand the intuition behind this algorithm, consider Figure 15 below, where a simple one-dimensional error surface is drawn schematically. The basic question we must answer is: how do we find the parameter ω* that corresponds to the minimum of that error surface (point d)?
Gradient descent offers a partial answer to this question. In gradient descent, we initialize the parameter ω to some random value and then incrementally change that value by an amount proportional to the negative derivative,

$-\frac{dE}{d\omega}$     (34)

Denoting ω(t) as parameter ω at step t of the gradient descent procedure, we can write this in equation form as,

$\omega(t+1) = \omega(t) - \eta\,\frac{dE}{d\omega(t)}$     (35)

where η is a small positive constant that is frequently referred to as the learning rate. In Figure 15, given an initial parameter value of a and a small enough learning rate, gradient descent will converge to the global minimum d as t → ∞ . Note, however, that the gradient descent procedure is not guaranteed to always converge to the global minimum for general (non-convex) error surfaces. If we start at an initial ω value of b , iteration (35) will converge to e , while for an initial ω value of c , gradient descent will converge to f as t → ∞ . Thus, gradient descent is only guaranteed to converge to a local minimum of the error surface (for sufficiently small learning rates η ), not a global minimum.
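The behavior just described is easy to reproduce with a minimal sketch of iteration (35) on an assumed non-convex error function; different starting points converge to different minima, which is exactly what Figure 15 illustrates. The error function used here is an arbitrary example, not one from the notes.

    def gradient_descent_1d(dE_dw, w0, eta=0.01, steps=2000):
        """Iterate w(t+1) = w(t) - eta * dE/dw, equation (35)."""
        w = w0
        for _ in range(steps):
            w = w - eta * dE_dw(w)
        return w

    # Assumed error function E(w) = w^4 - 2 w^2 + 0.3 w, which has two local minima.
    dE_dw = lambda w: 4 * w**3 - 4 * w + 0.3

    print(gradient_descent_1d(dE_dw, w0=1.5))    # converges near the shallower minimum at w ~ +0.96
    print(gradient_descent_1d(dE_dw, w0=-1.5))   # converges near the deeper minimum at w ~ -1.04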

[Figure 15: a schematic one-dimensional error surface E(ω) with minima at d, e, and f (d being the global minimum); the initial values a, b, and c lead gradient descent to d, e, and f, respectively.]

Iteration (35) is easily generalized to error minimization over multiple dimensions (i.e. parameter vectors w),

$w(t+1) = w(t) - \eta\,\nabla E[w(t)]$     (36)

where $\nabla E[w(t)]$ denotes the gradient of E with respect to w(t),

$\nabla E[w(t)] = \left[ \frac{\partial E}{\partial \omega_1(t)}\ \frac{\partial E}{\partial \omega_2(t)}\ \dots\ \frac{\partial E}{\partial \omega_q(t)} \right]^T$     (37)

Thus, one approach for training the weights in a neural network implements iteration (36) with the error measure defined in equation (33).
B. Simple example
Consider the simple single-input, single-output feedforward neural network in Figure 16 below, with sigmoidal hidden-unit activation functions γ , and a linear output unit. For this neural network, let us, by way of example, compute,
$\frac{\partial E}{\partial \omega_4}$     (38)

where,

$E = \frac{1}{2}(y - z)^2$     (39)

for a single training pattern 〈 x, y〉 . Note that since differentiation is a linear operator, the derivative for multiple training patterns is simply the sum of the derivatives of the individual training patterns,
$\frac{\partial E}{\partial \omega_j} = \frac{\partial}{\partial \omega_j}\,\frac{1}{2} \sum_{i=1}^{p} (y_i - z_i)^2 = \sum_{i=1}^{p} \frac{\partial}{\partial \omega_j}\,\frac{1}{2} (y_i - z_i)^2$.     (40)

Therefore, generalizing the example below to multiple training patterns is straightforward.
First, let us explicitly write down z as a function of the neural network weights. To do this, we define some intermediate variables,

$net_1 \equiv \omega_1 + \omega_2 x$     (41)

$net_2 \equiv \omega_3 + \omega_4 x$     (42)

which denote the net input to the two hidden units, respectively, and,

$h_1 \equiv \gamma(net_1)$     (43)

$h_2 \equiv \gamma(net_2)$     (44)

which denote the outputs of the two hidden units, respectively. Thus, z ω7

ω6 ω5 ω2

ω1

ω4

ω3

bias unit
1

x
- 11 -

Figure 16

EEL5840: Machine Intelligence

Introduction to feedforward neural networks

z = ω 5 + ω 6 h 1 + ω 7 h 2 (linear output unit).

(45)

Now, we can compute the derivative of E with respect to ω4. From (39), and remembering the chain rule of differentiation,

$\frac{\partial E}{\partial \omega_4} = -(y - z)\,\frac{\partial z}{\partial \omega_4}$     (46)

$\frac{\partial E}{\partial \omega_4} = (z - y)\left(\frac{\partial z}{\partial h_2}\right)\left(\frac{\partial h_2}{\partial net_2}\right)\left(\frac{\partial net_2}{\partial \omega_4}\right)$     (47)

$\frac{\partial E}{\partial \omega_4} = (z - y)\,\omega_7\,\gamma'(net_2)\,x$     (48)

where γ' denotes the derivative of the activation function. This example shows that, in principle, computing the partial derivatives required for the gradient descent algorithm simply requires careful application of the chain rule. In general, however, we would like to be able to simulate neural networks whose architecture is not known a priori. In other words, rather than hard-code derivatives with explicit expressions like (48) above, we require an algorithm which allows us to compute derivatives in a more general way. Such an algorithm exists, and is known as the backpropagation algorithm.
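Hand-derived gradients such as (48) are easy to check numerically with a finite difference; the sketch below does so for assumed weight and data values (all names are illustrative).

    import numpy as np

    sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))
    sigmoid_prime = lambda u: sigmoid(u) * (1.0 - sigmoid(u))

    def forward(w, x):
        """Network of Figure 16: two sigmoidal hidden units, linear output.
        w[0..6] stand for omega_1 .. omega_7."""
        h1 = sigmoid(w[0] + w[1] * x)
        h2 = sigmoid(w[2] + w[3] * x)
        return w[4] + w[5] * h1 + w[6] * h2

    def error(w, x, y):
        return 0.5 * (y - forward(w, x)) ** 2     # equation (39)

    w = np.array([0.1, -0.2, 0.3, 0.4, 0.0, 0.5, -0.6])   # assumed weights
    x, y = 0.7, 1.0                                       # assumed training pattern

    # Analytic derivative from equation (48): (z - y) * omega_7 * gamma'(net_2) * x
    z, net2 = forward(w, x), w[2] + w[3] * x
    analytic = (z - y) * w[6] * sigmoid_prime(net2) * x

    # Central finite difference with respect to omega_4 (index 3)
    eps = 1e-6
    wp, wm = w.copy(), w.copy()
    wp[3] += eps
    wm[3] -= eps
    numeric = (error(wp, x, y) - error(wm, x, y)) / (2 * eps)

    print(analytic, numeric)    # the two values should agree to several decimal places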
C. Backpropagation algorithm
The backpropagation algorithm was first published by Rumelhart and McClelland in 1986 [4], and has since led to an explosion in previously dormant neural-network research. Backpropagation offers an efficient, algorithmic formulation for computing error derivatives with respect to the weights of a neural network. As such, it allows us to implement gradient descent for neural network training without explicitly hard-coding derivatives.
In order to develop the backpropagation algorithm, let us first look at an arbitrary (hidden or output) unit in a feedforward (acyclic) neural network with activation function γ. In Figure 17, that unit is labeled j. Let h_j be the output of unit j, and let net_j be the net input to unit j. By definition,

$h_j \equiv \gamma(net_j)$     (49)

$net_j \equiv \sum_k h_k\,\omega_{kj}$     (50)

Note that net j is summed over all units feeding into unit j ; unit i is one of those units. Let us now compute,
$\frac{\partial E}{\partial \omega_{ij}} = \left(\frac{\partial E}{\partial net_j}\right)\left(\frac{\partial net_j}{\partial \omega_{ij}}\right)$     (51)

From equation (50),

$\frac{\partial net_j}{\partial \omega_{ij}} = h_i$     (52)

[Figure 17: unit i, with output h_i, feeds unit j through weight ω_ij; unit j applies γ to its net input net_j to produce its output h_j.]

since all the terms in summation (50) with k ≠ i are independent of ω_ij. Defining,

$\delta_j \equiv \frac{\partial E}{\partial net_j}$     (53)

we can write equation (51) as,

$\frac{\partial E}{\partial \omega_{ij}} = \delta_j h_i$     (54)

As we will see shortly, equation (54) forms the basis of the backpropagation algorithm in that the δ j variables can be computed recursively from the outputs of the neural network back to the inputs of the neural network.
In other words, the δ j values are backpropagated through the network (hence, the name of the algorithm).
D. Backpropagation example
Consider Figure 18, which plots a small part of a neural network. Below, we derive an expression for δ k (output unit) and δ j (hidden unit one layer removed from the outputs of the neural network). For a single training pattern, we can write,
$E = \frac{1}{2} \sum_{l=1}^{m} (y_l - z_l)^2$     (55)

where l indexes the outputs (not the training patterns). Now, δk ≡

∂z k
∂E
∂E
=    ------------ 
 ∂ z k  ∂net k
∂ net k

(56)

Since,

$z_k = \gamma(net_k)$     (57)

we have that,

$\frac{\partial z_k}{\partial net_k} = \gamma'(net_k)$     (58)

[Figure 18: a small part of a network. Unit i feeds hidden unit j through weight ω_ij, and unit j feeds output unit k through weight ω_jk; each unit applies γ to its net input.]

Furthermore, from equation (55),

$\frac{\partial E}{\partial z_k} = (z_k - y_k)$     (59)

since all the terms in summation (55) with l ≠ k are independent of z_k. Combining equations (56), (58) and (59), and recalling equation (54),

$\delta_k = (z_k - y_k)\,\gamma'(net_k)$     (60)

$\frac{\partial E}{\partial \omega_{jk}} = \delta_k h_j$     (61)

Note that equations (60) and (61) are valid for any weight in a neural network that is connected to an output unit. Also note that h j is the output value of units feeding into output unit k . While this may be the output of a hidden unit, it could also be the output of the bias unit (i.e. 1) or the value of a neural network input (i.e. x j ).
Next, we want to compute δ_j in Figure 18 in terms of the δ values that follow unit j. Going back to definition (53),

$\delta_j \equiv \frac{\partial E}{\partial net_j} = \sum_l \left(\frac{\partial E}{\partial net_l}\right)\left(\frac{\partial net_l}{\partial net_j}\right)$     (62)
Note that the summation in equation (62) is over all the immediate successor units of unit j. Thus,

$\delta_j = \sum_l \delta_l \left(\frac{\partial net_l}{\partial net_j}\right)$     (63)

By definition,

$net_l = \sum_s \omega_{sl}\,\gamma(net_s)$     (64)

So, from equation (64),

$\frac{\partial net_l}{\partial net_j} = \omega_{jl}\,\gamma'(net_j)$     (65)

since all the terms in summation (64) with s ≠ j are independent of net_j. Combining equations (63) and (65),

$\delta_j = \sum_l \delta_l\,\omega_{jl}\,\gamma'(net_j)$     (66)

$\delta_j = \left(\sum_l \delta_l\,\omega_{jl}\right)\gamma'(net_j)$     (67)

$\frac{\partial E}{\partial \omega_{ij}} = \delta_j h_i$     (68)

Note that equation (67) computes δ_j in terms of those δ values one connection ahead of unit j. In other words, the δ values are backpropagated from the outputs back through the network. Also note that h_i is the output value of units feeding into unit j. While this may be the output of a hidden unit from an earlier hidden-unit layer, it could also be the output of a bias unit (i.e. 1) or the value of a neural network input (i.e. x_i).
It is important to note that (1) the general derivative expression in (54) is valid for all weights in the neural network; (2) the expression for the output δ values in (60) is valid for all neural network output units; and (3) the recursive relationship for δ j in (67) is valid for all hidden units, where the l -indexed summation is over all immediate successors of unit j .

E. Summary of backpropagation algorithm
Below, we summarize the results of the derivation in the previous section. The partial derivative of the error,
$E = \frac{1}{2} \sum_{l=1}^{m} (y_l - z_l)^2$     (69)

(i.e. a single training pattern) with respect to a weight ω_jk connected to output unit k of a neural network is given by,

$\delta_k = (z_k - y_k)\,\gamma'(net_k)$     (70)

$\frac{\partial E}{\partial \omega_{jk}} = \delta_k h_j$     (71)

where h_j is the output of hidden unit j (or the input j), and net_k is the net input to output unit k. The partial derivative of the error E with respect to a weight ω_ij connected to hidden unit j of a neural network is given by,

$\delta_j = \left(\sum_l \delta_l\,\omega_{jl}\right)\gamma'(net_j)$     (72)

$\frac{\partial E}{\partial \omega_{ij}} = \delta_j h_i$     (73)

where h i is the output of hidden unit i (or the input i ), and net j is the net input to hidden unit j . The above results are trivially extended to multiple training patterns by summing the results for individual training patterns over all training patterns.
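Below is a minimal sketch of equations (70)-(73) for a single-hidden-layer network with sigmoidal hidden units and linear output units, evaluated for one training pattern; the weight layout (bias weight in column 0 of each matrix) is an assumption for illustration, not the notes' notation.

    import numpy as np

    sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

    def backprop_single_pattern(W1, W2, x, y):
        """Return dE/dW1, dE/dW2 for E = 0.5 * ||y - z||^2 on one pattern.
        W1: (n_hidden, n_in + 1), W2: (n_out, n_hidden + 1); column 0 is the bias."""
        # Forward pass
        x_aug = np.concatenate(([1.0], x))
        net_h = W1 @ x_aug                     # net inputs to hidden units
        h = sigmoid(net_h)                     # hidden-unit outputs
        h_aug = np.concatenate(([1.0], h))
        z = W2 @ h_aug                         # linear output units: z = net_k

        # Backward pass
        delta_k = (z - y)                      # eq. (70) with gamma'(net_k) = 1 (linear output)
        grad_W2 = np.outer(delta_k, h_aug)     # eq. (71): dE/dw_jk = delta_k * h_j
        delta_j = (W2[:, 1:].T @ delta_k) * h * (1.0 - h)   # eq. (72), sigmoid derivative
        grad_W1 = np.outer(delta_j, x_aug)     # eq. (73): dE/dw_ij = delta_j * h_i
        return grad_W1, grad_W2

    # Hypothetical 2-3-1 network and a single training pattern.
    rng = np.random.default_rng(1)
    W1, W2 = rng.normal(scale=0.1, size=(3, 3)), rng.normal(scale=0.1, size=(1, 4))
    gW1, gW2 = backprop_single_pattern(W1, W2, np.array([0.2, -0.4]), np.array([1.0]))
    print(gW1.shape, gW2.shape)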

5. Basic steps in using neural networks
So, now we know what a neural network is, and we know a basic algorithm for training neural networks (i.e. backpropagation). Here, we will extend our discussion of neural networks by discussing some practical aspects of applying neural networks to real-world problems. Below, we review the steps that need to be followed in using neural networks.
A. Collect training data
In order to apply a neural network to a problem, we must first collect input/output training data that adequately represents that problem. Often, we also need to condition, or preprocess that data so that the neural network training converges more quickly and/or to better local minima of the error surface. Data collection and preprocessing is very application-dependent and will be discussed in greater detail in the context of specific applications.
B. Select neural network architecture
Selecting a neural network architecture typically requires that we determine (1) an appropriate number of hidden layers and (2) an appropriate number of hidden units in each hidden layer for our specific application, assuming a standard multilayer feedforward architecture. Often, there will be many different neural network structures that work about equally well; which structures are most appropriate is frequently guided by experience and/or trial-and-error. Alternatively, as we will talk about later in this course, we can use neural network learning algorithms that adaptively change the structure of the neural network as part of the learning process.
C. Select learning algorithm
If we use simple backpropagation, we must select an appropriate learning rate η . Alternatively, as we will talk about later in this course, we have a choice of more sophisticated learning algorithms as well, including the conjugate gradient and extended Kalman filtering methods.

D. Weight initialization
Weights in the neural network are usually initialized to small, random values.
E. Forward pass
Apply a random input vector x i from the training data set to the neural network and compute the neural network outputs ( z k ) , the hidden-unit outputs ( h j ) , and the net input to each hidden unit ( net j ) .
F. Backward pass
1. Evaluate δ_k at the outputs, where,

$\delta_k = \frac{\partial E}{\partial net_k}$     (74)

for each output unit.
2. Backpropagate the δ values from the outputs backwards through the neural network.
3. Using the computed δ values, calculate,

$\frac{\partial E}{\partial \omega_i}$,     (75)

the derivative of the error with respect to each weight ω i in the neural network.
4. Update the weights based on the computed gradient,

$w(t+1) = w(t) - \eta\,\nabla E[w(t)]$.     (76)

G. Loop
Repeat steps E and F (forward and backward passes) until training results in a satisfactory model.
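Putting steps D through G together, a minimal pattern-training loop might look like the sketch below; it reuses the hypothetical backprop_single_pattern helper from the earlier sketch and assumes the same weight layout.

    import numpy as np

    def train(X, Y, n_hidden=5, eta=0.1, epochs=1000, seed=0):
        """Pattern training with gradient descent (steps D-G of Section 5)."""
        rng = np.random.default_rng(seed)
        n_in, n_out = X.shape[1], Y.shape[1]
        # Step D: initialize the weights to small, random values.
        W1 = rng.normal(scale=0.1, size=(n_hidden, n_in + 1))
        W2 = rng.normal(scale=0.1, size=(n_out, n_hidden + 1))
        for _ in range(epochs):                       # step G: loop
            for i in rng.permutation(len(X)):         # randomized pattern order
                # Steps E and F: forward pass, backward pass, weight update.
                gW1, gW2 = backprop_single_pattern(W1, W2, X[i], Y[i])
                W1 -= eta * gW1                       # eq. (76)
                W2 -= eta * gW2
        return W1, W2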

6. Practical issues in neural networks
A. What should the training data be?
Some questions that need to be answered include:
1. Is your training data sufficient for the neural network to adequately learn what you want it to learn? For example, what if, in ALVINN [3], we down-sampled to 10 × 10 images, instead of 30 × 32 images? Such coarse images would probably not suffice for learning to steer the on-road vehicle with enough accuracy. At the same time, we must make sure that we don't include superfluous or irrelevant training data for our application (e.g., for ALVINN, music played while driving). Poorly correlated or irrelevant inputs can easily slow down convergence of, or completely sidetrack, neural network learning algorithms.
2. Is your training data biased? Suppose that for ALVINN, we trained the neural network on an oval race track. How would ALVINN drive on real roads? It would probably not have adequately learned right turns, since the race track consists of left turns only. The distribution of your training data needs to approximately reflect the expected distribution of input data where the neural network will be used after training.
3. Is your task deterministic or stochastic? Is it stationary or nonstationary? Nonstationary problems cannot be trained from fixed data sets, since, by definition, things change over time.
We will have more on these concerns within the context of specific applications later.
B. What should your neural network architecture/structure be?
This question is largely task dependent, and often requires experience and/or trial-and-error to answer adequately. Therefore, we will have more on this question within the context of specific applications later. In general, though, it helps to look at similar problems that have previously been solved with neural networks, and apply the lessons learned there to our current application. Adaptive neural network architectures, which change the structure of the neural network as part of training, are also an alternative to manually selecting an appropriate structure.
C. Preprocessing of data
Often, it is wise to preprocess raw input/output training data, since it can make the learning (i.e. neural network training) converge much better and faster. In computer vision applications, for example, intensity normalization can remove variation in intensity — caused perhaps by sunny vs. overcast days — as a potential source of confusion for the neural network. We will have more on this question within the context of specific applications later.
D. Weight initialization
Since the weight parameters w are learned through the recursive relationship in (76), we obviously need to initialize the weights [i.e. set w(0)]. Typically, the weights are initialized to small, random values. If we were to initialize the weights to uniform (i.e. identical) values instead, the significant weight symmetries in the neural network would substantially reduce the effective parameterization of the neural network, since many partial error derivatives in the neural network would be identical at the beginning of training and remain so throughout. If we were to initialize the weights to large values, there is a high likelihood that many of the hidden-unit activations in the neural network would be stuck in the flat areas of the typical sigmoidal activation functions, where the derivatives evaluate to approximately zero. As such, it could take quite a long time for the weights to converge.
E. Select a learning parameter
If using standard gradient descent, we must select an appropriate learning rate η . This can be quite tricky, as the simple example below illustrates. Consider the trivial two-dimensional, quadratic “error” function,
$E = 20\omega_1^2 + \omega_2^2$     (77)

which we plot in Figure 19 below. [Note that equation (77) could never really be a neural network error function, since a neural network typically has many hundreds or thousands of weights.]
For this error function, note that the global minimum occurs at (ω1, ω2) = (0, 0). Now, let us investigate how quickly gradient descent converges to this global minimum for different learning rates η; for the purposes of this example, we will say that gradient descent has converged when E < 10^-6. First, we must compute the derivatives,
$\frac{\partial E}{\partial \omega_1} = 40\omega_1$, and,     (78)

$\frac{\partial E}{\partial \omega_2} = 2\omega_2$,     (79)

[Figure 19: the quadratic error surface E = 20ω1^2 + ω2^2, plotted over ω1 from -1.5 to 1 and ω2 from -0.5 to 2.]

so that the gradient-descent weight recursion in (76) is given by,

$\omega_1(t+1) = \omega_1(t) - \eta\,\frac{\partial E}{\partial \omega_1(t)}$     (80)

$\omega_1(t+1) = \omega_1(t)\,(1 - 40\eta)$     (81)

and similarly,

$\omega_2(t+1) = \omega_2(t)\,(1 - 2\eta)$.     (82)

From an initial point ( ω 1, ω 2 ) = ( 1, 2 ) , Figure 20 below plots the number of steps to convergence as a function of the learning parameter η . Note that the number of steps to convergence decreases as a function of the learning rate parameter η until about 0.047 (intuitive), but then shoots up sharply until 0.05 , at which point the gradient-descent equations in (81) and (82) become unstable and diverge (counter-intuitive).

[Figure 20: the number of gradient-descent steps to convergence as a function of the learning rate η (plotted for η between 0.01 and 0.05); the step count decreases with η until about 0.047 and then rises sharply toward the divergence point at η = 0.05.]

Figure 21 plots some actual gradient-descent trajectories for the learning rates 0.02, 0.04 and 0.05. Note that for η = 0.05, gradient descent does not converge but oscillates about ω1 = 0. To understand why this is happening, consider the fixed-point iterations in (81) and (82). Each of these is of the form,

$\omega(t+1) = c\,\omega(t)$     (83)

which will diverge for any nonzero ω(0) and |c| > 1, and converge for |c| < 1. Thus, equation (81) will converge for,

$|1 - 40\eta| < 1$     (84)

$-1 < 1 - 40\eta < 1$     (85)

$0 < \eta < 0.05$     (86)

[Figure 21: gradient-descent trajectories in the (ω1, ω2) plane for η = 0.02, η = 0.04, and η = 0.05, each starting from (1, 2).]

Since recursion (82) generates the weaker bound,

$0 < \eta < 1$,     (87)

the upper bound in (86) is controlling in that it determines the range of learning rates for which gradient descent will converge in this example.
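The convergence bound (86) can be reproduced by running recursions (81) and (82) directly, as in the sketch below; only the error function (77), the starting point (1, 2), and the stopping criterion E < 10^-6 from the text are used.

    def steps_to_converge(eta, max_steps=100000):
        """Run w1(t+1) = w1(t)(1 - 40*eta), w2(t+1) = w2(t)(1 - 2*eta)
        from (w1, w2) = (1, 2) until E = 20*w1^2 + w2^2 < 1e-6."""
        w1, w2 = 1.0, 2.0
        for t in range(1, max_steps + 1):
            w1 *= (1.0 - 40.0 * eta)
            w2 *= (1.0 - 2.0 * eta)
            if 20.0 * w1 ** 2 + w2 ** 2 < 1e-6:
                return t
        return None   # did not converge within max_steps

    for eta in (0.005, 0.02, 0.04, 0.047, 0.05):
        print(eta, steps_to_converge(eta))
    # The step count drops as eta grows toward ~0.047 and convergence fails at 0.05,
    # where 1 - 40*eta = -1 and w1 simply oscillates between +1 and -1.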
We make a few observations from this specific example: First, “long, steep-sided valleys” in the error surface typically cause slow convergence with a single learning rate, since gradient descent will converge quickly down the steep valleys of the error surface, but will take a long time to travel along the shallow valley. Slow convergence of gradient descent is largely why we will study more sophisticated learning algorithms, with de facto adaptive learning rates, later in this course. In this example, convergence along the ω 2 axis is assured for larger η ; however, the upper bound in (86) prevents us from using a (fixed) learning rate greater than or equal to 0.05 . Second, Figure 20, although drawn specifically for this example, is generally reflective of gradient-descent convergence rates for more complex error surfaces as well. If the chosen learning rate is too small, convergence can take a very long time, while learning rates that are too large will cause gradient descent to diverge. This is another reason to study more sophisticated algorithms — since selecting an appropriate learning rate can be quite frustrating, algorithms that do not require such a selection have a real advantage. Finally, note that, in general, it is not possible to determine theoretical convergence bounds, such as those in (86), for real neural networks and error functions. Only the very simple error surface in (77) allowed us to do that here.
F. Pattern vs. batch training
In pattern training, we compute the error E and the gradient of the error ∇E for one input/output pattern at a time, and update the weights based on that single training example (Section 5 describes pattern training). It is usually a good idea to randomize the order of training patterns in pattern training, so that the neural network does not converge to a bad local minimum or forget training examples seen early in training.
In batch training, we compute the error E and the gradient of the error ∇E for all training examples at once, and update the weights based on that aggregate error measure.
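The only difference between the two schemes is when the weight update is applied. A compressed sketch of one epoch of each is shown below, assuming a hypothetical helper gradient(w, x, y) that returns the gradient of the single-pattern error.

    import numpy as np

    def epoch_pattern(w, X, Y, gradient, eta, rng):
        """Pattern training: update after every single training example."""
        for i in rng.permutation(len(X)):          # randomized pattern order
            w = w - eta * gradient(w, X[i], Y[i])
        return w

    def epoch_batch(w, X, Y, gradient, eta):
        """Batch training: accumulate the gradient over all examples, then update once."""
        total = sum(gradient(w, X[i], Y[i]) for i in range(len(X)))
        return w - eta * total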
G. Good generalization
Generalization to examples not explicitly seen in the training data set is one of the most important properties of a good model, including neural network models. Consider, for example, Figure 22. Which is a better model, the left curve or the right curve? Although the right curve (i.e. model) has zero error over the specific data set, it will probably generalize more poorly to points not in the data set, since it appears to have modeled the noise properties of the specific training data set. The left model, on the other hand, appears to have abstracted the essential feature of the data, while rejecting the random noise superimposed on top.

[Figure 22: two candidate models (left and right) fit to the same noisy (x, y) data set; the right-hand curve passes through every data point, while the left-hand curve follows the overall trend.]

[Figure 23: neural network error versus training time for the training data set and the cross-validation data set; the early stopping point is where the cross-validation error is lowest, after which it begins to rise even as the training error keeps falling.]

There are two ways that we can ensure that neural networks generalize well to data not explicitly in the training data set. First, we need to pick a neural network architecture that is not over-parameterized — in other words, the smallest neural network that will perform its task well. Second, we can use a method known as cross-validation. In typical neural network training, we take our complete data set, and split that data set in two. The first data set is called the training data set, and is used to actually train the weights of the neural network; the second data set is called the cross-validation data set, and is not explicitly used in training the weights; rather, the cross-validation set is reserved as a check on neural network learning to prevent overtraining. While training (with the training data set), we keep track of both the training data set error and the cross-validation data set error. When the cross-validation error no longer decreases, we should stop training, since that is a good indication that further learning will adjust the weights only to fit peculiarities of the training data set. This scenario is depicted in the generic diagram of Figure 23 above, where we plot neural network error as a function of training time. As we indicate in the figure, the training data set error will generally be lower than the cross-validation data set error; moreover, the training data set error will usually continue to decrease as a function of training time, whereas the cross-validation data set error will typically begin to increase at some point in the training.
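A sketch of the cross-validation procedure just described: train while monitoring the error on the held-out cross-validation set and keep the weights from the epoch where that error was lowest. The train_one_epoch and total_error helpers are hypothetical; the patience parameter is one common way to decide that the cross-validation error has stopped decreasing.

    import copy

    def train_with_early_stopping(w, train_set, cv_set, train_one_epoch, total_error,
                                  max_epochs=1000, patience=20):
        """Stop when the cross-validation error has not improved for `patience` epochs."""
        best_w, best_cv, since_best = copy.deepcopy(w), float("inf"), 0
        for epoch in range(max_epochs):
            w = train_one_epoch(w, train_set)          # adjust weights on training data only
            cv_err = total_error(w, cv_set)            # monitor the held-out data
            if cv_err < best_cv:
                best_w, best_cv, since_best = copy.deepcopy(w), cv_err, 0
            else:
                since_best += 1
                if since_best >= patience:
                    break                              # early stopping point of Figure 23
        return best_w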
[1] G. Cybenko, "Approximation by Superposition of a Sigmoidal Function," Mathematics of Control, Signals, and Systems, vol. 2, no. 4, pp. 303-314, 1989.
[2] R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, 2nd ed., Chapters 5 and 6, John Wiley & Sons, New York, 2001.
[3] D. A. Pomerleau, "Neural Network Perception for Mobile Robot Guidance," Ph.D. Thesis, School of Computer Science, Carnegie Mellon University, 1992.
[4] D. E. Rumelhart and J. L. McClelland, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vols. 1 and 2, MIT Press, Cambridge, MA, 1986.

- 20 -

Similar Documents

Free Essay

Neural Network

...– MGT 501 Neural Network Technique Outline * Overview ………………………………………………………….……… 4 * Definition …………………………………………………4 * The Basics of Neural Networks……………………………………………5 * Major Components of an Artificial Neuron………………………………..5 * Applications of Neural Networks ……………….9 * Advantages and Disadvantages of Neural Networks……………………...12 * Example……………………………………………………………………14 * Conclusion …………………………………………………………………14 Overview One of the most crucial and dominant subjects in management studies is finding more effective tools for complicated managerial problems, and due to the advancement of computer and communication technology, tools used in management decisions have undergone a massive change. Artificial Neural Networks (ANNs) is an example, knowing that it has become a critical component of business intelligence. The below article describes the basics of neural networks as well as some work done on the application of ANNs in management sciences. Definition of a Neural Network? The simplest definition of a neural network, particularly referred to as an 'artificial' neural network (ANN), is provided by the inventor of one of the first neurocomputers, Dr. Robert Hecht-Nielsen who defines a neural network as follows: "...a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs."Neural Network Primer: Part...

Words: 3829 - Pages: 16

Free Essay

Neural Network

...ARTIFICIAL NEURAL NETWORK FOR SPEECH RECOGNITION One of the problem found in speech recognition is recording samples never produce identical waveforms. This happens due to different in length, amplitude, background noise, and sample rate. This problem can be encountered by extracting speech related information using Spectogram. It can show change in amplitude spectra over time. For example in diagram below: X Axis : TimeY Axis : FrequencyZ Axis : Colour intensity represents magnitude | | A cepstral analysis is a popular method for feature extraction in speech recognition applications and can be accomplished using Mel Frequency Cepstrum Coefficient (MFCC) analysis Input Layer is 26 Cepstral CoefficientsHidden Layer is 100 fully-connected hidden-layerWeight is range between -1 and +1 * It is initially random and remain constantOutput : * 1 output unit for each target * Limited to values between 0 and +1 | | First of all, spoken digits were recorded. Seven samples of each digit consist of “one” through “eight” and a total of 56 different recordings with varying length and environmental conditions. The background noise was removed from each sample. Then, calculate MFCC using Malcolm Slaney’s Auditory Toolbox which is c=mfcc(s,fs,fix((3*fs)/(length(s)-256))). Choose intended target and create a target vector. If the training network recognise spoken one, target has a value of +1 for each of the known “one” stimuli and 0 for everything else. This will be supervised...

Words: 341 - Pages: 2

Free Essay

Arificial Neural Network

...A Review of ANN-based Short-Term Load Forecasting Models Y. Rui A.A. El-Keib Department of Electrical Engineering University of Alabama, Tuscaloosa, AL 35487 Abstract - Artificial Neural Networks (AAN) have recently been receiving considerable attention and a large number of publications concerning ANN-based short-term load forecasting (STLF) have appreared in the literature. An extensive survey of ANN-based load forecasting models is given in this paper. The six most important factors which affect the accuracy and efficiency of the load forecasters are presented and discussed. The paper also includes conclusions reached by the authors as a result of their research in this area. Keywords: artificial neural networks, short-term load forecasting models Introduction Accurate and robust load forecasting is of great importance for power system operation. It is the basis of economic dispatch, hydro-thermal coordination, unit commitment, transaction evaluation, and system security analysis among other functions. Because of its importance, load forecasting has been extensively researched and a large number of models were proposed during the past several decades, such as Box-Jenkins models, ARIMA models, Kalman filtering models, and the spectral expansion techniques-based models. Generally, the models are based on statistcal methods and work well under normal conditions, however, they show some deficiency in the presence of an abrupt change in environmental or sociological variables...

Words: 3437 - Pages: 14

Free Essay

Artificial Neural Network Essentials

...NEURAL NETWORKS by Christos Stergiou and Dimitrios Siganos |   Abstract This report is an introduction to Artificial Neural Networks. The various types of neural networks are explained and demonstrated, applications of neural networks like ANNs in medicine are described, and a detailed historical background is provided. The connection between the artificial and the real thing is also investigated and explained. Finally, the mathematical models involved are presented and demonstrated. Contents: 1. Introduction to Neural Networks 1.1 What is a neural network? 1.2 Historical background 1.3 Why use neural networks? 1.4 Neural networks versus conventional computers - a comparison   2. Human and Artificial Neurones - investigating the similarities 2.1 How the Human Brain Learns? 2.2 From Human Neurones to Artificial Neurones   3. An Engineering approach 3.1 A simple neuron - description of a simple neuron 3.2 Firing rules - How neurones make decisions 3.3 Pattern recognition - an example 3.4 A more complicated neuron 4. Architecture of neural networks 4.1 Feed-forward (associative) networks 4.2 Feedback (autoassociative) networks 4.3 Network layers 4.4 Perceptrons 5. The Learning Process  5.1 Transfer Function 5.2 An Example to illustrate the above teaching procedure 5.3 The Back-Propagation Algorithm 6. Applications of neural networks 6.1 Neural networks in practice 6.2 Neural networks in medicine 6.2.1 Modelling and Diagnosing the Cardiovascular...

Words: 7770 - Pages: 32

Free Essay

Segmentation Using Neural Networks

...SEGMENTATION WITH NEURAL NETWORK B.Prasanna Rahul Radhakrishnan Valliammai Engineering College Valliammai Engineering College prakrish_2001@yahoo.com krish_rahul_1812@yahoo.com Abstract: Our paper work is on Segmentation by Neural networks. Neural networks computation offers a wide range of different algorithms for both unsupervised clustering (UC) and supervised classification (SC). In this paper we approached an algorithmic method that aims to combine UC and SC, where the information obtained during UC is not discarded, but is used as an initial step toward subsequent SC. Thus, the power of both image analysis strategies can be combined in an integrative computational procedure. This is achieved by applying “Hyper-BF network”. Here we worked a different procedures for the training, preprocessing and vector quantization in the application to medical image segmentation and also present the segmentation results for multispectral 3D MRI data sets of the human brain with respect to the tissue classes “ Gray matter”, “ White matter” and “ Cerebrospinal fluid”. We correlate manual and semi automatic methods with the results. Keywords: Image analysis, Hebbian learning rule, Euclidean metric, multi spectral image segmentation, contour tracing. Introduction: Segmentation can be defined as the identification of meaningful image components. It is a fundamental task in image processing providing the basis for any kind of...

Words: 2010 - Pages: 9

Free Essay

Artificial Neural Network for Biomedical Purpose

...ARTIFICIAL NEURAL NETWORKS METHODOLOGICAL ADVANCES AND BIOMEDICAL APPLICATIONS Edited by Kenji Suzuki Artificial Neural Networks - Methodological Advances and Biomedical Applications Edited by Kenji Suzuki Published by InTech Janeza Trdine 9, 51000 Rijeka, Croatia Copyright © 2011 InTech All chapters are Open Access articles distributed under the Creative Commons Non Commercial Share Alike Attribution 3.0 license, which permits to copy, distribute, transmit, and adapt the work in any medium, so long as the original work is properly cited. After this work has been published by InTech, authors have the right to republish it, in whole or part, in any publication of which they are the author, and to make other personal use of the work. Any republication, referencing or personal use of the work must explicitly identify the original source. Statements and opinions expressed in the chapters are these of the individual contributors and not necessarily those of the editors or publisher. No responsibility is accepted for the accuracy of information contained in the published articles. The publisher assumes no responsibility for any damage or injury to persons or property arising out of the use of any materials, instructions, methods or ideas contained in the book. Publishing Process Manager Ivana Lorkovic Technical Editor Teodora Smiljanic Cover Designer Martina Sirotic Image Copyright Bruce Rolff, 2010. Used under license from Shutterstock.com First published March, 2011 Printed in...

Words: 43079 - Pages: 173

Free Essay

Neural Networks for Matching in Computer Vision

...Neural Networks for Matching in Computer Vision Giansalvo Cirrincione1 and Maurizio Cirrincione2 Department of Electrical Engineering, Lab. CREA University of Picardie-Jules Verne 33, rue Saint Leu, 80039 Amiens - France exin@u-picardie.fr Universite de Technologie de Belfort-Montbeliard (UTBM) Rue Thierry MIEG, Belfort Cedex 90010, France maurizio.cirricione@utbm.fr 1 2 Abstract. A very important problem in computer vision is the matching of features extracted from pairs of images. At this proposal, a new neural network, the Double Asynchronous Competitor (DAC) is presented. It exploits the self-organization for solving the matching as a pattern recognition problem. As a consequence, a set of attributes is required for each image feature. The network is able to find the variety of the input space. DAC exploits two intercoupled neural networks and outputs the matches together with the occlusion maps of the pair of frames taken in consideration. DAC can also solve other matching problems. 1 Introduction In computer vision, structure from motion (SFM) algorithms recover the motion and scene parameters by using a sequence of images (very often only a pair of images is needed). Several SFM techniques require the extraction of features (corners, lines and so on) from each frame. Then, it is necessary to find certain types of correspondences between images, i.e. to identify the image elements in different frames that correspond to the same element in the scene. This paper...

Words: 3666 - Pages: 15

Free Essay

A 3-Layer Artificial Neural Network

...1. Describe (a) the basic structure of and (b) the learning process for a 3-layer artificial neural network. A 3-layer artificial neural network consists of an input, output and a hidden layer in the middle. For e.g. To recognize male and female faces, the input layer would be made up of a computer program analyzing a camera shot. The output layer would be the word male or female appearing on the screen. The hidden layer is where all action takes place and connections are made between input and output. In an ANN these connections are mathematical. It works by learning from success (hits) and failures (misses) by making adjustments in these mathematical connections. 2. According to Churchland, why does intrapersonal (within one person) moral conflict occur? Intrapersonal moral conflict occurs when some contextual feature is alternately magnified or minimized and one’s overall perceptual take flips back and forth between two distinct activation patterns in the neighborhood of 2 distinct prototypes. In such case, an individual is morally conflicted eg. Should I protect a friends feeling by lying about someone’s hurtful slur or should I tell him the truth? 3. According to Churchland, when should moral correction occur and why? According to Churchland, moral correction should occur at an early age, before child turns into a young adult. Reasons - 1. Firstly, cognitive plasticity and eagerness to imitate found...

Words: 549 - Pages: 3

Free Essay

Prediction of Oil Prices Using Neural Networks

...Oil Price Prediction using Artificial Neural Networks Author: Siddhant Jain, 2010B3A7506P Birla Institute of Technology and Science, Pilani Abstract: Oil is an important commodity for every industrialised nation in the modern economy. The upward or downward trends in Oil prices have crucially influenced economies over the years and a priori knowledge of such a trend would be deemed useful to all concernd - be it a firm or the whole country itself. Through this paper, I intend to use the power of Artificial Neural Networks (ANNs) to develop a model which can be used to predict oil prices. ANNs are widely used for modelling a multitude of financial and economic variables and have proven themselves to be a very powerful tool to handle volumes of data effectively and analysing it to perform meaningful calculations. MATLAB has been employed as the medium for developing the neural network and for efficiently handling the volume of calculations involved. Following sections shall deal with the theoretical and practical intricacies of the aforementioned model. The appendix includes snapshots of the generated results and other code snippets. Artificial Neural Networks: Understanding To understand any of the ensuing topics and the details discussed thereof, it is imperative to understand what actually we mean by Neural Networks. So, I first dwell into this topic: In simplest terms a Neural Network can be defined as a computer system modelled on the human brain and nervous system...

Words: 3399 - Pages: 14

Free Essay

Rough Set Approach for Feature Reduction in Pattern Recognition Through Unsupervised Artificial Neural Network

...First International Conference on Emerging Trends in Engineering and Technology Rough Set Approach for Feature Reduction in Pattern Recognition through Unsupervised Artificial Neural Network A. G. Kothari A.G. Keskar A.P. Gokhale Rucha Pranjali Lecturer Professor Professor Deshpande Deshmukh agkothari72@re B.Tech Student B.Tech Student diffmail.com Department of Electronics & Computer Science Engineering, VNIT, Nagpur Abstract The Rough Set approach can be applied in pattern recognition at three different stages: pre-processing stage, training stage and in the architecture. This paper proposes the application of the Rough-Neuro Hybrid Approach in the pre-processing stage of pattern recognition. In this project, a training algorithm has been first developed based on Kohonen network. This is used as a benchmark to compare the results of the pure neural approach with the RoughNeuro hybrid approach and to prove that the efficiency of the latter is higher. Structural and statistical features have been extracted from the images for the training process. The number of attributes is reduced by calculating reducts and core from the original attribute set, which results into reduction in convergence time. Also, the above removal in redundancy increases speed of the process reduces hardware complexity and thus enhances the overall efficiency of the pattern recognition algorithm Keywords: core, dimensionality reduction, feature extraction, rough sets, reducts, unsupervised ANN as any...

Words: 2369 - Pages: 10

Premium Essay

Market Segmentation

...www.elsevier.com/locate/atoures Annals of Tourism Research, Vol. 32, No. 1, pp. 93–111, 2005. © 2005 Elsevier Ltd. All rights reserved. Printed in Great Britain 0160-7383/$30.00 doi:10.1016/j.annals.2004.05.001 MARKET SEGMENTATION: A Neural Network Application. Jonathan Z. Bloom, University of Stellenbosch, South Africa. Abstract: The objective of the research is to consider a self-organizing neural network for segmenting the international tourist market to Cape Town, South Africa. A backpropagation neural network is used to complement the segmentation by generating additional knowledge based on input–output relationship and sensitivity analyses. The findings of the self-organizing neural network indicate three clusters, which are visually confirmed by developing a comparative model based on the test data set. The research also demonstrated that Cape Metropolitan Tourism could deploy the neural network models and track the changing behavior of tourists within and between segments. Marketing implications for the Cape are also highlighted. Keywords: segmentation, SOM neural network, input–output analysis, sensitivity analysis, deployment. © 2005 Elsevier Ltd. All rights reserved. Résumé (French abstract, translated): Market segmentation: a neural network application. The aim of the research is to consider a self-organizing neural network for segmenting the international tourist market to Cape Town, South Africa. A backpropagation neural network is used to...
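A simplified, winner-take-all version of the self-organizing map used for the segmentation can be sketched as follows. The visitor features are synthetic stand-ins for the survey data, and a full SOM would also update neighbouring map nodes rather than only the winner:

```python
# Simplified SOM-style clustering of tourist feature vectors into segments.
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical standardized tourist features (e.g. spend, length of stay,
# activity counts); three loose groups are simulated here.
visitors = np.vstack([
    rng.normal([0, 0, 0], 0.3, size=(50, 3)),
    rng.normal([2, 1, 0], 0.3, size=(50, 3)),
    rng.normal([0, 2, 2], 0.3, size=(50, 3)),
])

n_nodes = 3
weights = rng.normal(size=(n_nodes, 3))        # one prototype per map node

for t in range(3000):
    lr = 0.5 * (1 - t / 3000)                  # learning rate decays over time
    x = visitors[rng.integers(len(visitors))]
    bmu = np.argmin(np.linalg.norm(weights - x, axis=1))   # best-matching unit
    weights[bmu] += lr * (x - weights[bmu])    # pull the winning prototype toward x

segments = np.argmin(
    np.linalg.norm(visitors[:, None, :] - weights[None, :, :], axis=2), axis=1)
print("visitors per segment:", np.bincount(segments, minlength=n_nodes))
```

Each prototype ends up summarising one segment, which is the sense in which the study's three clusters can then be profiled and tracked over time.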

Words: 7968 - Pages: 32

Free Essay

Hurst Wx

...stronger trend. In this paper we investigate the use of the Hurst exponent to classify series of financial data representing different periods of time. Experiments with backpropagation neural networks show that series with a large Hurst exponent can be predicted more accurately than series with an H value close to 0.50. Thus the Hurst exponent provides a measure of predictability. KEY WORDS: Hurst exponent, time series analysis, neural networks, Monte Carlo simulation, forecasting. In time series forecasting, the first question we want to answer is whether the time series under study is predictable. If the time series is random, all methods are expected to fail. We want to identify and study those time series having at least some degree of predictability. We know that a time series with a large Hurst exponent has a strong trend, so it is natural to believe that such time series are more predictable than those having a Hurst exponent close to 0.5. In this paper we use neural networks to test this hypothesis. Neural networks are nonparametric universal function approximators [9] that can learn from data without assumptions. Neural network forecasting models have been widely used in financial time series analysis during the last decade [10],[11],[12]. As universal function approximators, neural networks can be used as a surrogate measure of predictability: under the same conditions, a time series with a smaller forecasting error than another is said to be more predictable. We study the Dow-Jones...
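The Hurst exponent at the centre of this study is commonly estimated by rescaled-range (R/S) analysis. The sketch below is an assumed, simplified implementation: it fits the slope of log(R/S) against log(window size) and illustrates the H near 0.5 versus H near 1 contrast the paper relies on.

```python
# Estimate the Hurst exponent of a series by rescaled-range (R/S) analysis.
import numpy as np

def hurst_rs(series, window_sizes=(8, 16, 32, 64, 128)):
    series = np.asarray(series, dtype=float)
    log_n, log_rs = [], []
    for n in window_sizes:
        rs_values = []
        for start in range(0, len(series) - n + 1, n):   # non-overlapping chunks
            chunk = series[start:start + n]
            dev = np.cumsum(chunk - chunk.mean())         # cumulative deviation from the mean
            r = dev.max() - dev.min()                     # range of the cumulative deviation
            s = chunk.std()                               # standard deviation of the chunk
            if s > 0:
                rs_values.append(r / s)
        if rs_values:
            log_n.append(np.log(n))
            log_rs.append(np.log(np.mean(rs_values)))
    slope, _ = np.polyfit(log_n, log_rs, 1)               # H is the slope of log(R/S) vs log(n)
    return slope

rng = np.random.default_rng(3)
noise = rng.normal(size=4000)                             # white-noise increments
print("white noise H ~", round(hurst_rs(noise), 2))                 # near 0.5
print("integrated (trending) series H ~", round(hurst_rs(np.cumsum(noise)), 2))  # closer to 1
```

Series whose estimated H is well above 0.5 are the ones the paper expects the backpropagation networks to forecast with smaller error.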

Words: 1864 - Pages: 8

Free Essay

Stereoscopic Building Reconstruction Using High-Resolution Satellite Image Data

...Stereoscopic Building Reconstruction Using High-Resolution Satellite Image Data. Anonymous submission. Abstract—This paper presents a novel approach for the generation of a 3D building model from satellite image data. The main idea of the 3D modeling is based on the grouping of 3D line segments. The divergence-based centroid neural network is employed in the grouping process. Prior to the grouping process, 3D line segments are extracted with the aid of the elevation information obtained by area-based stereo matching of the satellite image data. High-resolution IKONOS stereo images are utilized for the experiments. The experimental results demonstrate the applicability and efficiency of the approach in dealing with 3D building modeling from high-resolution satellite imagery. Index Terms—building model, satellite image, 3D modeling, line segment, stereo. I. INTRODUCTION Extraction of a 3D building model is one of the important problems in the generation of an urban model. The process aims to detect and describe the 3D rooftop model from a complex scene of satellite imagery. The automated extraction of the 3D rooftop model can be considered an essential process in dealing with 3D modeling in the urban area. There has been a significant body of research on 3D reconstruction from high-resolution satellite imagery. Even though a natural terrain can be successfully reconstructed in a precise manner by using correlation-based stereoscopic processing of satellite images [1], 3D building reconstruction...
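The grouping step can be illustrated with a much simpler stand-in for the paper's divergence-based centroid neural network: describe each 3D line segment by its midpoint and direction and cluster the descriptors with plain k-means. The segments below are hypothetical, not derived from IKONOS data.

```python
# Group 3D line segments by clustering simple midpoint/direction descriptors.
import numpy as np

rng = np.random.default_rng(4)

def segment_features(p0, p1):
    """Midpoint and unit direction of a 3D segment, stacked into one descriptor."""
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    direction = (p1 - p0) / (np.linalg.norm(p1 - p0) + 1e-9)
    return np.concatenate([(p0 + p1) / 2.0, direction])

# Hypothetical segments from two separate rooftops.
segments = [([0, 0, 10], [4, 0, 10]), ([0, 1, 10], [4, 1, 10]),
            ([20, 20, 15], [24, 20, 15]), ([20, 21, 15], [24, 21, 15])]
X = np.array([segment_features(a, b) for a, b in segments])

k = 2
centroids = X[rng.choice(len(X), size=k, replace=False)]
for _ in range(20):                                   # plain k-means iterations
    labels = np.argmin(np.linalg.norm(X[:, None] - centroids[None], axis=2), axis=1)
    for j in range(k):
        if np.any(labels == j):
            centroids[j] = X[labels == j].mean(axis=0)
print("segment group labels:", labels)
```

In the actual approach, the clustering uses a divergence-based distance and a centroid neural network rather than Euclidean k-means, and the resulting groups are assembled into rooftop hypotheses.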

Words: 2888 - Pages: 12

Free Essay

Ebusiness-Process-Personalization Using Neuro-Fuzzy Adaptive Control for Interactive Systems

...International Review of Business Research Papers, Vol. 2, No. 4, December 2006, pp. 39-50. eBusiness-Process-Personalization using Neuro-Fuzzy Adaptive Control for Interactive Systems. Zunaira Munir, Nie Gui Hua, Adeel Talib and Mudassir Ilyas. 'Personalization', which was earlier recognized as the 5th 'P' of e-marketing, is now becoming a strategic success factor in the present customer-centric e-business environment. This paper proposes two changes to the current structure of personalization efforts in e-businesses: firstly, a move towards business-process personalization instead of only website-content personalization, and secondly, the use of an interactive adaptive scheme instead of the commonly employed algorithmic filtering approaches. These can be achieved by applying a neuro-intelligence model to web-based real-time interactive systems and by integrating it with converging internal and external e-business processes. This paper presents a framework showing how it is possible to personalize e-business processes by adapting the interactive system to customer preferences. The proposed model applies the Neuro-Fuzzy Adaptive Control for Interactive Systems (NFACIS) model to converging business processes to get the desired results. Field of Research: Marketing, e-business. 1. Introduction: As Kasanoff (2001) mentioned, the ability to treat different people differently is the most fundamental form of human intelligence. "You talk differently to your boss than to...
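As a very rough illustration of the neuro-fuzzy adaptation idea (not a reproduction of the NFACIS model itself), the sketch below tunes the consequents of two fuzzy rules online from simulated customer feedback, so the personalization score adapts to the individual rather than relying on a fixed filtering rule:

```python
# Tiny zero-order Sugeno fuzzy system adapted online from interaction feedback.
import numpy as np

rng = np.random.default_rng(5)

centers = np.array([0.2, 0.8])     # Gaussian membership centres: "low"/"high" engagement (assumed)
sigma = 0.25
consequents = np.zeros(2)          # one adjustable output per fuzzy rule

def personalization_score(x):
    mu = np.exp(-((x - centers) ** 2) / (2 * sigma ** 2))   # rule firing strengths
    w = mu / mu.sum()                                        # normalised strengths
    return w @ consequents, w

xs = rng.uniform(size=500)                      # simulated per-interaction engagement signals
feedback = np.where(xs > 0.5, 1.0, 0.2)         # stand-in observed preference feedback

lr = 0.1
for x, target in zip(xs, feedback):             # online adaptation, one interaction at a time
    score, w = personalization_score(x)
    consequents -= lr * (score - target) * w    # gradient step on squared error

print("learned rule outputs (low, high engagement):", np.round(consequents, 2))
```

The full framework would tie such an adaptive layer into the surrounding business processes rather than a single score, but the interactive adapt-from-feedback loop is the core mechanism being proposed.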

Words: 4114 - Pages: 17

Free Essay

Prediction and Optimisation of Fsw

...EXECUTIVE SUMMARY INTRODUCTION/BACKGROUND The objective of the thesis is to predict and optimize the mechanical properties of aircraft fuselage aluminium (AA5083). Firstly, data-driven modelling techniques such as Artificial Neuro-Fuzzy networks and regression analysis are used, and by making effective use of experimental data, the FIS membership function parameters are trained. At the core, a mathematical model is obtained that functionally relates tool rotational speed and forward movement per revolution to yield strength, ultimate strength and weld quality. Also, simulations are performed and the actual values are compared with the predicted values. Finally, multi-objective optimization of the mechanical properties of the fuselage aluminium was undertaken using a Genetic Algorithm to improve the industrial performance of the tools. AIMS AND OBJECTIVES Objectives of the dissertation include: understanding the basic principles of operation of Friction Stir Welding (FSW); gaining experience in modelling and regression analysis; gaining expertise in MATLAB programming; identifying the best strategy to achieve the yield strength, ultimate tensile strength and weld quality of friction stir welds; performing optimization of the mechanical properties of FSW using a Genetic Algorithm; and drawing conclusions on the prediction and optimization of the mechanical properties of FSW for aircraft fuselage aluminium. ACHIEVEMENTS The basic principles of friction welding of the welding...
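The final optimization step can be illustrated with a toy genetic algorithm searching over two welding parameters against a stand-in surrogate model. The quadratic "surrogate" and the parameter ranges below are assumptions for illustration; they replace the thesis's trained neuro-fuzzy/regression model.

```python
# Toy genetic algorithm maximizing a surrogate weld-property model over
# tool rotational speed and feed per revolution.
import numpy as np

rng = np.random.default_rng(6)

def surrogate_strength(speed, feed):
    """Stand-in property model; assumed peak near 1100 rpm and 0.15 mm/rev."""
    return -((speed - 1100) / 400) ** 2 - ((feed - 0.15) / 0.1) ** 2

bounds = np.array([[600.0, 1600.0],   # rotational speed range (rpm), assumed
                   [0.05, 0.4]])      # feed per revolution range (mm/rev), assumed

pop = rng.uniform(bounds[:, 0], bounds[:, 1], size=(30, 2))    # initial population
for gen in range(60):
    fitness = np.array([surrogate_strength(s, f) for s, f in pop])
    order = np.argsort(fitness)[::-1]
    parents = pop[order[:10]]                                   # keep the fittest
    children = []
    while len(children) < len(pop) - len(parents):
        a, b = parents[rng.integers(10, size=2)]
        child = (a + b) / 2 + rng.normal(0, [20.0, 0.01])       # crossover + mutation
        children.append(np.clip(child, bounds[:, 0], bounds[:, 1]))
    pop = np.vstack([parents, children])

best = pop[np.argmax([surrogate_strength(s, f) for s, f in pop])]
print(f"best speed ~ {best[0]:.0f} rpm, feed ~ {best[1]:.3f} mm/rev")
```

In the thesis the fitness would come from the trained model of yield strength, ultimate tensile strength and weld quality, and a multi-objective formulation would trade these properties off rather than maximizing a single score.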

Words: 9686 - Pages: 39