
# Neural Networks for Time Series Forecasting: Practical Implications of Theoretical Results

Melinda Thielbar and D.A. Dickey

February 25, 2011

Research on the performance of neural networks in modeling nonlinear time series has produced mixed results. While neural networks have great potential because of their status as universal approximators (Hornik, Stinchcombe, and White 1989), their flexibility can lead to estimation problems. When Faraway and Chatfield (1998) used an autoregressive neural network to forecast airline data, they found that the neural networks they specified frequently would not converge. When they did converge, they failed to find the global minimum of the objective function. In some cases, neural networks that fit the in-sample data well performed poorly on hold-out samples. In the NN3 competition, a time series forecasting competition designed to showcase autoregressive neural networks and other computationally intensive forecasting methods, standard methods such as ARIMA models still outperformed autoregressive neural networks (Crone et al 2008).

A comparison of linear methods, smooth transition autoregressive methods, and autoregressive neural networks in Terasvirta (2005) may shed some light on neural network estimation problems and their generally poor performance. Terasvirta (2005) found that, when estimated without constraints, autoregressive neural networks tended to yield explosive forecasts; only by hand-tuning the models or applying a post-estimation filter to the resulting parameter estimates could the worst offenders be removed. In addition, while some researchers have claimed that autoregressive neural networks can estimate trend and seasonality (Gorr 1994, Sharda and Patil 1992), Zhang and Qi (2005) showed in an empirical study of simulated series with seasonality and trends that the neural network performed much better after the series was adjusted for trends and seasonality. As Faraway and Chatfield (1998) discovered, the autoregressive neural network cannot be treated as a “black box”.

In recent years, some researchers have attempted to open the box and better understand the properties of autoregressive neural networks. Trapletti, Leisch, and Hornik (2000) showed that an autoregressive neural network is stationary and ergodic under certain sufficient (but not necessary) regularity conditions. Leoni (2009) defined sufficient conditions under which the skeleton of an autoregressive neural network approaches a unique attraction point.

We propose to build on the results in Trapletti et al (2000) and Leoni (2009) by examining the practical aspects of forecasting with neural networks, including starting value selection, forecast performance, and the behavior of series generated from a neural network with known parameters. We begin by deriving some theoretical properties of an AR-NN with one lag. We then shift to simulated results and focus on an autoregressive neural network model with one lag and one hidden unit, where the noise term is distributed N(0, 1). We find that the general properties derived in the first section hold for our simple model, and that even when the AR-NN is reduced to its simplest form, the practical aspects of model estimation can still cause problems in using an AR-NN to forecast a nonlinear time series. We end with some general conclusions, including cautions about some of the pitfalls of AR-NN models and recommendations for avoiding them.

## 1 Statement of the Problem

Consider the one-lag autoregressive neural network (AR-NN):

$$Y_t = \alpha_0 + \rho Y_{t-1} + \sum_{j=1}^{k} \lambda_j g(\gamma_j r_{t-1,j}) + e_t \qquad (1)$$

where $r_{t-1,j} = Y_{t-1} - c_j$, $Y_{t-1}$ is the first lag of $Y_t$, and $\{c_1, \ldots, c_k\}$ are location parameters for the activation function $g$. The parameters $\{\gamma_1, \ldots, \gamma_k\}$ are slope parameters for the activation function, and the vector $\lambda = \{\lambda_1, \ldots, \lambda_k\}$ is a vector of weights.

Let the activation function $g$ be bounded and continuous, let $\rho$ satisfy $|\rho| < 1$, and let the $e_t$ be iid with a density that is positive everywhere on $(-\infty, \infty)$. These assumptions are necessary for the stability conditions shown in Trapletti et al (2000).
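To make the setup concrete, here is a minimal simulation sketch of the model in equation (1). The tanh activation and all numeric parameter values are illustrative assumptions, not estimates from this paper:

```python
import numpy as np

def simulate_arnn(n, alpha0, rho, lam, gamma, c, sigma=1.0, y0=0.0, seed=0):
    """Simulate Y_t = alpha0 + rho*Y_{t-1}
    + sum_j lam[j] * g(gamma[j] * (Y_{t-1} - c[j])) + e_t,
    with g = tanh (an assumption) and e_t ~ N(0, sigma^2) iid."""
    rng = np.random.default_rng(seed)
    lam, gamma, c = map(np.asarray, (lam, gamma, c))
    y = np.empty(n)
    prev = y0
    for t in range(n):
        hidden = np.sum(lam * np.tanh(gamma * (prev - c)))  # bounded: |g| <= 1
        prev = alpha0 + rho * prev + hidden + rng.normal(0.0, sigma)
        y[t] = prev
    return y

# One lag, one hidden unit (k = 1), N(0, 1) noise, matching the paper's
# simulation setting; the parameter values below are arbitrary.
series = simulate_arnn(500, alpha0=0.5, rho=0.6, lam=[1.2], gamma=[2.0], c=[0.0])
```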

### 1.1 Process Skeleton and Attractors

In Tong (1993), it is suggested that simulation is often necessary for studying nonlinear time series, as general analytical results are difficult to obtain and often require restrictive assumptions. Even with modern computing power, simulations for AR-NN models can quickly grow too large to be practical. The following results assume that the parameters of the AR-NN are given and examine the theoretical behavior of the series under the regularity conditions described above.

In Tong (1993), the behavior of a nonlinear time series is described in terms of the skeleton and the skeleton's equilibrium point(s). The skeleton is the relationship between $Y_t$ and $Y_{t-1}$ when the noise term $e_t$ is set identically equal to 0 for all $t$. Denote the skeleton of the series as $S_t$. The skeleton of the AR-NN described in (1) is:

$$S_t = \alpha_0 + \rho S_{t-1} + \sum_{j=1}^{k} \lambda_j g(\gamma_j r^s_{t-1,j}) \qquad (2)$$

where $r^s_{t-1,j} = S_{t-1} - c_j$. The skeleton has also been called the deterministic portion of the random process.
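Continuing the sketch above (same assumed tanh activation and illustrative parameters), the skeleton is the same recursion with the noise switched off; iterating it shows whether the deterministic path settles down:

```python
import numpy as np

def skeleton(n, alpha0, rho, lam, gamma, c, s0=0.0):
    """Iterate the skeleton S_t of equation (2): the AR-NN recursion
    with the noise term e_t set identically to 0 (g = tanh assumed)."""
    lam, gamma, c = map(np.asarray, (lam, gamma, c))
    s = np.empty(n)
    prev = s0
    for t in range(n):
        prev = alpha0 + rho * prev + np.sum(lam * np.tanh(gamma * (prev - c)))
        s[t] = prev
    return s

# With |rho| < 1 and bounded g, the path enters a bounded set within a
# finite number of steps; a stable equilibrium shows up as convergence.
print(skeleton(50, alpha0=0.5, rho=0.6, lam=[1.2], gamma=[2.0], c=[0.0])[-5:])
```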


We can rewrite $S_t$ as:

$$S_t = \left(\alpha_0 + \sum_{j=1}^{k} \lambda_j g(\gamma_j r^s_{t-1,j})\right)(1 - \rho B)^{-1} \qquad (3)$$

where $B$ is the backshift operator.

This expresses the skeleton as a weighted sum of the previous value of the skeleton and the nonlinear operation on the previous value of the skeleton. Because $g$ is a bounded function and $|\rho| < 1$, for $t$ large enough we can expect $S_t$ to be bounded:

$$S_t \in \left[\min\left(\alpha_0 + \sum_{j=1}^{k} \lambda_j g(\gamma_j r^s_{t-1,j})\right)(1-\rho)^{-1},\ \max\left(\alpha_0 + \sum_{j=1}^{k} \lambda_j g(\gamma_j r^s_{t-1,j})\right)(1-\rho)^{-1}\right] \qquad (4)$$

within a finite number of steps.

If $|g| \le 1$ (as with the logistic and hyperbolic tangent basis functions), then the above range becomes:

$$S_t \in \left[\left(\alpha_0 - \sum_{j=1}^{k} |\lambda_j|\right)(1-\rho)^{-1},\ \left(\alpha_0 + \sum_{j=1}^{k} |\lambda_j|\right)(1-\rho)^{-1}\right]$$

We have assumed that the noise term $e_t$ is iid, and therefore we know that for any $\epsilon > 0$ there exists $M$ such that $P(|e_t| \le M) > 1 - \epsilon$, i.e. that $e_t$ is bounded in probability. We can therefore place $e_t$ inside a finite range such that the probability of observing a value of $e_t$ outside this range can be shrunk arbitrarily close to 0. This allows us to stay within the conditions set in Trapletti et al (2000), yet have an expected range for the noise term.

The bounds on $S_t$ and $e_t$ imply that there is a practical range for $Y_t$ that depends on the activation function $g$, the parameters $\{\lambda_1, \ldots, \lambda_k\}$, and $M$. For a noise term that is normal with mean 0, a popular default distributional assumption in the statistical literature, and a basis function $g$ bounded such that $|g| \le 1$, taking $M = 3\sigma$ (which covers $e_t$ with probability roughly 0.997) gives the range:

$$Y_t \in \left[\left(\alpha_0 - \sum_{j=1}^{k} |\lambda_j|\right)(1-\rho)^{-1} - 3\sigma,\ \left(\alpha_0 + \sum_{j=1}^{k} |\lambda_j|\right)(1-\rho)^{-1} + 3\sigma\right]$$
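As a quick numeric check, the practical range follows directly from the displayed formula; the parameter values are the same illustrative assumptions used in the sketches above, with $\sigma = 1$:

```python
import numpy as np

def practical_range(alpha0, rho, lam, sigma=1.0, m=3.0):
    """Practical range for Y_t when |g| <= 1 and M = m*sigma:
    [(alpha0 - sum|lam|)/(1 - rho) - M, (alpha0 + sum|lam|)/(1 - rho) + M]."""
    half = np.sum(np.abs(lam))
    lo = (alpha0 - half) / (1.0 - rho) - m * sigma
    hi = (alpha0 + half) / (1.0 - rho) + m * sigma
    return lo, hi

print(practical_range(alpha0=0.5, rho=0.6, lam=[1.2]))  # (-4.75, 7.25)
```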

The practical range for $Y_t$ may be further reduced by the presence of equilibria and by whether those points are attractors for the series. We use the definitions of equilibrium and attraction point from Tong (1993).

A point $Y^*$ is an equilibrium point if it satisfies the condition:

$$Y^* = \alpha_0 + \rho Y^* + \sum_{j=1}^{k} \lambda_j g(\gamma_j (Y^* - c_j)) \qquad (5)$$

i.e., if it is a point where the output of the skeleton equals the value going into the skeleton. Once $S_t$ reaches $Y^*$, it remains at that value for all subsequent $t$. The long-run behavior of the AR-NN depends on the existence of $Y^*$ and on whether it is stable or unstable. We begin by deriving some basic properties of $Y^*$ for a series that meets the stability conditions set forth in Trapletti et al (2000).

Theorem 1. Suppose $Y_t$ is an AR-NN with one lag and an arbitrary number of hidden units that meets the stability conditions in Trapletti et al (2000). Then, if $Y^*$ exists, it lies within a finite range that depends on the weights of the activation function and the intercept term of the AR-NN.

Proof: We can re-arrange equation (5) as follows:

$$Y^* = \left(\alpha_0 + \sum_{j=1}^{k} \lambda_j g(\gamma_j (Y^* - c_j))\right)(1 - \rho)^{-1} \qquad (6)$$

The equilibrium point $Y^*$ is then a solution to equation (6), and must lie within the same bounds as those set for the skeleton in (4).
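A sketch of how one might locate equilibria numerically: scan the skeleton bounds from (4) for sign changes of the fixed-point equation and refine each bracket with a standard root finder. The tanh activation and parameter values are the same illustrative assumptions as above:

```python
import numpy as np
from scipy.optimize import brentq

def equilibria(alpha0, rho, lam, gamma, c, lo, hi, grid=2000):
    """Find solutions Y* of equation (5) by bracketing sign changes of
    f(y) = alpha0 + rho*y + sum_j lam[j]*tanh(gamma[j]*(y - c[j])) - y
    on [lo, hi], then refining each bracket with Brent's method."""
    lam, gamma, c = map(np.asarray, (lam, gamma, c))
    def f(y):
        return alpha0 + rho * y + np.sum(lam * np.tanh(gamma * (y - c))) - y
    ys = np.linspace(lo, hi, grid)
    vals = np.array([f(y) for y in ys])
    return [brentq(f, ys[i], ys[i + 1])
            for i in range(grid - 1) if vals[i] * vals[i + 1] < 0]

# With these example parameters the scan finds three equilibria inside the
# skeleton bounds; only some of them will be attractors.
print(equilibria(0.5, 0.6, [1.2], [2.0], [0.0], lo=-1.75, hi=4.25))
```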

The properties of the equilibrium, particularly whether it is unique and stable, now become critical to understanding the behavior of the series.
