30 April 2014

[R] Time series analysis

what is that?

A time series is a collection of observations from repeated measurements over time. Observations can be either discrete or continuous. Both types can be equally spaced, unequally spaced, or even have missing data. Discrete measurements can be recorded at any time interval, but are mostly taken at evenly spaced intervals. Continuous measurements can be spaced randomly in time. Examples of continuous time series are measurements of rainfall, stream discharge, spring discharge, or even earthquakes; examples of discrete time series are water quality and water temperature.
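Before any analysis it helps to know whether a record is evenly spaced and where values are missing. The following is a minimal Python sketch of that check, not a standard routine: the function name `find_gaps` and the daily water-temperature dates are hypothetical and invented for illustration.

```python
from datetime import date, timedelta

def find_gaps(dates, step):
    """Return the expected dates that are absent from a record sampled every `step`."""
    present = set(dates)
    missing = []
    current = min(dates)
    last = max(dates)
    while current <= last:
        if current not in present:
            missing.append(current)
        current += step
    return missing

# Hypothetical daily water-temperature record with two days missing
observed = [date(2014, 4, d) for d in (1, 2, 3, 5, 6, 8)]
gaps = find_gaps(observed, timedelta(days=1))
print("evenly spaced:", not gaps)
print("missing dates:", gaps)
```

An empty result means the record is evenly spaced at the chosen step; otherwise the listed dates have to be filled in or handled explicitly before fitting a model.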

why do we need it?

Every analysis has goals, and the goals of time series analysis are simply to understand and to forecast.

  • to understand : we need to understand the system that produced the observed data, in this case the hydrological system in a defined watershed.
  • to forecast : we need to make a prediction based on the trends in previous data. We'd like to forecast a future flood or a drop in groundwater level. Our assets are the historical data pattern and our knowledge of potential upcoming events that can affect the forecast itself.

Both goals are expressed as mathematical equations. You don't have to create your own; let's just use the ones we already know. Then why is it so important? Because we'd like to decide how to solve problems in the system and which measures to choose to manage the available water resources.

what terms do we use?

  • variance, autocovariance and partial autocovariance
  • correlation and autocorrelation
  • white noise
  • residuals
  • time series model
  • random walk
  • moving average
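Some of these terms are easier to grasp with numbers. The sketch below, written in Python with only the standard library, simulates white noise, builds a random walk from it, and computes the sample autocorrelation of each. The function name `autocorrelation`, the seed and the series length are assumptions chosen for illustration.

```python
import random

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative `lag`."""
    n = len(series)
    mean = sum(series) / n
    # Lag-0 autocovariance (the variance), used to normalise.
    c0 = sum((x - mean) ** 2 for x in series) / n
    # Autocovariance between the series and itself shifted by `lag`.
    ck = sum((series[t] - mean) * (series[t + lag] - mean)
             for t in range(n - lag)) / n
    return ck / c0

random.seed(42)  # fixed seed so the run is repeatable

# White noise: independent draws with zero mean and constant variance.
white_noise = [random.gauss(0, 1) for _ in range(500)]

# Random walk: each value is the previous value plus a white-noise step.
random_walk = []
level = 0.0
for step in white_noise:
    level += step
    random_walk.append(level)

print("lag-1 autocorrelation, white noise:", round(autocorrelation(white_noise, 1), 3))
print("lag-1 autocorrelation, random walk:", round(autocorrelation(random_walk, 1), 3))
```

White noise should give a lag-1 autocorrelation near zero, while the random walk should give a value close to one, because each value carries the whole history of earlier steps.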

how do we do it?

Time series are complex because each observation is somewhat dependent on the previous observation, just like a continuous chain reaction. Our challenge is therefore to extract the autocorrelation in the data, either to understand the trend or to model the underlying mechanisms.

Despite this complexity, we have to build the simplest possible model by setting an assumption or a set of assumptions. Remember Occam's razor. The first assumption is that a time series data set has at least one systematic pattern. The most common patterns are trends and seasonality. Trends are generally linear or quadratic. To find trends, moving averages or regression analysis are often used. Seasonality is a pattern that repeats itself systematically over time.
A second assumption is that the data exhibit enough random variation to make the systematic patterns hard to identify. Data filtering is often employed to dampen this noise.
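Both trend-finding tools can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a recommended workflow: the function names `moving_average` and `linear_trend` and the groundwater levels are hypothetical, and the "noise" is a deliberately simple alternating wiggle.

```python
def moving_average(series, window):
    """Simple moving average over every full window of `window` consecutive values."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

def linear_trend(series):
    """Least-squares fit of series[t] = intercept + slope * t for t = 0, 1, 2, ..."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope

# Hypothetical monthly groundwater levels (m): a steady decline of 0.1 m per
# month plus an alternating +/- 0.3 m wiggle standing in for noise.
levels = [10.0 - 0.1 * t + (0.3 if t % 2 == 0 else -0.3) for t in range(24)]

smoothed = moving_average(levels, 2)  # a window of 2 cancels the alternating wiggle
intercept, slope = linear_trend(levels)

print("first smoothed values:", [round(v, 2) for v in smoothed[:4]])
print("fitted slope (m per month):", round(slope, 3))
```

The moving average acts as the filter that dampens the noise, and the regression slope summarises the trend in a single number.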

A forecasting task usually involves five basic steps (Hyndman and Athanasopoulos, 2012):

Step 1: Problem definition. Often this is the most difficult part of forecasting. Defining the problem carefully requires an understanding of the way the forecasts will be used, who requires the forecasts, and how the forecasting function fits within the organization requiring the forecasts. A forecaster needs to spend time talking to everyone who will be involved in collecting data, maintaining databases, and using the forecasts for future planning.

Step 2: Gathering information. There are always at least two kinds of information required: (a) statistical data, and (b) the accumulated expertise of the people who collect the data and use the forecasts. Often, it will be difficult to obtain enough historical data to be able to fit a good statistical model. However, occasionally, very old data will be less useful due to changes in the system being forecast.

Step 3: Preliminary (exploratory) analysis. Always start by graphing the data. Are there consistent patterns? Is there a significant trend? Is seasonality important? Is there evidence of the presence of business cycles? Are there any outliers in the data that need to be explained by those with expert knowledge? How strong are the relationships among the variables available for analysis? Various tools have been developed to help with this analysis. These are discussed in Chapters 2 and 6 of the book.

Step 4: Choosing and fitting models. The best model to use depends on the availability of historical data, the strength of relationships between the forecast variable and any explanatory variables, and the way the forecasts are to be used. It is common to compare two or three potential models. Each model is itself an artificial construct that is based on a set of assumptions (explicit and implicit) and usually involves one or more parameters which must be "fitted" using the known historical data. The book discusses regression models (Chapters 4 and 5), exponential smoothing methods (Chapter 7), Box-Jenkins ARIMA models (Chapter 8), and a variety of other topics including dynamic regression models, neural networks, and vector autoregression in Chapter 9.
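To make "fitting" concrete, here is a minimal Python sketch that fits simple exponential smoothing by grid search and compares it with a naive "last value" model. It is not the book's procedure: initialising the level at the first observation, the grid of alpha values, the function names and the discharge figures are all assumptions made for illustration.

```python
def ses_one_step_errors(series, alpha):
    """One-step-ahead forecast errors of simple exponential smoothing."""
    level = series[0]  # assumption: start the level at the first observation
    errors = []
    for y in series[1:]:
        errors.append(y - level)  # the forecast for this step is the current level
        level = alpha * y + (1 - alpha) * level
    return errors

def sse(errors):
    """Sum of squared errors."""
    return sum(e * e for e in errors)

def fit_ses(series, grid=None):
    """Choose alpha from `grid` by minimising the in-sample one-step SSE."""
    if grid is None:
        grid = [i / 100 for i in range(1, 101)]  # 0.01, 0.02, ..., 1.00
    return min(grid, key=lambda a: sse(ses_one_step_errors(series, a)))

# Hypothetical spring discharge record (L/s), invented for illustration
discharge = [5.0, 5.4, 4.7, 5.2, 4.9, 5.3, 4.8, 5.1, 5.0, 5.2, 4.6, 5.1]

best_alpha = fit_ses(discharge)
sse_ses = sse(ses_one_step_errors(discharge, best_alpha))
# alpha = 1 reduces the method to the naive model: forecast = last observed value
sse_naive = sse(ses_one_step_errors(discharge, 1.0))

print("fitted alpha:", best_alpha)
print("in-sample SSE, smoothing:", round(sse_ses, 3))
print("in-sample SSE, naive:", round(sse_naive, 3))
```

Because the naive model is the special case alpha = 1, the fitted model can never have a larger in-sample error; whether it forecasts better on new data is a separate question.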

Step 5: Using and evaluating a forecasting model. Once a model has been selected and its parameters estimated, the model is used to make forecasts. The performance of the model can only be properly evaluated after the data for the forecast period have become available. A number of methods have been developed to help in assessing the accuracy of forecasts. There are also organizational issues in using and acting on the forecasts. A brief discussion of some of these issues is in Chapter 2 of the book.
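Two common accuracy measures are the mean absolute error (MAE) and the root mean squared error (RMSE). The Python sketch below applies them to two very simple forecasts once the "future" observations are in. The numbers are hypothetical, and the naive and mean forecasts are stand-ins chosen for brevity, not models recommended by the book.

```python
import math

def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

# Hypothetical record: `history` was available when forecasting,
# `later` only became available afterwards.
history = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0]
later = [14.0, 13.0, 15.0, 12.0]

# Naive forecast: repeat the last observed value.
naive_forecast = [history[-1]] * len(later)
# Mean forecast: repeat the historical average.
mean_forecast = [sum(history) / len(history)] * len(later)

for name, forecast in (("naive", naive_forecast), ("mean", mean_forecast)):
    print(name, "MAE:", round(mae(later, forecast), 3),
          "RMSE:", round(rmse(later, forecast), 3))
```

The forecast with the smaller error on the later data is the better performer for this record; the comparison says nothing about how either model would do on a different series.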

References:

  • http://www.itl.nist.gov/
  • http://userwww.sfsu.edu/
  • http://www.public.iastate.edu/~alicia/stat328/Time%20Series.pdf
  • http://math.ucr.edu/home/baez/physics/General/occam.html
  • Hyndman, RJ and Athanasopoulos, G 2012, Forecasting: principles and practice, available at https://www.otexts.org/fpp, accessed 30 April 2014.
