I want to detect seasonality in data that I receive. I have found some methods, like the seasonal subseries plot and the autocorrelation plot, but I don't understand how to read the graphs. Could anyone help? Also, are there other methods to detect seasonality, with or without a graphical result?
-
You might include the actual graph you are having trouble understanding. – Karl, Sep 27, 2011 at 19:02
-
Better still, the original data that can be used to generate the "troublesome" ACF. – IrishStat, Sep 27, 2011 at 19:16
-
See stats.stackexchange.com/q/1207/159 – Rob Hyndman, Jul 21, 2013 at 1:21
-
See Qian, C., Z. Wu, C. Fu, and D. Wang, 2011: On changing El Niño: a view from time-varying annual cycle, interannual variability and mean state. J. Climate, 24(24), 6486–6500. journals.ametsoc.org/doi/abs/10.1175/JCLI-D-10-05012.1 – user59867, Nov 3, 2014 at 12:40
6 Answers
A really good way to find periodicity in any regular series of data is to inspect its power spectrum after removing any overall trend. (This lends itself well to automated screening when the total power is normalized to a standard value, such as unity.) The preliminary trend removal (and optional differencing to remove serial correlation) is essential to avoid confounding periods with other behaviors.
The power spectrum is the discrete Fourier transform of the autocovariance function of an appropriately smoothed version of the original series. If you think of the time series as sampling a physical waveform, you can estimate how much of the wave's total power is carried within each frequency. The power spectrum (or periodogram) plots the power versus frequency. Cyclic (that is, repetitive or seasonal) patterns will show up as large spikes located at their frequencies.
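As a rough sketch of this recipe (in Python, not the Mathematica used for the answer's figures; the function name and details here are mine, not the answer's), computing a periodogram amounts to detrend, FFT, square, normalize:

```python
import numpy as np

def periodogram(x):
    """Detrend a series linearly, then return (frequency, normalized power)."""
    n = len(x)
    t = np.arange(n)
    slope, intercept = np.polyfit(t, x, 1)     # remove any overall linear trend
    resid = x - (slope * t + intercept)
    power = np.abs(np.fft.rfft(resid)) ** 2    # squared FFT magnitudes
    power[0] = 0.0                             # drop the mean term
    freqs = np.arange(len(power))              # cycles per full series length
    return freqs, power / power.sum()          # normalize total power to 1

# A year of daily data with a 12-cycles-per-year component plus a trend:
t = np.arange(365)
x = 0.01 * t + np.sin(2 * np.pi * 12 * t / 365)
freqs, p = periodogram(x)
print(freqs[np.argmax(p)])   # → 12
```

Without the detrending step, the trend's power would leak into the low-frequency bins and could swamp genuine seasonal spikes.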
As an example, consider this (simulated) time series of residuals from a daily measurement taken for one year (365 values).
The values fluctuate around $0$ without any evident trends, showing that all important trends have been removed. The fluctuation appears random: no periodicity is apparent.
Here's another plot of the same data, drawn to help us see possible periodic patterns.
If you look really hard, you might be able to discern a noisy but repetitive pattern that occurs 11 to 12 times. The longish sequences of above-zero and below-zero values at least suggest some positive autocorrelation, showing this series is not completely random.
Here's the periodogram, shown for periods up to 91 (one-quarter of the total series length). It was constructed with a Welch window and normalized to unit area (for the entire periodogram, not just the part shown here).
The power looks like "white noise" (small random fluctuations) plus two prominent spikes. They're hard to miss, aren't they? The larger occurs at a frequency of 12 cycles per year (a monthly period) and the smaller at 52 cycles per year (a weekly period). This method has thereby detected a monthly cycle and a weekly cycle in these data. That's really all there is to it. To automate the detection of cycles ("seasonality"), just scan the periodogram (which is a list of values) for relatively large local maxima.
It's time to reveal how these data were created.
The values are generated from a sum of two sine waves, one with frequency 12 (of squared amplitude 3/4) and another with frequency 52 (of squared amplitude 1/4). These are what the spikes in the periodogram detected. Their sum is shown as the thick black curve. Iid Normal noise of variance 2 was then added, as shown by the light gray bars extending from the black curve to the red dots. This noise introduced the low-level wiggles at the bottom of the periodogram, which otherwise would just be a flat 0. Fully two-thirds of the total variation in the values is non-periodic and random, which is very noisy: that's why it's so difficult to make out the periodicity just by looking at the dots. Nevertheless (in part because there's so much data) finding the frequencies with the periodogram is easy and the result is clear.
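A sketch reconstructing data like these (same frequencies and sinusoid powers as described; the random seed and draws are mine, so the exact values differ from the figures) confirms that the two spikes are recoverable simply by taking the largest periodogram values:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 365
t = np.arange(n)

# Two sinusoids with powers 3/4 and 1/4 (the power of A*sin is A^2/2),
# at 12 and 52 cycles per year, plus iid normal noise of variance 2.
signal = (np.sqrt(2 * 3 / 4) * np.sin(2 * np.pi * 12 * t / n)
          + np.sqrt(2 * 1 / 4) * np.sin(2 * np.pi * 52 * t / n))
x = signal + rng.normal(scale=np.sqrt(2), size=n)

# Periodogram: squared FFT magnitudes, with the mean term dropped.
power = np.abs(np.fft.rfft(x)) ** 2
power[0] = 0.0

# "Scan for relatively large local maxima": here, simply the top two bins.
top2 = set(np.argsort(power)[-2:])
print(sorted(top2))   # → [12, 52]
```

The signal-to-noise ratio matches the text: the noise variance (2) is two-thirds of the total variance (3), yet the spikes still stand far above the noise floor.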
Instructions and good advice for computing periodograms appear on the Numerical Recipes site: look for the section on "power spectrum estimation using the FFT." R has code for periodogram estimation (e.g., spec.pgram in the stats package). These illustrations were created in Mathematica 8; the periodogram was computed with its Fourier function.
-
The assumption "after removing any overall trend" is the Achilles' heel, as there may be many time trends and many level shifts, all of which were excluded in your example. The idea that the input series are deterministic in nature flies in the face of the possible presence of seasonal and regular ARIMA structure. Untreated unusual one-time values will distort any periodogram-based identification scheme due to a downward bias in the periodogram estimates, yielding non-significance. If weekly and/or monthly effects changed at some point in the past, the periodogram-based procedure would fail. – IrishStat, Sep 29, 2011 at 0:06
-
@Irish I think your comment may exaggerate somewhat. It is most elementary to look for and treat "Unusual One-Time Values" (aka outliers), so this only bears mentioning to emphasize that some time series estimators may be sensitive to outliers. "Deterministic in nature" misrepresents the basic ideas: nobody supposes there is determinism (as evidenced by the huge amount of noise in the simulation). The simulation incorporates a definite periodic signal as a model--always approximate in reality--only to illustrate the connection between the periodogram and seasonality. (Continued...) – whuber ♦, Sep 29, 2011 at 16:41
-
Yes, changes in seasonality can obscure the periodogram (and the acf, etc.), especially changes in frequency (unlikely) or phase (possible). The references in my post give a solution to handle that: they recommend using a moving window for periodogram estimation. There's an art to this, and clearly there are pitfalls, so that much time series analysis will benefit from expert treatment, as you advocate. But the question asks if there are "other methods to detect seasonality", and undeniably the periodogram is a statistically powerful, computationally efficient, readily interpretable option. – whuber ♦, Sep 29, 2011 at 16:46
-
In my world, sines/cosines are "deterministic effects," much like month-of-the-year indicators. Fitting any pre-specified model restricts the fitted values to a user-specified pattern, which is often sub-standard. The data should be "listened to," helping the analyst/advanced computer software to effectively discern between fixed and stochastic inputs. N.b. I refer to ARIMA lag structures as stochastic or adaptive "drivers," as the fitted values adjust/adapt to changes in the history of the series. In my opinion the utilization of the periodogram "oversells" simple statistical modelling. (Sep 29, 2011 at 17:44)
-
@whuber Repeating the same thing might not be useful. However, it might be nice to fix the paragraph below the periodogram to say the spikes are located at a "frequency of" 12 and 52 times per year, not a "period of". Fixing the plot to say "frequency" instead of "period" might be nice as well, if you think it's not too annoying. – Celelibi, Oct 11, 2016 at 15:29
Here's an example using monthly data on log unemployment claims from a city in New Jersey (from Stata, only because that's what I analyzed these data in originally).
The heights of the lines indicate the correlation between the variable and the sth lag of itself; the gray area gives you a sense of whether this correlation is significant (this range is only a guide and isn't the most reliable way to test significance). If this correlation is high, there is evidence of serial correlation. Note the humps that occur around lags 12, 24, and 36. Since this is monthly data, this suggests that the correlation gets stronger when you look at lags of exactly 1, 2, or 3 years. This is evidence of monthly seasonality.
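To illustrate (with a simulated monthly series, not the New Jersey claims data; the helper function and numbers are mine), a sample ACF exhibiting these humps at lags 12, 24, and 36 can be computed as:

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of a series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom
                     for k in range(1, max_lag + 1)])

# A toy monthly series with an annual seasonal pattern plus noise.
rng = np.random.default_rng(1)
months = np.arange(240)                  # twenty years of monthly data
y = 2 * np.sin(2 * np.pi * months / 12) + rng.normal(size=240)

r = sample_acf(y, 36)
# r[11] is the lag-12 correlation: strongly positive (same calendar month),
# while r[5] (lag 6, the opposite season) is strongly negative.
```

The rough significance band shown as the gray area is usually ±1.96/√n, so here any |r| above about 0.13 stands out.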
You can test these relationships statistically by regressing the variable on dummy variables indicating the seasonality component---here, month dummies. You can test the joint significance of those dummies to test for seasonality.
This procedure isn't quite right, as the test requires that the error terms not be serially correlated. So, before testing these seasonality dummies, we need to remove the remaining serial correlation (typically by including lags of the variable). There may be pulses, breaks, and all the other time series problems that you need to correct as well to get the appropriate results from the test. You didn't ask about those, so I won't go into detail (plus, there are a lot of CV questions on those topics). (Just to feed your curiosity, this series requires the month dummies, a single lag of itself, and a shift component to get rid of the serial correlation.)
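The dummy-variable test can be sketched as follows (on simulated monthly data, with one lag of the dependent variable included to remove serial correlation, as described; the series, names, and numbers are illustrative only, not the claims data):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 240
month = np.arange(n) % 12

# Simulated series: AR(1) dynamics plus a fixed monthly pattern.
seasonal = np.array([0.0, .1, .3, .2, 0, -.2, -.4, -.3, -.1, 0, .2, .4])
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + seasonal[month[t]] + 0.2 * rng.normal()

# Unrestricted model: constant + lagged y + 11 month dummies (January base).
lag = y[:-1]
yy = y[1:]
m = month[1:]
dummies = np.column_stack([(m == j).astype(float) for j in range(1, 12)])
X_u = np.column_stack([np.ones(n - 1), lag, dummies])
X_r = np.column_stack([np.ones(n - 1), lag])     # restricted: no dummies

def ssr(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return e @ e

q = 11                                  # number of restrictions (dummies)
df = (n - 1) - X_u.shape[1]             # residual degrees of freedom
F = ((ssr(X_r, yy) - ssr(X_u, yy)) / q) / (ssr(X_u, yy) / df)
print(F > 1.8)   # → True; compare to the F(11, df) 5% critical value, about 1.8
```

A large F statistic rejects the joint hypothesis that all month-dummy coefficients are zero, i.e., it is evidence of seasonality.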
Seasonality can and does often change over time, so summary measures can be quite inadequate to detect the structure. One needs to test for transience in ARIMA coefficients and often for changes in the "seasonal dummies". For example, over a 10-year horizon there may have been no June effect for the first k years, while for the last 10 − k years there is evidence of a June effect; a single composite June effect might then be non-significant because the effect was not constant over time. In a similar manner, a seasonal ARIMA component may also have changed.

Care should be taken to include local level shifts and/or local time trends while ensuring that the variance of the errors has remained constant over time. One should not evaluate transformations like GLS/weighted least squares, or power transformations like logs/square roots, on the original data but on the errors from a tentative model. The Gaussian assumptions have nothing whatsoever to do with the observed data but everything to do with the errors from the model, because the underlying statistical tests use the ratio of a non-central chi-square variable to a central chi-square variable.
If you wanted to post an example series from your world I would be glad to provide you and the list a thorough analysis leading to the detection of the seasonal structure.
The continuous wavelet transform can show seasonality as well. Because the periodogram assumes that the seasonality is stationary, a wavelet analysis can be better than a periodogram: it allows the seasonality to change over time. Just as the periodogram decomposes the time series into sine and cosine waves of different frequencies and calculates the power at each frequency, the continuous wavelet transform decomposes the time series into Morlet wavelets of different frequencies and calculates the power of the series at each frequency as a function of time.
Here is an example of a wavelet spectrum. We can see there is a strong signal at a frequency of 0.02/kyr during 0–400 kyr.
One issue with wavelets is edge effects: near the ends of the time series there is not enough data to compute the transform at long periods (for example, the first 100 days cannot support a 500-day cycle), so the wavelet spectrum there is not accurate. For this reason a cone of influence is drawn (the dashed line in the wavelet plot above), and only the spectrum within the cone of influence is reliable.
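A minimal Morlet CWT sketch (the function and test signal are mine, not from the resources below), assuming the standard scale-period relation s = ω₀P/(2π):

```python
import numpy as np

def morlet_cwt_power(x, periods, omega0=6.0):
    """CWT power |W(s, t)|^2 of a series using a Morlet wavelet.

    `periods` are the Fourier periods of interest; each maps to a scale
    s = omega0 * P / (2 * pi).
    """
    n = len(x)
    power = np.empty((len(periods), n))
    for i, P in enumerate(periods):
        s = omega0 * P / (2 * np.pi)
        half = int(4 * s)                  # support out to ~4 standard deviations
        t = np.arange(-half, half + 1)
        psi = (np.pi ** -0.25 / np.sqrt(s)
               * np.exp(1j * omega0 * t / s) * np.exp(-0.5 * (t / s) ** 2))
        # psi(-t) = conj(psi(t)), so convolution here equals the CWT integral.
        w = np.convolve(x, psi, mode="same")
        power[i] = np.abs(w) ** 2
    return power

# A series whose cycle length changes: period 20 for the first half,
# period 50 for the second -- a periodogram would blur these together.
t = np.arange(1000)
x = np.where(t < 500, np.sin(2 * np.pi * t / 20), np.sin(2 * np.pi * t / 50))

power = morlet_cwt_power(x, periods=[20, 50])
# Away from the edges and the switch point, power concentrates at the
# period that is active in each half of the series.
```

Note how the edge effects described above appear here too: the convolution pads with zeros beyond the ends, so the power near the boundaries (and at long periods) is biased down, which is exactly what the cone of influence marks.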
Helpful resources: https://en.wikipedia.org/wiki/Continuous_wavelet_transform https://www.youtube.com/watch?v=GV34hKXDw_c&t=189s
Picture comes from: http://mres.uni-potsdam.de/index.php/2017/03/02/calculating-the-continuous-1-d-wavelet-transform-with-the-new-function-cwt-update/
Charlie's answer is good, and it's where I'd start. If you don't want to use ACF graphs, you could create k-1 dummy variables for the k time periods present. Then you can see if the dummy variables are significant in a regression with the dummy variables (and likely a trend term).
If your data is quarterly:

dummy Q2 is 1 if this is the second quarter, else 0
dummy Q3 is 1 if this is the third quarter, else 0
dummy Q4 is 1 if this is the fourth quarter, else 0

Note quarter 1 is the base case (all three dummies zero).
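Constructing those dummies is straightforward; a sketch with a hypothetical vector of quarter labels:

```python
import numpy as np

quarter = np.array([1, 2, 3, 4, 1, 2, 3, 4])   # quarter of each observation

# k - 1 = 3 dummies; quarter 1 is the base case (all three columns zero).
Q2 = (quarter == 2).astype(int)
Q3 = (quarter == 3).astype(int)
Q4 = (quarter == 4).astype(int)
X = np.column_stack([Q2, Q3, Q4])
print(X[1])   # → [1 0 0]  (the second observation falls in Q2)
```

These columns, together with a constant and perhaps a trend term, form the regressors for the significance test described above.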
You might want to also check out "time series decomposition" in Minitab -- often called "classical decomposition". In the end, you may want to use something more modern, but this is a simple place to start.
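A minimal sketch of that classical decomposition on simulated quarterly data (the series and its seasonal pattern are made up for illustration): estimate the trend with a centered moving average, then average the detrended values by season:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 48                                   # twelve years of quarterly data
t = np.arange(n)
seasonal_true = np.tile([5.0, -2.0, -4.0, 1.0], n // 4)
x = 0.5 * t + seasonal_true + rng.normal(scale=0.5, size=n)

# Step 1: estimate the trend with a centered 2x4 moving average
# (the standard choice for quarterly data: it spans exactly one year).
kernel = np.array([0.5, 1, 1, 1, 0.5]) / 4
trend = np.convolve(x, kernel, mode="valid")   # loses 2 points at each end
detrended = x[2:-2] - trend

# Step 2: average the detrended values by quarter to get seasonal indices.
quarters = t[2:-2] % 4
indices = np.array([detrended[quarters == q].mean() for q in range(4)])
indices -= indices.mean()                # center the indices at zero
# indices recovers something close to the true pattern [5, -2, -4, 1]
```

Large, stable seasonal indices relative to the residual noise are the "simple place to start" evidence of seasonality that this answer suggests.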
I'm a bit new to R myself, but my understanding of the ACF function is that if a vertical line goes above the top dashed line or below the bottom dashed line, there is some autocorrelation (which may, but need not, reflect seasonality). Try creating a vector of sine values and inspecting its ACF to see what a purely periodic series looks like.
-
Fitting sines/cosines etc. can be useful for some physical/electrical time series, but you must be aware of MSB, Model Specification Bias. (Sep 28, 2011 at 14:31)
-
Autoregression does not imply seasonality. – Jens, Nov 22, 2013 at 12:32