A lot of data is recorded in the time domain, which means each data point has the form
timestamp: value
A useful way to get insights into such data is to decompose the time series. That usually means you separate your data into
- seasonal
- trend
- residual
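To make the three components concrete, here is a minimal pure-Python sketch of a classical additive decomposition: a centered moving average for the trend, the average per position in the cycle for the seasonal part, and whatever is left as the residual. The function name `decompose_additive` and the toy data are assumptions made for illustration; this is a simplified version of the idea, not the code of any library.

```python
from statistics import mean


def decompose_additive(values, period):
    """Split a series into trend, seasonal and residual lists.

    Trend and residual are None at the edges, where the centered
    moving average window does not fit.
    """
    n = len(values)
    half = period // 2

    # Trend: centered moving average spanning one full period.
    trend = [None] * n
    for i in range(half, n - half):
        window = values[i - half:i + half + 1]
        total = sum(window)
        if period % 2 == 0:
            # Even period: the two end points only count half each.
            total -= 0.5 * (window[0] + window[-1])
        trend[i] = total / period

    # Seasonal: mean of the detrended values for each position in the cycle,
    # shifted so that the seasonal component averages to zero.
    detrended = [v - t if t is not None else None for v, t in zip(values, trend)]
    phase_means = [
        mean(d for d in detrended[p::period] if d is not None)
        for p in range(period)
    ]
    offset = mean(phase_means)
    seasonal = [phase_means[i % period] - offset for i in range(n)]

    # Residual: what is left after removing trend and seasonal part.
    residual = [
        v - t - s if t is not None else None
        for v, t, s in zip(values, trend, seasonal)
    ]
    return trend, seasonal, residual


# Toy series (made up): a linear trend plus a pattern repeating every 4 samples.
pattern = [2, -1, -3, 2]
values = [0.5 * i + pattern[i % 4] for i in range(24)]
trend, seasonal, residual = decompose_additive(values, period=4)
print(seasonal[:4])
```

On this toy series the seasonal part recovers the repeating pattern and the residual is essentially zero, because the data contains nothing but a straight line and a clean cycle. Real data will leave a much noisier residual.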
This decomposition, well known from R's `decompose` function, is available in Python via statsmodels since version 0.6. Yeah! Let’s take a look at it with parking lot data from the city of Dresden.
The Data
The Open Data guys of Dresden (@offenesdresden) collected the parking lot occupancy of a shopping mall called ‘Centrum-Galerie’ in the city of Dresden for over a year. After my talk at PyData 2015, a guy from New York came up to me (thank you!) and said I should decompose the data first and then try to predict the occupancy of the parking lots from the decomposed time series. I tried, but the results were not as good as with my approach (see talk video). Give it a try:
Centrum-Galerie-Belegung.csv
Nevertheless, at least this blog post came out of it.
Pandas Time Series Decomposition with Python
After loading the .csv with Pandas via
import pandas as pd

# Column names are German: 'Datum' = date, 'Belegung' = occupancy
centrumGalerie = pd.read_csv('Centrum-Galerie-Belegung.csv',
                             names=['Datum', 'Belegung'],
                             index_col=['Datum'],
                             parse_dates=True)
centrumGalerie.Belegung.plot()
we can simply decompose the data with statsmodels:
import statsmodels.api as sm
The `seasonal_decompose()` function needs a parameter called `freq` (renamed to `period` in newer statsmodels releases), which could in principle be inferred from the Pandas time series index, but that is not fully functional right now. So we have to specify it ourselves. The decomposition frequency must be an interval that ‘may’ repeat: an hour, a day, a week, or whatever one is interested in. Our data is stored at 15 min resolution and I want to see a weekly seasonality, so our `freq` is the number of samples per week:
\(\mathrm{decompfreq} = \frac{24\,\mathrm{h} \cdot 60\,\mathrm{min}}{15\,\mathrm{min}} \cdot 7\,\mathrm{days}\)
The Python implementation is this:
decompfreq = 24*60//15*7  # 672 samples per week; // keeps it an int under Python 3
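The same number can also be derived from the two durations directly, which avoids hand-written unit arithmetic and catches resolutions that do not divide the cycle evenly. A minimal sketch using only the standard library; the helper name `samples_per_cycle` is an assumption for illustration and not part of Pandas or statsmodels.

```python
from datetime import timedelta


def samples_per_cycle(resolution, cycle):
    """Return how many samples of length `resolution` fit into one `cycle`."""
    samples, remainder = divmod(cycle, resolution)
    if remainder:
        # A non-integer number of samples per cycle makes no sense here.
        raise ValueError("cycle is not a whole multiple of the resolution")
    return samples


# 15 min resolution, weekly seasonality (the values used in this post)
decompfreq = samples_per_cycle(timedelta(minutes=15), timedelta(weeks=1))
print(decompfreq)  # 672
```

For a daily seasonality at the same resolution, `samples_per_cycle(timedelta(minutes=15), timedelta(days=1))` gives 96.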
Now we can decompose the Pandas TimeSeries with statsmodels:
res = sm.tsa.seasonal_decompose(centrumGalerie.Belegung.interpolate(),
                                freq=decompfreq,
                                model='additive')
resplot = res.plot()
The resulting decomposed time series looks like this:
We chose `additive`, so adding Trend + Seasonal + Residual should give back the `Observed` series.
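Written out, with \(Y_t\) as the observed value at time \(t\) and \(T_t\), \(S_t\), \(R_t\) as trend, seasonal and residual component (symbol names chosen here for illustration), the additive model is

\[ Y_t = T_t + S_t + R_t \]

The alternative offered by `seasonal_decompose()` is `model='multiplicative'`, where the components are multiplied instead:

\[ Y_t = T_t \cdot S_t \cdot R_t \]

As a rule of thumb, the additive form fits when the seasonal swing stays roughly the same size over time, and the multiplicative form fits when the swing grows with the level of the series.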
Evaluation of the TimeSeries Decomposition
The most interesting component is the ‘Trend’, which clearly shows the impact of school holidays and Christmas in Germany. Apparently, a lot of people drove back into the city to return or exchange their Christmas presents after 24.12. One may ask what caused the huge increase in the trend at the end of April 2015. Well, let’s take a look at what happened next to the ‘Centrum-Galerie’, where a lot of parking spots were also located: the beginning of a huge construction site (sorry, German).
5 Comments
Thank you for the nice article. May I ask some questions?
1. Why is there a large gap in the graphs at around Dec 2014? Maybe Xmas holidays?
2. Does the gap not cause any problem in the process of decomposition?
I am dealing with a problem of missing data in a data file. That is, there is no data during holidays, but the gaps are not regular; some weeks have 5 days, while others have 4 or 3. If the freq is simply set to 5, I expect the result will not be correct, so some kind of data manipulation is required. But I don’t know how. If you know anything about this, any comment is very welcome. Thanks.
When we have the decomposition components, how do we predict future steps?
Please update more often, because I love your blog. Thank you very much!