import warnings
import itertools
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.api as sm
import matplotlib
matplotlib.rcParams['axes.labelsize'] = 14
matplotlib.rcParams['xtick.labelsize'] = 12
matplotlib.rcParams['ytick.labelsize'] = 12
matplotlib.rcParams['text.color'] = 'k'
We are going to do time series analysis and forecasting for furniture sales.
df = pd.read_excel("Superstore.xls")
furniture = df.loc[df['Category'] == 'Furniture']
The furniture data covers four years of sales, from January 2014 through December 2017.
furniture['Order Date'].min()
Timestamp('2014-01-06 00:00:00')
furniture['Order Date'].max()
Timestamp('2017-12-30 00:00:00')
This step includes removing the columns we do not need, checking for missing values, aggregating sales by date, and so on.
cols = ['Row ID', 'Order ID', 'Ship Date', 'Ship Mode', 'Customer ID', 'Customer Name', 'Segment', 'Country', 'City', 'State', 'Postal Code', 'Region', 'Product ID', 'Category', 'Sub-Category', 'Product Name', 'Quantity', 'Discount', 'Profit']
# Reassign rather than drop in place, which avoids a SettingWithCopyWarning on the filtered slice
furniture = furniture.drop(columns=cols)
furniture = furniture.sort_values('Order Date')
furniture.isnull().sum()
Order Date    0
Sales         0
dtype: int64
furniture = furniture.groupby('Order Date')['Sales'].sum().reset_index()
furniture.head()
| Order Date | Sales |
0 | 2014-01-06 | 2573.820 |
1 | 2014-01-07 | 76.728 |
2 | 2014-01-10 | 51.940 |
3 | 2014-01-11 | 9.940 |
4 | 2014-01-13 | 879.939 |
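The cleaning steps above can also be written by selecting only the two columns the analysis needs, instead of listing every column to drop. A minimal sketch on a small made-up frame (the `toy` rows and variable names are illustrative and not taken from Superstore.xls):

```python
import pandas as pd

# Toy stand-in for the Superstore data; all values are made up for illustration.
toy = pd.DataFrame({
    'Order Date': pd.to_datetime(['2014-01-06', '2014-01-06', '2014-01-07', '2014-01-07']),
    'Category': ['Furniture', 'Furniture', 'Furniture', 'Office Supplies'],
    'Sales': [100.0, 50.0, 20.0, 5.0],
})

# Keep only the furniture rows and the two columns the analysis needs.
toy_furniture = toy.loc[toy['Category'] == 'Furniture', ['Order Date', 'Sales']]

# Count missing values per column, then total the sales for each order date.
missing = toy_furniture.isnull().sum()
daily = toy_furniture.groupby('Order Date')['Sales'].sum().reset_index()
print(missing)
print(daily)
```

Selecting the columns inside `.loc` returns a new frame, so no in-place modification of a slice is involved.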
furniture = furniture.set_index('Order Date')
furniture.index
DatetimeIndex(['2014-01-06', '2014-01-07', '2014-01-10', '2014-01-11',
               '2014-01-13', '2014-01-14', '2014-01-16', '2014-01-19',
               '2014-01-20', '2014-01-21',
               ...
               '2017-12-18', '2017-12-19', '2017-12-21', '2017-12-22',
               '2017-12-23', '2017-12-24', '2017-12-25', '2017-12-28',
               '2017-12-29', '2017-12-30'],
              dtype='datetime64[ns]', name='Order Date', length=889, freq=None)
Our current datetime data can be tricky to work with because the order dates are irregularly spaced. Therefore, we will use the average daily sales value for each month instead, with the start of each month as the timestamp.
y = furniture['Sales'].resample('MS').mean()
Let's take a quick peek at the 2017 sales data.
y['2017':]
Order Date
2017-01-01     397.602133
2017-02-01     528.179800
2017-03-01     544.672240
2017-04-01     453.297905
2017-05-01     678.302328
2017-06-01     826.460291
2017-07-01     562.524857
2017-08-01     857.881889
2017-09-01    1209.508583
2017-10-01     875.362728
2017-11-01    1277.817759
2017-12-01    1256.298672
Freq: MS, Name: Sales, dtype: float64
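To make the monthly resampling step concrete, here is a minimal sketch on a tiny made-up series (the dates and values are illustrative only). It shows that `'MS'` labels each group with the first day of its month:

```python
import pandas as pd

# Irregular daily observations (made-up values): two in January, one in February.
idx = pd.to_datetime(['2017-01-03', '2017-01-20', '2017-02-10'])
daily_sales = pd.Series([100.0, 300.0, 50.0], index=idx)

# 'MS' groups by calendar month and labels each group with the month start.
monthly_mean = daily_sales.resample('MS').mean()
print(monthly_mean)
```

Note that the mean is taken over the days that actually appear in the index, so days without any orders do not count as zero-sales days.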
y.plot(figsize=(15, 6))
Some distinguishable patterns appear when we plot the data. The time series has a seasonal pattern: sales are always low at the beginning of the year and high at the end of the year. Within any single year there is a strong upward trend, with a couple of low months in the middle of the year.
We can also visualize our data using a method called time-series decomposition that allows us to decompose our time series into three distinct components: trend, seasonality, and noise.
from pylab import rcParams
rcParams['figure.figsize'] = 18, 8
decomposition = sm.tsa.seasonal_decompose(y, model='additive')
fig = decomposition.plot()
The plot above clearly shows that furniture sales are unstable and have an obvious seasonality.
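To show what an additive decomposition computes, here is a simplified classical version written by hand with pandas on a synthetic series. It is a sketch of the general idea (centered moving average for the trend, per-month averages for the seasonal part) and is not claimed to match `seasonal_decompose` line for line. The series and all variable names are made up:

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: linear trend plus a repeating 12-month pattern (made up).
index = pd.date_range('2014-01-01', periods=48, freq='MS')
pattern = np.array([-3, -2, -1, 0, 1, 2, 3, 2, 1, 0, -1, -2], dtype=float)
series = pd.Series(0.5 * np.arange(48) + np.tile(pattern, 4), index=index)

# Trend: centered 2x12 moving average (12-month mean, then a 2-term mean to centre it).
trend = series.rolling(12).mean().rolling(2).mean().shift(-6)

# Seasonal: average the detrended values by calendar month, centred to average zero.
detrended = series - trend
month_means = detrended.groupby(detrended.index.month).mean()
month_means = month_means - month_means.mean()
seasonal = pd.Series(index.month, index=index).map(month_means)

# Residual (noise): whatever the trend and seasonal parts leave unexplained.
resid = series - trend - seasonal
print(resid.dropna().abs().max())
```

Because the synthetic series contains no noise, the residual is zero up to floating-point error. On real data such as the furniture series, the residual is where the irregular month-to-month variation ends up.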
We are going to apply one of the most commonly used methods for time series forecasting, known as ARIMA, which stands for Autoregressive Integrated Moving Average.
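The "Integrated" part of ARIMA refers to differencing the series before the autoregressive and moving-average terms are fitted. In a seasonal model, the d and D entries of the parameter tuples count regular and seasonal differences. A minimal sketch on a synthetic series (values made up) of what one seasonal difference at lag 12 and one regular difference do:

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (made up): linear trend plus a fixed 12-month pattern.
t = np.arange(36)
pattern = np.tile(np.array([5, 3, 0, -2, -4, -5, -4, -2, 0, 2, 3, 4], dtype=float), 3)
series = pd.Series(2.0 * t + pattern)

# Seasonal differencing (D=1 with period 12) removes the repeating yearly pattern.
seasonal_diff = series.diff(12)

# Regular differencing (d=1) then removes what is left of the trend.
both_diff = seasonal_diff.diff(1)

print(seasonal_diff.dropna().unique())
print(both_diff.dropna().unique())
```

After the seasonal difference only a constant remains (twelve months of trend), and the additional regular difference reduces it to zero.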
Parameter Selection for the ARIMA Time Series Model
p = d = q = range(0, 2)
pdq = list(itertools.product(p, d, q))
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p, d, q))]
print('Examples of parameter combinations for Seasonal ARIMA...')
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[1]))
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[2]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[3]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[4]))
Examples of parameter combinations for Seasonal ARIMA...
SARIMAX: (0, 0, 1) x (0, 0, 1, 12)
SARIMAX: (0, 0, 1) x (0, 1, 0, 12)
SARIMAX: (0, 1, 0) x (0, 1, 1, 12)
SARIMAX: (0, 1, 0) x (1, 0, 0, 12)
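for param in pdq:
    for param_seasonal in seasonal_pdq:
        # Fit one seasonal ARIMA model per (p, d, q) x (P, D, Q, 12) combination
        mod = sm.tsa.statespace.SARIMAX(y,
                                        order=param,
                                        seasonal_order=param_seasonal)
        results = mod.fit()
        print('ARIMA{}x{}12 - AIC:{}'.format(param, param_seasonal, results.aic))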
RUNNING THE L-BFGS-B CODE

* * *

Machine precision = 2.220D-16
 N =            1     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  8.15333D+00    |proj g|=  1.77636D-10

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    1      0      1      0     0     0   1.776D-10   8.153D+00
  F =   8.1533264604570608     

CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL

ARIMA(0, 0, 0)x(0, 0, 0, 12)12 - AIC:784.7193402038779
ARIMA(0, 0, 0)x(0, 0, 1, 12)12 - AIC:1902.5456079381986
ARIMA(0, 0, 0)x(0, 1, 0, 12)12 - AIC:495.37090274829427
ARIMA(0, 0, 0)x(0, 1, 1, 12)12 - AIC:489.8305326463162
ARIMA(0, 0, 0)x(1, 0, 0, 12)12 - AIC:691.7786646579052
This problem is unconstrained.
Bad direction in the line search; refresh the lbfgs memory and restart the iteration.
Line search cannot locate an adequate point after MAXLS function and gradient evaluations. Previous x, f and g restored.
Possible causes: 1 error in function or gradient evaluation; 2 rounding error dominate computation.
/Users/thomas/miniforge3/lib/python3.9/site-packages/statsmodels/base/ ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
  warnings.warn("Maximum Likelihood optimization failed to "
ARIMA(0, 0, 0)x(1, 0, 1, 12)12 - AIC:1910.454201820771
ARIMA(0, 0, 0)x(1, 1, 0, 12)12 - AIC:491.25398194270207
ARIMA(0, 0, 0)x(1, 1, 1, 12)12 - AIC:491.80956304929055
ARIMA(0, 0, 1)x(0, 0, 0, 12)12 - AIC:751.063546276295
ARIMA(0, 0, 1)x(0, 0, 1, 12)12 - AIC:1836.256935139399
ARIMA(0, 0, 1)x(0, 1, 0, 12)12 - AIC:497.0445558719426
ARIMA(0, 0, 1)x(0, 1, 1, 12)12 - AIC:491.66407712869625
ARIMA(0, 0, 1)x(1, 0, 0, 12)12 - AIC:689.6572888739806
ARIMA(0, 0, 1)x(1, 0, 1, 12)12 - AIC:1872.6205615336607
ARIMA(0, 0, 1)x(1, 1, 0, 12)12 - AIC:493.1979865785892
ARIMA(0, 0, 1)x(1, 1, 1, 12)12 - AIC:493.62935223517286
ARIMA(0, 1, 0)x(0, 0, 0, 12)12 - AIC:691.6686053842182
ARIMA(0, 1, 0)x(0, 0, 1, 12)12 - AIC:1840.9119302049864
ARIMA(0, 1, 0)x(0, 1, 0, 12)12 - AIC:501.19171493471
ARIMA(0, 1, 0)x(0, 1, 1, 12)12 - AIC:498.2211835724025
ARIMA(0, 1, 0)x(1, 0, 0, 12)12 - AIC:672.790589808306
ARIMA(0, 1, 0)x(1, 0, 1, 12)12 - AIC:2048.4203823353773
ARIMA(0, 1, 0)x(1, 1, 0, 12)12 - AIC:500.1070474246497
ARIMA(0, 1, 0)x(1, 1, 1, 12)12 - AIC:500.0205212320573
ARIMA(0, 1, 1)x(0, 0, 0, 12)12 - AIC:679.3515257502534
ARIMA(0, 1, 1)x(0, 0, 1, 12)12 - AIC:2003.8112126856263
ARIMA(0, 1, 1)x(0, 1, 0, 12)12 - AIC:489.63451139733695
ARIMA(0, 1, 1)x(0, 1, 1, 12)12 - AIC:482.831952790226
ARIMA(0, 1, 1)x(1, 0, 0, 12)12 - AIC:656.517459267076
ARIMA(0, 1, 1)x(1, 0, 1, 12)12 - AIC:2035.9100269373114
ARIMA(0, 1, 1)x(1, 1, 0, 12)12 - AIC:484.711674769929
ARIMA(0, 1, 1)x(1, 1, 1, 12)12 - AIC:484.827902540775
ARIMA(1, 0, 0)x(0, 0, 0, 12)12 - AIC:707.8812143782119
ARIMA(1, 0, 0)x(0, 0, 1, 12)12 - AIC:1715.7465322066853
ARIMA(1, 0, 0)x(0, 1, 0, 12)12 - AIC:496.96369022312945
ARIMA(1, 0, 0)x(0, 1, 1, 12)12 - AIC:491.6434591679682
ARIMA(1, 0, 0)x(1, 0, 0, 12)12 - AIC:682.5313938730986
ARIMA(1, 0, 0)x(1, 0, 1, 12)12 - AIC:1472.5014128371877
ARIMA(1, 0, 0)x(1, 1, 0, 12)12 - AIC:493.1879623942243
ARIMA(1, 0, 0)x(1, 1, 1, 12)12 - AIC:493.6074307437069
ARIMA(1, 0, 1)x(0, 0, 0, 12)12 - AIC:697.3491224687206
ARIMA(1, 0, 1)x(0, 0, 1, 12)12 - AIC:2068.447000241521
ARIMA(1, 0, 1)x(0, 1, 0, 12)12 - AIC:498.8527554530867
ARIMA(1, 0, 1)x(0, 1, 1, 12)12 - AIC:493.5574965424775
ARIMA(1, 0, 1)x(1, 0, 0, 12)12 - AIC:673.3007926966135
ARIMA(1, 0, 1)x(1, 0, 1, 12)12 - AIC:1878.6479250452435
ARIMA(1, 0, 1)x(1, 1, 0, 12)12 - AIC:495.08266711807704
ARIMA(1, 0, 1)x(1, 1, 1, 12)12 - AIC:495.3239053904089
ARIMA(1, 1, 0)x(0, 0, 0, 12)12 - AIC:684.9765838986987
ARIMA(1, 1, 0)x(0, 0, 1, 12)12 - AIC:1859.305467746638
ARIMA(1, 1, 0)x(0, 1, 0, 12)12 - AIC:494.31121719371606
ARIMA(1, 1, 0)x(0, 1, 1, 12)12 - AIC:490.62427521155433
ARIMA(1, 1, 0)x(1, 0, 0, 12)12 - AIC:665.2664123034787
ARIMA(1, 1, 0)x(1, 0, 1, 12)12 - AIC:2395.971601892641
ARIMA(1, 1, 0)x(1, 1, 0, 12)12 - AIC:491.8542011684047
ARIMA(1, 1, 0)x(1, 1, 1, 12)12 - AIC:492.5754415767637
ARIMA(1, 1, 1)x(0, 0, 0, 12)12 - AIC:678.4136280158871
ARIMA(1, 1, 1)x(0, 0, 1, 12)12 - AIC:2071.6855703589845
ARIMA(1, 1, 1)x(0, 1, 0, 12)12 - AIC:490.8791080244459
ARIMA(1, 1, 1)x(0, 1, 1, 12)12 - AIC:484.5936678041933
ARIMA(1, 1, 1)x(1, 0, 0, 12)12 - AIC:656.534404875145
ARIMA(1, 1, 1)x(1, 0, 1, 12)12 - AIC:1789.9533256004122
ARIMA(1, 1, 1)x(1, 1, 0, 12)12 - AIC:486.5631981200067
ARIMA(1, 1, 1)x(1, 1, 1, 12)12 - AIC:486.5821503895846
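A common convention for choosing among the combinations printed above is to prefer the lowest AIC. Here is a minimal sketch that uses a handful of values copied from that output rather than refitting anything. The `candidates` list and the variable names are illustrative, and in a real run the tuples would be collected inside the loop:

```python
# A few (order, seasonal_order, AIC) triples copied from the printed grid-search output.
candidates = [
    ((0, 0, 0), (0, 1, 1, 12), 489.8305326463162),
    ((0, 1, 1), (0, 1, 1, 12), 482.831952790226),
    ((0, 1, 1), (1, 1, 0, 12), 484.711674769929),
    ((1, 1, 1), (0, 1, 1, 12), 484.5936678041933),
    ((1, 1, 1), (1, 1, 0, 12), 486.5631981200067),
    ((1, 1, 1), (1, 1, 1, 12), 486.5821503895846),
]

# Under the AIC criterion, the preferred candidate is the one with the smallest value.
best_order, best_seasonal, best_aic = min(candidates, key=lambda row: row[2])
print('Lowest AIC: SARIMAX{}x{} - AIC:{}'.format(best_order, best_seasonal, round(best_aic, 2)))
```

Among all 64 printed combinations, the smallest value is 482.83 for (0, 1, 1)x(0, 1, 1, 12). The model fitted next, (1, 1, 1)x(1, 1, 0, 12), has an AIC of about 486.56, which is close to that minimum but not equal to it.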
mod = sm.tsa.statespace.SARIMAX(y,
                                order=(1, 1, 1),
                                seasonal_order=(1, 1, 0, 12))
results = mod.fit()
# Print the coefficient table from the model summary
print(results.summary().tables[1])
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          0.0676      0.226      0.299      0.765      -0.376       0.511
ma.L1         -1.0000      0.279     -3.590      0.000      -1.546      -0.454
ar.S.L12      -0.4807      0.147     -3.260      0.001      -0.770      -0.192
sigma2      4.108e+04   6.78e-06   6.06e+09      0.000    4.11e+04    4.11e+04
==============================================================================
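The AIC values in this notebook can be cross-checked by hand from the optimizer logs. This assumes the `F` reported by L-BFGS-B is the negative log-likelihood divided by the number of observations (an assumption about how the objective is scaled, which the numbers below happen to reproduce) and that the series has 48 monthly values. The first grid-search fit reported F = 8.1533264604570608 with N = 1 parameter, and the (1, 1, 1)x(1, 1, 0, 12) fit reported F = 4.9850333137500700 with N = 4. The helper name `aic_from_log` is hypothetical:

```python
def aic_from_log(mean_neg_loglike, n_obs, n_params):
    """AIC = 2k - 2*lnL, with lnL recovered as -F * n_obs (assumption, see text)."""
    log_likelihood = -mean_neg_loglike * n_obs
    return 2 * n_params - 2 * log_likelihood

n_obs = 48  # 48 monthly values, January 2014 through December 2017

aic_first = aic_from_log(8.1533264604570608, n_obs, n_params=1)
aic_final = aic_from_log(4.9850333137500700, n_obs, n_params=4)
print(aic_first, aic_final)
```

The two results agree with the printed AIC values of 784.7193402038779 and 486.5631981200067 to within floating-point rounding. The four parameters of the second model are the ones listed in the coefficient table: ar.L1, ma.L1, ar.S.L12 and sigma2.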
results.plot_diagnostics(figsize=(16, 8))
To help us understand the accuracy of our forecasts, we compare predicted sales to the real sales of the time series, with forecasts starting at 2017-01-01 and running to the end of the data.
pred = results.get_prediction(start=pd.to_datetime('2017-01-01'), dynamic=False)
pred_ci = pred.conf_int()
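ax = y['2014':].plot(label='observed')
pred.predicted_mean.plot(ax=ax, label='One-step ahead Forecast', alpha=.7, figsize=(14, 7))
# Shade the confidence interval around the one-step-ahead forecasts
ax.fill_between(pred_ci.index,
                pred_ci.iloc[:, 0],
                pred_ci.iloc[:, 1], color='k', alpha=.2)
ax.set_ylabel('Furniture Sales')
plt.legend()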
The line plot shows the observed values compared to the rolling forecast predictions. Overall, our forecasts align with the true values well, showing an upward trend that starts at the beginning of the year.
y_forecasted = pred.predicted_mean
y_truth = y['2017-01-01':]
# Compute the mean square error
mse = ((y_forecasted - y_truth) ** 2).mean()
print('The Mean Squared Error of our forecasts is {}'.format(round(mse, 2)))
The Mean Squared Error of our forecasts is 39996.01
print('The Root Mean Squared Error of our forecasts is {}'.format(round(np.sqrt(mse), 2)))
The Root Mean Squared Error of our forecasts is 199.99
In statistics, the mean squared error (MSE) of an estimator measures the average of the squares of the errors, that is, the average squared difference between the estimated values and what is estimated. The MSE is a measure of the quality of an estimator: it is always non-negative, and the smaller the MSE, the closer the forecasts are to the observed values.
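The arithmetic behind these two error measures can be sketched with NumPy on a few made-up numbers (the `forecast` and `observed` arrays are illustrative, not model output):

```python
import numpy as np

# Made-up forecast and observed values, only to show the arithmetic.
forecast = np.array([410.0, 520.0, 560.0])
observed = np.array([400.0, 530.0, 540.0])

errors = forecast - observed   # 10, -10, 20
mse = np.mean(errors ** 2)     # (100 + 100 + 400) / 3
rmse = np.sqrt(mse)            # back in the units of the data
print(round(float(mse), 2), round(float(rmse), 2))

# The RMSE reported for the furniture model is the square root of its reported MSE.
reported_rmse = round(float(np.sqrt(39996.01)), 2)
print(reported_rmse)
```

Because the RMSE is expressed in the same units as the series, it is usually easier to compare against the range of the data than the MSE.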
Root Mean Square Error (RMSE) tells us that our model was able to forecast the average daily furniture sales in the test set within 151.64 of the real sales. Our furniture daily sales range from around 400 to over 1200. In my opinion, this is a pretty good model so far.
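As a quick check on the arithmetic, the RMSE is just the square root of the MSE, so it is expressed in the same units as the sales figures. The sketch below uses only the standard library; `mse_and_rmse` is a hypothetical helper name, and the toy numbers are made up for illustration.

```python
import math

def mse_and_rmse(y_true, y_pred):
    """Return (MSE, RMSE) for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    mse = sum(squared_errors) / len(squared_errors)
    return mse, math.sqrt(mse)

# Toy values (made up): errors are 30, -40 and 0, so MSE = (900 + 1600 + 0) / 3
toy_mse, toy_rmse = mse_and_rmse([500.0, 700.0, 900.0], [470.0, 740.0, 900.0])

# The MSE printed above, converted back to sales units
reported_rmse = math.sqrt(39996.01)
print(round(toy_mse, 2), round(toy_rmse, 2), round(reported_rmse, 2))
```

Because the errors are squared before averaging, a single large miss moves the RMSE more than several small ones, which is worth keeping in mind when comparing it against the 400 to 1200 range of the series.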
pred_uc = results.get_forecast(steps=100)
pred_ci = pred_uc.conf_int()
ax = y.plot(label='observed', figsize=(14, 7))
pred_uc.predicted_mean.plot(ax=ax, label='Forecast')
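# Shade the confidence interval around the out-of-sample forecast
ax.fill_between(pred_ci.index,
                pred_ci.iloc[:, 0],
                pred_ci.iloc[:, 1], color='k', alpha=.25)
ax.set_ylabel('Furniture Sales')
plt.legend()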
Our model clearly captured furniture sales seasonality. As we forecast further out into the future, it is natural for us to become less confident in our values. This is reflected by the confidence intervals generated by our model, which grow larger as we move further out into the future.
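To see why the band widens, consider the simplest possible case, a random walk, rather than the seasonal ARIMA model fitted above. For a random walk with step standard deviation sigma, the forecast error variance h steps ahead is h times sigma squared, so an approximate 95% interval has half-width 1.96 * sigma * sqrt(h). The sketch below illustrates only that simplified case; `interval_half_width` is a hypothetical helper, and borrowing `sigma2` from the summary table as the step variance is an assumption made purely to get a realistic scale.

```python
import math

def interval_half_width(sigma, horizon, z=1.96):
    """Half-width of an approximate 95% interval for a random walk, h steps ahead."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    return z * sigma * math.sqrt(horizon)

# Assumption: reuse sigma2 from the summary table only to set the scale
sigma = math.sqrt(4.108e4)

# Quadrupling the horizon doubles the width of the band
widths = [interval_half_width(sigma, h) for h in (1, 4, 16)]
print([round(w, 1) for w in widths])
```

The fitted model's intervals will not follow this square-root rule exactly, but the qualitative behaviour is the same: uncertainty accumulates with every step into the future.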
The time series analysis above for furniture makes me curious about other categories and how they compare with each other over time. Therefore, we are going to compare the time series of furniture and office supplies.
furniture = df.loc[df['Category'] == 'Furniture']
office = df.loc[df['Category'] == 'Office Supplies']
According to our data, there were far more sales records for Office Supplies than for Furniture over the years.
furniture.shape, office.shape
((2121, 21), (6026, 21))
cols = ['Row ID', 'Order ID', 'Ship Date', 'Ship Mode', 'Customer ID', 'Customer Name', 'Segment', 'Country', 'City', 'State', 'Postal Code', 'Region', 'Product ID', 'Category', 'Sub-Category', 'Product Name', 'Quantity', 'Discount', 'Profit']
# Reassign rather than drop in place: both frames are slices of df
furniture = furniture.drop(cols, axis=1)
office = office.drop(cols, axis=1)
furniture = furniture.sort_values('Order Date')
office = office.sort_values('Order Date')
furniture = furniture.groupby('Order Date')['Sales'].sum().reset_index()
office = office.groupby('Order Date')['Sales'].sum().reset_index()
Here is a quick peek at the first rows, furniture first and then office supplies. Perfect!
 | Order Date | Sales |
0 | 2014-01-06 | 2573.820 |
1 | 2014-01-07 | 76.728 |
2 | 2014-01-10 | 51.940 |
3 | 2014-01-11 | 9.940 |
4 | 2014-01-13 | 879.939 |
 | Order Date | Sales |
0 | 2014-01-03 | 16.448 |
1 | 2014-01-04 | 288.060 |
2 | 2014-01-05 | 19.536 |
3 | 2014-01-06 | 685.340 |
4 | 2014-01-07 | 10.430 |
We are going to compare the two categories' sales over the same time period. This means combining the two data frames into one and plotting both categories' time series on a single chart.
furniture = furniture.set_index('Order Date')
office = office.set_index('Order Date')
y_furniture = furniture['Sales'].resample('MS').mean()
y_office = office['Sales'].resample('MS').mean()
furniture = pd.DataFrame({'Order Date':y_furniture.index, 'Sales':y_furniture.values})
office = pd.DataFrame({'Order Date': y_office.index, 'Sales': y_office.values})
store = furniture.merge(office, how='inner', on='Order Date')
store.rename(columns={'Sales_x': 'furniture_sales', 'Sales_y': 'office_sales'}, inplace=True)
 | Order Date | furniture_sales | office_sales |
0 | 2014-01-01 | 480.194231 | 285.357647 |
1 | 2014-02-01 | 367.931600 | 63.042588 |
2 | 2014-03-01 | 857.291529 | 391.176318 |
3 | 2014-04-01 | 567.488357 | 464.794750 |
4 | 2014-05-01 | 432.049188 | 324.346545 |
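The merge-and-rename step above can be sketched on toy data. Both inputs carry a column called `Sales`, so pandas appends its default `_x` and `_y` suffixes, which is why the rename targets `Sales_x` and `Sales_y`. The frames `furniture_toy` and `office_toy` and their values are made up for illustration.

```python
import pandas as pd

# Toy monthly frames (values made up) sharing the column names used above
furniture_toy = pd.DataFrame({
    'Order Date': pd.to_datetime(['2014-01-01', '2014-02-01', '2014-03-01']),
    'Sales': [480.0, 368.0, 857.0],
})
office_toy = pd.DataFrame({
    'Order Date': pd.to_datetime(['2014-02-01', '2014-03-01', '2014-04-01']),
    'Sales': [63.0, 391.0, 465.0],
})

# Both frames have a 'Sales' column, so merge() disambiguates with _x / _y
merged = furniture_toy.merge(office_toy, how='inner', on='Order Date')
print(list(merged.columns))

# An inner join keeps only the months present in both frames
merged = merged.rename(columns={'Sales_x': 'furniture_sales',
                                'Sales_y': 'office_sales'})
print(merged)
```

Passing `suffixes=('_furniture', '_office')` to `merge` would name the columns in one step; that is an alternative, not what the code above does.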
plt.figure(figsize=(20, 8))
plt.plot(store['Order Date'], store['furniture_sales'], 'b-', label = 'furniture')
plt.plot(store['Order Date'], store['office_sales'], 'r-', label = 'office supplies')
plt.xlabel('Date'); plt.ylabel('Sales'); plt.title('Sales of Furniture and Office Supplies')
plt.legend()
We observe that sales of furniture and office supplies share a similar seasonal pattern. The early part of the year is the off season for both categories, and summer seems to be quiet for office supplies too. In addition, average daily sales for furniture are higher than those for office supplies in most months. This is understandable, as the value of a furniture order should be much higher than that of an office supplies order. Occasionally, office supplies passed furniture on average daily sales. Let's find out when office supplies' sales first surpassed furniture's.
first_date = store.loc[np.min(list(np.where(store['office_sales'] > store['furniture_sales'])[0])), 'Order Date']
print("Office supplies first time produced higher sales than furniture is {}.".format(first_date.date()))
Office supplies first time produced higher sales than furniture is 2014-07-01.
It was July 2014.
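The same lookup can be written with a boolean mask, which some readers may find easier to follow than the nested `np.where` call. This is a sketch on made-up data, not the code used above; `store_toy` and `first_date_toy` are hypothetical names.

```python
import pandas as pd

# Toy data (made up): office supplies first exceed furniture in the third row
store_toy = pd.DataFrame({
    'Order Date': pd.to_datetime(['2014-05-01', '2014-06-01',
                                  '2014-07-01', '2014-08-01']),
    'furniture_sales': [432.0, 695.0, 601.0, 457.0],
    'office_sales': [324.0, 588.0, 756.0, 541.0],
})

office_ahead = store_toy['office_sales'] > store_toy['furniture_sales']

if office_ahead.any():
    # idxmax() on a boolean Series returns the label of the first True
    first_date_toy = store_toy.loc[office_ahead.idxmax(), 'Order Date']
else:
    first_date_toy = None  # office supplies never exceeded furniture

print(first_date_toy)
```

The explicit `any()` check matters: on an all-False mask, `idxmax()` would silently return the first row instead of signalling that no crossing occurred.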
Released by Facebook in 2017, the forecasting tool Prophet is designed for analyzing time series that display patterns on different time scales, such as yearly, weekly and daily. It can also model the effects of holidays on a time series and accept custom changepoints. We will therefore use Prophet to get a model up and running.
from prophet import Prophet
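# Prophet expects the date column to be named 'ds' and the value column 'y'
furniture = furniture.rename(columns={'Order Date': 'ds', 'Sales': 'y'})
furniture_model = Prophet(interval_width=0.95)
furniture_model.fit(furniture)
office = office.rename(columns={'Order Date': 'ds', 'Sales': 'y'})
office_model = Prophet(interval_width=0.95)
office_model.fit(office)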
Importing plotly failed. Interactive plots will not work.
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Initial log joint probability = -59.4782
Iteration   1. Log joint probability =     32.1963. Improved by 91.6745.
Iteration   2. Log joint probability =     66.4876. Improved by 34.2913.
Iteration   3. Log joint probability =     91.4725. Improved by 24.9849.
Iteration   4. Log joint probability =     97.1278. Improved by 5.65524.
Iteration   5. Log joint probability =     97.227. Improved by 0.099276.
[... optimization continues ...]
Iteration  48. Log joint probability =     97.3029. Improved by 2.72054e-09.

Initial log joint probability = -59.2582
Iteration   1. Log joint probability =     38.6371. Improved by 97.8953.
Iteration   2. Log joint probability =     65.5131. Improved by 26.876.
Iteration   3. Log joint probability =     79.6826. Improved by 14.1695.
[... optimization continues ...]
Iteration  76. Log joint probability =     80.0892. Improved by 4.2336e-09.
<prophet.forecaster.Prophet at 0x140945100>
furniture_forecast = furniture_model.make_future_dataframe(periods=36, freq='MS')
furniture_forecast = furniture_model.predict(furniture_forecast)
office_forecast = office_model.make_future_dataframe(periods=36, freq='MS')
office_forecast = office_model.predict(office_forecast)
plt.figure(figsize=(18, 6))
furniture_model.plot(furniture_forecast, xlabel = 'Date', ylabel = 'Sales')
plt.title('Furniture Sales');
plt.figure(figsize=(18, 6))
office_model.plot(office_forecast, xlabel = 'Date', ylabel = 'Sales')
plt.title('Office Supplies Sales');
We now have forecasts for both categories three years into the future. Next, we join them together to compare the two.
furniture_names = ['furniture_%s' % column for column in furniture_forecast.columns]
office_names = ['office_%s' % column for column in office_forecast.columns]
merge_furniture_forecast = furniture_forecast.copy()
merge_office_forecast = office_forecast.copy()
merge_furniture_forecast.columns = furniture_names
merge_office_forecast.columns = office_names
forecast = pd.merge(merge_furniture_forecast, merge_office_forecast, how = 'inner', left_on = 'furniture_ds', right_on = 'office_ds')
forecast = forecast.rename(columns={'furniture_ds': 'Date'}).drop('office_ds', axis=1)
 | Date | furniture_trend | furniture_yhat_lower | furniture_yhat_upper | furniture_trend_lower | furniture_trend_upper | furniture_additive_terms | furniture_additive_terms_lower | furniture_additive_terms_upper | furniture_yearly | ... | office_additive_terms | office_additive_terms_lower | office_additive_terms_upper | office_yearly | office_yearly_lower | office_yearly_upper | office_multiplicative_terms | office_multiplicative_terms_lower | office_multiplicative_terms_upper | office_yhat |
0 | 2014-01-01 | 726.057713 | 308.847618 | 766.893401 | 726.057713 | 726.057713 | -190.685662 | -190.685662 | -190.685662 | -190.685662 | ... | -140.040481 | -140.040481 | -140.040481 | -140.040481 | -140.040481 | -140.040481 | 0.0 | 0.0 | 0.0 | 347.490278 |
1 | 2014-02-01 | 727.494023 | 205.973005 | 688.341122 | 727.494023 | 727.494023 | -276.377703 | -276.377703 | -276.377703 | -276.377703 | ... | -385.678283 | -385.678283 | -385.678283 | -385.678283 | -385.678283 | -385.678283 | 0.0 | 0.0 | 0.0 | 109.240162 |
2 | 2014-03-01 | 728.791335 | 464.177460 | 936.271600 | 728.791335 | 728.791335 | -22.389755 | -22.389755 | -22.389755 | -22.389755 | ... | -31.379844 | -31.379844 | -31.379844 | -31.379844 | -31.379844 | -31.379844 | 0.0 | 0.0 | 0.0 | 470.211349 |
3 | 2014-04-01 | 730.227645 | 386.124561 | 873.643884 | 730.227645 | 730.227645 | -100.141158 | -100.141158 | -100.141158 | -100.141158 | ... | -134.291690 | -134.291690 | -134.291690 | -134.291690 | -134.291690 | -134.291690 | 0.0 | 0.0 | 0.0 | 374.687188 |
4 | 2014-05-01 | 731.617622 | 331.461015 | 819.125115 | 731.617622 | 731.617622 | -160.815662 | -160.815662 | -160.815662 | -160.815662 | ... | -263.821569 | -263.821569 | -263.821569 | -263.821569 | -263.821569 | -263.821569 | 0.0 | 0.0 | 0.0 | 252.306682 |
5 rows × 31 columns
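The column renaming above builds the prefixed names with a list comprehension on a copy of each frame. pandas also provides `DataFrame.add_prefix`, which does the same thing in one call. The sketch below shows it on a toy frame; `toy_forecast` and its values are made up and stand in for a real Prophet forecast frame.

```python
import pandas as pd

# Toy stand-in (values made up) for a Prophet forecast frame with a few columns
toy_forecast = pd.DataFrame({
    'ds': pd.to_datetime(['2014-01-01', '2014-02-01']),
    'trend': [726.0, 727.5],
    'yhat': [535.0, 451.0],
})

# add_prefix() returns a renamed copy, leaving the original frame untouched
prefixed = toy_forecast.add_prefix('furniture_')
print(list(prefixed.columns))
```

Either approach yields the `furniture_ds` and `office_ds` keys that the merge relies on.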
plt.figure(figsize=(10, 7))
plt.plot(forecast['Date'], forecast['furniture_trend'], 'b-', label='furniture')
plt.plot(forecast['Date'], forecast['office_trend'], 'r-', label='office supplies')
plt.legend(); plt.xlabel('Date'); plt.ylabel('Sales')
plt.title('Furniture vs. Office Supplies Sales Trend');
plt.figure(figsize=(10, 7))
plt.plot(forecast['Date'], forecast['furniture_yhat'], 'b-', label='furniture')
plt.plot(forecast['Date'], forecast['office_yhat'], 'r-', label='office supplies')
plt.legend(); plt.xlabel('Date'); plt.ylabel('Sales')
plt.title('Furniture vs. Office Supplies Estimate');
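Now we can use the fitted Prophet models to inspect the trend and yearly seasonality of these two categories.
furniture_model.plot_components(furniture_forecast);
office_model.plot_components(office_forecast);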
It is good to see that sales for both furniture and office supplies have been increasing linearly over time, although office supplies' growth seems slightly stronger.
The worst month for furniture is April, and the worst month for office supplies is February. The best month for furniture is December, and the best month for office supplies is November.
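Best and worst months like these can also be read off numerically instead of by eye. Prophet forecast frames carry a `yearly` column (it appears as `furniture_yearly` in the merged table above), and averaging it by calendar month gives one seasonal effect per month. The sketch below uses invented values, so its output says nothing about the real data; `toy` is a hypothetical stand-in for a forecast frame.

```python
import pandas as pd

# Toy seasonal effects (made up), one value per month of a single year
toy = pd.DataFrame({
    'ds': pd.date_range('2016-01-01', periods=12, freq='MS'),
    'yearly': [-190.0, -120.0, -20.0, -280.0, -160.0, -90.0,
               -60.0, -110.0, 150.0, -40.0, 260.0, 320.0],
})

# Average the seasonal effect by calendar month, then pick the extremes
by_month = toy.groupby(toy['ds'].dt.month_name())['yearly'].mean()
worst_month, best_month = by_month.idxmin(), by_month.idxmax()
print(worst_month, best_month)
```

Values sampled at the first of each month can differ from the continuous curve that Prophet draws in its component plot, so the two readings need not agree exactly.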
There are many more time series analyses we can explore from here, such as forecasting with uncertainty bounds, changepoint and anomaly detection, and forecasting with external data sources. We have only scratched the surface. Stay tuned for future work on time series analysis.