Time Series Forecasting with Prophet in Python

Prophet is a time series forecasting library developed by Facebook’s Core Data Science team. It is designed to make it easy for analysts and developers to create accurate forecasts, even for datasets with complex patterns and seasonalities. In this tutorial, we’ll go through the steps of using Prophet to create a forecast for a time series dataset in Python.

Step 1: Importing Libraries and Loading Data

We’ll start by importing the necessary libraries, including pandas, which we’ll use to load and manipulate the dataset, and Prophet, which we’ll use to create the forecast.

import pandas as pd
from prophet import Prophet  # the package was published as fbprophet before version 1.0

Next, we’ll load the dataset into a pandas DataFrame. For this example, we’ll use the monthly airline passenger totals from the flights dataset, which is available in the seaborn library.

import seaborn as sns

df = sns.load_dataset('flights')

Step 2: Preprocessing the Data

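Before we can create a forecast with Prophet, we need to preprocess the data to make it suitable for modeling. Prophet expects two columns: ‘ds’, the date column, and ‘y’, the target column. The flights dataset stores the date as separate ‘year’ and ‘month’ columns, so we first combine them into a single datetime ‘ds’ column and then rename ‘passengers’ to ‘y’.

# Combine year and month (e.g. 1949 and 'Jan') into a month-start timestamp
df['ds'] = pd.to_datetime(df['year'].astype(str) + '-' + df['month'].astype(str).str[:3], format='%Y-%b')
# Keep only the two columns Prophet needs
df = df.rename(columns={'passengers': 'y'})[['ds', 'y']]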

Next, we’ll split the data into training and testing sets. We’ll use the first 80% of the data for training and the remaining 20% for testing.

train_size = int(len(df) * 0.8)
train_df = df[:train_size]
test_df = df[train_size:]

Step 3: Creating and Fitting the Model

Now we’re ready to create the Prophet model. We’ll create a new instance of the Prophet class and fit it to the training data.

model = Prophet()
model.fit(train_df)

Step 4: Generating a Forecast

With the model trained, we can now generate a forecast for the test period. We’ll use the Prophet make_future_dataframe() method to create a new dataframe with the dates we want to forecast. Because the data is monthly, we forecast one month-start date (freq='MS') for every row in the test set rather than a fixed number of days.

future_df = model.make_future_dataframe(periods=len(test_df), freq='MS')
forecast_df = model.predict(future_df)

Step 5: Evaluating the Forecast

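Finally, we’ll evaluate the accuracy of the forecast by comparing it to the actual values in the test data. We’ll use the mean absolute percentage error (MAPE) as the evaluation metric. The predictions are matched to the actual values by date, because the two DataFrames do not share the same row index and subtracting them directly would misalign the values.

# Match predictions to actual values on the date column
eval_df = test_df.merge(forecast_df[['ds', 'yhat']], on='ds')
mape = ((eval_df['y'] - eval_df['yhat']).abs() / eval_df['y']).mean() * 100
print('MAPE: {:.2f}%'.format(mape))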

Step 6: Visualizing the Results

To visualize the results, we can use the Prophet plot() method to plot the forecast along with the actual values.

fig = model.plot(forecast_df)

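This will produce a plot that shows the observations used for fitting as black dots, the forecast as a line, and the uncertainty interval as a shaded band, covering both the training period and the forecast horizon. The test values are not drawn, because the model was fitted on the training data only.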

That’s it! With these steps, you can create a basic time series forecast using Prophet in Python.

Understanding the Prophet Forecasting Library:

Here are some key features and concepts of the Prophet library:

Additive Model

Prophet uses an additive model to forecast time series data. This means that the forecast is the sum of several components, including trend, seasonality, and holidays. The model can also include additional regressors that can help to explain and predict the time series data.
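As a rough illustration of the additive structure, the sketch below builds a synthetic daily series from a trend, a yearly seasonal term, and a holiday effect using NumPy. The component shapes and magnitudes are made-up assumptions for illustration and do not reproduce Prophet's internals.

```python
import numpy as np

# Three years of daily time steps (synthetic, for illustration only).
t = np.arange(3 * 365)

trend = 100 + 0.05 * t                              # slow linear growth
seasonality = 10 * np.sin(2 * np.pi * t / 365.25)   # yearly cycle
holidays = np.where(t % 365 == 358, 25.0, 0.0)      # one-day spike each year

# Additive model: the series is the sum of its components plus noise.
rng = np.random.default_rng(0)
noise = rng.normal(scale=2.0, size=t.size)
y = trend + seasonality + holidays + noise
```

Fitting a model of this kind amounts to recovering the individual components from y alone.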

Trend

The trend component captures the long-term direction of the time series data. Prophet models the trend as a piecewise linear curve by default, with slope changes at automatically selected changepoints. A saturating logistic growth curve or a flat trend can be selected instead through the growth parameter.
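To make the piecewise linear idea concrete, here is a minimal NumPy sketch of a trend whose slope changes at given changepoints while the curve stays continuous. The function name and arguments are hypothetical; this follows the general formulation of a piecewise linear trend and is not Prophet's actual code.

```python
import numpy as np

def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Trend with base slope k, offset m, and slope changes `deltas` at `changepoints`."""
    t = np.asarray(t, dtype=float)
    s = np.asarray(changepoints, dtype=float)
    d = np.asarray(deltas, dtype=float)
    # a[i, j] is 1 once time t[i] has reached changepoint s[j].
    a = (t[:, None] >= s[None, :]).astype(float)
    slope = k + a @ d
    # Offset adjustments keep the curve continuous at each changepoint.
    offset = m + a @ (-s * d)
    return slope * t + offset

t = np.linspace(0, 10, 11)
# Slope 1.0 until t = 4, then slope 0.5 afterwards.
trend = piecewise_linear_trend(t, k=1.0, m=0.0, changepoints=[4.0], deltas=[-0.5])
```

In Prophet the changepoint locations and slope changes are estimated during fitting rather than supplied by hand as they are here.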

Seasonality

The seasonality component captures the periodic fluctuations in the time series data. For example, sales of ice cream may increase during the summer months and decrease during the winter months. Prophet can model multiple seasonalities at once: daily, weekly, and yearly seasonalities are built in, and custom periods such as monthly can be added with the add_seasonality method.
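Prophet represents each seasonality with a truncated Fourier series, so a seasonal component is a weighted sum of sine and cosine terms whose weights are learned during fitting. The sketch below builds those terms with NumPy; the helper name fourier_features is hypothetical and the code is an illustration rather than Prophet's implementation.

```python
import numpy as np

def fourier_features(t, period, order):
    """Return a (len(t), 2 * order) matrix of sine and cosine terms."""
    t = np.asarray(t, dtype=float)
    columns = []
    for n in range(1, order + 1):
        angle = 2.0 * np.pi * n * t / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    return np.column_stack(columns)

# Two weeks of daily time steps with a weekly period and three harmonics.
days = np.arange(14)
weekly = fourier_features(days, period=7, order=3)
```

A higher order lets the seasonal curve change more quickly within one period, at the cost of a greater risk of overfitting.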

Holidays

Holidays are special events that can affect the time series data. For example, sales may increase during the holiday season. Prophet provides a way to include known holidays, either as a user-supplied DataFrame of dates or through built-in country holiday calendars added with the add_country_holidays method. It does not detect holidays on its own, so dates that are not specified are not modeled as holidays.
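A custom holiday table is an ordinary pandas DataFrame with a holiday name and a ds date per row, plus optional lower_window and upper_window columns that extend the effect to neighbouring days. The sketch below builds one; the event name and the choice of windows are assumptions made for illustration.

```python
import pandas as pd

# Illustrative event dates; replace with the events relevant to your data,
# including dates that fall inside the forecast horizon.
holidays = pd.DataFrame({
    'holiday': 'black_friday',
    'ds': pd.to_datetime(['2021-11-26', '2022-11-25', '2023-11-24']),
    'lower_window': 0,   # days before the date that are also affected
    'upper_window': 3,   # days after the date that are also affected
})

# The frame is passed to the model at construction time:
#   model = Prophet(holidays=holidays)
# Built-in country holidays can be added instead with:
#   model.add_country_holidays(country_name='US')
```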

Regressors

Regressors are additional variables that can be used to explain and predict the time series data. For example, the sales of ice cream may depend on the temperature. Prophet allows for the inclusion of user-defined regressors through the add_regressor method. It does not select regressors automatically, and the value of each regressor must be known for both the historical and the future dates.
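The following sketch shows how a temperature regressor might be wired in. The data is synthetic, the column name temperature and the helper fit_with_temperature are hypothetical, and Prophet is imported inside the function only so that the data preparation can run on its own.

```python
import numpy as np
import pandas as pd

# Synthetic daily data: sales depend on temperature (illustrative values only).
rng = np.random.default_rng(42)
dates = pd.date_range('2022-01-01', periods=120, freq='D')
temperature = 20 + 8 * np.sin(2 * np.pi * np.arange(120) / 120) + rng.normal(0, 1, 120)
df = pd.DataFrame({
    'ds': dates,
    'temperature': temperature,
    'y': 50 + 3 * temperature + rng.normal(0, 2, 120),
})

def fit_with_temperature(history):
    """Fit a Prophet model that uses `temperature` as an extra regressor."""
    from prophet import Prophet  # imported here so the data prep runs without Prophet
    model = Prophet()
    model.add_regressor('temperature')  # must be registered before fit()
    model.fit(history)
    return model
```

The DataFrame passed to predict() must also contain a temperature column, so future values of the regressor have to be known or forecast separately.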

Forecasting

Once the Prophet model is trained, it can be used to generate forecasts for future time periods. The forecast can include uncertainty intervals, which give a range of possible values for the predicted data.
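The uncertainty interval appears in the forecast DataFrame as the yhat_lower and yhat_upper columns, and its nominal width is controlled by the interval_width argument of Prophet (80% by default). One way to sanity-check an interval is to measure how often held-out actual values fall inside it. The sketch below does this with pandas on toy frames; the helper name and the numbers are made up for illustration.

```python
import pandas as pd

def interval_coverage(actual, forecast):
    """Fraction of actual values inside [yhat_lower, yhat_upper], matched on ds."""
    merged = actual.merge(forecast[['ds', 'yhat_lower', 'yhat_upper']], on='ds')
    inside = merged['y'].between(merged['yhat_lower'], merged['yhat_upper'])
    return inside.mean()

# Toy frames standing in for real data and for the output of model.predict().
actual = pd.DataFrame({'ds': pd.date_range('2023-01-01', periods=4, freq='D'),
                       'y': [10.0, 12.0, 15.0, 11.0]})
forecast = pd.DataFrame({'ds': actual['ds'],
                         'yhat_lower': [8.0, 9.0, 10.0, 12.0],
                         'yhat_upper': [12.0, 13.0, 14.0, 16.0]})
coverage = interval_coverage(actual, forecast)
print(coverage)
```

A coverage far below the nominal interval width suggests that the intervals are too narrow for the data at hand.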

Model Evaluation

Prophet’s diagnostics module provides time series cross-validation and several metrics for evaluating the accuracy of the forecast. These include mean absolute percentage error (MAPE), mean squared error (MSE), and root mean squared error (RMSE).

Overall, Prophet is a powerful tool for forecasting time series data. Its ease of use, flexibility, and ability to handle complex patterns make it a popular choice for analysts and developers.

How to Install the Python Prophet library?

To install the Prophet library for Python, you can use pip, the package installer for Python. Here are the steps to install Prophet:

  1. Open a terminal or command prompt.
  2. Type the following command and press Enter to install Prophet. The package is named prophet; releases before version 1.0 were published as fbprophet.
pip install prophet
  3. Wait for the installation to complete. Depending on your internet speed, this may take a few minutes.
  4. After the installation is complete, you can test if it was successful by importing Prophet in a Python script or in the Python interpreter:
from prophet import Prophet

If there are no error messages, then the installation was successful, and you can start using Prophet for time series forecasting in Python.

Verifying the Installation:

To verify that the Prophet library has been installed correctly, you can create a simple Python script that imports the Prophet library and prints a message to confirm that the import was successful. Here are the steps:

  1. Open a text editor or an Integrated Development Environment (IDE) such as PyCharm.
  2. Create a new Python script and save it as prophet_installation_test.py.
  3. In the script, add the following code:
from prophet import Prophet

print("Prophet installation test successful!")
  4. Save the file and close the text editor or IDE.
  5. Open a terminal or command prompt.
  6. Navigate to the directory where the prophet_installation_test.py file is located.
  7. Type the following command and press Enter to run the script:
python prophet_installation_test.py
  8. If the Prophet library has been installed correctly, the script will print the message “Prophet installation test successful!” to the console.

If you encounter any error messages, it may indicate that the installation was not successful or that there is an issue with your Python environment. In that case, you should check your installation or consult the Prophet documentation or community for assistance.
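When an import fails, it can help to check which distribution, if any, is installed in the active environment, since the package was published as fbprophet before version 1.0 and as prophet afterwards. The sketch below uses only the standard library (Python 3.8 or later); the helper name installed_version is hypothetical.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Report both the current and the legacy distribution name.
for name in ('prophet', 'fbprophet'):
    version = installed_version(name)
    print(f'{name}: {version if version else "not installed"}')
```

If only fbprophet is reported, the import statement needs to use fbprophet, or the environment needs to be upgraded to the prophet package.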

Understanding the working of the Prophet library:

The Prophet library is designed to make it easy for analysts and developers to create accurate forecasts for time series data. Here is a high-level overview of how the Prophet library works:

  1. Data Preparation: The first step in using Prophet is to prepare your time series data. The data should be in a pandas DataFrame with two columns: ds (datestamp) and y (the observed value).
  2. Model Specification: After preparing the data, you need to specify the Prophet model. This involves specifying the seasonality and any additional regressors that you want to include in the model.
  3. Model Fitting: Once you have specified the model, you can fit it to the data using the fit method. This method uses Stan to estimate the model parameters, by default through maximum a posteriori (MAP) optimization.
  4. Forecasting: After fitting the model, you can use it to generate forecasts for future time periods using the predict method. This method returns a DataFrame with the predicted values and associated uncertainty intervals.
  5. Model Evaluation: Finally, you can evaluate the accuracy of the model using various metrics such as mean absolute percentage error (MAPE), mean squared error (MSE), or root mean squared error (RMSE).

Here is a more detailed explanation of each step:

  1. Data Preparation: The Prophet library requires the input time series data to be in a pandas DataFrame with two columns: ds and y. The ds column should contain the date or datetime information in a pandas-compatible format, and the y column should contain the observed values for each date. The ds column should be of type datetime, and the y column can be any numerical data type.
  2. Model Specification: The Prophet model can include several components such as trend, seasonality, holidays, and additional regressors. You can specify these components using the arguments of the Prophet() constructor and several related methods. For example, you can use the add_seasonality method to specify a periodic seasonality, or the add_regressor method to include additional regressors in the model.
  3. Model Fitting: Once you have specified the model, you can fit it to the data using the fit method. Fitting is performed with Stan. By default, it finds the maximum a posteriori (MAP) estimate of the parameters with an optimizer; setting the mcmc_samples argument to a positive number switches to full Bayesian sampling instead.
  4. Forecasting: After fitting the model, you can use it to generate forecasts for future time periods using the predict method. This method takes a DataFrame with the future dates in the ds column and returns a DataFrame with the predicted values and associated uncertainty intervals.
  5. Model Evaluation: Finally, you can evaluate the accuracy of the model using various metrics such as mean absolute percentage error (MAPE), mean squared error (MSE), or root mean squared error (RMSE). The cross_validation function in prophet.diagnostics takes a fitted model and repeatedly forecasts a fixed horizon from a series of cutoff dates in the history, refitting the model on the data up to each cutoff. It returns a DataFrame with the predicted values and the corresponding actual values, which you can pass to the performance_metrics function or use to compute the evaluation metrics yourself.
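The evaluation step can be sketched with pandas alone. The helper below computes MAPE, MSE, and RMSE from any frame that has y and yhat columns, which is the shape of the cross-validation output. The helper name error_metrics and the toy values are assumptions for illustration, and the window sizes in the comments are example settings rather than recommendations.

```python
import numpy as np
import pandas as pd

def error_metrics(cv_df):
    """Compute MAPE, MSE, and RMSE from a frame with `y` and `yhat` columns."""
    error = cv_df['y'] - cv_df['yhat']
    mse = float((error ** 2).mean())
    return {
        'mape': float((error.abs() / cv_df['y'].abs()).mean() * 100),
        'mse': mse,
        'rmse': float(np.sqrt(mse)),
    }

# Toy frame shaped like cross-validation output (values are made up).
# With Prophet installed, a real frame would come from something like:
#   from prophet.diagnostics import cross_validation, performance_metrics
#   cv_df = cross_validation(model, initial='730 days', period='180 days', horizon='365 days')
#   performance_metrics(cv_df)
cv_df = pd.DataFrame({
    'ds': pd.date_range('2023-01-01', periods=4, freq='D'),
    'y': [100.0, 200.0, 100.0, 200.0],
    'yhat': [110.0, 180.0, 90.0, 220.0],
})
metrics = error_metrics(cv_df)
print(metrics)
```

MAPE is undefined when an actual value is zero, so a different metric is a safer choice for series that contain zeros.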

Overall, the Prophet library provides a powerful and flexible framework for time series forecasting in Python. Its ease of use and ability to handle complex patterns and seasonalities make it a popular choice for analysts and developers.

Loading and Summarizing Dataset:

To work with Prophet in Python, you need to load your time series data into a pandas DataFrame with the required ds and y columns. Here are the steps to load and summarize a dataset in Python:

  1. Import the necessary libraries, including pandas and matplotlib, and load your data into a pandas DataFrame. For example, you can use the following code to load a CSV file named data.csv into a DataFrame:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')
  2. Examine the DataFrame using the head and tail methods to ensure that the data has been loaded correctly. For example, you can use the following code to display the first 5 rows of the DataFrame:
print(df.head())
  3. Use the info method to display a summary of the DataFrame, including the column names, data types, and number of non-null values. This can help you identify any missing values or data quality issues. The method prints the summary itself and returns None, so it does not need to be wrapped in print. For example, you can use the following code to display the DataFrame summary:
df.info()
  4. Convert the ds column to a datetime format using the to_datetime function. This ensures that the data is interpreted correctly as time series data. For example, you can use the following code to convert the ds column to a datetime format:
df['ds'] = pd.to_datetime(df['ds'])
  5. Plot the time series data using the plot function from matplotlib. This can help you visualize the patterns and trends in the data. For example, you can use the following code to plot the time series data:
plt.plot(df['ds'], df['y'])
plt.show()
  6. Optionally, you can also use the describe method to display summary statistics for the y column, such as the mean, standard deviation, and quartiles. This can help you understand the distribution of the data and identify any outliers or extreme values. For example, you can use the following code to display summary statistics for the y column:
print(df['y'].describe())

By following these steps, you can load and summarize your time series data in Python using the pandas library. This provides a solid foundation for applying the Prophet library for time series forecasting.
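Beyond the summary methods, a few time-series-specific checks are often worth running before modeling. The sketch below counts duplicate dates and missing values and lists the dates absent from a regular calendar. The helper name check_series is hypothetical, and the expected frequency is an assumption that you supply.

```python
import pandas as pd

def check_series(df, freq):
    """Report duplicate dates, missing values, and gaps for a ds/y DataFrame."""
    expected = pd.date_range(df['ds'].min(), df['ds'].max(), freq=freq)
    return {
        'duplicate_dates': int(df['ds'].duplicated().sum()),
        'missing_values': int(df['y'].isna().sum()),
        'missing_dates': expected.difference(pd.DatetimeIndex(df['ds'])).tolist(),
    }

# Toy monthly series with one gap (March) and one missing value.
df = pd.DataFrame({
    'ds': pd.to_datetime(['2023-01-01', '2023-02-01', '2023-04-01', '2023-05-01']),
    'y': [10.0, None, 12.0, 13.0],
})
report = check_series(df, freq='MS')
print(report)
```

Duplicate dates usually point to a loading or aggregation problem and are worth resolving before the data is passed to a model.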

Loading and Plotting Dataset:

To load and plot a time series dataset in Python, you can use the pandas and matplotlib libraries. Here are the steps to follow:

  1. Import the necessary libraries, including pandas and matplotlib.
import pandas as pd
import matplotlib.pyplot as plt
  2. Load the time series data into a pandas DataFrame. The read_csv function can be used to load a CSV file into a DataFrame. Pass the name of the date column to the parse_dates parameter to ensure that the ds column is correctly parsed as a date.
df = pd.read_csv('data.csv', parse_dates=['ds'])
  3. Check the head of the DataFrame using the head() method to ensure that the data has been loaded correctly.
print(df.head())
  4. Plot the time series data using the plot() function from matplotlib. Pass in the ds column as the x-axis and the y column as the y-axis. You can also add a title and axis labels using the title(), xlabel(), and ylabel() functions.
plt.plot(df['ds'], df['y'])
plt.title('Time Series Data')
plt.xlabel('Date')
plt.ylabel('Value')
plt.show()

This will generate a plot of the time series data. You can customize the plot by adding other elements such as legends, grid lines, or multiple time series if needed.

By following these steps, you can load and plot a time series dataset in Python. This provides a good starting point for exploring and analyzing the data before applying any time series forecasting techniques using Prophet.

Forecasting car sales using Prophet in Python:

Here are the steps to forecast car sales using Prophet in Python:

  1. Import the necessary libraries, including pandas, matplotlib, and Prophet.
import pandas as pd
import matplotlib.pyplot as plt
from prophet import Prophet
  2. Load the time series data into a pandas DataFrame. The read_csv function can be used to load a CSV file into a DataFrame. Pass the name of the date column (Month in this file) to the parse_dates parameter to ensure that it is correctly parsed as a date.
df = pd.read_csv('car_sales.csv', parse_dates=['Month'])
  3. Rename the columns to ds and y to match Prophet’s input requirements.
df = df.rename(columns={'Month': 'ds', 'Sales': 'y'})
  4. Plot the time series data to visualize the trends and patterns in the data.
plt.plot(df['ds'], df['y'])
plt.title('Car Sales')
plt.xlabel('Year')
plt.ylabel('Sales')
plt.show()
  5. Create a Prophet model by initializing a new instance of the Prophet class and fitting the model to the data.
model = Prophet()
model.fit(df)
  6. Generate future dates for the forecast using the make_future_dataframe() method. You can specify the number of periods to forecast using the periods parameter and their spacing using the freq parameter. The value 'MS' (month start) used here assumes that the dates in the data fall on the first day of each month.
future = model.make_future_dataframe(periods=12, freq='MS')
  7. Generate the forecast using the predict() method on the model and passing in the future DataFrame.
forecast = model.predict(future)
  8. Plot the forecast using the plot() method from Prophet. This will generate a plot of the actual sales data and the forecasted values.
fig = model.plot(forecast)
plt.title('Car Sales Forecast')
plt.xlabel('Year')
plt.ylabel('Sales')
plt.show()

By following these steps, you can use Prophet to forecast car sales in Python. You can further customize the model and the forecast by adjusting the model parameters, specifying holidays or events, and exploring the forecast components.

Fitting Prophet Model:

To fit a Prophet model to a time series dataset, you can follow these steps:

  1. Import the necessary libraries, including pandas, matplotlib, and Prophet.
import pandas as pd
import matplotlib.pyplot as plt
from prophet import Prophet
  2. Load the time series data into a pandas DataFrame. The read_csv function can be used to load a CSV file into a DataFrame. This example assumes that the file has columns named Date and Value. Pass the name of the date column to the parse_dates parameter to ensure that it is correctly parsed as a date.
df = pd.read_csv('data.csv', parse_dates=['Date'])
  3. Rename the columns to ds and y to match Prophet’s input requirements.
df = df.rename(columns={'Date': 'ds', 'Value': 'y'})
  4. Plot the time series data to visualize the trends and patterns in the data.
plt.plot(df['ds'], df['y'])
plt.title('Time Series Data')
plt.xlabel('Date')
plt.ylabel('Value')
plt.show()
  5. Create a Prophet model by initializing a new instance of the Prophet class and fitting the model to the data.
model = Prophet()
model.fit(df)
  6. Generate future dates for the forecast using the make_future_dataframe() method. You can specify the number of periods to forecast using the periods parameter and their spacing using the freq parameter. The value 'MS' (month start) used here assumes monthly data dated on the first day of each month.
future = model.make_future_dataframe(periods=12, freq='MS')
  7. Generate the forecast using the predict() method on the model and passing in the future DataFrame.
forecast = model.predict(future)
  8. Plot the forecast using the plot() method from Prophet. This will generate a plot of the actual data and the forecasted values.
fig = model.plot(forecast)
plt.title('Forecast')
plt.xlabel('Date')
plt.ylabel('Value')
plt.show()

By following these steps, you can fit a Prophet model to a time series dataset in Python and generate a forecast of future values. You can further customize the model and the forecast by adjusting the model parameters, specifying holidays or events, and exploring the forecast components.

Making an in-sample Forecast:

To make an in-sample forecast using a Prophet model, you can follow these steps:

  1. Import the necessary libraries, including pandas, matplotlib, and Prophet.
import pandas as pd
import matplotlib.pyplot as plt
from prophet import Prophet
  2. Load the time series data into a pandas DataFrame. The read_csv function can be used to load a CSV file into a DataFrame. This example assumes that the file has columns named Date and Value. Pass the name of the date column to the parse_dates parameter to ensure that it is correctly parsed as a date.
df = pd.read_csv('data.csv', parse_dates=['Date'])
  3. Rename the columns to ds and y to match Prophet’s input requirements.
df = df.rename(columns={'Date': 'ds', 'Value': 'y'})
  4. Plot the time series data to visualize the trends and patterns in the data.
plt.plot(df['ds'], df['y'])
plt.title('Time Series Data')
plt.xlabel('Date')
plt.ylabel('Value')
plt.show()
  5. Create a Prophet model by initializing a new instance of the Prophet class and fitting the model to the data.
model = Prophet()
model.fit(df)
  6. Generate a DataFrame with the dates for which you want to make predictions using the make_future_dataframe() method. By default (include_history=True), this method returns all dates from the original DataFrame plus n periods into the future, where n is the value passed to the required periods parameter. Passing periods=0 therefore returns only the historical dates, which is what an in-sample forecast needs.
future = model.make_future_dataframe(periods=0)
  7. Generate predictions for the entire time series using the predict() method on the model and passing in the future DataFrame.
forecast = model.predict(future)
  8. Extract the predictions for the original time series by selecting the rows of the forecast DataFrame that correspond to the original data. With periods=0 this is every row, and slicing by the length of the original data keeps the code correct if periods is increased later.
in_sample_predictions = forecast[['ds', 'yhat']].iloc[:len(df)]
  9. Plot the in-sample predictions together with the actual data to visualize the accuracy of the model’s fit.
plt.plot(df['ds'], df['y'], label='Actual')
plt.plot(in_sample_predictions['ds'], in_sample_predictions['yhat'], label='Predicted')
plt.title('In-Sample Forecast')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.show()

By following these steps, you can make an in-sample forecast using a Prophet model in Python and visualize the accuracy of the model’s fit to the actual data. You can further customize the model and the forecast by adjusting the model parameters, specifying holidays or events, and exploring the forecast components.
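To put a number on the in-sample fit, the fitted values can be joined to the actual values by date and summarized as residuals. The sketch below does this with pandas on toy frames that stand in for the training data and the output of predict(); the helper name in_sample_residuals and the values are made up for illustration.

```python
import pandas as pd

def in_sample_residuals(df, forecast):
    """Join actuals and fitted values on ds and add a residual column."""
    merged = df.merge(forecast[['ds', 'yhat']], on='ds', how='inner')
    merged['residual'] = merged['y'] - merged['yhat']
    return merged

# Toy frames standing in for the training data and model.predict() output.
df = pd.DataFrame({'ds': pd.date_range('2023-01-01', periods=3, freq='MS'),
                   'y': [100.0, 120.0, 90.0]})
forecast = pd.DataFrame({'ds': df['ds'], 'yhat': [98.0, 125.0, 90.0]})

fitted = in_sample_residuals(df, forecast)
mae = fitted['residual'].abs().mean()  # mean absolute error of the fit
print(fitted)
```

In-sample error is measured on data the model has already seen, so it tends to understate the error to expect on future data.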