Prophet is a time series forecasting library developed by Facebook’s Core Data Science team. It is designed to make it easy for analysts and developers to create accurate forecasts, even for datasets with complex patterns and seasonalities. In this tutorial, we’ll go through the steps of using Prophet to create a forecast for a time series dataset in Python.
Step 1: Importing Libraries and Loading Data
We’ll start by importing the necessary libraries, including pandas, which we’ll use to load and manipulate the dataset, and Prophet, which we’ll use to create the forecast.
import pandas as pd
from prophet import Prophet  # the package was named fbprophet before version 1.0
Next, we’ll load the dataset into a pandas dataframe. For this example, we’ll use the monthly air passengers dataset, which records the total number of passengers per month and is available in the seaborn library.
import seaborn as sns
df = sns.load_dataset('flights')
Step 2: Preprocessing the Data
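Before we can create a forecast with Prophet, we need to preprocess the data to make it suitable for modeling. Prophet expects a dataframe with two columns: ‘ds’, a datetime column, and ‘y’, the target column. The flights dataset has no date column; it stores ‘year’, ‘month’, and ‘passengers’. So we build ‘ds’ from the year and month, and rename ‘passengers’ to ‘y’. The code below assumes the month column holds abbreviated month names such as ‘Jan’, as in recent seaborn versions.
# Build a month-start date from the 'year' and 'month' columns
df['ds'] = pd.to_datetime(df['year'].astype(str) + '-' + df['month'].astype(str), format='%Y-%b')
df = df.rename(columns={'passengers': 'y'})[['ds', 'y']]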
Next, we’ll split the data into training and testing sets. We’ll use the first 80% of the data for training and the remaining 20% for testing.
train_size = int(len(df) * 0.8)
train_df = df[:train_size]
test_df = df[train_size:]
Step 3: Creating and Fitting the Model
Now we’re ready to create the Prophet model. We’ll create a new instance of the Prophet class and fit it to the training data.
model = Prophet()
model.fit(train_df)
Step 4: Generating a Forecast
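With the model trained, we can now generate a forecast for the test data. We’ll use the Prophet make_future_dataframe() method to create a new dataframe with the dates we want to forecast. Because the data is monthly and the test set holds the final 20% of the months, we’ll forecast len(test_df) months ahead and pass freq='MS' (month start) so the new dates line up with the dates in the data.
future_df = model.make_future_dataframe(periods=len(test_df), freq='MS')
forecast_df = model.predict(future_df)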
Step 5: Evaluating the Forecast
Finally, we’ll evaluate the accuracy of the forecast by comparing it to the actual values in the test data. We’ll use the mean absolute percentage error (MAPE) as the evaluation metric.
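y_true = test_df['y'].to_numpy()
# The last len(test_df) rows of the forecast cover the test period
y_pred = forecast_df['yhat'].iloc[-len(test_df):].to_numpy()
mape = (abs(y_true - y_pred) / y_true).mean() * 100
print('MAPE: {:.2f}%'.format(mape))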
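The MAPE calculation above can be wrapped in a small helper. This is only a sketch: the function name mape and the sample numbers are made up for illustration, and it assumes the actual values contain no zeros. Converting both inputs to NumPy arrays compares values by position, which avoids pandas aligning two Series on mismatched index labels.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same length")
    if np.any(y_true == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return float(np.mean(np.abs(y_true - y_pred) / np.abs(y_true)) * 100)


# Made-up numbers: each prediction is off by 10% of the actual value
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 440.0]
print('MAPE: {:.2f}%'.format(mape(actual, predicted)))
```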
Step 6: Visualizing the Results
To visualize the results, we can use the Prophet plot() method to plot the forecast along with the observations the model was trained on.
fig = model.plot(forecast_df)
This will produce a plot with the training observations as black dots, the forecast as a line, and the uncertainty interval as a shaded band, covering both the training period and the forecasted test period.
That’s it! With these steps, you can create a basic time series forecast using Prophet in Python.
Understanding the Prophet Forecasting Library:
Here are some key features and concepts of the Prophet library:
Additive Model
By default, Prophet uses an additive model to forecast time series data. This means that the forecast is the sum of several components, including trend, seasonality, and holidays. The model can also include additional regressors that can help to explain and predict the time series data.
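To make the idea of an additive model concrete, the sketch below builds a synthetic series as the sum of a trend, a yearly seasonal cycle, and a holiday spike. All numbers are invented for illustration, and the code uses only NumPy; it shows the decomposition idea, not Prophet’s internals.

```python
import numpy as np

# Synthetic daily time axis covering two years (illustrative data only)
t = np.arange(730)

trend = 50.0 + 0.05 * t                               # slow upward drift
seasonality = 10.0 * np.sin(2 * np.pi * t / 365.25)   # yearly cycle
holidays = np.where(t % 365 == 358, 25.0, 0.0)        # one-day spike each year

# In an additive model the expected value is the sum of the components
y_expected = trend + seasonality + holidays

# Observations scatter around that sum with random noise
rng = np.random.default_rng(0)
y_observed = y_expected + rng.normal(scale=2.0, size=t.size)

print(y_expected[:3])
```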
Trend
The trend component captures the long-term direction of the time series data. Prophet models the trend as a piecewise linear curve by default, with changepoints where the slope is allowed to change. A logistic growth trend, which saturates at a specified capacity, is also available.
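A piecewise linear trend can be sketched in a few lines of NumPy. The function name, parameter names, and numbers below are assumptions chosen for illustration. The sketch shows the general idea of a slope that changes at changepoints while the line stays continuous; it is not Prophet’s own code.

```python
import numpy as np


def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Continuous piecewise linear trend.

    k is the initial slope, m the initial offset, and deltas[j] is the
    change in slope that takes effect at changepoints[j].
    """
    t = np.asarray(t, dtype=float)
    s = np.asarray(changepoints, dtype=float)
    d = np.asarray(deltas, dtype=float)
    # a[i, j] is 1 when time t[i] is at or past changepoint j
    a = (t[:, None] >= s[None, :]).astype(float)
    slope = k + a @ d
    # Offset adjustments keep the line continuous at each changepoint
    offset = m + a @ (-s * d)
    return slope * t + offset


t = np.linspace(0.0, 10.0, 11)
trend = piecewise_linear_trend(t, k=1.0, m=0.0,
                               changepoints=[4.0, 7.0], deltas=[-0.5, 1.5])
print(trend)
```

With these values the slope is 1.0 up to t = 4, 0.5 between t = 4 and t = 7, and 2.0 afterwards.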
Seasonality
The seasonality component captures the periodic fluctuations in the time series data. For example, sales of ice cream may increase during the summer months and decrease during the winter months. Prophet can model multiple seasonalities at once: daily, weekly, and yearly seasonality are built in, and other periods, such as monthly, can be added with the add_seasonality method.
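Periodic effects like these are commonly represented with a Fourier series, which is also the approach Prophet takes for seasonality. The sketch below builds the sine and cosine features for a weekly period using NumPy. The helper name fourier_features is hypothetical, and the code illustrates the technique rather than reproducing Prophet’s implementation.

```python
import numpy as np


def fourier_features(t, period, order):
    """Return an array of shape (len(t), 2 * order) with sin/cos terms."""
    t = np.asarray(t, dtype=float)
    columns = []
    for n in range(1, order + 1):
        angle = 2.0 * np.pi * n * t / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    return np.column_stack(columns)


days = np.arange(14)                       # two weeks of daily time steps
X = fourier_features(days, period=7, order=3)
print(X.shape)
```

A seasonal curve is then a weighted sum of these columns. A higher order lets the curve change more quickly, at the cost of a higher risk of overfitting.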
Holidays
Holidays are special events that can affect the time series data. For example, sales may increase during the holiday season. Prophet provides a way to include known holidays, either as a user-supplied table of dates or through its built-in country holiday calendars (the add_country_holidays method). Prophet does not detect holidays on its own, so any event you want modeled must be specified.
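Known holidays are supplied to Prophet as a DataFrame with a holiday column and a ds column, plus optional lower_window and upper_window columns that extend the effect to neighbouring days. The sketch below builds such a table with pandas. The event name is an arbitrary label chosen for this example, and the table should cover the forecast period as well as the history.

```python
import pandas as pd

# Example event dates for illustration
holidays = pd.DataFrame({
    'holiday': 'black_friday',
    'ds': pd.to_datetime(['2022-11-25', '2023-11-24', '2024-11-29']),
    'lower_window': 0,   # 0 = no extra days before the date (negative values extend backwards)
    'upper_window': 1,   # also treat the following day as affected
})
print(holidays)

# With Prophet installed, the table would be passed to the constructor:
# model = Prophet(holidays=holidays)
```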
Regressors
Regressors are additional variables that can be used to explain and predict the time series data. For example, the sales of ice cream may depend on the temperature. Prophet allows for the inclusion of user-defined regressors through the add_regressor method. It does not select regressors automatically, and the value of each regressor must be known for the future dates being forecast.
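A practical consequence of using a regressor is that its values are needed for the future dates as well as the history. The sketch below prepares both frames with pandas for the ice cream example. All values and the column name temperature are made up for illustration, and the Prophet calls are shown only as comments.

```python
import pandas as pd

# Made-up history: daily ice cream sales with the temperature on each day
history = pd.DataFrame({
    'ds': pd.date_range('2024-06-01', periods=5, freq='D'),
    'y': [120, 135, 150, 160, 155],
    'temperature': [24.0, 26.5, 29.0, 31.0, 30.0],
})

# Future dates need a temperature value too, e.g. from a weather forecast
future = pd.DataFrame({
    'ds': pd.date_range('2024-06-06', periods=3, freq='D'),
    'temperature': [28.0, 27.5, 32.0],
})

# Check that the regressor is fully populated before predicting
missing = int(future['temperature'].isna().sum())
print('future rows:', len(future), 'missing temperature values:', missing)

# With Prophet installed, the regressor would be registered before fitting:
# model = Prophet()
# model.add_regressor('temperature')
# model.fit(history)
# forecast = model.predict(future)
```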
Forecasting
Once the Prophet model is trained, it can be used to generate forecasts for future time periods. The forecast can include uncertainty intervals, which give a range of possible values for the predicted data.
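The forecast DataFrame returned by Prophet reports the interval in the yhat_lower and yhat_upper columns next to the point forecast yhat. One simple way to sanity-check an interval is to count how often the actual values fall inside it. The sketch below does this on made-up numbers with pandas; only the column names come from Prophet.

```python
import pandas as pd

# Made-up forecast rows using the column names Prophet returns
forecast = pd.DataFrame({
    'ds': pd.date_range('2024-01-01', periods=5, freq='D'),
    'yhat': [10.0, 11.0, 12.0, 13.0, 14.0],
    'yhat_lower': [8.0, 9.0, 10.0, 11.0, 12.0],
    'yhat_upper': [12.0, 13.0, 14.0, 15.0, 16.0],
})
actual = pd.Series([9.5, 13.5, 12.2, 10.0, 15.9])

# True where the observed value lies within the interval
inside = (actual >= forecast['yhat_lower']) & (actual <= forecast['yhat_upper'])
coverage = inside.mean()
print('Share of actual values inside the interval: {:.0%}'.format(coverage))
```

Prophet’s default interval width is 80%, so on real data a share far from that level suggests the intervals are too narrow or too wide.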
Model Evaluation
Prophet’s diagnostics module provides cross-validation and several metrics for evaluating the accuracy of the forecast. These include mean absolute percentage error (MAPE), mean squared error (MSE), root mean squared error (RMSE), and mean absolute error (MAE).
Overall, Prophet is a powerful tool for forecasting time series data. Its ease of use, flexibility, and ability to handle complex patterns make it a popular choice for analysts and developers.
How to Install the Python Prophet library?
To install the Prophet library for Python, you can use pip, the package installer for Python. Here are the steps to install Prophet:
- Open a terminal or command prompt.
- Type the following command and press Enter to install Prophet. The package was published as fbprophet before version 1.0 and is now named prophet:
pip install prophet
- Wait for the installation to complete. Depending on your internet speed, this may take a few minutes.
- After the installation is complete, you can test if it was successful by importing Prophet in a Python script or in the Python interpreter:
from prophet import Prophet
If there are no error messages, then the installation was successful, and you can start using Prophet for time series forecasting in Python.
Verifying the Installation:
To verify that the Prophet library has been installed correctly, you can create a simple Python script that imports the Prophet library and prints a message to confirm that the import was successful. Here are the steps:
- Open a text editor or an Integrated Development Environment (IDE) such as PyCharm.
- Create a new Python script and save it as prophet_installation_test.py.
- In the script, add the following code:
from prophet import Prophet
print("Prophet installation test successful!")
- Save the file and close the text editor or IDE.
- Open a terminal or command prompt.
- Navigate to the directory where the prophet_installation_test.py file is located.
- Type the following command and press Enter to run the script:
python prophet_installation_test.py
- If the Prophet library has been installed correctly, the script will print the message “Prophet installation test successful!” to the console.
If you encounter any error messages, it may indicate that the installation was not successful or that there is an issue with your Python environment. In that case, you should check your installation or consult the Prophet documentation or community for assistance.
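When the import fails, a common cause is that the library is installed under a different package name than the one being imported, because older releases used fbprophet and newer ones use prophet. The sketch below checks which of the two names is importable using only the standard library. The helper name installed_prophet_package is made up for this example.

```python
import importlib.util


def installed_prophet_package():
    """Return 'prophet', 'fbprophet', or None, without importing either."""
    for name in ('prophet', 'fbprophet'):
        if importlib.util.find_spec(name) is not None:
            return name
    return None


found = installed_prophet_package()
if found is None:
    print('Prophet is not installed in this environment.')
else:
    print('Found package:', found)
```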
Understanding the working of the Prophet library:
The Prophet library is designed to make it easy for analysts and developers to create accurate forecasts for time series data. Here is a high-level overview of how the Prophet library works:
- Data Preparation: The first step in using Prophet is to prepare your time series data. The data should be in a pandas DataFrame with two columns: ds (datestamp) and y (the observed value).
- Model Specification: After preparing the data, you need to specify the Prophet model. This involves specifying the seasonality and any additional regressors that you want to include in the model.
- Model Fitting: Once you have specified the model, you can fit it to the data using the fit method. Prophet’s model is implemented in Stan, and by default fit finds a maximum a posteriori (MAP) estimate of the model parameters rather than directly minimizing the mean squared error.
- Forecasting: After fitting the model, you can use it to generate forecasts for future time periods using the predict method. This method returns a DataFrame with the predicted values and associated uncertainty intervals.
- Model Evaluation: Finally, you can evaluate the accuracy of the model using various metrics such as mean absolute percentage error (MAPE), mean squared error (MSE), or root mean squared error (RMSE).
Here is a more detailed explanation of each step:
- Data Preparation: The Prophet library requires the input time series data to be in a pandas DataFrame with two columns: ds and y. The ds column should contain the date or datetime information in a pandas-compatible format, and the y column should contain the observed values for each date. The ds column should be of type datetime, and the y column must be numeric.
- Model Specification: The Prophet model can include several components such as trend, seasonality, holidays, and additional regressors. You can specify these components through the arguments of the Prophet() constructor and several related methods. For example, you can use the add_seasonality method to specify a periodic seasonality, or the add_regressor method to include additional regressors in the model. Both methods must be called before fitting.
- Model Fitting: Once you have specified the model, you can fit it to the data using the fit method. The model is implemented in Stan. By default, fit finds a maximum a posteriori (MAP) estimate of the parameters using the L-BFGS optimizer; setting the mcmc_samples argument to a positive number runs full Bayesian sampling instead.
- Forecasting: After fitting the model, you can use it to generate forecasts for future time periods using the predict method. This method takes a DataFrame with the future dates in the ds column and returns a DataFrame with the predicted values and associated uncertainty intervals. You can use the make_future_dataframe method to create the DataFrame with future dates.
- Model Evaluation: Finally, you can evaluate the accuracy of the model using various metrics such as mean absolute percentage error (MAPE), mean squared error (MSE), or root mean squared error (RMSE). You can use the cross_validation function from prophet.diagnostics to perform cross-validation: it refits the model at a series of cutoff dates and forecasts a fixed horizon after each one. This function returns a DataFrame with the predicted values and the corresponding actual values, which you can pass to the performance_metrics function to compute the evaluation metrics.
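The cross-validation step is easier to follow with the cutoff dates written out. Prophet’s cross_validation function is controlled by three durations: initial (the minimum training history), period (the spacing between cutoffs), and horizon (how far ahead each forecast is evaluated). The sketch below computes such cutoffs with pandas. It is a simplified illustration with a hypothetical helper name, not Prophet’s exact implementation.

```python
import pandas as pd


def rolling_cutoffs(dates, initial, period, horizon):
    """Cutoff dates for a simulated historical forecast (hypothetical helper).

    Each cutoff leaves at least `initial` of history before it and a full
    `horizon` of observed data after it; cutoffs are spaced `period` apart.
    """
    dates = pd.to_datetime(pd.Series(dates))
    start, end = dates.min(), dates.max()
    cutoff = end - horizon
    cutoffs = []
    while cutoff >= start + initial:
        cutoffs.append(cutoff)
        cutoff = cutoff - period
    return sorted(cutoffs)


dates = pd.date_range('2023-01-01', '2023-12-31', freq='D')
cutoffs = rolling_cutoffs(
    dates,
    initial=pd.Timedelta(days=180),
    period=pd.Timedelta(days=60),
    horizon=pd.Timedelta(days=30),
)
for c in cutoffs:
    print('train up to', c.date(), '-> evaluate the next 30 days')
```

For each cutoff, the model is trained on all data up to that date and scored on the horizon that follows, so every prediction is compared with data the model did not see.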
Overall, the Prophet library provides a powerful and flexible framework for time series forecasting in Python. Its ease of use and ability to handle complex patterns and seasonalities make it a popular choice for analysts and developers.
Loading and Summarizing Dataset:
To work with Prophet in Python, you need to load your time series data into a pandas DataFrame with the required ds and y columns. Here are the steps to load and summarize a dataset in Python:
- Import the necessary libraries, including pandas and matplotlib, and load your data into a pandas DataFrame. For example, you can use the following code to load a CSV file named data.csv into a DataFrame:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv')
- Examine the DataFrame using the head and tail methods to ensure that the data has been loaded correctly. For example, you can use the following code to display the first 5 rows of the DataFrame:
print(df.head())
- Use the info method to display a summary of the DataFrame, including the column names, data types, and number of non-null values. This can help you identify any missing values or data quality issues. For example, you can use the following code to display the DataFrame summary:
df.info()  # prints the summary itself and returns None, so no print() is needed
- Convert the ds column to a datetime format using the pandas to_datetime function. This ensures that the data is interpreted correctly as time series data. For example, you can use the following code to convert the ds column to a datetime format:
df['ds'] = pd.to_datetime(df['ds'])
- Plot the time series data using the plot function from matplotlib. This can help you visualize the patterns and trends in the data. For example, you can use the following code to plot the time series data:
plt.plot(df['ds'], df['y'])
plt.show()
- Optionally, you can also use the describe method to display summary statistics for the y column, such as the mean, standard deviation, and quartiles. This can help you understand the distribution of the data and identify any outliers or extreme values. For example, you can use the following code to display summary statistics for the y column:
print(df['y'].describe())
By following these steps, you can load and summarize your time series data in Python using the pandas library. This provides a solid foundation for applying the Prophet library for time series forecasting.
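Beyond the summary methods above, a few quick checks catch common problems in time series data: missing values, duplicated dates, and gaps in the calendar. The sketch below runs them on a small inline CSV that stands in for data.csv. The data is invented, and a daily frequency is assumed.

```python
import io
import pandas as pd

# Inline stand-in for a CSV file: one empty y, one duplicated date,
# and no row at all for 2024-01-04
csv_text = """ds,y
2024-01-01,10
2024-01-02,12
2024-01-03,
2024-01-05,15
2024-01-05,15
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['ds'])

n_missing_y = int(df['y'].isna().sum())
n_duplicate_dates = int(df['ds'].duplicated().sum())

# Compare against a complete daily calendar to find gaps
expected = pd.date_range(df['ds'].min(), df['ds'].max(), freq='D')
missing_dates = expected.difference(pd.DatetimeIndex(df['ds']))

print('missing y values:', n_missing_y)
print('duplicate dates:', n_duplicate_dates)
print('missing dates:', [d.date() for d in missing_dates])
```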
Loading and Plotting Dataset:
To load and plot a time series dataset in Python, you can use the pandas and matplotlib libraries. Here are the steps to follow:
- Import the necessary libraries, including pandas and matplotlib.
import pandas as pd
import matplotlib.pyplot as plt
- Load the time series data into a pandas DataFrame. The read_csv function can be used to load a CSV file into a DataFrame. Pass the name of the date column to the parse_dates parameter to ensure that the ds column is correctly parsed as a date.
df = pd.read_csv('data.csv', parse_dates=['ds'])
- Check the head of the DataFrame using the head() method to ensure that the data has been loaded correctly.
print(df.head())
- Plot the time series data using the plot() function from matplotlib. Pass in the ds column as the x-axis and the y column as the y-axis. You can also add a title and axis labels using the title(), xlabel(), and ylabel() functions.
plt.plot(df['ds'], df['y'])
plt.title('Time Series Data')
plt.xlabel('Date')
plt.ylabel('Value')
plt.show()
This will generate a plot of the time series data. You can customize the plot by adding other elements such as legends, grid lines, or multiple time series if needed.
By following these steps, you can load and plot a time series dataset in Python. This provides a good starting point for exploring and analyzing the data before applying any time series forecasting techniques using Prophet.
Forecasting car sales using Prophet in Python:
Here are the steps to forecast car sales using Prophet in Python:
- Import the necessary libraries, including pandas, matplotlib, and Prophet.
import pandas as pd
import matplotlib.pyplot as plt
from prophet import Prophet
- Load the time series data into a pandas DataFrame. The read_csv function can be used to load a CSV file into a DataFrame. Pass the name of the date column (Month in this file) to the parse_dates parameter so that it is correctly parsed as a date.
df = pd.read_csv('car_sales.csv', parse_dates=['Month'])
- Rename the columns to ds and y to match Prophet’s input requirements.
df = df.rename(columns={'Month': 'ds', 'Sales': 'y'})
- Plot the time series data to visualize the trends and patterns in the data.
plt.plot(df['ds'], df['y'])
plt.title('Car Sales')
plt.xlabel('Year')
plt.ylabel('Sales')
plt.show()
- Create a Prophet model by initializing a new instance of the Prophet class and fitting the model to the data.
model = Prophet()
model.fit(df)
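- Generate future dates for the forecast using the make_future_dataframe() method. You can specify the number of periods to forecast using the periods parameter and their spacing using the freq parameter. The example below assumes the monthly dates fall on the first day of each month, so it uses 'MS' (month start); 'M' would generate month-end dates instead.
future = model.make_future_dataframe(periods=12, freq='MS')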
- Generate the forecast using the predict() method on the model and passing in the future DataFrame.
forecast = model.predict(future)
- Plot the forecast using the model’s plot() method. This will generate a plot of the actual sales data and the forecasted values.
fig = model.plot(forecast)
plt.title('Car Sales Forecast')
plt.xlabel('Year')
plt.ylabel('Sales')
plt.show()
By following these steps, you can use Prophet to forecast car sales in Python. You can further customize the model and the forecast by adjusting the model parameters, specifying holidays or events, and exploring the forecast components.
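The freq argument decides which calendar dates the future DataFrame contains, so it should match how the history is stamped. The sketch below imitates the idea behind make_future_dataframe with plain pandas and compares month-start with month-end dates. The helper future_dates is hypothetical, and the history is assumed to be stamped on the first day of each month.

```python
import pandas as pd


def future_dates(last_date, periods, freq):
    """Dates strictly after last_date at the given frequency (hypothetical helper)."""
    dates = pd.date_range(start=last_date, periods=periods + 1, freq=freq)
    dates = dates[dates > last_date]
    return dates[:periods]


# Monthly history stamped on the first day of each month (illustrative)
history = pd.date_range('2023-01-01', periods=12, freq=pd.offsets.MonthBegin())
last_date = history.max()

month_start = future_dates(last_date, 12, pd.offsets.MonthBegin())
month_end = future_dates(last_date, 12, pd.offsets.MonthEnd())

print('month start:', [d.date() for d in month_start[:2]])
print('month end:  ', [d.date() for d in month_end[:2]])
```

With month-start history, the month-end dates fall on different days of the month than the observations, which shifts each forecast point relative to the data it continues.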
Fitting Prophet Model:
To fit a Prophet model to a time series dataset, you can follow these steps:
- Import the necessary libraries, including pandas, matplotlib, and Prophet.
import pandas as pd
import matplotlib.pyplot as plt
from prophet import Prophet
- Load the time series data into a pandas DataFrame. The read_csv function can be used to load a CSV file into a DataFrame. Pass the name of the date column to the parse_dates parameter so that it is correctly parsed as a date. This example assumes the file has columns named Date and Value.
df = pd.read_csv('data.csv', parse_dates=['Date'])
- Rename the columns to ds and y to match Prophet’s input requirements.
df = df.rename(columns={'Date': 'ds', 'Value': 'y'})
- Plot the time series data to visualize the trends and patterns in the data.
plt.plot(df['ds'], df['y'])
plt.title('Time Series Data')
plt.xlabel('Date')
plt.ylabel('Value')
plt.show()
- Create a Prophet model by initializing a new instance of the Prophet class and fitting the model to the data.
model = Prophet()
model.fit(df)
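- Generate future dates for the forecast using the make_future_dataframe() method. You can specify the number of periods to forecast using the periods parameter and their spacing using the freq parameter. The example below assumes monthly data stamped on the first day of each month, so it uses 'MS' (month start); choose the frequency that matches your own data.
future = model.make_future_dataframe(periods=12, freq='MS')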
- Generate the forecast using the predict() method on the model and passing in the future DataFrame.
forecast = model.predict(future)
- Plot the forecast using the model’s plot() method. This will generate a plot of the actual data and the forecasted values.
fig = model.plot(forecast)
plt.title('Forecast')
plt.xlabel('Date')
plt.ylabel('Value')
plt.show()
By following these steps, you can fit a Prophet model to a time series dataset in Python and generate a forecast of future values. You can further customize the model and the forecast by adjusting the model parameters, specifying holidays or events, and exploring the forecast components.
Making an in-sample Forecast:
To make an in-sample forecast using a Prophet model, you can follow these steps:
- Import the necessary libraries, including pandas, matplotlib, and Prophet.
import pandas as pd
import matplotlib.pyplot as plt
from prophet import Prophet
- Load the time series data into a pandas DataFrame. The read_csv function can be used to load a CSV file into a DataFrame. Pass the name of the date column to the parse_dates parameter so that it is correctly parsed as a date. This example assumes the file has columns named Date and Value.
df = pd.read_csv('data.csv', parse_dates=['Date'])
- Rename the columns to ds and y to match Prophet’s input requirements.
df = df.rename(columns={'Date': 'ds', 'Value': 'y'})
- Plot the time series data to visualize the trends and patterns in the data.
plt.plot(df['ds'], df['y'])
plt.title('Time Series Data')
plt.xlabel('Date')
plt.ylabel('Value')
plt.show()
- Create a Prophet model by initializing a new instance of the Prophet class and fitting the model to the data.
model = Prophet()
model.fit(df)
- Generate a DataFrame with the dates for which you want to make predictions using the make_future_dataframe() method. By default, this method returns all dates from the original DataFrame plus n periods into the future, where n is the value passed to the required periods parameter. Passing periods=0 returns only the dates of the input data, which is what an in-sample forecast needs.
future = model.make_future_dataframe(periods=0)
- Generate predictions for the entire time series using the predict() method on the model and passing in the future DataFrame.
forecast = model.predict(future)
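- Extract the predictions for the original time series by selecting the rows of the forecast DataFrame that correspond to the original data. With periods=0 that is every row, so slicing to the first len(df) rows keeps them all and would also drop any future rows if periods were larger.
in_sample_predictions = forecast[['ds', 'yhat']].iloc[:len(df)]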
- Plot the in-sample predictions together with the actual data to visualize the accuracy of the model’s fit.
plt.plot(df['ds'], df['y'], label='Actual')
plt.plot(in_sample_predictions['ds'], in_sample_predictions['yhat'], label='Predicted')
plt.title('In-Sample Forecast')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.show()
By following these steps, you can make an in-sample forecast using a Prophet model in Python and visualize the accuracy of the model’s fit to the actual data. You can further customize the model and the forecast by adjusting the model parameters, specifying holidays or events, and exploring the forecast components.
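Alongside the plot, the fit can be summarized numerically by pairing each in-sample prediction with its observation and looking at the residuals. The sketch below does this with pandas on made-up numbers; the frames reuse the names df and in_sample_predictions from the steps above, but their contents are invented so the example runs on its own.

```python
import pandas as pd

# Made-up observations and in-sample predictions for the same dates
dates = pd.date_range('2024-01-01', periods=4, freq='D')
df = pd.DataFrame({'ds': dates, 'y': [10.0, 12.0, 11.0, 15.0]})
in_sample_predictions = pd.DataFrame({'ds': dates, 'yhat': [9.0, 12.5, 11.0, 13.0]})

# Join on the date column so each prediction is paired with its observation
compared = df.merge(in_sample_predictions, on='ds', how='inner')
compared['residual'] = compared['y'] - compared['yhat']

mae = compared['residual'].abs().mean()
print(compared)
print('In-sample MAE: {:.3f}'.format(mae))
```

Keep in mind that in-sample error measures how well the model fits data it has already seen, so it tends to understate the error on genuinely new data.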