ARIMA applied in Predictive Modelling
- Alexander Kiel
- Feb 7, 2024
Social media is a constant companion in today’s world: 80% of the world’s population spends an average of 151 minutes a day on these platforms.
What if you could turn the vast amounts of social media data into a powerful tool for predicting the future of your business? How would it change the way you approach decision-making if you could see trends before they fully unfold?
Bringing social media signals into financial forecasting has been shown to improve forecasts by 7% overall.
“Social media is not just a platform, it’s a reflection of society.” – Mark Zuckerberg
Having previously discussed the incorporation of alternative data in financial modelling, today we go into further detail: using an autoregressive integrated moving average (ARIMA) model for predictive modelling with social media data.
ARIMA is a statistical analysis model that uses time series data to either better understand the data set or to predict future trends.
So let’s combine ARIMA with social media data for predictive modelling.
Step 1: Collect Data
Gather historical data. For example, assume you have monthly social media engagement data (likes, shares, comments) and corresponding financial performance (revenue) over the past 12 months:
Month | Engagement | Revenue ($) |
---|---|---|
January | 1500 | 20,000 |
February | 1600 | 21,000 |
March | 1700 | 22,000 |
April | 1650 | 21,500 |
May | 1750 | 22,500 |
June | 1800 | 23,500 |
July | 1900 | 24,000 |
August | 1850 | 23,500 |
September | 1950 | 24,500 |
October | 2000 | 25,000 |
November | 2100 | 26,000 |
December | 2200 | 27,000 |
Step 2: Check for Stationarity
Plot the engagement data to visually check for trends or seasonality. If the data is not stationary, apply differencing.
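A plot is the usual first check. As a rough numeric complement, the sketch below compares the mean of the first six months with the mean of the last six, using the engagement figures from the Step 1 table. The helper name `half_means` is made up for this illustration, and the comparison is only a heuristic, not a formal stationarity test.

```python
from statistics import mean

# Monthly engagement from the Step 1 table (January to December).
engagement = [1500, 1600, 1700, 1650, 1750, 1800,
              1900, 1850, 1950, 2000, 2100, 2200]


def half_means(series):
    """Return the mean of the first half and the mean of the second half."""
    mid = len(series) // 2
    return mean(series[:mid]), mean(series[mid:])


first_half, second_half = half_means(engagement)
print(f"first half mean:  {first_half:.1f}")
print(f"second half mean: {second_half:.1f}")

# A clearly higher second-half mean points to an upward trend,
# which is a sign that the series is not stationary in its level.
if second_half > first_half:
    print("Mean is drifting upward: consider differencing.")
```

If the two halves have very different means, the level of the series is changing over time and differencing is worth trying.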
Step 3: Differencing
Calculate the first difference of the engagement data to make it stationary:
Month | Engagement | 1st Difference |
---|---|---|
January | 1500 | - |
February | 1600 | 1600 - 1500 = 100 |
March | 1700 | 1700 - 1600 = 100 |
April | 1650 | 1650 - 1700 = -50 |
May | 1750 | 1750 - 1650 = 100 |
June | 1800 | 1800 - 1750 = 50 |
July | 1900 | 1900 - 1800 = 100 |
August | 1850 | 1850 - 1900 = -50 |
September | 1950 | 1950 - 1850 = 100 |
October | 2000 | 2000 - 1950 = 50 |
November | 2100 | 2100 - 2000 = 100 |
December | 2200 | 2200 - 2100 = 100 |
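The same differences can be computed in a few lines of Python. This is a minimal sketch using only the standard library; the function name `first_difference` is chosen here for illustration.

```python
# Monthly engagement from the Step 1 table (January to December).
engagement = [1500, 1600, 1700, 1650, 1750, 1800,
              1900, 1850, 1950, 2000, 2100, 2200]


def first_difference(series):
    """Return the change between each value and the one before it."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


diff = first_difference(engagement)
print(diff)
# [100, 100, -50, 100, 50, 100, -50, 100, 50, 100, 100]
```

Note that differencing 12 observations leaves 11 values, because January has no previous month to compare against.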
Step 4: Determine ARIMA Parameters (p, d, q)
p (AR order): The AR order p is determined by examining the Partial Autocorrelation Function (PACF) plot of the differenced series. It shows the partial correlation of the time series with its own lagged values, controlling for the values of the intervening lags. Look for the lag after which the PACF cuts off (drops to near zero) to determine p. For example, if the PACF plot shows significant correlation at lag 1 but not beyond, you might choose p=1.
d (Differencing order): The differencing order d is the number of times the data needs to be differenced to make it stationary. We calculate the difference between consecutive data points to remove the trend. In our example, we differenced once (d = 1); see the table above.
q (MA order): The MA order q is determined by examining the Autocorrelation Function (ACF) plot of the differenced series. It shows the correlation of the time series with its own lagged values. For example, if the ACF plot shows significant correlation at lag 1 but not beyond, you might choose q=1.
Assume that, from the ACF and PACF plots, you choose ARIMA(1, 1, 1).
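For readers without plotting software, the values behind the ACF and PACF plots can be computed directly. The sketch below is a plain-Python illustration on the differenced engagement series: `acf` uses the standard sample autocorrelation, and `pacf` uses the Durbin-Levinson recursion. Both function names are chosen for this example and are not taken from any library.

```python
from math import sqrt

# First differences of monthly engagement (February to December).
diff = [100, 100, -50, 100, 50, 100, -50, 100, 50, 100, 100]


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    m = sum(series) / n
    denom = sum((x - m) ** 2 for x in series)
    return [
        sum((series[t] - m) * (series[t - k] - m) for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def pacf(series, max_lag):
    """Partial autocorrelation for lags 0..max_lag (Durbin-Levinson)."""
    r = acf(series, max_lag)
    result = [1.0]
    prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result


r = acf(diff, 3)
p = pacf(diff, 3)
bound = 1.96 / sqrt(len(diff))  # approximate 95% significance band

for lag in range(1, 4):
    print(f"lag {lag}: ACF={r[lag]:+.3f}  PACF={p[lag]:+.3f}  band=±{bound:.3f}")
```

Lags whose values fall outside roughly ±1.96/√n are conventionally treated as significant. With only 11 differenced points that band is wide, so any order read off such a short series should be treated as tentative.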
Step 5: Fit the Model
Using ARIMA(1, 1, 1), fit the model manually:
$$y_t = c + \phi\, y_{t-1} + \theta\, \varepsilon_{t-1} + \varepsilon_t$$
Where:
c is a constant
ϕ is the AR parameter
θ is the MA parameter
εt is the error term
Assume initial estimates (simplified for demonstration):
c = 0
ϕ = 0.5
θ = 0.5
Step 6: Forecast Future Values
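Calculate the next month's engagement (January of the following year). Because d = 1, the model describes the first-differenced series, so y here is the month-over-month change in engagement, not its level. Forecast the change first, then add it to December's level. The last observed change is 100 (December minus November); for demonstration, the last error term is also taken to be 100:
$$y_{13} = c + \phi\, y_{12} + \theta\, \varepsilon_{12}$$
$$y_{13} = 0 + 0.5 \times 100 + 0.5 \times 100$$
$$y_{13} = 50 + 50 = 100$$
Engagement13 = Engagement12 + y13 = 2200 + 100 = 2300
Step 7: Predict Financial Performance
Assume a linear relationship between engagement and revenue. From the historical data, calculate the average increase in revenue per unit of engagement:
Average increase per engagement = (Revenue in Dec - Revenue in Jan) / (Engagement in Dec - Engagement in Jan)
Average increase per engagement = (27000 - 20000) / (2200 - 1500)
Average increase per engagement = 10
Predict revenue for January of the following year:
Revenue13 = Revenue12 + (Engagement13 - Engagement12) x 10
Revenue13 = 27000 + (2300 - 2200) x 10
Revenue13 = 27000 + 1000
Revenue13 = 28000
So, the forecasted financial performance for January of the following year would be $28,000.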
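Steps 6 and 7 can be sketched in Python as follows. This assumes the ARMA recursion is applied to the month-over-month change in engagement (which is what d = 1 implies) and that the result is added back to December's level. The parameter values and the last error term are the illustrative figures from the text, not fitted estimates, and the variable names are chosen for this example.

```python
# Data from the Step 1 table (January to December).
engagement = [1500, 1600, 1700, 1650, 1750, 1800,
              1900, 1850, 1950, 2000, 2100, 2200]
revenue = [20000, 21000, 22000, 21500, 22500, 23500,
           24000, 23500, 24500, 25000, 26000, 27000]

# Illustrative parameters from Step 5 (assumed, not estimated).
c, phi, theta = 0.0, 0.5, 0.5
last_error = 100.0  # assumed value of the last error term, for demonstration

# Step 6: forecast the next change, then add it to the last level.
last_change = engagement[-1] - engagement[-2]
next_change = c + phi * last_change + theta * last_error
next_engagement = engagement[-1] + next_change

# Step 7: average revenue increase per unit of engagement (Jan to Dec).
slope = (revenue[-1] - revenue[0]) / (engagement[-1] - engagement[0])
next_revenue = revenue[-1] + (next_engagement - engagement[-1]) * slope

print(f"forecast engagement: {next_engagement:.0f}")
print(f"forecast revenue:    {next_revenue:.0f}")
```

Changing `phi`, `theta`, or `last_error` shows how sensitive the forecast is to these assumed values.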
In practice, you might use statistical software to plot ACF and PACF and to estimate these parameters more precisely. The steps outlined here provide a simplified approach for understanding the concept without special software.
"The goal of forecasting is not to predict the future but to tell you what you need to know to take meaningful action in the present." - Paul Saffo
When you're using ARIMA to forecast financial performance from social media data, make sure you're prioritising data accuracy and keep your model updated regularly. Validate your predictions against real outcomes and incorporate various data sources like market trends to get a complete picture. Combine ARIMA forecasts with insights from industry experts to stay ahead of emerging trends.
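As a small illustration of checking predictions against real outcomes, the sketch below measures how well the linear engagement-to-revenue assumption from Step 7 reproduces the twelve months already observed. This is only an in-sample check on the example data, not a true out-of-sample validation, and the variable names are chosen for this example.

```python
# Data from the Step 1 table (January to December).
engagement = [1500, 1600, 1700, 1650, 1750, 1800,
              1900, 1850, 1950, 2000, 2100, 2200]
revenue = [20000, 21000, 22000, 21500, 22500, 23500,
           24000, 23500, 24500, 25000, 26000, 27000]

# Revenue per unit of engagement, as in Step 7 (January to December).
slope = (revenue[-1] - revenue[0]) / (engagement[-1] - engagement[0])

# Revenue implied by the linear assumption, anchored at January.
fitted = [revenue[0] + slope * (e - engagement[0]) for e in engagement]

# Absolute error per month and the mean absolute error (MAE).
abs_errors = [abs(actual - pred) for actual, pred in zip(revenue, fitted)]
mae = sum(abs_errors) / len(abs_errors)

print(f"largest monthly error: {max(abs_errors):.0f}")
print(f"mean absolute error:   {mae:.2f}")
```

The same error calculation can be repeated each month as new actuals arrive, comparing the previous forecast with what actually happened.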
As you stand on the brink of integrating ARIMA into your predictive modelling, what future trends could you uncover? How will you use these insights to not just react but to lead in your industry?
Stay curious about advancements in time series forecasting and machine learning techniques to continually refine your predictions. This personalised approach will empower you to make informed decisions and navigate the ever-changing landscape of business with confidence.