Yeah, I'd be losing the atypical Covid data.
Dropping the weekends shouldn't be a problem, as there is no variability in the amount that could be withdrawn.
Your next problem with training on just Monday-Friday data is that you are assuming that all weeks are equal and that all days are independent. How would you expect the results of the training to differ from simply running an average or a linear regression over the data for each day? On average, on Monday, the withdrawal is $200,000. With linear regression, the withdrawal on Monday is $190,000 plus $100 per week ($5,200/year) since Jan 1st 2021, so the withdrawal for next week would be, say, about $202,000 (roughly 120 weeks in). If you go with this, you'll need to develop a good understanding of the days when it'll be wrong: public holidays and other times with atypical money requirements (you then probably want to eliminate these days from your average/regression, as they'll throw the values off). Linear regression may work better with multiple shorter segments, as things like changes in interest rates can throw off the historical data (regression over the last 6 months vs regression over the full 30 months).
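To make the comparison concrete, here is a minimal sketch of the two baselines, a per-weekday average and a per-weekday straight-line fit. It uses only the Python standard library; the function names (`fit_line`, `weekday_baselines`) and the synthetic Monday series are my own illustration, seeded with the made-up $190,000 and $100/week figures, not real bank data.

```python
from datetime import date, timedelta
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        # Only one distinct x value: fall back to a flat line at the mean.
        return y_bar, 0.0
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def weekday_baselines(series, origin):
    """series: iterable of (date, amount) pairs.

    Returns {weekday: (average, intercept, slope)} where weekday is
    0 for Monday .. 4 for Friday and x is measured in weeks since origin.
    """
    by_day = {}
    for d, amount in series:
        weeks = (d - origin).days / 7
        by_day.setdefault(d.weekday(), []).append((weeks, amount))
    result = {}
    for weekday, points in by_day.items():
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        intercept, slope = fit_line(xs, ys)
        result[weekday] = (mean(ys), intercept, slope)
    return result


# Synthetic Mondays only: start at 190,000 and grow by 100 per week.
origin = date(2021, 1, 4)  # a Monday
series = [(origin + timedelta(weeks=w), 190_000 + 100 * w) for w in range(120)]

avg, intercept, slope = weekday_baselines(series, origin)[0]
forecast = intercept + slope * 120  # the Monday after the last sample
print(avg, intercept, slope, forecast)
```

If a trained model can't beat these two numbers on held-out weeks, it isn't adding anything. To try the "shorter segments" idea, pass only the last 6 months of `series` and compare the slopes.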
Running a Fourier analysis across the whole time series will improve your forecasts for fixed dates and month start/end (Xmas, Thanksgiving, etc.), but it will still struggle with floating dates such as Chinese New Year and Easter.
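As a rough illustration of what the Fourier step gives you, the sketch below runs a naive discrete Fourier transform over a series and reports the strongest cycle length. It is standard-library Python only; `dft_magnitudes` and `dominant_period` are names I made up, the input is a synthetic 5-business-day cycle, and for real data you would more likely use an FFT from a numerical library.

```python
import cmath
import math


def dft_magnitudes(values):
    """Return [(k, magnitude)] for frequencies k = 1 .. n//2.

    The mean is removed first so the constant level does not dominate.
    This is the naive O(n^2) transform, fine for a few hundred points.
    """
    n = len(values)
    avg = sum(values) / n
    centered = [v - avg for v in values]
    magnitudes = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            c * cmath.exp(-2j * math.pi * k * t / n)
            for t, c in enumerate(centered)
        )
        magnitudes.append((k, abs(coeff) / n))
    return magnitudes


def dominant_period(values):
    """Length, in samples, of the strongest periodic component."""
    k, _ = max(dft_magnitudes(values), key=lambda pair: pair[1])
    return len(values) / k


# Synthetic series: 100 business days with a cycle repeating every 5 days.
series = [200_000 + 20_000 * math.cos(2 * math.pi * t / 5) for t in range(100)]
period = dominant_period(series)
print(period)
```

Because the samples are business days, a weekly pattern shows up as a period of 5, and a yearly one at roughly 260. A floating holiday lands on a different sample index each year, which is why it doesn't show up as a clean peak.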
If you want a better model, you need to add some correlation with these dates into your training data, but note that for some of them you have only 2 data points, and the model will still be thrown off by the lower relevance of older data. IZPCA has a Fractal Forecasting algorithm that does this, but it's not available outside of the product (it uses it to predict CPU usage on computer systems, which is a somewhat similar problem, as the computers are processing, amongst other things, the withdrawal transactions).
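One simple way to get a floating date into the training data is an extra feature giving each day's distance from the event. The sketch below does this for Easter, using the well-known anonymous Gregorian (Meeus/Jones/Butcher) computus; `easter_sunday`, `easter_offset` and the 7-day window are my own assumptions for illustration, and this has nothing to do with how IZPCA does it. Chinese New Year follows a lunisolar calendar, so in practice you would use a lookup table of dates instead.

```python
from datetime import date


def easter_sunday(year):
    """Gregorian Easter Sunday via the anonymous (Meeus/Jones/Butcher) method."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


def easter_offset(d, window=7):
    """Signed days from that year's Easter Sunday to d.

    Negative means before Easter. Returns None when d is more than
    `window` days away, so the feature is only active near the holiday.
    """
    delta = (d - easter_sunday(d.year)).days
    return delta if abs(delta) <= window else None


# Example: the Thursday before Easter 2023, and an ordinary day in June.
print(easter_sunday(2023))
print(easter_offset(date(2023, 4, 6)))
print(easter_offset(date(2023, 6, 5)))
```

With only two or three Easters in the data, a feature like this gives the model very few examples to learn from, so treat whatever it learns as a rough adjustment.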
------------------------------
Mik Clarke
------------------------------
Original Message:
Sent: Mon June 05, 2023 12:49 AM
From: Gdin ABL
Subject: Best approach for forecasting bank withdrawals and deposits using historical data
Hi everyone,
I have historical data for a bank in two categories, withdrawals and deposits, and I want to fit a machine-learning model to it so that I can forecast withdrawals and deposits for its various branches for the next month.
Currently, I am considering the data for the last two years, i.e. from Jan 2021 to Mar 2023.
I have two basic questions:
- I have discarded the data before 2021 because the Covid pandemic affected the banking system so heavily. Am I doing it right, or should I also include it in my dataset?
- My time series data consists of weekdays only (Monday to Friday) and does not include any sample points for weekends, as banks are closed on weekends. Should I train my model on this data, or should I insert zeros for the weekend dates?
------------------------------
Gdin ABL
------------------------------
#AIandDSSkills