Global AI and Data Science

Is there data leakage in the model?

  • 1.  Is there data leakage in the model?

    Posted Wed November 18, 2020 06:58 PM

    Hello there,

    I have a time series model whose accuracy is suspiciously high,
    which makes me fairly sure that information is leaking from the y values into the X values.
    I just can't prove it beyond doubt, which is why I need your help.


    The model is a random forest (RF) regressor trained on time series data.


    The data is unequally spaced. Days on which no value was recorded
    were filled with 0.
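    Zero-filling can make accuracy look high on its own, independent of any leakage, because a predictor that always outputs 0 is already right on every filled day. A minimal sketch of that sanity check follows; the toy series and the helper name `mean_absolute_error` are made up for illustration and are not the data from this post.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy series: 0 marks a day on which nothing was recorded.
series = [0, 0, 5, 0, 0, 0, 3, 0, 0, 0]

# Share of days that are zero-filled.
zero_share = series.count(0) / len(series)

# Baseline 1: always predict 0.
always_zero = [0] * len(series)

# Baseline 2: predict the previous day's value (0 for the first day,
# which has no history).
persistence = [0] + series[:-1]

mae_zero = mean_absolute_error(series, always_zero)
mae_persistence = mean_absolute_error(series, persistence)

print(f"zero share:          {zero_share:.2f}")
print(f"MAE always-zero:     {mae_zero:.2f}")
print(f"MAE persistence:     {mae_persistence:.2f}")
```

    If the RF regressor's error is close to that of the always-zero baseline on the real data, the high accuracy may come from the filled zeros and not from leakage.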

    Now, I have values for these time intervals:
    t-2 to t-1
    t-1 to t
    t-2 to t

    I want to predict the value from t to t+1. My hypothesis is that I can use t-2 to t-1 and t-2 to t
    to get t-1 to t and t-1 to t+1, and from those predict the value from t to t+1.

    So my final model is: RF Regressor(X = ['Date', 'Value from t-2 to t-1'], y = ['Value from t-1 to t'])

    So my final question: if this model is wrong, how can I prove, theoretically,
    that it is using part of the label y in the X features?
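    One way to frame the question theoretically is to treat every value as covering a time interval and to check whether any feature interval overlaps the label interval. The sketch below assumes each value summarizes only what happened inside its interval, with offsets measured relative to t; the names `overlaps`, `leaking_features`, and the feature keys are hypothetical. The 'Date' column is not an interval and is left out.

```python
from typing import Dict, List, Tuple

# (start, end) offsets relative to t, e.g. (-2, -1) means "t-2 to t-1".
Interval = Tuple[int, int]

def overlaps(a: Interval, b: Interval) -> bool:
    """True if the intervals share more than a single endpoint."""
    return max(a[0], b[0]) < min(a[1], b[1])

def leaking_features(features: Dict[str, Interval], label: Interval) -> List[str]:
    """Names of features whose interval covers part of the label interval."""
    return [name for name, span in features.items() if overlaps(span, label)]

# Label used for training in the post: value from t-1 to t.
label = (-1, 0)

# Candidate features mentioned in the post.
features = {
    "value_t-2_to_t-1": (-2, -1),
    "value_t-2_to_t": (-2, 0),
}

print(leaking_features(features, label))
```

    Under these assumptions, the value from t-2 to t-1 only touches the label interval at a single point and is not flagged, while the value from t-2 to t spans the whole label interval and would leak it if used as a feature.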

    Thank you.

    daniel millionshik


  • 2.  RE: Is there data leakage in the model?

    Posted Thu November 19, 2020 06:11 AM
    How can we help you?
    Are you looking for literature about data leakage?
    Would you feel better if you could prove that you have data leakage?

    Matthias Jungbauer