Global Data Science Forum

Is low correlation (with a target) enough to dismiss a feature from a baseline model?

  • 1.  Is low correlation (with a target) enough to dismiss a feature from a baseline model?

    Posted Mon November 02, 2020 07:20 AM
    I'm wondering if there is any reason to keep a feature from a dataset in order to perform prediction even if it doesn't have a significant correlation to the target.
    Has anyone been able to take advantage of such a feature?

    Thanks

    ------------------------------
    [Marco] [Sánchez Sorondo]
    [UBA]
    ------------------------------


  • 2.  RE: Is low correlation (with a target) enough to dismiss a feature from a baseline model?

    Posted Tue November 03, 2020 09:11 AM
    Edited by Rick Marcantonio Tue November 03, 2020 10:00 AM
    "I'm wondering if there is any reason to keep a feature from a dataset in order to perform prediction even if it doesn't have a significant correlation to the target?"

    There might be; it's hard to say, given this level of information. Personally (and assuming that all assumptions for such analyses are met and/or dealt with), if I am looking at some kind of regression model, I am looking not at the Pearson correlation but at the part and partial correlations. I encourage you to take a look at some sources for those if you are unfamiliar with them. For example: https://www.statisticssolutions.com/what-are-zero-order-partial-and-part-correlations

    You can obtain these in the REGRESSION procedure of SPSS Statistics by requesting /STATISTICS ZPP (zero-order, part, and partial correlations).
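
    For readers working outside SPSS, here is a minimal Python sketch of the same three quantities, computed with NumPy by correlating regression residuals. The helper names (`residualize`, `zero_part_partial`) and the synthetic data are assumptions made for illustration, not part of any library. The data are built so that `x1` has almost no zero-order correlation with the target, yet a strong partial correlation once `x2` is controlled for.

```python
import numpy as np


def residualize(v, controls):
    """Residuals of v after an OLS fit on the controls plus an intercept."""
    design = np.column_stack([np.ones(len(v)), controls])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef


def zero_part_partial(y, x, controls):
    """Zero-order, part (semi-partial), and partial correlation of x with y."""
    y_res = residualize(y, controls)
    x_res = residualize(x, controls)
    zero_order = np.corrcoef(x, y)[0, 1]      # plain Pearson correlation
    part = np.corrcoef(y, x_res)[0, 1]        # controls removed from x only
    partial = np.corrcoef(y_res, x_res)[0, 1]  # controls removed from both
    return float(zero_order), float(part), float(partial)


# Hypothetical data: x2 is the signal plus contamination, x1 is the contamination.
rng = np.random.default_rng(0)
n = 2000
signal = rng.normal(size=n)
contamination = rng.normal(size=n)
x1 = contamination
x2 = signal + contamination
y = signal + 0.1 * rng.normal(size=n)

zero_order, part, partial = zero_part_partial(y, x1, x2)
print(f"zero-order: {zero_order:.2f}, part: {part:.2f}, partial: {partial:.2f}")
```

    With this construction the zero-order correlation should come out near zero while the part and partial correlations are clearly negative, which is the kind of feature a Pearson screen alone would discard.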

    Also, consider the role of the variable in your model: whether it is a mediating or moderating variable, for example. See sources like https://www.statisticshowto.com/mediator-variable/

    Apart from these, sometimes it's interesting to learn which variables are NOT related in the way I expected them to be, at least in my data sample.

    Rick M

    ------------------------------
    Rick Marcantonio
    Quality Assurance
    IBM
    IL
    ------------------------------



  • 3.  RE: Is low correlation (with a target) enough to dismiss a feature from a baseline model?

    Posted Thu November 05, 2020 09:41 PM
    Interesting... I didn't know those correlations existed. I'm not much of an SPSS guy, but R seems to have functions that compute them. Good tools for making the analysis more complete.

    ------------------------------
    [Marco] [Sánchez Sorondo]
    [UBA]
    ------------------------------



  • 4.  RE: Is low correlation (with a target) enough to dismiss a feature from a baseline model?

    Posted Tue November 03, 2020 05:14 PM
    Hi Marco, 
    It depends on the nature of the data and the feature you're looking at. I've worked with time series where a feature was a good predictor because of some of its time-domain or frequency-domain attributes. Correlation only captures the linear dependency between two variables, so a non-linear dependency can be missed if you look at correlation alone. Measures such as entropy and mutual information can help there. Also, there might be components inside that feature that carry useful information about the target; doing a PCA, looking at each component separately, and discarding the less important components might help. In general, it depends on what kind of data you're looking at.
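
    To make the point about non-linear dependency concrete, here is a small Python sketch comparing Pearson correlation with a crude mutual information estimate. `binned_mutual_information` is a hypothetical helper written for this example (a plug-in estimate from a 2-D histogram, in nats), and the quadratic data are synthetic. The estimate depends on the number of bins, so treat the values as relative rather than exact.

```python
import numpy as np


def binned_mutual_information(x, y, bins=20):
    """Crude plug-in estimate of mutual information (nats) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nonzero = p_xy > 0                       # skip empty cells to avoid log(0)
    ratio = p_xy[nonzero] / (p_x @ p_y)[nonzero]
    return float(np.sum(p_xy[nonzero] * np.log(ratio)))


# Hypothetical data: the target depends on the feature only through x squared.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=5000)
y = x**2 + rng.normal(scale=0.05, size=x.size)
unrelated = rng.normal(size=x.size)

pearson_r = float(np.corrcoef(x, y)[0, 1])
mi_signal = binned_mutual_information(x, y)
mi_unrelated = binned_mutual_information(x, unrelated)
print(f"Pearson r:              {pearson_r:.3f}")
print(f"MI with the target:     {mi_signal:.3f}")
print(f"MI with unrelated data: {mi_unrelated:.3f}")
```

    The Pearson coefficient should be close to zero here because the relationship is symmetric around x = 0, while the mutual information estimate should sit well above the value obtained for unrelated data.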

    ------------------------------
    Bahareh Atoufi
    ------------------------------



  • 5.  RE: Is low correlation (with a target) enough to dismiss a feature from a baseline model?

    Posted Wed November 04, 2020 07:48 AM
    Hi Marco,

    Sometimes the relationship is non-linear. So, before you reject this feature, check a transformation such as log(x) or something similar.
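
    A quick way to try this in Python is to compare the Pearson correlation before and after the transformation. The sketch below uses synthetic, heavily skewed data chosen for illustration, so the numbers say nothing about any real dataset.

```python
import numpy as np

# Hypothetical data: a skewed feature whose logarithm drives the target.
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=2.5, size=2000)
y = np.log(x) + rng.normal(scale=0.5, size=x.size)

r_raw = float(np.corrcoef(x, y)[0, 1])          # feature as-is
r_log = float(np.corrcoef(np.log(x), y)[0, 1])  # feature after log(x)
print(f"Pearson r with x:      {r_raw:.2f}")
print(f"Pearson r with log(x): {r_log:.2f}")
```

    Note that log(x) only applies to strictly positive values; other transformations would need the same before-and-after check.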

    best regards

    Pawel

    ------------------------------
    Paweł Niklewicz
    ------------------------------



  • 6.  RE: Is low correlation (with a target) enough to dismiss a feature from a baseline model?

    Posted Fri November 06, 2020 07:24 AM
    Do you mean the entire feature set or just one feature?

    ------------------------------
    [Marco] [Sánchez Sorondo]
    [UBA]
    ------------------------------



  • 7.  RE: Is low correlation (with a target) enough to dismiss a feature from a baseline model?

    Posted Mon November 09, 2020 05:45 AM

    What do you mean by low correlation?

    If you are thinking about dismissing a feature, what made you add the feature to the model in the first place?

    ------------------------------
    Matthias Jungbauer
    ------------------------------



  • 8.  RE: Is low correlation (with a target) enough to dismiss a feature from a baseline model?

    Posted Mon November 09, 2020 12:04 PM
    That's the question: Should I add the feature or not?
    By low correlation I mean a low Pearson correlation between the feature and the target.

    ------------------------------
    [Marco] [Sánchez Sorondo]
    [UBA]
    ------------------------------



  • 9.  RE: Is low correlation (with a target) enough to dismiss a feature from a baseline model?

    Posted Tue November 10, 2020 05:53 AM
    In my humble view, low correlation alone is not enough to dismiss a feature.

    ------------------------------
    Matthias Jungbauer
    ------------------------------



  • 10.  RE: Is low correlation (with a target) enough to dismiss a feature from a baseline model?

    Posted Tue November 10, 2020 12:51 PM
    Hello Marco,

    I hope you are well. About your question: a low linear correlation coefficient (Pearson coefficient) is not enough information to drop a feature in the modeling phase.

    (Working with Python.)

    If you are facing a regression problem, one good approach is to train an OLS model from the statsmodels package with all the continuous features and review the summary() report of the fitted results.

    links:
    model - https://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.OLS.html
    scores - https://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.OLS.score.html#statsmodels.regression.linear_model.OLS.score


    On the other hand, if you are facing a classification problem with categorical features, Random Forest has a `feature_importances_` attribute that can help.

    link:
    model - https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier
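
    Here is a minimal sketch of reading those importances with scikit-learn. The data are synthetic and chosen for illustration: two binary features that determine the class only jointly (an XOR pattern), plus one unrelated binary feature. Each informative feature on its own has roughly zero Pearson correlation with the target.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical data: the class is the XOR of two integer-coded binary features.
rng = np.random.default_rng(0)
n = 2000
a = rng.integers(0, 2, size=n)
b = rng.integers(0, 2, size=n)
noise = rng.integers(0, 2, size=n)  # unrelated to the class
y = a ^ b

X = np.column_stack([a, b, noise])
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances, one per column of X, summing to 1.
importances = forest.feature_importances_
pearson_a = float(np.corrcoef(a, y)[0, 1])
print(f"Pearson r between a and the class: {pearson_a:.3f}")
print("Importances (a, b, noise):", np.round(importances, 3))
```

    Impurity-based importances have known biases, for example toward high-cardinality features, so they are best read as a starting point rather than a final ranking.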

    A good place to find more information about correlation:
    http://campus.murraystate.edu/academic/faculty/cmecklin/STA565/_book/correlations-multiple-and-partial.html




    ------------------------------
    Franco Yair Benko
    ------------------------------



  • 11.  RE: Is low correlation (with a target) enough to dismiss a feature from a baseline model?

    Posted Tue November 10, 2020 08:39 PM
    Thanks! Good opportunity to try statsmodels as well... Never used it.

    ------------------------------
    [Marco] [Sánchez Sorondo]
    [UBA]
    ------------------------------