SPSS Statistics

  • 1.  Optimal Binning MDLP Method

    Posted Fri February 04, 2022 08:50 AM
    Hello all,

    I am using the MDLP method of the Optimal Binning command and would like to know whether there is any documentation on how it handles optimization when the variable to be binned and the guide variable are unrelated. For example, suppose the binned variable has the same mean, standard deviation, and distribution for both values of a binary guide variable.

    Thanks for your help!

    ------------------------------
    Brian Joy
    ------------------------------

    #SPSSStatistics


  • 2.  RE: Optimal Binning MDLP Method

    Posted Fri February 04, 2022 09:02 AM
    Brian;

    I see nothing in the Algorithms manual that directly (or even indirectly as far as I can tell!) answers your question, but I am by no means a domain expert.

    However, the manual has these references; perhaps they will be of some help to you.

    Fayyad, U., and K. Irani. 1993. Multi-interval discretization of continuous-value attributes for classification learning. In: Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, San Mateo, CA: Morgan Kaufmann, 1022–1027.

    Dougherty, J., R. Kohavi, and M. Sahami. 1995. Supervised and unsupervised discretization of continuous features. In: Proceedings of the Twelfth International Conference on Machine Learning, Los Altos, CA: Morgan Kaufmann, 194–202.

    Liu, H., F. Hussain, C. L. Tan, and M. Dash. 2002. Discretization: An Enabling Technique. Data Mining and Knowledge Discovery, 6, 393–423.

    ------------------------------
    Rick Marcantonio
    Quality Assurance
    IBM
    ------------------------------



  • 3.  RE: Optimal Binning MDLP Method

    Posted Fri February 04, 2022 09:02 AM
    MDLP?


  • 4.  RE: Optimal Binning MDLP Method

    Posted Fri February 04, 2022 09:33 AM
    Edited by System Admin Fri January 20, 2023 04:09 PM
    @Rick Marcantonio and @Jon Peck, thanks for your quick replies this morning.

    I did a little testing on synthetic data this morning and found that SPSS issues a helpful warning in this case: "Unable to create bins for the following binning input variables because of weak or no association with the guide variable: binVar."

    Below is the syntax I used to test this:

    * Build 1,000 cases with a sequential ID.
    INPUT PROGRAM.
    LOOP ID=1 TO 1000.
    END CASE.
    END LOOP.
    END FILE.
    END INPUT PROGRAM.
    EXECUTE.

    DATASET NAME TestData.

    SET SEED=1000.

    * Binary guide variable: first 500 cases are 1, last 500 cases are 2.
    DO IF ID <= 500.
    COMPUTE guideVar = 1.
    ELSE IF ID <= 1000.
    COMPUTE guideVar = 2.
    ELSE.
    COMPUTE guideVar = $SYSMIS.
    END IF.
    EXECUTE.

    * Draw binVar from a single normal distribution so it is unrelated to guideVar.
    COMPUTE binVar = RV.NORMAL(100, 10).
    EXECUTE.

    * Optimal Binning.
    OPTIMAL BINNING
      /VARIABLES GUIDE=guideVar BIN=binVar SAVE=NO
      /CRITERIA METHOD=MDLP PREPROCESS=EQUALFREQ (BINS=1000) FORCEMERGE=0 LOWERLIMIT=INCLUSIVE
        LOWEREND=UNBOUNDED UPPEREND=UNBOUNDED
      /MISSING SCOPE=PAIRWISE
      /PRINT ENDPOINTS DESCRIPTIVES ENTROPY.
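
For anyone curious why a "weak or no association" case can end with no bins at all, here is a rough Python sketch of the cut-acceptance test published in Fayyad and Irani (1993), the first reference Rick listed. It is not IBM's implementation, and I am only assuming that SPSS applies a rule of this kind. The function names `entropy` and `best_mdlp_cut` are made up for illustration, the sketch examines only the first candidate split, and it uses small deterministic toy data rather than the random draws in the syntax above.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_mdlp_cut(values, labels):
    """Return (cut_point, accepted) for the highest-gain binary split.

    accepted is True only if the split's information gain exceeds the
    MDLP threshold from Fayyad and Irani (1993).
    """
    pairs = sorted(zip(values, labels))
    ys = [y for _, y in pairs]
    n = len(ys)
    ent_s = entropy(ys)
    k = len(set(ys))  # number of classes in the full set
    best = None       # (gain, cut_point, threshold)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point exists between tied values
        left, right = ys[:i], ys[i:]
        ent_l, ent_r = entropy(left), entropy(right)
        gain = ent_s - (i / n) * ent_l - ((n - i) / n) * ent_r
        k1, k2 = len(set(left)), len(set(right))
        delta = math.log2(3 ** k - 2) - (k * ent_s - k1 * ent_l - k2 * ent_r)
        threshold = (math.log2(n - 1) + delta) / n
        if best is None or gain > best[0]:
            best = (gain, (pairs[i - 1][0] + pairs[i][0]) / 2, threshold)
    if best is None:
        return None, False
    gain, cut, threshold = best
    return cut, gain > threshold

values = list(range(200))
# Guide classes alternate along the sorted values: no association.
unrelated = [1 if v % 2 == 0 else 2 for v in values]
# Guide class is fully determined by the value: strong association.
related = [1 if v < 100 else 2 for v in values]

print(best_mdlp_cut(values, unrelated)[1])  # False: the best cut fails the test
print(best_mdlp_cut(values, related))       # (99.5, True)
```

In the unrelated case even the best candidate cut gains far less information than the threshold requires, so the variable is left as a single bin. That would be consistent with the warning SPSS printed, though I cannot confirm that SPSS computes it this way.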


    ------------------------------
    Brian Joy
    ------------------------------