SPSS Statistics


Your hub for statistical analysis, data management, and data documentation. Connect, learn, and share with your peers! 

  • 1.  Error with Relational Operators (using >= in a COMPUTE)

    Posted Fri December 22, 2023 10:41 AM

    Hello there,

    We have run into a strange error. We have a large dataset where we compute 25 different scores and then the mean of those scores for each case. Each case whose mean score is at least 0.5 is considered to have met the standard, and we report the percentage who met the standard overall and for various subgroups. We expected this to be really straightforward, but something unexpected is happening. The calculation of whether a case met the standard, which is simply a test of greater than or equal to, is failing. For a small percentage of cases, the mean score is equal to the cutoff value, but the MetStandard variable is being assigned 0, and we do not understand why.

    I am attaching a dataset for illustration purposes, in case that helps. It has 2500 cases. Each case has up to 25 scores (Scale1 through Scale25) that range from 0.2 to 1.0. The mean score and the variable indicating whether or not the case met the standard are assigned as follows:

    COMPUTE MeanScale=Mean(Scale1, Scale2, Scale3, Scale4, Scale5, Scale6, Scale7, Scale8, Scale9, 
        Scale10, Scale11, Scale12, Scale13, Scale14, Scale15, Scale16, Scale17, Scale18, Scale19, Scale20, 
        Scale21, Scale22, Scale23, Scale24, Scale25).

    COMPUTE MetStandard=MeanScale >= 0.5.

    There are 9 cases where MeanScale is 0.5, but MetStandard was assigned 0. I noticed that rounding MeanScale before doing the comparison seems to resolve the problem (at least in this particular dataset, though I do not know why that should matter, given that the affected cases were exactly 0.5). Is there some underlying limitation that I should be aware of that explains this? Are there other problems that this issue can cause?

    Thank you for any guidance! 



    ------------------------------
    A. R. Afework
    ------------------------------

    Attachment(s)

    IllustrationDataset.sav (461 KB)


  • 2.  RE: Error with Relational Operators (using >= in a COMPUTE)

    Posted Fri December 22, 2023 11:07 AM
    The calculations are correct.  If, for example, you look at case 127, it looks like the value is .50000, but if you expand the number of decimals, you can see that the exact value is actually .4999999999999999.
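    This is easy to reproduce outside SPSS with any IEEE 754 double; here is a Python sketch (the specific value is illustrative, not taken from the attached dataset):

```python
# 0.49999999999999994 is the closest double just below 0.5; with only
# 5 decimals displayed it is indistinguishable from 0.5 itself.
x = 0.49999999999999994
print(f"{x:.5f}")   # 0.50000
print(f"{x:.17f}")  # 0.49999999999999994
print(x >= 0.5)     # False
```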





  • 3.  RE: Error with Relational Operators (using >= in a COMPUTE)

    Posted Fri December 22, 2023 11:39 AM

    Hi Jon, 

    Thank you for looking at my dataset and my question.

    So for rows 119-127, I can see that if I increase the displayed decimals to 16 places, SPSS has calculated the mean as slightly less than 0.5. On the plus side, this means that the greater-than-or-equal-to operation itself is fine. Unfortunately, this insight highlights a couple of issues:

    1. Originally, both when the variable was created and when the calculation was done, it was set to show only two decimals. So SPSS is apparently storing additional decimal places but displaying only 2. This is strange.
    2. The MEAN function then seems to be the issue, since it returns a slightly incorrect value. For rows 119-127, all of those means should be exactly 0.5. Specifically, case 127 has 22 individual scores, which sum to 11, so the mean is 0.5.


    ------------------------------
    A. R. Afework
    ------------------------------



  • 4.  RE: Error with Relational Operators (using >= in a COMPUTE)

    Posted Fri December 22, 2023 01:35 PM
    The computed sum is 10.9999999999999980 for case 127.  This is due to the nature of the floating-point arithmetic hardware used in all modern computers - at least all that I know about.  Double-precision floating-point values have 53 binary bits of precision (the rest is for the sign and exponent), for representing numbers in terms of powers of 2.  So, although a representable value is always very close, there are infinitely many numbers that cannot be represented exactly.  All floating-point arithmetic is therefore approximate, even though the values are extremely close.  You can dig into the details of floating-point hardware on Wikipedia (see the excerpt below).
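    You can inspect this from Python, which uses the same IEEE 754 doubles; a small sketch for illustration:

```python
from decimal import Decimal

# 0.5 is a power of two, so it is stored exactly in binary...
print((0.5).hex())   # 0x1.0000000000000p-1
# ...but 0.1 needs infinitely many binary digits, so it is rounded.
print((0.1).hex())   # 0x1.999999999999ap-4
# Decimal(float) reveals the exact value that is actually stored.
print(Decimal(0.1))  # 0.1000000000000000055511151231257827021181583404541015625
```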

    If computations were carried out in base-10 (decimal) arithmetic, you would not see these differences with decimal values.  It is possible to write code to do this, but it is hundreds of times slower than binary arithmetic, since it has to be done in software rather than the usual hardware, so it is not used in scientific computation.  The Decimal data type in Python provides this kind of computation and could be used in SPSS via the SPSSINC TRANS extension command.
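    As a quick illustration of the difference, here is the classic case in Python (a minimal sketch):

```python
from decimal import Decimal

# Binary doubles: ten 0.1's do not sum to exactly 1.
print(sum([0.1] * 10))             # 0.9999999999999999
# Decimal arithmetic: the same sum is exact.
print(sum([Decimal("0.1")] * 10))  # 1.0
```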

    SPSS Statistics provides a fuzz-bits setting for the RND and TRUNC functions that deals with boundary cases.  The Edit > Options > Data help provides information on this, but the setting does not affect functions like SUM and MEAN.
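    The rounding workaround mentioned in the original post rests on the same idea; a Python sketch with an illustrative value:

```python
# Mathematically 0.5, but stored as the nearest double just below it.
x = 0.49999999999999994
print(x >= 0.5)             # False: the raw comparison fails
print(round(x, 10) >= 0.5)  # True: rounding absorbs the tiny error
```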

    If you compute all the partial sums of the scale variables, you will see for case 127 that the first inexact number appears at sum16, i.e., SUM(Scale1 TO Scale16).  BTW, you can abbreviate syntax with TO in the variable list.
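    The partial-sum exercise is easy to mimic in Python with any value that is inexact in binary; repeatedly adding 0.1, the first inexact partial sum already appears at the third step (a sketch, not the attached data):

```python
total = 0.0
for i in range(1, 11):
    total += 0.1
    print(i, repr(total))  # step 3 prints 0.30000000000000004
# After ten additions, total is 0.9999999999999999 rather than 1.0.
```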

    In computing, floating-point arithmetic (FP) is arithmetic that represents subsets of real numbers using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. Numbers of this form are called floating-point numbers. For example, 12.345 is a floating-point number in base ten with five digits of precision:

        12.345 = 12345 × 10^-3

    However, unlike 12.345, 12.3456 is not a floating-point number in base ten with five digits of precision: it needs six digits of precision; the nearest floating-point number with only five digits is 12.346. In practice, most floating-point systems use base two, though base ten (decimal floating point) is also common.

    Floating-point arithmetic operations, such as addition and division, approximate the corresponding real-number arithmetic operations by rounding any result that is not a floating-point number itself to a nearby floating-point number. For example, in a floating-point arithmetic with five base-ten digits of precision, the sum 12.345 + 1.0001 = 13.3451 might be rounded to 13.345.
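    Python's decimal module can emulate exactly this five-digit decimal arithmetic (a sketch; the precision setting is the only assumption):

```python
from decimal import Decimal, getcontext

getcontext().prec = 5  # five significant decimal digits, as in the example
result = Decimal("12.345") + Decimal("1.0001")
print(result)  # 13.345: the exact sum 13.3451 is rounded to five digits
```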







  • 5.  RE: Error with Relational Operators (using >= in a COMPUTE)

    Posted Fri December 22, 2023 02:39 PM

    Ah. I appreciate your very thorough reply. I now understand what is happening. I am still not entirely satisfied with the behavior, but it is now clear to me why rounding the mean before doing the comparison gave a different answer (the expected one) despite the mean "already" being "exactly" 0.5. And since I understand it, I can work around it.

    Thank you for taking the time!



    ------------------------------
    A. R. Afework
    ------------------------------



  • 6.  RE: Error with Relational Operators (using >= in a COMPUTE)

    Posted Fri December 22, 2023 04:06 PM
    For interest, here is an example of doing the mean computation using decimal arithmetic.  First, a small Python function is defined that computes the mean of its arguments using decimal arithmetic.  Then that function is called for each case using the SPSSINC TRANS command.  This gives the exact decimal mean.

    SPSSINC TRANS is an extension command.  It can be installed via Extensions > Extension Hub if you don't already have it; if you do have it, you should update it from the Hub, as important improvements were made to it a few months ago.

    Of course, rounding the result of the MEAN function will work, but this illustrates the principle.


    begin program python3.
    import decimal

    def decimalmean(args):
        """Mean of the nonmissing arguments, computed in decimal arithmetic."""
        try:
            # str() gives each value's shortest decimal form, so the sum is
            # accumulated in base 10 without binary rounding error.
            thesum = sum(decimal.Decimal(str(item)) for item in args if item is not None)
            nmcount = len(args) - args.count(None)  # number of nonmissing values
            return float(thesum / nmcount)
        except Exception:
            # all values missing (division by zero) or a malformed value
            return None
    end program.

    spssinc trans result = themean
    /variables Scale1 to Scale25
    /formula "decimalmean([<>])".
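    Outside SPSS, the same decimal-mean idea can be checked directly; a standalone sketch (the scores are made up, not from the attached dataset):

```python
import decimal

def decimalmean(args):
    # Same approach as above: exact base-10 accumulation via str().
    vals = [decimal.Decimal(str(v)) for v in args if v is not None]
    return float(sum(vals) / len(vals)) if vals else None

# 22 scores whose mean is exactly 0.5, plus 3 missing values.
scores = [0.45, 0.55] * 11 + [None] * 3
print(decimalmean(scores))          # 0.5
print(decimalmean(scores) >= 0.5)   # True
```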

