webMethods

webMethods

Join this online group to communicate across IBM product users and experts by sharing advice and best practices with peers and staying up to date regarding product enhancements.

 View Only
  • 1.  Multi-Document Statistics

    Posted Tue June 04, 2002 10:09 PM

    As stated in an earlier question about sortby, I am building a repository of (anonymous) information about surgical procedures performed at a large research hospital over a period of several years. This data is used in support of statistical studies of clinical questions.

    In a typical study, we first identify a patient population that satisfies the study criteria (which often are very specific) and then extract selected data from those patients’ records for statistical analysis. So far I have not been able to find a simple way using Quip to first identify a group of patients and then do simple calculations “across” the group. I would appreciate any suggestions the user community might make to address the following example.

    In the general form of this example, we want to know the averave nunmber of red blood cell units given during or after surgery (, ) to patients in each of three groups who underwent a particular surgical procedure in each of four years (1990-1993). One XML document contains the complete relevant record for one patient and may contain reports of several surgeries and other procedures and tests. If a patient did not receive red blood cell units, the and elements will be missing, even though these patients need to be counted when calculating the average.

    In simplified form, the following query correctly identifies one group of patients and operations for the year 1990:


    {
    for $b in collection(“Demo500”)/PatientRecord
    [.//Surgery[./@SurgeryDate > “1989-12-31” and ./@SurgeryDate < “1994- 01-01”]/RevascularizationProcedure],
    $c in $b//Surgery[./@SurgeryDate > “1989-12-31” and ./@SurgeryDate < “1994-01-01”][./RevascularizationProcedure]
    where empty($c//ValveRepair) and
    empty($c//ValveReplacement) and
    empty . . .
    return
    (
    for $Pri90 in $c[substring(string-value(./@SurgeryDate), 1, 4) = ‘1990’ and
    substring(string-value(./HistoryOfCardiacOperations), 1, 2) = ‘No’]
    where not(contains(string-value($Pri90/EmergencySurgeryIndication), ‘Emergent’))
    return

    {
    (
    A-Elective Primary CABG,
    {string-value($b/@PatientID)},
    {string-value($Pri90/@SurgeryDate)},
    $Pri90//RBCUnitsIntra,
    $Pri90//RBCUnitsPost
    )
    }

    )
    }

    At this point, $Pri90 contains a forest of surgery nodes, each of which contains details about one operation of interest. I can easily extract information (e.g., ) about individual operations, but so far have not been able to calculate statistics (i.e., average ) for the entire group.

    Any suggestions? Would the ‘concatenate’ function when implemented allow me to combine these nodes into a sequence that could be processed with a Let statement?

    Thanks again.

    Bill


    #webMethods
    #Tamino
    #API-Management


  • 2.  RE: Multi-Document Statistics

    Posted Fri June 07, 2002 03:52 PM

    XQuery provides functions to calculate the sum
    or the average value of a list of numbers, e.g.
    try sum(1 to 10).

    So for your example you should some up all the
    blood values you find in the patient records
    and divide this by the size (use function ‘count’
    of all relevant patient records) (you could try to
    use the avg function, but since there are records with no entry this might get not exactly the result you want.)

    Take care that the function ‘sum’ is realy invoked
    on a list of numbers and not on a list of string
    values or even nodes. You will probably need to
    use a constructor, which converts some string into
    a number.

    Tiny example:
    let $xs :=


    s
    387.4


    e


    p
    387.4


    let $sum := sum(for $x in $xs/patient/va/text() return float($x))
    return $sum div count($xs/patient)

    Sven Eric


    #Tamino
    #webMethods
    #API-Management


  • 3.  RE: Multi-Document Statistics

    Posted Fri June 07, 2002 10:13 PM

    Thanks for responding to both of my questions. I will try to apply your strategy to the “average RBC units” calculation.

    Bill


    #Tamino
    #API-Management
    #webMethods


  • 4.  RE: Multi-Document Statistics

    Posted Tue June 11, 2002 11:36 PM

    I have been experimenting with your “Tiny Example” and would like to clarify a few things. Sometimes the simplest cases are the most enlightening, or maybe just the “dumbest.” So at the risk of asking some dumb questions, here goes.

    1.) My original question was not about the sum, avg, and count functions but rather about how to get my data into a form where I could use them. Recall, I can construct a variable $Pri90 which contains a forest of nodes, each of which contains some of the data for which I want to generate statistics.

    It’s as if, insteasd of beginning your Tiny example with “let $xs := …” you had said:

    for $xs in


    s
    387.4


    e


    p
    378.4
    200.2


    Now how do I calculate the average value of all the va’s? In other words, in the language of the W3C Xquery 1.0 (2.9) how can I duplicate the effect of the let statement to convert the tuple of variable bindings (which I have) into a variable which is bound directly to the entire expression (which sum and other functions could process correctly)? Can’t just say let $c := $xs because $c will continue to be a collection of independent values.

    2.) When I tried to actually run your Tiny example, along with an assortment of “missing parentheses” and “missing return clause” messages, I also got one that said “Unknown function - float”. You were quite right, I do need floating point to do what I need to do, so that could become a problem for us after awhile.

    3.) I also have a have a simple scope problem.
    The following code works fine:

    {
    let $xs :=


    s
    387


    e


    p
    200



    return
    {$xs/patient/va}
    }

    producing:


    387
    200


    If I replace

    return
    {$xs/patient/va}
    }

    with

    return
    count($xs/patient/va)
    }

    I also get a correct result

    2.

    But if I try to get both results using

    return
    $xs/patient/va,
    count($xs/patient/va)
    }

    or

    return
    count($xs/patient/va),
    $xs/patient/va
    }

    (with assorted arrangements of commas and curly brackets), I get the error message

    Error: (20,3) Unknown variable xs.

    What is wrong?

    Thanks.

    Bill


    #Tamino
    #API-Management
    #webMethods


  • 5.  RE: Multi-Document Statistics

    Posted Wed June 12, 2002 10:41 AM

    Let me start with the answers to your last two
    question, about the first question I need to read
    closer in order to understand, what exactly your problem is:

    2.) sorry that the ‘float’ constructor function is
    not yet in the released version of quip. The next
    version, which will be released around next
    monday, will contain that function. Furthermore I
    am sorry that my example did not work directly.
    Actually I tested it before posting it, but
    probably copy/paste was not performed correctly by me and some parts got lost


    3.) As you might have allready guessed, you have a
    grouping problem here. Simply compare:
    for $x in (1,2)
    return ,

    and
    for $x in (1,2)
    return (
    , )



    I gues now you see what was your problem. If you
    do not put the two results you want to return in
    parantheses, then only the first one belongs to
    the reurn clause, i.e.
    for $x in (1,2)
    return
    ,

    is parsed as:

    (for $x in (1,2)
    return

    , )

    Sven Eric


    #webMethods
    #API-Management
    #Tamino


  • 6.  RE: Multi-Document Statistics

    Posted Wed June 12, 2002 05:25 PM

    I am sorry, I am not sure that I grab your first
    question. What is the problem? If you have a
    document of all your patient data:


    a{6}
    b
    c{1}
    d{8}
    e{4}

    you can bind this to a variable:

    let $patients :=

    a{6}
    b
    c{1}
    d{8}
    e{4}

    return $patients


    You can bind the number of patients in a varaible:

    let $patients :=

    a{6}
    b
    c{1}
    d{8}
    e{4}

    let $numberOfPatients := count($patients/patient)
    return $numberOfPatients

    or you can sum up values in the patients records:



    let $patients :=
    let $patients :=

    a{6}
    b
    c{1}
    d{8}
    e{4}

    let $numberOfPatients := count($patients/patient)
    let $numberOfPatients
    := sum($patients/patient/val/node())
    return $numberOfPatients

    and eventually you can iterate over patient and
    filter them:


    let $patients :=

    a{6}
    b
    c{1}
    d{8}
    e{4}

    let $numberOfPatients := count($patients/patient)
    let $sumOfValues
    := sum($patients/patient/val/node())
    let $avg := $sumOfValues div $numberOfPatients
    for $patient in $patients/patient
    where empty($patient/val/node()) or $patient/val/node() < $avg
    return {$avg,$patient}



    Does this in some respect cover your question?

    Sven Eric


    #webMethods
    #API-Management
    #Tamino


  • 7.  RE: Multi-Document Statistics

    Posted Thu June 13, 2002 11:27 PM

    My original concern – possibly misguided – with using ‘let’ the way you suggest was a feeling that with a large and complex data base it might lead to unneccessarily long response times or overflow conditions somewhere, or both.

    The operational system we have in mind currently contains about 172,000 anonymous patient records with an average size in XML of more than 13 kb, based on a sample of 500 records, and grows at the rate of 25 records per day, seven days a week. I have been trying to demonstrate the use of QUIP for nine samples selected from our files of more that 200 actual searchs conducted during the last two years.

    The problem at hand is the following:

    Find the total number of elective and emergency primary and re-op isolated coronary- artery-bypass-graft (CABG) patients between 1990 and 1993, and the mean red blood cell units (RBC’s) used per patient each year. (Primary means first open-heart surgery, re-op means any open-heart surgery after the first one, isolated means no other procedure performed at the same time as the bypass.) Data are to be presented in three groups:

    1) Elective (anyone that is not an emergency) Primary Isolated CABG
    2) Reoperations (elective or emergent) Isolated CABG
    3) Emergent Primary CABG

    Combine the Intra and Post-Op RBC’s and list the mean units per patient for each year. (The Intra and Post-op units are stored in the XML data model as separate sub-elements under the complex element , which is located at the 6th level in the hierarchy. I expect to flatten this out somewhat in the next prototype design.).


    At a detail level we are completing the following table.

    —Average RBC’s per patient–
    Year 1990 1991 1992 1993

    Group 1 x x x x
    Group 2 x x x x
    Group 3 x x x x

    In addition to the detailed data requested we are usually also expected to provide a set of base data about each of the patients in the selected group. This includes items like birth date, ***, prior diagnoses for certain conditions, prior blood test results, status as a smoker, etc. It is therefore unacceptable to retrieve detailed data without the ability to retrieve other associated patient-specific information.

    My approach to this problem was to begin with:

    for $b in collection (“Demo500”)/PatientRecord[./EpisodeOfCare/TherapeuticProcedures/Surgery[./@SurgeryDate > “1989-12-31” and ./@Surgery < “1994-01-01”]/RevascularizationProcedure],

    $c in $b/EpisodeOfCare/Management/TherapeuticProcedures/Surgery[./@SurgeryDate > “1989-12-31” and ./@SurgeryDate < “1994-01-01”][./RevascularizationProcedure]

    ($b contains complete patient records, $c contains only the surgical details for those operations which included revascularization (CABG).)

    I would retrieve the same set of patients and the same set of operations if I said:

    let $b := collection(“Demo500”)/PatientRecord???.. ,
    $c := $b/Episode???..

    Next, to identify the operations that were isolated CABG, I started adding conditions in a where clause. For example,

    for $b in collection (“Demo500”)/PatientRecord??? ,
    $c in $b/Episode???.
    where empty($c//ValveReplacement)

    continues to produce a correct (but shorter) list of patients and procedures, while

    let $b := collection (“Demo500”)/PatientRecord??? ,
    $c := $b/Episode???.
    where empty($c//ValveReplacement)

    does not produce any output at all.

    To complete the query I still have to specify six more empty conditions, divide the results into twelve sets (three groups x four years), and calculate the required averages for each set, including the fact that any given operation may have both intra-op and post-op numbers that have to be taken into the same average.

    Bill


    #webMethods
    #API-Management
    #Tamino