SPSS Statistics

  • 1.  Using train dataset to predict the test dataset

    Posted Wed May 22, 2024 11:29 AM

    Good morning,

    I am sure you are familiar with the Titanic dataset for predicting survival. Using the train set, I created a regression model, a linear discriminant analysis model, a Naive Bayes model, and a K-Nearest Neighbor model to predict survival. I am supposed to predict the test set data using all four models. However, the test set does not include the Survived variable. How can I complete this task?



    ------------------------------
    Hesham Dabbas
    ------------------------------


  • 2.  RE: Using train dataset to predict the test dataset

    Posted Thu May 23, 2024 11:07 AM

    Hi @Hesham Dabbas,

    What is your source for the titanic.sav dataset?

    I googled for it and found
    https://github.com/datasciencedojo/datasets/blob/master/titanic.csv

    The second variable in the file is "Survived", dichotomous, coded 0 and 1.   Is this the same dataset you are referencing?



    ------------------------------
    David Dwyer
    SPSS Technical Support
    IBM Software
    ------------------------------



  • 3.  RE: Using train dataset to predict the test dataset

    Posted Thu May 23, 2024 11:35 AM

    Thank you. The instructor provided two separate files: train and test. The train file includes the Survived variable, but the test file does not. Please see the attached files.



    ------------------------------
    Hesham Dabbas
    ------------------------------

    Attachment(s)

    test.csv   29 KB
    train.csv   56 KB


  • 4.  RE: Using train dataset to predict the test dataset

    Posted Thu May 23, 2024 12:48 PM

    Well, you can predict the outcomes in the test dataset (using the Scoring Wizard) and summarize those predictions. But since you can't evaluate the quality of the predictions without the Survived variable, maybe the instructor just wants your predictions dataset.

    You can compare the predictions across the different estimated models.  It would be interesting to see how well they agree.
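One way to make "how well they agree" concrete is a pairwise agreement rate: the share of test cases on which two models predict the same class. The sketch below is plain Python rather than SPSS syntax, and the model names and predicted values are hypothetical stand-ins for the prediction columns the four models would produce.

```python
from itertools import combinations


def agreement_rate(a, b):
    """Share of cases on which two prediction lists give the same class."""
    if len(a) != len(b):
        raise ValueError("prediction lists must have the same length")
    if not a:
        raise ValueError("prediction lists must not be empty")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pairwise_agreement(predictions):
    """Map each pair of model names (sorted alphabetically) to its agreement rate."""
    return {
        (m1, m2): agreement_rate(predictions[m1], predictions[m2])
        for m1, m2 in combinations(sorted(predictions), 2)
    }


# Hypothetical predicted Survived values (0/1) for six test passengers.
predictions = {
    "regression":  [0, 1, 0, 0, 1, 1],
    "lda":         [0, 1, 0, 1, 1, 1],
    "naive_bayes": [0, 1, 1, 0, 1, 0],
    "knn":         [0, 0, 0, 0, 1, 1],
}

for pair, rate in pairwise_agreement(predictions).items():
    print(pair, round(rate, 2))
```

A low agreement rate between two models only shows that they disagree; without the Survived variable it cannot show which one is right.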



    ------------------------------
    Jon Peck
    ------------------------------



  • 5.  RE: Using train dataset to predict the test dataset

    Posted Fri May 24, 2024 08:35 AM

    Thank you. Do you have, or can you recommend, instructions on how to use the Scoring Wizard?



    ------------------------------
    Hesham Dabbas
    ------------------------------



  • 6.  RE: Using train dataset to predict the test dataset

    Posted Fri May 24, 2024 09:44 AM
    In the estimation procedure, save the model (Export Model Information to XML file).
    Then switch to the test dataset, open Utilities > Scoring Wizard, select the model you saved, and choose the variables you want to create.
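The same estimate, export, and score pattern can be sketched outside SPSS to show why the missing Survived variable is not a problem: the saved model only needs the predictor columns. This is a minimal illustration in plain Python, not the Scoring Wizard itself. The tiny k-NN model, the JSON file (standing in for the exported XML file), the `Female` recode, and all data values are assumptions made for the example.

```python
import json
import os
import tempfile
from collections import Counter


def fit_knn(rows, features, target, k=3):
    """'Estimate' a k-NN model: it simply stores the labelled training cases."""
    return {
        "features": features,
        "k": k,
        "cases": [[[r[f] for f in features], r[target]] for r in rows],
    }


def score(model, row):
    """Predict the class of one unlabelled row by majority vote of its k nearest cases."""
    x = [row[f] for f in model["features"]]
    nearest = sorted(
        model["cases"],
        key=lambda case: sum((a - b) ** 2 for a, b in zip(case[0], x)),
    )[: model["k"]]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical training cases: the target Survived is present.
train = [
    {"Pclass": 1, "Female": 1, "Survived": 1},
    {"Pclass": 1, "Female": 1, "Survived": 1},
    {"Pclass": 2, "Female": 1, "Survived": 1},
    {"Pclass": 3, "Female": 0, "Survived": 0},
    {"Pclass": 3, "Female": 0, "Survived": 0},
    {"Pclass": 2, "Female": 0, "Survived": 0},
]
# Hypothetical test cases: same predictors, no Survived column.
test = [{"Pclass": 1, "Female": 1}, {"Pclass": 3, "Female": 0}]

# Step 1: estimate on the train data and export the model to a file.
path = os.path.join(tempfile.mkdtemp(), "knn_model.json")
with open(path, "w") as f:
    json.dump(fit_knn(train, ["Pclass", "Female"], "Survived"), f)

# Step 2: load the saved model and create a prediction variable in the test data.
with open(path) as f:
    saved = json.load(f)
for row in test:
    row["PredictedSurvived"] = score(saved, row)

print(test)
```

Repeating step 2 with each saved model would add one prediction variable per model to the test data.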
