Hello, dear community,
I am currently conducting a research project at my college and have never encountered a situation like this before. I have several questions and would like to find the most appropriate ethical and statistical approach to the following scenario:
As part of collecting socio-demographic data, I ask participants: "Which substances have you consumed in the last month?" I chose a multiple-response format because it keeps the number of items to a minimum, helps avoid participant fatigue, and allows respondents to select more than one substance (alcohol, tobacco, or drugs) if applicable. I expect this format to help reduce response bias as well.
I am using SPSS v.24 to manage and analyze my data. After exploring the software's syntax and menus, I identified two possible approaches:
- Using the "Multiple Response" function for the question "Which substance(s) have you consumed in the last month?" My online form generates three sub-variables for this single question, one for each substance, and each sub-variable offers the options "Yes, I have consumed it," "No, I have not consumed it," or "I would prefer not to answer." In SPSS, I went to Analyze > Multiple Response > Define Variable Sets, selected these sub-variables, and defined a multiple response set that combines them. However, when I request frequency tables, I only see how many participants selected each substance (e.g., how many chose alcohol, tobacco, or drugs); I do not see how many selected more than one. Even so, many tutorials, handbooks, and textbooks recommend this approach.
- Using a syntax-based approach to code each combination of substances that appears in my dataset. A classmate helped me write SPSS syntax to obtain frequencies and graphs for how many people chose tobacco only, alcohol only, both alcohol and tobacco, or none of them. I find this method more ethical because it reports each response pattern exactly as participants gave it.
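To make the difference between the two approaches concrete, here is a minimal sketch in Python (not SPSS syntax) that computes both views from the same data. The responses are invented for illustration, the lowercase codes "yes" / "no" / "prefer not to answer" are assumptions standing in for the form's three options, and dropping any participant who declined at least one item is only one possible missing-data rule, chosen here for simplicity. The function names are hypothetical.

```python
from collections import Counter

SUBSTANCES = ("alcohol", "tobacco", "drugs")
YES, NO, DECLINED = "yes", "no", "prefer not to answer"  # assumed codes

# Invented example data: one dict per participant, one sub-variable per
# substance, mirroring how the online form exports the question.
responses = [
    {"alcohol": YES, "tobacco": YES, "drugs": NO},
    {"alcohol": YES, "tobacco": NO, "drugs": NO},
    {"alcohol": NO, "tobacco": NO, "drugs": NO},
    {"alcohol": YES, "tobacco": YES, "drugs": YES},
    {"alcohol": NO, "tobacco": YES, "drugs": DECLINED},
    {"alcohol": YES, "tobacco": NO, "drugs": NO},
]


def complete(resp):
    """True if the participant answered yes or no for every substance."""
    return all(resp[s] in (YES, NO) for s in SUBSTANCES)


def item_counts(rows):
    """Approach 1: number of 'yes' answers per substance.
    A participant who used two substances is counted under both,
    so these counts can sum to more than the number of participants."""
    return {s: sum(r[s] == YES for r in rows) for s in SUBSTANCES}


def substances_used_counts(rows):
    """How many participants used 0, 1, 2 or 3 substances."""
    return Counter(sum(r[s] == YES for s in SUBSTANCES) for r in rows)


def pattern_label(resp):
    """Approach 2: one mutually exclusive label per participant."""
    used = [s for s in SUBSTANCES if resp[s] == YES]
    return " + ".join(used) if used else "none"


def pattern_counts(rows):
    """Frequency of each combination that actually occurs in the data."""
    return Counter(pattern_label(r) for r in rows)


# Assumed rule: keep only participants who answered every item.
valid = [r for r in responses if complete(r)]
n = len(valid)

if __name__ == "__main__":
    print(f"Valid cases: {n} (excluded {len(responses) - n})")
    print("Per substance (percent of cases):")
    for s, k in item_counts(valid).items():
        print(f"  {s:<8} {k}  ({100 * k / n:.1f}%)")
    print("Number of substances used:")
    for k, c in sorted(substances_used_counts(valid).items()):
        print(f"  {k}: {c}")
    print("Response patterns:")
    for label, c in pattern_counts(valid).most_common():
        print(f"  {label:<26} {c}  ({100 * c / n:.1f}%)")
```

With this toy data, the per-substance counts (4, 2 and 1) sum to 7 for 5 valid participants, because overlapping users are counted once per substance, while the pattern counts sum to exactly 5. The "number of substances used" table is a third view that directly answers how many people selected more than one. Note that with three substances there are 2^3 = 8 possible patterns, so with more items the pattern table can quickly become sparse.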
My questions are: Is it statistically valid and methodologically sound to present the data using the second method? And why do so many sources recommend the first method for this kind of problem?
Thank you very much for reading and for taking the time to share your knowledge.
------------------------------
JOSE ACEVEDO
------------------------------