Thanks for the help. I might have to narrow down the number of financial figures I want to research and combine them into a list of defined dependent variables, then play with some statistical tests to find some concrete relationships and patterns.

The amount of data and all the (country-specific) limitations and assumptions are a bit overwhelming. I agree that obtaining my degree is in the interest of both me and the university, but I have a feeling I am heading towards some generic conclusions, paired with numerous assumptions and limitations that affect the significance of my findings. On the other hand, maybe I am overestimating what is expected of me.

Anyway, thanks again for your time and effort!

Original Message:

Sent: Thu April 18, 2024 11:19 AM

From: Marianne Pelletier

Subject: [Q] Which statistical tests to use for aggregated data?

It's hard to start testing data unless you know what your dependent variable is. Since you are still exploring, start with cluster analysis to see if there is a pattern that leads you to obvious dependent variables. For instance, you may find that stock market investment turns out to be the clearest delineator among your segments.

After that, discriminant analysis might help, but you'll need to know what it's doing and what the results mean. Something simpler, like a decision tree, might take less time to understand. After all, your goal is to get a degree, not do this all the time. But with both of these, you need a defined dependent variable.

If you want to use all of your dependent variables at once, see here: SPSS #34 - Multivariate Anova - 1 Independent & 2 Dependent (MANOVA)

You are reminding me of why I make some money helping PhD students complete their theses!

------------------------------

Marianne M. Pelletier

Managing Director

Staupell Analytics Group

marianne@staupell.com

www.staupell.com

Original Message:

Sent: Thu April 18, 2024 11:12 AM

From: Lukas Peric

Subject: [Q] Which statistical tests to use for aggregated data?

Maybe a combination of several types of variables, since the 'impact' cannot be identified as one variable?

I am thinking of comparing numerous figures to ideally discover a significant change in the covid year compared with the previous years.

These variables can consist of:

1. Ranked variables, which I would need to define myself (e.g. level of risk aversion based on their portfolio composition)

2. Categorical variables (e.g. decreased investment in stocks, increased preference for real estate, etc.)

3. Scalar variables (e.g. significant percentage changes in financial figures, ratios)

I haven't decided how to measure 'impact' exactly, but the variables above should give you a better idea of what I am looking for.

Ultimately I want to discover various financial figures which were stable before covid-19, that had a significant change during covid-19, to increase the probability of the change being the result of covid-19.

Others have pointed me to these statistical tests: Hotelling's t-square test, weighted least squares, and Fisher's Linear Discriminant Analysis.
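Of the tests listed, weighted least squares is the one that can make direct use of the standard errors reported with each aggregate figure. The following is only a minimal sketch, assuming Python (which nobody in this thread has mentioned), inverse-variance weights, independent datapoints, and a regression with just an intercept and a 0/1 dummy for the 2021 wave. The function name `wls_covid_dummy` and all numbers are made up for illustration and are not HFCS values.

```python
import math

def wls_covid_dummy(values, std_errors, is_covid):
    """Weighted least squares of value on an intercept and a 0/1 covid dummy.

    Weights are 1 / se^2 (assumption: the reported standard errors are the
    true standard deviations of each aggregate figure).
    """
    w = [1.0 / se ** 2 for se in std_errors]
    sw = sum(w)
    # Weighted means of the dummy and of the outcome.
    x_bar = sum(wi * xi for wi, xi in zip(w, is_covid)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, values)) / sw
    # Weighted sums of squares and cross-products.
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, is_covid))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, is_covid, values))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # With known inverse-variance weights, Var(slope) = 1 / sxx.
    se_slope = math.sqrt(1.0 / sxx)
    return intercept, slope, se_slope

# Hypothetical figure for ONE country and ONE wealth group over the 4 waves.
values = [30.0, 31.0, 29.0, 36.0]      # made-up estimates, 2011..2021
std_errors = [1.0, 1.0, 1.0, 2.0]      # made-up standard errors
is_covid = [0, 0, 0, 1]                # 1 only for the 2021 wave

intercept, slope, se_slope = wls_covid_dummy(values, std_errors, is_covid)
print(f"pre-covid level: {intercept:.2f}")
print(f"2021 shift:      {slope:.2f} (se {se_slope:.2f})")
```

With only a dummy regressor, the intercept is the precision-weighted mean of the pre-covid waves and the slope is the 2021 value minus that mean, so slope divided by se_slope can be read as a rough z-statistic.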

------------------------------

Lukas Peric

Original Message:

Sent: Thu April 18, 2024 10:54 AM

From: Marianne Pelletier

Subject: [Q] Which statistical tests to use for aggregated data?

Hi, Lukas, When you say "impact", do you mean that as a ranked variable, a binary variable, or a scalar variable? Once you define that, I can help.

------------------------------

Marianne Pelletier

Original Message:

Sent: Wed April 17, 2024 07:50 AM

From: Lukas Peric

Subject: [Q] Which statistical tests to use for aggregated data?

Dear Marianne,

Raw-level data on household finances is quite confidential, and as a bachelor's thesis writer I do not think I will be able to get access. Besides that, requesting raw-level data now might be a bit too late and is maybe not the best idea given the time constraints of handing in my thesis.

How can I use a decision tree to parse out the relative levels of my financial figures? I believe it is some kind of machine learning technique? Does this require prior knowledge of programming, for example?

My aim is to examine the impact of the covid-19 pandemic on household portfolio choice. However, this does not consist of one exact measure, so I have multiple dependent variables I want to test which influence household portfolio choice. These mostly consist of the aforementioned financial figures, ratios and percentages, such as: share of financial assets on total assets, percentage of households holding debt, real estate assets, etc.

I want to emphasize that I do have the standard errors for each datapoint (so for each country each separate group figure)
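Having a standard error for each datapoint is what makes a simple comparison of two waves possible without the raw data. A minimal sketch, assuming Python, approximately normal estimates, and independence between the two waves (an assumption that may not hold if the survey re-interviews some households). The function name `z_test_difference` and the numbers are hypothetical, not HFCS values.

```python
import math

def z_test_difference(est_a, se_a, est_b, se_b):
    """Two-sided z-test for est_a - est_b from two independent estimates."""
    diff = est_a - est_b
    # Standard error of a difference of independent estimates.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return diff, z, p_value

# Made-up example: one figure for one wealth group in one country,
# 2021 wave (a) versus 2017 wave (b).
diff, z, p_value = z_test_difference(est_a=42.0, se_a=1.5,
                                     est_b=38.0, se_b=2.0)
print(f"difference = {diff:.2f}, z = {z:.2f}, p = {p_value:.3f}")
```

Repeating this for every figure, country and wealth group produces many tests, so some correction for multiple comparisons would likely be needed before calling any single difference significant.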

------------------------------

Lukas Peric

Original Message:

Sent: Wed April 17, 2024 07:34 AM

From: Marianne Pelletier

Subject: [Q] Which statistical tests to use for aggregated data?

My first question is whether you can get at the raw data from the surveys. If not, consider using a decision tree to parse out the relative levels of each of your measurements to see if you get a clearer picture of the data. What is your dependent variable?

------------------------------

Marianne Pelletier

Original Message:

Sent: Tue April 16, 2024 06:29 AM

From: Lukas Peric

Subject: [Q] Which statistical tests to use for aggregated data?

Hey all, I am writing my finance bachelor's thesis on the impact of Covid-19 on households' portfolio choices across different wealth groups in the BeNeLux area (Belgium, Netherlands, Luxembourg). The data comes from the European Central Bank (https://www.ecb.europa.eu/stats/ecb_surveys/hfcs/html/index.en.html) and consists of different financial figures of households, separated per country and per household wealth group (6 groups: bottom 20%, 20-40%, 40-60%, 60-80%, 80-90%, 90-100%). I have data from 4 waves (2011, 2014, 2017, 2021). The wave year is the independent variable, with 2021 as the year of focus since it is mid-covid. Besides just plotting the figures in graphs to check for any significant changes, I would like to run some statistical tests and regressions to test the significance of any differences between the year 2021 and the other three waves (2011, 2014, 2017).

Figures I will mainly focus on include:

A3 Net wealth, medians

A4 Net wealth, means

B3 Real assets, ownership of HMR

B5 Real estate assets, conditional medians

C4 Financial assets, conditional medians

C5 Financial assets, has shares

D4 Share of financial assets on total assets

E5 Percentage of households holding debt

E6 Total debt, conditional medians

F4 Median debt to income ratio

F5 Median debt service to income ratio, among households with debt payments

F6 Median debt to assets ratio – breakdowns

G3 Regular expenses less than income

As you can see, these figures consist of medians, means, ratios and percentages for each separate wealth group. I do have the standard errors for each datapoint (so for each country, each separate group figure).

With these figures being aggregate data from a large survey, I am not sure which statistical tests and what kind of regressions I can use. My supervisor told me to aim for 30-50 datapoints per regression; however, my data only consists of figures (means, medians, ratios) for 6 large groups. This would leave me with 6 datapoints per country per financial figure per year, so 18 datapoints per financial figure per year, and 72 datapoints per financial figure across the 4 years. With these figures being aggregate data, do these datapoints suffice for a regression analysis? (If so, which type?)
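The datapoint count above can be checked by laying out the grid of countries, wealth groups and waves for a single financial figure. A small sketch, assuming Python; the variable names are just illustrative labels for the groups described in this message.

```python
from itertools import product

countries = ["Belgium", "Netherlands", "Luxembourg"]
wealth_groups = ["0-20%", "20-40%", "40-60%", "60-80%", "80-90%", "90-100%"]
waves = [2011, 2014, 2017, 2021]

# One aggregate value per (country, wealth group, wave) for a single figure.
cells = list(product(countries, wealth_groups, waves))

per_country_per_wave = len(wealth_groups)
per_wave = len(countries) * len(wealth_groups)
total = len(cells)
covid_cells = [c for c in cells if c[2] == 2021]

print(per_country_per_wave, per_wave, total, len(covid_cells))
# prints: 6 18 72 18
```

Only 18 of the 72 cells belong to the 2021 wave, and the remaining 54 are repeated observations of the same 18 country-by-group units, so the cells are unlikely to behave like 72 independent datapoints.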

Could anyone advise me on which statistical tests and regressions to use with this data, to check whether the year 2021 is significantly different from the others, other than just plotting graphs? Thanks in advance.

------------------------------

Lukas Peric

------------------------------