Dear Tiago / Graph Tool community,
I am fitting an SBM to a directed network of approximately 60 nodes, in which each edge weight is the conditional probability of one event given another.
The network is derived from a binary matrix of repeated samples that records whether each of the 60 events (nodes) occurred or not. However, this matrix contains some missing data, which cannot easily be imputed and which I suspect is not missing at random.
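For concreteness, here is a minimal sketch of how such a network could be estimated from a binary matrix with gaps. It assumes missing entries are encoded as `np.nan`, that the edge i -> j carries Pr(event j occurs | event i occurs), and that each pair is estimated only from samples where the relevant values are observed (pairwise deletion). The post does not say which direction or missing-data handling was actually used, so `conditional_prob_matrix` is a hypothetical helper, not the method behind the existing network.

```python
import numpy as np

def conditional_prob_matrix(X):
    """Hypothetical helper: P[i, j] = Pr(event j = 1 | event i = 1).

    X is an (n_samples, n_events) array of 1/0 values, with np.nan for missing.
    Assumption: pairwise deletion, i.e. P[i, j] only uses samples in which
    event i occurred and event j was observed.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)                  # True where the value is known
    occurred = np.where(observed, X, 0.0)    # 1 only where observed AND occurred
    obs = observed.astype(float)
    joint = occurred.T @ occurred            # joint[i, j]: i and j both occurred
    cond = occurred.T @ obs                  # cond[i, j]: i occurred, j observed
    with np.errstate(divide="ignore", invalid="ignore"):
        P = np.where(cond > 0, joint / cond, np.nan)  # undefined pairs -> NaN
    np.fill_diagonal(P, np.nan)              # no self-loops
    return P

X = np.array([[1, 1, 0],
              [1, np.nan, 1],
              [0, 1, 1],
              [1, 0, np.nan]])
P = conditional_prob_matrix(X)
```

Because each entry can rest on a different number of samples, pairs involving frequently missing events are estimated less precisely, which is one concrete way non-random missingness can leak into the edge weights.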
I would like either to control for this as a covariate or, alternatively, to show that the missing data do not substantially affect the SBM fit.
One possibility I have been exploring is to compute a conditional probability network of the missingness and pass it to the model as an additional edge covariate.
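A minimal sketch of such a missingness network, assuming the same `np.nan` encoding and the same edge direction as above, so that M[i, j] = Pr(event j is missing | event i is missing). The helper name `missingness_cond_prob_matrix` is hypothetical. The toy data mimic the decades-of-life case: the first two columns are always missing together.

```python
import numpy as np

def missingness_cond_prob_matrix(X):
    """Hypothetical helper: M[i, j] = Pr(event j missing | event i missing).

    X is an (n_samples, n_events) array with np.nan marking missing values.
    """
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X).astype(float)         # 1 where the value is missing
    both = miss.T @ miss                     # both[i, j]: samples missing i and j
    n_miss = miss.sum(axis=0)                # n_miss[i]: samples missing i
    with np.errstate(divide="ignore", invalid="ignore"):
        # Divide row i by n_miss[i]; never-missing events give NaN rows
        M = np.where(n_miss[:, None] > 0, both / n_miss[:, None], np.nan)
    np.fill_diagonal(M, np.nan)              # no self-loops
    return M

# Columns: "20s", "30s", other event (toy data, not from the post)
X = np.array([[np.nan, np.nan, 1],
              [1, 0, np.nan],
              [np.nan, np.nan, 0],
              [0, 1, 1]])
M = missingness_cond_prob_matrix(X)
```

Here M[0, 1] and M[1, 0] are both 1.0, which is exactly the kind of strong, structural co-missingness that would pull those nodes into the same group when M is used as an edge covariate.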
This leaves me with two SBMs:
- One with a single edge weight, the conditional probability of the events occurring. Its community structure is meaningful to interpret.
- One with two edge weights: the conditional probability of the events occurring and the conditional probability of the events being missing from the dataset (the missingness matrix). Its community structure is less meaningful, since events that tend to be missing together largely just cluster together. (For instance, I have several nodes representing decades of life, which all cluster together: if it is unknown whether someone is in their 20s, it is also unknown whether they are in their 30s.)
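The two setups differ only in the edge covariates attached to the same directed network. A minimal sketch of assembling them from two n x n matrices P (events) and M (missingness) follows; `edge_covariates` is a hypothetical helper. Two choices in it are assumptions: the edge set is taken from the finite entries of P in both cases, so the two models see identical edges, and undefined missingness values are filled with 0.0. The resulting arrays could then be loaded into graph-tool, e.g. via `Graph.add_edge_list(..., eprops=...)`, and passed as edge covariates through the `recs`/`rec_types` arguments of the block state; which `rec_types` suit probabilities in [0, 1] is left open here.

```python
import numpy as np

def edge_covariates(P, M=None):
    """Hypothetical helper: flatten n x n matrices into a directed edge list.

    Returns (edges, covs): edges is (E, 2) with rows (i, j) meaning i -> j;
    covs is (E, 1) for events only, or (E, 2) when M is supplied.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    i, j = np.nonzero(~np.eye(n, dtype=bool))  # all ordered pairs, i != j
    keep = np.isfinite(P[i, j])                # assumption: edge set from P only
    i, j = i[keep], j[keep]
    layers = [P[i, j]]
    if M is not None:
        m = np.asarray(M, dtype=float)[i, j]
        layers.append(np.nan_to_num(m, nan=0.0))  # assumption: NaN -> 0.0
    return np.column_stack([i, j]), np.column_stack(layers)

P = np.array([[np.nan, 0.5, np.nan],
              [0.2, np.nan, 1.0],
              [0.3, 0.4, np.nan]])
M = np.array([[np.nan, 1.0, 0.0],
              [1.0, np.nan, np.nan],
              [0.0, 0.0, np.nan]])
edges1, covs1 = edge_covariates(P)     # model with a single edge weight
edges2, covs2 = edge_covariates(P, M)  # model with both edge weights
```

Keeping the edge set fixed means the two fits differ only in the covariates they model, which matters if their fits are to be compared at all.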
What is the best way to compare the two SBMs robustly and (hopefully) show that the model fits better with the conditional probability of the events alone?
Alternatively, do you have any suggestions for how to account for the missingness in the data when constructing the model?
Thank you for your time,
James