Hi!
Loving NetCoMi!!
I have datasets of different sizes and would like to compare network metrics across them. Through my own troubleshooting, however, I see that the replicate size affects the network metrics, so ideally the datasets would have the same replicate size when the networks are compared.
My question is: how do people handle different replicate sizes when constructing networks they intend to compare?
Here I have four datasets: Resistant has 39 samples and the others have 150, 170, and 200. So I ran a loop (sketched below) that subsamples from 10 up to 200 replicates in steps of 10, to see whether there is a saturation point beyond which adding more samples no longer changes the network metrics. Observationally, anything over about 50 really isn't adding much clarity. The red vertical line in my plot marks 39 samples. The idea is the same as looking at read depth in a rarefaction curve.
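For concreteness, here is a simplified sketch of that loop. It uses a plain Spearman correlation threshold and igraph's edge density as a stand-in for my actual NetCoMi network construction; `counts` stands for one of my samples × taxa count matrices, and the 0.5 threshold is arbitrary:

```r
library(igraph)

# Toy stand-in for my actual NetCoMi pipeline: threshold a Spearman
# correlation matrix and use edge density as the network metric.
# 'counts' is assumed to be a samples x taxa count matrix; n must not
# exceed nrow(counts).
subsample_metric <- function(counts, n, threshold = 0.5) {
  idx   <- sample(nrow(counts), size = n)            # draw n replicates at random
  assoc <- cor(counts[idx, ], method = "spearman")   # taxon-taxon associations
  adj   <- (abs(assoc) >= threshold) * 1             # simple hard threshold
  adj[is.na(adj)] <- 0   # taxa absent from the subsample give NA correlations
  diag(adj) <- 0
  g <- graph_from_adjacency_matrix(adj, mode = "undirected")
  edge_density(g)
}

sizes <- seq(10, 200, by = 10)
set.seed(1)
metric_by_size <- sapply(sizes, function(n) subsample_metric(counts, n))

plot(sizes, metric_by_size, type = "b",
     xlab = "Number of replicates", ylab = "Network metric")
abline(v = 39, col = "red")   # size of the smallest (Resistant) dataset
```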
Should I just subsample at the lowest replicate number among the datasets I plan to compare, so that each dataset randomly contributes 39 replicates? The catch is that the 39 samples drawn from the larger datasets will be slightly different on each draw. To capture that variation I could subsample iteratively, say 1000 times (sketched below), but then my Resistant dataset would show no variation at all, because every draw uses all of its samples, which is itself a bias. That suggests subsampling fewer than 39 samples, but how can that number be selected in a formal way? This is just my train of thought; I am looking for a formal decision-making method for comparing networks of different replicate sizes. Thank you for your time!
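And this is the kind of repeated subsampling I have in mind, reusing the toy `subsample_metric()` helper from above; `datasets` stands for a named list holding my four count matrices, and `n_sub` is exactly the quantity I don't know how to pick formally:

```r
# Repeatedly subsample each dataset to a common size and summarise the
# spread of the resulting metric. 'datasets' is assumed to be a named list
# of samples x taxa count matrices; subsample_metric() is the toy helper
# from the previous sketch.
n_sub  <- 39     # candidate common size (the open question: should it be < 39?)
n_iter <- 1000   # number of random draws per dataset

set.seed(1)
metric_draws <- lapply(datasets, function(mat) {
  replicate(n_iter, subsample_metric(mat, n = n_sub))
})

# For the Resistant dataset (exactly 39 samples) every draw uses all of its
# samples, so its 1000 values are identical -- the bias I am worried about.
sapply(metric_draws, function(x) c(mean = mean(x), sd = sd(x)))
```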