Automate all steps of santaR fitting, confidence band estimation and p-value calculation for one or multiple variables
Source: R/santaR_auto_fit.R
santaR_auto_fit encompasses all the analytical steps for the detection of significantly altered time trajectories: input data preparation (get_ind_time_matrix), establishing group membership (get_grouping), spline modelling of individual and group time evolutions (santaR_fit), computation of group mean curve confidence bands (santaR_CBand), and identification of significantly altered time trajectories (santaR_pvalue_dist and/or santaR_pvalue_fit). As santaR is a univariate approach, multiple variables can be processed independently, which santaR_auto_fit can execute in parallel over multiple CPU cores.
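The steps listed above can be sketched by hand for a single variable. This is a minimal illustration rather than the internal implementation of santaR_auto_fit: it assumes the function signatures given in each function's own help page and uses the acuteInflammation dataset shipped with the package. The object names (indTimeMat, grouping, SANTAObj) and the reduced nBoot and nPerm values are choices made for this example only.

```r
library(santaR)

# Single variable from the dataset bundled with santaR
Yi    <- acuteInflammation$data[, 1]
ind   <- acuteInflammation$meta$ind
time  <- acuteInflammation$meta$time
group <- acuteInflammation$meta$group

# 1. Input data preparation: individuals as rows, time-points as columns
indTimeMat <- get_ind_time_matrix(Yi = Yi, ind = ind, time = time)

# 2. Group membership of each individual
grouping <- get_grouping(ind = ind, group = group)

# 3. Spline modelling of individual and group time evolutions
SANTAObj <- santaR_fit(indTimeMat, df = 5, grouping = grouping, verbose = FALSE)

# 4. Confidence bands on the group mean curves (nBoot reduced for speed)
SANTAObj <- santaR_CBand(SANTAObj, nBoot = 100)

# 5. p-value based on the distance between group mean curves (nPerm reduced for speed)
SANTAObj <- santaR_pvalue_dist(SANTAObj, nPerm = 100)
```

santaR_auto_fit repeats this sequence for every column of inputData and returns one such object per variable.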
Usage
santaR_auto_fit(
  inputData,
  ind,
  time,
  group = NA,
  df,
  ncores = 0,
  CBand = TRUE,
  pval.dist = TRUE,
  pval.fit = FALSE,
  nBoot = 1000,
  alpha = 0.05,
  nPerm = 1000,
  nStep = 5000,
  alphaPval = 0.05,
  forceParIndTimeMat = FALSE
)
Arguments
- inputData
data.frame of measurements with observations as rows and variables as columns.
- ind
Vector of subject identifier (individual) corresponding to each measurement.
- time
Vector of the time corresponding to each measurement.
- group
NA or vector of group membership for each measurement. Default is NA for no groups.
- df
(float) Degrees of freedom to employ for fitting the individual and group mean smooth.spline.
- ncores
(int) Number of cores to use for parallelisation. Default 0 for no parallelisation.
- CBand
If TRUE calculate confidence bands for group mean curves. Default is TRUE.
- pval.dist
If TRUE calculate p-value based on inter-group mean curve distance. Default is TRUE.
- pval.fit
If TRUE calculate p-value based on group mean curve improvement in fit. Default is FALSE.
- nBoot
(int) Number of bootstrapping rounds for confidence band calculation. Default 1000.
- alpha
(float) Significance level of the confidence bands (0.05 for 95% confidence bands). Default 0.05.
- nPerm
(int) Number of permutations for p-value calculation. Default 1000.
- nStep
(int) Number of steps (granularity) employed for the calculation of the area between group mean curves (p-value dist). Default is 5000.
- alphaPval
(float) Significance level of the confidence interval on the permuted p-value (0.05 for a 95% confidence interval). Default 0.05.
- forceParIndTimeMat
If TRUE parallelise the preparation of input data by get_ind_time_matrix. Default is FALSE.
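To give a feel for the df argument, the sketch below fits base R's smooth.spline to a single made-up trajectory at two different degrees of freedom. The time and value vectors are hypothetical and are not taken from any santaR dataset; the example only illustrates the general trade-off between smoothness and closeness of fit, not how santaR selects or applies df internally.

```r
# Hypothetical trajectory: 7 time-points for a single individual
time  <- c(0, 4, 8, 12, 24, 48, 72)
value <- c(0.1, 0.9, 1.6, 1.4, 0.8, 0.3, 0.1)

# A lower df gives a smoother curve, a higher df follows the points more closely
fitSmooth <- smooth.spline(x = time, y = value, df = 3)
fitTight  <- smooth.spline(x = time, y = value, df = 5)

# Evaluate both fits on a fine grid, e.g. for plotting
grid       <- seq(min(time), max(time), length.out = 100)
predSmooth <- predict(fitSmooth, grid)$y
predTight  <- predict(fitTight, grid)$y

# Residual sum of squares at the observed time-points
rssSmooth <- sum((value - predict(fitSmooth, time)$y)^2)
rssTight  <- sum((value - predict(fitTight, time)$y)^2)
```

The fit with df = 5 leaves smaller residuals than the fit with df = 3, at the cost of a less smooth curve.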
Details
Note
The calculation of confidence bands accounts for approximately a third of the time taken by santaR_auto_fit, while the identification of significantly altered time trajectories (either santaR_pvalue_dist or santaR_pvalue_fit) accounts for two thirds of the total time. The time taken by these steps increases linearly with their respective parameters: nBoot for confidence bands, nPerm and nStep for the identification of significantly altered trajectories using santaR_pvalue_dist, and nPerm for santaR_pvalue_fit. Default values of these parameters are optimised to balance the time taken with the precision of the value estimation; increasing nPerm can tighten the p-value confidence intervals.

If parallelisation is activated (ncores>0), the fit of spline models, the calculation of confidence bands on the group mean curves and the identification of altered trajectories are executed for multiple variables simultaneously. However, the preparation of input data (get_ind_time_matrix) is not parallelised by default, as the parallelisation overhead exceeds the time potentially gained for all but the most complex datasets. The parallelisation overhead (instantiating worker nodes, duplicating and transferring inputs to the worker nodes, concatenating results) is typically around 2 seconds, while executing get_ind_time_matrix usually takes milliseconds for a single variable (e.g. 7 time-points, 24 individuals, 1 variable), so the overhead far exceeds the time needed to process all variables sequentially. If the number of individual trajectories (subjects), of time-points, or of variables is very large, forceParIndTimeMat enables the parallelisation of get_ind_time_matrix.
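The effect of nPerm on the width of the p-value confidence interval can be illustrated with base R alone. The sketch below is a stand-in, not santaR's own calculation: it assumes that 5% of permutations are at least as extreme as the observed statistic and uses the exact binomial interval from binom.test, whereas the interval method used inside santaR may differ. The names propExtreme and ciWidth are hypothetical.

```r
# Assumption: 5% of permutations are at least as extreme as the observed statistic
propExtreme <- 0.05
nPermValues <- c(100, 1000, 10000)
alphaPval   <- 0.05

# Width of the exact binomial confidence interval for each number of permutations
ciWidth <- sapply(nPermValues, function(nPerm) {
  nExtreme <- round(propExtreme * nPerm)
  ci <- binom.test(nExtreme, nPerm, conf.level = 1 - alphaPval)$conf.int
  diff(ci)
})
names(ciWidth) <- nPermValues

print(round(ciWidth, 4))
```

The interval narrows as nPerm grows, while the running time grows linearly with nPerm, which is the trade-off the default values aim to balance.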
See also
Other AutoProcess: santaR_auto_summary(), santaR_plot(), santaR_start_GUI()
Other Analysis: get_grouping(), get_ind_time_matrix(), santaR_CBand(), santaR_auto_summary(), santaR_fit(), santaR_plot(), santaR_pvalue_dist(), santaR_pvalue_fit(), santaR_start_GUI()
Examples
## 2 variables, 56 measurements, 8 subjects, 7 unique time-points
## Default parameter values decreased to ensure an execution < 2 seconds
library(santaR)
inputData <- acuteInflammation$data[,1:2]
ind <- acuteInflammation$meta$ind
time <- acuteInflammation$meta$time
group <- acuteInflammation$meta$group
SANTAObjList <- santaR_auto_fit(inputData, ind, time, group, df=5, ncores=0, CBand=TRUE,
                                pval.dist=TRUE, nBoot=100, nPerm=100)
#> Input data generated: 0.01 secs
#> Spline fitted: 0.04 secs
#> ConfBands done: 0.5 secs
#> p-val dist done: 0.71 secs
#> total time: 1.25 secs
length(SANTAObjList)
#> [1] 2
names(SANTAObjList)
#> [1] "var_1" "var_2"
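As a possible follow-up to the example above, the per-variable p-values can be collected from the returned list. This sketch assumes that each fitted object stores its distance-based p-value under $general$pval.dist, as described in the santaR_pvalue_dist documentation; pvalDist is a name introduced here for illustration. For a formatted summary across variables, the package provides santaR_auto_summary().

```r
library(santaR)

# Same inputs as the example above, with reduced nBoot and nPerm for speed
inputData <- acuteInflammation$data[, 1:2]
ind       <- acuteInflammation$meta$ind
time      <- acuteInflammation$meta$time
group     <- acuteInflammation$meta$group

SANTAObjList <- santaR_auto_fit(inputData, ind, time, group, df = 5, ncores = 0,
                                CBand = TRUE, pval.dist = TRUE,
                                nBoot = 100, nPerm = 100)

# Assumed location of the distance-based p-value inside each fitted object
pvalDist <- sapply(SANTAObjList, function(obj) obj$general$pval.dist)

# Variables ordered from most to least significant
pvalDist[order(pvalDist)]
```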