Compute "eigenSplines" across a dataset to discover the best df for spline fitting.
Steps:
- UV scale the data.
- Turn each VAR into an (IND x TIME) matrix and group all VAR into a single (IND+VAR x TIME) matrix using get_eigen_spline_matrix.
- Compute "eigen.splines" on the transposed table (TIME x IND+VAR).
- Returns eigen$matrix = PCprojection x TIME and eigen$variance = variance explained for each PC.
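The reshaping described in the steps above can be sketched in base R. This is only an illustration of the data layout, not santaR's internal code: `ind_by_time` is a hypothetical helper, and base `scale()` stands in for the package's UV scaling. It uses the example data from this page and leaves `NA` wherever a subject has no measurement at a time-point.

```r
# Example data from this page: 9 measurements, 3 subjects, 2 variables
inputData <- matrix(as.numeric(1:18), ncol = 2)
ind  <- c("ind_1", "ind_1", "ind_1", "ind_2", "ind_2", "ind_2",
          "ind_3", "ind_3", "ind_3")
time <- c(0, 5, 10, 0, 10, 15, 5, 10, 15)

# Assumption: base scale() as a stand-in for UV scaling of each variable
scaled <- scale(inputData, center = TRUE, scale = TRUE)

# Hypothetical helper: (IND x TIME) matrix for one variable,
# NA where a subject was not measured at a time-point
ind_by_time <- function(values, ind, time) {
  subjects   <- unique(ind)
  timepoints <- sort(unique(time))
  out <- matrix(NA_real_,
                nrow = length(subjects), ncol = length(timepoints),
                dimnames = list(subjects, as.character(timepoints)))
  out[cbind(match(ind, subjects), match(time, timepoints))] <- values
  out
}

# Stack one block per variable: (IND+VAR x TIME)
grouped <- do.call(rbind, lapply(seq_len(ncol(scaled)), function(j) {
  ind_by_time(scaled[, j], ind, time)
}))

# Transpose to (TIME x IND+VAR), the orientation submitted to PCA
pca_input <- t(grouped)
dim(pca_input)         # 4 6
sum(is.na(pca_input))  # 6
```

The 4 x 6 shape and the 6 missing cells agree with the "6 Variables", "4 Samples" and "6 NAs ( 25 %)" lines printed for `$model` in the Examples section.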
Usage
get_eigen_spline(
inputData,
ind,
time,
nPC = NA,
scaling = "scaling_UV",
method = "nipals",
verbose = TRUE,
centering = TRUE,
ncores = 0
)
Arguments
- inputData
Matrix of measurements with observations as rows and variables as columns.
- ind
Vector of subject identifiers (individuals) corresponding to each measurement.
- time
Vector of time corresponding to each measurement.
- nPC
(int) Number of Principal Components to compute. If none is given (nPC=NA), compute all PCs (usually number of TP - 1, as there is one PC fewer than the smallest dimension).
- scaling
"scaling_UV" or "scaling_mean" scaling across all samples for each variable. Default "scaling_UV". Note: scaling takes place outside of the pcaMethods call, therefore $model will indicate "Data was NOT scaled before running PCA".
- method
PCA method: "svd" doesn't accept missing values, "nipals" can handle missing values. Default "nipals".
- verbose
If TRUE, print the PCA summary. Default TRUE.
- centering
If TRUE, apply centering for PCA, needed to remove baseline levels of each PC (often PC1). Default TRUE.
- ncores
(int) Number of cores to use for parallelisation of the grouping of all splines. Default 0 for no parallelisation.
Value
A list eigen:
- eigen$matrix
data.frame of eigenSplines values with PCprojection as row and TIME as column.
- eigen$variance
Vector of variance explained for each PC.
- eigen$model
Resulting pcaMethods model.
- eigen$countTP
Matrix of number of measurements for each unique timepoint (as row).
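As a small illustration of how the variance vector might be read, the sketch below takes the `$variance` values printed in the Examples section of this page and accumulates them. The 0.9 threshold is an arbitrary assumption chosen for illustration, not a santaR recommendation.

```r
# $variance values copied from the example output on this page
variance <- c(0.71126702, 0.21899068, 0.05260949)

# Cumulative variance explained by the first k PCs
cumulative <- cumsum(variance)
round(cumulative, 4)  # 0.7113 0.9303 0.9829

# Assumed threshold (illustrative only): first PC count reaching 90 %
threshold <- 0.9
n_pc <- which(cumulative >= threshold)[1]
n_pc  # 2
```

The cumulative values match the "Cumulative R2" row of the printed PCA summary.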
Comments:
CENTERING: Centering converts all the values to fluctuations around zero instead of around the mean of the variable measurements. Hereby, it adjusts for differences in the offset between high and low intensity variables. It is therefore used to focus on the fluctuating part of the data, and leaves only the relevant variation (being the variation between the observations) for analysis.
SCALING: Scaling methods are data pretreatment approaches that divide each variable by a factor -the scaling factor- which is different for each variable. They aim to adjust for the differences in fold differences between the various variables by converting the data into differences in values relative to the scaling factor. This often results in the inflation of small values, which can have an undesirable side effect as the influence of the measurement error -that is usually relatively large for small values- is increased as well.
UNIT VARIANCE SCALING: UV scaling, or autoscaling, is commonly applied and uses the standard deviation as the scaling factor. After autoscaling, all variables have a standard deviation of one, so the data is analysed on the basis of correlations instead of covariances, as is the case with centering.
BEFORE PCA, centering must be applied on the matrix that will be submitted to PCA to remove "baseline" levels.
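The effect of centering and UV scaling described in these comments can be checked with a short base R sketch. The toy matrix `x` is made up for illustration, and base `scale()` is used here as a stand-in rather than santaR's own scaling functions.

```r
# Toy data (hypothetical): one low-intensity and one high-intensity variable
x <- cbind(low  = c(1, 2, 3, 4, 5),
           high = c(100, 300, 200, 500, 400))

# Centering only: each variable now fluctuates around zero
centered <- scale(x, center = TRUE, scale = FALSE)
colMeans(centered)

# UV scaling (autoscaling): centre, then divide by the standard deviation
uv <- scale(x, center = TRUE, scale = TRUE)
apply(uv, 2, sd)

# After UV scaling, the covariance matrix equals the correlation
# matrix of the original data
all.equal(cov(uv), cor(x))
```

After centering alone, the high-intensity variable still dominates the covariance; after UV scaling both variables contribute on the same footing.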
See also
Graphical implementation with santaR_start_GUI
Other DFsearch: get_eigen_DF(), get_eigen_DFoverlay_list(), get_param_evolution(), plot_nbTP_histogram(), plot_param_evolution()
Examples
## 9 measurements, 3 subjects, 4 unique time-points, 2 variables
inputData <- matrix(c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18), ncol=2)
ind <- c('ind_1','ind_1','ind_1','ind_2','ind_2','ind_2','ind_3','ind_3','ind_3')
time <- c(0,5,10,0,10,15,5,10,15)
get_eigen_spline(inputData, ind, time, nPC=NA, scaling="scaling_UV", method="nipals",
verbose=TRUE, centering=TRUE, ncores=0)
#> nipals calculated PCA
#> Importance of component(s):
#> PC1 PC2 PC3
#> R2 0.7113 0.2190 0.05261
#> Cumulative R2 0.7113 0.9303 0.98287
#> total time: 0.01 secs
#> $matrix
#> 0 5 10 15
#> PC1 -1.7075707 -0.7066426 0.7075708 1.7066425
#> PC2 -0.3415271 0.9669724 1.0944005 -0.4297013
#> PC3 -0.1764657 -0.5129981 0.5110671 0.1987611
#>
#> $variance
#> [1] 0.71126702 0.21899068 0.05260949
#>
#> $model
#> nipals calculated PCA
#> Importance of component(s):
#> PC1 PC2 PC3
#> R2 0.7113 0.2190 0.05261
#> Cumulative R2 0.7113 0.9303 0.98287
#> 6 Variables
#> 4 Samples
#> 6 NAs ( 25 %)
#> 3 Calculated component(s)
#> Data was mean centered before running PCA
#> Data was NOT scaled before running PCA
#> Scores structure:
#> [1] 4 3
#> Loadings structure:
#> [1] 6 3
#>
#> $countTP
#> [,1]
#> 3 6
#>