Compute eigenSplines across a dataset

Compute "eigenSplines" across a dataset to discover the best degrees of freedom (df) for spline fitting.

Steps:

UV scale the data.
Turn each VAR into an (IND x TIME) table and group all VAR into a single (IND+VAR x TIME) table using get_eigen_spline_matrix.
Compute the "eigenSplines" on the transposed table (TIME x IND+VAR).
Return eigen$matrix (PCprojection x TIME) and eigen$variance (variance explained by each PC).
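
The reshaping in the steps above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper to_ind_by_time is hypothetical, and the PCA step itself is left out because base R has no NIPALS routine that tolerates missing values. The data are those of the Examples section of this page.

```r
# Example data: 9 measurements, 3 subjects, 4 unique time-points, 2 variables
inputData <- matrix(1:18, ncol = 2)
ind  <- c("ind_1", "ind_1", "ind_1", "ind_2", "ind_2", "ind_2",
          "ind_3", "ind_3", "ind_3")
time <- c(0, 5, 10, 0, 10, 15, 5, 10, 15)

# Step 1: UV scale each variable (centre, then divide by the standard deviation)
scaled <- scale(inputData, center = TRUE, scale = TRUE)

# Step 2: build one (IND x TIME) table per variable; unmeasured cells stay NA
uniqueInd  <- unique(ind)
uniqueTime <- sort(unique(time))
to_ind_by_time <- function(values) {   # hypothetical helper, for illustration only
  tab <- matrix(NA_real_, nrow = length(uniqueInd), ncol = length(uniqueTime),
                dimnames = list(uniqueInd, as.character(uniqueTime)))
  tab[cbind(match(ind, uniqueInd), match(time, uniqueTime))] <- values
  tab
}
# Stack the per-variable tables into a single (IND+VAR x TIME) table
grouped <- do.call(rbind, lapply(seq_len(ncol(scaled)),
                                 function(j) to_ind_by_time(scaled[, j])))

# Step 3: transpose to (TIME x IND+VAR), the orientation submitted to PCA
pcaInput <- t(grouped)
dim(pcaInput)         # 4 6
sum(is.na(pcaInput))  # 6
```

The 4 rows, 6 columns and 6 missing cells (25 %) agree with the "4 Samples", "6 Variables" and "6 NAs ( 25 %)" lines of the model summary printed in the Examples section.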

Usage

get_eigen_spline(
  inputData,
  ind,
  time,
  nPC = NA,
  scaling = "scaling_UV",
  method = "nipals",
  verbose = TRUE,
  centering = TRUE,
  ncores = 0
)

Arguments

inputData: Matrix of measurements with observations as rows and variables as columns.
ind: Vector of subject identifiers (individuals) corresponding to each measurement.
time: Vector of time-points corresponding to each measurement.
nPC: (int) Number of principal components (PC) to compute. If none is given (nPC=NA), all PCs are computed (usually the number of time-points minus 1, as there is one PC fewer than the smallest dimension).
scaling: "scaling_UV" or "scaling_mean" scaling across all samples for each variable. Default "scaling_UV". Note: scaling takes place outside of the pcaMethods call, therefore $model will indicate "Data was NOT scaled before running PCA".
method: PCA method. "svd" does not accept missing values; "nipals" can handle missing values. Default "nipals".
verbose: If TRUE, print the PCA summary. Default TRUE.
centering: If TRUE, centre the data for PCA; needed to remove the baseline level of each PC (often PC1). Default TRUE.
ncores: (int) Number of cores to use for parallelisation of the grouping of all splines. Default 0 for no parallelisation.
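
The remark under nPC that there is one PC fewer than the smallest dimension can be checked with a small base R sketch. It assumes a complete (no missing values) random matrix with 4 time-points as rows, and uses base svd rather than the pcaMethods routines called by the function.

```r
set.seed(42)
# Hypothetical complete table: 4 time-points (rows) x 6 IND+VAR columns
m <- matrix(rnorm(24), nrow = 4)

# Column centering removes one degree of freedom from the rows
centred <- scale(m, center = TRUE, scale = FALSE)

# Count the singular values that are not numerically zero
d <- svd(centred)$d
nNonZero <- sum(d > 1e-8 * max(d))
nNonZero  # 3, i.e. number of time-points minus 1
```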

Value

A list eigen:

eigen$matrix: data.frame of eigenSplines values with PCprojection as rows and TIME as columns.
eigen$variance: Vector of variance explained for each PC.
eigen$model: Resulting pcaMethods model.
eigen$countTP: Matrix of the number of measurements for each unique time-point (as rows).
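
As an illustration of how eigen$variance might be read, the sketch below takes the variance values printed in the Examples section of this page and accumulates them. The 0.9 threshold is a hypothetical cut-off chosen for the example, not a recommendation from the package.

```r
# Variance explained per PC, copied from the $variance output in the Examples
variance <- c(0.71126702, 0.21899068, 0.05260949)

# Cumulative variance explained, comparable to the "Cumulative R2" row
cumulative <- cumsum(variance)
round(cumulative, 4)  # 0.7113 0.9303 0.9829

# Smallest number of PCs reaching a hypothetical 90 % threshold
threshold <- 0.9
nKeep <- which(cumulative >= threshold)[1]
nKeep  # 2
```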

Comments:

CENTERING: Centering converts all the values to fluctuations around zero instead of around the mean of the variable measurements. Hereby, it adjusts for differences in the offset between high and low intensity variables. It is therefore used to focus on the fluctuating part of the data, and leaves only the relevant variation (being the variation between the observations) for analysis.
SCALING: Scaling methods are data pretreatment approaches that divide each variable by a factor (the scaling factor), which is different for each variable. They aim to adjust for the fold differences between the various variables by converting the data into differences in values relative to the scaling factor. This often results in the inflation of small values, which can have an undesirable side effect: the influence of the measurement error, which is usually relatively large for small values, is increased as well.
UNIT VARIANCE SCALING: UV or Autoscaling, is commonly applied and uses the standard deviation as the scaling factor. After autoscaling, all variables have a standard deviation of one and therefore the data is analysed on the basis of correlations instead of covariances, as is the case with centering.
BEFORE PCA, centering must be applied on the matrix that will be submitted to PCA to remove "baseline" levels.
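
The effect of centering and UV scaling described above can be checked with a minimal base R sketch. The two-variable matrix is made up for illustration, and base scale() stands in for the package's own scaling functions.

```r
# Hypothetical data: one low-intensity and one high-intensity variable
x <- cbind(low  = c(0.1, 0.2, 0.3, 0.4, 0.5),
           high = c(120, 80, 150, 90, 110))

# Centering only: values fluctuate around zero, original spread is kept
centred <- scale(x, center = TRUE, scale = FALSE)
colMeans(centred)     # both numerically zero

# UV scaling (autoscaling): centre, then divide by the standard deviation
uv <- scale(x, center = TRUE, scale = TRUE)
apply(uv, 2, sd)      # both equal to 1

# After UV scaling the covariance matrix equals the correlation matrix of x
all.equal(cov(uv), cor(x))  # TRUE
```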

Examples

## 9 measurements, 3 subjects, 4 unique time-points, 2 variables
inputData <- matrix(c(1,2,3,4,5,6,7,8,9 ,10,11,12,13,14,15,16,17,18), ncol=2)
ind  <- c('ind_1','ind_1','ind_1','ind_2','ind_2','ind_2','ind_3','ind_3','ind_3')
time <- c(0,5,10,0,10,15,5,10,15)
get_eigen_spline(inputData, ind, time, nPC=NA, scaling="scaling_UV", method="nipals",
                 verbose=TRUE, centering=TRUE, ncores=0)
#> nipals calculated PCA
#> Importance of component(s):
#>                  PC1    PC2     PC3
#> R2            0.7113 0.2190 0.05261
#> Cumulative R2 0.7113 0.9303 0.98287
#> total time: 0.01 secs
#> $matrix
#>              0          5        10         15
#> PC1 -1.7075707 -0.7066426 0.7075708  1.7066425
#> PC2 -0.3415271  0.9669724 1.0944005 -0.4297013
#> PC3 -0.1764657 -0.5129981 0.5110671  0.1987611
#> 
#> $variance
#> [1] 0.71126702 0.21899068 0.05260949
#> 
#> $model
#> nipals calculated PCA
#> Importance of component(s):
#>                  PC1    PC2     PC3
#> R2            0.7113 0.2190 0.05261
#> Cumulative R2 0.7113 0.9303 0.98287
#> 6 	Variables
#> 4 	Samples
#> 6 	NAs ( 25 %)
#> 3 	Calculated component(s)
#> Data was mean centered before running PCA 
#> Data was NOT scaled before running PCA 
#> Scores structure:
#> [1] 4 3
#> Loadings structure:
#> [1] 6 3
#> 
#> $countTP
#>   [,1]
#> 3    6
#> 