santaR Graphical User Interface

Arnaud Wolfer

2019-10-03

The santaR package is designed for the detection of significantly altered time trajectories between study groups, in short time-series. The graphical user interface implements all of santaR’s functions.

The GUI is the preferred way to understand the methodology, to select the best parameters on a subset of the data before running the command line functions, or to explore results visually.

This vignette details the step-by-step use of the graphical user interface using an example dataset.

Example Data

This vignette employs the .csv and .RData files generated from acuteInflammation in the vignette "How to prepare input data for santaR".

Getting Started

The graphical user interface is started as follows:

library(santaR)

santaR_start_GUI(browser = TRUE)
#  To exit press ESC in the command line

The graphical interface is divided into 4 main tabs: Import, DF search, Analysis and Export.

Import

The first input format is a .csv file with the observations (samples) as rows, and the variables as well as the metadata as columns.

Columns corresponding to metadata are selected: the metadata describe the individual ID and collection time corresponding to each observation, with optionally class information for identification of inter-class differential trajectories.
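The layout described above can be illustrated with a small table. This is a minimal sketch only: the column names (`ind`, `time`, `group`, `var_1`, `var_2`) and the values are hypothetical and are not taken from the acuteInflammation dataset.

```r
# Hypothetical example table: all column names and values are illustrative only
input <- data.frame(
  ind   = rep(c("ind_1", "ind_2"), each = 3),    # individual ID (metadata)
  time  = rep(c(0, 4, 8), times = 2),            # collection time (metadata)
  group = rep(c("Group1", "Group2"), each = 3),  # optional class (metadata)
  var_1 = c(0.1, 0.5, 0.3, 0.2, 0.9, 0.4),       # measured variable
  var_2 = c(1.2, 1.0, 0.8, 1.1, 1.5, 1.3),       # measured variable
  stringsAsFactors = FALSE
)

# Write the table to a temporary .csv and read it back, one observation per row
csv_path <- tempfile(fileext = ".csv")
write.csv(input, csv_path, row.names = FALSE)
reloaded <- read.csv(csv_path, stringsAsFactors = FALSE)

# Separate the metadata columns from the variable columns
meta_cols <- c("ind", "time", "group")
var_cols  <- setdiff(colnames(reloaded), meta_cols)
```

In the GUI, the equivalent of `meta_cols` is chosen interactively in the Import tab.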

Additionally, data previously imported as well as fitting results (in .RData format) can be loaded for further analysis or plotting (see the Export section for more details).
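Before loading an .RData file in the GUI, it can be useful to check what it contains. The sketch below uses base R only. The object names `example_data` and `example_note` are hypothetical placeholders, not the names santaR uses when it saves results.

```r
# Hypothetical objects standing in for imported data and fitting results
example_data <- data.frame(time = c(0, 4, 8), var_1 = c(0.1, 0.5, 0.3))
example_note <- "placeholder for fitting results"

# Save both objects to a temporary .RData file
rdata_path <- tempfile(fileext = ".RData")
save(example_data, example_note, file = rdata_path)

# Load into a separate environment to avoid overwriting workspace objects;
# load() returns the names of the objects it restored
loaded <- new.env()
loaded_names <- load(rdata_path, envir = loaded)
```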

DF Search (optional)

Note:

The single parameter to be set by the user is the number of degrees of freedom \(df\) used to fit the spline model. The \(df\) controls how closely the curve models the input data-points.
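The effect of \(df\) can be seen with a minimal sketch that uses base R's `smooth.spline()` as a stand-in for the spline model. The time-points and values are made up for illustration, and this is not necessarily how santaR fits its curves internally.

```r
# Made-up trajectory for a single individual (illustrative values only)
time  <- c(0, 4, 8, 12, 24, 48, 72)
value <- c(0.2, 1.1, 1.9, 1.6, 0.9, 0.4, 0.3)

# Fit the same points with a low and a high number of degrees of freedom
fit_low  <- smooth.spline(time, value, df = 3)
fit_high <- smooth.spline(time, value, df = 6)

# Residual sum of squares: how far the fitted curve is from the data-points
rss <- function(fit) sum((value - predict(fit, time)$y)^2)
rss_low  <- rss(fit_low)
rss_high <- rss(fit_high)
```

The higher \(df\) follows the points more closely (smaller residuals), which is also what makes it more prone to over-fitting.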

Once the \(df\) is chosen for a dataset (a given number of time-points and missing values), it can be kept constant whatever the question under investigation (the metadata and group comparison).

Some indications based on simulated data and diverse datasets can guide the selection of \(df\):

It does not seem to be possible to automatically select the number of degrees of freedom. A choice based on visualisation of the splines, while being careful with over-fitting and keeping in mind the “expected” evolution of the underlying process, seems the most reasonable approach.

Even if automated approaches cannot reliably select the number of degrees of freedom to employ, DF search implements some of these approaches and multiple tools to help guide optimal \(df\) selection.

Auto-Fit uses principal component analysis (PCA) to extract latent trajectories and generate eigen-trajectories that are subsequently assessed for optimal \(df\) using various goodness-of-fit metrics.
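The idea of extracting latent trajectories with PCA can be sketched in base R with `prcomp()`. This is an illustration on simulated data under simplifying assumptions (variables as rows, time-points as columns), not a reproduction of the Auto-Fit computation.

```r
set.seed(1)
time <- c(0, 4, 8, 12, 24)

# Simulated data: 20 variables (rows) sharing one time profile plus noise
shape <- sin(time / 24 * pi)
X <- outer(rnorm(20), shape) + matrix(rnorm(20 * 5, sd = 0.1), nrow = 20)
colnames(X) <- paste0("t", time)

# PCA with time-points as features: each loading vector is a profile over time
pca <- prcomp(X, center = TRUE, scale. = FALSE)
eigen_traj <- t(pca$rotation)  # one row per component, one column per time-point

# Proportion of variance carried by each latent trajectory
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
```

Each row of `eigen_traj` could then be fitted at several \(df\) values and compared with goodness-of-fit metrics.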

Parameter evolution plots the evolution of these metrics across the range of possible \(df\) for each latent trajectory.

To select the most suitable \(df\) parameter, Plot fit generates a visualisation of the fit on each latent projection at automatically and manually selected \(df\) values.

Finally, Missing value highlights the number of trajectories that would have to be excluded as they contain fewer time-points than the selected \(df\).
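The exclusion rule can be checked by hand on a small example. The sketch below assumes a hypothetical individual-by-time matrix for a single variable, with `NA` marking a missing sample. The matrix and the chosen `df` are illustrative only.

```r
# Hypothetical individual x time matrix for one variable; NA = missing sample
ind_time <- matrix(
  c(0.1, 0.4, 0.8, 0.6, 0.3,
    0.2,  NA, 0.9,  NA, 0.4,
     NA,  NA, 0.7, 0.5,  NA,
    0.3, 0.5, 0.9, 0.7, 0.2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("ind_", 1:4), c("0", "4", "8", "12", "24"))
)

df <- 4  # candidate number of degrees of freedom

# Number of observed time-points per individual trajectory
n_tp <- rowSums(!is.na(ind_time))

# Trajectories with fewer time-points than df would have to be excluded
excluded <- names(n_tp)[n_tp < df]
```

Here `ind_2` (3 time-points) and `ind_3` (2 time-points) fall below `df = 4`, so lowering `df` would retain more trajectories.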

Analysis

With the data imported and a pertinent \(df\) value selected, Analysis brings together the fitting, visualisation and identification of variables significantly altered between groups.

Fit handles parameter selection as well as downstream computation. Inter-group differential evolutions can be calculated using either the initial class information or, through an advanced option, a newly generated grouping (e.g., including / combining / excluding input groups). The user can control the number of permutation rounds used for significance testing and the number of bootstrap rounds used to compute the confidence bands of the group mean curves. The sub-sampling of the area between group mean curves can be altered to favour calculation speed at the expense of numerical precision. Parallelisation enables the selection of the number of CPU cores to employ for computation. View Input presents the dataset as fitted.
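To give an intuition for the role of the number of permutations, here is a generic label-permutation test in base R. It is a simplified stand-in: the statistic (absolute difference of group means on simulated per-individual values) and the \((count + 1) / (n + 1)\) estimator are assumptions made for illustration, not the statistic or formula santaR computes.

```r
set.seed(42)

# Simulated per-individual summary values (hypothetical), 6 individuals per group
values <- c(rnorm(6, mean = 0), rnorm(6, mean = 3))
group  <- rep(c("A", "B"), each = 6)

# Test statistic: absolute difference between the two group means
stat <- function(v, g) abs(mean(v[g == "A"]) - mean(v[g == "B"]))
observed <- stat(values, group)

# Null distribution: recompute the statistic after shuffling the group labels
n_perm <- 1000
perm_stats <- replicate(n_perm, stat(values, sample(group)))

# Permutation p-value; the +1 terms avoid reporting a p-value of exactly 0
p_perm <- (sum(perm_stats >= observed) + 1) / (n_perm + 1)
```

The smallest p-value this estimator can return is `1 / (n_perm + 1)`, which is one reason the number of permutations limits the achievable resolution.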

Plot enables the interactive visualisation of the raw data points, individual trajectories, group mean curves and confidence bands for all variables, which can subsequently be saved to disk as an image.

If inter-group differential evolution has been characterised, P-value summarises all significance testing results in tables, providing multiple options for false discovery correction (e.g., Benjamini-Hochberg, Benjamini-Yekutieli and Bonferroni) as well as confidence intervals on the \(p\)-values.
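The three corrections named above are available in base R through `p.adjust()`. The sketch below applies them to made-up \(p\)-values. The variable names and values are hypothetical, and the table the GUI produces may be organised differently.

```r
# Made-up p-values for four variables (illustrative only)
p_raw <- c(var_1 = 0.001, var_2 = 0.012, var_3 = 0.030, var_4 = 0.200)

# Apply each multiple-testing correction to the same p-values
corrected <- data.frame(
  p      = p_raw,
  p_BH   = p.adjust(p_raw, method = "BH"),          # Benjamini-Hochberg
  p_BY   = p.adjust(p_raw, method = "BY"),          # Benjamini-Yekutieli
  p_Bonf = p.adjust(p_raw, method = "bonferroni")   # Bonferroni
)
```

Benjamini-Yekutieli is never less conservative than Benjamini-Hochberg, and Bonferroni simply multiplies each \(p\)-value by the number of tests (capped at 1).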