Pre-processing datasets — pre

The pre_process() function processes data inputs and converts them into a standardized format for later use. It accepts two types of input: a list of datasets from different sources, or a single long dataset whose last column is type.

Usage

pre_process(
  data,
  typenameList = NULL,
  replaceNA = TRUE,
  scale = TRUE,
  autoColName = "Sec_"
)

Arguments

data: A data.frame in which each row describes one feature. It should contain the variables ID, value on time_1, ..., value on time_k, and type, which are used to extract patterns across time. Note that the first and last columns must be named exactly ID and type. To analyze several data.frames in this format, pass them as a list of data.frames instead. In that case the type variable is not required; it is generated from the next argument, typenameList.
typenameList: A vector of strings giving the source or name of each data.frame. It applies only when data is a list of data.frames. By default, the names are set to "Dataset_1", "Dataset_2", and so on.
scale: Logical; if scale is TRUE (default), standardize the data.frame by row with base::scale. This converts each original value into a z-score. See also scale_by_row__().
autoColName: A string; if autoColName is not NULL (default "Sec_"), uniform column names are set automatically for all the data.frames. This parameter applies only when data is a list of data.frames.
replaceNA: Logical; if replaceNA is TRUE (default), replace NA with 0.
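
To make the effect of the NA-replacement and scale arguments concrete, the following base R sketch reproduces the row-wise z-score on two rows taken from the example data further down. The helper scale_rows_sketch() is hypothetical and is not the package's implementation. It assumes that NA values are replaced with 0 before scaling, which is consistent with the output shown in the Examples section.

```r
# Hypothetical helper, NOT part of the package: z-score the time-point
# columns of each row, assuming column 1 is ID and the last column is type.
scale_rows_sketch <- function(df, replaceNA = TRUE) {
  time_cols <- seq(2, ncol(df) - 1)          # columns between ID and type
  m <- as.matrix(df[, time_cols, drop = FALSE])
  if (replaceNA) m[is.na(m)] <- 0            # assumption: NA -> 0 before scaling
  # base::scale() standardizes columns, so transpose, scale, transpose back.
  # A row with zero variance would produce NaN here.
  z <- t(scale(t(m)))
  df[time_cols] <- as.data.frame(z)
  df
}

# Rows 5 and 8 of the example data (row 5 contains an NA at T3)
toy <- data.frame(
  ID = c(5, 8),
  T1 = c(4, 4), T2 = c(3, 2), T3 = c(NA, 1), T4 = c(2, 1), T5 = c(5, 1),
  T6 = c(5, 1), T7 = c(0, 0), T8 = c(0, 0), T9 = c(0, 0), T10 = c(0, 0),
  type = "type_A"
)
scaled <- scale_rows_sketch(toy)
scaled
```

After scaling, each row has mean 0 and standard deviation 1 across its time points, so features with very different magnitudes can be compared by the shape of their profile.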

Value

The function returns a long data.frame with columns ID, value on time_1, ..., value on time_k, and type.
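
As a rough illustration of how a list input maps onto this long format, the sketch below assembles one by hand in base R. It only mimics the documented behavior and is not the package's code. The toy datasets (lipids, peptides) and their column names are made up, and the exact naming scheme derived from autoColName ("Sec_1", "Sec_2", ...) is an assumption. Scaling and NA handling are left out, and the data is kept to three time points for brevity.

```r
# Sketch only: the real pre_process() may differ in details.
# Two made-up datasets measured on different instruments.
lipids   <- data.frame(ID = c("L1", "L2"), d1 = c(1, 2), d2 = c(3, 4), d3 = c(5, 6))
peptides <- data.frame(ID = c("P1", "P2"), w1 = c(9, 8), w2 = c(7, 6), w3 = c(5, 4))
datasets <- list(lipids, peptides)

typenameList <- paste0("Dataset_", seq_along(datasets))  # documented default labels
autoColName  <- "Sec_"                                    # documented default prefix

long <- do.call(rbind, Map(function(df, type_name) {
  k <- ncol(df) - 1                                       # number of time points
  names(df) <- c("ID", paste0(autoColName, seq_len(k)))   # assumed uniform names
  df$type <- type_name                                    # source label as last column
  df
}, datasets, typenameList))
rownames(long) <- NULL
long
```

The result has ID first, the time-point columns in the middle, and type last, which is the layout described above.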

Details

We consider two distinct scenarios for this application:

In one scenario, users collect several datasets on the same objects using different approaches and instruments. For example, they might measure lipids, metabolites, and peptides separately from a specific soil sample.
In the other scenario, all the data is of uniform quality, but it can be categorized into larger groups that exhibit significant differences.

In both cases, pre_process() is a useful and versatile tool. However, it is optional when generating the dashboard: users can do their own processing as long as the result matches the required output format. They should be mindful that the number of samples (timepoints) must be greater than 5 to avoid potential errors in the subsequent prediction section.
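
For users who prepare the data themselves, a small check along the following lines may help catch format problems early. The function check_long_format() is a hypothetical helper, not part of the package. It tests only the constraints stated on this page: the first column is named ID, the last is named type, and there are more than 5 time-point columns in between.

```r
# Hypothetical checker, NOT part of the package. Returns a character
# vector of problems; an empty vector means the checks passed.
check_long_format <- function(df) {
  stopifnot(is.data.frame(df))
  cols <- names(df)
  n_time <- ncol(df) - 2                       # columns between ID and type
  problems <- character(0)
  if (!identical(cols[1], "ID")) {
    problems <- c(problems, "first column must be named ID")
  }
  if (!identical(cols[length(cols)], "type")) {
    problems <- c(problems, "last column must be named type")
  }
  if (n_time <= 5) {
    problems <- c(problems, "need more than 5 time points")
  }
  problems
}

good <- data.frame(ID = 1:2, T1 = 1:2, T2 = 1:2, T3 = 1:2,
                   T4 = 1:2, T5 = 1:2, T6 = 1:2, type = "type_A")
bad  <- good[, c("ID", "T1", "T2", "type")]    # only 2 time points

check_long_format(good)
check_long_format(bad)
```

Returning a vector of messages rather than stopping at the first failure lets a user see every issue in one pass.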

Examples

data(test_data)
head(test_data, 10)
#>    ID T1 T2 T3 T4 T5 T6 T7 T8 T9 T10   type
#> 1   1  1  0  0  1  1  0  0  1  6   6 type_A
#> 2   2  6  0  0  0  0  3  1  0  2   1 type_A
#> 3   3  1  0  0  0  2  0  0  2  2   1 type_A
#> 4   4  4  5  3  3  7  2  1  1  0   0 type_A
#> 5   5  4  3 NA  2  5  5  0  0  0   0 type_A
#> 6   6  4  1  0  1  3  1  3  5 11  14 type_A
#> 7   7  1  0  0  0  1  3  3  1  1   1 type_A
#> 8   8  4  2  1  1  1  1  0  0  0   0 type_A
#> 9   9  1  1  1 19 22  1  2  1  1   2 type_A
#> 10 10  1  1  3  5  8  2  2  2  5   2 type_A
# Default settings: the NA in row 5 (T3) is replaced with 0, then each row is z-scored
a <- pre_process(test_data)
head(a, 10)
#>    ID          T1         T2          T3          T4          T5         T6
#> 1   1 -0.25354628 -0.6761234 -0.67612340 -0.25354628 -0.25354628 -0.6761234
#> 2   2  2.41458180 -0.6678631 -0.66786305 -0.66786305 -0.66786305  0.8733594
#> 3   3  0.21764288 -0.8705715 -0.87057150 -0.87057150  1.30585725 -0.8705715
#> 4   4  0.61658123  1.0569964  0.17616607  0.17616607  1.93782672 -0.2642491
#> 5   5  0.96186009  0.5038315 -0.87025436  0.04580286  1.41988870  1.4198887
#> 6   6 -0.06459959 -0.7105955 -0.92592741 -0.71059546 -0.27993154 -0.7105955
#> 7   7 -0.09086738 -0.9995412 -0.99954118 -0.99954118 -0.09086738  1.7264802
#> 8   8  2.40535118  0.8017837  0.00000000  0.00000000  0.00000000  0.0000000
#> 9   9 -0.50260633 -0.5026063 -0.50260633  1.70395805  2.07171878 -0.5026063
#> 10 10 -0.94019379 -0.9401938 -0.04477113  0.85065153  2.19378551 -0.4924825
#>            T7          T8          T9         T10   type
#> 1  -0.6761234 -0.25354628  1.85933936  1.85933936 type_A
#> 2  -0.1541222 -0.66786305  0.35961857 -0.15412224 type_A
#> 3  -0.8705715  1.30585725  1.30585725  0.21764288 type_A
#> 4  -0.7046643 -0.70466426 -1.14507943 -1.14507943 type_A
#> 5  -0.8702544 -0.87025436 -0.87025436 -0.87025436 type_A
#> 6  -0.2799315  0.15073237  1.44272411  2.08871998 type_A
#> 7   1.7264802 -0.09086738 -0.09086738 -0.09086738 type_A
#> 8  -0.8017837 -0.80178373 -0.80178373 -0.80178373 type_A
#> 9  -0.3800194 -0.50260633 -0.50260633 -0.38001942 type_A
#> 10 -0.4924825 -0.49248246  0.85065153 -0.49248246 type_A