Data formatting and outputs

Data and annotation formatting

Kinbiont can operate directly on data files or within a Julia notebook. The format of a single time series must be a 2 x n_time_points matrix of Float64:

 0.0        2.0       4.0       6.0       8.0        10.0       10.0       12.0       14.0       16.0       18.0       20.0       22.0      24.0      26.0       28.0       30.0       32.0       34.0       36.0       …  
 0.0912154  0.107956  0.105468  0.101727  0.0931484   0.106318   0.103697   0.139821   0.173598   0.204888   0.251052   0.289018   0.31298   0.33752   0.359356   0.370861   0.376347   0.383732   0.398496   0.384511 …

The first row should represent the time, and the second row should represent the quantity to be fitted (e.g., optical density or CFU).
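As a minimal sketch, such a matrix can be built in Julia by stacking a time vector and a measurement vector as rows. The values are the first five points of the example above; the variable names are illustrative only.

```julia
# First five points of the example time series above.
times = [0.0, 2.0, 4.0, 6.0, 8.0]
od = [0.0912154, 0.107956, 0.105468, 0.101727, 0.0931484]

# Row 1 = time, row 2 = quantity to fit; the result is a 2 x 5 Matrix{Float64}.
data = vcat(times', od')

println(size(data))   # (2, 5)
```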

If the user calls APIs that require a .csv input, they must provide Kinbiont.jl with the path to the .csv data file and, optionally, the path to the .csv annotation file. In these cases, Kinbiont expects a data matrix in which the first row contains the column names (the time column followed by the names of the wells) and the following rows contain the numerical values of the measurements, one row per time point:

Time,  A1,     A2,      A3
0.0,   0.09,   0.09,    0.087
1.0,   0.08,   0.011,   0.012
2.0,   0.011,  0.18,    0.1
3.0,   0.012,  0.32,    0.22
4.0,   0.008,  0.41,    0.122

Kinbiont.jl expects a comma (,) as the separator between columns, and the first column is used as time.
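The following is a minimal sketch of how a file in this layout maps onto the 2 x n_time_points format. It uses only the DelimitedFiles standard library, not Kinbiont's own reader; the temporary file and variable names are illustrative, and the file is written without the alignment spaces used above to keep parsing simple.

```julia
using DelimitedFiles

# Write a small file in the layout described above (values from the example).
path = tempname() * ".csv"
write(path, """
Time,A1,A2,A3
0.0,0.09,0.09,0.087
1.0,0.08,0.011,0.012
2.0,0.011,0.18,0.1
3.0,0.012,0.32,0.22
4.0,0.008,0.41,0.122
""")

# With header=true, readdlm returns the numeric block and the 1 x n header row.
vals, header = readdlm(path, ',', Float64; header=true)
rm(path)                           # clean up the temporary file
colnames = String.(vec(header))    # ["Time", "A1", "A2", "A3"]
t = vals[:, 1]                     # the first column is used as time

# One 2 x n_time_points matrix per well, keyed by the well name.
wells = Dict(colnames[j] => vcat(t', vals[:, j]') for j in 2:length(colnames))
```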

The annotation file is optional (but mandatory if blank subtraction or averaging of replicates is required). It should be a two-column .csv file with one row per well. The first column contains the name of the well (matching the well names in the data .csv), and the second column contains an ID for that well: b marks the well as a blank, X marks the well to be discarded from the analysis, and any other value is treated as a condition ID, so wells that share the same ID are considered replicates:

A1, b
A2, X
A3, unique_ID
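To make these rules concrete, here is a minimal sketch that sorts wells into blanks, discarded wells and replicate groups. It illustrates the rules stated above and is not Kinbiont's parser; the function name parse_annotation and the replicate ID cond_1 are hypothetical.

```julia
# Hypothetical annotation content: "cond_1" is an illustrative replicate ID.
annotation = """
A1, b
A2, X
A3, cond_1
A4, cond_1
"""

# Sort wells into blanks, discarded wells and replicate groups.
function parse_annotation(text::AbstractString)
    blanks = String[]
    discarded = String[]
    groups = Dict{String, Vector{String}}()
    for line in split(text, '\n')
        isempty(strip(line)) && continue              # skip empty lines
        well, id = String.(strip.(split(line, ',')))
        if id == "b"
            push!(blanks, well)
        elseif id == "X"
            push!(discarded, well)
        else
            push!(get!(groups, id, String[]), well)   # same ID => replicates
        end
    end
    return blanks, discarded, groups
end

blanks, discarded, groups = parse_annotation(annotation)
```

In this sketch, blanks holds the wells a blank subtraction would draw on, and each entry of groups is a set of curves that would be averaged together.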

To provide a calibration curve for optical density (OD), which maps OD values obtained from a microplate reader to the corresponding values obtained from an independent source, supply Kinbiont with a .csv file containing two columns:

Raw_OD: Optical density values measured using a microplate reader.
Real_OD: Optical density values measured using an independent source.

Raw_OD,Real_OD
1.9617333333333333,4.19666666666
1.57826666,2.813333333
1.1751333333,1.68333333333
0.87273,1.005
0.66826666666,0.74533333333
0.45426666666,0.492
0.2812,0.283
0.09426666,0.097
0.04726666,0.04933333333333334
0.0238,0.024
0.0,0.0
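This section does not state how Kinbiont applies the calibration curve. As one plausible illustration only, the sketch below maps a raw OD value to a calibrated one by piecewise-linear interpolation between the points of the example table, clamping inputs outside the measured range. The function name calibrate_od and the clamping rule are assumptions, not Kinbiont's method.

```julia
# Calibration points from the example table above.
raw_od  = [1.9617333333333333, 1.57826666, 1.1751333333, 0.87273, 0.66826666666,
           0.45426666666, 0.2812, 0.09426666, 0.04726666, 0.0238, 0.0]
real_od = [4.19666666666, 2.813333333, 1.68333333333, 1.005, 0.74533333333,
           0.492, 0.283, 0.097, 0.04933333333333334, 0.024, 0.0]

# Hypothetical helper: piecewise-linear interpolation of Real_OD as a
# function of Raw_OD, clamping inputs to the calibrated range.
function calibrate_od(x::Real, raw_pts::AbstractVector, real_pts::AbstractVector)
    p = sortperm(raw_pts)                  # interpolation needs increasing Raw_OD
    xs, ys = raw_pts[p], real_pts[p]
    xc = clamp(x, first(xs), last(xs))
    i = searchsortedlast(xs, xc)           # last knot with xs[i] <= xc
    i == length(xs) && return ys[end]
    w = (xc - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + w * (ys[i + 1] - ys[i])
end

calibrate_od(0.2812, raw_od, real_od)   # 0.283 (a calibration point)
calibrate_od(0.0119, raw_od, real_od)   # ≈ 0.012, halfway between the two lowest points
```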

See the folder data_examples for examples.

Data and annotation formatting for downstream ML

All ML functions of Kinbiont take two inputs: a matrix of results (Kinbiont_results, i.e., the output of a fit) and a matrix of features (feature_matrix, e.g., the concentration of each antibiotic present in every well). For example:

downstream_decision_tree_regression(Kinbiont_results, 
  feature_matrix,
  row_to_learn;
)

downstream_symbolic_regression(Kinbiont_results,
  feature_matrix,
  row_to_learn;
)

The first matrix is the standard output of any of the Kinbiont fitting functions: each row represents a parameter and each column a growth curve.

For example:

label_exp,       exp_2_no_corrections,    exp_2_no_corrections
well,            A1,                      A2
model,           HPM,                     HPM
gr,              0.008875566468779583,    0.010090369128600398
exit_lag_rate,   1.7249775833759684e-6,   1.4012949810152472e-6
N_max,           2.498999749717784,       1.6986904454789507
th_max_gr,       0.005778245042794245,    0.00599548534261212
emp_max_gr,      0.007951369027199616,    0.008096305651156249
loss,            0.0013005418069932683,   0.0013349159149782007

Note that the first row is dedicated to the label of the experiment and is not used by the functions. The second row (i.e., well) must report a unique ID for each curve. The functions require the target row of the regression (i.e., row_to_learn) to be specified; do not use the first row for this. The first column contains the row names and is discarded from the ML analysis.
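A minimal sketch of this layout in plain Julia, using the values of the example table: the first column holds row names, the second row holds the well IDs, and a target row is looked up by name. This section does not say whether row_to_learn is given as an index or a name, so the lookup below is only an illustration.

```julia
# The example results matrix above, stored as a Matrix{Any}.
results = Any[
    "label_exp"     "exp_2_no_corrections" "exp_2_no_corrections";
    "well"          "A1"                   "A2";
    "model"         "HPM"                  "HPM";
    "gr"            0.008875566468779583   0.010090369128600398;
    "exit_lag_rate" 1.7249775833759684e-6  1.4012949810152472e-6;
    "N_max"         2.498999749717784      1.6986904454789507;
    "th_max_gr"     0.005778245042794245   0.00599548534261212;
    "emp_max_gr"    0.007951369027199616   0.008096305651156249;
    "loss"          0.0013005418069932683  0.0013349159149782007
]

row_names = results[:, 1]                 # first column: row names, not data
well_ids = results[2, 2:end]              # second row: unique ID of each curve
target = findfirst(==("gr"), row_names)   # index of the row to learn
y = Float64.(results[target, 2:end])      # one target value per curve
```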

The feature matrix, instead, specifies the conditions associated with each unique ID of the results matrix. Only the wells for which the first column of the feature matrix matches the second row of the fitting results are used. For example, with two different antibiotics, each tested at two different concentrations, the matrix could be:

ID_exp,   abx_1,   abx_2
A1,       0,       1
A2,       2,       0
A3,       1,       1
A4,       1,       0
A5,       0,       2

Note that one column must be added for each chemical or condition in the experiment, even if it is absent from a specific well (as with the 0 entries above). The first row is not used and is dedicated to the feature names.
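The matching rule can be sketched as follows: feature rows are aligned with the well IDs from the second row of a results matrix, and wells without a matching ID are skipped. This is an illustration of the rule described above, not Kinbiont's implementation; match_features and the well B7 are hypothetical.

```julia
# Feature matrix from the example above: first row = feature names,
# first column = well IDs.
features = Any[
    "ID_exp" "abx_1" "abx_2";
    "A1"     0       1;
    "A2"     2       0;
    "A3"     1       1;
    "A4"     1       0;
    "A5"     0       2
]

# Hypothetical helper: align feature rows with the fitted wells,
# in the order of the results matrix.
function match_features(well_ids, features)
    ids = features[2:end, 1]
    matched = String[]
    rows = Int[]
    for w in well_ids
        r = findfirst(==(w), ids)
        r === nothing && continue      # no feature row for this well
        push!(matched, w)
        push!(rows, r + 1)             # +1 skips the header row
    end
    return matched, Float64.(features[rows, 2:end])
end

# "B7" is a hypothetical fitted well with no entry in the feature matrix.
matched, X = match_features(["A1", "A2", "B7"], features)
```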

See the folder data_examples for examples.

Outputs of Kinbiont

Kinbiont returns different data structures as output, depending on the function:

Kinbiont_res_one_well_log_lin

This structure stores results for a single well using a log-linear method.

method:String - The method used.
params:Vector{Any} - Parameters obtained from the fitting process.
fit:Any - The fitted function in the exponential window.
times:Any - The times at which measurements were taken.
confidence_band:Any - The confidence band of the fit.

Kinbiont_res_one_well

This structure stores results for a single well.

method:String - The method used.
params:Vector{Any} - Parameters obtained from the fitting process.
fit:Any - The fitted function.
times:Any - The times at which measurements were taken.
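Each result is a Julia struct, so its fields are read with dot syntax. The sketch below uses a hypothetical stand-in type, DemoResOneWell, that mirrors the documented fields of Kinbiont_res_one_well with made-up values; it is not Kinbiont's own type. Assuming the real result types are ordinary structs, calling propertynames on one of them should likewise list the fields documented here.

```julia
# Hypothetical stand-in mirroring the documented fields of
# Kinbiont_res_one_well; values are made up.
struct DemoResOneWell
    method::String
    params::Vector{Any}
    fit::Any
    times::Any
end

res = DemoResOneWell("demo", Any["A1", 0.01, 2.5], [0.10, 0.12, 0.15], [0.0, 1.0, 2.0])

propertynames(res)   # (:method, :params, :fit, :times)
res.params[2]        # 0.01
```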

Kinbiont_res_bootstrap_NL

This structure stores the results of the bootstrap fitting of a non-linear (NL) function.

method:String - The method used.
params:Matrix{Any} - Parameters obtained from the fitting process.
fit:Any - The fitted function.
times:Any - The times at which measurements were taken.
fin_param:Any - The parameters of each bootstrap fit.
new_param_fin:Any - The parameters of each bootstrap fit, keeping only the fits whose losses are among the best 95%.
mean_param:Any - Mean of the parameters.
sd_param:Any - Standard deviation of the parameters.
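To show how new_param_fin, mean_param and sd_param relate, here is a minimal sketch. It assumes that "best 95%" means keeping fits whose loss is at or below the 95th percentile of all losses, and that each row of the bootstrap matrix is one fit with its loss in the last column; Kinbiont's actual layout and cutoff rule may differ. The numbers are made up.

```julia
using Statistics

# Made-up bootstrap output: each row is one bootstrap fit, the first two
# columns are parameters and the last column is the loss of that fit.
boot = [1.0 10.0 0.1;
        2.0 20.0 0.2;
        3.0 30.0 0.3;
        9.0 90.0 5.0]

# Keep fits whose loss is at or below the `keep` quantile of all losses,
# then summarize each parameter column.
function summarize_bootstrap(boot::AbstractMatrix; keep=0.95)
    losses = boot[:, end]
    cutoff = quantile(losses, keep)
    kept = boot[losses .<= cutoff, 1:end-1]
    return kept, vec(mean(kept; dims=1)), vec(std(kept; dims=1))
end

kept, mean_param, sd_param = summarize_bootstrap(boot)
# The high-loss fit is dropped: mean_param ≈ [2.0, 20.0], sd_param ≈ [1.0, 10.0]
```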

Kinbiont_res_model_selection

This structure stores model selection results.

method:String - The method used.
params:Vector{Any} - Parameters obtained from the fitting process.
fit:Vector{Float64} - The fit result.
times:Vector{Float64} - The times at which measurements were taken.
rss_array:Any - Residual sum of squares array.
min_rss_array:Any - Minimum residual sum of squares array.
param_min:Any - Parameters corresponding to minimum RSS.
min_AIC:Vector{Any} - Minimum AIC values.
selected_model:String - The selected model.
full_param:Vector{Any} - Full parameter set.
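Model selection by AIC can be sketched with the standard least-squares form AIC = n log(RSS/n) + 2k, where n is the number of data points and k the number of parameters, plus its small-sample correction AICc. The exact penalty Kinbiont uses is not stated in this section, and select_model and the candidate names are hypothetical.

```julia
# AIC for a least-squares fit with n data points and k parameters,
# and its small-sample correction AICc.
aic(rss, n, k) = n * log(rss / n) + 2k
aicc(rss, n, k) = aic(rss, n, k) + 2k * (k + 1) / (n - k - 1)

# Hypothetical helper: pick the candidate with the lowest score.
function select_model(models, rss, ks, n; corrected=false)
    score = corrected ? aicc : aic
    scores = [score(r, n, k) for (r, k) in zip(rss, ks)]
    i = argmin(scores)
    return models[i], scores[i], scores
end

# Made-up candidates: model_B fits better but has one more parameter.
select_model(["model_A", "model_B"], [1.0, 0.5], [3, 4], 20)
```

With equal RSS the extra parameter is penalized and model_A wins; here the halved RSS outweighs the penalty, so model_B is selected.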

Kinbiont_res_NL_model_selection

This structure stores non-linear model selection results.

method:String - The method used.
params:Vector{Any} - Parameters obtained from the fitting process.
fit:Vector{Float64} - The fit result.
times:Vector{Float64} - The times at which measurements were taken.
score_res:Any - Score results.
top_loss:Any - Top loss values.

Kinbiont_res_sensitivity_NL

This structure stores sensitivity analysis results using non-linear methods.

method:String - The method used.
params:Matrix{Any} - Parameters obtained from the fitting process.
fit:Any - The fit result.
times:Any - The times at which measurements were taken.
combinations:Matrix{Any} - The combinations of starting hyperparameters used in the sensitivity analysis.

Kinbiont_res_sensitivity

This structure stores sensitivity analysis results.

method:String - The method used.
params:Matrix{Any} - Parameters obtained from the fitting process.
combinations:Matrix{Any} - The combinations of starting hyperparameters used in the sensitivity analysis.
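A minimal sketch of what a combinations matrix can look like, assuming the starting points are the Cartesian product of candidate values for each parameter, with one combination per row. The grids and the helper start_combinations are hypothetical.

```julia
# Made-up candidate starting values for two hypothetical parameters.
grids = [[0.1, 0.5], [1.0, 2.0, 3.0]]

# Every combination of starting values, one per row
# (the first parameter varies fastest).
function start_combinations(grids::Vector{Vector{Float64}})
    combos = vec(collect(Iterators.product(grids...)))
    m = Matrix{Float64}(undef, length(combos), length(grids))
    for (i, c) in enumerate(combos)
        m[i, :] .= collect(c)
    end
    return m
end

combos = start_combinations(grids)   # 6 x 2 matrix; each row seeds one fit
```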

Kinbiont_res_segmentation_ODE

This structure stores segmentation results using ODE methods.

method:String - The method used.
params:Matrix{Any} - Parameters obtained from the fitting process.
fit:Array{Float64} - The fitted functions.
times:Array{Float64} - The times at which measurements were taken.
interval_cdp:Array{Any} - Change point intervals.
score_of_the_models:Any - Scores of the models.

Kinbiont_res_segmentation_NL

This structure stores segmentation results using non-linear methods.

method:String - The method used.
params:Matrix{Any} - Parameters obtained from the fitting process.
fit:Array{Float64} - The fitted functions.
times:Array{Float64} - The times at which measurements were taken.
interval_cdp:Array{Any} - Change point intervals.
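To illustrate what change point intervals represent, the sketch below splits a time axis at a list of change points, giving the intervals within which separate fits would be performed. Whether Kinbiont stores the boundaries as times or as indices is an assumption here, and both helpers are hypothetical.

```julia
# Split a time axis into the intervals delimited by change points
# (hypothetical helper; change points are given as times here).
function segment_intervals(times::AbstractVector{<:Real}, cps::AbstractVector{<:Real})
    bounds = vcat(first(times), sort(cps), last(times))
    return [(bounds[i], bounds[i + 1]) for i in 1:length(bounds) - 1]
end

# Indices of the measurements falling in each interval; a change point
# belongs to both neighboring segments in this sketch.
segment_indices(times, intervals) = [findall(t -> a <= t <= b, times) for (a, b) in intervals]

times = collect(0.0:2.0:12.0)                     # made-up time axis
intervals = segment_intervals(times, [8.0, 4.0])  # made-up change points
segment_indices(times, intervals)
```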

Kinbiont_res_Log_Lin_files

This structure stores results for log-linear fits across multiple curves in one file.

method:String - The method used.
params:Matrix{Any} - The matrix with the parameters obtained from the fitting process of each curve of the file.
fits:Tuple{Any} - The fitted functions in each exponential window.
data:Tuple{Any} - The data used for fitting.
confidence_bands:Tuple{Any} - Confidence bands for the fits.

Kinbiont_res_one_file

This structure stores results of the fit for all curves in a single file.

method:String - The method used.
params:Matrix{Any} - The matrix of the parameters obtained from the fitting process.
fits:Tuple{Any} - The fitted functions for each well.
data:Tuple{Any} - The data used for fitting.

Kinbiont_res_one_file_segmentation

This structure stores segmentation results for all curves in a single file.

method:String - The method used.
params:Matrix{Any} - The matrix of the parameters obtained from the fitting process.
fits:Tuple{Any} - The fitted functions for each well.
data:Tuple{Any} - The data used for fitting.
cp:Tuple{Any} - Change points detected.
vector_AIC:Any - AIC (or AICc) values of the best model for each well.