Currently, HaDeX supports only cluster files from DynamX 3.0 or 2.0. Below we explain how the cluster file from DynamX is processed, using the example file from the package.
What is the difference between cluster files from DynamX 3.0 and 2.0? Data files from DynamX 2.0 do not include the Modification and Fragment columns. The range of experiment variations was limited when DynamX 2.0 was the latest version.
Let’s start with a glimpse of the data file.
library(HaDeX)
dat <- read_hdx(system.file(package = "HaDeX", "HaDeX/data/KD_180110_CD160_HVEM.csv"))
head(dat, 10)
## Protein Start End Sequence Modification Fragment MaxUptake
## <char> <int> <int> <char> <lgcl> <lgcl> <num>
## 1: db_CD160 1 15 INITSSASQEGTRLN NA NA 14
## 2: db_CD160 1 15 INITSSASQEGTRLN NA NA 14
## 3: db_CD160 1 15 INITSSASQEGTRLN NA NA 14
## 4: db_CD160 1 15 INITSSASQEGTRLN NA NA 14
## 5: db_CD160 1 15 INITSSASQEGTRLN NA NA 14
## 6: db_CD160 1 15 INITSSASQEGTRLN NA NA 14
## 7: db_CD160 1 15 INITSSASQEGTRLN NA NA 14
## 8: db_CD160 1 15 INITSSASQEGTRLN NA NA 14
## 9: db_CD160 1 15 INITSSASQEGTRLN NA NA 14
## 10: db_CD160 1 15 INITSSASQEGTRLN NA NA 14
## MHP State Exposure File z RT Inten
## <num> <char> <num> <char> <int> <num> <num>
## 1: 1590.808 CD160 0.000 KD_160527_CD160_sekw_05 1 3.232479 6592
## 2: 1590.808 CD160 0.000 KD_160527_CD160_sekw_05 2 3.238079 394066
## 3: 1590.808 CD160 0.000 KD_160527_CD160_sekw_05 3 3.238759 173526
## 4: 1590.808 CD160 0.001 KD_160527_CD160_IN_01 2 3.258598 232221
## 5: 1590.808 CD160 0.001 KD_160527_CD160_IN_01 3 3.256844 110675
## 6: 1590.808 CD160 0.167 KD_160530_CD160_10s_01 2 3.262774 99894
## 7: 1590.808 CD160 0.167 KD_160530_CD160_10s_02 2 3.264558 117541
## 8: 1590.808 CD160 0.167 KD_160530_CD160_10s_03 2 3.263795 90562
## 9: 1590.808 CD160 0.167 KD_160530_CD160_10s_04 2 3.273656 66131
## 10: 1590.808 CD160 1.000 KD_160530_CD160_1min_01 2 3.262300 109301
## Center
## <num>
## 1: 1591.2584
## 2: 796.3552
## 3: 531.2633
## 4: 796.3634
## 5: 531.2849
## 6: 800.3610
## 7: 800.3852
## 8: 800.3682
## 9: 800.4242
## 10: 800.7878
As you can see, the data file has a very specific structure and is not informative yet. In the file, we have m/z values for each z value (charge) for each time point for each state of each peptide, repeated as many times as the measurement was repeated (each measurement should be repeated at least three times).
Our aim is to have one result, with its measurement uncertainty, for each peptide in each biological state at each measured time point. Data in this format allows further calculations, e.g. calculating deuterium uptake values.
For a better understanding of the process of aggregating the data, see the Data aggregation article.
Within each replicate of the measurement (we recognize each replicate by the File value), the m/z values are provided for each possible z value. The m/z values are in the Center column, as each is a geometrical centroid calculated from the isotopic envelope. First, we have to calculate the experimentally measured mass:
$$expMass = z \times (Center - protonMass)\tag{1}$$
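As a minimal sketch of this step, the calculation can be reproduced in base R for the first three rows of the output above. The value of `proton_mass` and the variable names are assumptions made for this illustration, not values taken from the HaDeX source.

```r
# Proton mass in Da (assumed value for this sketch)
proton_mass <- 1.00727647

# Centroid m/z (Center) and charge (z) copied from rows 1-3 of the output above
center <- c(1591.2584, 796.3552, 531.2633)
z <- c(1, 2, 3)

# Experimental mass for each charge value: z * (Center - protonMass)
exp_mass <- z * (center - proton_mass)
exp_mass
```

All three values land close to the MHP of the peptide (1590.808), as expected for a sample that has not yet exchanged.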
To aggregate data from different z values, we have to calculate the mean mass weighted by intensity. Additional information about this step and how the weighted mean impacts the results can be found in the Mass calculation article.
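$$aggMass = \frac{\sum_{k = 1}^{N}Inten_k{\cdot}expMass_k}{\sum_{k = 1}^{N}Inten_k}\tag{2}$$
Where $Inten_k$ is the intensity measured for the k-th z value, $expMass_k$ is the corresponding experimental mass calculated above, and N is the number of z values within the replicate.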
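The intensity-weighted mean described in the text can be sketched with the base R function `weighted.mean()`. The rows are the three measurements from the file KD_160527_CD160_sekw_05 shown above. The `proton_mass` value and the variable names are assumptions of this sketch.

```r
# Proton mass in Da (assumed value for this sketch)
proton_mass <- 1.00727647

# Rows 1-3 of the output above: one replicate (File), three charge values
center <- c(1591.2584, 796.3552, 531.2633)
z <- c(1, 2, 3)
inten <- c(6592, 394066, 173526)

# Experimental mass for each charge value
exp_mass <- z * (center - proton_mass)

# Mean mass weighted by intensity: sum(inten * exp_mass) / sum(inten)
agg_mass <- weighted.mean(exp_mass, w = inten)
agg_mass
```

Because the z = 1 measurement has a very low intensity, it barely moves the aggregated mass, which is the point of weighting by intensity.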
As we use the aggregated result from the replicates, we need to calculate the uncertainty associated with the measurement. We use the mean value as the final result, so we calculate the error as the standard deviation of the mean, according to Equation 3:
$$u(\overline{x}) = \sqrt{\frac{ \sum_{i=1}^n \left( x_{i} - \overline{x} \right)^2}{n(n-1)}}\tag{3}$$
Where $x_i$ is the aggregated mass from the i-th replicate, $\overline{x}$ is the mean of these masses, and n is the number of replicates.
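Equation 3 is the sample standard deviation divided by the square root of the number of replicates. A short sketch in base R, with hypothetical replicate masses chosen only for illustration:

```r
# Hypothetical aggregated masses [Da] from four replicates of one time point
x <- c(1598.71, 1598.76, 1598.72, 1598.84)
n <- length(x)

# Final result: the mean over replicates
mean_mass <- mean(x)

# Equation 3: standard deviation of the mean
u_mean <- sqrt(sum((x - mean_mass)^2) / (n * (n - 1)))

# The same quantity expressed with sd()
u_mean_sd <- sd(x) / sqrt(n)

c(mean_mass = mean_mass, u_mean = u_mean, u_mean_sd = u_mean_sd)
```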
Now we have the format we want for further calculations.
The HaDeX package provides the calculated values in different forms, each with an associated measurement uncertainty. All of the uncertainties are derived from the law of propagation of uncertainty:
$$u_{c}(y) = \sqrt{\sum_{k} \left[ \frac{\partial y}{\partial x_{k}} u(x_{k}) \right]^2}$$
Deuterium uptake is the increase of the mass of the peptide at time t. The minimal exchange control $m_{t_0}$ is the mass measured directly after adding the buffer (before the start of the exchange), and $m_t$ is the mass measured at the chosen time point t. The value is in daltons [Da].
$$D = m_{t} - m_{t_0}$$
The uncertainty associated with deuterium uptake [Da] (based on Equation 3):
$$u_c(D) = \sqrt{u(m_t)^2 + u(m_{t_0})^2}$$
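A small worked example of these two formulas in base R. The masses and their uncertainties are hypothetical numbers picked so the arithmetic is easy to follow.

```r
# Hypothetical aggregated masses [Da] and their uncertainties
m_t0 <- 1590.71; u_t0 <- 0.04   # minimal exchange control
m_t  <- 1595.21; u_t  <- 0.03   # chosen time point t

# Deuterium uptake [Da]
D <- m_t - m_t0

# Combined uncertainty: both partial derivatives have absolute value 1
u_D <- sqrt(u_t^2 + u_t0^2)

c(D = D, u_D = u_D)
```

The uncertainties add in quadrature, so the uptake is never known more precisely than the less precise of the two masses.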
Fractional deuterium uptake is the ratio of the increase of the mass at time t to the mass increase of the maximal exchange control. The maximal exchange control $m_{t_{100}}$ is measured after a long time (chosen by the experimenter, usually 1440 min = 24 h). It is assumed that after this long time, the exchange is finished. The minimal exchange control $m_{t_0}$ is the mass measured directly after adding the buffer (before the start of the exchange), and $m_t$ is the mass measured at the chosen time point t. This value is reported as a percentage [%].
$$D_{frac} = \frac{m_{t} - m_{t_0}}{m_{t_{100}} - m_{t_0}}$$
The uncertainty associated with fractional deuterium uptake [%] (based on equation 3):
$$u_{c}(D_{frac}) = \sqrt{ \left[ \frac{1}{m_{t_{100}}-m_{t_0}} u(m_{t}) \right]^2 + \left[ \frac{m_{t} - m_{t_{100}}}{(m_{t_{100}}-m_{t_0})^2} u(m_{t_0}) \right]^2 + \left[ \frac{m_{t_0} - m_{t}}{(m_{t_{100}}-m_{t_0})^2} u (m_{t_{100}}) \right]^2}$$
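The formula above has three terms, one per measured mass. The sketch below wraps it in a helper named `frac_uptake()`, a hypothetical function written for this illustration and not part of the HaDeX API. The input values are also hypothetical.

```r
# Fractional uptake and its combined uncertainty (hypothetical helper)
frac_uptake <- function(m_t, m_t0, m_t100, u_t, u_t0, u_t100) {
  span <- m_t100 - m_t0                      # maximal mass increase
  d_frac <- (m_t - m_t0) / span
  u_frac <- sqrt(
    (u_t / span)^2 +                         # term for m_t
    ((m_t - m_t100) / span^2 * u_t0)^2 +     # term for m_t0
    ((m_t0 - m_t) / span^2 * u_t100)^2       # term for m_t100
  )
  c(d_frac = d_frac, u_frac = u_frac)
}

res <- frac_uptake(m_t = 1595.2, m_t0 = 1590.7, m_t100 = 1599.7,
                   u_t = 0.03, u_t0 = 0.04, u_t100 = 0.05)
res

# Multiply by 100 to express both values as percentages
100 * res
```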
Theoretical deuterium uptake is the increase of mass at time t relative to the theoretical mass of the peptide without any exchange (MHP, the mass of the singly charged monoisotopic molecular ion), where $m_t$ is the mass measured at the chosen time point t. This value is in daltons [Da]:
$$D_{theo} = m_{t} - MHP$$
The uncertainty associated with theoretical deuterium uptake [Da] (the MHP value is a constant without measurement uncertainty; based on Equation 3):
$$u(D_{theo}) = u(m_{t})$$
Theoretical fractional deuterium uptake is the ratio of the increase of mass in time t compared with a theoretical value of the mass of the peptide without any exchange to the possible theoretical increase of the mass, based on the maximal potential uptake of the peptide (based on the peptide sequence). This value is a percentage value [%].
$$D_{theo, frac} = \frac{m_{t}-MHP}{MaxUptake \times protonMass}$$
The uncertainty associated with theoretical fractional deuterium uptake [%] (based on the equation 3):
$$u(D_{theo, frac}) = \left|\frac{1}{MaxUptake \times protonMass} u(m_{t}) \right|$$
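Both theoretical forms can be sketched in base R using the MHP and MaxUptake values of the peptide shown in the output above. The measured mass, its uncertainty, and the `proton_mass` value are assumptions of this sketch. Because MHP and MaxUptake are constants, only the uncertainty of the measured mass propagates.

```r
# Proton mass in Da (assumed value for this sketch)
proton_mass <- 1.00727647

# Constants for peptide INITSSASQEGTRLN, taken from the output above
mhp <- 1590.808
max_uptake <- 14

# Hypothetical measured mass [Da] at time t and its uncertainty
m_t <- 1595.308
u_t <- 0.03

# Theoretical deuterium uptake [Da]; MHP carries no uncertainty
d_theo <- m_t - mhp
u_d_theo <- u_t

# Theoretical fractional deuterium uptake (multiply by 100 for percent)
d_theo_frac <- d_theo / (max_uptake * proton_mass)
u_d_theo_frac <- abs(u_t / (max_uptake * proton_mass))

c(d_theo = d_theo, u_d_theo = u_d_theo,
  d_theo_frac = d_theo_frac, u_d_theo_frac = u_d_theo_frac)
```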
The differential value shows how the deuterium uptake differs between two biological states. It allows us to see whether a possible difference is statistically significant (more information below). This value is calculated as the difference between the previously described deuterium uptake (in a chosen form) of the first and second states.
$$diff = D_{1} - D_{2}$$
The uncertainty associated with the difference of deuterium uptake (based on Equation 3):
$$u_{c}(diff) = \sqrt{u(D_{1})^2 + u(D_{2})^2}$$
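A brief sketch of the differential value in base R. The uptake values for the two states are hypothetical.

```r
# Hypothetical deuterium uptake [Da] and uncertainty in two biological states
D_1 <- 4.5; u_1 <- 0.05
D_2 <- 3.9; u_2 <- 0.12

# Differential value and its combined uncertainty
diff_uptake <- D_1 - D_2
u_diff <- sqrt(u_1^2 + u_2^2)

c(diff = diff_uptake, u_diff = u_diff)
```

The same code applies to any of the uptake forms described above, as long as both states use the same form.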
A convenient way to present the results calculated as described above is the comparison plot and the differential plot (Woods’ plot).
If the file contains modified peptides, the value from the Modification column is added to the value from the State column and is treated as a new biological state. The further aspects of the analysis are the same as for non-modified peptides.
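The idea of treating a modified peptide as a separate state can be illustrated on a toy table in base R. This is only a sketch of the concept: the modification name, the `_` separator, and the use of `ifelse()` are assumptions, and the actual labels produced by HaDeX may differ.

```r
# Toy table; column names follow the data file, values are hypothetical
dat <- data.frame(
  State = c("CD160", "CD160", "CD160_HVEM"),
  Modification = c(NA, "Oxidation", NA),
  stringsAsFactors = FALSE
)

# Append the modification to the state label when one is present.
# The "_" separator is an assumption of this sketch.
dat$State <- ifelse(is.na(dat$Modification),
                    dat$State,
                    paste(dat$State, dat$Modification, sep = "_"))

unique(dat$State)
```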