Algorithm
The initial purpose of the tool is to analyse isomorphous difference density map (DED). DED maps are generated from
difference structure factor amplitudes, obtained by subtracting reference structure factor amplitudes of
reference DBSCAN
algorithm:
the maximum distance between two points that are considered to be part of the same cluster (eps) and the minimum set of
points to form a cluster or dense region (minPts). In our implementation, we set the eps to 0.73 Å, which is
the covalent radius of carbon,
Minimum points (minPts) are set to 5 voxels that can be grouped together.
The workflow of DBSCAN
implemented in our tool is as given below. A clearer expanded version can be found here.
Workflow 1 of DBSCAN algorithm.
The main workflow iterates over each point to check if it has been visited.
If not visited, it marks the point as visited, finds its neighbors, and
determines whether to start a new cluster or mark the point as noise.
If starting a new cluster, it expands the cluster by checking neighbors.
Workflow 2 of DBSCAN algorithm for Expanding cluster.
This workflow expands the cluster by recursively checking neighbors.
If a neighbor has enough points around it, it gets added to the cluster,
and the process repeats until no more neighbors can be added.
The grid points within the protein are generally selected as the mask regions for cluster identification. After clustering, the sum of difference electron density values is calculated for each cluster by summing the value of the voxels within the cluster if the values are above the σ value (±3.0 σ as the default value) in the difference electron density map. Then, an atom that is closest to and within a 2 Å radius of the maximum peak within the cluster is identified. This atom, which is associated with the cluster, is further used to examine the density changes in individual structural intermediates. In addition, for each of the identified clustered regions, hereafter called electron density blobs, specific features such as their volumes and their electron sums are calculated. The blob volume is evaluated by providing the coordinate of the obtained cluster in “maptbx.peak_volume_estimate” function implemented in cctbx tool. Additionally, to visualize changes in difference electron density consistently throughout the time series data, we normalized the values to set their range between 0 and 1, using the minimum and maximum values within the time series data.
In a previous study, Wickstrand et al. suggested an approach to associate atoms with different electron densities implemented in Maptool, which portrays the three-dimensional changes in electron density as a one-dimensional representation plot associated with each residue. Recently, Dasgupta published the same methodology as Maptool for estimating the average amplitude using the cctbx module. Our tool has also additionally implemented this method, where we estimate the average difference density within a 2 Å radius of each atom for all residues in the protein. However, this approach may not provide information about the distribution surrounding the atoms because it is bound by a radius cutoff. The clustering approach was used to quantify the electron density features and delineate the residual contributions associated with difference densities that were not directly on top of the atoms.