Feature#
- class nearl.features.Feature(dims=None, spacing=None, outfile=None, outkey=None, cutoff=None, sigma=None, padding=None, byres=None, outshape=None, force_recache=None, selection=None, translate='origin', frame_offset=None, **kwargs)[source]#
Bases: object
Base class for the feature generator
Notes
The inputs and outputs of the query, run and dump functions should be chained together: the output of query is passed to run, and the output of run is passed to dump.
- Attributes:
- dims : np.ndarray
Dimensions of the 3D grid
- spacing : float
The spacing (resolution) of the 3D grid
- lengths : np.ndarray
Lengths of the grid.
- cutoff : float
The cutoff distance for voxelization or marching observers
- padding : float
The padding of the box used when querying the atoms within the grid.
- byres : bool
Boolean flag to retrieve whole residues within the bounding box (the default is by atoms)
- outfile : str, path-like
The path of the HDF file in which to store the result features
- outkey : str
The key/tag under which the result features will be dumped
- sigma : float
The sigma of the Gaussian-based voxelization; applies to static and PDF features
Methods
hook(featurizer)
Hook the feature generator back to the feature convolutor and obtain necessary attributes from the featurizer
cache(trajectory)
Cache the weights needed for each atom in the trajectory for later use by the Feature.query function
query(topology, frame_coords, focal_point)
Query the atoms and weights within the bounding box
run(coords, weights)
Voxelize the coordinates and weights by default
dump(result)
Dump the result feature to an HDF5 file. Feature.outfile and Feature.outkey should be set in the child class (either via __init__ or the hook function)
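As an illustration of what the default run step does conceptually, the following is a minimal sketch of Gaussian-based voxelization using the dims, spacing, sigma and cutoff attributes listed above. It is not the library's implementation: the function name gaussian_voxelize, the use of voxel centers as grid points, and the lack of kernel normalization are all assumptions made for this example.

```python
import numpy as np

def gaussian_voxelize(coords, weights, dims, spacing, sigma, cutoff):
    """Spread each atom's weight onto a 3D grid with a truncated Gaussian kernel.

    Hypothetical helper: grid points are taken to be voxel centers and the
    kernel is not normalized; the real library may differ on both counts.
    """
    shape = tuple(int(n) for n in dims)
    # Voxel centers along each axis, measured from the box origin.
    axes = [(np.arange(n) + 0.5) * spacing for n in shape]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    centers = np.stack([gx, gy, gz], axis=-1)        # shape (Dx, Dy, Dz, 3)
    grid = np.zeros(shape, dtype=float)
    for xyz, w in zip(np.asarray(coords, dtype=float), weights):
        dist2 = np.sum((centers - xyz) ** 2, axis=-1)
        kernel = np.exp(-dist2 / (2.0 * sigma ** 2))
        kernel[dist2 > cutoff ** 2] = 0.0            # voxels beyond the cutoff get nothing
        grid += w * kernel
    return grid

# One carbon-like atom (weight 12) in the middle of a 4x4x4 grid.
coords = np.array([[2.0, 2.0, 2.0]])
weights = np.array([12.0])
grid = gaussian_voxelize(coords, weights, dims=(4, 4, 4), spacing=1.0, sigma=1.0, cutoff=2.0)
```

A larger sigma smears each atom over more voxels, while the cutoff bounds how far that smearing reaches.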
- property params#
Obtain the parameters of the feature object
- property dims#
Dimensions of the 3D grid
- property spacing#
The spacing (resolution) of the 3D grid
- property center#
Center of the grid, read-only property calculated by dims and spacing
- property lengths#
Lengths of the grid.
- property padding#
The padding of the box used when querying the atoms within the grid. The default padding is the cutoff distance.
- property frame_offset#
The offset of the frame to be used in each frame-slice. This property only applies to static features.
- hook(featurizer)[source]#
Hook the feature generator back to the feature convolutor and obtain necessary attributes from the featurizer including the trajectory, active frame, convolution kernel etc
- Parameters:
- featurizernearl.featurizer.Featurizer
The featurizer object describing the feature generation process
Notes
If the following attributes are not set manually, the hook function tries to inherit them from the featurizer object: sigma, cutoff, outfile, outkey, padding, byres
- cache(trajectory)[source]#
Cache the weights needed for each atom in the trajectory for later use by Feature.query()
- Parameters:
- trajectory : nearl.io.traj.Trajectory
- query(topology, frame_coords, focal_point)[source]#
Base function to query the coordinates within the bounding box near the focal point and translate the coordinates to the center of the box
- Parameters:
- topology : pytraj.Topology
The topology object
- frame_coords : np.ndarray
The coordinates of the atoms
- focal_point : np.ndarray
The focal point parsed from your registered points; see the featurizer object
- Returns:
- final_mask : np.ndarray
The boolean mask of the atoms within the bounding box; shape=(n_atoms,)
- final_coords : np.ndarray
The coordinates within the bounding box; shape=(queried_n_atoms, 3)
Notes
The frame coordinates are explicitly translated so that the focused part matches the center of the box.
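The query step described above can be sketched with plain NumPy. This is only an illustration under stated assumptions: query_box is a hypothetical helper, the box is assumed to be axis-aligned and centered on the focal point, and the box center is assumed to sit at half the box lengths.

```python
import numpy as np

def query_box(frame_coords, focal_point, lengths, padding=0.0):
    """Return a mask of atoms inside the (padded) box and their shifted coordinates.

    Hypothetical helper, not the library's code.
    """
    half = np.asarray(lengths, dtype=float) / 2.0
    lower = focal_point - half - padding
    upper = focal_point + half + padding
    # An atom is kept only if it lies inside the box along all three axes.
    mask = np.all((frame_coords >= lower) & (frame_coords <= upper), axis=1)
    # Shift so that the focal point lands on the box center (half the box lengths).
    shifted = frame_coords[mask] - focal_point + half
    return mask, shifted

frame_coords = np.array([[10.0, 10.0, 10.0], [12.0, 9.0, 11.0], [30.0, 30.0, 30.0]])
focal_point = np.array([10.0, 10.0, 10.0])
final_mask, final_coords = query_box(frame_coords, focal_point, lengths=(8.0, 8.0, 8.0))
```

The mask keeps the full-length shape (n_atoms,), so it can also be used to select the matching per-atom weights.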
- class nearl.features.AtomicNumber(dims=None, spacing=None, outfile=None, outkey=None, cutoff=None, sigma=None, padding=None, byres=None, outshape=None, force_recache=None, selection=None, translate='origin', frame_offset=None, **kwargs)[source]#
Bases: Feature
Annotate each atom with its atomic number.
- query(topology, frame_coords, focal_point)[source]#
Query the atomic coordinates and the atomic numbers as weights within the bounding box
Notes
By default, a slice of frame coordinates is passed to the querier function (typically a 3-dimensional array shaped [frames_number, atom_number, 3]). A static feature, however, needs only one frame, so the querier function uses the first frame by default. Set frame_offset to change this behavior.
The focused part of the coordinates needs to be translated to the center of the box before it is sent to the runner function.
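The frame selection described in these notes amounts to indexing the first axis of the coordinate slice. The sketch below assumes frame_offset is used as a plain integer index; pick_static_frame is a hypothetical name.

```python
import numpy as np

# A toy slice shaped [frames_number, atom_number, 3].
frames = np.arange(4 * 5 * 3, dtype=float).reshape(4, 5, 3)

def pick_static_frame(frame_coords, frame_offset=0):
    """Pick the single frame a static feature works on (hypothetical helper)."""
    if frame_coords.ndim == 3:
        return frame_coords[frame_offset]
    # Already a single frame shaped [atom_number, 3].
    return frame_coords

first = pick_static_frame(frames)
third = pick_static_frame(frames, frame_offset=2)
```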
- class nearl.features.Mass(dims=None, spacing=None, outfile=None, outkey=None, cutoff=None, sigma=None, padding=None, byres=None, outshape=None, force_recache=None, selection=None, translate='origin', frame_offset=None, **kwargs)[source]#
Bases: Feature
Annotate each atom with its atomic mass.
- cache(trajectory)[source]#
Cache the weights needed for each atom in the trajectory for later use by Feature.query()
- Parameters:
- trajectory : nearl.io.traj.Trajectory
- query(topology, frame_coords, focal_point)[source]#
Get the atoms and weights within the bounding box
- Parameters:
- topology : pytraj.Topology
The topology object
- frame_coords : np.ndarray
The coordinates of the atoms
- focal_point : np.ndarray
The focal point parsed from your registered points; see the featurizer object
- Returns:
- coord_inbox : np.ndarray
The coordinates of the atoms within the bounding box
- weights : np.ndarray
The weights of the atoms within the bounding box
Notes
If multiple frames are passed to a static feature, only the first frame of frame_coords is used.
Run the query method from the parent class to get the mask of the atoms within the bounding box.
Before sending the coordinates to the runner function, move the coordinates to the center of the box.
- class nearl.features.HeavyAtom(default_weight=1, **kwargs)[source]#
Bases: Feature
Annotate each atom as a heavy atom or not (heavy atoms are encoded 1, otherwise 0).
- Parameters:
- default_weight : int, default 1
The default weight of the heavy atoms.
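A minimal sketch of this encoding, assuming that "heavy" means any atom other than hydrogen (atomic number greater than 1); heavy_atom_weights is a hypothetical helper rather than part of the library.

```python
import numpy as np

def heavy_atom_weights(atomic_numbers, default_weight=1):
    """Encode heavy atoms as default_weight and hydrogens as 0 (hypothetical helper)."""
    atomic_numbers = np.asarray(atomic_numbers)
    return np.where(atomic_numbers > 1, default_weight, 0)

# C, H, H, O, N, H
weights = heavy_atom_weights([6, 1, 1, 8, 7, 1])
```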
- class nearl.features.Aromaticity(reverse=None, **kwargs)[source]#
Bases: Feature
Annotate each atom as an aromatic atom or not (aromatic atoms are encoded 1, otherwise 0).
- Parameters:
- reverse : bool, default False
Set to True to encode the non-aromatic atoms as 1 instead. The aromaticity is calculated by OpenBabel.
- cache(trajectory)[source]#
Cache the weights needed for each atom in the trajectory for later use by Feature.query()
- Parameters:
- trajectory : nearl.io.traj.Trajectory
- query(topology, frame_coords, focal_point)[source]#
Base function to query the coordinates within the bounding box near the focal point and translate the coordinates to the center of the box
- Parameters:
- topology : pytraj.Topology
The topology object
- frame_coords : np.ndarray
The coordinates of the atoms
- focal_point : np.ndarray
The focal point parsed from your registered points; see the featurizer object
- Returns:
- final_mask : np.ndarray
The boolean mask of the atoms within the bounding box; shape=(n_atoms,)
- final_coords : np.ndarray
The coordinates within the bounding box; shape=(queried_n_atoms, 3)
Notes
The frame coordinates are explicitly translated so that the focused part matches the center of the box.
- class nearl.features.Ring(reverse=False, **kwargs)[source]#
Bases: Feature
Annotate each atom as a ring atom or not (ring atoms are encoded 1, otherwise 0).
- Parameters:
- reverse : bool, default False
Set to True to encode the non-ring atoms as 1 instead. The ring status is calculated by OpenBabel.
- cache(trajectory)[source]#
Cache the weights needed for each atom in the trajectory for later use by Feature.query()
- Parameters:
- trajectory : nearl.io.traj.Trajectory
- query(topology, frame_coords, focal_point)[source]#
Base function to query the coordinates within the bounding box near the focal point and translate the coordinates to the center of the box
- Parameters:
- topology : pytraj.Topology
The topology object
- frame_coords : np.ndarray
The coordinates of the atoms
- focal_point : np.ndarray
The focal point parsed from your registered points; see the featurizer object
- Returns:
- final_mask : np.ndarray
The boolean mask of the atoms within the bounding box; shape=(n_atoms,)
- final_coords : np.ndarray
The coordinates within the bounding box; shape=(queried_n_atoms, 3)
Notes
The frame coordinates are explicitly translated so that the focused part matches the center of the box.
- class nearl.features.Selection(default_value=1, **kwargs)[source]#
Bases: Feature
Annotate each atom according to the user's selection (selected atoms are encoded 1, otherwise 0).
- Parameters:
- default_value : int, default 1
The default value of the weights for the selected atoms.
Notes
The selection argument must be set in the constructor.
- cache(trajectory)[source]#
Cache the weights needed for each atom in the trajectory for later use by Feature.query()
- Parameters:
- trajectory : nearl.io.traj.Trajectory
- query(topology, frame_coords, focal_point)[source]#
Base function to query the coordinates within the bounding box near the focal point and translate the coordinates to the center of the box
- Parameters:
- topology : pytraj.Topology
The topology object
- frame_coords : np.ndarray
The coordinates of the atoms
- focal_point : np.ndarray
The focal point parsed from your registered points; see the featurizer object
- Returns:
- final_mask : np.ndarray
The boolean mask of the atoms within the bounding box; shape=(n_atoms,)
- final_coords : np.ndarray
The coordinates within the bounding box; shape=(queried_n_atoms, 3)
Notes
The frame coordinates are explicitly translated so that the focused part matches the center of the box.
- class nearl.features.HBondDonor(dims=None, spacing=None, outfile=None, outkey=None, cutoff=None, sigma=None, padding=None, byres=None, outshape=None, force_recache=None, selection=None, translate='origin', frame_offset=None, **kwargs)[source]#
Bases: Feature
Annotate each atom as a hydrogen bond donor or not (donor atoms are encoded 1, otherwise 0).
- cache(trajectory)[source]#
Cache the weights needed for each atom in the trajectory for later use by Feature.query()
- Parameters:
- trajectory : nearl.io.traj.Trajectory
- query(topology, frame_coords, focal_point)[source]#
Base function to query the coordinates within the bounding box near the focal point and translate the coordinates to the center of the box
- Parameters:
- topology : pytraj.Topology
The topology object
- frame_coords : np.ndarray
The coordinates of the atoms
- focal_point : np.ndarray
The focal point parsed from your registered points; see the featurizer object
- Returns:
- final_mask : np.ndarray
The boolean mask of the atoms within the bounding box; shape=(n_atoms,)
- final_coords : np.ndarray
The coordinates within the bounding box; shape=(queried_n_atoms, 3)
Notes
The frame coordinates are explicitly translated so that the focused part matches the center of the box.
- class nearl.features.HBondAcceptor(dims=None, spacing=None, outfile=None, outkey=None, cutoff=None, sigma=None, padding=None, byres=None, outshape=None, force_recache=None, selection=None, translate='origin', frame_offset=None, **kwargs)[source]#
Bases: Feature
Annotate each atom as a hydrogen bond acceptor or not (acceptor atoms are encoded 1, otherwise 0).
- cache(trajectory)[source]#
Cache the weights needed for each atom in the trajectory for later use by Feature.query()
- Parameters:
- trajectory : nearl.io.traj.Trajectory
- query(topology, frame_coords, focal_point)[source]#
Base function to query the coordinates within the bounding box near the focal point and translate the coordinates to the center of the box
- Parameters:
- topology : pytraj.Topology
The topology object
- frame_coords : np.ndarray
The coordinates of the atoms
- focal_point : np.ndarray
The focal point parsed from your registered points; see the featurizer object
- Returns:
- final_mask : np.ndarray
The boolean mask of the atoms within the bounding box; shape=(n_atoms,)
- final_coords : np.ndarray
The coordinates within the bounding box; shape=(queried_n_atoms, 3)
Notes
The frame coordinates are explicitly translated so that the focused part matches the center of the box.
- class nearl.features.Hybridization(dims=None, spacing=None, outfile=None, outkey=None, cutoff=None, sigma=None, padding=None, byres=None, outshape=None, force_recache=None, selection=None, translate='origin', frame_offset=None, **kwargs)[source]#
Bases: Feature
Annotate each atom with its hybridization (an integer ranging from 0 to 3).
- cache(trajectory)[source]#
Cache the weights needed for each atom in the trajectory for later use by Feature.query()
- Parameters:
- trajectory : nearl.io.traj.Trajectory
- query(topology, frame_coords, focal_point)[source]#
Base function to query the coordinates within the bounding box near the focal point and translate the coordinates to the center of the box
- Parameters:
- topology : pytraj.Topology
The topology object
- frame_coords : np.ndarray
The coordinates of the atoms
- focal_point : np.ndarray
The focal point parsed from your registered points; see the featurizer object
- Returns:
- final_mask : np.ndarray
The boolean mask of the atoms within the bounding box; shape=(n_atoms,)
- final_coords : np.ndarray
The coordinates within the bounding box; shape=(queried_n_atoms, 3)
Notes
The frame coordinates are explicitly translated so that the focused part matches the center of the box.
- class nearl.features.Backbone(reverse=False, **kwargs)[source]#
Bases: Feature
Annotate each atom as a backbone atom or not (backbone atoms are encoded 1, otherwise 0).
- Parameters:
- reverse : bool, default False
Set to True to encode the non-backbone (sidechain) atoms as 1 instead
Notes
Backbone atoms are defined as the atoms named “C”, “O”, “CA”, “HA”, “N” and “HN”. Set the reverse parameter to True to encode the non-backbone (sidechain) atoms as 1 instead.
- cache(trajectory)[source]#
Cache the weights needed for each atom in the trajectory for later use by Feature.query()
- Parameters:
- trajectory : nearl.io.traj.Trajectory
- query(topology, frame_coords, focal_point)[source]#
Base function to query the coordinates within the bounding box near the focal point and translate the coordinates to the center of the box
- Parameters:
- topology : pytraj.Topology
The topology object
- frame_coords : np.ndarray
The coordinates of the atoms
- focal_point : np.ndarray
The focal point parsed from your registered points; see the featurizer object
- Returns:
- final_mask : np.ndarray
The boolean mask of the atoms within the bounding box; shape=(n_atoms,)
- final_coords : np.ndarray
The coordinates within the bounding box; shape=(queried_n_atoms, 3)
Notes
The frame coordinates are explicitly translated so that the focused part matches the center of the box.
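The backbone definition and the reverse flag documented for Backbone can be illustrated as follows. The atom names come from the notes of that class; backbone_weights itself is a hypothetical helper, and the library's own name matching may be stricter or looser.

```python
import numpy as np

# Atom names that count as backbone, as listed in the Backbone notes.
BACKBONE_NAMES = ("C", "O", "CA", "HA", "N", "HN")

def backbone_weights(atom_names, reverse=False):
    """Encode backbone atoms as 1 (or sidechain atoms as 1 when reverse=True)."""
    is_backbone = np.isin(atom_names, BACKBONE_NAMES)
    flags = ~is_backbone if reverse else is_backbone
    return flags.astype(int)

names = ["N", "CA", "CB", "C", "O", "OG"]   # a serine-like residue without hydrogens
backbone = backbone_weights(names)
sidechain = backbone_weights(names, reverse=True)
```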
- class nearl.features.AtomType(focus_element, **kwargs)[source]#
Bases: Feature
Annotate each atom as the selected atom type (focus_element) or not (selected atoms are encoded 1, otherwise 0).
- Parameters:
- focus_element : int
The atomic number of the atom type to be selected.
- cache(trajectory)[source]#
Cache the weights needed for each atom in the trajectory for later use by Feature.query()
- Parameters:
- trajectory : nearl.io.traj.Trajectory
- query(topology, frame_coords, focal_point)[source]#
Base function to query the coordinates within the bounding box near the focal point and translate the coordinates to the center of the box
- Parameters:
- topology : pytraj.Topology
The topology object
- frame_coords : np.ndarray
The coordinates of the atoms
- focal_point : np.ndarray
The focal point parsed from your registered points; see the featurizer object
- Returns:
- final_mask : np.ndarray
The boolean mask of the atoms within the bounding box; shape=(n_atoms,)
- final_coords : np.ndarray
The coordinates within the bounding box; shape=(queried_n_atoms, 3)
Notes
The frame coordinates are explicitly translated so that the focused part matches the center of the box.
- class nearl.features.PartialCharge(charge_type='topology', charge_parm=None, keep_sign='both', **kwargs)[source]#
Bases: Feature
Annotate each atom with its partial charge. The charges can be derived from several sources: the trajectory's own topology, manually set values, an external function, or recomputation with ChargeFW2.
- Parameters:
- charge_type : str, “topology” by default
The supported charge types are “topology”, “manual”, “chargefw” and “external”.
- charge_parm : str
The charge parameter. With “topology”, Nearl reads the charge values from the topology and this parameter is ignored. With “manual”, charge_parm should be a dictionary with the trajectory identity as the key and the charge values as the value. The “external” type lets the user pass a reference to an external function that calculates the charge values. With “chargefw”, charge_parm is the name of the charge method to be used in ChargeFW2. Note that computing the charges can be very expensive, depending on the size of the structure. For more information about the charge types and parameters, please refer to its documentation.
- cache(trajectory)[source]#
Cache the weights needed for each atom in the trajectory for later use by Feature.query()
- Parameters:
- trajectory : nearl.io.traj.Trajectory
- query(topology, frame_coords, focal_point)[source]#
Base function to query the coordinates within the bounding box near the focal point and translate the coordinates to the center of the box
- Parameters:
- topology : pytraj.Topology
The topology object
- frame_coords : np.ndarray
The coordinates of the atoms
- focal_point : np.ndarray
The focal point parsed from your registered points; see the featurizer object
- Returns:
- final_mask : np.ndarray
The boolean mask of the atoms within the bounding box; shape=(n_atoms,)
- final_coords : np.ndarray
The coordinates within the bounding box; shape=(queried_n_atoms, 3)
Notes
The frame coordinates are explicitly translated so that the focused part matches the center of the box.
- class nearl.features.Electronegativity(dims=None, spacing=None, outfile=None, outkey=None, cutoff=None, sigma=None, padding=None, byres=None, outshape=None, force_recache=None, selection=None, translate='origin', frame_offset=None, **kwargs)[source]#
Bases: Feature
Annotate each atom with its electronegativity. The electronegativity values are taken from https://periodictable.com/.
- cache(trajectory)[source]#
Cache the weights needed for each atom in the trajectory for later use by Feature.query()
- Parameters:
- trajectory : nearl.io.traj.Trajectory
- query(topology, frame_coords, focal_point)[source]#
Base function to query the coordinates within the bounding box near the focal point and translate the coordinates to the center of the box
- Parameters:
- topology : pytraj.Topology
The topology object
- frame_coords : np.ndarray
The coordinates of the atoms
- focal_point : np.ndarray
The focal point parsed from your registered points; see the featurizer object
- Returns:
- final_mask : np.ndarray
The boolean mask of the atoms within the bounding box; shape=(n_atoms,)
- final_coords : np.ndarray
The coordinates within the bounding box; shape=(queried_n_atoms, 3)
Notes
The frame coordinates are explicitly translated so that the focused part matches the center of the box.
- class nearl.features.Hydrophobicity(dims=None, spacing=None, outfile=None, outkey=None, cutoff=None, sigma=None, padding=None, byres=None, outshape=None, force_recache=None, selection=None, translate='origin', frame_offset=None, **kwargs)[source]#
Bases: Feature
Annotate each atom with its hydrophobicity.
Notes
The hydrophobicity is calculated as the absolute difference between the electronegativity of the atom and the electronegativity of the carbon atom.
- cache(trajectory)[source]#
Cache the weights needed for each atom in the trajectory for later use by Feature.query()
- Parameters:
- trajectory : nearl.io.traj.Trajectory
- query(topology, frame_coords, focal_point)[source]#
Base function to query the coordinates within the bounding box near the focal point and translate the coordinates to the center of the box
- Parameters:
- topology : pytraj.Topology
The topology object
- frame_coords : np.ndarray
The coordinates of the atoms
- focal_point : np.ndarray
The focal point parsed from your registered points; see the featurizer object
- Returns:
- final_mask : np.ndarray
The boolean mask of the atoms within the bounding box; shape=(n_atoms,)
- final_coords : np.ndarray
The coordinates within the bounding box; shape=(queried_n_atoms, 3)
Notes
The frame coordinates are explicitly translated so that the focused part matches the center of the box.
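The hydrophobicity definition given in the Hydrophobicity notes is simple arithmetic. The sketch below assumes Pauling-scale electronegativities for a few common elements; the table the library actually ships may contain different or more values, and the hydrophobicity function here is a hypothetical stand-in.

```python
# Pauling electronegativities keyed by atomic number (assumed values for this sketch).
PAULING = {1: 2.20, 6: 2.55, 7: 3.04, 8: 3.44, 16: 2.58}

def hydrophobicity(atomic_numbers):
    """Absolute electronegativity difference to carbon for each atom."""
    carbon = PAULING[6]
    return [abs(PAULING[z] - carbon) for z in atomic_numbers]

# C, N, O, H
values = hydrophobicity([6, 7, 8, 1])
```

With this definition carbon scores 0 and more polar atoms score higher, so smaller values correspond to more carbon-like atoms.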
- class nearl.features.DynamicFeature(agg='mean', weight_type='mass', **kwargs)[source]#
Bases: Feature
Base class for the dynamic features.
- Parameters:
- weight_type : str, default="mass"
The weight type for the dynamic feature. Check nearl.features.cache_properties() for the available weight types.
- agg : str, default="mean"
The aggregation function for the dynamic feature.
Notes
See nearl.features.cache_properties() for the properties available to the dynamic features.
After each frame is processed, an aggregation function is applied to the [F,D,D,D] tensor to reduce it to a [D,D,D] tensor. The following aggregation functions are supported:
====================  ================
Aggregation Function  Aggregation Type
====================  ================
mean                  1
standard_deviation    2
median                3
variance              4
max                   5
min                   6
information_entropy   7
drift                 8
====================  ================
- property agg#
The aggregation function for the dynamic feature; the accepted values are the aggregation functions listed in the class notes.
- property weight_type#
Check SUPPORTED_FEATURES for the available weight types.
Notes
Be cautious with the partial charge, which contains both positive and negative values. Many aggregation functions compute a weighted average (e.g. a weighted center), so make sure that the combination of weight and aggregation function is physically meaningful.
- cache(trajectory)[source]#
Take the required weight type (self.weight_type) and cache the weights for each atom in the trajectory
- query(topology, frame_coords, focal_point)[source]#
Query the coordinates and weights and feed them to the subsequent self.run function
Notes
self.MAX_ALLOWED_ATOMS: Depends on the GPU cache size for each thread
self.DEFAULT_COORD: The coordinates for padding of atoms in the box across required frames. Also hard coded in GPU code.
The returned weight array should be flattened to a 1D array
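The reduction from an [F,D,D,D] tensor to a [D,D,D] tensor described for DynamicFeature can be sketched with NumPy reductions along the frame axis. Only the six aggregations with an unambiguous definition are shown; information_entropy and drift are left out because their exact definitions are not given here. The AGGREGATORS mapping and the aggregate function are hypothetical, and np.std and np.var use the population form (ddof=0), which may differ from the library.

```python
import numpy as np

# Name -> reduction over the frame axis (axis 0). Hypothetical mapping for illustration.
AGGREGATORS = {
    "mean": lambda x: np.mean(x, axis=0),
    "standard_deviation": lambda x: np.std(x, axis=0),
    "median": lambda x: np.median(x, axis=0),
    "variance": lambda x: np.var(x, axis=0),
    "max": lambda x: np.max(x, axis=0),
    "min": lambda x: np.min(x, axis=0),
}

def aggregate(frames_grid, agg="mean"):
    """Reduce a [F, D, D, D] tensor to [D, D, D] with the named aggregation."""
    return AGGREGATORS[agg](frames_grid)

rng = np.random.default_rng(0)
voxels = rng.random((10, 4, 4, 4))      # 10 frames of a 4x4x4 grid
reduced = aggregate(voxels, "mean")
```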
- class nearl.features.DensityFlow(agg='mean', weight_type='mass', **kwargs)[source]#
Bases: DynamicFeature
Perform the density flow algorithm on the registered trajectories. Each frame is first voxelized into a 3D grid (giving a tensor of size [F,D,D,D]), which is then aggregated along the time dimension into a single 3D grid (size [D,D,D]).
Notes
For weight types, please refer to the nearl.features.cache_properties() function.
For aggregation types, please refer to nearl.features.DynamicFeature.
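The two stages of DensityFlow (voxelize every frame, then aggregate over time) can be imitated on the CPU as below. This sketch swaps in hard binning with np.histogramdd in place of the library's voxelization, so it only illustrates the data flow; density_flow is a hypothetical function and the box origin is assumed to be at (0, 0, 0).

```python
import numpy as np

def density_flow(traj_coords, weights, dims, spacing, agg=np.mean):
    """Bin weighted atoms per frame, then aggregate along the frame axis."""
    edges = [np.arange(n + 1) * spacing for n in dims]
    per_frame = np.stack([
        np.histogramdd(frame, bins=edges, weights=weights)[0]
        for frame in traj_coords
    ])                                  # shape [F, D, D, D]
    return agg(per_frame, axis=0)       # shape [D, D, D]

# Two frames, two atoms; the second atom moves between voxels.
traj = np.array([
    [[0.5, 0.5, 0.5], [1.5, 1.5, 1.5]],
    [[0.5, 0.5, 0.5], [0.5, 1.5, 1.5]],
])
masses = np.array([12.0, 16.0])
flow = density_flow(traj, masses, dims=(2, 2, 2), spacing=1.0)
```

The stationary atom keeps its full mass in one voxel, while the moving atom's mass is split across the voxels it visits.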
- class nearl.features.MarchingObservers(obs='existence', **kwargs)[source]#
Bases: DynamicFeature
Perform the marching observers algorithm on the registered trajectories. Each 3D voxel serves as an observer, and the algorithm marches through the voxels to observe the motion of the atoms in the trajectory (giving a tensor of size [F,D,D,D]). The per-frame observations are then aggregated into a 3D grid (size [D,D,D]).
Notes
Direct Count-based Observables
==================  =============
Property Name       Property Type
==================  =============
existence           1
direct_count        2
distinct_count      3
==================  =============
Weight-based Observables
==================  =============
Property Name       Property Type
==================  =============
mean_distance       11
cumulative_weight   12
density             13
dispersion          14
eccentricity        15
radius_of_gyration  16
==================  =============
For weight types, please refer to the nearl.features.cache_properties() function.
For aggregation types, please refer to nearl.features.DynamicFeature.
- property obs#
The observation type for the marching observers algorithm
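To make the observer idea concrete, here is a small CPU sketch of two count-based observables, existence and direct_count. It assumes each observer sits at a voxel center and sees atoms within the cutoff distance; march_observers is a hypothetical function and the library's GPU implementation may define these observables differently.

```python
import numpy as np

def march_observers(traj_coords, dims, spacing, cutoff, obs="existence"):
    """Return a [F, D, D, D] tensor of per-frame observations."""
    axes = [(np.arange(n) + 0.5) * spacing for n in dims]
    centers = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    per_frame = []
    for frame in traj_coords:
        # Distance from every observer (voxel center) to every atom.
        dist = np.linalg.norm(centers[:, None, :] - frame[None, :, :], axis=-1)
        counts = (dist <= cutoff).sum(axis=1)
        if obs == "existence":
            value = (counts > 0).astype(float)   # 1 if any atom is seen
        elif obs == "direct_count":
            value = counts.astype(float)         # number of atoms seen
        else:
            raise ValueError(f"unsupported observable in this sketch: {obs}")
        per_frame.append(value.reshape(dims))
    return np.stack(per_frame)

# One atom that jumps from the first voxel to the last one.
traj = np.array([[[0.5, 0.5, 0.5]], [[1.5, 1.5, 1.5]]])
observed = march_observers(traj, dims=(2, 2, 2), spacing=1.0, cutoff=0.6)
reduced = observed.mean(axis=0)   # aggregate the time dimension
```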
- class nearl.features.LabelIdentity(**kwargs)[source]#
Bases: Feature
Return the identity attribute of the trajectory as meta-data.
Notes
In the default Trajectory type, the identity is the file name.
In the MisatoTraj Trajectory type, the identity is the PDB code.
- class nearl.features.LabelAffinity(baseline_map, colname='pK', **kwargs)[source]#
Bases: Feature
Read the PDBBind table and return the affinity values according to the PDB code (the identity of the trajectory)
- Parameters:
- baseline_map : str
The path to the baseline map for the affinity values
- **kwargs : dict
The additional arguments for the parent class nearl.features.Feature
- Attributes:
- baseline_table : pd.DataFrame
The baseline table for the affinity values.
- base_value : float
The base value for the affinity values, searched in the cache and search_baseline functions.
- search_baseline(pdbcode)[source]#
Search the baseline value based on the PDB code.
Notes
The default implementation here is to read the csv from PDBBind dataset and search through the PDB code. Override this function to customize the search method for baseline affinity.
We recommend using a map from the trajectory identity to the affinity values.
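A custom lookup along the lines recommended above could be as simple as a dictionary built from a CSV table. The column name pdbcode, the two example rows, and both function names are placeholders invented for this sketch; only the pK column name mirrors the default colname in the class signature.

```python
import csv
import io

# Stand-in for a CSV file on disk; codes and values are made-up placeholders.
TABLE = io.StringIO("pdbcode,pK\n1abc,6.5\n2xyz,4.2\n")

def build_affinity_map(handle, colname="pK"):
    """Map each trajectory identity (lower-cased PDB code) to its affinity value."""
    return {row["pdbcode"].lower(): float(row[colname]) for row in csv.DictReader(handle)}

def lookup_baseline(affinity_map, identity):
    """Case-insensitive lookup of the baseline affinity for one identity."""
    return affinity_map[identity.lower()]

affinity_map = build_affinity_map(TABLE)
value = lookup_baseline(affinity_map, "1ABC")
```

A subclass overriding search_baseline could delegate to such a map instead of scanning the table on every call.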
- cache(trajectory)[source]#
Look up the baseline values from the designated table and cache the pairwise closest distances.
Notes
In this base type, calling super.cache() is not needed.
- query(*args)[source]#
Return the baseline affinity based on nearl.io.traj.Trajectory.identity(). No extra trajectory information is needed.
- class nearl.features.LabelPCDT(selection=None, search_cutoff=None, **kwargs)[source]#
Bases: LabelAffinity
Label the feature based on the cosine similarity between the PCDT of the focused point and the cached PCDT array.
- Parameters:
- selection : str or list, tuple, np.ndarray
The selection of the atoms for the PCDT calculation
- search_cutoff : float, default=None
The cutoff distance for limiting the outliers in the PCDT calculation
- **kwargs : dict
The additional arguments for the parent class nearl.features.LabelAffinity
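The cosine similarity that this label relies on can be computed as below. The sketch treats the two PCDT arrays as flat vectors, which is an assumption about how they are compared; cosine_similarity is a generic helper, not the library's function.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two arrays, flattened to vectors."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # Guard against zero-length vectors, for which the angle is undefined.
    return float(a @ b / denom) if denom > 0 else 0.0

same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```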
- class nearl.features.LabelRMSD(base_value=0, outshape=(None,), **kwargs)[source]#
Bases: LabelAffinity
- cache(trajectory)[source]#
Look up the baseline values from the designated table and cache the pairwise closest distances.
Notes
In this base type, calling super.cache() is not needed.
- query(topology, frames, focus)[source]#
Return the baseline affinity based on nearl.io.traj.Trajectory.identity(). No extra trajectory information is needed.
- nearl.features.cache_properties(trajectory, property_type, **kwargs)[source]#
Cache the required atomic properties for the trajectory (called in nearl.features.DynamicFeature.cache()). This re-implements the properties previously cached in the Feature classes.
Notes
Direct count-based weights:
=================  =============
Property Name      Property Type
=================  =============
atomic_id          int
residue_id         int
=================  =============
Atom properties-based weights:
=================  =============
Property Name      Property Type
=================  =============
atomic_number      int
hybridization      int
mass               float
radius             float
electronegativity  float
hydrophobicity     float
partial_charge     float
uniformed          float
heavy_atom         boolean
aromaticity        boolean
ring               boolean
hbond_donor        boolean
hbond_acceptor     boolean
sidechainness      boolean
backboneness       boolean
atom_type          boolean
=================  =============
The atom_type property needs the extra argument element_type (the atomic number of the element of interest).