This tutorial is a guide to calculating diffuse scattering patterns from molecular dynamics (MD) trajectories; diffuse scattering is a common experimental tool for understanding crystal structure.
Neutron and X-ray scattering are key methods for probing atomic structure and dynamics in condensed matter, since thermal neutrons and X-rays have wavelengths (~Å) comparable to atomic spacings. In such experiments, the core measurement is the doubly-differential cross section, \(\dfrac{d^2\sigma}{d\Omega\, dE}\). This quantity is the probability density for the incident beam to scatter into a solid-angle element \(d\Omega\) with an energy change within \(dE\).
Integrating \(\dfrac{d^2\sigma}{d\Omega\, dE}\) over all energy transfers yields the differential cross section, \(\dfrac{d\sigma}{d\Omega}\), which counts all scattering into \(d\Omega\) regardless of energy change. This is the quantity measured in diffraction experiments. Inelastic scattering experiments, in contrast, measure \(\dfrac{d^2\sigma}{d\Omega\, dE}\) directly.
Finally, integrating \(\dfrac{d\sigma}{d\Omega}\) over the entire solid angle gives the total cross section, \(\sigma\). This represents the effective cross-sectional area of the target responsible for scattering the incident beam.
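In symbols, these three quantities are related by successive integrations:

$$\frac{d\sigma}{d\Omega} = \int \frac{d^2\sigma}{d\Omega\, dE}\, dE, \qquad \sigma = \int_{4\pi} \frac{d\sigma}{d\Omega}\, d\Omega$$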
The doubly-differential (i.e. inelastic) neutron scattering cross section is

$$\frac{d^2\sigma}{d\Omega\, dE} \propto \frac{k_f}{k_i}\, S(\bm{Q}, E)$$

where \(k_i\) and \(k_f\) are the incident and final neutron wavevectors.
Here, we focus on the dynamic structure factor (DSF) \(S(\bm{Q},E)\), which only depends on the microscopic dynamics of the material and not on any experimental conditions. The doubly-differential cross section is what is measured in an experiment; the properties of the material are encoded in the DSF.
In the classical approximation, the coherent DSF is
$$S(\bm Q, E ) = \left| \sum_i^N b_i \int e^{i\bm Q \cdot \bm r_i(t) - iEt/\hbar} dt \right|^2 $$
with \(i\) running over all atoms in the crystal, \(b_i\) the element-specific coherent scattering length, and \(\bm r_i(t)\) the trajectory of atom \(i\). The scattering lengths are known parameters and the trajectories are obtained from classical molecular dynamics, so this equation can be evaluated efficiently as a post-processing step.
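As an illustration of this post-processing step, here is a minimal single-\(\bm Q\) sketch. It is not the actual PSF.py implementation: the function name, units, and overall normalization are assumptions, and a real calculation loops this over a mesh of Q-points.

```python
import numpy as np

def classical_sqw(positions, b, Q, dt):
    """Classical coherent dynamic structure factor at a single Q-point.

    positions : (num_steps, num_atoms, 3) unwrapped cartesian trajectory
    b         : (num_atoms,) coherent scattering lengths
    Q         : (3,) scattering vector (inverse of the position length units)
    dt        : time between trajectory snapshots

    Returns (E, S): FFT frequencies (in cycles per time unit of dt) and the
    spectrum. Normalization conventions vary; here the squared modulus is
    simply divided by the number of steps.
    """
    num_steps = positions.shape[0]
    # rho(Q, t) = sum_i b_i exp(i Q . r_i(t)) -- one complex number per step
    phases = np.exp(1j * (positions @ Q))   # (num_steps, num_atoms)
    rho = phases @ b                        # (num_steps,)
    # Fourier transform t -> E and take the squared modulus
    rho_E = np.fft.fft(rho) * dt
    S = np.abs(rho_E) ** 2 / num_steps
    E = np.fft.fftfreq(num_steps, d=dt)
    return E, S
```

Summing `S` over all energies gives the energy-integrated intensity at that Q-point, i.e. the diffraction signal discussed above.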
All calculations should be performed on CPU nodes. For small systems (e.g., fewer than 10,000 atoms), the 8V100 cluster is sufficient; otherwise, run the calculation on the Wesklate HPC cluster.
First, convert the trajectory file from .lammpstrj format to .hdf5 format:
```python
# usage: python convert_lammps_to_hdf5.py -i pos_1.lammpstrj pos_2.lammpstrj -o pos.hdf5

"""
Tool to read lammps text output and convert it to a format that can be read by my code,
i.e. trajectory_format='user_hdf5'.

I assume that the trajectory was written using the commands
    dump pos all custom ${dump_freq} pos.dat id type xu yu zu
    dump_modify pos sort id
It is **essential** that you used the unwrapped positions (xu, yu, zu) and sorted by id!

This isn't parallelized (since parallel reads from the text file are apparently not
faster). It simply reads one step at a time, writing each to the hdf5 file. It isn't
fast, but it is simple!

If you have multiple contiguous text files you want to put in the same hdf5 file, you
can do it! Simply pass all text files in the order you want them put into the hdf5 file.
My code won't check that they are contiguous (or consecutive), so be careful!

Example usage:
    python convert_lammps_to_hdf5.py -i pos_1.dat pos_2.dat -o pos.hdf5
"""

import argparse
import os
import sys
from itertools import islice
from timeit import default_timer

import h5py
import numpy as np

# --------------------------------------------------------------------------------------

class c_timer:

    def __init__(self, label, units='s'):
        """ small tool for timing and printing timing info """
        self.label = label
        if units == 'm':
            self.units = 'm'
            self.scale = 1 / 60
        elif units == 'ms':
            self.units = 'ms'
            self.scale = 1000
        else:
            self.units = 's'
            self.scale = 1
        self.start_time = default_timer()

    def stop(self):
        """ stop the timer and print timing info """
        elapsed_time = (default_timer() - self.start_time) * self.scale
        print(f'\ntiming: {self.label} {elapsed_time:9.5f} [{self.units}]')

# --------------------------------------------------------------------------------------

class c_compressor:

    def __init__(self, lammps_traj_files, hdf5_traj_file):
        """ lammps_traj_files can be a list of files or a single one. all files are read
        and appended to the same hdf5 file. if multiple are given, they should be
        consecutive and contain exactly the same number of atoms in the same order. it
        doesn't matter how many steps are in each file; the code will check and work
        accordingly. hdf5_traj_file is the output hdf5 file. if it exists, it is
        overwritten. """

        # i.e. length of the per-step data block that isn't the positions
        self.header_length = 9

        # output hdf5 file
        self.hdf5_traj_file = hdf5_traj_file

        # enforce that the input is a list (even if a single file was given)
        if not isinstance(lammps_traj_files, list):
            lammps_traj_files = [lammps_traj_files]
        self.lammps_traj_files = lammps_traj_files
        self.num_lammps_files = len(self.lammps_traj_files)

        # check that the lammps files exist
        for _f in self.lammps_traj_files:
            if not os.path.exists(_f):
                print(f'\nthe file {_f} is missing!')
                sys.exit()

        print('\nlammps traj files:')
        print(self.lammps_traj_files)

        # get the number of steps in each file ... takes a while to 'loop' over the file
        self._get_meta_data()

        # now go and read the files / write to hdf5
        self._parse_text_files()

    def _convert_box_vecs(self, lat_params):
        """ converts from the lammps restricted triclinic box to generic lattice
        vectors. see https://docs.lammps.org/Howto_triclinic.html

        take lat_params = [xlo_bound xhi_bound xy]
                          [ylo_bound yhi_bound xz]
                          [zlo_bound zhi_bound yz]
        and convert to box_vecs = [ax  0  0]
                                  [bx by  0]
                                  [cx cy cz]
        """
        _a = lat_params[0, 1] - lat_params[0, 0]
        _b = lat_params[1, 1] - lat_params[1, 0]
        _c = lat_params[2, 1] - lat_params[2, 0]
        _xy = lat_params[0, 2]; _xz = lat_params[1, 2]; _yz = lat_params[2, 2]
        _cos_a = (_xy * _xz + _b * _yz) / (_b * _c)
        _cos_b = _xz / _c
        _cos_g = _xy / _b
        _bx = _b * _cos_g
        _by = np.sqrt(_b ** 2 - _bx ** 2)
        _cx = _c * _cos_b
        _cy = (_b * _c * _cos_a - _bx * _cx) / _by
        _cz = np.sqrt(_c ** 2 - _cx ** 2 - _cy ** 2)
        box_vecs = np.zeros((3, 3), dtype=float)
        box_vecs[0, 0] = _a
        box_vecs[1, 0] = _bx; box_vecs[1, 1] = _by
        box_vecs[2, 0] = _cx; box_vecs[2, 1] = _cy; box_vecs[2, 2] = _cz
        return box_vecs

    def _process_types(self, types):
        """ my code expects the types to start at 1 and be contiguous.
        this converts them to contiguous and starting at 1 """
        _unique, inverse = np.unique(types, return_inverse=True)
        return inverse + 1

    def _parse_text_files(self):
        """ loop over the text files, read each step one after another, and write them
        to the hdf5 database """
        _timer = c_timer('parse_text_files')

        # number of lines in each block to read (header + positions)
        _num_read = self.header_length + self.num_atoms

        # have the types been written yet (only done at the 1st step)
        _write_types = True

        # id, type, xu, yu, zu
        _data = np.zeros((self.num_atoms, 5), dtype=float)
        _lat_params = np.zeros((3, 3), dtype=float)

        with h5py.File(self.hdf5_traj_file, 'w') as db:

            # initialize datasets
            _db_pos = db.create_dataset('cartesian_pos',
                            shape=(self.num_steps, self.num_atoms, 3), dtype=float)
            _db_types = db.create_dataset('types',
                            shape=(self.num_atoms,), dtype=int)
            _db_box_vecs = db.create_dataset('box_vectors',
                            shape=(self.num_steps, 3, 3), dtype=float)

            _step = 0

            # loop over the lammps files
            for ii, _lammps_file in enumerate(self.lammps_traj_files):

                print('\nnow reading file:', _lammps_file)
                print('this may take a while ...')

                # number of steps in this file
                _num_steps = self.num_steps_per_file[ii]

                # timer to read this file
                _f_timer = c_timer(f'read[{ii}]')

                with open(_lammps_file, 'r') as _f:
                    for jj in range(_num_steps):

                        if jj % 100 == 0:
                            print(f'now on step {jj}/{_num_steps}')

                        # read this step (supposedly islice is the fastest way to do it)
                        _lines = list(islice(_f, _num_read))

                        # lattice parameters
                        _lat_params[0, :] = _lines[5].strip().split()
                        _lat_params[1, :] = _lines[6].strip().split()
                        _lat_params[2, :] = _lines[7].strip().split()
                        _box_vecs = self._convert_box_vecs(_lat_params)

                        # convert types, positions, etc. to numbers
                        _lines = [_.strip().split() for _ in _lines[9:]]
                        _data[...] = _lines

                        # write to the hdf5 file
                        _db_pos[_step, ...] = _data[:, 2:]
                        _db_box_vecs[_step, ...] = _box_vecs[...]

                        # only write the types at the 1st step
                        if _write_types:
                            _types = self._process_types(_data[:, 1])
                            _db_types[...] = _types
                            _write_types = False

                        _step += 1

                _f_timer.stop()

        _timer.stop()

    def _get_meta_data(self):
        """ get the number of atoms and number of steps from the lammps files """
        _timer = c_timer('get_meta_data')
        print('\ngetting meta_data from files ...')

        # make sure all files have the same number of atoms
        _num_atom_arr = np.zeros(self.num_lammps_files, dtype=int)
        self.num_lines_per_file = np.zeros(self.num_lammps_files, dtype=int)
        self.num_steps_per_file = np.zeros(self.num_lammps_files, dtype=int)

        for ii, _lammps_file in enumerate(self.lammps_traj_files):

            # get the number of atoms in the file
            with open(_lammps_file, 'r') as _f:
                for jj in range(3):
                    _f.readline()
                _num_atoms = int(_f.readline().strip())
            _num_atom_arr[ii] = _num_atoms

            # now check the total number of lines / number of steps in the file
            with open(_lammps_file, 'rb') as _f:
                _num_lines = sum(1 for _ in _f)
            _num_steps = int(_num_lines / (self.header_length + _num_atoms))
            self.num_steps_per_file[ii] = _num_steps
            self.num_lines_per_file[ii] = _num_lines

            print('\nfile name:', _lammps_file)
            print('num atoms:', _num_atoms)
            print('num lines:', _num_lines)
            print('num steps:', _num_steps)

        # make sure all files have the same number of atoms
        self.num_atoms = _num_atom_arr[0]
        if any(self.num_atoms - _num_atom_arr):
            print('\nnumber of atoms in all files must be the same!')
            sys.exit()

        self.num_steps = self.num_steps_per_file.sum()
        print('\ntotal num steps:', self.num_steps)

        _timer.stop()

# --------------------------------------------------------------------------------------

def parse_args():
    """ get the lammps and hdf5 file names from the command line """
    description = "command line args for 'convert_lammps_to_hdf5.py'"
    cmd_parser = argparse.ArgumentParser(description=description)

    # input files
    help_msg = 'lammps trajectory files to be merged into an hdf5 database. give one\n' \
               'or several. they should be consecutive (in time) and should have\n' \
               'exactly the same number of atoms in the same order. it doesn\'t matter\n' \
               'how many steps are in each; however, they should have been written\n' \
               'with a command like:\n' \
               '---- dump pos all custom ${dump_freq} pos.dat id type xu yu zu\n' \
               '---- dump_modify pos sort id'
    cmd_parser.add_argument('-i', '--input-files', default=['traj.lammpstrj'],
                            help=help_msg, nargs='+')

    # output file
    help_msg = 'the name of the output hdf5 file'
    cmd_parser.add_argument('-o', '--output-file', default='pos.hdf5', help=help_msg)

    cmd_args = cmd_parser.parse_args()
    return cmd_args.input_files, cmd_args.output_file

# --------------------------------------------------------------------------------------

if __name__ == '__main__':
    lammps_traj_files, hdf5_traj_file = parse_args()
    c_compressor(lammps_traj_files, hdf5_traj_file)
```
You should then modify the parameters in the psf_input_diffuse.py file. Pay special attention to the following critical parameters:
- trajectory_file: The filename of your converted hdf5 trajectory.
- num_trajectory_blocks: The number of blocks to divide the trajectory into. Using more blocks reduces computational requirements but lowers accuracy (recommended: 5–10).
- trajectory_blocks: Specifies which trajectory blocks should be used for scattering calculations.
- calc_sqw: Determines whether to calculate the inelastic component.
- output_prefix: The filename prefix for output files.
- lattice_vectors: The average lattice vectors of the trajectory, which can easily be obtained with get_lattice.py.
- Q_mesh_H, Q_mesh_K, Q_mesh_L: Define the resolution in \(q\)-space. Note: the calculation produces a 2D scattering pattern, so set one axis to a single number rather than a range.

For example, the following input calculates the scattering pattern on the \(H=0\) plane. The entry Q_mesh_K = [0, 2.5, 81] specifies the \(K\)-dimension range \(0 \leq K \leq 2.5\) with 81 points. The number of points should be chosen according to the supercell size and the desired \(Q\)-space range: for a \(32 \times 32 \times 32\) supercell spanning 2.5 r.l.u., use \(32 \times 2.5 + 1 = 81\) points.

```python
# options for where to get data / preprocessing
trajectory_format = 'user_hdf5'
trajectory_file = 'pos.hdf5'

# options for splitting up the trajectory
num_trajectory_blocks = 10
trajectory_blocks = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# xu, yu, zu are already unwrapped
unwrap_trajectory = False

calc_sqw = False

# options for writing results
output_directory = None
output_prefix = '300_PINPMNPT_32_H=0'

# simulation inputs
md_time_step = 200      # femtoseconds; time step IN FILE
md_num_steps = 5000     # number of steps IN FILE
md_num_atoms = 163840

# unit cell used to define Q-points in cartesian coords
lattice_vectors = [[4.03372121, 0.00000000, 0.00000000],
                   [0.00000000, 4.03916808, 0.00000000],
                   [0.00000000, 0.00000000, 4.03656194]]

# vectors spanning the simulation cell
box_vectors = None

# experiment info
experiment_type = 'neutrons'  # 'neutrons' or 'xrays'

# 2=Pb, 9=Ti, 11=Nb, 12=Mg, 13=In, 15=O
atom_types = ['Pb', 'Ti', 'Nb', 'Mg', 'In', 'O']

# options for how to generate Q-points
Qpoints_option = 'mesh'

# for Qpoints_option == 'mesh'
Q_mesh_H = 0
Q_mesh_K = [0, 2.5, 81]
Q_mesh_L = [0, 2.5, 81]

# number of processes to split Q-point parallelization over
num_Qpoint_procs = 64
```
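get_lattice.py itself is not reproduced in this tutorial. As a minimal sketch of what it can compute (the function name and supercell argument are hypothetical; this assumes the per-step box vectors are simply time-averaged and each row divided by the supercell repetitions along that axis):

```python
import numpy as np

def average_lattice_vectors(box_vectors, supercell=(32, 32, 32)):
    """Average the per-step simulation box vectors over the trajectory, then
    divide each row by the number of unit-cell repetitions along that axis to
    obtain average unit-cell lattice vectors.

    box_vectors : (num_steps, 3, 3) array, e.g. the 'box_vectors' dataset
                  written by convert_lammps_to_hdf5.py
    supercell   : repetitions of the unit cell along a, b, c (an assumption)
    """
    mean_box = np.asarray(box_vectors).mean(axis=0)      # (3, 3) average box
    return mean_box / np.asarray(supercell)[:, None]     # unit-cell vectors
```

For the converted trajectory above, the box vectors can be loaded from the 'box_vectors' dataset of pos.hdf5 before averaging.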
Then you can use the following command to start the calculation:
```shell
python PSF.py -i psf_input_diffuse.py
```
PSF.py and the psf folder have been uploaded to the /docs/asset/script/scattering_pattern directory; copy them into your working directory before running.