2024-07-07 13:25:16.237 INFO: Process group initialized: True 2024-07-07 13:25:16.239 INFO: Processes: 80 2024-07-07 13:25:16.240 INFO: MACE version: 0.3.0 2024-07-07 13:25:16.240 INFO: Configuration: Namespace(name='10-128-L1-universal-branch', seed=1, log_dir='logs', model_dir='.', checkpoints_dir='checkpoints', results_dir='results', downloads_dir='downloads', device='cuda', default_dtype='float64', distributed=True, log_level='INFO', error_table='PerAtomMAE', model='ScaleShiftMACE', r_max=6.0, radial_type='bessel', num_radial_basis=10, num_cutoff_basis=5, interaction='RealAgnosticResidualInteractionBlock', interaction_first='RealAgnosticResidualInteractionBlock', max_ell=3, correlation=3, num_interactions=2, MLP_irreps='16x0e', radial_MLP='[64, 64, 64]', hidden_irreps='128x0e + 128x1o', num_channels=128, max_L=1, gate='silu', scaling='rms_forces_scaling', avg_num_neighbors=1, compute_avg_num_neighbors=True, compute_stress=True, compute_forces=True, train_file='../../dataset/mptrj-gga-ggapu-train', valid_file='../../dataset/mptrj-gga-ggapu-val', valid_fraction=0.1, test_file=None, test_dir=None, multi_processed_test=False, num_workers=16, pin_memory=True, atomic_numbers=None, mean=None, std=None, statistics_file='../../dataset/mptrj-gga-ggapu-statistics-isolated.json', E0s=None, energy_key='energy', forces_key='forces', virials_key='virials', stress_key='stress', dipole_key='dipole', charges_key='charges', loss='universal', forces_weight=10.0, swa_forces_weight=100.0, energy_weight=1.0, swa_energy_weight=1000.0, virials_weight=1.0, swa_virials_weight=10.0, stress_weight=100.0, swa_stress_weight=10.0, dipole_weight=1.0, swa_dipole_weight=1.0, config_type_weights='{"Default":1.0}', huber_delta=0.01, optimizer='adam', batch_size=16, valid_batch_size=32, lr=0.005, swa_lr=0.001, weight_decay=1e-08, amsgrad=True, scheduler='ReduceLROnPlateau', lr_factor=0.8, scheduler_patience=5, lr_scheduler_gamma=0.9993, swa=False, start_swa=None, ema=True, ema_decay=0.995, 
max_num_epochs=200, patience=50, eval_interval=1, keep_checkpoints=True, restart_latest=True, save_cpu=True, clip_grad=100.0, wandb=True, wandb_project='mace-universal', wandb_entity='astagroup', wandb_name='10-128-L1-universal-branch', wandb_log_hypers=['num_channels', 'max_L', 'correlation', 'lr', 'swa_lr', 'weight_decay', 'batch_size', 'max_num_epochs', 'start_swa', 'energy_weight', 'forces_weight']) 2024-07-07 13:25:16.240 INFO: CUDA version: 11.8, CUDA device: 0 2024-07-07 13:25:16.240 INFO: Using statistics json file 2024-07-07 13:25:16.240 INFO: Using atomic numbers from statistics file 2024-07-07 13:25:16.241 INFO: AtomicNumberTable: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 89, 90, 91, 92, 93, 94) 2024-07-07 13:25:16.241 INFO: Atomic Energies not in training file, using command line argument E0s 2024-07-07 13:25:16.241 INFO: Atomic energies: [-1.11723232, -0.00045595, -0.29734917, -0.04262353, -0.2911712, -1.26281801, -3.12555634, -1.54690765, -0.43794547, -0.01216023, -0.22858276, -0.00994627, -0.21672837, -0.82583191, -1.88719667, -0.89091719, -0.25828681, -0.0235315, -0.17827125, -0.02596217, -2.12966897, -2.40532262, -3.61232779, -5.44620624, -5.14592659, -3.30583367, -1.66614587, -0.28412403, -0.23745594, -0.01098351, -0.19854295, -0.77924665, -1.70136472, -0.78345919, -0.22687512, -0.02265396, -0.16194042, -0.02823145, -2.25679622, -2.23742918, -2.53481909, -4.60213279, -3.40289704, -1.68884293, -1.44016062, -1.47521138, -0.19840574, -0.01374787, -0.19672488, -0.67963499, -1.4302063, -0.6573123, -0.18858477, -0.01267614, -0.13452777, -0.03157029, -0.62794477, -1.43642821, -0.18584352, -0.25876002, -0.25695685, -0.2542358, -9.48606277, -8.11540027, -0.14584339, -0.19488481, 
-0.14569527, -0.2516406, -0.16381585, -0.25265734, -0.25255978, -3.49292389, -3.5659314, -4.57101127, -4.63436797, -2.88280809, -1.42793567, -0.50244445, -0.18479218, -0.0105212, -0.17939998, -0.63069886, -1.32462383, -0.24210133, -1.04419147, -2.03239022, -4.6443113, -7.30273499, -10.39244586] 2024-07-07 13:25:50.861 INFO: UniversalLoss(energy_weight=1.000, forces_weight=10.000, stress_weight=100.000) 2024-07-07 13:25:50.863 INFO: Average number of neighbors: 61.964672446250916 2024-07-07 13:25:50.863 INFO: Selected the following outputs: {'energy': True, 'forces': True, 'virials': False, 'stress': True, 'dipoles': False} 2024-07-07 13:25:50.863 INFO: Building model 2024-07-07 13:25:50.864 INFO: Hidden irreps: 128x0e+128x1o 2024-07-07 13:25:53.063 WARNING: Cannot find checkpoint with tag '10-128-L1-universal-branch_run-1' in 'checkpoints' 2024-07-07 13:25:53.068 INFO: ScaleShiftMACE( (node_embedding): LinearNodeEmbeddingBlock( (linear): Linear(89x0e -> 128x0e | 11392 weights) ) (radial_embedding): RadialEmbeddingBlock( (bessel_fn): BesselBasis(r_max=6.0, num_basis=10, trainable=False) (cutoff_fn): PolynomialCutoff(p=5.0, r_max=6.0) ) (spherical_harmonics): SphericalHarmonics() (atomic_energies_fn): AtomicEnergiesBlock(energies=[-1.1172, -0.0005, -0.2973, -0.0426, -0.2912, -1.2628, -3.1256, -1.5469, -0.4379, -0.0122, -0.2286, -0.0099, -0.2167, -0.8258, -1.8872, -0.8909, -0.2583, -0.0235, -0.1783, -0.0260, -2.1297, -2.4053, -3.6123, -5.4462, -5.1459, -3.3058, -1.6661, -0.2841, -0.2375, -0.0110, -0.1985, -0.7792, -1.7014, -0.7835, -0.2269, -0.0227, -0.1619, -0.0282, -2.2568, -2.2374, -2.5348, -4.6021, -3.4029, -1.6888, -1.4402, -1.4752, -0.1984, -0.0137, -0.1967, -0.6796, -1.4302, -0.6573, -0.1886, -0.0127, -0.1345, -0.0316, -0.6279, -1.4364, -0.1858, -0.2588, -0.2570, -0.2542, -9.4861, -8.1154, -0.1458, -0.1949, -0.1457, -0.2516, -0.1638, -0.2527, -0.2526, -3.4929, -3.5659, -4.5710, -4.6344, -2.8828, -1.4279, -0.5024, -0.1848, -0.0105, -0.1794, -0.6307, -1.3246, 
-0.2421, -1.0442, -2.0324, -4.6443, -7.3027, -10.3924]) (interactions): ModuleList( (0): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e -> 128x0e | 16384 weights) (conv_tp): TensorProduct(128x0e x 1x0e+1x1o+1x2e+1x3o -> 128x0e+128x1o+128x2e+128x3o | 512 paths | 512 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 512] (linear): Linear(128x0e+128x1o+128x2e+128x3o -> 128x0e+128x1o+128x2e+128x3o | 65536 weights) (skip_tp): FullyConnectedTensorProduct(128x0e x 89x0e -> 128x0e+128x1o | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) (1): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) (conv_tp): TensorProduct(128x0e+128x1o x 1x0e+1x1o+1x2e+1x3o -> 256x0e+384x1o+384x2e+256x3o | 1280 paths | 1280 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 1280] (linear): Linear(256x0e+384x1o+384x2e+256x3o -> 128x0e+128x1o+128x2e+128x3o | 163840 weights) (skip_tp): FullyConnectedTensorProduct(128x0e+128x1o x 89x0e -> 128x0e | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) ) (products): ModuleList( (0): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) (1): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x6x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) ) 
(1): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e -> 128x0e | 16384 weights) ) ) (readouts): ModuleList( (0): LinearReadoutBlock( (linear): Linear(128x0e+128x1o -> 1x0e | 128 weights) ) (1): NonLinearReadoutBlock( (linear_1): Linear(128x0e -> 16x0e | 2048 weights) (non_linearity): Activation [x] (16x0e -> 16x0e) (linear_2): Linear(16x0e -> 1x0e | 16 weights) ) ) (scale_shift): ScaleShiftBlock(scale=0.804154, shift=0.164097) ) 2024-07-07 13:25:53.074 INFO: Number of parameters: 4688656 2024-07-07 13:25:53.074 INFO: Optimizer: Adam ( Parameter Group 0 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: embedding weight_decay: 0.0 Parameter Group 1 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: interactions_decay weight_decay: 1e-08 Parameter Group 2 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: interactions_no_decay weight_decay: 0.0 Parameter Group 3 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: products weight_decay: 1e-08 Parameter Group 4 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: readouts weight_decay: 0.0 ) 2024-07-07 13:25:53.074 INFO: Using Weights and Biases 
for logging 2024-07-07 13:27:24.731 INFO: Using gradient clipping with tolerance=100.000 2024-07-07 13:27:24.732 INFO: Started training 2024-07-07 13:27:31.127 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.128 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.128 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.128 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.129 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.130 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.130 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.130 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.130 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.130 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.130 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.129 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.133 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.133 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.130 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.133 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.133 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.133 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-07 13:27:31.130 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.130 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.130 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.130 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.130 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.129 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.133 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.133 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.130 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.133 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.133 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.130 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.130 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-07 13:27:31.130 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.130 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.130 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.129 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.133 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.133 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.130 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.133 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.133 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.130 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.130 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.130 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.130 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.130 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.133 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 13:27:31.133 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-07 13:27:31.133 INFO: Reducer buckets have been rebuilt in this iteration.
2024-07-07 13:27:31.133 INFO: Reducer buckets have been rebuilt in this iteration.
2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration.
2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration.
2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration.
2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration.
2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration.
2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration.
2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration.
2024-07-07 13:27:31.134 INFO: Reducer buckets have been rebuilt in this iteration.
2024-07-07 13:46:01.884 INFO: Epoch 0: loss=1.5723e-02, MAE_E_per_atom=194.6068 meV, MAE_F=116.0029 meV / A, MAE_stress_per_atom=0.2873 meV / A^3
2024-07-07 13:56:07.965 INFO: Epoch 1: loss=1.3567e-02, MAE_E_per_atom=122.4339 meV, MAE_F=110.5367 meV / A, MAE_stress_per_atom=0.3178 meV / A^3
2024-07-07 14:06:08.067 INFO: Epoch 2: loss=1.1773e-02, MAE_E_per_atom=86.6495 meV, MAE_F=103.0815 meV / A, MAE_stress_per_atom=0.2908 meV / A^3
2024-07-07 14:16:11.257 INFO: Epoch 3: loss=1.0948e-02, MAE_E_per_atom=76.8946 meV, MAE_F=95.6145 meV / A, MAE_stress_per_atom=0.2510 meV / A^3
2024-07-07 14:26:11.898 INFO: Epoch 4: loss=1.0264e-02, MAE_E_per_atom=69.0325 meV, MAE_F=89.4822 meV / A, MAE_stress_per_atom=0.2120 meV / A^3
2024-07-07 14:36:12.161 INFO: Epoch 5: loss=9.8727e-03, MAE_E_per_atom=63.8626 meV, MAE_F=85.0933 meV / A, MAE_stress_per_atom=0.2010 meV / A^3
2024-07-07 14:46:08.605 INFO: Epoch 6: loss=9.6738e-03, MAE_E_per_atom=59.6770 meV, MAE_F=83.1574 meV / A, MAE_stress_per_atom=0.2033 meV / A^3
2024-07-07 14:56:07.295 INFO: Epoch 7: loss=9.2793e-03, MAE_E_per_atom=56.0459 meV, MAE_F=80.3933 meV / A, MAE_stress_per_atom=0.1820 meV / A^3
2024-07-07 15:06:06.694 INFO: Epoch 8: loss=9.1463e-03, MAE_E_per_atom=53.8490 meV, MAE_F=78.7675 meV / A, MAE_stress_per_atom=0.1868 meV / A^3
2024-07-07 15:16:03.969 INFO: Epoch 9: loss=8.8894e-03, MAE_E_per_atom=51.5073 meV, MAE_F=77.1963 meV / A, MAE_stress_per_atom=0.1839 meV / A^3
2024-07-07 15:54:51.472 INFO: Process group initialized: True
2024-07-07 15:54:51.474 INFO: Processes: 80
2024-07-07 15:54:51.474 INFO: MACE version: 0.3.0
2024-07-07 15:54:51.474 INFO: Configuration: Namespace(name='10-128-L1-universal-branch', seed=1, log_dir='logs', model_dir='.', checkpoints_dir='checkpoints', results_dir='results', downloads_dir='downloads', device='cuda', default_dtype='float64', distributed=True, log_level='INFO', error_table='PerAtomMAE', model='ScaleShiftMACE', r_max=6.0, radial_type='bessel', num_radial_basis=10, num_cutoff_basis=5, interaction='RealAgnosticResidualInteractionBlock', interaction_first='RealAgnosticResidualInteractionBlock', max_ell=3, correlation=3, num_interactions=2, MLP_irreps='16x0e', radial_MLP='[64, 64, 64]', hidden_irreps='128x0e + 128x1o', num_channels=128, max_L=1, gate='silu', scaling='rms_forces_scaling', avg_num_neighbors=1, compute_avg_num_neighbors=True, compute_stress=True, compute_forces=True, train_file='../../dataset/mptrj-gga-ggapu-train', valid_file='../../dataset/mptrj-gga-ggapu-val', valid_fraction=0.1, test_file=None, test_dir=None, multi_processed_test=False, num_workers=16, pin_memory=True, atomic_numbers=None, mean=None, std=None, statistics_file='../../dataset/mptrj-gga-ggapu-statistics-isolated.json', E0s=None, energy_key='energy', forces_key='forces', virials_key='virials', stress_key='stress', dipole_key='dipole', charges_key='charges', loss='universal', forces_weight=10.0, swa_forces_weight=100.0, energy_weight=1.0, swa_energy_weight=1000.0, virials_weight=1.0, swa_virials_weight=10.0, stress_weight=100.0, swa_stress_weight=10.0, dipole_weight=1.0, swa_dipole_weight=1.0, config_type_weights='{"Default":1.0}', 
huber_delta=0.01, optimizer='adam', batch_size=16, valid_batch_size=32, lr=0.005, swa_lr=0.001, weight_decay=1e-08, amsgrad=True, scheduler='ReduceLROnPlateau', lr_factor=0.8, scheduler_patience=5, lr_scheduler_gamma=0.9993, swa=False, start_swa=None, ema=True, ema_decay=0.995, max_num_epochs=200, patience=50, eval_interval=1, keep_checkpoints=True, restart_latest=True, save_cpu=True, clip_grad=100.0, wandb=True, wandb_project='mace-universal', wandb_entity='astagroup', wandb_name='10-128-L1-universal-branch', wandb_log_hypers=['num_channels', 'max_L', 'correlation', 'lr', 'swa_lr', 'weight_decay', 'batch_size', 'max_num_epochs', 'start_swa', 'energy_weight', 'forces_weight']) 2024-07-07 15:54:51.474 INFO: CUDA version: 11.8, CUDA device: 0 2024-07-07 15:54:51.475 INFO: Using statistics json file 2024-07-07 15:54:51.475 INFO: Using atomic numbers from statistics file 2024-07-07 15:54:51.475 INFO: AtomicNumberTable: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 89, 90, 91, 92, 93, 94) 2024-07-07 15:54:51.475 INFO: Atomic Energies not in training file, using command line argument E0s 2024-07-07 15:54:51.476 INFO: Atomic energies: [-1.11723232, -0.00045595, -0.29734917, -0.04262353, -0.2911712, -1.26281801, -3.12555634, -1.54690765, -0.43794547, -0.01216023, -0.22858276, -0.00994627, -0.21672837, -0.82583191, -1.88719667, -0.89091719, -0.25828681, -0.0235315, -0.17827125, -0.02596217, -2.12966897, -2.40532262, -3.61232779, -5.44620624, -5.14592659, -3.30583367, -1.66614587, -0.28412403, -0.23745594, -0.01098351, -0.19854295, -0.77924665, -1.70136472, -0.78345919, -0.22687512, -0.02265396, -0.16194042, -0.02823145, -2.25679622, -2.23742918, -2.53481909, -4.60213279, -3.40289704, -1.68884293, 
-1.44016062, -1.47521138, -0.19840574, -0.01374787, -0.19672488, -0.67963499, -1.4302063, -0.6573123, -0.18858477, -0.01267614, -0.13452777, -0.03157029, -0.62794477, -1.43642821, -0.18584352, -0.25876002, -0.25695685, -0.2542358, -9.48606277, -8.11540027, -0.14584339, -0.19488481, -0.14569527, -0.2516406, -0.16381585, -0.25265734, -0.25255978, -3.49292389, -3.5659314, -4.57101127, -4.63436797, -2.88280809, -1.42793567, -0.50244445, -0.18479218, -0.0105212, -0.17939998, -0.63069886, -1.32462383, -0.24210133, -1.04419147, -2.03239022, -4.6443113, -7.30273499, -10.39244586] 2024-07-07 15:55:23.086 INFO: UniversalLoss(energy_weight=1.000, forces_weight=10.000, stress_weight=100.000) 2024-07-07 15:55:23.088 INFO: Average number of neighbors: 61.964672446250916 2024-07-07 15:55:23.088 INFO: Selected the following outputs: {'energy': True, 'forces': True, 'virials': False, 'stress': True, 'dipoles': False} 2024-07-07 15:55:23.088 INFO: Building model 2024-07-07 15:55:23.089 INFO: Hidden irreps: 128x0e+128x1o 2024-07-07 15:55:25.306 WARNING: No SWA checkpoint found, while SWA is enabled. Compare the swa_start parameter and the latest checkpoint. 
2024-07-07 15:55:25.307 INFO: Loading checkpoint: checkpoints/10-128-L1-universal-branch_run-1_epoch-9.pt 2024-07-07 15:55:25.541 INFO: ScaleShiftMACE( (node_embedding): LinearNodeEmbeddingBlock( (linear): Linear(89x0e -> 128x0e | 11392 weights) ) (radial_embedding): RadialEmbeddingBlock( (bessel_fn): BesselBasis(r_max=6.0, num_basis=10, trainable=False) (cutoff_fn): PolynomialCutoff(p=5.0, r_max=6.0) ) (spherical_harmonics): SphericalHarmonics() (atomic_energies_fn): AtomicEnergiesBlock(energies=[-1.1172, -0.0005, -0.2973, -0.0426, -0.2912, -1.2628, -3.1256, -1.5469, -0.4379, -0.0122, -0.2286, -0.0099, -0.2167, -0.8258, -1.8872, -0.8909, -0.2583, -0.0235, -0.1783, -0.0260, -2.1297, -2.4053, -3.6123, -5.4462, -5.1459, -3.3058, -1.6661, -0.2841, -0.2375, -0.0110, -0.1985, -0.7792, -1.7014, -0.7835, -0.2269, -0.0227, -0.1619, -0.0282, -2.2568, -2.2374, -2.5348, -4.6021, -3.4029, -1.6888, -1.4402, -1.4752, -0.1984, -0.0137, -0.1967, -0.6796, -1.4302, -0.6573, -0.1886, -0.0127, -0.1345, -0.0316, -0.6279, -1.4364, -0.1858, -0.2588, -0.2570, -0.2542, -9.4861, -8.1154, -0.1458, -0.1949, -0.1457, -0.2516, -0.1638, -0.2527, -0.2526, -3.4929, -3.5659, -4.5710, -4.6344, -2.8828, -1.4279, -0.5024, -0.1848, -0.0105, -0.1794, -0.6307, -1.3246, -0.2421, -1.0442, -2.0324, -4.6443, -7.3027, -10.3924]) (interactions): ModuleList( (0): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e -> 128x0e | 16384 weights) (conv_tp): TensorProduct(128x0e x 1x0e+1x1o+1x2e+1x3o -> 128x0e+128x1o+128x2e+128x3o | 512 paths | 512 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 512] (linear): Linear(128x0e+128x1o+128x2e+128x3o -> 128x0e+128x1o+128x2e+128x3o | 65536 weights) (skip_tp): FullyConnectedTensorProduct(128x0e x 89x0e -> 128x0e+128x1o | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) (1): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) (conv_tp): TensorProduct(128x0e+128x1o x 
1x0e+1x1o+1x2e+1x3o -> 256x0e+384x1o+384x2e+256x3o | 1280 paths | 1280 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 1280] (linear): Linear(256x0e+384x1o+384x2e+256x3o -> 128x0e+128x1o+128x2e+128x3o | 163840 weights) (skip_tp): FullyConnectedTensorProduct(128x0e+128x1o x 89x0e -> 128x0e | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) ) (products): ModuleList( (0): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) (1): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x6x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) ) (1): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e -> 128x0e | 16384 weights) ) ) (readouts): ModuleList( (0): LinearReadoutBlock( (linear): Linear(128x0e+128x1o -> 1x0e | 128 weights) ) (1): NonLinearReadoutBlock( (linear_1): Linear(128x0e -> 16x0e | 2048 weights) (non_linearity): Activation [x] 
(16x0e -> 16x0e) (linear_2): Linear(16x0e -> 1x0e | 16 weights) ) ) (scale_shift): ScaleShiftBlock(scale=0.804154, shift=0.164097) ) 2024-07-07 15:55:25.547 INFO: Number of parameters: 4688656 2024-07-07 15:55:25.548 INFO: Optimizer: Adam ( Parameter Group 0 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: embedding weight_decay: 0.0 Parameter Group 1 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: interactions_decay weight_decay: 1e-08 Parameter Group 2 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: interactions_no_decay weight_decay: 0.0 Parameter Group 3 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: products weight_decay: 1e-08 Parameter Group 4 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: readouts weight_decay: 0.0 ) 2024-07-07 15:55:25.548 INFO: Using Weights and Biases for logging 2024-07-07 15:55:39.793 INFO: Using gradient clipping with tolerance=100.000 2024-07-07 15:55:39.793 INFO: Started training 2024-07-07 15:55:48.725 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.725 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.725 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.725 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.737 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.738 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.738 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-07 15:55:48.738 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.737 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.738 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.738 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.738 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.737 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.738 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.747 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.747 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.738 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.738 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.750 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.750 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.750 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.750 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.750 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.750 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.751 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.751 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.737 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.751 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.738 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.747 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.754 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-07 15:55:48.754 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.754 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.754 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.747 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.739 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.739 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.750 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.750 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.750 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.750 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.750 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.750 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.751 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.751 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.751 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.747 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.754 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.754 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.754 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.754 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.747 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.750 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.750 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.750 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-07 15:55:48.750 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.750 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.750 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.751 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.751 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.751 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.747 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.754 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.754 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.754 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.754 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.747 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.750 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.750 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.750 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.750 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.750 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.750 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.751 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.751 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.751 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.754 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.754 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 15:55:48.754 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-07 15:55:48.754 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 16:14:12.254 INFO: Epoch 9: loss=8.7583e-03, MAE_E_per_atom=50.1417 meV, MAE_F=76.3792 meV / A, MAE_stress_per_atom=0.1715 meV / A^3 2024-07-07 16:24:17.901 INFO: Epoch 10: loss=8.6025e-03, MAE_E_per_atom=47.8812 meV, MAE_F=75.3058 meV / A, MAE_stress_per_atom=0.1618 meV / A^3 2024-07-07 16:34:19.974 INFO: Epoch 11: loss=1.1066e-02, MAE_E_per_atom=74.6241 meV, MAE_F=96.8333 meV / A, MAE_stress_per_atom=0.2830 meV / A^3 2024-07-07 16:44:20.742 INFO: Epoch 12: loss=8.7407e-03, MAE_E_per_atom=49.9331 meV, MAE_F=76.8427 meV / A, MAE_stress_per_atom=0.1889 meV / A^3 2024-07-07 16:54:23.445 INFO: Epoch 13: loss=8.3937e-03, MAE_E_per_atom=46.5779 meV, MAE_F=74.4076 meV / A, MAE_stress_per_atom=0.1581 meV / A^3 2024-07-07 17:04:26.104 INFO: Epoch 14: loss=8.2400e-03, MAE_E_per_atom=44.1439 meV, MAE_F=73.1388 meV / A, MAE_stress_per_atom=0.1484 meV / A^3 2024-07-07 17:14:29.856 INFO: Epoch 15: loss=8.0854e-03, MAE_E_per_atom=42.4761 meV, MAE_F=72.3004 meV / A, MAE_stress_per_atom=0.1499 meV / A^3 2024-07-07 17:24:35.362 INFO: Epoch 16: loss=8.0754e-03, MAE_E_per_atom=40.4542 meV, MAE_F=72.4723 meV / A, MAE_stress_per_atom=0.1510 meV / A^3 2024-07-07 17:34:37.753 INFO: Epoch 17: loss=7.8177e-03, MAE_E_per_atom=40.0379 meV, MAE_F=70.5780 meV / A, MAE_stress_per_atom=0.1472 meV / A^3 2024-07-07 17:44:39.731 INFO: Epoch 18: loss=7.6936e-03, MAE_E_per_atom=39.0848 meV, MAE_F=69.4633 meV / A, MAE_stress_per_atom=0.1492 meV / A^3 2024-07-07 18:10:53.554 INFO: Process group initialized: True 2024-07-07 18:10:53.556 INFO: Processes: 80 2024-07-07 18:10:53.556 INFO: MACE version: 0.3.0 2024-07-07 18:10:53.556 INFO: Configuration: Namespace(name='10-128-L1-universal-branch', seed=1, log_dir='logs', model_dir='.', checkpoints_dir='checkpoints', results_dir='results', downloads_dir='downloads', device='cuda', default_dtype='float64', distributed=True, log_level='INFO', error_table='PerAtomMAE', 
model='ScaleShiftMACE', r_max=6.0, radial_type='bessel', num_radial_basis=10, num_cutoff_basis=5, interaction='RealAgnosticResidualInteractionBlock', interaction_first='RealAgnosticResidualInteractionBlock', max_ell=3, correlation=3, num_interactions=2, MLP_irreps='16x0e', radial_MLP='[64, 64, 64]', hidden_irreps='128x0e + 128x1o', num_channels=128, max_L=1, gate='silu', scaling='rms_forces_scaling', avg_num_neighbors=1, compute_avg_num_neighbors=True, compute_stress=True, compute_forces=True, train_file='../../dataset/mptrj-gga-ggapu-train', valid_file='../../dataset/mptrj-gga-ggapu-val', valid_fraction=0.1, test_file=None, test_dir=None, multi_processed_test=False, num_workers=16, pin_memory=True, atomic_numbers=None, mean=None, std=None, statistics_file='../../dataset/mptrj-gga-ggapu-statistics-isolated.json', E0s=None, energy_key='energy', forces_key='forces', virials_key='virials', stress_key='stress', dipole_key='dipole', charges_key='charges', loss='universal', forces_weight=10.0, swa_forces_weight=100.0, energy_weight=1.0, swa_energy_weight=1000.0, virials_weight=1.0, swa_virials_weight=10.0, stress_weight=100.0, swa_stress_weight=10.0, dipole_weight=1.0, swa_dipole_weight=1.0, config_type_weights='{"Default":1.0}', huber_delta=0.01, optimizer='adam', batch_size=16, valid_batch_size=32, lr=0.005, swa_lr=0.001, weight_decay=1e-08, amsgrad=True, scheduler='ReduceLROnPlateau', lr_factor=0.8, scheduler_patience=5, lr_scheduler_gamma=0.9993, swa=False, start_swa=None, ema=True, ema_decay=0.995, max_num_epochs=200, patience=50, eval_interval=1, keep_checkpoints=True, restart_latest=True, save_cpu=True, clip_grad=100.0, wandb=True, wandb_project='mace-universal', wandb_entity='astagroup', wandb_name='10-128-L1-universal-branch', wandb_log_hypers=['num_channels', 'max_L', 'correlation', 'lr', 'swa_lr', 'weight_decay', 'batch_size', 'max_num_epochs', 'start_swa', 'energy_weight', 'forces_weight']) 2024-07-07 18:10:53.556 INFO: CUDA version: 11.8, CUDA device: 0 
2024-07-07 18:10:53.556 INFO: Using statistics json file 2024-07-07 18:10:53.556 INFO: Using atomic numbers from statistics file 2024-07-07 18:10:53.557 INFO: AtomicNumberTable: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 89, 90, 91, 92, 93, 94) 2024-07-07 18:10:53.557 INFO: Atomic Energies not in training file, using command line argument E0s 2024-07-07 18:10:53.557 INFO: Atomic energies: [-1.11723232, -0.00045595, -0.29734917, -0.04262353, -0.2911712, -1.26281801, -3.12555634, -1.54690765, -0.43794547, -0.01216023, -0.22858276, -0.00994627, -0.21672837, -0.82583191, -1.88719667, -0.89091719, -0.25828681, -0.0235315, -0.17827125, -0.02596217, -2.12966897, -2.40532262, -3.61232779, -5.44620624, -5.14592659, -3.30583367, -1.66614587, -0.28412403, -0.23745594, -0.01098351, -0.19854295, -0.77924665, -1.70136472, -0.78345919, -0.22687512, -0.02265396, -0.16194042, -0.02823145, -2.25679622, -2.23742918, -2.53481909, -4.60213279, -3.40289704, -1.68884293, -1.44016062, -1.47521138, -0.19840574, -0.01374787, -0.19672488, -0.67963499, -1.4302063, -0.6573123, -0.18858477, -0.01267614, -0.13452777, -0.03157029, -0.62794477, -1.43642821, -0.18584352, -0.25876002, -0.25695685, -0.2542358, -9.48606277, -8.11540027, -0.14584339, -0.19488481, -0.14569527, -0.2516406, -0.16381585, -0.25265734, -0.25255978, -3.49292389, -3.5659314, -4.57101127, -4.63436797, -2.88280809, -1.42793567, -0.50244445, -0.18479218, -0.0105212, -0.17939998, -0.63069886, -1.32462383, -0.24210133, -1.04419147, -2.03239022, -4.6443113, -7.30273499, -10.39244586] 2024-07-07 18:11:25.185 INFO: UniversalLoss(energy_weight=1.000, forces_weight=10.000, stress_weight=100.000) 2024-07-07 18:11:25.187 INFO: Average number of neighbors: 
61.964672446250916 2024-07-07 18:11:25.187 INFO: Selected the following outputs: {'energy': True, 'forces': True, 'virials': False, 'stress': True, 'dipoles': False} 2024-07-07 18:11:25.187 INFO: Building model 2024-07-07 18:11:25.188 INFO: Hidden irreps: 128x0e+128x1o 2024-07-07 18:11:27.385 WARNING: No SWA checkpoint found, while SWA is enabled. Compare the swa_start parameter and the latest checkpoint. 2024-07-07 18:11:27.386 INFO: Loading checkpoint: checkpoints/10-128-L1-universal-branch_run-1_epoch-18.pt 2024-07-07 18:11:27.606 INFO: ScaleShiftMACE( (node_embedding): LinearNodeEmbeddingBlock( (linear): Linear(89x0e -> 128x0e | 11392 weights) ) (radial_embedding): RadialEmbeddingBlock( (bessel_fn): BesselBasis(r_max=6.0, num_basis=10, trainable=False) (cutoff_fn): PolynomialCutoff(p=5.0, r_max=6.0) ) (spherical_harmonics): SphericalHarmonics() (atomic_energies_fn): AtomicEnergiesBlock(energies=[-1.1172, -0.0005, -0.2973, -0.0426, -0.2912, -1.2628, -3.1256, -1.5469, -0.4379, -0.0122, -0.2286, -0.0099, -0.2167, -0.8258, -1.8872, -0.8909, -0.2583, -0.0235, -0.1783, -0.0260, -2.1297, -2.4053, -3.6123, -5.4462, -5.1459, -3.3058, -1.6661, -0.2841, -0.2375, -0.0110, -0.1985, -0.7792, -1.7014, -0.7835, -0.2269, -0.0227, -0.1619, -0.0282, -2.2568, -2.2374, -2.5348, -4.6021, -3.4029, -1.6888, -1.4402, -1.4752, -0.1984, -0.0137, -0.1967, -0.6796, -1.4302, -0.6573, -0.1886, -0.0127, -0.1345, -0.0316, -0.6279, -1.4364, -0.1858, -0.2588, -0.2570, -0.2542, -9.4861, -8.1154, -0.1458, -0.1949, -0.1457, -0.2516, -0.1638, -0.2527, -0.2526, -3.4929, -3.5659, -4.5710, -4.6344, -2.8828, -1.4279, -0.5024, -0.1848, -0.0105, -0.1794, -0.6307, -1.3246, -0.2421, -1.0442, -2.0324, -4.6443, -7.3027, -10.3924]) (interactions): ModuleList( (0): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e -> 128x0e | 16384 weights) (conv_tp): TensorProduct(128x0e x 1x0e+1x1o+1x2e+1x3o -> 128x0e+128x1o+128x2e+128x3o | 512 paths | 512 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 
64, 64, 512] (linear): Linear(128x0e+128x1o+128x2e+128x3o -> 128x0e+128x1o+128x2e+128x3o | 65536 weights) (skip_tp): FullyConnectedTensorProduct(128x0e x 89x0e -> 128x0e+128x1o | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) (1): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) (conv_tp): TensorProduct(128x0e+128x1o x 1x0e+1x1o+1x2e+1x3o -> 256x0e+384x1o+384x2e+256x3o | 1280 paths | 1280 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 1280] (linear): Linear(256x0e+384x1o+384x2e+256x3o -> 128x0e+128x1o+128x2e+128x3o | 163840 weights) (skip_tp): FullyConnectedTensorProduct(128x0e+128x1o x 89x0e -> 128x0e | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) ) (products): ModuleList( (0): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) (1): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x6x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) ) (1): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 
89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e -> 128x0e | 16384 weights) ) ) (readouts): ModuleList( (0): LinearReadoutBlock( (linear): Linear(128x0e+128x1o -> 1x0e | 128 weights) ) (1): NonLinearReadoutBlock( (linear_1): Linear(128x0e -> 16x0e | 2048 weights) (non_linearity): Activation [x] (16x0e -> 16x0e) (linear_2): Linear(16x0e -> 1x0e | 16 weights) ) ) (scale_shift): ScaleShiftBlock(scale=0.804154, shift=0.164097) ) 2024-07-07 18:11:27.612 INFO: Number of parameters: 4688656 2024-07-07 18:11:27.612 INFO: Optimizer: Adam ( Parameter Group 0 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: embedding weight_decay: 0.0 Parameter Group 1 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: interactions_decay weight_decay: 1e-08 Parameter Group 2 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: interactions_no_decay weight_decay: 0.0 Parameter Group 3 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: products weight_decay: 1e-08 Parameter Group 4 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: readouts weight_decay: 0.0 ) 2024-07-07 18:11:27.612 INFO: Using Weights and Biases for logging 2024-07-07 18:12:00.729 INFO: Using gradient clipping with tolerance=100.000 2024-07-07 18:12:00.729 INFO: Started training 2024-07-07 18:12:07.068 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.068 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-07 18:12:07.068 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.068 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.068 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.068 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.068 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.068 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.068 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.068 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.075 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.076 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.076 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.076 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.068 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.068 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.068 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.075 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.076 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.076 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.076 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.068 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.068 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.068 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.075 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.076 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.076 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.076 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.070 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.076 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.076 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.076 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:12:07.076 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 18:29:55.947 INFO: Epoch 18: loss=7.5115e-03, MAE_E_per_atom=38.0862 meV, MAE_F=68.3334 meV / A, MAE_stress_per_atom=0.1483 meV / A^3 2024-07-07 18:39:15.579 INFO: Epoch 19: loss=7.4634e-03, MAE_E_per_atom=36.8681 meV, MAE_F=68.2253 meV / A, MAE_stress_per_atom=0.1524 meV / A^3 2024-07-07 18:48:30.951 INFO: Epoch 20: loss=7.2889e-03, MAE_E_per_atom=36.0528 meV, MAE_F=66.9958 meV / A, MAE_stress_per_atom=0.1500 meV / A^3 2024-07-07 18:57:49.022 INFO: Epoch 21: loss=7.1440e-03, MAE_E_per_atom=35.8874 meV, MAE_F=66.1016 meV / A, MAE_stress_per_atom=0.1404 meV / A^3 2024-07-07 19:07:06.510 INFO: Epoch 22: loss=7.0660e-03, MAE_E_per_atom=34.0148 meV, MAE_F=65.8472 meV / A, MAE_stress_per_atom=0.1463 meV / A^3 2024-07-07 19:16:31.183 INFO: Epoch 23: loss=6.9650e-03, MAE_E_per_atom=33.8223 meV, MAE_F=64.1642 meV / A, MAE_stress_per_atom=0.1574 meV / A^3 2024-07-07 19:25:52.903 INFO: Epoch 24: loss=6.9366e-03, MAE_E_per_atom=33.1759 meV, MAE_F=63.8355 meV / A, MAE_stress_per_atom=0.1552 meV / A^3 2024-07-07 19:35:10.814 INFO: Epoch 25: loss=6.8456e-03, MAE_E_per_atom=32.3447 meV, MAE_F=63.2208 meV / A, MAE_stress_per_atom=0.1431 meV / A^3 2024-07-07 19:44:33.497 INFO: Epoch 26: loss=6.6410e-03, MAE_E_per_atom=31.2068 meV, MAE_F=62.0054 meV / A, MAE_stress_per_atom=0.1411 meV / A^3 2024-07-07 19:53:50.586 INFO: Epoch 27: loss=6.6849e-03, MAE_E_per_atom=30.8018 meV, MAE_F=62.0667 meV / A, MAE_stress_per_atom=0.1331 meV / A^3 2024-07-07 20:03:09.223 INFO: Epoch 28: 
loss=6.6120e-03, MAE_E_per_atom=30.6398 meV, MAE_F=61.1355 meV / A, MAE_stress_per_atom=0.1382 meV / A^3 2024-07-07 20:56:58.057 INFO: Process group initialized: True 2024-07-07 20:56:58.059 INFO: Processes: 80 2024-07-07 20:56:58.059 INFO: MACE version: 0.3.0 2024-07-07 20:56:58.059 INFO: Configuration: Namespace(name='10-128-L1-universal-branch', seed=1, log_dir='logs', model_dir='.', checkpoints_dir='checkpoints', results_dir='results', downloads_dir='downloads', device='cuda', default_dtype='float64', distributed=True, log_level='INFO', error_table='PerAtomMAE', model='ScaleShiftMACE', r_max=6.0, radial_type='bessel', num_radial_basis=10, num_cutoff_basis=5, interaction='RealAgnosticResidualInteractionBlock', interaction_first='RealAgnosticResidualInteractionBlock', max_ell=3, correlation=3, num_interactions=2, MLP_irreps='16x0e', radial_MLP='[64, 64, 64]', hidden_irreps='128x0e + 128x1o', num_channels=128, max_L=1, gate='silu', scaling='rms_forces_scaling', avg_num_neighbors=1, compute_avg_num_neighbors=True, compute_stress=True, compute_forces=True, train_file='../../dataset/mptrj-gga-ggapu-train', valid_file='../../dataset/mptrj-gga-ggapu-val', valid_fraction=0.1, test_file=None, test_dir=None, multi_processed_test=False, num_workers=16, pin_memory=True, atomic_numbers=None, mean=None, std=None, statistics_file='../../dataset/mptrj-gga-ggapu-statistics-isolated.json', E0s=None, energy_key='energy', forces_key='forces', virials_key='virials', stress_key='stress', dipole_key='dipole', charges_key='charges', loss='universal', forces_weight=10.0, swa_forces_weight=100.0, energy_weight=1.0, swa_energy_weight=1000.0, virials_weight=1.0, swa_virials_weight=10.0, stress_weight=100.0, swa_stress_weight=10.0, dipole_weight=1.0, swa_dipole_weight=1.0, config_type_weights='{"Default":1.0}', huber_delta=0.01, optimizer='adam', batch_size=16, valid_batch_size=32, lr=0.005, swa_lr=0.001, weight_decay=1e-08, amsgrad=True, scheduler='ReduceLROnPlateau', lr_factor=0.8, 
scheduler_patience=5, lr_scheduler_gamma=0.9993, swa=False, start_swa=None, ema=True, ema_decay=0.995, max_num_epochs=200, patience=50, eval_interval=1, keep_checkpoints=True, restart_latest=True, save_cpu=True, clip_grad=100.0, wandb=True, wandb_project='mace-universal', wandb_entity='astagroup', wandb_name='10-128-L1-universal-branch', wandb_log_hypers=['num_channels', 'max_L', 'correlation', 'lr', 'swa_lr', 'weight_decay', 'batch_size', 'max_num_epochs', 'start_swa', 'energy_weight', 'forces_weight']) 2024-07-07 20:56:58.059 INFO: CUDA version: 11.8, CUDA device: 0 2024-07-07 20:56:58.059 INFO: Using statistics json file 2024-07-07 20:56:58.059 INFO: Using atomic numbers from statistics file 2024-07-07 20:56:58.060 INFO: AtomicNumberTable: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 89, 90, 91, 92, 93, 94) 2024-07-07 20:56:58.060 INFO: Atomic Energies not in training file, using command line argument E0s 2024-07-07 20:56:58.060 INFO: Atomic energies: [-1.11723232, -0.00045595, -0.29734917, -0.04262353, -0.2911712, -1.26281801, -3.12555634, -1.54690765, -0.43794547, -0.01216023, -0.22858276, -0.00994627, -0.21672837, -0.82583191, -1.88719667, -0.89091719, -0.25828681, -0.0235315, -0.17827125, -0.02596217, -2.12966897, -2.40532262, -3.61232779, -5.44620624, -5.14592659, -3.30583367, -1.66614587, -0.28412403, -0.23745594, -0.01098351, -0.19854295, -0.77924665, -1.70136472, -0.78345919, -0.22687512, -0.02265396, -0.16194042, -0.02823145, -2.25679622, -2.23742918, -2.53481909, -4.60213279, -3.40289704, -1.68884293, -1.44016062, -1.47521138, -0.19840574, -0.01374787, -0.19672488, -0.67963499, -1.4302063, -0.6573123, -0.18858477, -0.01267614, -0.13452777, -0.03157029, -0.62794477, -1.43642821, 
-0.18584352, -0.25876002, -0.25695685, -0.2542358, -9.48606277, -8.11540027, -0.14584339, -0.19488481, -0.14569527, -0.2516406, -0.16381585, -0.25265734, -0.25255978, -3.49292389, -3.5659314, -4.57101127, -4.63436797, -2.88280809, -1.42793567, -0.50244445, -0.18479218, -0.0105212, -0.17939998, -0.63069886, -1.32462383, -0.24210133, -1.04419147, -2.03239022, -4.6443113, -7.30273499, -10.39244586] 2024-07-07 20:57:29.928 INFO: UniversalLoss(energy_weight=1.000, forces_weight=10.000, stress_weight=100.000) 2024-07-07 20:57:29.930 INFO: Average number of neighbors: 61.964672446250916 2024-07-07 20:57:29.930 INFO: Selected the following outputs: {'energy': True, 'forces': True, 'virials': False, 'stress': True, 'dipoles': False} 2024-07-07 20:57:29.930 INFO: Building model 2024-07-07 20:57:29.931 INFO: Hidden irreps: 128x0e+128x1o 2024-07-07 20:57:32.123 WARNING: No SWA checkpoint found, while SWA is enabled. Compare the swa_start parameter and the latest checkpoint. 2024-07-07 20:57:32.124 INFO: Loading checkpoint: checkpoints/10-128-L1-universal-branch_run-1_epoch-28.pt 2024-07-07 20:57:32.348 INFO: ScaleShiftMACE( (node_embedding): LinearNodeEmbeddingBlock( (linear): Linear(89x0e -> 128x0e | 11392 weights) ) (radial_embedding): RadialEmbeddingBlock( (bessel_fn): BesselBasis(r_max=6.0, num_basis=10, trainable=False) (cutoff_fn): PolynomialCutoff(p=5.0, r_max=6.0) ) (spherical_harmonics): SphericalHarmonics() (atomic_energies_fn): AtomicEnergiesBlock(energies=[-1.1172, -0.0005, -0.2973, -0.0426, -0.2912, -1.2628, -3.1256, -1.5469, -0.4379, -0.0122, -0.2286, -0.0099, -0.2167, -0.8258, -1.8872, -0.8909, -0.2583, -0.0235, -0.1783, -0.0260, -2.1297, -2.4053, -3.6123, -5.4462, -5.1459, -3.3058, -1.6661, -0.2841, -0.2375, -0.0110, -0.1985, -0.7792, -1.7014, -0.7835, -0.2269, -0.0227, -0.1619, -0.0282, -2.2568, -2.2374, -2.5348, -4.6021, -3.4029, -1.6888, -1.4402, -1.4752, -0.1984, -0.0137, -0.1967, -0.6796, -1.4302, -0.6573, -0.1886, -0.0127, -0.1345, -0.0316, -0.6279, 
-1.4364, -0.1858, -0.2588, -0.2570, -0.2542, -9.4861, -8.1154, -0.1458, -0.1949, -0.1457, -0.2516, -0.1638, -0.2527, -0.2526, -3.4929, -3.5659, -4.5710, -4.6344, -2.8828, -1.4279, -0.5024, -0.1848, -0.0105, -0.1794, -0.6307, -1.3246, -0.2421, -1.0442, -2.0324, -4.6443, -7.3027, -10.3924]) (interactions): ModuleList( (0): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e -> 128x0e | 16384 weights) (conv_tp): TensorProduct(128x0e x 1x0e+1x1o+1x2e+1x3o -> 128x0e+128x1o+128x2e+128x3o | 512 paths | 512 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 512] (linear): Linear(128x0e+128x1o+128x2e+128x3o -> 128x0e+128x1o+128x2e+128x3o | 65536 weights) (skip_tp): FullyConnectedTensorProduct(128x0e x 89x0e -> 128x0e+128x1o | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) (1): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) (conv_tp): TensorProduct(128x0e+128x1o x 1x0e+1x1o+1x2e+1x3o -> 256x0e+384x1o+384x2e+256x3o | 1280 paths | 1280 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 1280] (linear): Linear(256x0e+384x1o+384x2e+256x3o -> 128x0e+128x1o+128x2e+128x3o | 163840 weights) (skip_tp): FullyConnectedTensorProduct(128x0e+128x1o x 89x0e -> 128x0e | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) ) (products): ModuleList( (0): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) (1): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): 
Parameter containing: [torch.float64 of size 89x6x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) ) (1): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e -> 128x0e | 16384 weights) ) ) (readouts): ModuleList( (0): LinearReadoutBlock( (linear): Linear(128x0e+128x1o -> 1x0e | 128 weights) ) (1): NonLinearReadoutBlock( (linear_1): Linear(128x0e -> 16x0e | 2048 weights) (non_linearity): Activation [x] (16x0e -> 16x0e) (linear_2): Linear(16x0e -> 1x0e | 16 weights) ) ) (scale_shift): ScaleShiftBlock(scale=0.804154, shift=0.164097) ) 2024-07-07 20:57:32.355 INFO: Number of parameters: 4688656 2024-07-07 20:57:32.355 INFO: Optimizer: Adam ( Parameter Group 0 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: embedding weight_decay: 0.0 Parameter Group 1 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: interactions_decay weight_decay: 1e-08 Parameter Group 2 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: interactions_no_decay weight_decay: 0.0 Parameter Group 3 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: products weight_decay: 1e-08 Parameter 
Group 4 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: readouts weight_decay: 0.0 ) 2024-07-07 20:57:32.355 INFO: Using Weights and Biases for logging 2024-07-07 20:57:46.575 INFO: Using gradient clipping with tolerance=100.000 2024-07-07 20:57:46.576 INFO: Started training 2024-07-07 20:57:53.175 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 20:57:53.175 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 20:57:53.175 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 20:57:53.175 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 20:57:53.175 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 20:57:53.176 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 20:57:53.175 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 20:57:53.178 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 20:57:53.179 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 20:57:53.178 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 20:57:53.178 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 20:57:53.176 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 20:57:53.179 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 20:57:53.179 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 20:57:53.179 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 20:57:53.179 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 20:57:53.179 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 20:57:53.179 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-07 20:57:53.179 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-07 21:16:24.133 INFO: Epoch 28: loss=7.3723e-03, MAE_E_per_atom=36.1923 meV, MAE_F=68.4059 meV / A, MAE_stress_per_atom=0.1337 meV / A^3 2024-07-07 21:26:31.647 INFO: Epoch 29: loss=6.7565e-03, MAE_E_per_atom=32.2850 meV, MAE_F=62.1712 meV / A, MAE_stress_per_atom=0.1397 meV / A^3 2024-07-07 21:36:31.694 INFO: Epoch 30: loss=6.7226e-03, MAE_E_per_atom=31.6745 meV, MAE_F=61.3044 meV / A, MAE_stress_per_atom=0.1363 meV / A^3 2024-07-07 21:46:34.595 INFO: Epoch 31: loss=6.6338e-03, MAE_E_per_atom=30.1930 meV, MAE_F=60.6589 meV / A, MAE_stress_per_atom=0.1298 meV / A^3 2024-07-07 21:56:35.186 INFO: Epoch 32: loss=6.5306e-03, MAE_E_per_atom=30.4252 meV, MAE_F=59.7769 meV / A, MAE_stress_per_atom=0.1294 meV / A^3 2024-07-07 22:06:35.759 INFO: Epoch 33: loss=6.4963e-03, MAE_E_per_atom=30.0661 meV, MAE_F=59.2799 meV / A, MAE_stress_per_atom=0.1298 meV / A^3 2024-07-07 22:16:42.206 INFO: Epoch 34: loss=6.5154e-03, MAE_E_per_atom=29.7327 meV, MAE_F=59.0187 meV / A, MAE_stress_per_atom=0.1278 meV / A^3 2024-07-07 22:26:42.716 INFO: Epoch 35: loss=6.4258e-03, MAE_E_per_atom=29.4204 meV, MAE_F=58.6842 meV / A, MAE_stress_per_atom=0.1257 meV / A^3 2024-07-07 22:36:54.837 INFO: Epoch 36: loss=6.5061e-03, MAE_E_per_atom=29.2600 meV, MAE_F=58.5238 meV / A, MAE_stress_per_atom=0.1336 meV / A^3 2024-07-07 22:46:55.305 INFO: Epoch 37: loss=6.4630e-03, MAE_E_per_atom=29.0865 meV, MAE_F=58.4506 meV / A, MAE_stress_per_atom=0.1261 meV / A^3 2024-07-08 00:15:32.436 INFO: Process group initialized: True 2024-07-08 00:15:32.438 INFO: Processes: 80 2024-07-08 00:15:32.439 INFO: MACE version: 0.3.0 2024-07-08 00:15:32.439 INFO: Configuration: Namespace(name='10-128-L1-universal-branch', seed=1, log_dir='logs', model_dir='.', checkpoints_dir='checkpoints', results_dir='results', downloads_dir='downloads', device='cuda', default_dtype='float64', distributed=True, log_level='INFO', error_table='PerAtomMAE', model='ScaleShiftMACE', r_max=6.0, radial_type='bessel', num_radial_basis=10, 
num_cutoff_basis=5, interaction='RealAgnosticResidualInteractionBlock', interaction_first='RealAgnosticResidualInteractionBlock', max_ell=3, correlation=3, num_interactions=2, MLP_irreps='16x0e', radial_MLP='[64, 64, 64]', hidden_irreps='128x0e + 128x1o', num_channels=128, max_L=1, gate='silu', scaling='rms_forces_scaling', avg_num_neighbors=1, compute_avg_num_neighbors=True, compute_stress=True, compute_forces=True, train_file='../../dataset/mptrj-gga-ggapu-train', valid_file='../../dataset/mptrj-gga-ggapu-val', valid_fraction=0.1, test_file=None, test_dir=None, multi_processed_test=False, num_workers=16, pin_memory=True, atomic_numbers=None, mean=None, std=None, statistics_file='../../dataset/mptrj-gga-ggapu-statistics-isolated.json', E0s=None, energy_key='energy', forces_key='forces', virials_key='virials', stress_key='stress', dipole_key='dipole', charges_key='charges', loss='universal', forces_weight=10.0, swa_forces_weight=100.0, energy_weight=1.0, swa_energy_weight=1000.0, virials_weight=1.0, swa_virials_weight=10.0, stress_weight=100.0, swa_stress_weight=10.0, dipole_weight=1.0, swa_dipole_weight=1.0, config_type_weights='{"Default":1.0}', huber_delta=0.01, optimizer='adam', batch_size=16, valid_batch_size=32, lr=0.005, swa_lr=0.001, weight_decay=1e-08, amsgrad=True, scheduler='ReduceLROnPlateau', lr_factor=0.8, scheduler_patience=5, lr_scheduler_gamma=0.9993, swa=False, start_swa=None, ema=True, ema_decay=0.995, max_num_epochs=200, patience=50, eval_interval=1, keep_checkpoints=True, restart_latest=True, save_cpu=True, clip_grad=100.0, wandb=True, wandb_project='mace-universal', wandb_entity='astagroup', wandb_name='10-128-L1-universal-branch', wandb_log_hypers=['num_channels', 'max_L', 'correlation', 'lr', 'swa_lr', 'weight_decay', 'batch_size', 'max_num_epochs', 'start_swa', 'energy_weight', 'forces_weight']) 2024-07-08 00:15:32.439 INFO: CUDA version: 11.8, CUDA device: 0 2024-07-08 00:15:32.440 INFO: Using statistics json file 2024-07-08 00:15:32.440 
INFO: Using atomic numbers from statistics file 2024-07-08 00:15:32.440 INFO: AtomicNumberTable: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 89, 90, 91, 92, 93, 94) 2024-07-08 00:15:32.440 INFO: Atomic Energies not in training file, using command line argument E0s 2024-07-08 00:15:32.441 INFO: Atomic energies: [-1.11723232, -0.00045595, -0.29734917, -0.04262353, -0.2911712, -1.26281801, -3.12555634, -1.54690765, -0.43794547, -0.01216023, -0.22858276, -0.00994627, -0.21672837, -0.82583191, -1.88719667, -0.89091719, -0.25828681, -0.0235315, -0.17827125, -0.02596217, -2.12966897, -2.40532262, -3.61232779, -5.44620624, -5.14592659, -3.30583367, -1.66614587, -0.28412403, -0.23745594, -0.01098351, -0.19854295, -0.77924665, -1.70136472, -0.78345919, -0.22687512, -0.02265396, -0.16194042, -0.02823145, -2.25679622, -2.23742918, -2.53481909, -4.60213279, -3.40289704, -1.68884293, -1.44016062, -1.47521138, -0.19840574, -0.01374787, -0.19672488, -0.67963499, -1.4302063, -0.6573123, -0.18858477, -0.01267614, -0.13452777, -0.03157029, -0.62794477, -1.43642821, -0.18584352, -0.25876002, -0.25695685, -0.2542358, -9.48606277, -8.11540027, -0.14584339, -0.19488481, -0.14569527, -0.2516406, -0.16381585, -0.25265734, -0.25255978, -3.49292389, -3.5659314, -4.57101127, -4.63436797, -2.88280809, -1.42793567, -0.50244445, -0.18479218, -0.0105212, -0.17939998, -0.63069886, -1.32462383, -0.24210133, -1.04419147, -2.03239022, -4.6443113, -7.30273499, -10.39244586] 2024-07-08 00:16:04.480 INFO: UniversalLoss(energy_weight=1.000, forces_weight=10.000, stress_weight=100.000) 2024-07-08 00:16:04.482 INFO: Average number of neighbors: 61.964672446250916 2024-07-08 00:16:04.482 INFO: Selected the following outputs: 
{'energy': True, 'forces': True, 'virials': False, 'stress': True, 'dipoles': False} 2024-07-08 00:16:04.482 INFO: Building model 2024-07-08 00:16:04.483 INFO: Hidden irreps: 128x0e+128x1o 2024-07-08 00:16:06.688 WARNING: No SWA checkpoint found, while SWA is enabled. Compare the swa_start parameter and the latest checkpoint. 2024-07-08 00:16:06.689 INFO: Loading checkpoint: checkpoints/10-128-L1-universal-branch_run-1_epoch-37.pt 2024-07-08 00:16:06.904 INFO: ScaleShiftMACE( (node_embedding): LinearNodeEmbeddingBlock( (linear): Linear(89x0e -> 128x0e | 11392 weights) ) (radial_embedding): RadialEmbeddingBlock( (bessel_fn): BesselBasis(r_max=6.0, num_basis=10, trainable=False) (cutoff_fn): PolynomialCutoff(p=5.0, r_max=6.0) ) (spherical_harmonics): SphericalHarmonics() (atomic_energies_fn): AtomicEnergiesBlock(energies=[-1.1172, -0.0005, -0.2973, -0.0426, -0.2912, -1.2628, -3.1256, -1.5469, -0.4379, -0.0122, -0.2286, -0.0099, -0.2167, -0.8258, -1.8872, -0.8909, -0.2583, -0.0235, -0.1783, -0.0260, -2.1297, -2.4053, -3.6123, -5.4462, -5.1459, -3.3058, -1.6661, -0.2841, -0.2375, -0.0110, -0.1985, -0.7792, -1.7014, -0.7835, -0.2269, -0.0227, -0.1619, -0.0282, -2.2568, -2.2374, -2.5348, -4.6021, -3.4029, -1.6888, -1.4402, -1.4752, -0.1984, -0.0137, -0.1967, -0.6796, -1.4302, -0.6573, -0.1886, -0.0127, -0.1345, -0.0316, -0.6279, -1.4364, -0.1858, -0.2588, -0.2570, -0.2542, -9.4861, -8.1154, -0.1458, -0.1949, -0.1457, -0.2516, -0.1638, -0.2527, -0.2526, -3.4929, -3.5659, -4.5710, -4.6344, -2.8828, -1.4279, -0.5024, -0.1848, -0.0105, -0.1794, -0.6307, -1.3246, -0.2421, -1.0442, -2.0324, -4.6443, -7.3027, -10.3924]) (interactions): ModuleList( (0): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e -> 128x0e | 16384 weights) (conv_tp): TensorProduct(128x0e x 1x0e+1x1o+1x2e+1x3o -> 128x0e+128x1o+128x2e+128x3o | 512 paths | 512 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 512] (linear): Linear(128x0e+128x1o+128x2e+128x3o -> 
128x0e+128x1o+128x2e+128x3o | 65536 weights) (skip_tp): FullyConnectedTensorProduct(128x0e x 89x0e -> 128x0e+128x1o | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) (1): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) (conv_tp): TensorProduct(128x0e+128x1o x 1x0e+1x1o+1x2e+1x3o -> 256x0e+384x1o+384x2e+256x3o | 1280 paths | 1280 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 1280] (linear): Linear(256x0e+384x1o+384x2e+256x3o -> 128x0e+128x1o+128x2e+128x3o | 163840 weights) (skip_tp): FullyConnectedTensorProduct(128x0e+128x1o x 89x0e -> 128x0e | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) ) (products): ModuleList( (0): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) (1): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x6x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) ) (1): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of 
size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e -> 128x0e | 16384 weights) ) ) (readouts): ModuleList( (0): LinearReadoutBlock( (linear): Linear(128x0e+128x1o -> 1x0e | 128 weights) ) (1): NonLinearReadoutBlock( (linear_1): Linear(128x0e -> 16x0e | 2048 weights) (non_linearity): Activation [x] (16x0e -> 16x0e) (linear_2): Linear(16x0e -> 1x0e | 16 weights) ) ) (scale_shift): ScaleShiftBlock(scale=0.804154, shift=0.164097) ) 2024-07-08 00:16:06.910 INFO: Number of parameters: 4688656 2024-07-08 00:16:06.910 INFO: Optimizer: Adam ( Parameter Group 0 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: embedding weight_decay: 0.0 Parameter Group 1 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: interactions_decay weight_decay: 1e-08 Parameter Group 2 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: interactions_no_decay weight_decay: 0.0 Parameter Group 3 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: products weight_decay: 1e-08 Parameter Group 4 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: readouts weight_decay: 0.0 ) 2024-07-08 00:16:06.910 INFO: Using Weights and Biases for logging 2024-07-08 16:23:19.141 INFO: Process group initialized: True 2024-07-08 16:23:19.143 INFO: Processes: 80 2024-07-08 16:23:19.143 INFO: MACE version: 0.3.0 2024-07-08 16:23:19.143 INFO: Configuration: Namespace(name='10-128-L1-universal-branch', seed=1, log_dir='logs', model_dir='.', checkpoints_dir='checkpoints', results_dir='results', downloads_dir='downloads', device='cuda', 
default_dtype='float64', distributed=True, log_level='INFO', error_table='PerAtomMAE', model='ScaleShiftMACE', r_max=6.0, radial_type='bessel', num_radial_basis=10, num_cutoff_basis=5, interaction='RealAgnosticResidualInteractionBlock', interaction_first='RealAgnosticResidualInteractionBlock', max_ell=3, correlation=3, num_interactions=2, MLP_irreps='16x0e', radial_MLP='[64, 64, 64]', hidden_irreps='128x0e + 128x1o', num_channels=128, max_L=1, gate='silu', scaling='rms_forces_scaling', avg_num_neighbors=1, compute_avg_num_neighbors=True, compute_stress=True, compute_forces=True, train_file='../../dataset/mptrj-gga-ggapu-train', valid_file='../../dataset/mptrj-gga-ggapu-val', valid_fraction=0.1, test_file=None, test_dir=None, multi_processed_test=False, num_workers=16, pin_memory=True, atomic_numbers=None, mean=None, std=None, statistics_file='../../dataset/mptrj-gga-ggapu-statistics-isolated.json', E0s=None, energy_key='energy', forces_key='forces', virials_key='virials', stress_key='stress', dipole_key='dipole', charges_key='charges', loss='universal', forces_weight=10.0, swa_forces_weight=100.0, energy_weight=1.0, swa_energy_weight=1000.0, virials_weight=1.0, swa_virials_weight=10.0, stress_weight=100.0, swa_stress_weight=10.0, dipole_weight=1.0, swa_dipole_weight=1.0, config_type_weights='{"Default":1.0}', huber_delta=0.01, optimizer='adam', batch_size=16, valid_batch_size=32, lr=0.005, swa_lr=0.001, weight_decay=1e-08, amsgrad=True, scheduler='ReduceLROnPlateau', lr_factor=0.8, scheduler_patience=5, lr_scheduler_gamma=0.9993, swa=False, start_swa=None, ema=True, ema_decay=0.995, max_num_epochs=200, patience=50, eval_interval=1, keep_checkpoints=True, restart_latest=True, save_cpu=True, clip_grad=100.0, wandb=True, wandb_project='mace-universal', wandb_entity='astagroup', wandb_name='10-128-L1-universal-branch', wandb_log_hypers=['num_channels', 'max_L', 'correlation', 'lr', 'swa_lr', 'weight_decay', 'batch_size', 'max_num_epochs', 'start_swa', 'energy_weight', 
'forces_weight']) 2024-07-08 16:23:19.143 INFO: CUDA version: 11.8, CUDA device: 0 2024-07-08 16:23:19.143 INFO: Using statistics json file 2024-07-08 16:23:19.143 INFO: Using atomic numbers from statistics file 2024-07-08 16:23:19.144 INFO: AtomicNumberTable: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 89, 90, 91, 92, 93, 94) 2024-07-08 16:23:19.144 INFO: Atomic Energies not in training file, using command line argument E0s 2024-07-08 16:23:19.144 INFO: Atomic energies: [-1.11723232, -0.00045595, -0.29734917, -0.04262353, -0.2911712, -1.26281801, -3.12555634, -1.54690765, -0.43794547, -0.01216023, -0.22858276, -0.00994627, -0.21672837, -0.82583191, -1.88719667, -0.89091719, -0.25828681, -0.0235315, -0.17827125, -0.02596217, -2.12966897, -2.40532262, -3.61232779, -5.44620624, -5.14592659, -3.30583367, -1.66614587, -0.28412403, -0.23745594, -0.01098351, -0.19854295, -0.77924665, -1.70136472, -0.78345919, -0.22687512, -0.02265396, -0.16194042, -0.02823145, -2.25679622, -2.23742918, -2.53481909, -4.60213279, -3.40289704, -1.68884293, -1.44016062, -1.47521138, -0.19840574, -0.01374787, -0.19672488, -0.67963499, -1.4302063, -0.6573123, -0.18858477, -0.01267614, -0.13452777, -0.03157029, -0.62794477, -1.43642821, -0.18584352, -0.25876002, -0.25695685, -0.2542358, -9.48606277, -8.11540027, -0.14584339, -0.19488481, -0.14569527, -0.2516406, -0.16381585, -0.25265734, -0.25255978, -3.49292389, -3.5659314, -4.57101127, -4.63436797, -2.88280809, -1.42793567, -0.50244445, -0.18479218, -0.0105212, -0.17939998, -0.63069886, -1.32462383, -0.24210133, -1.04419147, -2.03239022, -4.6443113, -7.30273499, -10.39244586] 2024-07-08 16:23:53.927 INFO: UniversalLoss(energy_weight=1.000, forces_weight=10.000, 
stress_weight=100.000) 2024-07-08 16:23:53.930 INFO: Average number of neighbors: 61.964672446250916 2024-07-08 16:23:53.930 INFO: Selected the following outputs: {'energy': True, 'forces': True, 'virials': False, 'stress': True, 'dipoles': False} 2024-07-08 16:23:53.930 INFO: Building model 2024-07-08 16:23:53.931 INFO: Hidden irreps: 128x0e+128x1o 2024-07-08 16:23:56.197 WARNING: No SWA checkpoint found, while SWA is enabled. Compare the swa_start parameter and the latest checkpoint. 2024-07-08 16:23:56.198 INFO: Loading checkpoint: checkpoints/10-128-L1-universal-branch_run-1_epoch-37.pt 2024-07-08 16:23:56.426 INFO: ScaleShiftMACE( (node_embedding): LinearNodeEmbeddingBlock( (linear): Linear(89x0e -> 128x0e | 11392 weights) ) (radial_embedding): RadialEmbeddingBlock( (bessel_fn): BesselBasis(r_max=6.0, num_basis=10, trainable=False) (cutoff_fn): PolynomialCutoff(p=5.0, r_max=6.0) ) (spherical_harmonics): SphericalHarmonics() (atomic_energies_fn): AtomicEnergiesBlock(energies=[-1.1172, -0.0005, -0.2973, -0.0426, -0.2912, -1.2628, -3.1256, -1.5469, -0.4379, -0.0122, -0.2286, -0.0099, -0.2167, -0.8258, -1.8872, -0.8909, -0.2583, -0.0235, -0.1783, -0.0260, -2.1297, -2.4053, -3.6123, -5.4462, -5.1459, -3.3058, -1.6661, -0.2841, -0.2375, -0.0110, -0.1985, -0.7792, -1.7014, -0.7835, -0.2269, -0.0227, -0.1619, -0.0282, -2.2568, -2.2374, -2.5348, -4.6021, -3.4029, -1.6888, -1.4402, -1.4752, -0.1984, -0.0137, -0.1967, -0.6796, -1.4302, -0.6573, -0.1886, -0.0127, -0.1345, -0.0316, -0.6279, -1.4364, -0.1858, -0.2588, -0.2570, -0.2542, -9.4861, -8.1154, -0.1458, -0.1949, -0.1457, -0.2516, -0.1638, -0.2527, -0.2526, -3.4929, -3.5659, -4.5710, -4.6344, -2.8828, -1.4279, -0.5024, -0.1848, -0.0105, -0.1794, -0.6307, -1.3246, -0.2421, -1.0442, -2.0324, -4.6443, -7.3027, -10.3924]) (interactions): ModuleList( (0): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e -> 128x0e | 16384 weights) (conv_tp): TensorProduct(128x0e x 1x0e+1x1o+1x2e+1x3o -> 
128x0e+128x1o+128x2e+128x3o | 512 paths | 512 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 512] (linear): Linear(128x0e+128x1o+128x2e+128x3o -> 128x0e+128x1o+128x2e+128x3o | 65536 weights) (skip_tp): FullyConnectedTensorProduct(128x0e x 89x0e -> 128x0e+128x1o | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) (1): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) (conv_tp): TensorProduct(128x0e+128x1o x 1x0e+1x1o+1x2e+1x3o -> 256x0e+384x1o+384x2e+256x3o | 1280 paths | 1280 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 1280] (linear): Linear(256x0e+384x1o+384x2e+256x3o -> 128x0e+128x1o+128x2e+128x3o | 163840 weights) (skip_tp): FullyConnectedTensorProduct(128x0e+128x1o x 89x0e -> 128x0e | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) ) (products): ModuleList( (0): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) (1): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x6x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) ) (1): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 
2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e -> 128x0e | 16384 weights) ) ) (readouts): ModuleList( (0): LinearReadoutBlock( (linear): Linear(128x0e+128x1o -> 1x0e | 128 weights) ) (1): NonLinearReadoutBlock( (linear_1): Linear(128x0e -> 16x0e | 2048 weights) (non_linearity): Activation [x] (16x0e -> 16x0e) (linear_2): Linear(16x0e -> 1x0e | 16 weights) ) ) (scale_shift): ScaleShiftBlock(scale=0.804154, shift=0.164097) ) 2024-07-08 16:23:56.432 INFO: Number of parameters: 4688656 2024-07-08 16:23:56.432 INFO: Optimizer: Adam ( Parameter Group 0 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: embedding weight_decay: 0.0 Parameter Group 1 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: interactions_decay weight_decay: 1e-08 Parameter Group 2 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: interactions_no_decay weight_decay: 0.0 Parameter Group 3 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: products weight_decay: 1e-08 Parameter Group 4 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.005 maximize: False name: readouts weight_decay: 0.0 ) 2024-07-08 16:23:56.432 INFO: Using Weights and Biases for logging 2024-07-08 16:24:18.169 INFO: Using gradient clipping with tolerance=100.000 2024-07-08 16:24:18.170 INFO: Started training 2024-07-08 16:24:27.943 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-08 16:24:28.080 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 16:24:28.080 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 16:24:28.079 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 16:24:28.079 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 16:24:28.080 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 16:24:28.081 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 16:24:28.081 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 16:42:45.328 INFO: Epoch 37: loss=6.4009e-03, MAE_E_per_atom=28.8778 meV, MAE_F=57.8408 meV / A, MAE_stress_per_atom=0.1246 meV / A^3 2024-07-08 16:52:59.530 INFO: Epoch 38: loss=6.3536e-03, MAE_E_per_atom=28.9818 meV, MAE_F=57.1367 meV / A, MAE_stress_per_atom=0.1264 meV / A^3 2024-07-08 17:02:53.982 INFO: Epoch 39: loss=6.2680e-03, MAE_E_per_atom=29.1768 meV, MAE_F=56.8391 meV / A, MAE_stress_per_atom=0.1283 meV / A^3 2024-07-08 17:12:50.606 INFO: Epoch 40: loss=7.9153e-03, MAE_E_per_atom=41.2159 meV, MAE_F=69.7881 meV / A, MAE_stress_per_atom=0.2138 meV / A^3 2024-07-08 17:22:44.235 INFO: Epoch 41: loss=6.9080e-03, MAE_E_per_atom=31.8114 meV, MAE_F=61.4739 meV / A, MAE_stress_per_atom=0.1755 meV / A^3 2024-07-08 17:32:41.584 INFO: Epoch 42: loss=6.5549e-03, MAE_E_per_atom=30.7544 meV, MAE_F=58.8363 meV / A, MAE_stress_per_atom=0.1331 meV / A^3 2024-07-08 17:42:36.997 INFO: Epoch 43: loss=6.4541e-03, MAE_E_per_atom=30.1189 meV, MAE_F=57.3367 meV / A, MAE_stress_per_atom=0.1309 meV / A^3 2024-07-08 17:52:29.826 INFO: Epoch 44: loss=6.3518e-03, MAE_E_per_atom=29.8075 meV, MAE_F=56.5746 meV / A, MAE_stress_per_atom=0.1350 meV / A^3 2024-07-08 18:02:25.044 INFO: Epoch 45: loss=6.3214e-03, MAE_E_per_atom=29.4061 meV, MAE_F=56.4764 meV / A, MAE_stress_per_atom=0.1388 meV / A^3 2024-07-08 18:12:21.314 INFO: Epoch 46: loss=6.3104e-03, MAE_E_per_atom=29.1116 meV, MAE_F=55.8126 meV / A, 
MAE_stress_per_atom=0.1435 meV / A^3 2024-07-08 18:24:50.758 INFO: Process group initialized: True 2024-07-08 18:24:50.759 INFO: Processes: 80 2024-07-08 18:24:50.759 INFO: MACE version: 0.3.0 2024-07-08 18:24:50.760 INFO: Configuration: Namespace(name='10-128-L1-universal-branch', seed=1, log_dir='logs', model_dir='.', checkpoints_dir='checkpoints', results_dir='results', downloads_dir='downloads', device='cuda', default_dtype='float64', distributed=True, log_level='INFO', error_table='PerAtomMAE', model='ScaleShiftMACE', r_max=6.0, radial_type='bessel', num_radial_basis=10, num_cutoff_basis=5, interaction='RealAgnosticResidualInteractionBlock', interaction_first='RealAgnosticResidualInteractionBlock', max_ell=3, correlation=3, num_interactions=2, MLP_irreps='16x0e', radial_MLP='[64, 64, 64]', hidden_irreps='128x0e + 128x1o', num_channels=128, max_L=1, gate='silu', scaling='rms_forces_scaling', avg_num_neighbors=1, compute_avg_num_neighbors=True, compute_stress=True, compute_forces=True, train_file='../../dataset/mptrj-gga-ggapu-train', valid_file='../../dataset/mptrj-gga-ggapu-val', valid_fraction=0.1, test_file=None, test_dir=None, multi_processed_test=False, num_workers=16, pin_memory=True, atomic_numbers=None, mean=None, std=None, statistics_file='../../dataset/mptrj-gga-ggapu-statistics-isolated.json', E0s=None, energy_key='energy', forces_key='forces', virials_key='virials', stress_key='stress', dipole_key='dipole', charges_key='charges', loss='universal', forces_weight=10.0, swa_forces_weight=100.0, energy_weight=1.0, swa_energy_weight=1000.0, virials_weight=1.0, swa_virials_weight=10.0, stress_weight=100.0, swa_stress_weight=10.0, dipole_weight=1.0, swa_dipole_weight=1.0, config_type_weights='{"Default":1.0}', huber_delta=0.01, optimizer='adam', batch_size=16, valid_batch_size=32, lr=0.005, swa_lr=0.001, weight_decay=1e-08, amsgrad=True, scheduler='ReduceLROnPlateau', lr_factor=0.8, scheduler_patience=5, lr_scheduler_gamma=0.9993, swa=False, 
start_swa=None, ema=True, ema_decay=0.995, max_num_epochs=200, patience=50, eval_interval=1, keep_checkpoints=True, restart_latest=True, save_cpu=True, clip_grad=100.0, wandb=True, wandb_project='mace-universal', wandb_entity='astagroup', wandb_name='10-128-L1-universal-branch', wandb_log_hypers=['num_channels', 'max_L', 'correlation', 'lr', 'swa_lr', 'weight_decay', 'batch_size', 'max_num_epochs', 'start_swa', 'energy_weight', 'forces_weight']) 2024-07-08 18:24:50.760 INFO: CUDA version: 11.8, CUDA device: 0 2024-07-08 18:24:50.761 INFO: Using statistics json file 2024-07-08 18:24:50.761 INFO: Using atomic numbers from statistics file 2024-07-08 18:24:50.761 INFO: AtomicNumberTable: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 89, 90, 91, 92, 93, 94) 2024-07-08 18:24:50.761 INFO: Atomic Energies not in training file, using command line argument E0s 2024-07-08 18:24:50.762 INFO: Atomic energies: [-1.11723232, -0.00045595, -0.29734917, -0.04262353, -0.2911712, -1.26281801, -3.12555634, -1.54690765, -0.43794547, -0.01216023, -0.22858276, -0.00994627, -0.21672837, -0.82583191, -1.88719667, -0.89091719, -0.25828681, -0.0235315, -0.17827125, -0.02596217, -2.12966897, -2.40532262, -3.61232779, -5.44620624, -5.14592659, -3.30583367, -1.66614587, -0.28412403, -0.23745594, -0.01098351, -0.19854295, -0.77924665, -1.70136472, -0.78345919, -0.22687512, -0.02265396, -0.16194042, -0.02823145, -2.25679622, -2.23742918, -2.53481909, -4.60213279, -3.40289704, -1.68884293, -1.44016062, -1.47521138, -0.19840574, -0.01374787, -0.19672488, -0.67963499, -1.4302063, -0.6573123, -0.18858477, -0.01267614, -0.13452777, -0.03157029, -0.62794477, -1.43642821, -0.18584352, -0.25876002, -0.25695685, -0.2542358, 
-9.48606277, -8.11540027, -0.14584339, -0.19488481, -0.14569527, -0.2516406, -0.16381585, -0.25265734, -0.25255978, -3.49292389, -3.5659314, -4.57101127, -4.63436797, -2.88280809, -1.42793567, -0.50244445, -0.18479218, -0.0105212, -0.17939998, -0.63069886, -1.32462383, -0.24210133, -1.04419147, -2.03239022, -4.6443113, -7.30273499, -10.39244586] 2024-07-08 18:25:24.495 INFO: UniversalLoss(energy_weight=1.000, forces_weight=10.000, stress_weight=100.000) 2024-07-08 18:25:24.497 INFO: Average number of neighbors: 61.964672446250916 2024-07-08 18:25:24.497 INFO: Selected the following outputs: {'energy': True, 'forces': True, 'virials': False, 'stress': True, 'dipoles': False} 2024-07-08 18:25:24.497 INFO: Building model 2024-07-08 18:25:24.498 INFO: Hidden irreps: 128x0e+128x1o 2024-07-08 18:25:26.696 WARNING: No SWA checkpoint found, while SWA is enabled. Compare the swa_start parameter and the latest checkpoint. 2024-07-08 18:25:26.698 INFO: Loading checkpoint: checkpoints/10-128-L1-universal-branch_run-1_epoch-46.pt 2024-07-08 18:25:26.929 INFO: ScaleShiftMACE( (node_embedding): LinearNodeEmbeddingBlock( (linear): Linear(89x0e -> 128x0e | 11392 weights) ) (radial_embedding): RadialEmbeddingBlock( (bessel_fn): BesselBasis(r_max=6.0, num_basis=10, trainable=False) (cutoff_fn): PolynomialCutoff(p=5.0, r_max=6.0) ) (spherical_harmonics): SphericalHarmonics() (atomic_energies_fn): AtomicEnergiesBlock(energies=[-1.1172, -0.0005, -0.2973, -0.0426, -0.2912, -1.2628, -3.1256, -1.5469, -0.4379, -0.0122, -0.2286, -0.0099, -0.2167, -0.8258, -1.8872, -0.8909, -0.2583, -0.0235, -0.1783, -0.0260, -2.1297, -2.4053, -3.6123, -5.4462, -5.1459, -3.3058, -1.6661, -0.2841, -0.2375, -0.0110, -0.1985, -0.7792, -1.7014, -0.7835, -0.2269, -0.0227, -0.1619, -0.0282, -2.2568, -2.2374, -2.5348, -4.6021, -3.4029, -1.6888, -1.4402, -1.4752, -0.1984, -0.0137, -0.1967, -0.6796, -1.4302, -0.6573, -0.1886, -0.0127, -0.1345, -0.0316, -0.6279, -1.4364, -0.1858, -0.2588, -0.2570, -0.2542, -9.4861, 
-8.1154, -0.1458, -0.1949, -0.1457, -0.2516, -0.1638, -0.2527, -0.2526, -3.4929, -3.5659, -4.5710, -4.6344, -2.8828, -1.4279, -0.5024, -0.1848, -0.0105, -0.1794, -0.6307, -1.3246, -0.2421, -1.0442, -2.0324, -4.6443, -7.3027, -10.3924]) (interactions): ModuleList( (0): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e -> 128x0e | 16384 weights) (conv_tp): TensorProduct(128x0e x 1x0e+1x1o+1x2e+1x3o -> 128x0e+128x1o+128x2e+128x3o | 512 paths | 512 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 512] (linear): Linear(128x0e+128x1o+128x2e+128x3o -> 128x0e+128x1o+128x2e+128x3o | 65536 weights) (skip_tp): FullyConnectedTensorProduct(128x0e x 89x0e -> 128x0e+128x1o | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) (1): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) (conv_tp): TensorProduct(128x0e+128x1o x 1x0e+1x1o+1x2e+1x3o -> 256x0e+384x1o+384x2e+256x3o | 1280 paths | 1280 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 1280] (linear): Linear(256x0e+384x1o+384x2e+256x3o -> 128x0e+128x1o+128x2e+128x3o | 163840 weights) (skip_tp): FullyConnectedTensorProduct(128x0e+128x1o x 89x0e -> 128x0e | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) ) (products): ModuleList( (0): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) (1): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x6x128 (GPU 
0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) ) (1): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e -> 128x0e | 16384 weights) ) ) (readouts): ModuleList( (0): LinearReadoutBlock( (linear): Linear(128x0e+128x1o -> 1x0e | 128 weights) ) (1): NonLinearReadoutBlock( (linear_1): Linear(128x0e -> 16x0e | 2048 weights) (non_linearity): Activation [x] (16x0e -> 16x0e) (linear_2): Linear(16x0e -> 1x0e | 16 weights) ) ) (scale_shift): ScaleShiftBlock(scale=0.804154, shift=0.164097) ) 2024-07-08 18:25:26.935 INFO: Number of parameters: 4688656 2024-07-08 18:25:26.935 INFO: Optimizer: Adam ( Parameter Group 0 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: embedding weight_decay: 0.0 Parameter Group 1 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: interactions_decay weight_decay: 1e-08 Parameter Group 2 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: interactions_no_decay weight_decay: 0.0 Parameter Group 3 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: products weight_decay: 1e-08 Parameter Group 4 amsgrad: True betas: (0.9, 0.999) capturable: 
False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: readouts weight_decay: 0.0 ) 2024-07-08 18:25:26.935 INFO: Using Weights and Biases for logging 2024-07-08 18:25:45.430 INFO: Using gradient clipping with tolerance=100.000 2024-07-08 18:25:45.431 INFO: Started training 2024-07-08 18:25:51.972 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.972 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.972 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.972 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.977 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.977 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.977 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.977 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.977 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.977 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.977 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.988 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.988 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.988 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.988 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.988 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.989 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.989 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.989 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.989 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-08 18:25:51.988 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.989 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.989 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.989 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.989 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.989 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.989 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.994 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.994 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.994 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.994 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:25:51.995 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 18:44:05.985 INFO: Epoch 46: loss=6.2869e-03, MAE_E_per_atom=28.8083 meV, MAE_F=55.6480 meV / A, MAE_stress_per_atom=0.1450 meV / A^3 2024-07-08 18:54:15.790 INFO: Epoch 47: loss=6.3039e-03, MAE_E_per_atom=28.6102 meV, MAE_F=55.2902 meV / A, MAE_stress_per_atom=0.1382 meV / A^3 2024-07-08 19:04:16.936 INFO: Epoch 48: loss=6.2872e-03, MAE_E_per_atom=28.4554 meV, MAE_F=55.1959 meV / A, MAE_stress_per_atom=0.1374 meV / A^3 2024-07-08 19:14:16.847 INFO: Epoch 49: loss=6.3432e-03, MAE_E_per_atom=28.7872 meV, MAE_F=55.2102 meV / A, MAE_stress_per_atom=0.1375 meV / A^3 2024-07-08 19:24:17.576 INFO: Epoch 50: loss=6.2843e-03, MAE_E_per_atom=28.4914 meV, MAE_F=54.9057 meV / A, MAE_stress_per_atom=0.1400 meV / A^3 2024-07-08 19:34:20.840 INFO: Epoch 51: loss=6.2672e-03, MAE_E_per_atom=28.6121 meV, MAE_F=54.6035 meV / A, MAE_stress_per_atom=0.1354 meV / A^3 2024-07-08 19:44:22.400 INFO: Epoch 52: loss=6.2560e-03, MAE_E_per_atom=28.3262 meV, MAE_F=54.3985 meV / A, 
MAE_stress_per_atom=0.1370 meV / A^3 2024-07-08 19:54:24.449 INFO: Epoch 53: loss=6.2749e-03, MAE_E_per_atom=28.2907 meV, MAE_F=54.3149 meV / A, MAE_stress_per_atom=0.1365 meV / A^3 2024-07-08 20:04:23.642 INFO: Epoch 54: loss=6.2536e-03, MAE_E_per_atom=28.0564 meV, MAE_F=54.3253 meV / A, MAE_stress_per_atom=0.1378 meV / A^3 2024-07-08 20:14:22.527 INFO: Epoch 55: loss=6.2361e-03, MAE_E_per_atom=28.4002 meV, MAE_F=54.2933 meV / A, MAE_stress_per_atom=0.1332 meV / A^3 2024-07-08 21:31:24.533 INFO: Process group initialized: True 2024-07-08 21:31:24.535 INFO: Processes: 80 2024-07-08 21:31:24.535 INFO: MACE version: 0.3.0 2024-07-08 21:31:24.535 INFO: Configuration: Namespace(name='10-128-L1-universal-branch', seed=1, log_dir='logs', model_dir='.', checkpoints_dir='checkpoints', results_dir='results', downloads_dir='downloads', device='cuda', default_dtype='float64', distributed=True, log_level='INFO', error_table='PerAtomMAE', model='ScaleShiftMACE', r_max=6.0, radial_type='bessel', num_radial_basis=10, num_cutoff_basis=5, interaction='RealAgnosticResidualInteractionBlock', interaction_first='RealAgnosticResidualInteractionBlock', max_ell=3, correlation=3, num_interactions=2, MLP_irreps='16x0e', radial_MLP='[64, 64, 64]', hidden_irreps='128x0e + 128x1o', num_channels=128, max_L=1, gate='silu', scaling='rms_forces_scaling', avg_num_neighbors=1, compute_avg_num_neighbors=True, compute_stress=True, compute_forces=True, train_file='../../dataset/mptrj-gga-ggapu-train', valid_file='../../dataset/mptrj-gga-ggapu-val', valid_fraction=0.1, test_file=None, test_dir=None, multi_processed_test=False, num_workers=16, pin_memory=True, atomic_numbers=None, mean=None, std=None, statistics_file='../../dataset/mptrj-gga-ggapu-statistics-isolated.json', E0s=None, energy_key='energy', forces_key='forces', virials_key='virials', stress_key='stress', dipole_key='dipole', charges_key='charges', loss='universal', forces_weight=10.0, swa_forces_weight=100.0, energy_weight=1.0, 
swa_energy_weight=1000.0, virials_weight=1.0, swa_virials_weight=10.0, stress_weight=100.0, swa_stress_weight=10.0, dipole_weight=1.0, swa_dipole_weight=1.0, config_type_weights='{"Default":1.0}', huber_delta=0.01, optimizer='adam', batch_size=16, valid_batch_size=32, lr=0.005, swa_lr=0.001, weight_decay=1e-08, amsgrad=True, scheduler='ReduceLROnPlateau', lr_factor=0.8, scheduler_patience=5, lr_scheduler_gamma=0.9993, swa=False, start_swa=None, ema=True, ema_decay=0.995, max_num_epochs=200, patience=50, eval_interval=1, keep_checkpoints=True, restart_latest=True, save_cpu=True, clip_grad=100.0, wandb=True, wandb_project='mace-universal', wandb_entity='astagroup', wandb_name='10-128-L1-universal-branch', wandb_log_hypers=['num_channels', 'max_L', 'correlation', 'lr', 'swa_lr', 'weight_decay', 'batch_size', 'max_num_epochs', 'start_swa', 'energy_weight', 'forces_weight']) 2024-07-08 21:31:24.535 INFO: CUDA version: 11.8, CUDA device: 0 2024-07-08 21:31:24.535 INFO: Using statistics json file 2024-07-08 21:31:24.536 INFO: Using atomic numbers from statistics file 2024-07-08 21:31:24.536 INFO: AtomicNumberTable: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 89, 90, 91, 92, 93, 94) 2024-07-08 21:31:24.536 INFO: Atomic Energies not in training file, using command line argument E0s 2024-07-08 21:31:24.536 INFO: Atomic energies: [-1.11723232, -0.00045595, -0.29734917, -0.04262353, -0.2911712, -1.26281801, -3.12555634, -1.54690765, -0.43794547, -0.01216023, -0.22858276, -0.00994627, -0.21672837, -0.82583191, -1.88719667, -0.89091719, -0.25828681, -0.0235315, -0.17827125, -0.02596217, -2.12966897, -2.40532262, -3.61232779, -5.44620624, -5.14592659, -3.30583367, -1.66614587, -0.28412403, -0.23745594, 
-0.01098351, -0.19854295, -0.77924665, -1.70136472, -0.78345919, -0.22687512, -0.02265396, -0.16194042, -0.02823145, -2.25679622, -2.23742918, -2.53481909, -4.60213279, -3.40289704, -1.68884293, -1.44016062, -1.47521138, -0.19840574, -0.01374787, -0.19672488, -0.67963499, -1.4302063, -0.6573123, -0.18858477, -0.01267614, -0.13452777, -0.03157029, -0.62794477, -1.43642821, -0.18584352, -0.25876002, -0.25695685, -0.2542358, -9.48606277, -8.11540027, -0.14584339, -0.19488481, -0.14569527, -0.2516406, -0.16381585, -0.25265734, -0.25255978, -3.49292389, -3.5659314, -4.57101127, -4.63436797, -2.88280809, -1.42793567, -0.50244445, -0.18479218, -0.0105212, -0.17939998, -0.63069886, -1.32462383, -0.24210133, -1.04419147, -2.03239022, -4.6443113, -7.30273499, -10.39244586] 2024-07-08 21:31:57.039 INFO: UniversalLoss(energy_weight=1.000, forces_weight=10.000, stress_weight=100.000) 2024-07-08 21:31:57.041 INFO: Average number of neighbors: 61.964672446250916 2024-07-08 21:31:57.041 INFO: Selected the following outputs: {'energy': True, 'forces': True, 'virials': False, 'stress': True, 'dipoles': False} 2024-07-08 21:31:57.041 INFO: Building model 2024-07-08 21:31:57.042 INFO: Hidden irreps: 128x0e+128x1o 2024-07-08 21:31:59.263 WARNING: No SWA checkpoint found, while SWA is enabled. Compare the swa_start parameter and the latest checkpoint. 
2024-07-08 21:31:59.264 INFO: Loading checkpoint: checkpoints/10-128-L1-universal-branch_run-1_epoch-55.pt 2024-07-08 21:31:59.490 INFO: ScaleShiftMACE( (node_embedding): LinearNodeEmbeddingBlock( (linear): Linear(89x0e -> 128x0e | 11392 weights) ) (radial_embedding): RadialEmbeddingBlock( (bessel_fn): BesselBasis(r_max=6.0, num_basis=10, trainable=False) (cutoff_fn): PolynomialCutoff(p=5.0, r_max=6.0) ) (spherical_harmonics): SphericalHarmonics() (atomic_energies_fn): AtomicEnergiesBlock(energies=[-1.1172, -0.0005, -0.2973, -0.0426, -0.2912, -1.2628, -3.1256, -1.5469, -0.4379, -0.0122, -0.2286, -0.0099, -0.2167, -0.8258, -1.8872, -0.8909, -0.2583, -0.0235, -0.1783, -0.0260, -2.1297, -2.4053, -3.6123, -5.4462, -5.1459, -3.3058, -1.6661, -0.2841, -0.2375, -0.0110, -0.1985, -0.7792, -1.7014, -0.7835, -0.2269, -0.0227, -0.1619, -0.0282, -2.2568, -2.2374, -2.5348, -4.6021, -3.4029, -1.6888, -1.4402, -1.4752, -0.1984, -0.0137, -0.1967, -0.6796, -1.4302, -0.6573, -0.1886, -0.0127, -0.1345, -0.0316, -0.6279, -1.4364, -0.1858, -0.2588, -0.2570, -0.2542, -9.4861, -8.1154, -0.1458, -0.1949, -0.1457, -0.2516, -0.1638, -0.2527, -0.2526, -3.4929, -3.5659, -4.5710, -4.6344, -2.8828, -1.4279, -0.5024, -0.1848, -0.0105, -0.1794, -0.6307, -1.3246, -0.2421, -1.0442, -2.0324, -4.6443, -7.3027, -10.3924]) (interactions): ModuleList( (0): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e -> 128x0e | 16384 weights) (conv_tp): TensorProduct(128x0e x 1x0e+1x1o+1x2e+1x3o -> 128x0e+128x1o+128x2e+128x3o | 512 paths | 512 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 512] (linear): Linear(128x0e+128x1o+128x2e+128x3o -> 128x0e+128x1o+128x2e+128x3o | 65536 weights) (skip_tp): FullyConnectedTensorProduct(128x0e x 89x0e -> 128x0e+128x1o | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) (1): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) (conv_tp): TensorProduct(128x0e+128x1o x 
1x0e+1x1o+1x2e+1x3o -> 256x0e+384x1o+384x2e+256x3o | 1280 paths | 1280 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 1280] (linear): Linear(256x0e+384x1o+384x2e+256x3o -> 128x0e+128x1o+128x2e+128x3o | 163840 weights) (skip_tp): FullyConnectedTensorProduct(128x0e+128x1o x 89x0e -> 128x0e | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) ) (products): ModuleList( (0): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) (1): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x6x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) ) (1): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e -> 128x0e | 16384 weights) ) ) (readouts): ModuleList( (0): LinearReadoutBlock( (linear): Linear(128x0e+128x1o -> 1x0e | 128 weights) ) (1): NonLinearReadoutBlock( (linear_1): Linear(128x0e -> 16x0e | 2048 weights) (non_linearity): Activation [x] 
(16x0e -> 16x0e) (linear_2): Linear(16x0e -> 1x0e | 16 weights) ) ) (scale_shift): ScaleShiftBlock(scale=0.804154, shift=0.164097) ) 2024-07-08 21:31:59.496 INFO: Number of parameters: 4688656 2024-07-08 21:31:59.496 INFO: Optimizer: Adam ( Parameter Group 0 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: embedding weight_decay: 0.0 Parameter Group 1 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: interactions_decay weight_decay: 1e-08 Parameter Group 2 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: interactions_no_decay weight_decay: 0.0 Parameter Group 3 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: products weight_decay: 1e-08 Parameter Group 4 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: readouts weight_decay: 0.0 ) 2024-07-08 21:31:59.496 INFO: Using Weights and Biases for logging 2024-07-08 21:32:20.227 INFO: Using gradient clipping with tolerance=100.000 2024-07-08 21:32:20.227 INFO: Started training 2024-07-08 21:32:26.986 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:26.987 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:26.987 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:26.987 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:26.988 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:26.988 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:26.988 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-08 21:32:27.003 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:27.003 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:27.003 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:27.003 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:27.002 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:27.002 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:27.006 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:27.006 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:27.006 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:27.002 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:27.002 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:27.002 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:27.002 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:27.002 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:27.002 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:27.003 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:27.003 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:27.003 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:27.003 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:27.003 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:27.003 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:27.003 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:27.006 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:32:27.006 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-08 21:32:27.006 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-08 21:50:53.248 INFO: Epoch 55: loss=6.2376e-03, MAE_E_per_atom=28.3040 meV, MAE_F=54.5784 meV / A, MAE_stress_per_atom=0.1319 meV / A^3 2024-07-08 22:00:50.419 INFO: Epoch 56: loss=6.2309e-03, MAE_E_per_atom=27.9655 meV, MAE_F=54.1231 meV / A, MAE_stress_per_atom=0.1306 meV / A^3 2024-07-08 22:10:44.856 INFO: Epoch 57: loss=6.2339e-03, MAE_E_per_atom=28.1090 meV, MAE_F=53.9832 meV / A, MAE_stress_per_atom=0.1376 meV / A^3 2024-07-08 22:20:34.110 INFO: Epoch 58: loss=6.1849e-03, MAE_E_per_atom=27.7807 meV, MAE_F=53.7314 meV / A, MAE_stress_per_atom=0.1326 meV / A^3 2024-07-08 22:30:23.863 INFO: Epoch 59: loss=6.2432e-03, MAE_E_per_atom=27.8814 meV, MAE_F=53.8200 meV / A, MAE_stress_per_atom=0.1351 meV / A^3 2024-07-08 22:40:13.682 INFO: Epoch 60: loss=6.2134e-03, MAE_E_per_atom=27.6604 meV, MAE_F=53.5038 meV / A, MAE_stress_per_atom=0.1341 meV / A^3 2024-07-08 22:50:00.490 INFO: Epoch 61: loss=6.1111e-03, MAE_E_per_atom=27.8344 meV, MAE_F=53.2316 meV / A, MAE_stress_per_atom=0.1344 meV / A^3 2024-07-08 22:59:50.943 INFO: Epoch 62: loss=6.1530e-03, MAE_E_per_atom=27.6821 meV, MAE_F=53.3191 meV / A, MAE_stress_per_atom=0.1352 meV / A^3 2024-07-08 23:09:41.260 INFO: Epoch 63: loss=6.2431e-03, MAE_E_per_atom=27.4301 meV, MAE_F=53.7625 meV / A, MAE_stress_per_atom=0.1336 meV / A^3 2024-07-08 23:19:31.881 INFO: Epoch 64: loss=6.1663e-03, MAE_E_per_atom=27.6284 meV, MAE_F=53.1728 meV / A, MAE_stress_per_atom=0.1318 meV / A^3 2024-07-09 01:17:14.059 INFO: Process group initialized: True 2024-07-09 01:17:14.061 INFO: Processes: 80 2024-07-09 01:17:14.061 INFO: MACE version: 0.3.0 2024-07-09 01:17:14.062 INFO: Configuration: Namespace(name='10-128-L1-universal-branch', seed=1, log_dir='logs', model_dir='.', checkpoints_dir='checkpoints', results_dir='results', downloads_dir='downloads', device='cuda', default_dtype='float64', distributed=True, log_level='INFO', 
error_table='PerAtomMAE', model='ScaleShiftMACE', r_max=6.0, radial_type='bessel', num_radial_basis=10, num_cutoff_basis=5, interaction='RealAgnosticResidualInteractionBlock', interaction_first='RealAgnosticResidualInteractionBlock', max_ell=3, correlation=3, num_interactions=2, MLP_irreps='16x0e', radial_MLP='[64, 64, 64]', hidden_irreps='128x0e + 128x1o', num_channels=128, max_L=1, gate='silu', scaling='rms_forces_scaling', avg_num_neighbors=1, compute_avg_num_neighbors=True, compute_stress=True, compute_forces=True, train_file='../../dataset/mptrj-gga-ggapu-train', valid_file='../../dataset/mptrj-gga-ggapu-val', valid_fraction=0.1, test_file=None, test_dir=None, multi_processed_test=False, num_workers=16, pin_memory=True, atomic_numbers=None, mean=None, std=None, statistics_file='../../dataset/mptrj-gga-ggapu-statistics-isolated.json', E0s=None, energy_key='energy', forces_key='forces', virials_key='virials', stress_key='stress', dipole_key='dipole', charges_key='charges', loss='universal', forces_weight=10.0, swa_forces_weight=100.0, energy_weight=1.0, swa_energy_weight=1000.0, virials_weight=1.0, swa_virials_weight=10.0, stress_weight=100.0, swa_stress_weight=10.0, dipole_weight=1.0, swa_dipole_weight=1.0, config_type_weights='{"Default":1.0}', huber_delta=0.01, optimizer='adam', batch_size=16, valid_batch_size=32, lr=0.005, swa_lr=0.001, weight_decay=1e-08, amsgrad=True, scheduler='ReduceLROnPlateau', lr_factor=0.8, scheduler_patience=5, lr_scheduler_gamma=0.9993, swa=False, start_swa=None, ema=True, ema_decay=0.995, max_num_epochs=200, patience=50, eval_interval=1, keep_checkpoints=True, restart_latest=True, save_cpu=True, clip_grad=100.0, wandb=True, wandb_project='mace-universal', wandb_entity='astagroup', wandb_name='10-128-L1-universal-branch', wandb_log_hypers=['num_channels', 'max_L', 'correlation', 'lr', 'swa_lr', 'weight_decay', 'batch_size', 'max_num_epochs', 'start_swa', 'energy_weight', 'forces_weight']) 2024-07-09 01:17:14.062 INFO: CUDA version: 
11.8, CUDA device: 0 2024-07-09 01:17:14.062 INFO: Using statistics json file 2024-07-09 01:17:14.062 INFO: Using atomic numbers from statistics file 2024-07-09 01:17:14.063 INFO: AtomicNumberTable: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 89, 90, 91, 92, 93, 94) 2024-07-09 01:17:14.063 INFO: Atomic Energies not in training file, using command line argument E0s 2024-07-09 01:17:14.063 INFO: Atomic energies: [-1.11723232, -0.00045595, -0.29734917, -0.04262353, -0.2911712, -1.26281801, -3.12555634, -1.54690765, -0.43794547, -0.01216023, -0.22858276, -0.00994627, -0.21672837, -0.82583191, -1.88719667, -0.89091719, -0.25828681, -0.0235315, -0.17827125, -0.02596217, -2.12966897, -2.40532262, -3.61232779, -5.44620624, -5.14592659, -3.30583367, -1.66614587, -0.28412403, -0.23745594, -0.01098351, -0.19854295, -0.77924665, -1.70136472, -0.78345919, -0.22687512, -0.02265396, -0.16194042, -0.02823145, -2.25679622, -2.23742918, -2.53481909, -4.60213279, -3.40289704, -1.68884293, -1.44016062, -1.47521138, -0.19840574, -0.01374787, -0.19672488, -0.67963499, -1.4302063, -0.6573123, -0.18858477, -0.01267614, -0.13452777, -0.03157029, -0.62794477, -1.43642821, -0.18584352, -0.25876002, -0.25695685, -0.2542358, -9.48606277, -8.11540027, -0.14584339, -0.19488481, -0.14569527, -0.2516406, -0.16381585, -0.25265734, -0.25255978, -3.49292389, -3.5659314, -4.57101127, -4.63436797, -2.88280809, -1.42793567, -0.50244445, -0.18479218, -0.0105212, -0.17939998, -0.63069886, -1.32462383, -0.24210133, -1.04419147, -2.03239022, -4.6443113, -7.30273499, -10.39244586] 2024-07-09 01:17:45.744 INFO: UniversalLoss(energy_weight=1.000, forces_weight=10.000, stress_weight=100.000) 2024-07-09 01:17:45.747 INFO: Average number 
of neighbors: 61.964672446250916 2024-07-09 01:17:45.747 INFO: Selected the following outputs: {'energy': True, 'forces': True, 'virials': False, 'stress': True, 'dipoles': False} 2024-07-09 01:17:45.747 INFO: Building model 2024-07-09 01:17:45.748 INFO: Hidden irreps: 128x0e+128x1o 2024-07-09 01:17:47.957 WARNING: No SWA checkpoint found, while SWA is enabled. Compare the swa_start parameter and the latest checkpoint. 2024-07-09 01:17:47.958 INFO: Loading checkpoint: checkpoints/10-128-L1-universal-branch_run-1_epoch-64.pt 2024-07-09 01:17:48.178 INFO: ScaleShiftMACE( (node_embedding): LinearNodeEmbeddingBlock( (linear): Linear(89x0e -> 128x0e | 11392 weights) ) (radial_embedding): RadialEmbeddingBlock( (bessel_fn): BesselBasis(r_max=6.0, num_basis=10, trainable=False) (cutoff_fn): PolynomialCutoff(p=5.0, r_max=6.0) ) (spherical_harmonics): SphericalHarmonics() (atomic_energies_fn): AtomicEnergiesBlock(energies=[-1.1172, -0.0005, -0.2973, -0.0426, -0.2912, -1.2628, -3.1256, -1.5469, -0.4379, -0.0122, -0.2286, -0.0099, -0.2167, -0.8258, -1.8872, -0.8909, -0.2583, -0.0235, -0.1783, -0.0260, -2.1297, -2.4053, -3.6123, -5.4462, -5.1459, -3.3058, -1.6661, -0.2841, -0.2375, -0.0110, -0.1985, -0.7792, -1.7014, -0.7835, -0.2269, -0.0227, -0.1619, -0.0282, -2.2568, -2.2374, -2.5348, -4.6021, -3.4029, -1.6888, -1.4402, -1.4752, -0.1984, -0.0137, -0.1967, -0.6796, -1.4302, -0.6573, -0.1886, -0.0127, -0.1345, -0.0316, -0.6279, -1.4364, -0.1858, -0.2588, -0.2570, -0.2542, -9.4861, -8.1154, -0.1458, -0.1949, -0.1457, -0.2516, -0.1638, -0.2527, -0.2526, -3.4929, -3.5659, -4.5710, -4.6344, -2.8828, -1.4279, -0.5024, -0.1848, -0.0105, -0.1794, -0.6307, -1.3246, -0.2421, -1.0442, -2.0324, -4.6443, -7.3027, -10.3924]) (interactions): ModuleList( (0): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e -> 128x0e | 16384 weights) (conv_tp): TensorProduct(128x0e x 1x0e+1x1o+1x2e+1x3o -> 128x0e+128x1o+128x2e+128x3o | 512 paths | 512 weights) (conv_tp_weights): 
FullyConnectedNet[10, 64, 64, 64, 512] (linear): Linear(128x0e+128x1o+128x2e+128x3o -> 128x0e+128x1o+128x2e+128x3o | 65536 weights) (skip_tp): FullyConnectedTensorProduct(128x0e x 89x0e -> 128x0e+128x1o | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) (1): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) (conv_tp): TensorProduct(128x0e+128x1o x 1x0e+1x1o+1x2e+1x3o -> 256x0e+384x1o+384x2e+256x3o | 1280 paths | 1280 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 1280] (linear): Linear(256x0e+384x1o+384x2e+256x3o -> 128x0e+128x1o+128x2e+128x3o | 163840 weights) (skip_tp): FullyConnectedTensorProduct(128x0e+128x1o x 89x0e -> 128x0e | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) ) (products): ModuleList( (0): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) (1): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x6x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) ) (1): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: 
[torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e -> 128x0e | 16384 weights) ) ) (readouts): ModuleList( (0): LinearReadoutBlock( (linear): Linear(128x0e+128x1o -> 1x0e | 128 weights) ) (1): NonLinearReadoutBlock( (linear_1): Linear(128x0e -> 16x0e | 2048 weights) (non_linearity): Activation [x] (16x0e -> 16x0e) (linear_2): Linear(16x0e -> 1x0e | 16 weights) ) ) (scale_shift): ScaleShiftBlock(scale=0.804154, shift=0.164097) ) 2024-07-09 01:17:48.185 INFO: Number of parameters: 4688656 2024-07-09 01:17:48.185 INFO: Optimizer: Adam ( Parameter Group 0 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: embedding weight_decay: 0.0 Parameter Group 1 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: interactions_decay weight_decay: 1e-08 Parameter Group 2 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: interactions_no_decay weight_decay: 0.0 Parameter Group 3 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: products weight_decay: 1e-08 Parameter Group 4 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: readouts weight_decay: 0.0 ) 2024-07-09 01:17:48.185 INFO: Using Weights and Biases for logging 2024-07-09 01:18:05.114 INFO: Using gradient clipping with tolerance=100.000 2024-07-09 01:18:05.115 INFO: Started training 2024-07-09 01:18:11.586 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.586 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-09 01:18:11.586 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.586 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.586 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.586 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.587 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.586 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.586 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.592 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.587 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.586 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.594 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.594 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.594 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.594 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.595 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.595 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.587 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.595 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.592 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.587 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.586 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.594 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.594 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.602 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-09 01:18:11.602 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.602 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.602 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.594 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.603 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.603 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.603 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.604 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.594 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.595 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.595 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.587 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.595 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.592 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.587 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.594 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.594 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.602 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.602 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.602 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.602 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.594 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.603 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.603 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-09 01:18:11.603 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.603 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.594 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.595 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.595 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.595 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.592 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.594 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.594 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.602 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.602 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.602 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.603 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.594 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.603 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.603 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.603 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.604 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.594 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.595 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.595 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.595 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.602 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.602 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-09 01:18:11.602 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.603 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.603 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.603 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.603 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:18:11.603 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 01:36:44.522 INFO: Epoch 64: loss=6.1634e-03, MAE_E_per_atom=27.4848 meV, MAE_F=53.2025 meV / A, MAE_stress_per_atom=0.1300 meV / A^3 2024-07-09 01:46:56.656 INFO: Epoch 65: loss=6.0673e-03, MAE_E_per_atom=27.0182 meV, MAE_F=52.4246 meV / A, MAE_stress_per_atom=0.1374 meV / A^3 2024-07-09 01:56:57.666 INFO: Epoch 66: loss=6.0544e-03, MAE_E_per_atom=26.8928 meV, MAE_F=52.7696 meV / A, MAE_stress_per_atom=0.1277 meV / A^3 2024-07-09 02:06:58.227 INFO: Epoch 67: loss=6.0457e-03, MAE_E_per_atom=27.0551 meV, MAE_F=52.7036 meV / A, MAE_stress_per_atom=0.1325 meV / A^3 2024-07-09 02:17:03.421 INFO: Epoch 68: loss=6.0601e-03, MAE_E_per_atom=27.2199 meV, MAE_F=52.3830 meV / A, MAE_stress_per_atom=0.1297 meV / A^3 2024-07-09 02:27:06.429 INFO: Epoch 69: loss=5.9885e-03, MAE_E_per_atom=27.3742 meV, MAE_F=52.1066 meV / A, MAE_stress_per_atom=0.1331 meV / A^3 2024-07-09 02:37:08.571 INFO: Epoch 70: loss=6.0426e-03, MAE_E_per_atom=27.1307 meV, MAE_F=52.2120 meV / A, MAE_stress_per_atom=0.1348 meV / A^3 2024-07-09 02:47:10.627 INFO: Epoch 71: loss=5.9649e-03, MAE_E_per_atom=26.8412 meV, MAE_F=51.9032 meV / A, MAE_stress_per_atom=0.1279 meV / A^3 2024-07-09 02:57:12.727 INFO: Epoch 72: loss=5.9663e-03, MAE_E_per_atom=26.8517 meV, MAE_F=51.6594 meV / A, MAE_stress_per_atom=0.1296 meV / A^3 2024-07-09 03:07:14.899 INFO: Epoch 73: loss=5.9896e-03, MAE_E_per_atom=26.7825 meV, MAE_F=51.9382 meV / A, MAE_stress_per_atom=0.1287 meV / A^3 2024-07-09 06:53:49.068 INFO: Process group 
initialized: True 2024-07-09 06:53:49.070 INFO: Processes: 80 2024-07-09 06:53:49.070 INFO: MACE version: 0.3.0 2024-07-09 06:53:49.070 INFO: Configuration: Namespace(name='10-128-L1-universal-branch', seed=1, log_dir='logs', model_dir='.', checkpoints_dir='checkpoints', results_dir='results', downloads_dir='downloads', device='cuda', default_dtype='float64', distributed=True, log_level='INFO', error_table='PerAtomMAE', model='ScaleShiftMACE', r_max=6.0, radial_type='bessel', num_radial_basis=10, num_cutoff_basis=5, interaction='RealAgnosticResidualInteractionBlock', interaction_first='RealAgnosticResidualInteractionBlock', max_ell=3, correlation=3, num_interactions=2, MLP_irreps='16x0e', radial_MLP='[64, 64, 64]', hidden_irreps='128x0e + 128x1o', num_channels=128, max_L=1, gate='silu', scaling='rms_forces_scaling', avg_num_neighbors=1, compute_avg_num_neighbors=True, compute_stress=True, compute_forces=True, train_file='../../dataset/mptrj-gga-ggapu-train', valid_file='../../dataset/mptrj-gga-ggapu-val', valid_fraction=0.1, test_file=None, test_dir=None, multi_processed_test=False, num_workers=16, pin_memory=True, atomic_numbers=None, mean=None, std=None, statistics_file='../../dataset/mptrj-gga-ggapu-statistics-isolated.json', E0s=None, energy_key='energy', forces_key='forces', virials_key='virials', stress_key='stress', dipole_key='dipole', charges_key='charges', loss='universal', forces_weight=10.0, swa_forces_weight=100.0, energy_weight=1.0, swa_energy_weight=1000.0, virials_weight=1.0, swa_virials_weight=10.0, stress_weight=100.0, swa_stress_weight=10.0, dipole_weight=1.0, swa_dipole_weight=1.0, config_type_weights='{"Default":1.0}', huber_delta=0.01, optimizer='adam', batch_size=16, valid_batch_size=32, lr=0.005, swa_lr=0.001, weight_decay=1e-08, amsgrad=True, scheduler='ReduceLROnPlateau', lr_factor=0.8, scheduler_patience=5, lr_scheduler_gamma=0.9993, swa=False, start_swa=None, ema=True, ema_decay=0.995, max_num_epochs=200, patience=50, eval_interval=1, 
keep_checkpoints=True, restart_latest=True, save_cpu=True, clip_grad=100.0, wandb=True, wandb_project='mace-universal', wandb_entity='astagroup', wandb_name='10-128-L1-universal-branch', wandb_log_hypers=['num_channels', 'max_L', 'correlation', 'lr', 'swa_lr', 'weight_decay', 'batch_size', 'max_num_epochs', 'start_swa', 'energy_weight', 'forces_weight']) 2024-07-09 06:53:49.070 INFO: CUDA version: 11.8, CUDA device: 0 2024-07-09 06:53:49.071 INFO: Using statistics json file 2024-07-09 06:53:49.071 INFO: Using atomic numbers from statistics file 2024-07-09 06:53:49.071 INFO: AtomicNumberTable: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 89, 90, 91, 92, 93, 94) 2024-07-09 06:53:49.072 INFO: Atomic Energies not in training file, using command line argument E0s 2024-07-09 06:53:49.072 INFO: Atomic energies: [-1.11723232, -0.00045595, -0.29734917, -0.04262353, -0.2911712, -1.26281801, -3.12555634, -1.54690765, -0.43794547, -0.01216023, -0.22858276, -0.00994627, -0.21672837, -0.82583191, -1.88719667, -0.89091719, -0.25828681, -0.0235315, -0.17827125, -0.02596217, -2.12966897, -2.40532262, -3.61232779, -5.44620624, -5.14592659, -3.30583367, -1.66614587, -0.28412403, -0.23745594, -0.01098351, -0.19854295, -0.77924665, -1.70136472, -0.78345919, -0.22687512, -0.02265396, -0.16194042, -0.02823145, -2.25679622, -2.23742918, -2.53481909, -4.60213279, -3.40289704, -1.68884293, -1.44016062, -1.47521138, -0.19840574, -0.01374787, -0.19672488, -0.67963499, -1.4302063, -0.6573123, -0.18858477, -0.01267614, -0.13452777, -0.03157029, -0.62794477, -1.43642821, -0.18584352, -0.25876002, -0.25695685, -0.2542358, -9.48606277, -8.11540027, -0.14584339, -0.19488481, -0.14569527, -0.2516406, -0.16381585, -0.25265734, 
-0.25255978, -3.49292389, -3.5659314, -4.57101127, -4.63436797, -2.88280809, -1.42793567, -0.50244445, -0.18479218, -0.0105212, -0.17939998, -0.63069886, -1.32462383, -0.24210133, -1.04419147, -2.03239022, -4.6443113, -7.30273499, -10.39244586] 2024-07-09 06:54:20.772 INFO: UniversalLoss(energy_weight=1.000, forces_weight=10.000, stress_weight=100.000) 2024-07-09 06:54:20.774 INFO: Average number of neighbors: 61.964672446250916 2024-07-09 06:54:20.774 INFO: Selected the following outputs: {'energy': True, 'forces': True, 'virials': False, 'stress': True, 'dipoles': False} 2024-07-09 06:54:20.775 INFO: Building model 2024-07-09 06:54:20.776 INFO: Hidden irreps: 128x0e+128x1o 2024-07-09 06:54:22.973 WARNING: No SWA checkpoint found, while SWA is enabled. Compare the swa_start parameter and the latest checkpoint. 2024-07-09 06:54:22.975 INFO: Loading checkpoint: checkpoints/10-128-L1-universal-branch_run-1_epoch-73.pt 2024-07-09 06:54:23.196 INFO: ScaleShiftMACE( (node_embedding): LinearNodeEmbeddingBlock( (linear): Linear(89x0e -> 128x0e | 11392 weights) ) (radial_embedding): RadialEmbeddingBlock( (bessel_fn): BesselBasis(r_max=6.0, num_basis=10, trainable=False) (cutoff_fn): PolynomialCutoff(p=5.0, r_max=6.0) ) (spherical_harmonics): SphericalHarmonics() (atomic_energies_fn): AtomicEnergiesBlock(energies=[-1.1172, -0.0005, -0.2973, -0.0426, -0.2912, -1.2628, -3.1256, -1.5469, -0.4379, -0.0122, -0.2286, -0.0099, -0.2167, -0.8258, -1.8872, -0.8909, -0.2583, -0.0235, -0.1783, -0.0260, -2.1297, -2.4053, -3.6123, -5.4462, -5.1459, -3.3058, -1.6661, -0.2841, -0.2375, -0.0110, -0.1985, -0.7792, -1.7014, -0.7835, -0.2269, -0.0227, -0.1619, -0.0282, -2.2568, -2.2374, -2.5348, -4.6021, -3.4029, -1.6888, -1.4402, -1.4752, -0.1984, -0.0137, -0.1967, -0.6796, -1.4302, -0.6573, -0.1886, -0.0127, -0.1345, -0.0316, -0.6279, -1.4364, -0.1858, -0.2588, -0.2570, -0.2542, -9.4861, -8.1154, -0.1458, -0.1949, -0.1457, -0.2516, -0.1638, -0.2527, -0.2526, -3.4929, -3.5659, -4.5710, 
-4.6344, -2.8828, -1.4279, -0.5024, -0.1848, -0.0105, -0.1794, -0.6307, -1.3246, -0.2421, -1.0442, -2.0324, -4.6443, -7.3027, -10.3924]) (interactions): ModuleList( (0): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e -> 128x0e | 16384 weights) (conv_tp): TensorProduct(128x0e x 1x0e+1x1o+1x2e+1x3o -> 128x0e+128x1o+128x2e+128x3o | 512 paths | 512 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 512] (linear): Linear(128x0e+128x1o+128x2e+128x3o -> 128x0e+128x1o+128x2e+128x3o | 65536 weights) (skip_tp): FullyConnectedTensorProduct(128x0e x 89x0e -> 128x0e+128x1o | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) (1): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) (conv_tp): TensorProduct(128x0e+128x1o x 1x0e+1x1o+1x2e+1x3o -> 256x0e+384x1o+384x2e+256x3o | 1280 paths | 1280 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 1280] (linear): Linear(256x0e+384x1o+384x2e+256x3o -> 128x0e+128x1o+128x2e+128x3o | 163840 weights) (skip_tp): FullyConnectedTensorProduct(128x0e+128x1o x 89x0e -> 128x0e | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) ) (products): ModuleList( (0): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) (1): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x6x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): 
GraphModule() ) ) ) (linear): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) ) (1): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e -> 128x0e | 16384 weights) ) ) (readouts): ModuleList( (0): LinearReadoutBlock( (linear): Linear(128x0e+128x1o -> 1x0e | 128 weights) ) (1): NonLinearReadoutBlock( (linear_1): Linear(128x0e -> 16x0e | 2048 weights) (non_linearity): Activation [x] (16x0e -> 16x0e) (linear_2): Linear(16x0e -> 1x0e | 16 weights) ) ) (scale_shift): ScaleShiftBlock(scale=0.804154, shift=0.164097) ) 2024-07-09 06:54:23.203 INFO: Number of parameters: 4688656 2024-07-09 06:54:23.203 INFO: Optimizer: Adam ( Parameter Group 0 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: embedding weight_decay: 0.0 Parameter Group 1 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: interactions_decay weight_decay: 1e-08 Parameter Group 2 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: interactions_no_decay weight_decay: 0.0 Parameter Group 3 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: products weight_decay: 1e-08 Parameter Group 4 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: 
readouts weight_decay: 0.0 ) 2024-07-09 06:54:23.203 INFO: Using Weights and Biases for logging 2024-07-09 06:54:37.338 INFO: Using gradient clipping with tolerance=100.000 2024-07-09 06:54:37.338 INFO: Started training 2024-07-09 06:54:43.541 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.541 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.541 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.542 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.541 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.542 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.545 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.545 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.546 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.546 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.545 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.545 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.546 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.546 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.545 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.546 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.546 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.545 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.546 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.546 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.559 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-09 06:54:43.559 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.560 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.562 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.562 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.562 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.562 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.562 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.562 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.562 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.562 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 06:54:43.562 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 07:13:12.368 INFO: Epoch 73: loss=6.0083e-03, MAE_E_per_atom=27.0388 meV, MAE_F=52.2224 meV / A, MAE_stress_per_atom=0.1297 meV / A^3 2024-07-09 07:23:26.627 INFO: Epoch 74: loss=5.9531e-03, MAE_E_per_atom=27.0506 meV, MAE_F=51.8175 meV / A, MAE_stress_per_atom=0.1304 meV / A^3 2024-07-09 07:33:29.083 INFO: Epoch 75: loss=5.9650e-03, MAE_E_per_atom=26.4968 meV, MAE_F=52.0224 meV / A, MAE_stress_per_atom=0.1287 meV / A^3 2024-07-09 07:43:27.598 INFO: Epoch 76: loss=5.9653e-03, MAE_E_per_atom=27.1852 meV, MAE_F=51.8529 meV / A, MAE_stress_per_atom=0.1292 meV / A^3 2024-07-09 07:53:26.264 INFO: Epoch 77: loss=6.0032e-03, MAE_E_per_atom=27.0052 meV, MAE_F=52.0379 meV / A, MAE_stress_per_atom=0.1329 meV / A^3 2024-07-09 08:03:26.981 INFO: Epoch 78: loss=5.9212e-03, MAE_E_per_atom=26.6626 meV, MAE_F=51.9571 meV / A, MAE_stress_per_atom=0.1289 meV / A^3 2024-07-09 08:13:27.610 INFO: Epoch 79: loss=5.8849e-03, MAE_E_per_atom=26.4742 meV, MAE_F=51.3225 meV / A, MAE_stress_per_atom=0.1304 meV / A^3 2024-07-09 08:23:28.843 INFO: Epoch 80: loss=5.8021e-03, 
MAE_E_per_atom=26.6254 meV, MAE_F=51.1046 meV / A, MAE_stress_per_atom=0.1257 meV / A^3 2024-07-09 08:33:27.893 INFO: Epoch 81: loss=5.8287e-03, MAE_E_per_atom=26.4535 meV, MAE_F=51.3692 meV / A, MAE_stress_per_atom=0.1269 meV / A^3 2024-07-09 08:43:29.657 INFO: Epoch 82: loss=5.8497e-03, MAE_E_per_atom=26.9415 meV, MAE_F=51.5236 meV / A, MAE_stress_per_atom=0.1207 meV / A^3 2024-07-09 09:09:12.557 INFO: Process group initialized: True 2024-07-09 09:09:12.559 INFO: Processes: 80 2024-07-09 09:09:12.559 INFO: MACE version: 0.3.0 2024-07-09 09:09:12.560 INFO: Configuration: Namespace(name='10-128-L1-universal-branch', seed=1, log_dir='logs', model_dir='.', checkpoints_dir='checkpoints', results_dir='results', downloads_dir='downloads', device='cuda', default_dtype='float64', distributed=True, log_level='INFO', error_table='PerAtomMAE', model='ScaleShiftMACE', r_max=6.0, radial_type='bessel', num_radial_basis=10, num_cutoff_basis=5, interaction='RealAgnosticResidualInteractionBlock', interaction_first='RealAgnosticResidualInteractionBlock', max_ell=3, correlation=3, num_interactions=2, MLP_irreps='16x0e', radial_MLP='[64, 64, 64]', hidden_irreps='128x0e + 128x1o', num_channels=128, max_L=1, gate='silu', scaling='rms_forces_scaling', avg_num_neighbors=1, compute_avg_num_neighbors=True, compute_stress=True, compute_forces=True, train_file='../../dataset/mptrj-gga-ggapu-train', valid_file='../../dataset/mptrj-gga-ggapu-val', valid_fraction=0.1, test_file=None, test_dir=None, multi_processed_test=False, num_workers=16, pin_memory=True, atomic_numbers=None, mean=None, std=None, statistics_file='../../dataset/mptrj-gga-ggapu-statistics-isolated.json', E0s=None, energy_key='energy', forces_key='forces', virials_key='virials', stress_key='stress', dipole_key='dipole', charges_key='charges', loss='universal', forces_weight=10.0, swa_forces_weight=100.0, energy_weight=1.0, swa_energy_weight=1000.0, virials_weight=1.0, swa_virials_weight=10.0, stress_weight=100.0, 
swa_stress_weight=10.0, dipole_weight=1.0, swa_dipole_weight=1.0, config_type_weights='{"Default":1.0}', huber_delta=0.01, optimizer='adam', batch_size=16, valid_batch_size=32, lr=0.005, swa_lr=0.001, weight_decay=1e-08, amsgrad=True, scheduler='ReduceLROnPlateau', lr_factor=0.8, scheduler_patience=5, lr_scheduler_gamma=0.9993, swa=False, start_swa=None, ema=True, ema_decay=0.995, max_num_epochs=200, patience=50, eval_interval=1, keep_checkpoints=True, restart_latest=True, save_cpu=True, clip_grad=100.0, wandb=True, wandb_project='mace-universal', wandb_entity='astagroup', wandb_name='10-128-L1-universal-branch', wandb_log_hypers=['num_channels', 'max_L', 'correlation', 'lr', 'swa_lr', 'weight_decay', 'batch_size', 'max_num_epochs', 'start_swa', 'energy_weight', 'forces_weight']) 2024-07-09 09:09:12.560 INFO: CUDA version: 11.8, CUDA device: 0 2024-07-09 09:09:12.560 INFO: Using statistics json file 2024-07-09 09:09:12.560 INFO: Using atomic numbers from statistics file 2024-07-09 09:09:12.560 INFO: AtomicNumberTable: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 89, 90, 91, 92, 93, 94) 2024-07-09 09:09:12.560 INFO: Atomic Energies not in training file, using command line argument E0s 2024-07-09 09:09:12.561 INFO: Atomic energies: [-1.11723232, -0.00045595, -0.29734917, -0.04262353, -0.2911712, -1.26281801, -3.12555634, -1.54690765, -0.43794547, -0.01216023, -0.22858276, -0.00994627, -0.21672837, -0.82583191, -1.88719667, -0.89091719, -0.25828681, -0.0235315, -0.17827125, -0.02596217, -2.12966897, -2.40532262, -3.61232779, -5.44620624, -5.14592659, -3.30583367, -1.66614587, -0.28412403, -0.23745594, -0.01098351, -0.19854295, -0.77924665, -1.70136472, -0.78345919, -0.22687512, -0.02265396, 
-0.16194042, -0.02823145, -2.25679622, -2.23742918, -2.53481909, -4.60213279, -3.40289704, -1.68884293, -1.44016062, -1.47521138, -0.19840574, -0.01374787, -0.19672488, -0.67963499, -1.4302063, -0.6573123, -0.18858477, -0.01267614, -0.13452777, -0.03157029, -0.62794477, -1.43642821, -0.18584352, -0.25876002, -0.25695685, -0.2542358, -9.48606277, -8.11540027, -0.14584339, -0.19488481, -0.14569527, -0.2516406, -0.16381585, -0.25265734, -0.25255978, -3.49292389, -3.5659314, -4.57101127, -4.63436797, -2.88280809, -1.42793567, -0.50244445, -0.18479218, -0.0105212, -0.17939998, -0.63069886, -1.32462383, -0.24210133, -1.04419147, -2.03239022, -4.6443113, -7.30273499, -10.39244586] 2024-07-09 09:09:44.608 INFO: UniversalLoss(energy_weight=1.000, forces_weight=10.000, stress_weight=100.000) 2024-07-09 09:09:44.610 INFO: Average number of neighbors: 61.964672446250916 2024-07-09 09:09:44.610 INFO: Selected the following outputs: {'energy': True, 'forces': True, 'virials': False, 'stress': True, 'dipoles': False} 2024-07-09 09:09:44.610 INFO: Building model 2024-07-09 09:09:44.611 INFO: Hidden irreps: 128x0e+128x1o 2024-07-09 09:09:46.841 WARNING: No SWA checkpoint found, while SWA is enabled. Compare the swa_start parameter and the latest checkpoint. 
2024-07-09 09:09:46.843 INFO: Loading checkpoint: checkpoints/10-128-L1-universal-branch_run-1_epoch-82.pt 2024-07-09 09:09:47.060 INFO: ScaleShiftMACE( (node_embedding): LinearNodeEmbeddingBlock( (linear): Linear(89x0e -> 128x0e | 11392 weights) ) (radial_embedding): RadialEmbeddingBlock( (bessel_fn): BesselBasis(r_max=6.0, num_basis=10, trainable=False) (cutoff_fn): PolynomialCutoff(p=5.0, r_max=6.0) ) (spherical_harmonics): SphericalHarmonics() (atomic_energies_fn): AtomicEnergiesBlock(energies=[-1.1172, -0.0005, -0.2973, -0.0426, -0.2912, -1.2628, -3.1256, -1.5469, -0.4379, -0.0122, -0.2286, -0.0099, -0.2167, -0.8258, -1.8872, -0.8909, -0.2583, -0.0235, -0.1783, -0.0260, -2.1297, -2.4053, -3.6123, -5.4462, -5.1459, -3.3058, -1.6661, -0.2841, -0.2375, -0.0110, -0.1985, -0.7792, -1.7014, -0.7835, -0.2269, -0.0227, -0.1619, -0.0282, -2.2568, -2.2374, -2.5348, -4.6021, -3.4029, -1.6888, -1.4402, -1.4752, -0.1984, -0.0137, -0.1967, -0.6796, -1.4302, -0.6573, -0.1886, -0.0127, -0.1345, -0.0316, -0.6279, -1.4364, -0.1858, -0.2588, -0.2570, -0.2542, -9.4861, -8.1154, -0.1458, -0.1949, -0.1457, -0.2516, -0.1638, -0.2527, -0.2526, -3.4929, -3.5659, -4.5710, -4.6344, -2.8828, -1.4279, -0.5024, -0.1848, -0.0105, -0.1794, -0.6307, -1.3246, -0.2421, -1.0442, -2.0324, -4.6443, -7.3027, -10.3924]) (interactions): ModuleList( (0): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e -> 128x0e | 16384 weights) (conv_tp): TensorProduct(128x0e x 1x0e+1x1o+1x2e+1x3o -> 128x0e+128x1o+128x2e+128x3o | 512 paths | 512 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 512] (linear): Linear(128x0e+128x1o+128x2e+128x3o -> 128x0e+128x1o+128x2e+128x3o | 65536 weights) (skip_tp): FullyConnectedTensorProduct(128x0e x 89x0e -> 128x0e+128x1o | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) (1): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) (conv_tp): TensorProduct(128x0e+128x1o x 
1x0e+1x1o+1x2e+1x3o -> 256x0e+384x1o+384x2e+256x3o | 1280 paths | 1280 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 1280] (linear): Linear(256x0e+384x1o+384x2e+256x3o -> 128x0e+128x1o+128x2e+128x3o | 163840 weights) (skip_tp): FullyConnectedTensorProduct(128x0e+128x1o x 89x0e -> 128x0e | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) ) (products): ModuleList( (0): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) (1): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x6x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) ) (1): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e -> 128x0e | 16384 weights) ) ) (readouts): ModuleList( (0): LinearReadoutBlock( (linear): Linear(128x0e+128x1o -> 1x0e | 128 weights) ) (1): NonLinearReadoutBlock( (linear_1): Linear(128x0e -> 16x0e | 2048 weights) (non_linearity): Activation [x] 
(16x0e -> 16x0e) (linear_2): Linear(16x0e -> 1x0e | 16 weights) ) ) (scale_shift): ScaleShiftBlock(scale=0.804154, shift=0.164097) ) 2024-07-09 09:09:47.066 INFO: Number of parameters: 4688656 2024-07-09 09:09:47.066 INFO: Optimizer: Adam ( Parameter Group 0 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: embedding weight_decay: 0.0 Parameter Group 1 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: interactions_decay weight_decay: 1e-08 Parameter Group 2 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: interactions_no_decay weight_decay: 0.0 Parameter Group 3 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: products weight_decay: 1e-08 Parameter Group 4 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: readouts weight_decay: 0.0 ) 2024-07-09 09:09:47.066 INFO: Using Weights and Biases for logging 2024-07-09 09:10:01.070 INFO: Using gradient clipping with tolerance=100.000 2024-07-09 09:10:01.070 INFO: Started training 2024-07-09 09:10:07.698 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 09:10:07.698 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 09:10:07.698 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 09:10:07.698 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 09:10:07.721 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 09:10:07.722 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 09:10:07.727 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-09 09:10:07.730 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 09:28:33.999 INFO: Epoch 82: loss=5.8847e-03, MAE_E_per_atom=27.0115 meV, MAE_F=51.7114 meV / A, MAE_stress_per_atom=0.1207 meV / A^3 2024-07-09 09:38:34.987 INFO: Epoch 83: loss=5.9336e-03, MAE_E_per_atom=26.4335 meV, MAE_F=51.1304 meV / A, MAE_stress_per_atom=0.1267 meV / A^3 2024-07-09 09:48:33.531 INFO: Epoch 84: loss=5.8144e-03, MAE_E_per_atom=26.2527 meV, MAE_F=51.1868 meV / A, MAE_stress_per_atom=0.1264 meV / A^3 2024-07-09 09:58:34.391 INFO: Epoch 85: loss=5.7858e-03, MAE_E_per_atom=26.0224 meV, MAE_F=51.1171 meV / A, MAE_stress_per_atom=0.1245 meV / A^3 2024-07-09 10:08:32.280 INFO: Epoch 86: loss=5.7856e-03, MAE_E_per_atom=25.9614 meV, MAE_F=50.8774 meV / A, MAE_stress_per_atom=0.1271 meV / A^3 2024-07-09 10:18:29.314 INFO: Epoch 87: loss=5.8707e-03, MAE_E_per_atom=26.0882 meV, MAE_F=51.0444 meV / A, MAE_stress_per_atom=0.1264 meV / A^3 2024-07-09 10:28:29.562 INFO: Epoch 88: loss=5.8252e-03, MAE_E_per_atom=25.6051 meV, MAE_F=51.1670 meV / A, MAE_stress_per_atom=0.1240 meV / A^3 2024-07-09 10:38:27.667 INFO: Epoch 89: loss=5.8035e-03, MAE_E_per_atom=25.7613 meV, MAE_F=50.9778 meV / A, MAE_stress_per_atom=0.1238 meV / A^3 2024-07-09 10:48:29.631 INFO: Epoch 90: loss=5.9433e-03, MAE_E_per_atom=26.4600 meV, MAE_F=51.3709 meV / A, MAE_stress_per_atom=0.1303 meV / A^3 2024-07-09 10:58:30.642 INFO: Epoch 91: loss=5.8404e-03, MAE_E_per_atom=26.5232 meV, MAE_F=50.8568 meV / A, MAE_stress_per_atom=0.1261 meV / A^3 2024-07-09 11:58:58.217 INFO: Process group initialized: True 2024-07-09 11:58:58.218 INFO: Processes: 80 2024-07-09 11:58:58.218 INFO: MACE version: 0.3.0 2024-07-09 11:58:58.219 INFO: Configuration: Namespace(name='10-128-L1-universal-branch', seed=1, log_dir='logs', model_dir='.', checkpoints_dir='checkpoints', results_dir='results', downloads_dir='downloads', device='cuda', default_dtype='float64', distributed=True, log_level='INFO', 
error_table='PerAtomMAE', model='ScaleShiftMACE', r_max=6.0, radial_type='bessel', num_radial_basis=10, num_cutoff_basis=5, interaction='RealAgnosticResidualInteractionBlock', interaction_first='RealAgnosticResidualInteractionBlock', max_ell=3, correlation=3, num_interactions=2, MLP_irreps='16x0e', radial_MLP='[64, 64, 64]', hidden_irreps='128x0e + 128x1o', num_channels=128, max_L=1, gate='silu', scaling='rms_forces_scaling', avg_num_neighbors=1, compute_avg_num_neighbors=True, compute_stress=True, compute_forces=True, train_file='../../dataset/mptrj-gga-ggapu-train', valid_file='../../dataset/mptrj-gga-ggapu-val', valid_fraction=0.1, test_file=None, test_dir=None, multi_processed_test=False, num_workers=16, pin_memory=True, atomic_numbers=None, mean=None, std=None, statistics_file='../../dataset/mptrj-gga-ggapu-statistics-isolated.json', E0s=None, energy_key='energy', forces_key='forces', virials_key='virials', stress_key='stress', dipole_key='dipole', charges_key='charges', loss='universal', forces_weight=10.0, swa_forces_weight=100.0, energy_weight=1.0, swa_energy_weight=1000.0, virials_weight=1.0, swa_virials_weight=10.0, stress_weight=100.0, swa_stress_weight=10.0, dipole_weight=1.0, swa_dipole_weight=1.0, config_type_weights='{"Default":1.0}', huber_delta=0.01, optimizer='adam', batch_size=16, valid_batch_size=32, lr=0.005, swa_lr=0.001, weight_decay=1e-08, amsgrad=True, scheduler='ReduceLROnPlateau', lr_factor=0.8, scheduler_patience=5, lr_scheduler_gamma=0.9993, swa=False, start_swa=None, ema=True, ema_decay=0.995, max_num_epochs=200, patience=50, eval_interval=1, keep_checkpoints=True, restart_latest=True, save_cpu=True, clip_grad=100.0, wandb=True, wandb_project='mace-universal', wandb_entity='astagroup', wandb_name='10-128-L1-universal-branch', wandb_log_hypers=['num_channels', 'max_L', 'correlation', 'lr', 'swa_lr', 'weight_decay', 'batch_size', 'max_num_epochs', 'start_swa', 'energy_weight', 'forces_weight']) 2024-07-09 11:58:58.219 INFO: CUDA version: 
11.8, CUDA device: 0 2024-07-09 11:58:58.219 INFO: Using statistics json file 2024-07-09 11:58:58.219 INFO: Using atomic numbers from statistics file 2024-07-09 11:58:58.219 INFO: AtomicNumberTable: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 89, 90, 91, 92, 93, 94) 2024-07-09 11:58:58.219 INFO: Atomic Energies not in training file, using command line argument E0s 2024-07-09 11:58:58.220 INFO: Atomic energies: [-1.11723232, -0.00045595, -0.29734917, -0.04262353, -0.2911712, -1.26281801, -3.12555634, -1.54690765, -0.43794547, -0.01216023, -0.22858276, -0.00994627, -0.21672837, -0.82583191, -1.88719667, -0.89091719, -0.25828681, -0.0235315, -0.17827125, -0.02596217, -2.12966897, -2.40532262, -3.61232779, -5.44620624, -5.14592659, -3.30583367, -1.66614587, -0.28412403, -0.23745594, -0.01098351, -0.19854295, -0.77924665, -1.70136472, -0.78345919, -0.22687512, -0.02265396, -0.16194042, -0.02823145, -2.25679622, -2.23742918, -2.53481909, -4.60213279, -3.40289704, -1.68884293, -1.44016062, -1.47521138, -0.19840574, -0.01374787, -0.19672488, -0.67963499, -1.4302063, -0.6573123, -0.18858477, -0.01267614, -0.13452777, -0.03157029, -0.62794477, -1.43642821, -0.18584352, -0.25876002, -0.25695685, -0.2542358, -9.48606277, -8.11540027, -0.14584339, -0.19488481, -0.14569527, -0.2516406, -0.16381585, -0.25265734, -0.25255978, -3.49292389, -3.5659314, -4.57101127, -4.63436797, -2.88280809, -1.42793567, -0.50244445, -0.18479218, -0.0105212, -0.17939998, -0.63069886, -1.32462383, -0.24210133, -1.04419147, -2.03239022, -4.6443113, -7.30273499, -10.39244586] 2024-07-09 11:59:29.677 INFO: UniversalLoss(energy_weight=1.000, forces_weight=10.000, stress_weight=100.000) 2024-07-09 11:59:29.680 INFO: Average number 
of neighbors: 61.964672446250916 2024-07-09 11:59:29.680 INFO: Selected the following outputs: {'energy': True, 'forces': True, 'virials': False, 'stress': True, 'dipoles': False} 2024-07-09 11:59:29.680 INFO: Building model 2024-07-09 11:59:29.681 INFO: Hidden irreps: 128x0e+128x1o 2024-07-09 11:59:31.903 WARNING: No SWA checkpoint found, while SWA is enabled. Compare the swa_start parameter and the latest checkpoint. 2024-07-09 11:59:31.906 INFO: Loading checkpoint: checkpoints/10-128-L1-universal-branch_run-1_epoch-91.pt 2024-07-09 11:59:32.127 INFO: ScaleShiftMACE( (node_embedding): LinearNodeEmbeddingBlock( (linear): Linear(89x0e -> 128x0e | 11392 weights) ) (radial_embedding): RadialEmbeddingBlock( (bessel_fn): BesselBasis(r_max=6.0, num_basis=10, trainable=False) (cutoff_fn): PolynomialCutoff(p=5.0, r_max=6.0) ) (spherical_harmonics): SphericalHarmonics() (atomic_energies_fn): AtomicEnergiesBlock(energies=[-1.1172, -0.0005, -0.2973, -0.0426, -0.2912, -1.2628, -3.1256, -1.5469, -0.4379, -0.0122, -0.2286, -0.0099, -0.2167, -0.8258, -1.8872, -0.8909, -0.2583, -0.0235, -0.1783, -0.0260, -2.1297, -2.4053, -3.6123, -5.4462, -5.1459, -3.3058, -1.6661, -0.2841, -0.2375, -0.0110, -0.1985, -0.7792, -1.7014, -0.7835, -0.2269, -0.0227, -0.1619, -0.0282, -2.2568, -2.2374, -2.5348, -4.6021, -3.4029, -1.6888, -1.4402, -1.4752, -0.1984, -0.0137, -0.1967, -0.6796, -1.4302, -0.6573, -0.1886, -0.0127, -0.1345, -0.0316, -0.6279, -1.4364, -0.1858, -0.2588, -0.2570, -0.2542, -9.4861, -8.1154, -0.1458, -0.1949, -0.1457, -0.2516, -0.1638, -0.2527, -0.2526, -3.4929, -3.5659, -4.5710, -4.6344, -2.8828, -1.4279, -0.5024, -0.1848, -0.0105, -0.1794, -0.6307, -1.3246, -0.2421, -1.0442, -2.0324, -4.6443, -7.3027, -10.3924]) (interactions): ModuleList( (0): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e -> 128x0e | 16384 weights) (conv_tp): TensorProduct(128x0e x 1x0e+1x1o+1x2e+1x3o -> 128x0e+128x1o+128x2e+128x3o | 512 paths | 512 weights) (conv_tp_weights): 
FullyConnectedNet[10, 64, 64, 64, 512] (linear): Linear(128x0e+128x1o+128x2e+128x3o -> 128x0e+128x1o+128x2e+128x3o | 65536 weights) (skip_tp): FullyConnectedTensorProduct(128x0e x 89x0e -> 128x0e+128x1o | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) (1): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) (conv_tp): TensorProduct(128x0e+128x1o x 1x0e+1x1o+1x2e+1x3o -> 256x0e+384x1o+384x2e+256x3o | 1280 paths | 1280 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 1280] (linear): Linear(256x0e+384x1o+384x2e+256x3o -> 128x0e+128x1o+128x2e+128x3o | 163840 weights) (skip_tp): FullyConnectedTensorProduct(128x0e+128x1o x 89x0e -> 128x0e | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) ) (products): ModuleList( (0): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) (1): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x6x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) ) (1): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: 
[torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e -> 128x0e | 16384 weights) ) ) (readouts): ModuleList( (0): LinearReadoutBlock( (linear): Linear(128x0e+128x1o -> 1x0e | 128 weights) ) (1): NonLinearReadoutBlock( (linear_1): Linear(128x0e -> 16x0e | 2048 weights) (non_linearity): Activation [x] (16x0e -> 16x0e) (linear_2): Linear(16x0e -> 1x0e | 16 weights) ) ) (scale_shift): ScaleShiftBlock(scale=0.804154, shift=0.164097) ) 2024-07-09 11:59:32.132 INFO: Number of parameters: 4688656 2024-07-09 11:59:32.133 INFO: Optimizer: Adam ( Parameter Group 0 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: embedding weight_decay: 0.0 Parameter Group 1 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: interactions_decay weight_decay: 1e-08 Parameter Group 2 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: interactions_no_decay weight_decay: 0.0 Parameter Group 3 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: products weight_decay: 1e-08 Parameter Group 4 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: readouts weight_decay: 0.0 ) 2024-07-09 11:59:32.133 INFO: Using Weights and Biases for logging 2024-07-09 11:59:49.405 INFO: Using gradient clipping with tolerance=100.000 2024-07-09 11:59:49.406 INFO: Started training 2024-07-09 11:59:55.936 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 11:59:55.936 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-09 12:18:12.498 INFO: Epoch 91: loss=5.7848e-03, MAE_E_per_atom=26.3110 meV, MAE_F=50.8058 meV / A, MAE_stress_per_atom=0.1230 meV / A^3
2024-07-09 12:28:18.048 INFO: Epoch 92: loss=5.7805e-03, MAE_E_per_atom=26.2537 meV, MAE_F=50.4783 meV / A, MAE_stress_per_atom=0.1228 meV / A^3
2024-07-09 12:38:20.423 INFO: Epoch 93: loss=5.8093e-03, MAE_E_per_atom=26.2371 meV, MAE_F=50.9029 meV / A, MAE_stress_per_atom=0.1229 meV / A^3
2024-07-09 12:48:22.321 INFO: Epoch 94: loss=5.7513e-03, MAE_E_per_atom=25.8989 meV, MAE_F=50.7677 meV / A, MAE_stress_per_atom=0.1192 meV / A^3
2024-07-09 12:58:20.694 INFO: Epoch 95: loss=5.7421e-03, MAE_E_per_atom=25.8444 meV, MAE_F=50.6317 meV / A, MAE_stress_per_atom=0.1238 meV / A^3
2024-07-09 13:08:21.029 INFO: Epoch 96: loss=5.7463e-03, MAE_E_per_atom=26.2087 meV, MAE_F=50.3935 meV / A, MAE_stress_per_atom=0.1229 meV / A^3
2024-07-09 13:18:21.134 INFO: Epoch 97: loss=5.7255e-03, MAE_E_per_atom=25.9672 meV, MAE_F=50.4724 meV / A, MAE_stress_per_atom=0.1221 meV / A^3
2024-07-09 13:28:21.660 INFO: Epoch 98: loss=5.7243e-03, MAE_E_per_atom=25.7959 meV, MAE_F=50.2609 meV / A, MAE_stress_per_atom=0.1202 meV / A^3
2024-07-09 13:38:23.464 INFO: Epoch 99: loss=5.7378e-03, MAE_E_per_atom=26.1411 meV, MAE_F=50.2235 meV / A, MAE_stress_per_atom=0.1255 meV / A^3
2024-07-09 13:48:26.176 INFO: Epoch 100: loss=5.6904e-03, MAE_E_per_atom=25.6813 meV, MAE_F=50.0939 meV / A, MAE_stress_per_atom=0.1234 meV / A^3
2024-07-09 15:47:21.920 INFO: Process group
initialized: True 2024-07-09 15:47:21.922 INFO: Processes: 80 2024-07-09 15:47:21.922 INFO: MACE version: 0.3.0 2024-07-09 15:47:21.922 INFO: Configuration: Namespace(name='10-128-L1-universal-branch', seed=1, log_dir='logs', model_dir='.', checkpoints_dir='checkpoints', results_dir='results', downloads_dir='downloads', device='cuda', default_dtype='float64', distributed=True, log_level='INFO', error_table='PerAtomMAE', model='ScaleShiftMACE', r_max=6.0, radial_type='bessel', num_radial_basis=10, num_cutoff_basis=5, interaction='RealAgnosticResidualInteractionBlock', interaction_first='RealAgnosticResidualInteractionBlock', max_ell=3, correlation=3, num_interactions=2, MLP_irreps='16x0e', radial_MLP='[64, 64, 64]', hidden_irreps='128x0e + 128x1o', num_channels=128, max_L=1, gate='silu', scaling='rms_forces_scaling', avg_num_neighbors=1, compute_avg_num_neighbors=True, compute_stress=True, compute_forces=True, train_file='../../dataset/mptrj-gga-ggapu-train', valid_file='../../dataset/mptrj-gga-ggapu-val', valid_fraction=0.1, test_file=None, test_dir=None, multi_processed_test=False, num_workers=16, pin_memory=True, atomic_numbers=None, mean=None, std=None, statistics_file='../../dataset/mptrj-gga-ggapu-statistics-isolated.json', E0s=None, energy_key='energy', forces_key='forces', virials_key='virials', stress_key='stress', dipole_key='dipole', charges_key='charges', loss='universal', forces_weight=10.0, swa_forces_weight=100.0, energy_weight=1.0, swa_energy_weight=1000.0, virials_weight=1.0, swa_virials_weight=10.0, stress_weight=100.0, swa_stress_weight=10.0, dipole_weight=1.0, swa_dipole_weight=1.0, config_type_weights='{"Default":1.0}', huber_delta=0.01, optimizer='adam', batch_size=16, valid_batch_size=32, lr=0.005, swa_lr=0.001, weight_decay=1e-08, amsgrad=True, scheduler='ReduceLROnPlateau', lr_factor=0.8, scheduler_patience=5, lr_scheduler_gamma=0.9993, swa=False, start_swa=None, ema=True, ema_decay=0.995, max_num_epochs=200, patience=50, eval_interval=1, 
keep_checkpoints=True, restart_latest=True, save_cpu=True, clip_grad=100.0, wandb=True, wandb_project='mace-universal', wandb_entity='astagroup', wandb_name='10-128-L1-universal-branch', wandb_log_hypers=['num_channels', 'max_L', 'correlation', 'lr', 'swa_lr', 'weight_decay', 'batch_size', 'max_num_epochs', 'start_swa', 'energy_weight', 'forces_weight']) 2024-07-09 15:47:21.922 INFO: CUDA version: 11.8, CUDA device: 0 2024-07-09 15:47:21.922 INFO: Using statistics json file 2024-07-09 15:47:21.922 INFO: Using atomic numbers from statistics file 2024-07-09 15:47:21.923 INFO: AtomicNumberTable: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 89, 90, 91, 92, 93, 94) 2024-07-09 15:47:21.923 INFO: Atomic Energies not in training file, using command line argument E0s 2024-07-09 15:47:21.923 INFO: Atomic energies: [-1.11723232, -0.00045595, -0.29734917, -0.04262353, -0.2911712, -1.26281801, -3.12555634, -1.54690765, -0.43794547, -0.01216023, -0.22858276, -0.00994627, -0.21672837, -0.82583191, -1.88719667, -0.89091719, -0.25828681, -0.0235315, -0.17827125, -0.02596217, -2.12966897, -2.40532262, -3.61232779, -5.44620624, -5.14592659, -3.30583367, -1.66614587, -0.28412403, -0.23745594, -0.01098351, -0.19854295, -0.77924665, -1.70136472, -0.78345919, -0.22687512, -0.02265396, -0.16194042, -0.02823145, -2.25679622, -2.23742918, -2.53481909, -4.60213279, -3.40289704, -1.68884293, -1.44016062, -1.47521138, -0.19840574, -0.01374787, -0.19672488, -0.67963499, -1.4302063, -0.6573123, -0.18858477, -0.01267614, -0.13452777, -0.03157029, -0.62794477, -1.43642821, -0.18584352, -0.25876002, -0.25695685, -0.2542358, -9.48606277, -8.11540027, -0.14584339, -0.19488481, -0.14569527, -0.2516406, -0.16381585, -0.25265734, 
-0.25255978, -3.49292389, -3.5659314, -4.57101127, -4.63436797, -2.88280809, -1.42793567, -0.50244445, -0.18479218, -0.0105212, -0.17939998, -0.63069886, -1.32462383, -0.24210133, -1.04419147, -2.03239022, -4.6443113, -7.30273499, -10.39244586] 2024-07-09 15:47:53.865 INFO: UniversalLoss(energy_weight=1.000, forces_weight=10.000, stress_weight=100.000) 2024-07-09 15:47:53.868 INFO: Average number of neighbors: 61.964672446250916 2024-07-09 15:47:53.868 INFO: Selected the following outputs: {'energy': True, 'forces': True, 'virials': False, 'stress': True, 'dipoles': False} 2024-07-09 15:47:53.868 INFO: Building model 2024-07-09 15:47:53.869 INFO: Hidden irreps: 128x0e+128x1o 2024-07-09 15:47:56.097 WARNING: No SWA checkpoint found, while SWA is enabled. Compare the swa_start parameter and the latest checkpoint. 2024-07-09 15:47:56.100 INFO: Loading checkpoint: checkpoints/10-128-L1-universal-branch_run-1_epoch-100.pt 2024-07-09 15:47:56.317 INFO: ScaleShiftMACE( (node_embedding): LinearNodeEmbeddingBlock( (linear): Linear(89x0e -> 128x0e | 11392 weights) ) (radial_embedding): RadialEmbeddingBlock( (bessel_fn): BesselBasis(r_max=6.0, num_basis=10, trainable=False) (cutoff_fn): PolynomialCutoff(p=5.0, r_max=6.0) ) (spherical_harmonics): SphericalHarmonics() (atomic_energies_fn): AtomicEnergiesBlock(energies=[-1.1172, -0.0005, -0.2973, -0.0426, -0.2912, -1.2628, -3.1256, -1.5469, -0.4379, -0.0122, -0.2286, -0.0099, -0.2167, -0.8258, -1.8872, -0.8909, -0.2583, -0.0235, -0.1783, -0.0260, -2.1297, -2.4053, -3.6123, -5.4462, -5.1459, -3.3058, -1.6661, -0.2841, -0.2375, -0.0110, -0.1985, -0.7792, -1.7014, -0.7835, -0.2269, -0.0227, -0.1619, -0.0282, -2.2568, -2.2374, -2.5348, -4.6021, -3.4029, -1.6888, -1.4402, -1.4752, -0.1984, -0.0137, -0.1967, -0.6796, -1.4302, -0.6573, -0.1886, -0.0127, -0.1345, -0.0316, -0.6279, -1.4364, -0.1858, -0.2588, -0.2570, -0.2542, -9.4861, -8.1154, -0.1458, -0.1949, -0.1457, -0.2516, -0.1638, -0.2527, -0.2526, -3.4929, -3.5659, -4.5710, 
-4.6344, -2.8828, -1.4279, -0.5024, -0.1848, -0.0105, -0.1794, -0.6307, -1.3246, -0.2421, -1.0442, -2.0324, -4.6443, -7.3027, -10.3924]) (interactions): ModuleList( (0): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e -> 128x0e | 16384 weights) (conv_tp): TensorProduct(128x0e x 1x0e+1x1o+1x2e+1x3o -> 128x0e+128x1o+128x2e+128x3o | 512 paths | 512 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 512] (linear): Linear(128x0e+128x1o+128x2e+128x3o -> 128x0e+128x1o+128x2e+128x3o | 65536 weights) (skip_tp): FullyConnectedTensorProduct(128x0e x 89x0e -> 128x0e+128x1o | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) (1): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) (conv_tp): TensorProduct(128x0e+128x1o x 1x0e+1x1o+1x2e+1x3o -> 256x0e+384x1o+384x2e+256x3o | 1280 paths | 1280 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 1280] (linear): Linear(256x0e+384x1o+384x2e+256x3o -> 128x0e+128x1o+128x2e+128x3o | 163840 weights) (skip_tp): FullyConnectedTensorProduct(128x0e+128x1o x 89x0e -> 128x0e | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) ) (products): ModuleList( (0): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) (1): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x6x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): 
GraphModule() ) ) ) (linear): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) ) (1): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e -> 128x0e | 16384 weights) ) ) (readouts): ModuleList( (0): LinearReadoutBlock( (linear): Linear(128x0e+128x1o -> 1x0e | 128 weights) ) (1): NonLinearReadoutBlock( (linear_1): Linear(128x0e -> 16x0e | 2048 weights) (non_linearity): Activation [x] (16x0e -> 16x0e) (linear_2): Linear(16x0e -> 1x0e | 16 weights) ) ) (scale_shift): ScaleShiftBlock(scale=0.804154, shift=0.164097) ) 2024-07-09 15:47:56.323 INFO: Number of parameters: 4688656 2024-07-09 15:47:56.323 INFO: Optimizer: Adam ( Parameter Group 0 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: embedding weight_decay: 0.0 Parameter Group 1 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: interactions_decay weight_decay: 1e-08 Parameter Group 2 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: interactions_no_decay weight_decay: 0.0 Parameter Group 3 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: products weight_decay: 1e-08 Parameter Group 4 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.004 maximize: False name: 
readouts weight_decay: 0.0 ) 2024-07-09 15:47:56.323 INFO: Using Weights and Biases for logging 2024-07-09 15:48:17.116 INFO: Using gradient clipping with tolerance=100.000 2024-07-09 15:48:17.117 INFO: Started training 2024-07-09 15:48:23.372 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 15:48:23.372 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 15:48:23.372 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 15:48:23.372 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 15:48:23.385 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 15:48:23.385 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 15:48:23.385 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 15:48:23.385 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 15:48:23.386 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 15:48:23.386 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 15:48:23.386 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 15:48:23.386 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 15:48:23.385 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 15:48:23.385 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 15:48:23.385 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 15:48:23.385 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 15:48:23.386 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 15:48:23.386 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 15:48:23.392 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 15:48:23.392 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 15:48:23.392 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-09 16:07:21.845 INFO: Epoch 100: loss=5.7034e-03, MAE_E_per_atom=25.4512 meV, MAE_F=50.2126 meV / A, MAE_stress_per_atom=0.1241 meV / A^3
2024-07-09 16:17:27.432 INFO: Epoch 101: loss=5.7176e-03, MAE_E_per_atom=25.7088 meV, MAE_F=49.7478 meV / A, MAE_stress_per_atom=0.1219 meV / A^3
2024-07-09 16:27:28.094 INFO: Epoch 102: loss=5.6645e-03, MAE_E_per_atom=25.9123 meV, MAE_F=49.8597 meV / A, MAE_stress_per_atom=0.1181 meV / A^3
2024-07-09 16:37:31.130 INFO: Epoch 103: loss=5.9876e-03, MAE_E_per_atom=26.8068 meV, MAE_F=52.3826 meV / A, MAE_stress_per_atom=0.1214 meV / A^3
2024-07-09 16:47:35.446 INFO: Epoch 104: loss=6.1750e-03, MAE_E_per_atom=27.4600 meV, MAE_F=54.2113 meV / A, MAE_stress_per_atom=0.1194 meV / A^3
2024-07-09 16:57:38.436 INFO: Epoch 105: loss=5.9335e-03, MAE_E_per_atom=26.5963 meV, MAE_F=51.1565 meV / A, MAE_stress_per_atom=0.1213 meV / A^3
2024-07-09 17:07:43.550 INFO: Epoch 106: loss=5.7768e-03, MAE_E_per_atom=26.6783 meV, MAE_F=50.3482 meV / A, MAE_stress_per_atom=0.1213 meV / A^3
2024-07-09 17:17:44.273 INFO: Epoch 107: loss=5.7546e-03,
MAE_E_per_atom=26.2097 meV, MAE_F=49.6599 meV / A, MAE_stress_per_atom=0.1202 meV / A^3 2024-07-09 17:27:46.336 INFO: Epoch 108: loss=5.7579e-03, MAE_E_per_atom=26.1997 meV, MAE_F=50.0296 meV / A, MAE_stress_per_atom=0.1200 meV / A^3 2024-07-09 17:37:48.232 INFO: Epoch 109: loss=5.7228e-03, MAE_E_per_atom=25.5717 meV, MAE_F=49.5679 meV / A, MAE_stress_per_atom=0.1240 meV / A^3 2024-07-09 22:21:31.191 INFO: Process group initialized: True 2024-07-09 22:21:31.193 INFO: Processes: 80 2024-07-09 22:21:31.193 INFO: MACE version: 0.3.0 2024-07-09 22:21:31.193 INFO: Configuration: Namespace(name='10-128-L1-universal-branch', seed=1, log_dir='logs', model_dir='.', checkpoints_dir='checkpoints', results_dir='results', downloads_dir='downloads', device='cuda', default_dtype='float64', distributed=True, log_level='INFO', error_table='PerAtomMAE', model='ScaleShiftMACE', r_max=6.0, radial_type='bessel', num_radial_basis=10, num_cutoff_basis=5, interaction='RealAgnosticResidualInteractionBlock', interaction_first='RealAgnosticResidualInteractionBlock', max_ell=3, correlation=3, num_interactions=2, MLP_irreps='16x0e', radial_MLP='[64, 64, 64]', hidden_irreps='128x0e + 128x1o', num_channels=128, max_L=1, gate='silu', scaling='rms_forces_scaling', avg_num_neighbors=1, compute_avg_num_neighbors=True, compute_stress=True, compute_forces=True, train_file='../../dataset/mptrj-gga-ggapu-train', valid_file='../../dataset/mptrj-gga-ggapu-val', valid_fraction=0.1, test_file=None, test_dir=None, multi_processed_test=False, num_workers=16, pin_memory=True, atomic_numbers=None, mean=None, std=None, statistics_file='../../dataset/mptrj-gga-ggapu-statistics-isolated.json', E0s=None, energy_key='energy', forces_key='forces', virials_key='virials', stress_key='stress', dipole_key='dipole', charges_key='charges', loss='universal', forces_weight=10.0, swa_forces_weight=100.0, energy_weight=1.0, swa_energy_weight=1000.0, virials_weight=1.0, swa_virials_weight=10.0, stress_weight=100.0, 
swa_stress_weight=10.0, dipole_weight=1.0, swa_dipole_weight=1.0, config_type_weights='{"Default":1.0}', huber_delta=0.01, optimizer='adam', batch_size=16, valid_batch_size=32, lr=0.005, swa_lr=0.001, weight_decay=1e-08, amsgrad=True, scheduler='ReduceLROnPlateau', lr_factor=0.8, scheduler_patience=5, lr_scheduler_gamma=0.9993, swa=False, start_swa=None, ema=True, ema_decay=0.995, max_num_epochs=200, patience=50, eval_interval=1, keep_checkpoints=True, restart_latest=True, save_cpu=True, clip_grad=100.0, wandb=True, wandb_project='mace-universal', wandb_entity='astagroup', wandb_name='10-128-L1-universal-branch', wandb_log_hypers=['num_channels', 'max_L', 'correlation', 'lr', 'swa_lr', 'weight_decay', 'batch_size', 'max_num_epochs', 'start_swa', 'energy_weight', 'forces_weight']) 2024-07-09 22:21:31.193 INFO: CUDA version: 11.8, CUDA device: 0 2024-07-09 22:21:31.194 INFO: Using statistics json file 2024-07-09 22:21:31.194 INFO: Using atomic numbers from statistics file 2024-07-09 22:21:31.194 INFO: AtomicNumberTable: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 89, 90, 91, 92, 93, 94) 2024-07-09 22:21:31.194 INFO: Atomic Energies not in training file, using command line argument E0s 2024-07-09 22:21:31.195 INFO: Atomic energies: [-1.11723232, -0.00045595, -0.29734917, -0.04262353, -0.2911712, -1.26281801, -3.12555634, -1.54690765, -0.43794547, -0.01216023, -0.22858276, -0.00994627, -0.21672837, -0.82583191, -1.88719667, -0.89091719, -0.25828681, -0.0235315, -0.17827125, -0.02596217, -2.12966897, -2.40532262, -3.61232779, -5.44620624, -5.14592659, -3.30583367, -1.66614587, -0.28412403, -0.23745594, -0.01098351, -0.19854295, -0.77924665, -1.70136472, -0.78345919, -0.22687512, -0.02265396, 
-0.16194042, -0.02823145, -2.25679622, -2.23742918, -2.53481909, -4.60213279, -3.40289704, -1.68884293, -1.44016062, -1.47521138, -0.19840574, -0.01374787, -0.19672488, -0.67963499, -1.4302063, -0.6573123, -0.18858477, -0.01267614, -0.13452777, -0.03157029, -0.62794477, -1.43642821, -0.18584352, -0.25876002, -0.25695685, -0.2542358, -9.48606277, -8.11540027, -0.14584339, -0.19488481, -0.14569527, -0.2516406, -0.16381585, -0.25265734, -0.25255978, -3.49292389, -3.5659314, -4.57101127, -4.63436797, -2.88280809, -1.42793567, -0.50244445, -0.18479218, -0.0105212, -0.17939998, -0.63069886, -1.32462383, -0.24210133, -1.04419147, -2.03239022, -4.6443113, -7.30273499, -10.39244586] 2024-07-09 22:22:05.663 INFO: UniversalLoss(energy_weight=1.000, forces_weight=10.000, stress_weight=100.000) 2024-07-09 22:22:05.666 INFO: Average number of neighbors: 61.964672446250916 2024-07-09 22:22:05.666 INFO: Selected the following outputs: {'energy': True, 'forces': True, 'virials': False, 'stress': True, 'dipoles': False} 2024-07-09 22:22:05.666 INFO: Building model 2024-07-09 22:22:05.667 INFO: Hidden irreps: 128x0e+128x1o 2024-07-09 22:22:07.860 WARNING: No SWA checkpoint found, while SWA is enabled. Compare the swa_start parameter and the latest checkpoint. 
2024-07-09 22:22:07.863 INFO: Loading checkpoint: checkpoints/10-128-L1-universal-branch_run-1_epoch-109.pt 2024-07-09 22:22:08.091 INFO: ScaleShiftMACE( (node_embedding): LinearNodeEmbeddingBlock( (linear): Linear(89x0e -> 128x0e | 11392 weights) ) (radial_embedding): RadialEmbeddingBlock( (bessel_fn): BesselBasis(r_max=6.0, num_basis=10, trainable=False) (cutoff_fn): PolynomialCutoff(p=5.0, r_max=6.0) ) (spherical_harmonics): SphericalHarmonics() (atomic_energies_fn): AtomicEnergiesBlock(energies=[-1.1172, -0.0005, -0.2973, -0.0426, -0.2912, -1.2628, -3.1256, -1.5469, -0.4379, -0.0122, -0.2286, -0.0099, -0.2167, -0.8258, -1.8872, -0.8909, -0.2583, -0.0235, -0.1783, -0.0260, -2.1297, -2.4053, -3.6123, -5.4462, -5.1459, -3.3058, -1.6661, -0.2841, -0.2375, -0.0110, -0.1985, -0.7792, -1.7014, -0.7835, -0.2269, -0.0227, -0.1619, -0.0282, -2.2568, -2.2374, -2.5348, -4.6021, -3.4029, -1.6888, -1.4402, -1.4752, -0.1984, -0.0137, -0.1967, -0.6796, -1.4302, -0.6573, -0.1886, -0.0127, -0.1345, -0.0316, -0.6279, -1.4364, -0.1858, -0.2588, -0.2570, -0.2542, -9.4861, -8.1154, -0.1458, -0.1949, -0.1457, -0.2516, -0.1638, -0.2527, -0.2526, -3.4929, -3.5659, -4.5710, -4.6344, -2.8828, -1.4279, -0.5024, -0.1848, -0.0105, -0.1794, -0.6307, -1.3246, -0.2421, -1.0442, -2.0324, -4.6443, -7.3027, -10.3924]) (interactions): ModuleList( (0): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e -> 128x0e | 16384 weights) (conv_tp): TensorProduct(128x0e x 1x0e+1x1o+1x2e+1x3o -> 128x0e+128x1o+128x2e+128x3o | 512 paths | 512 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 512] (linear): Linear(128x0e+128x1o+128x2e+128x3o -> 128x0e+128x1o+128x2e+128x3o | 65536 weights) (skip_tp): FullyConnectedTensorProduct(128x0e x 89x0e -> 128x0e+128x1o | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) (1): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) (conv_tp): TensorProduct(128x0e+128x1o x 
1x0e+1x1o+1x2e+1x3o -> 256x0e+384x1o+384x2e+256x3o | 1280 paths | 1280 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 1280] (linear): Linear(256x0e+384x1o+384x2e+256x3o -> 128x0e+128x1o+128x2e+128x3o | 163840 weights) (skip_tp): FullyConnectedTensorProduct(128x0e+128x1o x 89x0e -> 128x0e | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) ) (products): ModuleList( (0): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) (1): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x6x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) ) (1): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e -> 128x0e | 16384 weights) ) ) (readouts): ModuleList( (0): LinearReadoutBlock( (linear): Linear(128x0e+128x1o -> 1x0e | 128 weights) ) (1): NonLinearReadoutBlock( (linear_1): Linear(128x0e -> 16x0e | 2048 weights) (non_linearity): Activation [x] 
(16x0e -> 16x0e) (linear_2): Linear(16x0e -> 1x0e | 16 weights) ) ) (scale_shift): ScaleShiftBlock(scale=0.804154, shift=0.164097) ) 2024-07-09 22:22:08.096 INFO: Number of parameters: 4688656 2024-07-09 22:22:08.097 INFO: Optimizer: Adam ( Parameter Group 0 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0032 maximize: False name: embedding weight_decay: 0.0 Parameter Group 1 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0032 maximize: False name: interactions_decay weight_decay: 1e-08 Parameter Group 2 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0032 maximize: False name: interactions_no_decay weight_decay: 0.0 Parameter Group 3 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0032 maximize: False name: products weight_decay: 1e-08 Parameter Group 4 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0032 maximize: False name: readouts weight_decay: 0.0 ) 2024-07-09 22:22:08.097 INFO: Using Weights and Biases for logging 2024-07-09 22:22:25.293 INFO: Using gradient clipping with tolerance=100.000 2024-07-09 22:22:25.294 INFO: Started training 2024-07-09 22:22:31.719 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.720 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.719 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.720 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.720 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.720 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.720 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-09 22:22:31.720 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.731 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.731 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.731 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.734 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.734 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.734 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.731 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.736 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.736 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.736 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.736 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.736 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.731 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.734 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.734 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.734 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.731 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.736 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.736 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.736 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.736 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.736 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.731 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-09 22:22:31.734 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.734 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.734 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.731 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.736 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.736 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.736 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.736 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.736 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.732 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.734 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.734 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.734 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.736 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.736 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.736 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:22:31.736 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-09 22:22:31.736 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-09 22:40:49.077 INFO: Epoch 109: loss=5.7324e-03, MAE_E_per_atom=25.5345 meV, MAE_F=49.6925 meV / A, MAE_stress_per_atom=0.1220 meV / A^3 2024-07-09 22:50:52.543 INFO: Epoch 110: loss=5.6599e-03, MAE_E_per_atom=25.4750 meV, MAE_F=49.1913 meV / A, MAE_stress_per_atom=0.1246 meV / A^3 2024-07-09 23:00:52.720 INFO: Epoch 111: loss=5.6823e-03, MAE_E_per_atom=25.5761 meV, MAE_F=49.3494 meV / A, MAE_stress_per_atom=0.1197 meV / A^3 2024-07-09 23:10:51.033 INFO: Epoch 112: loss=5.7017e-03, MAE_E_per_atom=25.5762 meV, MAE_F=49.7760 meV / A, MAE_stress_per_atom=0.1194 meV / A^3 2024-07-09 23:20:52.957 INFO: Epoch 113: loss=5.7659e-03, MAE_E_per_atom=25.3761 meV, MAE_F=49.8041 meV / A, MAE_stress_per_atom=0.1239 meV / A^3 2024-07-09 23:30:55.397 INFO: Epoch 114: loss=5.6921e-03, MAE_E_per_atom=25.0454 meV, MAE_F=49.3711 meV / A, MAE_stress_per_atom=0.1188 meV / A^3 2024-07-09 23:40:52.928 INFO: Epoch 115: loss=5.7213e-03, MAE_E_per_atom=25.1792 meV, MAE_F=49.4671 meV / A, MAE_stress_per_atom=0.1277 meV / A^3 2024-07-09 23:50:50.678 INFO: Epoch 116: loss=5.6729e-03, MAE_E_per_atom=25.5875 meV, MAE_F=49.5092 meV / A, MAE_stress_per_atom=0.1179 meV / A^3 2024-07-10 00:00:48.022 INFO: Epoch 117: loss=5.6711e-03, MAE_E_per_atom=24.8345 meV, MAE_F=49.4315 meV / A, MAE_stress_per_atom=0.1185 meV / A^3 2024-07-10 00:10:46.615 INFO: Epoch 118: loss=5.6314e-03, MAE_E_per_atom=24.7852 meV, MAE_F=49.1545 meV / A, MAE_stress_per_atom=0.1220 meV / A^3 2024-07-10 02:07:45.219 INFO: Process group initialized: True 2024-07-10 02:07:45.221 INFO: Processes: 80 2024-07-10 02:07:45.221 INFO: MACE version: 0.3.0 2024-07-10 02:07:45.221 INFO: Configuration: Namespace(name='10-128-L1-universal-branch', seed=1, log_dir='logs', model_dir='.', checkpoints_dir='checkpoints', results_dir='results', downloads_dir='downloads', device='cuda', default_dtype='float64', distributed=True, log_level='INFO', 
error_table='PerAtomMAE', model='ScaleShiftMACE', r_max=6.0, radial_type='bessel', num_radial_basis=10, num_cutoff_basis=5, interaction='RealAgnosticResidualInteractionBlock', interaction_first='RealAgnosticResidualInteractionBlock', max_ell=3, correlation=3, num_interactions=2, MLP_irreps='16x0e', radial_MLP='[64, 64, 64]', hidden_irreps='128x0e + 128x1o', num_channels=128, max_L=1, gate='silu', scaling='rms_forces_scaling', avg_num_neighbors=1, compute_avg_num_neighbors=True, compute_stress=True, compute_forces=True, train_file='../../dataset/mptrj-gga-ggapu-train', valid_file='../../dataset/mptrj-gga-ggapu-val', valid_fraction=0.1, test_file=None, test_dir=None, multi_processed_test=False, num_workers=16, pin_memory=True, atomic_numbers=None, mean=None, std=None, statistics_file='../../dataset/mptrj-gga-ggapu-statistics-isolated.json', E0s=None, energy_key='energy', forces_key='forces', virials_key='virials', stress_key='stress', dipole_key='dipole', charges_key='charges', loss='universal', forces_weight=10.0, swa_forces_weight=100.0, energy_weight=1.0, swa_energy_weight=1000.0, virials_weight=1.0, swa_virials_weight=10.0, stress_weight=100.0, swa_stress_weight=10.0, dipole_weight=1.0, swa_dipole_weight=1.0, config_type_weights='{"Default":1.0}', huber_delta=0.01, optimizer='adam', batch_size=16, valid_batch_size=32, lr=0.005, swa_lr=0.001, weight_decay=1e-08, amsgrad=True, scheduler='ReduceLROnPlateau', lr_factor=0.8, scheduler_patience=5, lr_scheduler_gamma=0.9993, swa=False, start_swa=None, ema=True, ema_decay=0.995, max_num_epochs=200, patience=50, eval_interval=1, keep_checkpoints=True, restart_latest=True, save_cpu=True, clip_grad=100.0, wandb=True, wandb_project='mace-universal', wandb_entity='astagroup', wandb_name='10-128-L1-universal-branch', wandb_log_hypers=['num_channels', 'max_L', 'correlation', 'lr', 'swa_lr', 'weight_decay', 'batch_size', 'max_num_epochs', 'start_swa', 'energy_weight', 'forces_weight']) 2024-07-10 02:07:45.222 INFO: CUDA version: 
11.8, CUDA device: 0 2024-07-10 02:07:45.222 INFO: Using statistics json file 2024-07-10 02:07:45.222 INFO: Using atomic numbers from statistics file 2024-07-10 02:07:45.222 INFO: AtomicNumberTable: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 89, 90, 91, 92, 93, 94) 2024-07-10 02:07:45.222 INFO: Atomic Energies not in training file, using command line argument E0s 2024-07-10 02:07:45.223 INFO: Atomic energies: [-1.11723232, -0.00045595, -0.29734917, -0.04262353, -0.2911712, -1.26281801, -3.12555634, -1.54690765, -0.43794547, -0.01216023, -0.22858276, -0.00994627, -0.21672837, -0.82583191, -1.88719667, -0.89091719, -0.25828681, -0.0235315, -0.17827125, -0.02596217, -2.12966897, -2.40532262, -3.61232779, -5.44620624, -5.14592659, -3.30583367, -1.66614587, -0.28412403, -0.23745594, -0.01098351, -0.19854295, -0.77924665, -1.70136472, -0.78345919, -0.22687512, -0.02265396, -0.16194042, -0.02823145, -2.25679622, -2.23742918, -2.53481909, -4.60213279, -3.40289704, -1.68884293, -1.44016062, -1.47521138, -0.19840574, -0.01374787, -0.19672488, -0.67963499, -1.4302063, -0.6573123, -0.18858477, -0.01267614, -0.13452777, -0.03157029, -0.62794477, -1.43642821, -0.18584352, -0.25876002, -0.25695685, -0.2542358, -9.48606277, -8.11540027, -0.14584339, -0.19488481, -0.14569527, -0.2516406, -0.16381585, -0.25265734, -0.25255978, -3.49292389, -3.5659314, -4.57101127, -4.63436797, -2.88280809, -1.42793567, -0.50244445, -0.18479218, -0.0105212, -0.17939998, -0.63069886, -1.32462383, -0.24210133, -1.04419147, -2.03239022, -4.6443113, -7.30273499, -10.39244586] 2024-07-10 02:08:17.217 INFO: UniversalLoss(energy_weight=1.000, forces_weight=10.000, stress_weight=100.000) 2024-07-10 02:08:17.219 INFO: Average number 
of neighbors: 61.964672446250916 2024-07-10 02:08:17.219 INFO: Selected the following outputs: {'energy': True, 'forces': True, 'virials': False, 'stress': True, 'dipoles': False} 2024-07-10 02:08:17.219 INFO: Building model 2024-07-10 02:08:17.220 INFO: Hidden irreps: 128x0e+128x1o 2024-07-10 02:08:19.424 WARNING: No SWA checkpoint found, while SWA is enabled. Compare the swa_start parameter and the latest checkpoint. 2024-07-10 02:08:19.426 INFO: Loading checkpoint: checkpoints/10-128-L1-universal-branch_run-1_epoch-118.pt 2024-07-10 02:08:19.650 INFO: ScaleShiftMACE( (node_embedding): LinearNodeEmbeddingBlock( (linear): Linear(89x0e -> 128x0e | 11392 weights) ) (radial_embedding): RadialEmbeddingBlock( (bessel_fn): BesselBasis(r_max=6.0, num_basis=10, trainable=False) (cutoff_fn): PolynomialCutoff(p=5.0, r_max=6.0) ) (spherical_harmonics): SphericalHarmonics() (atomic_energies_fn): AtomicEnergiesBlock(energies=[-1.1172, -0.0005, -0.2973, -0.0426, -0.2912, -1.2628, -3.1256, -1.5469, -0.4379, -0.0122, -0.2286, -0.0099, -0.2167, -0.8258, -1.8872, -0.8909, -0.2583, -0.0235, -0.1783, -0.0260, -2.1297, -2.4053, -3.6123, -5.4462, -5.1459, -3.3058, -1.6661, -0.2841, -0.2375, -0.0110, -0.1985, -0.7792, -1.7014, -0.7835, -0.2269, -0.0227, -0.1619, -0.0282, -2.2568, -2.2374, -2.5348, -4.6021, -3.4029, -1.6888, -1.4402, -1.4752, -0.1984, -0.0137, -0.1967, -0.6796, -1.4302, -0.6573, -0.1886, -0.0127, -0.1345, -0.0316, -0.6279, -1.4364, -0.1858, -0.2588, -0.2570, -0.2542, -9.4861, -8.1154, -0.1458, -0.1949, -0.1457, -0.2516, -0.1638, -0.2527, -0.2526, -3.4929, -3.5659, -4.5710, -4.6344, -2.8828, -1.4279, -0.5024, -0.1848, -0.0105, -0.1794, -0.6307, -1.3246, -0.2421, -1.0442, -2.0324, -4.6443, -7.3027, -10.3924]) (interactions): ModuleList( (0): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e -> 128x0e | 16384 weights) (conv_tp): TensorProduct(128x0e x 1x0e+1x1o+1x2e+1x3o -> 128x0e+128x1o+128x2e+128x3o | 512 paths | 512 weights) (conv_tp_weights): 
FullyConnectedNet[10, 64, 64, 64, 512] (linear): Linear(128x0e+128x1o+128x2e+128x3o -> 128x0e+128x1o+128x2e+128x3o | 65536 weights) (skip_tp): FullyConnectedTensorProduct(128x0e x 89x0e -> 128x0e+128x1o | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) (1): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) (conv_tp): TensorProduct(128x0e+128x1o x 1x0e+1x1o+1x2e+1x3o -> 256x0e+384x1o+384x2e+256x3o | 1280 paths | 1280 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 1280] (linear): Linear(256x0e+384x1o+384x2e+256x3o -> 128x0e+128x1o+128x2e+128x3o | 163840 weights) (skip_tp): FullyConnectedTensorProduct(128x0e+128x1o x 89x0e -> 128x0e | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) ) (products): ModuleList( (0): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) (1): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x6x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) ) (1): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: 
[torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e -> 128x0e | 16384 weights) ) ) (readouts): ModuleList( (0): LinearReadoutBlock( (linear): Linear(128x0e+128x1o -> 1x0e | 128 weights) ) (1): NonLinearReadoutBlock( (linear_1): Linear(128x0e -> 16x0e | 2048 weights) (non_linearity): Activation [x] (16x0e -> 16x0e) (linear_2): Linear(16x0e -> 1x0e | 16 weights) ) ) (scale_shift): ScaleShiftBlock(scale=0.804154, shift=0.164097) ) 2024-07-10 02:08:19.656 INFO: Number of parameters: 4688656 2024-07-10 02:08:19.656 INFO: Optimizer: Adam ( Parameter Group 0 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.00256 maximize: False name: embedding weight_decay: 0.0 Parameter Group 1 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.00256 maximize: False name: interactions_decay weight_decay: 1e-08 Parameter Group 2 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.00256 maximize: False name: interactions_no_decay weight_decay: 0.0 Parameter Group 3 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.00256 maximize: False name: products weight_decay: 1e-08 Parameter Group 4 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.00256 maximize: False name: readouts weight_decay: 0.0 ) 2024-07-10 02:08:19.657 INFO: Using Weights and Biases for logging 2024-07-10 02:08:33.230 INFO: Using gradient clipping with tolerance=100.000 2024-07-10 02:08:33.230 INFO: Started training 2024-07-10 02:08:39.894 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.894 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-10 02:08:39.894 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.894 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.928 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.928 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.928 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.928 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.929 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.930 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:08:39.931 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 02:26:54.210 INFO: Epoch 118: loss=5.6327e-03, MAE_E_per_atom=24.7679 meV, MAE_F=49.2374 meV / A, MAE_stress_per_atom=0.1259 meV / A^3 2024-07-10 02:37:05.504 INFO: Epoch 119: loss=5.6769e-03, MAE_E_per_atom=25.0451 meV, MAE_F=49.4403 meV / A, MAE_stress_per_atom=0.1150 meV / A^3 2024-07-10 02:47:08.533 INFO: Epoch 120: loss=5.6721e-03, MAE_E_per_atom=25.2207 meV, MAE_F=49.2650 meV / A, MAE_stress_per_atom=0.1141 meV / A^3 2024-07-10 02:57:08.811 INFO: Epoch 121: loss=5.6889e-03, MAE_E_per_atom=25.0819 meV, MAE_F=49.0803 meV / A, MAE_stress_per_atom=0.1153 meV / A^3 2024-07-10 03:07:08.103 INFO: Epoch 122: loss=5.6125e-03, MAE_E_per_atom=24.9912 meV, MAE_F=48.7956 meV / A, MAE_stress_per_atom=0.1158 meV / A^3 2024-07-10 03:17:07.874 INFO: Epoch 123: loss=5.6586e-03, MAE_E_per_atom=24.8593 meV, MAE_F=48.9651 meV / A, MAE_stress_per_atom=0.1166 meV / A^3 2024-07-10 03:27:10.650 INFO: Epoch 124: loss=5.6208e-03, MAE_E_per_atom=24.9527 meV, MAE_F=48.9533 meV / A, MAE_stress_per_atom=0.1146 meV / A^3 2024-07-10 03:37:11.550 INFO: Epoch 125: loss=5.6747e-03, MAE_E_per_atom=25.0541 meV, MAE_F=48.9886 meV / A, MAE_stress_per_atom=0.1208 meV / A^3 2024-07-10 03:47:11.496 INFO: Epoch 126: loss=5.6821e-03, MAE_E_per_atom=25.0197 meV, MAE_F=48.6936 meV / A, MAE_stress_per_atom=0.1159 meV / A^3 2024-07-10 03:57:14.376 INFO: Epoch 127: loss=5.6816e-03, MAE_E_per_atom=24.9288 meV, MAE_F=48.8223 meV / A, MAE_stress_per_atom=0.1189 meV / A^3 2024-07-10 05:42:33.806 INFO: Process 
group initialized: True 2024-07-10 05:42:33.808 INFO: Processes: 80 2024-07-10 05:42:33.809 INFO: MACE version: 0.3.0 2024-07-10 05:42:33.809 INFO: Configuration: Namespace(name='10-128-L1-universal-branch', seed=1, log_dir='logs', model_dir='.', checkpoints_dir='checkpoints', results_dir='results', downloads_dir='downloads', device='cuda', default_dtype='float64', distributed=True, log_level='INFO', error_table='PerAtomMAE', model='ScaleShiftMACE', r_max=6.0, radial_type='bessel', num_radial_basis=10, num_cutoff_basis=5, interaction='RealAgnosticResidualInteractionBlock', interaction_first='RealAgnosticResidualInteractionBlock', max_ell=3, correlation=3, num_interactions=2, MLP_irreps='16x0e', radial_MLP='[64, 64, 64]', hidden_irreps='128x0e + 128x1o', num_channels=128, max_L=1, gate='silu', scaling='rms_forces_scaling', avg_num_neighbors=1, compute_avg_num_neighbors=True, compute_stress=True, compute_forces=True, train_file='../../dataset/mptrj-gga-ggapu-train', valid_file='../../dataset/mptrj-gga-ggapu-val', valid_fraction=0.1, test_file=None, test_dir=None, multi_processed_test=False, num_workers=16, pin_memory=True, atomic_numbers=None, mean=None, std=None, statistics_file='../../dataset/mptrj-gga-ggapu-statistics-isolated.json', E0s=None, energy_key='energy', forces_key='forces', virials_key='virials', stress_key='stress', dipole_key='dipole', charges_key='charges', loss='universal', forces_weight=10.0, swa_forces_weight=100.0, energy_weight=1.0, swa_energy_weight=1000.0, virials_weight=1.0, swa_virials_weight=10.0, stress_weight=100.0, swa_stress_weight=10.0, dipole_weight=1.0, swa_dipole_weight=1.0, config_type_weights='{"Default":1.0}', huber_delta=0.01, optimizer='adam', batch_size=16, valid_batch_size=32, lr=0.005, swa_lr=0.001, weight_decay=1e-08, amsgrad=True, scheduler='ReduceLROnPlateau', lr_factor=0.8, scheduler_patience=5, lr_scheduler_gamma=0.9993, swa=False, start_swa=None, ema=True, ema_decay=0.995, max_num_epochs=200, patience=50, 
eval_interval=1, keep_checkpoints=True, restart_latest=True, save_cpu=True, clip_grad=100.0, wandb=True, wandb_project='mace-universal', wandb_entity='astagroup', wandb_name='10-128-L1-universal-branch', wandb_log_hypers=['num_channels', 'max_L', 'correlation', 'lr', 'swa_lr', 'weight_decay', 'batch_size', 'max_num_epochs', 'start_swa', 'energy_weight', 'forces_weight']) 2024-07-10 05:42:33.809 INFO: CUDA version: 11.8, CUDA device: 0 2024-07-10 05:42:33.810 INFO: Using statistics json file 2024-07-10 05:42:33.810 INFO: Using atomic numbers from statistics file 2024-07-10 05:42:33.810 INFO: AtomicNumberTable: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 89, 90, 91, 92, 93, 94) 2024-07-10 05:42:33.810 INFO: Atomic Energies not in training file, using command line argument E0s 2024-07-10 05:42:33.811 INFO: Atomic energies: [-1.11723232, -0.00045595, -0.29734917, -0.04262353, -0.2911712, -1.26281801, -3.12555634, -1.54690765, -0.43794547, -0.01216023, -0.22858276, -0.00994627, -0.21672837, -0.82583191, -1.88719667, -0.89091719, -0.25828681, -0.0235315, -0.17827125, -0.02596217, -2.12966897, -2.40532262, -3.61232779, -5.44620624, -5.14592659, -3.30583367, -1.66614587, -0.28412403, -0.23745594, -0.01098351, -0.19854295, -0.77924665, -1.70136472, -0.78345919, -0.22687512, -0.02265396, -0.16194042, -0.02823145, -2.25679622, -2.23742918, -2.53481909, -4.60213279, -3.40289704, -1.68884293, -1.44016062, -1.47521138, -0.19840574, -0.01374787, -0.19672488, -0.67963499, -1.4302063, -0.6573123, -0.18858477, -0.01267614, -0.13452777, -0.03157029, -0.62794477, -1.43642821, -0.18584352, -0.25876002, -0.25695685, -0.2542358, -9.48606277, -8.11540027, -0.14584339, -0.19488481, -0.14569527, -0.2516406, 
-0.16381585, -0.25265734, -0.25255978, -3.49292389, -3.5659314, -4.57101127, -4.63436797, -2.88280809, -1.42793567, -0.50244445, -0.18479218, -0.0105212, -0.17939998, -0.63069886, -1.32462383, -0.24210133, -1.04419147, -2.03239022, -4.6443113, -7.30273499, -10.39244586] 2024-07-10 05:43:05.180 INFO: UniversalLoss(energy_weight=1.000, forces_weight=10.000, stress_weight=100.000) 2024-07-10 05:43:05.182 INFO: Average number of neighbors: 61.964672446250916 2024-07-10 05:43:05.182 INFO: Selected the following outputs: {'energy': True, 'forces': True, 'virials': False, 'stress': True, 'dipoles': False} 2024-07-10 05:43:05.182 INFO: Building model 2024-07-10 05:43:05.183 INFO: Hidden irreps: 128x0e+128x1o 2024-07-10 05:43:07.393 WARNING: No SWA checkpoint found, while SWA is enabled. Compare the swa_start parameter and the latest checkpoint. 2024-07-10 05:43:07.395 INFO: Loading checkpoint: checkpoints/10-128-L1-universal-branch_run-1_epoch-127.pt 2024-07-10 05:43:07.632 INFO: ScaleShiftMACE( (node_embedding): LinearNodeEmbeddingBlock( (linear): Linear(89x0e -> 128x0e | 11392 weights) ) (radial_embedding): RadialEmbeddingBlock( (bessel_fn): BesselBasis(r_max=6.0, num_basis=10, trainable=False) (cutoff_fn): PolynomialCutoff(p=5.0, r_max=6.0) ) (spherical_harmonics): SphericalHarmonics() (atomic_energies_fn): AtomicEnergiesBlock(energies=[-1.1172, -0.0005, -0.2973, -0.0426, -0.2912, -1.2628, -3.1256, -1.5469, -0.4379, -0.0122, -0.2286, -0.0099, -0.2167, -0.8258, -1.8872, -0.8909, -0.2583, -0.0235, -0.1783, -0.0260, -2.1297, -2.4053, -3.6123, -5.4462, -5.1459, -3.3058, -1.6661, -0.2841, -0.2375, -0.0110, -0.1985, -0.7792, -1.7014, -0.7835, -0.2269, -0.0227, -0.1619, -0.0282, -2.2568, -2.2374, -2.5348, -4.6021, -3.4029, -1.6888, -1.4402, -1.4752, -0.1984, -0.0137, -0.1967, -0.6796, -1.4302, -0.6573, -0.1886, -0.0127, -0.1345, -0.0316, -0.6279, -1.4364, -0.1858, -0.2588, -0.2570, -0.2542, -9.4861, -8.1154, -0.1458, -0.1949, -0.1457, -0.2516, -0.1638, -0.2527, -0.2526, 
-3.4929, -3.5659, -4.5710, -4.6344, -2.8828, -1.4279, -0.5024, -0.1848, -0.0105, -0.1794, -0.6307, -1.3246, -0.2421, -1.0442, -2.0324, -4.6443, -7.3027, -10.3924]) (interactions): ModuleList( (0): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e -> 128x0e | 16384 weights) (conv_tp): TensorProduct(128x0e x 1x0e+1x1o+1x2e+1x3o -> 128x0e+128x1o+128x2e+128x3o | 512 paths | 512 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 512] (linear): Linear(128x0e+128x1o+128x2e+128x3o -> 128x0e+128x1o+128x2e+128x3o | 65536 weights) (skip_tp): FullyConnectedTensorProduct(128x0e x 89x0e -> 128x0e+128x1o | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) (1): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) (conv_tp): TensorProduct(128x0e+128x1o x 1x0e+1x1o+1x2e+1x3o -> 256x0e+384x1o+384x2e+256x3o | 1280 paths | 1280 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 1280] (linear): Linear(256x0e+384x1o+384x2e+256x3o -> 128x0e+128x1o+128x2e+128x3o | 163840 weights) (skip_tp): FullyConnectedTensorProduct(128x0e+128x1o x 89x0e -> 128x0e | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) ) (products): ModuleList( (0): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) (1): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x6x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] 
) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) ) (1): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e -> 128x0e | 16384 weights) ) ) (readouts): ModuleList( (0): LinearReadoutBlock( (linear): Linear(128x0e+128x1o -> 1x0e | 128 weights) ) (1): NonLinearReadoutBlock( (linear_1): Linear(128x0e -> 16x0e | 2048 weights) (non_linearity): Activation [x] (16x0e -> 16x0e) (linear_2): Linear(16x0e -> 1x0e | 16 weights) ) ) (scale_shift): ScaleShiftBlock(scale=0.804154, shift=0.164097) ) 2024-07-10 05:43:07.638 INFO: Number of parameters: 4688656 2024-07-10 05:43:07.638 INFO: Optimizer: Adam ( Parameter Group 0 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.00256 maximize: False name: embedding weight_decay: 0.0 Parameter Group 1 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.00256 maximize: False name: interactions_decay weight_decay: 1e-08 Parameter Group 2 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.00256 maximize: False name: interactions_no_decay weight_decay: 0.0 Parameter Group 3 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.00256 maximize: False name: products weight_decay: 1e-08 Parameter Group 4 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 
0.00256 maximize: False name: readouts weight_decay: 0.0 ) 2024-07-10 05:43:07.638 INFO: Using Weights and Biases for logging 2024-07-10 05:43:22.330 INFO: Using gradient clipping with tolerance=100.000 2024-07-10 05:43:22.331 INFO: Started training 2024-07-10 05:43:28.706 INFO: Reducer buckets have been rebuilt in this iteration.
2024-07-10 06:01:33.973 INFO: Epoch 127: loss=5.6536e-03, MAE_E_per_atom=24.8778 meV, MAE_F=48.6840 meV / A, MAE_stress_per_atom=0.1233 meV / A^3 2024-07-10 06:11:45.798 INFO: Epoch 128: loss=5.6596e-03, MAE_E_per_atom=24.9901 meV, MAE_F=48.9441 meV / A, MAE_stress_per_atom=0.1211 meV / A^3 2024-07-10 06:21:49.207 INFO: Epoch 129: loss=5.6498e-03, MAE_E_per_atom=24.9033 meV, MAE_F=49.0704 meV / A, MAE_stress_per_atom=0.1168 meV / A^3 2024-07-10 06:31:49.275 INFO: Epoch 130: loss=5.6048e-03, MAE_E_per_atom=24.4911 meV, MAE_F=48.7623 meV / A, MAE_stress_per_atom=0.1165 meV / A^3 2024-07-10 06:41:49.590 INFO: Epoch 131: loss=5.6872e-03, MAE_E_per_atom=24.8761 meV, MAE_F=49.0479 meV / A, MAE_stress_per_atom=0.1170 meV / A^3 2024-07-10 06:51:49.656 INFO: Epoch 132: loss=5.5939e-03, MAE_E_per_atom=24.8613 meV, MAE_F=48.5537 meV / A, MAE_stress_per_atom=0.1177 meV / A^3 2024-07-10 07:01:49.182 INFO: Epoch 133: loss=5.6507e-03, MAE_E_per_atom=24.6524 meV, MAE_F=48.7507 meV / A, MAE_stress_per_atom=0.1157 meV / A^3 2024-07-10 07:11:48.799 INFO: Epoch 134: loss=5.6134e-03,
MAE_E_per_atom=24.5261 meV, MAE_F=48.5148 meV / A, MAE_stress_per_atom=0.1168 meV / A^3 2024-07-10 07:21:51.596 INFO: Epoch 135: loss=5.5655e-03, MAE_E_per_atom=24.5799 meV, MAE_F=48.6984 meV / A, MAE_stress_per_atom=0.1145 meV / A^3 2024-07-10 07:31:54.388 INFO: Epoch 136: loss=5.6114e-03, MAE_E_per_atom=24.5646 meV, MAE_F=48.7893 meV / A, MAE_stress_per_atom=0.1170 meV / A^3 2024-07-10 10:29:32.617 INFO: Process group initialized: True 2024-07-10 10:29:32.619 INFO: Processes: 80 2024-07-10 10:29:32.619 INFO: MACE version: 0.3.0 2024-07-10 10:29:32.619 INFO: Configuration: Namespace(name='10-128-L1-universal-branch', seed=1, log_dir='logs', model_dir='.', checkpoints_dir='checkpoints', results_dir='results', downloads_dir='downloads', device='cuda', default_dtype='float64', distributed=True, log_level='INFO', error_table='PerAtomMAE', model='ScaleShiftMACE', r_max=6.0, radial_type='bessel', num_radial_basis=10, num_cutoff_basis=5, interaction='RealAgnosticResidualInteractionBlock', interaction_first='RealAgnosticResidualInteractionBlock', max_ell=3, correlation=3, num_interactions=2, MLP_irreps='16x0e', radial_MLP='[64, 64, 64]', hidden_irreps='128x0e + 128x1o', num_channels=128, max_L=1, gate='silu', scaling='rms_forces_scaling', avg_num_neighbors=1, compute_avg_num_neighbors=True, compute_stress=True, compute_forces=True, train_file='../../dataset/mptrj-gga-ggapu-train', valid_file='../../dataset/mptrj-gga-ggapu-val', valid_fraction=0.1, test_file=None, test_dir=None, multi_processed_test=False, num_workers=16, pin_memory=True, atomic_numbers=None, mean=None, std=None, statistics_file='../../dataset/mptrj-gga-ggapu-statistics-isolated.json', E0s=None, energy_key='energy', forces_key='forces', virials_key='virials', stress_key='stress', dipole_key='dipole', charges_key='charges', loss='universal', forces_weight=10.0, swa_forces_weight=100.0, energy_weight=1.0, swa_energy_weight=1000.0, virials_weight=1.0, swa_virials_weight=10.0, stress_weight=100.0, 
swa_stress_weight=10.0, dipole_weight=1.0, swa_dipole_weight=1.0, config_type_weights='{"Default":1.0}', huber_delta=0.01, optimizer='adam', batch_size=16, valid_batch_size=32, lr=0.005, swa_lr=0.001, weight_decay=1e-08, amsgrad=True, scheduler='ReduceLROnPlateau', lr_factor=0.8, scheduler_patience=5, lr_scheduler_gamma=0.9993, swa=False, start_swa=None, ema=True, ema_decay=0.995, max_num_epochs=200, patience=50, eval_interval=1, keep_checkpoints=True, restart_latest=True, save_cpu=True, clip_grad=100.0, wandb=True, wandb_project='mace-universal', wandb_entity='astagroup', wandb_name='10-128-L1-universal-branch', wandb_log_hypers=['num_channels', 'max_L', 'correlation', 'lr', 'swa_lr', 'weight_decay', 'batch_size', 'max_num_epochs', 'start_swa', 'energy_weight', 'forces_weight']) 2024-07-10 10:29:32.619 INFO: CUDA version: 11.8, CUDA device: 0 2024-07-10 10:29:32.620 INFO: Using statistics json file 2024-07-10 10:29:32.620 INFO: Using atomic numbers from statistics file 2024-07-10 10:29:32.620 INFO: AtomicNumberTable: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 89, 90, 91, 92, 93, 94) 2024-07-10 10:29:32.620 INFO: Atomic Energies not in training file, using command line argument E0s 2024-07-10 10:29:32.621 INFO: Atomic energies: [-1.11723232, -0.00045595, -0.29734917, -0.04262353, -0.2911712, -1.26281801, -3.12555634, -1.54690765, -0.43794547, -0.01216023, -0.22858276, -0.00994627, -0.21672837, -0.82583191, -1.88719667, -0.89091719, -0.25828681, -0.0235315, -0.17827125, -0.02596217, -2.12966897, -2.40532262, -3.61232779, -5.44620624, -5.14592659, -3.30583367, -1.66614587, -0.28412403, -0.23745594, -0.01098351, -0.19854295, -0.77924665, -1.70136472, -0.78345919, -0.22687512, -0.02265396, 
-0.16194042, -0.02823145, -2.25679622, -2.23742918, -2.53481909, -4.60213279, -3.40289704, -1.68884293, -1.44016062, -1.47521138, -0.19840574, -0.01374787, -0.19672488, -0.67963499, -1.4302063, -0.6573123, -0.18858477, -0.01267614, -0.13452777, -0.03157029, -0.62794477, -1.43642821, -0.18584352, -0.25876002, -0.25695685, -0.2542358, -9.48606277, -8.11540027, -0.14584339, -0.19488481, -0.14569527, -0.2516406, -0.16381585, -0.25265734, -0.25255978, -3.49292389, -3.5659314, -4.57101127, -4.63436797, -2.88280809, -1.42793567, -0.50244445, -0.18479218, -0.0105212, -0.17939998, -0.63069886, -1.32462383, -0.24210133, -1.04419147, -2.03239022, -4.6443113, -7.30273499, -10.39244586] 2024-07-10 10:30:05.385 INFO: UniversalLoss(energy_weight=1.000, forces_weight=10.000, stress_weight=100.000) 2024-07-10 10:30:05.386 INFO: Average number of neighbors: 61.964672446250916 2024-07-10 10:30:05.386 INFO: Selected the following outputs: {'energy': True, 'forces': True, 'virials': False, 'stress': True, 'dipoles': False} 2024-07-10 10:30:05.386 INFO: Building model 2024-07-10 10:30:05.387 INFO: Hidden irreps: 128x0e+128x1o 2024-07-10 10:30:07.616 WARNING: No SWA checkpoint found, while SWA is enabled. Compare the swa_start parameter and the latest checkpoint. 
2024-07-10 10:30:07.619 INFO: Loading checkpoint: checkpoints/10-128-L1-universal-branch_run-1_epoch-136.pt 2024-07-10 10:30:07.844 INFO: ScaleShiftMACE( (node_embedding): LinearNodeEmbeddingBlock( (linear): Linear(89x0e -> 128x0e | 11392 weights) ) (radial_embedding): RadialEmbeddingBlock( (bessel_fn): BesselBasis(r_max=6.0, num_basis=10, trainable=False) (cutoff_fn): PolynomialCutoff(p=5.0, r_max=6.0) ) (spherical_harmonics): SphericalHarmonics() (atomic_energies_fn): AtomicEnergiesBlock(energies=[-1.1172, -0.0005, -0.2973, -0.0426, -0.2912, -1.2628, -3.1256, -1.5469, -0.4379, -0.0122, -0.2286, -0.0099, -0.2167, -0.8258, -1.8872, -0.8909, -0.2583, -0.0235, -0.1783, -0.0260, -2.1297, -2.4053, -3.6123, -5.4462, -5.1459, -3.3058, -1.6661, -0.2841, -0.2375, -0.0110, -0.1985, -0.7792, -1.7014, -0.7835, -0.2269, -0.0227, -0.1619, -0.0282, -2.2568, -2.2374, -2.5348, -4.6021, -3.4029, -1.6888, -1.4402, -1.4752, -0.1984, -0.0137, -0.1967, -0.6796, -1.4302, -0.6573, -0.1886, -0.0127, -0.1345, -0.0316, -0.6279, -1.4364, -0.1858, -0.2588, -0.2570, -0.2542, -9.4861, -8.1154, -0.1458, -0.1949, -0.1457, -0.2516, -0.1638, -0.2527, -0.2526, -3.4929, -3.5659, -4.5710, -4.6344, -2.8828, -1.4279, -0.5024, -0.1848, -0.0105, -0.1794, -0.6307, -1.3246, -0.2421, -1.0442, -2.0324, -4.6443, -7.3027, -10.3924]) (interactions): ModuleList( (0): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e -> 128x0e | 16384 weights) (conv_tp): TensorProduct(128x0e x 1x0e+1x1o+1x2e+1x3o -> 128x0e+128x1o+128x2e+128x3o | 512 paths | 512 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 512] (linear): Linear(128x0e+128x1o+128x2e+128x3o -> 128x0e+128x1o+128x2e+128x3o | 65536 weights) (skip_tp): FullyConnectedTensorProduct(128x0e x 89x0e -> 128x0e+128x1o | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) (1): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) (conv_tp): TensorProduct(128x0e+128x1o x 
1x0e+1x1o+1x2e+1x3o -> 256x0e+384x1o+384x2e+256x3o | 1280 paths | 1280 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 1280] (linear): Linear(256x0e+384x1o+384x2e+256x3o -> 128x0e+128x1o+128x2e+128x3o | 163840 weights) (skip_tp): FullyConnectedTensorProduct(128x0e+128x1o x 89x0e -> 128x0e | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) ) (products): ModuleList( (0): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) (1): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x6x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) ) (1): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e -> 128x0e | 16384 weights) ) ) (readouts): ModuleList( (0): LinearReadoutBlock( (linear): Linear(128x0e+128x1o -> 1x0e | 128 weights) ) (1): NonLinearReadoutBlock( (linear_1): Linear(128x0e -> 16x0e | 2048 weights) (non_linearity): Activation [x] 
(16x0e -> 16x0e) (linear_2): Linear(16x0e -> 1x0e | 16 weights) ) ) (scale_shift): ScaleShiftBlock(scale=0.804154, shift=0.164097) ) 2024-07-10 10:30:07.850 INFO: Number of parameters: 4688656 2024-07-10 10:30:07.850 INFO: Optimizer: Adam ( Parameter Group 0 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0020480000000000003 maximize: False name: embedding weight_decay: 0.0 Parameter Group 1 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0020480000000000003 maximize: False name: interactions_decay weight_decay: 1e-08 Parameter Group 2 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0020480000000000003 maximize: False name: interactions_no_decay weight_decay: 0.0 Parameter Group 3 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0020480000000000003 maximize: False name: products weight_decay: 1e-08 Parameter Group 4 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0020480000000000003 maximize: False name: readouts weight_decay: 0.0 ) 2024-07-10 10:30:07.850 INFO: Using Weights and Biases for logging 2024-07-10 10:30:27.858 INFO: Using gradient clipping with tolerance=100.000 2024-07-10 10:30:27.858 INFO: Started training 2024-07-10 10:30:34.100 INFO: Reducer buckets have been rebuilt in this iteration.
2024-07-10 10:30:34.112 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 10:30:34.112 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 10:49:11.312 INFO: Epoch 136: loss=5.5826e-03, MAE_E_per_atom=24.6330 meV, MAE_F=48.7089 meV / A, MAE_stress_per_atom=0.1150 meV / A^3 2024-07-10 10:59:16.460 INFO: Epoch 137: loss=5.5907e-03, MAE_E_per_atom=24.4412 meV, MAE_F=48.5009 meV / A, MAE_stress_per_atom=0.1181 meV / A^3 2024-07-10 11:09:18.382 INFO: Epoch 138: loss=5.5807e-03, MAE_E_per_atom=24.6238 meV, MAE_F=48.5006 meV / A, MAE_stress_per_atom=0.1131 meV / A^3 2024-07-10 11:19:14.192 INFO: Epoch 139: loss=5.5916e-03, MAE_E_per_atom=24.4059 meV, MAE_F=48.5364 meV / A, MAE_stress_per_atom=0.1167 meV / A^3 2024-07-10 11:29:14.952 INFO: Epoch 140: loss=5.6047e-03, MAE_E_per_atom=24.5325 meV, MAE_F=48.5843 meV / A, MAE_stress_per_atom=0.1152 meV / A^3 2024-07-10 11:39:17.494 INFO: Epoch 141: loss=5.5853e-03, MAE_E_per_atom=24.1878 meV, MAE_F=48.7176 meV / A, MAE_stress_per_atom=0.1169 meV / A^3 2024-07-10 11:49:15.658 INFO: Epoch 142: loss=5.5425e-03, MAE_E_per_atom=24.1020 meV, MAE_F=48.2665 meV / A, MAE_stress_per_atom=0.1155 meV / A^3 2024-07-10 11:59:16.904 INFO: Epoch 143: loss=5.5516e-03, MAE_E_per_atom=24.2405 meV, MAE_F=48.1252 meV / A, MAE_stress_per_atom=0.1156 meV / A^3 2024-07-10 12:09:15.715 INFO: Epoch 144: loss=5.5791e-03, MAE_E_per_atom=23.9553 meV, MAE_F=48.3370 meV / A, MAE_stress_per_atom=0.1166 meV / A^3 2024-07-10 12:19:14.923 INFO: Epoch 145: loss=5.5758e-03, MAE_E_per_atom=24.3275 meV, MAE_F=48.3738 meV / A, MAE_stress_per_atom=0.1168 meV / A^3 2024-07-10 13:00:48.916 INFO: Process group initialized: True 2024-07-10 13:00:48.918 INFO: Processes: 80 2024-07-10 13:00:48.918 INFO: MACE version: 0.3.0 2024-07-10 13:00:48.918 INFO: Configuration: Namespace(name='10-128-L1-universal-branch', seed=1, log_dir='logs', model_dir='.', checkpoints_dir='checkpoints', results_dir='results', downloads_dir='downloads', 
device='cuda', default_dtype='float64', distributed=True, log_level='INFO', error_table='PerAtomMAE', model='ScaleShiftMACE', r_max=6.0, radial_type='bessel', num_radial_basis=10, num_cutoff_basis=5, interaction='RealAgnosticResidualInteractionBlock', interaction_first='RealAgnosticResidualInteractionBlock', max_ell=3, correlation=3, num_interactions=2, MLP_irreps='16x0e', radial_MLP='[64, 64, 64]', hidden_irreps='128x0e + 128x1o', num_channels=128, max_L=1, gate='silu', scaling='rms_forces_scaling', avg_num_neighbors=1, compute_avg_num_neighbors=True, compute_stress=True, compute_forces=True, train_file='../../dataset/mptrj-gga-ggapu-train', valid_file='../../dataset/mptrj-gga-ggapu-val', valid_fraction=0.1, test_file=None, test_dir=None, multi_processed_test=False, num_workers=16, pin_memory=True, atomic_numbers=None, mean=None, std=None, statistics_file='../../dataset/mptrj-gga-ggapu-statistics-isolated.json', E0s=None, energy_key='energy', forces_key='forces', virials_key='virials', stress_key='stress', dipole_key='dipole', charges_key='charges', loss='universal', forces_weight=10.0, swa_forces_weight=100.0, energy_weight=1.0, swa_energy_weight=1000.0, virials_weight=1.0, swa_virials_weight=10.0, stress_weight=100.0, swa_stress_weight=10.0, dipole_weight=1.0, swa_dipole_weight=1.0, config_type_weights='{"Default":1.0}', huber_delta=0.01, optimizer='adam', batch_size=16, valid_batch_size=32, lr=0.005, swa_lr=0.001, weight_decay=1e-08, amsgrad=True, scheduler='ReduceLROnPlateau', lr_factor=0.8, scheduler_patience=5, lr_scheduler_gamma=0.9993, swa=False, start_swa=None, ema=True, ema_decay=0.995, max_num_epochs=200, patience=50, eval_interval=1, keep_checkpoints=True, restart_latest=True, save_cpu=True, clip_grad=100.0, wandb=True, wandb_project='mace-universal', wandb_entity='astagroup', wandb_name='10-128-L1-universal-branch', wandb_log_hypers=['num_channels', 'max_L', 'correlation', 'lr', 'swa_lr', 'weight_decay', 'batch_size', 'max_num_epochs', 'start_swa', 
'energy_weight', 'forces_weight']) 2024-07-10 13:00:48.918 INFO: CUDA version: 11.8, CUDA device: 0 2024-07-10 13:00:48.918 INFO: Using statistics json file 2024-07-10 13:00:48.919 INFO: Using atomic numbers from statistics file 2024-07-10 13:00:48.919 INFO: AtomicNumberTable: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 89, 90, 91, 92, 93, 94) 2024-07-10 13:00:48.919 INFO: Atomic Energies not in training file, using command line argument E0s 2024-07-10 13:00:48.919 INFO: Atomic energies: [-1.11723232, -0.00045595, -0.29734917, -0.04262353, -0.2911712, -1.26281801, -3.12555634, -1.54690765, -0.43794547, -0.01216023, -0.22858276, -0.00994627, -0.21672837, -0.82583191, -1.88719667, -0.89091719, -0.25828681, -0.0235315, -0.17827125, -0.02596217, -2.12966897, -2.40532262, -3.61232779, -5.44620624, -5.14592659, -3.30583367, -1.66614587, -0.28412403, -0.23745594, -0.01098351, -0.19854295, -0.77924665, -1.70136472, -0.78345919, -0.22687512, -0.02265396, -0.16194042, -0.02823145, -2.25679622, -2.23742918, -2.53481909, -4.60213279, -3.40289704, -1.68884293, -1.44016062, -1.47521138, -0.19840574, -0.01374787, -0.19672488, -0.67963499, -1.4302063, -0.6573123, -0.18858477, -0.01267614, -0.13452777, -0.03157029, -0.62794477, -1.43642821, -0.18584352, -0.25876002, -0.25695685, -0.2542358, -9.48606277, -8.11540027, -0.14584339, -0.19488481, -0.14569527, -0.2516406, -0.16381585, -0.25265734, -0.25255978, -3.49292389, -3.5659314, -4.57101127, -4.63436797, -2.88280809, -1.42793567, -0.50244445, -0.18479218, -0.0105212, -0.17939998, -0.63069886, -1.32462383, -0.24210133, -1.04419147, -2.03239022, -4.6443113, -7.30273499, -10.39244586] 2024-07-10 13:01:20.657 INFO: UniversalLoss(energy_weight=1.000, 
forces_weight=10.000, stress_weight=100.000) 2024-07-10 13:01:20.660 INFO: Average number of neighbors: 61.964672446250916 2024-07-10 13:01:20.660 INFO: Selected the following outputs: {'energy': True, 'forces': True, 'virials': False, 'stress': True, 'dipoles': False} 2024-07-10 13:01:20.660 INFO: Building model 2024-07-10 13:01:20.661 INFO: Hidden irreps: 128x0e+128x1o 2024-07-10 13:01:22.883 WARNING: No SWA checkpoint found, while SWA is enabled. Compare the swa_start parameter and the latest checkpoint. 2024-07-10 13:01:22.886 INFO: Loading checkpoint: checkpoints/10-128-L1-universal-branch_run-1_epoch-145.pt 2024-07-10 13:01:23.108 INFO: ScaleShiftMACE( (node_embedding): LinearNodeEmbeddingBlock( (linear): Linear(89x0e -> 128x0e | 11392 weights) ) (radial_embedding): RadialEmbeddingBlock( (bessel_fn): BesselBasis(r_max=6.0, num_basis=10, trainable=False) (cutoff_fn): PolynomialCutoff(p=5.0, r_max=6.0) ) (spherical_harmonics): SphericalHarmonics() (atomic_energies_fn): AtomicEnergiesBlock(energies=[-1.1172, -0.0005, -0.2973, -0.0426, -0.2912, -1.2628, -3.1256, -1.5469, -0.4379, -0.0122, -0.2286, -0.0099, -0.2167, -0.8258, -1.8872, -0.8909, -0.2583, -0.0235, -0.1783, -0.0260, -2.1297, -2.4053, -3.6123, -5.4462, -5.1459, -3.3058, -1.6661, -0.2841, -0.2375, -0.0110, -0.1985, -0.7792, -1.7014, -0.7835, -0.2269, -0.0227, -0.1619, -0.0282, -2.2568, -2.2374, -2.5348, -4.6021, -3.4029, -1.6888, -1.4402, -1.4752, -0.1984, -0.0137, -0.1967, -0.6796, -1.4302, -0.6573, -0.1886, -0.0127, -0.1345, -0.0316, -0.6279, -1.4364, -0.1858, -0.2588, -0.2570, -0.2542, -9.4861, -8.1154, -0.1458, -0.1949, -0.1457, -0.2516, -0.1638, -0.2527, -0.2526, -3.4929, -3.5659, -4.5710, -4.6344, -2.8828, -1.4279, -0.5024, -0.1848, -0.0105, -0.1794, -0.6307, -1.3246, -0.2421, -1.0442, -2.0324, -4.6443, -7.3027, -10.3924]) (interactions): ModuleList( (0): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e -> 128x0e | 16384 weights) (conv_tp): TensorProduct(128x0e x 1x0e+1x1o+1x2e+1x3o 
-> 128x0e+128x1o+128x2e+128x3o | 512 paths | 512 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 512] (linear): Linear(128x0e+128x1o+128x2e+128x3o -> 128x0e+128x1o+128x2e+128x3o | 65536 weights) (skip_tp): FullyConnectedTensorProduct(128x0e x 89x0e -> 128x0e+128x1o | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) (1): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) (conv_tp): TensorProduct(128x0e+128x1o x 1x0e+1x1o+1x2e+1x3o -> 256x0e+384x1o+384x2e+256x3o | 1280 paths | 1280 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 1280] (linear): Linear(256x0e+384x1o+384x2e+256x3o -> 128x0e+128x1o+128x2e+128x3o | 163840 weights) (skip_tp): FullyConnectedTensorProduct(128x0e+128x1o x 89x0e -> 128x0e | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) ) (products): ModuleList( (0): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) (1): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x6x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) ) (1): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( 
(0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e -> 128x0e | 16384 weights) ) ) (readouts): ModuleList( (0): LinearReadoutBlock( (linear): Linear(128x0e+128x1o -> 1x0e | 128 weights) ) (1): NonLinearReadoutBlock( (linear_1): Linear(128x0e -> 16x0e | 2048 weights) (non_linearity): Activation [x] (16x0e -> 16x0e) (linear_2): Linear(16x0e -> 1x0e | 16 weights) ) ) (scale_shift): ScaleShiftBlock(scale=0.804154, shift=0.164097) ) 2024-07-10 13:01:23.114 INFO: Number of parameters: 4688656 2024-07-10 13:01:23.115 INFO: Optimizer: Adam ( Parameter Group 0 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0016384000000000004 maximize: False name: embedding weight_decay: 0.0 Parameter Group 1 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0016384000000000004 maximize: False name: interactions_decay weight_decay: 1e-08 Parameter Group 2 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0016384000000000004 maximize: False name: interactions_no_decay weight_decay: 0.0 Parameter Group 3 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0016384000000000004 maximize: False name: products weight_decay: 1e-08 Parameter Group 4 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0016384000000000004 maximize: False name: readouts weight_decay: 0.0 ) 2024-07-10 13:01:23.115 INFO: Using Weights and Biases for logging 2024-07-10 13:01:58.185 INFO: Using gradient clipping with tolerance=100.000 2024-07-10 13:01:58.186 INFO: Started training 2024-07-10 
13:02:05.287 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.287 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.287 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.287 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.319 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.319 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.319 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.319 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.324 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.324 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.324 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.324 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.324 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.324 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.324 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.324 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.324 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.327 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.324 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.328 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.328 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.328 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.328 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.328 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-10 13:02:05.328 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.328 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.328 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.328 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.328 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.328 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.328 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:02:05.329 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 13:20:31.851 INFO: Epoch 145: loss=5.5456e-03, MAE_E_per_atom=24.1097 meV, MAE_F=48.3229 meV / A, MAE_stress_per_atom=0.1162 meV / A^3 2024-07-10 13:30:44.301 INFO: Epoch 146: loss=5.6314e-03, MAE_E_per_atom=24.2282 meV, MAE_F=48.4467 meV / A, MAE_stress_per_atom=0.1250 meV / A^3 2024-07-10 13:40:43.852 INFO: Epoch 147: loss=5.5498e-03, MAE_E_per_atom=24.1935 meV, MAE_F=48.1623 meV / A, MAE_stress_per_atom=0.1133 meV / A^3 2024-07-10 13:50:46.919 INFO: Epoch 148: loss=5.5635e-03, MAE_E_per_atom=24.2309 meV, MAE_F=48.0768 meV / A, MAE_stress_per_atom=0.1156 meV / A^3 2024-07-10 14:00:45.640 INFO: Epoch 149: loss=5.5618e-03, MAE_E_per_atom=24.3566 meV, MAE_F=48.2129 meV / A, MAE_stress_per_atom=0.1134 meV / A^3 2024-07-10 14:10:48.296 INFO: Epoch 150: loss=5.5786e-03, MAE_E_per_atom=24.1068 meV, MAE_F=48.1922 meV / A, MAE_stress_per_atom=0.1150 meV / A^3 2024-07-10 14:20:50.320 INFO: Epoch 151: loss=5.6131e-03, MAE_E_per_atom=23.7867 meV, MAE_F=48.4052 meV / A, MAE_stress_per_atom=0.1184 meV / A^3 2024-07-10 14:30:50.680 INFO: Epoch 152: loss=5.5397e-03, MAE_E_per_atom=24.0953 meV, MAE_F=48.0240 meV / A, MAE_stress_per_atom=0.1157 meV / A^3 2024-07-10 14:40:51.187 INFO: Epoch 153: loss=5.5929e-03, MAE_E_per_atom=24.1262 meV, MAE_F=48.1717 meV / A, MAE_stress_per_atom=0.1172 meV / A^3 2024-07-10 
14:50:53.046 INFO: Epoch 154: loss=5.5422e-03, MAE_E_per_atom=23.9278 meV, MAE_F=48.0536 meV / A, MAE_stress_per_atom=0.1206 meV / A^3 2024-07-10 15:26:56.496 INFO: Process group initialized: True 2024-07-10 15:26:56.498 INFO: Processes: 80 2024-07-10 15:26:56.499 INFO: MACE version: 0.3.0 2024-07-10 15:26:56.499 INFO: Configuration: Namespace(name='10-128-L1-universal-branch', seed=1, log_dir='logs', model_dir='.', checkpoints_dir='checkpoints', results_dir='results', downloads_dir='downloads', device='cuda', default_dtype='float64', distributed=True, log_level='INFO', error_table='PerAtomMAE', model='ScaleShiftMACE', r_max=6.0, radial_type='bessel', num_radial_basis=10, num_cutoff_basis=5, interaction='RealAgnosticResidualInteractionBlock', interaction_first='RealAgnosticResidualInteractionBlock', max_ell=3, correlation=3, num_interactions=2, MLP_irreps='16x0e', radial_MLP='[64, 64, 64]', hidden_irreps='128x0e + 128x1o', num_channels=128, max_L=1, gate='silu', scaling='rms_forces_scaling', avg_num_neighbors=1, compute_avg_num_neighbors=True, compute_stress=True, compute_forces=True, train_file='../../dataset/mptrj-gga-ggapu-train', valid_file='../../dataset/mptrj-gga-ggapu-val', valid_fraction=0.1, test_file=None, test_dir=None, multi_processed_test=False, num_workers=16, pin_memory=True, atomic_numbers=None, mean=None, std=None, statistics_file='../../dataset/mptrj-gga-ggapu-statistics-isolated.json', E0s=None, energy_key='energy', forces_key='forces', virials_key='virials', stress_key='stress', dipole_key='dipole', charges_key='charges', loss='universal', forces_weight=10.0, swa_forces_weight=100.0, energy_weight=1.0, swa_energy_weight=1000.0, virials_weight=1.0, swa_virials_weight=10.0, stress_weight=100.0, swa_stress_weight=10.0, dipole_weight=1.0, swa_dipole_weight=1.0, config_type_weights='{"Default":1.0}', huber_delta=0.01, optimizer='adam', batch_size=16, valid_batch_size=32, lr=0.005, swa_lr=0.001, weight_decay=1e-08, amsgrad=True, 
scheduler='ReduceLROnPlateau', lr_factor=0.8, scheduler_patience=5, lr_scheduler_gamma=0.9993, swa=False, start_swa=None, ema=True, ema_decay=0.995, max_num_epochs=200, patience=50, eval_interval=1, keep_checkpoints=True, restart_latest=True, save_cpu=True, clip_grad=100.0, wandb=True, wandb_project='mace-universal', wandb_entity='astagroup', wandb_name='10-128-L1-universal-branch', wandb_log_hypers=['num_channels', 'max_L', 'correlation', 'lr', 'swa_lr', 'weight_decay', 'batch_size', 'max_num_epochs', 'start_swa', 'energy_weight', 'forces_weight']) 2024-07-10 15:26:56.499 INFO: CUDA version: 11.8, CUDA device: 0 2024-07-10 15:26:56.500 INFO: Using statistics json file 2024-07-10 15:26:56.500 INFO: Using atomic numbers from statistics file 2024-07-10 15:26:56.500 INFO: AtomicNumberTable: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 89, 90, 91, 92, 93, 94) 2024-07-10 15:26:56.500 INFO: Atomic Energies not in training file, using command line argument E0s 2024-07-10 15:26:56.501 INFO: Atomic energies: [-1.11723232, -0.00045595, -0.29734917, -0.04262353, -0.2911712, -1.26281801, -3.12555634, -1.54690765, -0.43794547, -0.01216023, -0.22858276, -0.00994627, -0.21672837, -0.82583191, -1.88719667, -0.89091719, -0.25828681, -0.0235315, -0.17827125, -0.02596217, -2.12966897, -2.40532262, -3.61232779, -5.44620624, -5.14592659, -3.30583367, -1.66614587, -0.28412403, -0.23745594, -0.01098351, -0.19854295, -0.77924665, -1.70136472, -0.78345919, -0.22687512, -0.02265396, -0.16194042, -0.02823145, -2.25679622, -2.23742918, -2.53481909, -4.60213279, -3.40289704, -1.68884293, -1.44016062, -1.47521138, -0.19840574, -0.01374787, -0.19672488, -0.67963499, -1.4302063, -0.6573123, -0.18858477, -0.01267614, 
-0.13452777, -0.03157029, -0.62794477, -1.43642821, -0.18584352, -0.25876002, -0.25695685, -0.2542358, -9.48606277, -8.11540027, -0.14584339, -0.19488481, -0.14569527, -0.2516406, -0.16381585, -0.25265734, -0.25255978, -3.49292389, -3.5659314, -4.57101127, -4.63436797, -2.88280809, -1.42793567, -0.50244445, -0.18479218, -0.0105212, -0.17939998, -0.63069886, -1.32462383, -0.24210133, -1.04419147, -2.03239022, -4.6443113, -7.30273499, -10.39244586] 2024-07-10 15:27:29.177 INFO: UniversalLoss(energy_weight=1.000, forces_weight=10.000, stress_weight=100.000) 2024-07-10 15:27:29.179 INFO: Average number of neighbors: 61.964672446250916 2024-07-10 15:27:29.179 INFO: Selected the following outputs: {'energy': True, 'forces': True, 'virials': False, 'stress': True, 'dipoles': False} 2024-07-10 15:27:29.179 INFO: Building model 2024-07-10 15:27:29.180 INFO: Hidden irreps: 128x0e+128x1o 2024-07-10 15:27:31.406 WARNING: No SWA checkpoint found, while SWA is enabled. Compare the swa_start parameter and the latest checkpoint. 
2024-07-10 15:27:31.409 INFO: Loading checkpoint: checkpoints/10-128-L1-universal-branch_run-1_epoch-154.pt 2024-07-10 15:27:31.637 INFO: ScaleShiftMACE( (node_embedding): LinearNodeEmbeddingBlock( (linear): Linear(89x0e -> 128x0e | 11392 weights) ) (radial_embedding): RadialEmbeddingBlock( (bessel_fn): BesselBasis(r_max=6.0, num_basis=10, trainable=False) (cutoff_fn): PolynomialCutoff(p=5.0, r_max=6.0) ) (spherical_harmonics): SphericalHarmonics() (atomic_energies_fn): AtomicEnergiesBlock(energies=[-1.1172, -0.0005, -0.2973, -0.0426, -0.2912, -1.2628, -3.1256, -1.5469, -0.4379, -0.0122, -0.2286, -0.0099, -0.2167, -0.8258, -1.8872, -0.8909, -0.2583, -0.0235, -0.1783, -0.0260, -2.1297, -2.4053, -3.6123, -5.4462, -5.1459, -3.3058, -1.6661, -0.2841, -0.2375, -0.0110, -0.1985, -0.7792, -1.7014, -0.7835, -0.2269, -0.0227, -0.1619, -0.0282, -2.2568, -2.2374, -2.5348, -4.6021, -3.4029, -1.6888, -1.4402, -1.4752, -0.1984, -0.0137, -0.1967, -0.6796, -1.4302, -0.6573, -0.1886, -0.0127, -0.1345, -0.0316, -0.6279, -1.4364, -0.1858, -0.2588, -0.2570, -0.2542, -9.4861, -8.1154, -0.1458, -0.1949, -0.1457, -0.2516, -0.1638, -0.2527, -0.2526, -3.4929, -3.5659, -4.5710, -4.6344, -2.8828, -1.4279, -0.5024, -0.1848, -0.0105, -0.1794, -0.6307, -1.3246, -0.2421, -1.0442, -2.0324, -4.6443, -7.3027, -10.3924]) (interactions): ModuleList( (0): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e -> 128x0e | 16384 weights) (conv_tp): TensorProduct(128x0e x 1x0e+1x1o+1x2e+1x3o -> 128x0e+128x1o+128x2e+128x3o | 512 paths | 512 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 512] (linear): Linear(128x0e+128x1o+128x2e+128x3o -> 128x0e+128x1o+128x2e+128x3o | 65536 weights) (skip_tp): FullyConnectedTensorProduct(128x0e x 89x0e -> 128x0e+128x1o | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) (1): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) (conv_tp): TensorProduct(128x0e+128x1o x 
1x0e+1x1o+1x2e+1x3o -> 256x0e+384x1o+384x2e+256x3o | 1280 paths | 1280 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 1280] (linear): Linear(256x0e+384x1o+384x2e+256x3o -> 128x0e+128x1o+128x2e+128x3o | 163840 weights) (skip_tp): FullyConnectedTensorProduct(128x0e+128x1o x 89x0e -> 128x0e | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) ) (products): ModuleList( (0): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) (1): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x6x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) ) (1): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e -> 128x0e | 16384 weights) ) ) (readouts): ModuleList( (0): LinearReadoutBlock( (linear): Linear(128x0e+128x1o -> 1x0e | 128 weights) ) (1): NonLinearReadoutBlock( (linear_1): Linear(128x0e -> 16x0e | 2048 weights) (non_linearity): Activation [x] 
(16x0e -> 16x0e) (linear_2): Linear(16x0e -> 1x0e | 16 weights) ) ) (scale_shift): ScaleShiftBlock(scale=0.804154, shift=0.164097) ) 2024-07-10 15:27:31.643 INFO: Number of parameters: 4688656 2024-07-10 15:27:31.643 INFO: Optimizer: Adam ( Parameter Group 0 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0013107200000000005 maximize: False name: embedding weight_decay: 0.0 Parameter Group 1 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0013107200000000005 maximize: False name: interactions_decay weight_decay: 1e-08 Parameter Group 2 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0013107200000000005 maximize: False name: interactions_no_decay weight_decay: 0.0 Parameter Group 3 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0013107200000000005 maximize: False name: products weight_decay: 1e-08 Parameter Group 4 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0013107200000000005 maximize: False name: readouts weight_decay: 0.0 ) 2024-07-10 15:27:31.643 INFO: Using Weights and Biases for logging 2024-07-10 15:27:47.580 INFO: Using gradient clipping with tolerance=100.000 2024-07-10 15:27:47.581 INFO: Started training 2024-07-10 15:27:54.165 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 15:27:54.165 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 15:27:54.165 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 15:27:54.165 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 15:27:54.166 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 15:27:54.165 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-10 15:27:54.167 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 15:27:54.168 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 15:46:27.708 INFO: Epoch 154: loss=5.5463e-03, MAE_E_per_atom=23.8494 meV, MAE_F=48.0900 meV / A, MAE_stress_per_atom=0.1208 meV / A^3 2024-07-10 15:56:29.553 INFO: Epoch 155: loss=5.5140e-03, MAE_E_per_atom=23.7832 meV, MAE_F=47.8848 meV / A, MAE_stress_per_atom=0.1171 meV / A^3 2024-07-10 16:06:32.646 INFO: Epoch 156: loss=5.5548e-03, MAE_E_per_atom=24.0481 meV, MAE_F=48.0752 meV / A, MAE_stress_per_atom=0.1165 meV / A^3 2024-07-10 16:16:35.212 INFO: Epoch 157: loss=5.5360e-03, MAE_E_per_atom=23.7996 meV, MAE_F=48.2100 meV / A, MAE_stress_per_atom=0.1158 meV / A^3 2024-07-10 16:26:39.184 INFO: Epoch 158: loss=5.5488e-03, MAE_E_per_atom=23.8378 meV, MAE_F=48.1101 meV / A, MAE_stress_per_atom=0.1144 meV / A^3 2024-07-10 16:36:43.200 INFO: Epoch 159: loss=5.5548e-03, MAE_E_per_atom=24.1975 meV, MAE_F=47.9817 meV / A, MAE_stress_per_atom=0.1153 meV / A^3 2024-07-10 16:46:43.905 INFO: Epoch 160: loss=5.5601e-03, MAE_E_per_atom=23.9313 meV, MAE_F=48.0361 meV / A, MAE_stress_per_atom=0.1164 meV / A^3 2024-07-10 16:56:46.208 INFO: Epoch 161: loss=5.5738e-03, MAE_E_per_atom=23.8018 meV, MAE_F=48.1518 meV / A, MAE_stress_per_atom=0.1173 meV / A^3 2024-07-10 17:06:48.047 INFO: Epoch 162: loss=5.5455e-03, MAE_E_per_atom=23.5977 meV, MAE_F=48.1136 meV / A, MAE_stress_per_atom=0.1166 meV / A^3 2024-07-10 17:16:48.957 INFO: Epoch 163: loss=5.5451e-03, MAE_E_per_atom=23.7234 meV, MAE_F=48.0821 meV / A, MAE_stress_per_atom=0.1143 meV / A^3 2024-07-10 17:36:57.731 INFO: Process group initialized: True 2024-07-10 17:36:57.733 INFO: Processes: 80 2024-07-10 17:36:57.733 INFO: MACE version: 0.3.0 2024-07-10 17:36:57.733 INFO: Configuration: Namespace(name='10-128-L1-universal-branch', seed=1, log_dir='logs', model_dir='.', checkpoints_dir='checkpoints', results_dir='results', downloads_dir='downloads', 
device='cuda', default_dtype='float64', distributed=True, log_level='INFO', error_table='PerAtomMAE', model='ScaleShiftMACE', r_max=6.0, radial_type='bessel', num_radial_basis=10, num_cutoff_basis=5, interaction='RealAgnosticResidualInteractionBlock', interaction_first='RealAgnosticResidualInteractionBlock', max_ell=3, correlation=3, num_interactions=2, MLP_irreps='16x0e', radial_MLP='[64, 64, 64]', hidden_irreps='128x0e + 128x1o', num_channels=128, max_L=1, gate='silu', scaling='rms_forces_scaling', avg_num_neighbors=1, compute_avg_num_neighbors=True, compute_stress=True, compute_forces=True, train_file='../../dataset/mptrj-gga-ggapu-train', valid_file='../../dataset/mptrj-gga-ggapu-val', valid_fraction=0.1, test_file=None, test_dir=None, multi_processed_test=False, num_workers=16, pin_memory=True, atomic_numbers=None, mean=None, std=None, statistics_file='../../dataset/mptrj-gga-ggapu-statistics-isolated.json', E0s=None, energy_key='energy', forces_key='forces', virials_key='virials', stress_key='stress', dipole_key='dipole', charges_key='charges', loss='universal', forces_weight=10.0, swa_forces_weight=100.0, energy_weight=1.0, swa_energy_weight=1000.0, virials_weight=1.0, swa_virials_weight=10.0, stress_weight=100.0, swa_stress_weight=10.0, dipole_weight=1.0, swa_dipole_weight=1.0, config_type_weights='{"Default":1.0}', huber_delta=0.01, optimizer='adam', batch_size=16, valid_batch_size=32, lr=0.005, swa_lr=0.001, weight_decay=1e-08, amsgrad=True, scheduler='ReduceLROnPlateau', lr_factor=0.8, scheduler_patience=5, lr_scheduler_gamma=0.9993, swa=False, start_swa=None, ema=True, ema_decay=0.995, max_num_epochs=200, patience=50, eval_interval=1, keep_checkpoints=True, restart_latest=True, save_cpu=True, clip_grad=100.0, wandb=True, wandb_project='mace-universal', wandb_entity='astagroup', wandb_name='10-128-L1-universal-branch', wandb_log_hypers=['num_channels', 'max_L', 'correlation', 'lr', 'swa_lr', 'weight_decay', 'batch_size', 'max_num_epochs', 'start_swa', 
'energy_weight', 'forces_weight']) 2024-07-10 17:36:57.733 INFO: CUDA version: 11.8, CUDA device: 0 2024-07-10 17:36:57.734 INFO: Using statistics json file 2024-07-10 17:36:57.734 INFO: Using atomic numbers from statistics file 2024-07-10 17:36:57.735 INFO: AtomicNumberTable: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 89, 90, 91, 92, 93, 94) 2024-07-10 17:36:57.735 INFO: Atomic Energies not in training file, using command line argument E0s 2024-07-10 17:36:57.735 INFO: Atomic energies: [-1.11723232, -0.00045595, -0.29734917, -0.04262353, -0.2911712, -1.26281801, -3.12555634, -1.54690765, -0.43794547, -0.01216023, -0.22858276, -0.00994627, -0.21672837, -0.82583191, -1.88719667, -0.89091719, -0.25828681, -0.0235315, -0.17827125, -0.02596217, -2.12966897, -2.40532262, -3.61232779, -5.44620624, -5.14592659, -3.30583367, -1.66614587, -0.28412403, -0.23745594, -0.01098351, -0.19854295, -0.77924665, -1.70136472, -0.78345919, -0.22687512, -0.02265396, -0.16194042, -0.02823145, -2.25679622, -2.23742918, -2.53481909, -4.60213279, -3.40289704, -1.68884293, -1.44016062, -1.47521138, -0.19840574, -0.01374787, -0.19672488, -0.67963499, -1.4302063, -0.6573123, -0.18858477, -0.01267614, -0.13452777, -0.03157029, -0.62794477, -1.43642821, -0.18584352, -0.25876002, -0.25695685, -0.2542358, -9.48606277, -8.11540027, -0.14584339, -0.19488481, -0.14569527, -0.2516406, -0.16381585, -0.25265734, -0.25255978, -3.49292389, -3.5659314, -4.57101127, -4.63436797, -2.88280809, -1.42793567, -0.50244445, -0.18479218, -0.0105212, -0.17939998, -0.63069886, -1.32462383, -0.24210133, -1.04419147, -2.03239022, -4.6443113, -7.30273499, -10.39244586] 2024-07-10 17:37:28.638 INFO: UniversalLoss(energy_weight=1.000, 
forces_weight=10.000, stress_weight=100.000) 2024-07-10 17:37:28.640 INFO: Average number of neighbors: 61.964672446250916 2024-07-10 17:37:28.640 INFO: Selected the following outputs: {'energy': True, 'forces': True, 'virials': False, 'stress': True, 'dipoles': False} 2024-07-10 17:37:28.640 INFO: Building model 2024-07-10 17:37:28.641 INFO: Hidden irreps: 128x0e+128x1o 2024-07-10 17:37:30.882 WARNING: No SWA checkpoint found, while SWA is enabled. Compare the swa_start parameter and the latest checkpoint. 2024-07-10 17:37:30.885 INFO: Loading checkpoint: checkpoints/10-128-L1-universal-branch_run-1_epoch-163.pt 2024-07-10 17:37:31.109 INFO: ScaleShiftMACE( (node_embedding): LinearNodeEmbeddingBlock( (linear): Linear(89x0e -> 128x0e | 11392 weights) ) (radial_embedding): RadialEmbeddingBlock( (bessel_fn): BesselBasis(r_max=6.0, num_basis=10, trainable=False) (cutoff_fn): PolynomialCutoff(p=5.0, r_max=6.0) ) (spherical_harmonics): SphericalHarmonics() (atomic_energies_fn): AtomicEnergiesBlock(energies=[-1.1172, -0.0005, -0.2973, -0.0426, -0.2912, -1.2628, -3.1256, -1.5469, -0.4379, -0.0122, -0.2286, -0.0099, -0.2167, -0.8258, -1.8872, -0.8909, -0.2583, -0.0235, -0.1783, -0.0260, -2.1297, -2.4053, -3.6123, -5.4462, -5.1459, -3.3058, -1.6661, -0.2841, -0.2375, -0.0110, -0.1985, -0.7792, -1.7014, -0.7835, -0.2269, -0.0227, -0.1619, -0.0282, -2.2568, -2.2374, -2.5348, -4.6021, -3.4029, -1.6888, -1.4402, -1.4752, -0.1984, -0.0137, -0.1967, -0.6796, -1.4302, -0.6573, -0.1886, -0.0127, -0.1345, -0.0316, -0.6279, -1.4364, -0.1858, -0.2588, -0.2570, -0.2542, -9.4861, -8.1154, -0.1458, -0.1949, -0.1457, -0.2516, -0.1638, -0.2527, -0.2526, -3.4929, -3.5659, -4.5710, -4.6344, -2.8828, -1.4279, -0.5024, -0.1848, -0.0105, -0.1794, -0.6307, -1.3246, -0.2421, -1.0442, -2.0324, -4.6443, -7.3027, -10.3924]) (interactions): ModuleList( (0): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e -> 128x0e | 16384 weights) (conv_tp): TensorProduct(128x0e x 1x0e+1x1o+1x2e+1x3o 
-> 128x0e+128x1o+128x2e+128x3o | 512 paths | 512 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 512] (linear): Linear(128x0e+128x1o+128x2e+128x3o -> 128x0e+128x1o+128x2e+128x3o | 65536 weights) (skip_tp): FullyConnectedTensorProduct(128x0e x 89x0e -> 128x0e+128x1o | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) (1): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) (conv_tp): TensorProduct(128x0e+128x1o x 1x0e+1x1o+1x2e+1x3o -> 256x0e+384x1o+384x2e+256x3o | 1280 paths | 1280 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 1280] (linear): Linear(256x0e+384x1o+384x2e+256x3o -> 128x0e+128x1o+128x2e+128x3o | 163840 weights) (skip_tp): FullyConnectedTensorProduct(128x0e+128x1o x 89x0e -> 128x0e | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) ) (products): ModuleList( (0): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) (1): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x6x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) ) (1): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( 
(0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e -> 128x0e | 16384 weights) ) ) (readouts): ModuleList( (0): LinearReadoutBlock( (linear): Linear(128x0e+128x1o -> 1x0e | 128 weights) ) (1): NonLinearReadoutBlock( (linear_1): Linear(128x0e -> 16x0e | 2048 weights) (non_linearity): Activation [x] (16x0e -> 16x0e) (linear_2): Linear(16x0e -> 1x0e | 16 weights) ) ) (scale_shift): ScaleShiftBlock(scale=0.804154, shift=0.164097) ) 2024-07-10 17:37:31.116 INFO: Number of parameters: 4688656 2024-07-10 17:37:31.116 INFO: Optimizer: Adam ( Parameter Group 0 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0010485760000000005 maximize: False name: embedding weight_decay: 0.0 Parameter Group 1 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0010485760000000005 maximize: False name: interactions_decay weight_decay: 1e-08 Parameter Group 2 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0010485760000000005 maximize: False name: interactions_no_decay weight_decay: 0.0 Parameter Group 3 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0010485760000000005 maximize: False name: products weight_decay: 1e-08 Parameter Group 4 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0010485760000000005 maximize: False name: readouts weight_decay: 0.0 ) 2024-07-10 17:37:31.116 INFO: Using Weights and Biases for logging 2024-07-10 17:37:47.763 INFO: Using gradient clipping with tolerance=100.000 2024-07-10 17:37:47.763 INFO: Started training 2024-07-10 
17:37:54.258 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.258 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.258 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.258 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.258 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.258 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.258 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.258 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.258 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.259 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.259 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.259 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.259 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.259 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.259 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.259 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.259 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.259 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.259 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.260 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.260 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.260 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.260 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.258 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-10 17:37:54.259 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.259 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.259 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.260 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.260 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.260 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.260 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:37:54.260 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 17:56:24.484 INFO: Epoch 163: loss=5.5644e-03, MAE_E_per_atom=23.7675 meV, MAE_F=48.1193 meV / A, MAE_stress_per_atom=0.1144 meV / A^3 2024-07-10 18:06:41.672 INFO: Epoch 164: loss=5.5742e-03, MAE_E_per_atom=23.8390 meV, MAE_F=47.9803 meV / A, MAE_stress_per_atom=0.1170 meV / A^3 2024-07-10 18:16:46.010 INFO: Epoch 165: loss=5.5436e-03, MAE_E_per_atom=23.5808 meV, MAE_F=48.1031 meV / A, MAE_stress_per_atom=0.1173 meV / A^3 2024-07-10 18:26:47.976 INFO: Epoch 166: loss=5.5490e-03, MAE_E_per_atom=23.6378 meV, MAE_F=48.0492 meV / A, MAE_stress_per_atom=0.1149 meV / A^3 2024-07-10 18:36:54.376 INFO: Epoch 167: loss=5.5222e-03, MAE_E_per_atom=23.5473 meV, MAE_F=47.8710 meV / A, MAE_stress_per_atom=0.1158 meV / A^3 2024-07-10 18:46:57.225 INFO: Epoch 168: loss=5.4811e-03, MAE_E_per_atom=23.4484 meV, MAE_F=47.6853 meV / A, MAE_stress_per_atom=0.1157 meV / A^3 2024-07-10 18:56:58.067 INFO: Epoch 169: loss=5.5268e-03, MAE_E_per_atom=23.6167 meV, MAE_F=47.9633 meV / A, MAE_stress_per_atom=0.1175 meV / A^3 2024-07-10 19:06:59.817 INFO: Epoch 170: loss=5.5634e-03, MAE_E_per_atom=23.6286 meV, MAE_F=48.1067 meV / A, MAE_stress_per_atom=0.1179 meV / A^3 2024-07-10 19:16:57.832 INFO: Epoch 171: loss=5.5279e-03, MAE_E_per_atom=23.8437 meV, MAE_F=47.8776 meV / A, MAE_stress_per_atom=0.1148 meV / A^3 2024-07-10 
19:27:00.815 INFO: Epoch 172: loss=5.5396e-03, MAE_E_per_atom=23.6387 meV, MAE_F=47.9915 meV / A, MAE_stress_per_atom=0.1149 meV / A^3 2024-07-10 21:51:48.514 INFO: Process group initialized: True 2024-07-10 21:51:48.516 INFO: Processes: 80 2024-07-10 21:51:48.516 INFO: MACE version: 0.3.0 2024-07-10 21:51:48.517 INFO: Configuration: Namespace(name='10-128-L1-universal-branch', seed=1, log_dir='logs', model_dir='.', checkpoints_dir='checkpoints', results_dir='results', downloads_dir='downloads', device='cuda', default_dtype='float64', distributed=True, log_level='INFO', error_table='PerAtomMAE', model='ScaleShiftMACE', r_max=6.0, radial_type='bessel', num_radial_basis=10, num_cutoff_basis=5, interaction='RealAgnosticResidualInteractionBlock', interaction_first='RealAgnosticResidualInteractionBlock', max_ell=3, correlation=3, num_interactions=2, MLP_irreps='16x0e', radial_MLP='[64, 64, 64]', hidden_irreps='128x0e + 128x1o', num_channels=128, max_L=1, gate='silu', scaling='rms_forces_scaling', avg_num_neighbors=1, compute_avg_num_neighbors=True, compute_stress=True, compute_forces=True, train_file='../../dataset/mptrj-gga-ggapu-train', valid_file='../../dataset/mptrj-gga-ggapu-val', valid_fraction=0.1, test_file=None, test_dir=None, multi_processed_test=False, num_workers=16, pin_memory=True, atomic_numbers=None, mean=None, std=None, statistics_file='../../dataset/mptrj-gga-ggapu-statistics-isolated.json', E0s=None, energy_key='energy', forces_key='forces', virials_key='virials', stress_key='stress', dipole_key='dipole', charges_key='charges', loss='universal', forces_weight=10.0, swa_forces_weight=100.0, energy_weight=1.0, swa_energy_weight=1000.0, virials_weight=1.0, swa_virials_weight=10.0, stress_weight=100.0, swa_stress_weight=10.0, dipole_weight=1.0, swa_dipole_weight=1.0, config_type_weights='{"Default":1.0}', huber_delta=0.01, optimizer='adam', batch_size=16, valid_batch_size=32, lr=0.005, swa_lr=0.001, weight_decay=1e-08, amsgrad=True, 
scheduler='ReduceLROnPlateau', lr_factor=0.8, scheduler_patience=5, lr_scheduler_gamma=0.9993, swa=False, start_swa=None, ema=True, ema_decay=0.995, max_num_epochs=200, patience=50, eval_interval=1, keep_checkpoints=True, restart_latest=True, save_cpu=True, clip_grad=100.0, wandb=True, wandb_project='mace-universal', wandb_entity='astagroup', wandb_name='10-128-L1-universal-branch', wandb_log_hypers=['num_channels', 'max_L', 'correlation', 'lr', 'swa_lr', 'weight_decay', 'batch_size', 'max_num_epochs', 'start_swa', 'energy_weight', 'forces_weight']) 2024-07-10 21:51:48.517 INFO: CUDA version: 11.8, CUDA device: 0 2024-07-10 21:51:48.517 INFO: Using statistics json file 2024-07-10 21:51:48.517 INFO: Using atomic numbers from statistics file 2024-07-10 21:51:48.517 INFO: AtomicNumberTable: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 89, 90, 91, 92, 93, 94) 2024-07-10 21:51:48.517 INFO: Atomic Energies not in training file, using command line argument E0s 2024-07-10 21:51:48.518 INFO: Atomic energies: [-1.11723232, -0.00045595, -0.29734917, -0.04262353, -0.2911712, -1.26281801, -3.12555634, -1.54690765, -0.43794547, -0.01216023, -0.22858276, -0.00994627, -0.21672837, -0.82583191, -1.88719667, -0.89091719, -0.25828681, -0.0235315, -0.17827125, -0.02596217, -2.12966897, -2.40532262, -3.61232779, -5.44620624, -5.14592659, -3.30583367, -1.66614587, -0.28412403, -0.23745594, -0.01098351, -0.19854295, -0.77924665, -1.70136472, -0.78345919, -0.22687512, -0.02265396, -0.16194042, -0.02823145, -2.25679622, -2.23742918, -2.53481909, -4.60213279, -3.40289704, -1.68884293, -1.44016062, -1.47521138, -0.19840574, -0.01374787, -0.19672488, -0.67963499, -1.4302063, -0.6573123, -0.18858477, -0.01267614, 
-0.13452777, -0.03157029, -0.62794477, -1.43642821, -0.18584352, -0.25876002, -0.25695685, -0.2542358, -9.48606277, -8.11540027, -0.14584339, -0.19488481, -0.14569527, -0.2516406, -0.16381585, -0.25265734, -0.25255978, -3.49292389, -3.5659314, -4.57101127, -4.63436797, -2.88280809, -1.42793567, -0.50244445, -0.18479218, -0.0105212, -0.17939998, -0.63069886, -1.32462383, -0.24210133, -1.04419147, -2.03239022, -4.6443113, -7.30273499, -10.39244586] 2024-07-10 21:52:20.486 INFO: UniversalLoss(energy_weight=1.000, forces_weight=10.000, stress_weight=100.000) 2024-07-10 21:52:20.489 INFO: Average number of neighbors: 61.964672446250916 2024-07-10 21:52:20.489 INFO: Selected the following outputs: {'energy': True, 'forces': True, 'virials': False, 'stress': True, 'dipoles': False} 2024-07-10 21:52:20.489 INFO: Building model 2024-07-10 21:52:20.490 INFO: Hidden irreps: 128x0e+128x1o 2024-07-10 21:52:22.757 WARNING: No SWA checkpoint found, while SWA is enabled. Compare the swa_start parameter and the latest checkpoint. 
2024-07-10 21:52:22.760 INFO: Loading checkpoint: checkpoints/10-128-L1-universal-branch_run-1_epoch-172.pt 2024-07-10 21:52:22.990 INFO: ScaleShiftMACE( (node_embedding): LinearNodeEmbeddingBlock( (linear): Linear(89x0e -> 128x0e | 11392 weights) ) (radial_embedding): RadialEmbeddingBlock( (bessel_fn): BesselBasis(r_max=6.0, num_basis=10, trainable=False) (cutoff_fn): PolynomialCutoff(p=5.0, r_max=6.0) ) (spherical_harmonics): SphericalHarmonics() (atomic_energies_fn): AtomicEnergiesBlock(energies=[-1.1172, -0.0005, -0.2973, -0.0426, -0.2912, -1.2628, -3.1256, -1.5469, -0.4379, -0.0122, -0.2286, -0.0099, -0.2167, -0.8258, -1.8872, -0.8909, -0.2583, -0.0235, -0.1783, -0.0260, -2.1297, -2.4053, -3.6123, -5.4462, -5.1459, -3.3058, -1.6661, -0.2841, -0.2375, -0.0110, -0.1985, -0.7792, -1.7014, -0.7835, -0.2269, -0.0227, -0.1619, -0.0282, -2.2568, -2.2374, -2.5348, -4.6021, -3.4029, -1.6888, -1.4402, -1.4752, -0.1984, -0.0137, -0.1967, -0.6796, -1.4302, -0.6573, -0.1886, -0.0127, -0.1345, -0.0316, -0.6279, -1.4364, -0.1858, -0.2588, -0.2570, -0.2542, -9.4861, -8.1154, -0.1458, -0.1949, -0.1457, -0.2516, -0.1638, -0.2527, -0.2526, -3.4929, -3.5659, -4.5710, -4.6344, -2.8828, -1.4279, -0.5024, -0.1848, -0.0105, -0.1794, -0.6307, -1.3246, -0.2421, -1.0442, -2.0324, -4.6443, -7.3027, -10.3924]) (interactions): ModuleList( (0): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e -> 128x0e | 16384 weights) (conv_tp): TensorProduct(128x0e x 1x0e+1x1o+1x2e+1x3o -> 128x0e+128x1o+128x2e+128x3o | 512 paths | 512 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 512] (linear): Linear(128x0e+128x1o+128x2e+128x3o -> 128x0e+128x1o+128x2e+128x3o | 65536 weights) (skip_tp): FullyConnectedTensorProduct(128x0e x 89x0e -> 128x0e+128x1o | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) (1): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) (conv_tp): TensorProduct(128x0e+128x1o x 
1x0e+1x1o+1x2e+1x3o -> 256x0e+384x1o+384x2e+256x3o | 1280 paths | 1280 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 1280] (linear): Linear(256x0e+384x1o+384x2e+256x3o -> 128x0e+128x1o+128x2e+128x3o | 163840 weights) (skip_tp): FullyConnectedTensorProduct(128x0e+128x1o x 89x0e -> 128x0e | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) ) (products): ModuleList( (0): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) (1): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x6x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) ) (1): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e -> 128x0e | 16384 weights) ) ) (readouts): ModuleList( (0): LinearReadoutBlock( (linear): Linear(128x0e+128x1o -> 1x0e | 128 weights) ) (1): NonLinearReadoutBlock( (linear_1): Linear(128x0e -> 16x0e | 2048 weights) (non_linearity): Activation [x] 
(16x0e -> 16x0e) (linear_2): Linear(16x0e -> 1x0e | 16 weights) ) ) (scale_shift): ScaleShiftBlock(scale=0.804154, shift=0.164097) ) 2024-07-10 21:52:22.996 INFO: Number of parameters: 4688656 2024-07-10 21:52:22.996 INFO: Optimizer: Adam ( Parameter Group 0 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0008388608000000005 maximize: False name: embedding weight_decay: 0.0 Parameter Group 1 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0008388608000000005 maximize: False name: interactions_decay weight_decay: 1e-08 Parameter Group 2 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0008388608000000005 maximize: False name: interactions_no_decay weight_decay: 0.0 Parameter Group 3 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0008388608000000005 maximize: False name: products weight_decay: 1e-08 Parameter Group 4 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0008388608000000005 maximize: False name: readouts weight_decay: 0.0 ) 2024-07-10 21:52:22.996 INFO: Using Weights and Biases for logging 2024-07-10 21:52:40.374 INFO: Using gradient clipping with tolerance=100.000 2024-07-10 21:52:40.374 INFO: Started training 2024-07-10 21:52:46.740 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 21:52:46.740 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 21:52:46.740 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 21:52:46.740 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 21:52:46.740 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 21:52:46.740 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-10 21:52:46.749 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 21:52:46.749 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-10 22:10:27.573 INFO: Epoch 172: loss=5.5571e-03, MAE_E_per_atom=23.7270 meV, MAE_F=48.0456 meV / A, MAE_stress_per_atom=0.1158 meV / A^3 2024-07-10 22:19:40.314 INFO: Epoch 173: loss=5.5512e-03, MAE_E_per_atom=23.7061 meV, MAE_F=48.0456 meV / A, MAE_stress_per_atom=0.1176 meV / A^3 2024-07-10 22:28:51.180 INFO: Epoch 174: loss=5.5445e-03, MAE_E_per_atom=23.5883 meV, MAE_F=48.1432 meV / A, MAE_stress_per_atom=0.1174 meV / A^3 2024-07-10 22:38:01.614 INFO: Epoch 175: loss=5.5471e-03, MAE_E_per_atom=23.6778 meV, MAE_F=47.9828 meV / A, MAE_stress_per_atom=0.1147 meV / A^3 2024-07-10 22:47:10.890 INFO: Epoch 176: loss=5.5417e-03, MAE_E_per_atom=23.4919 meV, MAE_F=47.7824 meV / A, MAE_stress_per_atom=0.1172 meV / A^3 2024-07-10 22:56:24.324 INFO: Epoch 177: loss=5.5528e-03, MAE_E_per_atom=23.4025 meV, MAE_F=48.1396 meV / A, MAE_stress_per_atom=0.1165 meV / A^3 2024-07-10 23:05:36.144 INFO: Epoch 178: loss=5.4955e-03, MAE_E_per_atom=23.5371 meV, MAE_F=47.8086 meV / A, MAE_stress_per_atom=0.1147 meV / A^3 2024-07-10 23:14:45.841 INFO: Epoch 179: loss=5.5206e-03, MAE_E_per_atom=23.5637 meV, MAE_F=47.8397 meV / A, MAE_stress_per_atom=0.1166 meV / A^3 2024-07-10 23:23:55.877 INFO: Epoch 180: loss=5.4960e-03, MAE_E_per_atom=23.5244 meV, MAE_F=47.6904 meV / A, MAE_stress_per_atom=0.1155 meV / A^3 2024-07-10 23:33:04.971 INFO: Epoch 181: loss=5.5161e-03, MAE_E_per_atom=23.5269 meV, MAE_F=47.7520 meV / A, MAE_stress_per_atom=0.1149 meV / A^3 2024-07-10 23:42:16.527 INFO: Epoch 182: loss=5.5349e-03, MAE_E_per_atom=23.4546 meV, MAE_F=47.8166 meV / A, MAE_stress_per_atom=0.1164 meV / A^3 2024-07-11 02:04:26.223 INFO: Process group initialized: True 2024-07-11 02:04:26.225 INFO: Processes: 80 2024-07-11 02:04:26.225 INFO: MACE version: 0.3.0 2024-07-11 02:04:26.225 INFO: Configuration: 
Namespace(name='10-128-L1-universal-branch', seed=1, log_dir='logs', model_dir='.', checkpoints_dir='checkpoints', results_dir='results', downloads_dir='downloads', device='cuda', default_dtype='float64', distributed=True, log_level='INFO', error_table='PerAtomMAE', model='ScaleShiftMACE', r_max=6.0, radial_type='bessel', num_radial_basis=10, num_cutoff_basis=5, interaction='RealAgnosticResidualInteractionBlock', interaction_first='RealAgnosticResidualInteractionBlock', max_ell=3, correlation=3, num_interactions=2, MLP_irreps='16x0e', radial_MLP='[64, 64, 64]', hidden_irreps='128x0e + 128x1o', num_channels=128, max_L=1, gate='silu', scaling='rms_forces_scaling', avg_num_neighbors=1, compute_avg_num_neighbors=True, compute_stress=True, compute_forces=True, train_file='../../dataset/mptrj-gga-ggapu-train', valid_file='../../dataset/mptrj-gga-ggapu-val', valid_fraction=0.1, test_file=None, test_dir=None, multi_processed_test=False, num_workers=16, pin_memory=True, atomic_numbers=None, mean=None, std=None, statistics_file='../../dataset/mptrj-gga-ggapu-statistics-isolated.json', E0s=None, energy_key='energy', forces_key='forces', virials_key='virials', stress_key='stress', dipole_key='dipole', charges_key='charges', loss='universal', forces_weight=10.0, swa_forces_weight=100.0, energy_weight=1.0, swa_energy_weight=1000.0, virials_weight=1.0, swa_virials_weight=10.0, stress_weight=100.0, swa_stress_weight=10.0, dipole_weight=1.0, swa_dipole_weight=1.0, config_type_weights='{"Default":1.0}', huber_delta=0.01, optimizer='adam', batch_size=16, valid_batch_size=32, lr=0.005, swa_lr=0.001, weight_decay=1e-08, amsgrad=True, scheduler='ReduceLROnPlateau', lr_factor=0.8, scheduler_patience=5, lr_scheduler_gamma=0.9993, swa=False, start_swa=None, ema=True, ema_decay=0.995, max_num_epochs=200, patience=50, eval_interval=1, keep_checkpoints=True, restart_latest=True, save_cpu=True, clip_grad=100.0, wandb=True, wandb_project='mace-universal', wandb_entity='astagroup', 
wandb_name='10-128-L1-universal-branch', wandb_log_hypers=['num_channels', 'max_L', 'correlation', 'lr', 'swa_lr', 'weight_decay', 'batch_size', 'max_num_epochs', 'start_swa', 'energy_weight', 'forces_weight']) 2024-07-11 02:04:26.225 INFO: CUDA version: 11.8, CUDA device: 0 2024-07-11 02:04:26.225 INFO: Using statistics json file 2024-07-11 02:04:26.225 INFO: Using atomic numbers from statistics file 2024-07-11 02:04:26.226 INFO: AtomicNumberTable: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 89, 90, 91, 92, 93, 94) 2024-07-11 02:04:26.226 INFO: Atomic Energies not in training file, using command line argument E0s 2024-07-11 02:04:26.226 INFO: Atomic energies: [-1.11723232, -0.00045595, -0.29734917, -0.04262353, -0.2911712, -1.26281801, -3.12555634, -1.54690765, -0.43794547, -0.01216023, -0.22858276, -0.00994627, -0.21672837, -0.82583191, -1.88719667, -0.89091719, -0.25828681, -0.0235315, -0.17827125, -0.02596217, -2.12966897, -2.40532262, -3.61232779, -5.44620624, -5.14592659, -3.30583367, -1.66614587, -0.28412403, -0.23745594, -0.01098351, -0.19854295, -0.77924665, -1.70136472, -0.78345919, -0.22687512, -0.02265396, -0.16194042, -0.02823145, -2.25679622, -2.23742918, -2.53481909, -4.60213279, -3.40289704, -1.68884293, -1.44016062, -1.47521138, -0.19840574, -0.01374787, -0.19672488, -0.67963499, -1.4302063, -0.6573123, -0.18858477, -0.01267614, -0.13452777, -0.03157029, -0.62794477, -1.43642821, -0.18584352, -0.25876002, -0.25695685, -0.2542358, -9.48606277, -8.11540027, -0.14584339, -0.19488481, -0.14569527, -0.2516406, -0.16381585, -0.25265734, -0.25255978, -3.49292389, -3.5659314, -4.57101127, -4.63436797, -2.88280809, -1.42793567, -0.50244445, -0.18479218, -0.0105212, -0.17939998, 
-0.63069886, -1.32462383, -0.24210133, -1.04419147, -2.03239022, -4.6443113, -7.30273499, -10.39244586] 2024-07-11 02:04:57.433 INFO: UniversalLoss(energy_weight=1.000, forces_weight=10.000, stress_weight=100.000) 2024-07-11 02:04:57.436 INFO: Average number of neighbors: 61.964672446250916 2024-07-11 02:04:57.436 INFO: Selected the following outputs: {'energy': True, 'forces': True, 'virials': False, 'stress': True, 'dipoles': False} 2024-07-11 02:04:57.436 INFO: Building model 2024-07-11 02:04:57.437 INFO: Hidden irreps: 128x0e+128x1o 2024-07-11 02:04:59.682 WARNING: No SWA checkpoint found, while SWA is enabled. Compare the swa_start parameter and the latest checkpoint. 2024-07-11 02:04:59.686 INFO: Loading checkpoint: checkpoints/10-128-L1-universal-branch_run-1_epoch-182.pt 2024-07-11 02:04:59.917 INFO: ScaleShiftMACE( (node_embedding): LinearNodeEmbeddingBlock( (linear): Linear(89x0e -> 128x0e | 11392 weights) ) (radial_embedding): RadialEmbeddingBlock( (bessel_fn): BesselBasis(r_max=6.0, num_basis=10, trainable=False) (cutoff_fn): PolynomialCutoff(p=5.0, r_max=6.0) ) (spherical_harmonics): SphericalHarmonics() (atomic_energies_fn): AtomicEnergiesBlock(energies=[-1.1172, -0.0005, -0.2973, -0.0426, -0.2912, -1.2628, -3.1256, -1.5469, -0.4379, -0.0122, -0.2286, -0.0099, -0.2167, -0.8258, -1.8872, -0.8909, -0.2583, -0.0235, -0.1783, -0.0260, -2.1297, -2.4053, -3.6123, -5.4462, -5.1459, -3.3058, -1.6661, -0.2841, -0.2375, -0.0110, -0.1985, -0.7792, -1.7014, -0.7835, -0.2269, -0.0227, -0.1619, -0.0282, -2.2568, -2.2374, -2.5348, -4.6021, -3.4029, -1.6888, -1.4402, -1.4752, -0.1984, -0.0137, -0.1967, -0.6796, -1.4302, -0.6573, -0.1886, -0.0127, -0.1345, -0.0316, -0.6279, -1.4364, -0.1858, -0.2588, -0.2570, -0.2542, -9.4861, -8.1154, -0.1458, -0.1949, -0.1457, -0.2516, -0.1638, -0.2527, -0.2526, -3.4929, -3.5659, -4.5710, -4.6344, -2.8828, -1.4279, -0.5024, -0.1848, -0.0105, -0.1794, -0.6307, -1.3246, -0.2421, -1.0442, -2.0324, -4.6443, -7.3027, -10.3924]) 
(interactions): ModuleList( (0): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e -> 128x0e | 16384 weights) (conv_tp): TensorProduct(128x0e x 1x0e+1x1o+1x2e+1x3o -> 128x0e+128x1o+128x2e+128x3o | 512 paths | 512 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 512] (linear): Linear(128x0e+128x1o+128x2e+128x3o -> 128x0e+128x1o+128x2e+128x3o | 65536 weights) (skip_tp): FullyConnectedTensorProduct(128x0e x 89x0e -> 128x0e+128x1o | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) (1): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) (conv_tp): TensorProduct(128x0e+128x1o x 1x0e+1x1o+1x2e+1x3o -> 256x0e+384x1o+384x2e+256x3o | 1280 paths | 1280 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 1280] (linear): Linear(256x0e+384x1o+384x2e+256x3o -> 128x0e+128x1o+128x2e+128x3o | 163840 weights) (skip_tp): FullyConnectedTensorProduct(128x0e+128x1o x 89x0e -> 128x0e | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) ) (products): ModuleList( (0): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) (1): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x6x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) ) (1): EquivariantProductBasisBlock( 
(symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e -> 128x0e | 16384 weights) ) ) (readouts): ModuleList( (0): LinearReadoutBlock( (linear): Linear(128x0e+128x1o -> 1x0e | 128 weights) ) (1): NonLinearReadoutBlock( (linear_1): Linear(128x0e -> 16x0e | 2048 weights) (non_linearity): Activation [x] (16x0e -> 16x0e) (linear_2): Linear(16x0e -> 1x0e | 16 weights) ) ) (scale_shift): ScaleShiftBlock(scale=0.804154, shift=0.164097) ) 2024-07-11 02:04:59.923 INFO: Number of parameters: 4688656 2024-07-11 02:04:59.923 INFO: Optimizer: Adam ( Parameter Group 0 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0005368709120000003 maximize: False name: embedding weight_decay: 0.0 Parameter Group 1 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0005368709120000003 maximize: False name: interactions_decay weight_decay: 1e-08 Parameter Group 2 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0005368709120000003 maximize: False name: interactions_no_decay weight_decay: 0.0 Parameter Group 3 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0005368709120000003 maximize: False name: products weight_decay: 1e-08 Parameter Group 4 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0005368709120000003 maximize: False name: readouts weight_decay: 0.0 ) 2024-07-11 
02:04:59.923 INFO: Using Weights and Biases for logging 2024-07-11 02:05:14.696 INFO: Using gradient clipping with tolerance=100.000 2024-07-11 02:05:14.697 INFO: Started training 2024-07-11 02:05:21.351 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.351 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.351 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.351 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.356 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.356 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.357 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.357 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.357 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.357 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.356 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.356 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.357 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.357 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.357 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.357 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.356 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.356 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.357 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.357 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.357 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-11 02:05:21.370 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.370 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.370 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.370 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.370 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.370 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.371 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.371 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.371 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.371 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:05:21.371 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 02:23:48.759 INFO: Epoch 182: loss=5.5392e-03, MAE_E_per_atom=23.4517 meV, MAE_F=47.8075 meV / A, MAE_stress_per_atom=0.1163 meV / A^3 2024-07-11 02:33:47.890 INFO: Epoch 183: loss=5.5414e-03, MAE_E_per_atom=23.4401 meV, MAE_F=47.9125 meV / A, MAE_stress_per_atom=0.1170 meV / A^3 2024-07-11 02:43:44.566 INFO: Epoch 184: loss=5.5319e-03, MAE_E_per_atom=23.4939 meV, MAE_F=47.7828 meV / A, MAE_stress_per_atom=0.1150 meV / A^3 2024-07-11 02:53:39.154 INFO: Epoch 185: loss=5.4963e-03, MAE_E_per_atom=23.4188 meV, MAE_F=47.7525 meV / A, MAE_stress_per_atom=0.1146 meV / A^3 2024-07-11 03:03:32.457 INFO: Epoch 186: loss=5.5259e-03, MAE_E_per_atom=23.4625 meV, MAE_F=47.8978 meV / A, MAE_stress_per_atom=0.1165 meV / A^3 2024-07-11 03:13:32.607 INFO: Epoch 187: loss=5.5128e-03, MAE_E_per_atom=23.4937 meV, MAE_F=47.7184 meV / A, MAE_stress_per_atom=0.1153 meV / A^3 2024-07-11 03:23:28.977 INFO: Epoch 188: loss=5.5040e-03, MAE_E_per_atom=23.4157 meV, MAE_F=47.7198 meV / A, MAE_stress_per_atom=0.1144 meV / A^3 2024-07-11 03:33:23.440 INFO: Epoch 189: loss=5.4973e-03, 
MAE_E_per_atom=23.2828 meV, MAE_F=47.7289 meV / A, MAE_stress_per_atom=0.1141 meV / A^3 2024-07-11 03:43:17.917 INFO: Epoch 190: loss=5.5201e-03, MAE_E_per_atom=23.4409 meV, MAE_F=47.8463 meV / A, MAE_stress_per_atom=0.1134 meV / A^3 2024-07-11 03:53:12.523 INFO: Epoch 191: loss=5.5031e-03, MAE_E_per_atom=23.3222 meV, MAE_F=47.6549 meV / A, MAE_stress_per_atom=0.1144 meV / A^3 2024-07-11 07:52:42.564 INFO: Process group initialized: True 2024-07-11 07:52:42.566 INFO: Processes: 80 2024-07-11 07:52:42.566 INFO: MACE version: 0.3.0 2024-07-11 07:52:42.566 INFO: Configuration: Namespace(name='10-128-L1-universal-branch', seed=1, log_dir='logs', model_dir='.', checkpoints_dir='checkpoints', results_dir='results', downloads_dir='downloads', device='cuda', default_dtype='float64', distributed=True, log_level='INFO', error_table='PerAtomMAE', model='ScaleShiftMACE', r_max=6.0, radial_type='bessel', num_radial_basis=10, num_cutoff_basis=5, interaction='RealAgnosticResidualInteractionBlock', interaction_first='RealAgnosticResidualInteractionBlock', max_ell=3, correlation=3, num_interactions=2, MLP_irreps='16x0e', radial_MLP='[64, 64, 64]', hidden_irreps='128x0e + 128x1o', num_channels=128, max_L=1, gate='silu', scaling='rms_forces_scaling', avg_num_neighbors=1, compute_avg_num_neighbors=True, compute_stress=True, compute_forces=True, train_file='../../dataset/mptrj-gga-ggapu-train', valid_file='../../dataset/mptrj-gga-ggapu-val', valid_fraction=0.1, test_file=None, test_dir=None, multi_processed_test=False, num_workers=16, pin_memory=True, atomic_numbers=None, mean=None, std=None, statistics_file='../../dataset/mptrj-gga-ggapu-statistics-isolated.json', E0s=None, energy_key='energy', forces_key='forces', virials_key='virials', stress_key='stress', dipole_key='dipole', charges_key='charges', loss='universal', forces_weight=10.0, swa_forces_weight=100.0, energy_weight=1.0, swa_energy_weight=1000.0, virials_weight=1.0, swa_virials_weight=10.0, stress_weight=100.0, 
swa_stress_weight=10.0, dipole_weight=1.0, swa_dipole_weight=1.0, config_type_weights='{"Default":1.0}', huber_delta=0.01, optimizer='adam', batch_size=16, valid_batch_size=32, lr=0.005, swa_lr=0.001, weight_decay=1e-08, amsgrad=True, scheduler='ReduceLROnPlateau', lr_factor=0.8, scheduler_patience=5, lr_scheduler_gamma=0.9993, swa=False, start_swa=None, ema=True, ema_decay=0.995, max_num_epochs=200, patience=50, eval_interval=1, keep_checkpoints=True, restart_latest=True, save_cpu=True, clip_grad=100.0, wandb=True, wandb_project='mace-universal', wandb_entity='astagroup', wandb_name='10-128-L1-universal-branch', wandb_log_hypers=['num_channels', 'max_L', 'correlation', 'lr', 'swa_lr', 'weight_decay', 'batch_size', 'max_num_epochs', 'start_swa', 'energy_weight', 'forces_weight']) 2024-07-11 07:52:42.566 INFO: CUDA version: 11.8, CUDA device: 0 2024-07-11 07:52:42.566 INFO: Using statistics json file 2024-07-11 07:52:42.566 INFO: Using atomic numbers from statistics file 2024-07-11 07:52:42.567 INFO: AtomicNumberTable: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 89, 90, 91, 92, 93, 94) 2024-07-11 07:52:42.567 INFO: Atomic Energies not in training file, using command line argument E0s 2024-07-11 07:52:42.567 INFO: Atomic energies: [-1.11723232, -0.00045595, -0.29734917, -0.04262353, -0.2911712, -1.26281801, -3.12555634, -1.54690765, -0.43794547, -0.01216023, -0.22858276, -0.00994627, -0.21672837, -0.82583191, -1.88719667, -0.89091719, -0.25828681, -0.0235315, -0.17827125, -0.02596217, -2.12966897, -2.40532262, -3.61232779, -5.44620624, -5.14592659, -3.30583367, -1.66614587, -0.28412403, -0.23745594, -0.01098351, -0.19854295, -0.77924665, -1.70136472, -0.78345919, -0.22687512, -0.02265396, 
-0.16194042, -0.02823145, -2.25679622, -2.23742918, -2.53481909, -4.60213279, -3.40289704, -1.68884293, -1.44016062, -1.47521138, -0.19840574, -0.01374787, -0.19672488, -0.67963499, -1.4302063, -0.6573123, -0.18858477, -0.01267614, -0.13452777, -0.03157029, -0.62794477, -1.43642821, -0.18584352, -0.25876002, -0.25695685, -0.2542358, -9.48606277, -8.11540027, -0.14584339, -0.19488481, -0.14569527, -0.2516406, -0.16381585, -0.25265734, -0.25255978, -3.49292389, -3.5659314, -4.57101127, -4.63436797, -2.88280809, -1.42793567, -0.50244445, -0.18479218, -0.0105212, -0.17939998, -0.63069886, -1.32462383, -0.24210133, -1.04419147, -2.03239022, -4.6443113, -7.30273499, -10.39244586] 2024-07-11 07:53:14.335 INFO: UniversalLoss(energy_weight=1.000, forces_weight=10.000, stress_weight=100.000) 2024-07-11 07:53:14.337 INFO: Average number of neighbors: 61.964672446250916 2024-07-11 07:53:14.337 INFO: Selected the following outputs: {'energy': True, 'forces': True, 'virials': False, 'stress': True, 'dipoles': False} 2024-07-11 07:53:14.337 INFO: Building model 2024-07-11 07:53:14.339 INFO: Hidden irreps: 128x0e+128x1o 2024-07-11 07:53:16.582 WARNING: No SWA checkpoint found, while SWA is enabled. Compare the swa_start parameter and the latest checkpoint. 
2024-07-11 07:53:16.586 INFO: Loading checkpoint: checkpoints/10-128-L1-universal-branch_run-1_epoch-191.pt 2024-07-11 07:53:16.815 INFO: ScaleShiftMACE( (node_embedding): LinearNodeEmbeddingBlock( (linear): Linear(89x0e -> 128x0e | 11392 weights) ) (radial_embedding): RadialEmbeddingBlock( (bessel_fn): BesselBasis(r_max=6.0, num_basis=10, trainable=False) (cutoff_fn): PolynomialCutoff(p=5.0, r_max=6.0) ) (spherical_harmonics): SphericalHarmonics() (atomic_energies_fn): AtomicEnergiesBlock(energies=[-1.1172, -0.0005, -0.2973, -0.0426, -0.2912, -1.2628, -3.1256, -1.5469, -0.4379, -0.0122, -0.2286, -0.0099, -0.2167, -0.8258, -1.8872, -0.8909, -0.2583, -0.0235, -0.1783, -0.0260, -2.1297, -2.4053, -3.6123, -5.4462, -5.1459, -3.3058, -1.6661, -0.2841, -0.2375, -0.0110, -0.1985, -0.7792, -1.7014, -0.7835, -0.2269, -0.0227, -0.1619, -0.0282, -2.2568, -2.2374, -2.5348, -4.6021, -3.4029, -1.6888, -1.4402, -1.4752, -0.1984, -0.0137, -0.1967, -0.6796, -1.4302, -0.6573, -0.1886, -0.0127, -0.1345, -0.0316, -0.6279, -1.4364, -0.1858, -0.2588, -0.2570, -0.2542, -9.4861, -8.1154, -0.1458, -0.1949, -0.1457, -0.2516, -0.1638, -0.2527, -0.2526, -3.4929, -3.5659, -4.5710, -4.6344, -2.8828, -1.4279, -0.5024, -0.1848, -0.0105, -0.1794, -0.6307, -1.3246, -0.2421, -1.0442, -2.0324, -4.6443, -7.3027, -10.3924]) (interactions): ModuleList( (0): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e -> 128x0e | 16384 weights) (conv_tp): TensorProduct(128x0e x 1x0e+1x1o+1x2e+1x3o -> 128x0e+128x1o+128x2e+128x3o | 512 paths | 512 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 512] (linear): Linear(128x0e+128x1o+128x2e+128x3o -> 128x0e+128x1o+128x2e+128x3o | 65536 weights) (skip_tp): FullyConnectedTensorProduct(128x0e x 89x0e -> 128x0e+128x1o | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) (1): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) (conv_tp): TensorProduct(128x0e+128x1o x 
1x0e+1x1o+1x2e+1x3o -> 256x0e+384x1o+384x2e+256x3o | 1280 paths | 1280 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 1280] (linear): Linear(256x0e+384x1o+384x2e+256x3o -> 128x0e+128x1o+128x2e+128x3o | 163840 weights) (skip_tp): FullyConnectedTensorProduct(128x0e+128x1o x 89x0e -> 128x0e | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) ) (products): ModuleList( (0): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) (1): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x6x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) ) (1): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e -> 128x0e | 16384 weights) ) ) (readouts): ModuleList( (0): LinearReadoutBlock( (linear): Linear(128x0e+128x1o -> 1x0e | 128 weights) ) (1): NonLinearReadoutBlock( (linear_1): Linear(128x0e -> 16x0e | 2048 weights) (non_linearity): Activation [x] 
(16x0e -> 16x0e) (linear_2): Linear(16x0e -> 1x0e | 16 weights) ) ) (scale_shift): ScaleShiftBlock(scale=0.804154, shift=0.164097) ) 2024-07-11 07:53:16.822 INFO: Number of parameters: 4688656 2024-07-11 07:53:16.822 INFO: Optimizer: Adam ( Parameter Group 0 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0004294967296000003 maximize: False name: embedding weight_decay: 0.0 Parameter Group 1 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0004294967296000003 maximize: False name: interactions_decay weight_decay: 1e-08 Parameter Group 2 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0004294967296000003 maximize: False name: interactions_no_decay weight_decay: 0.0 Parameter Group 3 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0004294967296000003 maximize: False name: products weight_decay: 1e-08 Parameter Group 4 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0004294967296000003 maximize: False name: readouts weight_decay: 0.0 ) 2024-07-11 07:53:16.822 INFO: Using Weights and Biases for logging 2024-07-11 07:53:31.613 INFO: Using gradient clipping with tolerance=100.000 2024-07-11 07:53:31.614 INFO: Started training 2024-07-11 07:53:38.547 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 07:53:38.547 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 07:53:38.547 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 07:53:38.547 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 07:53:38.547 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-11 07:53:38.548 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-11 07:53:38.554 INFO: Reducer buckets have been rebuilt in this iteration.
2024-07-11 07:53:38.554 INFO: Reducer buckets have been rebuilt in this iteration.
2024-07-11 08:12:20.964 INFO: Epoch 191: loss=5.5203e-03, MAE_E_per_atom=23.3192 meV, MAE_F=47.7707 meV / A, MAE_stress_per_atom=0.1151 meV / A^3
2024-07-11 08:22:28.696 INFO: Epoch 192: loss=5.5089e-03, MAE_E_per_atom=23.3856 meV, MAE_F=47.7394 meV / A, MAE_stress_per_atom=0.1156 meV / A^3
2024-07-11 08:32:34.577 INFO: Epoch 193: loss=5.5182e-03, MAE_E_per_atom=23.3220 meV, MAE_F=47.6599 meV / A, MAE_stress_per_atom=0.1136 meV / A^3
2024-07-11 08:42:36.020 INFO: Epoch 194: loss=5.4934e-03, MAE_E_per_atom=23.4112 meV, MAE_F=47.7124 meV / A, MAE_stress_per_atom=0.1144 meV / A^3
2024-07-11 08:52:39.098 INFO: Epoch 195: loss=5.4854e-03, MAE_E_per_atom=23.3247 meV, MAE_F=47.6485 meV / A, MAE_stress_per_atom=0.1137 meV / A^3
2024-07-11 09:02:41.031 INFO: Epoch 196: loss=5.5174e-03, MAE_E_per_atom=23.3592 meV, MAE_F=47.7358 meV / A, MAE_stress_per_atom=0.1133 meV / A^3
2024-07-11 09:12:40.678 INFO: Epoch 197: loss=5.5144e-03, MAE_E_per_atom=23.3433 meV, MAE_F=47.8225 meV / A, MAE_stress_per_atom=0.1141 meV / A^3
2024-07-11 09:22:42.229 INFO: Epoch 198: loss=5.5015e-03, MAE_E_per_atom=23.4065 meV, MAE_F=47.7474 meV / A, MAE_stress_per_atom=0.1142 meV / A^3
2024-07-11 09:32:45.569 INFO: Epoch 199: loss=5.5058e-03, MAE_E_per_atom=23.4510 meV, MAE_F=47.6719 meV / A, MAE_stress_per_atom=0.1138 meV / A^3
2024-07-11 09:32:45.817 INFO: Training complete
2024-07-11 09:32:45.817 INFO: Computing metrics for training, validation, and test sets
2024-07-11 09:32:45.868 INFO: Loading checkpoint: checkpoints/10-128-L1-universal-branch_run-1_epoch-199.pt
2024-07-11 09:32:46.195 INFO: Loaded model from epoch 199
2024-07-11 09:32:46.196 INFO: Evaluating train ...
2024-07-11 09:36:02.617 INFO: Evaluating valid ...
2024-07-11 09:36:03.752 INFO:
+-------------+--------------------+-----------------+------------------+
| config_type | MAE E / meV / atom | MAE F / meV / A | relative F MAE % |
+-------------+--------------------+-----------------+------------------+
|    train    |        24.4        |       44.1      |      27.89       |
|    valid    |        23.6        |       47.9      |      34.08       |
+-------------+--------------------+-----------------+------------------+
2024-07-11 09:36:03.752 INFO: Saving model to checkpoints/10-128-L1-universal-branch_run-1.model
2024-07-11 09:36:03.983 INFO: Done
2024-07-12 05:50:29.952 INFO: Process group initialized: True
2024-07-12 05:50:29.954 INFO: Processes: 80
2024-07-12 05:50:29.954 INFO: MACE version: 0.3.0
2024-07-12 05:50:29.955 INFO: Configuration: Namespace(name='10-128-L1-universal-branch', seed=1, log_dir='logs', model_dir='.', checkpoints_dir='checkpoints', results_dir='results', downloads_dir='downloads', device='cuda', default_dtype='float64', distributed=True, log_level='INFO', error_table='PerAtomMAE', model='ScaleShiftMACE', r_max=6.0, radial_type='bessel', num_radial_basis=10, num_cutoff_basis=5, interaction='RealAgnosticResidualInteractionBlock', interaction_first='RealAgnosticResidualInteractionBlock', max_ell=3, correlation=3, num_interactions=2, MLP_irreps='16x0e', radial_MLP='[64, 64, 64]', hidden_irreps='128x0e + 128x1o', num_channels=128, max_L=1, gate='silu', scaling='rms_forces_scaling', avg_num_neighbors=1, compute_avg_num_neighbors=True, compute_stress=True, compute_forces=True, train_file='../../dataset/mptrj-gga-ggapu-train', valid_file='../../dataset/mptrj-gga-ggapu-val', valid_fraction=0.1, test_file=None, test_dir=None, multi_processed_test=False, num_workers=16, pin_memory=True, atomic_numbers=None, mean=None, std=None, statistics_file='../../dataset/mptrj-gga-ggapu-statistics-isolated.json', E0s=None, energy_key='energy', forces_key='forces', virials_key='virials', stress_key='stress', dipole_key='dipole', charges_key='charges', loss='universal', forces_weight=10.0, 
swa_forces_weight=100.0, energy_weight=1.0, swa_energy_weight=1000.0, virials_weight=1.0, swa_virials_weight=10.0, stress_weight=100.0, swa_stress_weight=10.0, dipole_weight=1.0, swa_dipole_weight=1.0, config_type_weights='{"Default":1.0}', huber_delta=0.01, optimizer='adam', batch_size=16, valid_batch_size=32, lr=0.005, swa_lr=0.001, weight_decay=1e-08, amsgrad=True, scheduler='ReduceLROnPlateau', lr_factor=0.8, scheduler_patience=5, lr_scheduler_gamma=0.9993, swa=False, start_swa=None, ema=True, ema_decay=0.995, max_num_epochs=200, patience=50, eval_interval=1, keep_checkpoints=True, restart_latest=True, save_cpu=True, clip_grad=100.0, wandb=True, wandb_project='mace-universal', wandb_entity='astagroup', wandb_name='10-128-L1-universal-branch', wandb_log_hypers=['num_channels', 'max_L', 'correlation', 'lr', 'swa_lr', 'weight_decay', 'batch_size', 'max_num_epochs', 'start_swa', 'energy_weight', 'forces_weight']) 2024-07-12 05:50:29.955 INFO: CUDA version: 11.8, CUDA device: 0 2024-07-12 05:50:29.955 INFO: Using statistics json file 2024-07-12 05:50:29.955 INFO: Using atomic numbers from statistics file 2024-07-12 05:50:29.955 INFO: AtomicNumberTable: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 89, 90, 91, 92, 93, 94) 2024-07-12 05:50:29.955 INFO: Atomic Energies not in training file, using command line argument E0s 2024-07-12 05:50:29.956 INFO: Atomic energies: [-1.11723232, -0.00045595, -0.29734917, -0.04262353, -0.2911712, -1.26281801, -3.12555634, -1.54690765, -0.43794547, -0.01216023, -0.22858276, -0.00994627, -0.21672837, -0.82583191, -1.88719667, -0.89091719, -0.25828681, -0.0235315, -0.17827125, -0.02596217, -2.12966897, -2.40532262, -3.61232779, -5.44620624, -5.14592659, 
-3.30583367, -1.66614587, -0.28412403, -0.23745594, -0.01098351, -0.19854295, -0.77924665, -1.70136472, -0.78345919, -0.22687512, -0.02265396, -0.16194042, -0.02823145, -2.25679622, -2.23742918, -2.53481909, -4.60213279, -3.40289704, -1.68884293, -1.44016062, -1.47521138, -0.19840574, -0.01374787, -0.19672488, -0.67963499, -1.4302063, -0.6573123, -0.18858477, -0.01267614, -0.13452777, -0.03157029, -0.62794477, -1.43642821, -0.18584352, -0.25876002, -0.25695685, -0.2542358, -9.48606277, -8.11540027, -0.14584339, -0.19488481, -0.14569527, -0.2516406, -0.16381585, -0.25265734, -0.25255978, -3.49292389, -3.5659314, -4.57101127, -4.63436797, -2.88280809, -1.42793567, -0.50244445, -0.18479218, -0.0105212, -0.17939998, -0.63069886, -1.32462383, -0.24210133, -1.04419147, -2.03239022, -4.6443113, -7.30273499, -10.39244586] 2024-07-12 05:51:00.895 INFO: UniversalLoss(energy_weight=1.000, forces_weight=10.000, stress_weight=100.000) 2024-07-12 05:51:00.897 INFO: Average number of neighbors: 61.964672446250916 2024-07-12 05:51:00.897 INFO: Selected the following outputs: {'energy': True, 'forces': True, 'virials': False, 'stress': True, 'dipoles': False} 2024-07-12 05:51:00.897 INFO: Building model 2024-07-12 05:51:00.898 INFO: Hidden irreps: 128x0e+128x1o 2024-07-12 05:51:03.109 WARNING: No SWA checkpoint found, while SWA is enabled. Compare the swa_start parameter and the latest checkpoint. 
2024-07-12 05:51:03.113 INFO: Loading checkpoint: checkpoints/10-128-L1-universal-branch_run-1_epoch-199.pt 2024-07-12 05:51:03.334 INFO: ScaleShiftMACE( (node_embedding): LinearNodeEmbeddingBlock( (linear): Linear(89x0e -> 128x0e | 11392 weights) ) (radial_embedding): RadialEmbeddingBlock( (bessel_fn): BesselBasis(r_max=6.0, num_basis=10, trainable=False) (cutoff_fn): PolynomialCutoff(p=5.0, r_max=6.0) ) (spherical_harmonics): SphericalHarmonics() (atomic_energies_fn): AtomicEnergiesBlock(energies=[-1.1172, -0.0005, -0.2973, -0.0426, -0.2912, -1.2628, -3.1256, -1.5469, -0.4379, -0.0122, -0.2286, -0.0099, -0.2167, -0.8258, -1.8872, -0.8909, -0.2583, -0.0235, -0.1783, -0.0260, -2.1297, -2.4053, -3.6123, -5.4462, -5.1459, -3.3058, -1.6661, -0.2841, -0.2375, -0.0110, -0.1985, -0.7792, -1.7014, -0.7835, -0.2269, -0.0227, -0.1619, -0.0282, -2.2568, -2.2374, -2.5348, -4.6021, -3.4029, -1.6888, -1.4402, -1.4752, -0.1984, -0.0137, -0.1967, -0.6796, -1.4302, -0.6573, -0.1886, -0.0127, -0.1345, -0.0316, -0.6279, -1.4364, -0.1858, -0.2588, -0.2570, -0.2542, -9.4861, -8.1154, -0.1458, -0.1949, -0.1457, -0.2516, -0.1638, -0.2527, -0.2526, -3.4929, -3.5659, -4.5710, -4.6344, -2.8828, -1.4279, -0.5024, -0.1848, -0.0105, -0.1794, -0.6307, -1.3246, -0.2421, -1.0442, -2.0324, -4.6443, -7.3027, -10.3924]) (interactions): ModuleList( (0): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e -> 128x0e | 16384 weights) (conv_tp): TensorProduct(128x0e x 1x0e+1x1o+1x2e+1x3o -> 128x0e+128x1o+128x2e+128x3o | 512 paths | 512 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 512] (linear): Linear(128x0e+128x1o+128x2e+128x3o -> 128x0e+128x1o+128x2e+128x3o | 65536 weights) (skip_tp): FullyConnectedTensorProduct(128x0e x 89x0e -> 128x0e+128x1o | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) (1): RealAgnosticResidualInteractionBlock( (linear_up): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) (conv_tp): TensorProduct(128x0e+128x1o x 
1x0e+1x1o+1x2e+1x3o -> 256x0e+384x1o+384x2e+256x3o | 1280 paths | 1280 weights) (conv_tp_weights): FullyConnectedNet[10, 64, 64, 64, 1280] (linear): Linear(256x0e+384x1o+384x2e+256x3o -> 128x0e+128x1o+128x2e+128x3o | 163840 weights) (skip_tp): FullyConnectedTensorProduct(128x0e+128x1o x 89x0e -> 128x0e | 1458176 paths | 1458176 weights) (reshape): reshape_irreps() ) ) (products): ModuleList( (0): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) (1): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x6x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e+128x1o -> 128x0e+128x1o | 32768 weights) ) (1): EquivariantProductBasisBlock( (symmetric_contractions): SymmetricContraction( (contractions): ModuleList( (0): Contraction( (contractions_weighting): ModuleList( (0-1): 2 x GraphModule() ) (contractions_features): ModuleList( (0-1): 2 x GraphModule() ) (weights): ParameterList( (0): Parameter containing: [torch.float64 of size 89x4x128 (GPU 0)] (1): Parameter containing: [torch.float64 of size 89x1x128 (GPU 0)] ) (graph_opt_main): GraphModule() ) ) ) (linear): Linear(128x0e -> 128x0e | 16384 weights) ) ) (readouts): ModuleList( (0): LinearReadoutBlock( (linear): Linear(128x0e+128x1o -> 1x0e | 128 weights) ) (1): NonLinearReadoutBlock( (linear_1): Linear(128x0e -> 16x0e | 2048 weights) (non_linearity): Activation [x] 
(16x0e -> 16x0e) (linear_2): Linear(16x0e -> 1x0e | 16 weights) ) ) (scale_shift): ScaleShiftBlock(scale=0.804154, shift=0.164097) ) 2024-07-12 05:51:03.341 INFO: Number of parameters: 4688656 2024-07-12 05:51:03.341 INFO: Optimizer: Adam ( Parameter Group 0 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.00027487790694400024 maximize: False name: embedding weight_decay: 0.0 Parameter Group 1 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.00027487790694400024 maximize: False name: interactions_decay weight_decay: 1e-08 Parameter Group 2 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.00027487790694400024 maximize: False name: interactions_no_decay weight_decay: 0.0 Parameter Group 3 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.00027487790694400024 maximize: False name: products weight_decay: 1e-08 Parameter Group 4 amsgrad: True betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.00027487790694400024 maximize: False name: readouts weight_decay: 0.0 ) 2024-07-12 05:51:03.341 INFO: Using Weights and Biases for logging 2024-07-12 05:51:19.528 INFO: Using gradient clipping with tolerance=100.000 2024-07-12 05:51:19.529 INFO: Started training 2024-07-12 05:51:25.927 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-12 05:51:25.927 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-12 05:51:25.927 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-12 05:51:25.927 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-12 05:51:25.940 INFO: Reducer buckets have been rebuilt in this iteration. 2024-07-12 05:51:25.940 INFO: Reducer buckets have been rebuilt in this iteration. 
2024-07-12 05:51:25.955 INFO: Reducer buckets have been rebuilt in this iteration.
2024-07-12 05:51:25.956 INFO: Reducer buckets have been rebuilt in this iteration.
2024-07-12 06:08:33.372 INFO: Epoch 199: loss=5.5000e-03, MAE_E_per_atom=23.3910 meV, MAE_F=47.7066 meV / A, MAE_stress_per_atom=0.1136 meV / A^3
2024-07-12 06:08:33.687 INFO: Training complete
2024-07-12 06:08:33.687 INFO: Computing metrics for training, validation, and test sets
2024-07-12 06:08:33.693 INFO: Loading checkpoint: checkpoints/10-128-L1-universal-branch_run-1_epoch-199.pt
2024-07-12 06:08:34.005 INFO: Loaded model from epoch 199
2024-07-12 06:08:34.006 INFO: Evaluating train ...
2024-07-12 06:13:24.399 INFO: Evaluating valid ...
2024-07-12 06:13:27.034 INFO:
+-------------+--------------------+-----------------+------------------+
| config_type | MAE E / meV / atom | MAE F / meV / A | relative F MAE % |
+-------------+--------------------+-----------------+------------------+
|    train    |        24.3        |       43.8      |      27.76       |
|    valid    |        23.4        |       47.7      |      33.94       |
+-------------+--------------------+-----------------+------------------+
2024-07-12 06:13:27.034 INFO: Saving model to checkpoints/10-128-L1-universal-branch_run-1.model
2024-07-12 06:13:27.244 INFO: Done