Description
Hello, I have recently been trying to reproduce PredFormer's results on Human3.6M; my training log is attached below. The validation loss does not decrease — is this normal? Could you please share your training log? Thanks a lot!
2025-11-23 10:23:45,290 - Environment info:
sys.platform: linux
Python: 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:26:04) [GCC 10.4.0]
CUDA available: True
CUDA_HOME: /usr
NVCC: Build cuda_12.0.r12.0/compiler.32267302_0
GPU 0: Tesla V100-SXM2-32GB
GCC: gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
PyTorch: 2.9.0+cu128
PyTorch compiling details: PyTorch built with:
- GCC 13.3
- C++ Version: 201703
- Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX512
- CUDA Runtime 12.8
- NVCC architecture flags: -gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_100,code=sm_100;-gencode;arch=compute_120,code=sm_120
- CuDNN 91.0.2 (built against CUDA 12.9)
- Built with CuDNN 90.8
- Magma 2.6.1
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=0fabc3ba44823f257e70ce397d989c8de5e362c1, CUDA_VERSION=12.8, CUDNN_VERSION=9.8.0, CXX_COMPILER=/opt/rh/gcc-toolset-13/root/usr/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-dangling-reference -Wno-error=dangling-reference -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, USE_XCCL=OFF, USE_XPU=OFF,
TorchVision: 0.24.0+cu128
OpenCV: 4.12.0
openstl: 0.3.0
2025-11-23 10:23:48,618 -
device: cuda
dist: False
display_step: 10
res_dir: work_dirs
ex_name: human/2025-11-23-10-23_PredFormer_depth3_TSST_sd0.1_dp0.1_256_8_32_lr1e-3_50ep_cos_bs8_ps8_Adamw
tb_dir: logs_tb/03_08
use_gpu: True
fp16: False
torchscript: False
seed: 42
diff_seed: False
fps: False
empty_cache: True
find_unused_parameters: False
broadcast_buffers: True
resume_from: None
auto_resume: False
test: False
inference: False
deterministic: False
launcher: none
local_rank: 0
port: 29500
batch_size: 10
val_batch_size: 16
num_workers: 4
data_root: data
dataname: human
pre_seq_length: 4
aft_seq_length: 4
total_length: 8
use_augment: False
use_prefetcher: False
drop_last: False
method: predformer
config_file: configs/human/PredFormer.py
model_type: None
drop: 0.0
overwrite: True
alpha: 0.1
top_k: 100
epoch: 50
log_step: 1
opt: adamw
opt_eps: None
opt_betas: None
momentum: 0.9
weight_decay: 0.01
clip_grad: None
clip_mode: norm
early_stop_epoch: -1
no_display_method_info: False
sched: cosine
lr: 0.001
lr_k_decay: 1.0
warmup_lr: 1e-05
min_lr: 1e-06
final_div_factor: 10000.0
warmup_epoch: 0
decay_epoch: 100
multi_decay_epoch: [20, 30, 40]
decay_rate: 0.1
filter_bias_and_bn: False
patience: 5
drop_path: 0.0
dropout: 0.0
cutoff: 0
cutmode: standard
drop_schedule: constant
model_config: {'height': 256, 'width': 256, 'num_channels': 3, 'pre_seq': 4, 'after_seq': 4, 'patch_size': 8, 'dim': 256, 'heads': 8, 'dim_head': 32, 'dropout': 0.1, 'attn_dropout': 0.1, 'drop_path': 0.1, 'scale_dim': 4, 'depth': 1, 'Ndepth': 3}
in_shape: [4, 3, 256, 256]
metrics: ['mse', 'mae', 'ssim', 'psnr', 'lpips']
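As a sanity check on the configuration above, the tensor shapes implied by `model_config` can be verified with a few lines of arithmetic (a minimal sketch, independent of the actual PredFormer code):

```python
# Shape arithmetic implied by model_config (pure Python, no torch needed).
height = width = 256
patch_size = 8
num_channels = 3
dim = 256
pre_seq = 4

# 32 * 32 = 1024 patch tokens per frame
num_patches = (height // patch_size) * (width // patch_size)
# 8 * 8 * 3 = 192 values per patch token
token_dim = patch_size * patch_size * num_channels

# token_dim matches Linear(in_features=192, out_features=256)
# in to_patch_embedding; the per-sample token grid fed to the
# transformer blocks is [T, N, D] = (4, 1024, 256).
print(num_patches, token_dim)  # 1024 192
```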
2025-11-23 10:23:49,307 - Model info:
PredFormer_Model(
(to_patch_embedding): Sequential(
(0): Rearrange('b t c (h p1) (w p2) -> b t (h w) (p1 p2 c)', p1=8, p2=8)
(1): Linear(in_features=192, out_features=256, bias=True)
)
(blocks): ModuleList(
(0-2): 3 x PredFormerLayer(
(ts_temporal_transformer): GatedTransformer(
(layers): ModuleList(
(0): ModuleList(
(0): PreNorm(
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(fn): Attention(
(to_qkv): Linear(in_features=256, out_features=768, bias=False)
(to_out): Sequential(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Dropout(p=0.1, inplace=False)
)
)
)
(1): PreNorm(
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(fn): SwiGLU(
(fc1_g): Linear(in_features=256, out_features=1024, bias=True)
(fc1_x): Linear(in_features=256, out_features=1024, bias=True)
(act): SiLU()
(drop1): Dropout(p=0.1, inplace=False)
(norm): Identity()
(fc2): Linear(in_features=1024, out_features=256, bias=True)
(drop2): Dropout(p=0.1, inplace=False)
)
)
(2-3): 2 x DropPath(drop_prob=0.100)
)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(ts_space_transformer): GatedTransformer(
(layers): ModuleList(
(0): ModuleList(
(0): PreNorm(
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(fn): Attention(
(to_qkv): Linear(in_features=256, out_features=768, bias=False)
(to_out): Sequential(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Dropout(p=0.1, inplace=False)
)
)
)
(1): PreNorm(
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(fn): SwiGLU(
(fc1_g): Linear(in_features=256, out_features=1024, bias=True)
(fc1_x): Linear(in_features=256, out_features=1024, bias=True)
(act): SiLU()
(drop1): Dropout(p=0.1, inplace=False)
(norm): Identity()
(fc2): Linear(in_features=1024, out_features=256, bias=True)
(drop2): Dropout(p=0.1, inplace=False)
)
)
(2-3): 2 x DropPath(drop_prob=0.100)
)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(st_space_transformer): GatedTransformer(
(layers): ModuleList(
(0): ModuleList(
(0): PreNorm(
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(fn): Attention(
(to_qkv): Linear(in_features=256, out_features=768, bias=False)
(to_out): Sequential(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Dropout(p=0.1, inplace=False)
)
)
)
(1): PreNorm(
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(fn): SwiGLU(
(fc1_g): Linear(in_features=256, out_features=1024, bias=True)
(fc1_x): Linear(in_features=256, out_features=1024, bias=True)
(act): SiLU()
(drop1): Dropout(p=0.1, inplace=False)
(norm): Identity()
(fc2): Linear(in_features=1024, out_features=256, bias=True)
(drop2): Dropout(p=0.1, inplace=False)
)
)
(2-3): 2 x DropPath(drop_prob=0.100)
)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(st_temporal_transformer): GatedTransformer(
(layers): ModuleList(
(0): ModuleList(
(0): PreNorm(
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(fn): Attention(
(to_qkv): Linear(in_features=256, out_features=768, bias=False)
(to_out): Sequential(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Dropout(p=0.1, inplace=False)
)
)
)
(1): PreNorm(
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(fn): SwiGLU(
(fc1_g): Linear(in_features=256, out_features=1024, bias=True)
(fc1_x): Linear(in_features=256, out_features=1024, bias=True)
(act): SiLU()
(drop1): Dropout(p=0.1, inplace=False)
(norm): Identity()
(fc2): Linear(in_features=1024, out_features=256, bias=True)
(drop2): Dropout(p=0.1, inplace=False)
)
)
(2-3): 2 x DropPath(drop_prob=0.100)
)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
)
(mlp_head): Sequential(
(0): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(1): Linear(in_features=256, out_features=192, bias=True)
)
)
| module | #parameters or shape | #flops |
|---|---|---|
| model | 12.731M | 65.069G |
| to_patch_embedding.1 | 49.408K | 0.201G |
| to_patch_embedding.1.weight | (256, 192) | |
| to_patch_embedding.1.bias | (256,) | |
| blocks | 12.632M | 64.661G |
| blocks.0 | 4.211M | 21.554G |
| blocks.0.ts_temporal_transformer | 1.053M | 4.319G |
| blocks.0.ts_space_transformer | 1.053M | 6.458G |
| blocks.0.st_space_transformer | 1.053M | 6.458G |
| blocks.0.st_temporal_transformer | 1.053M | 4.319G |
| blocks.1 | 4.211M | 21.554G |
| blocks.1.ts_temporal_transformer | 1.053M | 4.319G |
| blocks.1.ts_space_transformer | 1.053M | 6.458G |
| blocks.1.st_space_transformer | 1.053M | 6.458G |
| blocks.1.st_temporal_transformer | 1.053M | 4.319G |
| blocks.2 | 4.211M | 21.554G |
| blocks.2.ts_temporal_transformer | 1.053M | 4.319G |
| blocks.2.ts_space_transformer | 1.053M | 6.458G |
| blocks.2.st_space_transformer | 1.053M | 6.458G |
| blocks.2.st_temporal_transformer | 1.053M | 4.319G |
| mlp_head | 49.856K | 0.207G |
| mlp_head.0 | 0.512K | 5.243M |
| mlp_head.0.weight | (256,) | |
| mlp_head.0.bias | (256,) | |
| mlp_head.1 | 49.344K | 0.201G |
| mlp_head.1.weight | (192, 256) | |
| mlp_head.1.bias | (192,) | |
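The per-module parameter counts in the table can be cross-checked by hand for the two Linear layers whose weight shapes are listed explicitly (a quick arithmetic sketch, not tied to the repo code):

```python
# Parameter count of a Linear layer: in_features * out_features + bias terms.
def linear_params(in_f, out_f, bias=True):
    return in_f * out_f + (out_f if bias else 0)

patch_embed = linear_params(192, 256)  # to_patch_embedding.1
head = linear_params(256, 192)         # mlp_head.1
print(patch_embed, head)  # 49408 49344 -> 49.408K and 49.344K in the table
```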
2025-11-23 11:43:26,482 - val mse:12629.8505859375, mae:39280.828125
2025-11-23 11:43:26,483 - Intermediate result: 12629.851 (Index 0)
2025-11-23 11:43:26,484 - Epoch: 1, Steps: 7340 | Lr: 0.0010000 | Train Loss: 0.0475791 | Vali Loss: 0.0642387
2025-11-23 13:02:46,002 - val mse:12634.5791015625, mae:39763.70703125
2025-11-23 13:02:46,002 - Intermediate result: 12634.579 (Index 1)
2025-11-23 13:02:46,003 - Epoch: 2, Steps: 7340 | Lr: 0.0009990 | Train Loss: 0.0643604 | Vali Loss: 0.0642628
2025-11-23 14:22:23,409 - val mse:12628.8076171875, mae:39400.453125
2025-11-23 14:22:23,410 - Intermediate result: 12628.808 (Index 2)
2025-11-23 14:22:23,410 - Epoch: 3, Steps: 7340 | Lr: 0.0009961 | Train Loss: 0.0643471 | Vali Loss: 0.0642334
2025-11-23 15:41:59,496 - val mse:12628.958984375, mae:39326.36328125
2025-11-23 15:41:59,497 - Intermediate result: 12628.959 (Index 3)
2025-11-23 15:41:59,498 - Epoch: 4, Steps: 7340 | Lr: 0.0009912 | Train Loss: 0.0643505 | Vali Loss: 0.0642342
2025-11-23 17:01:32,213 - val mse:12646.9404296875, mae:40085.94140625
2025-11-23 17:01:32,214 - Intermediate result: 12646.94 (Index 4)
2025-11-23 17:01:32,215 - Epoch: 5, Steps: 7340 | Lr: 0.0009843 | Train Loss: 0.0643392 | Vali Loss: 0.0643257
2025-11-23 18:21:05,215 - val mse:12629.5, mae:39498.28125
2025-11-23 18:21:05,216 - Intermediate result: 12629.5 (Index 5)
2025-11-23 18:21:05,216 - Epoch: 6, Steps: 7340 | Lr: 0.0009756 | Train Loss: 0.0643353 | Vali Loss: 0.0642370
2025-11-23 19:40:41,376 - val mse:12646.1982421875, mae:38698.20703125
2025-11-23 19:40:41,377 - Intermediate result: 12646.198 (Index 6)
2025-11-23 19:40:41,378 - Epoch: 7, Steps: 7340 | Lr: 0.0009649 | Train Loss: 0.0643307 | Vali Loss: 0.0643219
2025-11-23 21:00:16,221 - val mse:12638.4287109375, mae:38869.02734375
2025-11-23 21:00:16,222 - Intermediate result: 12638.429 (Index 7)
2025-11-23 21:00:16,222 - Epoch: 8, Steps: 7340 | Lr: 0.0009525 | Train Loss: 0.0643322 | Vali Loss: 0.0642824
2025-11-23 22:19:48,960 - val mse:12631.87890625, mae:39641.765625
2025-11-23 22:19:48,961 - Intermediate result: 12631.879 (Index 8)
2025-11-23 22:19:48,961 - Epoch: 9, Steps: 7340 | Lr: 0.0009382 | Train Loss: 0.0643300 | Vali Loss: 0.0642491
2025-11-23 23:38:47,267 - val mse:12635.4140625, mae:39795.7109375
2025-11-23 23:38:47,267 - Intermediate result: 12635.414 (Index 9)
2025-11-23 23:38:47,268 - Epoch: 10, Steps: 7340 | Lr: 0.0009222 | Train Loss: 0.0643266 | Vali Loss: 0.0642670
2025-11-24 00:57:45,815 - val mse:12628.7724609375, mae:39377.21484375
2025-11-24 00:57:45,816 - Intermediate result: 12628.772 (Index 10)
2025-11-24 00:57:45,816 - Epoch: 11, Steps: 7340 | Lr: 0.0009046 | Train Loss: 0.0643241 | Vali Loss: 0.0642333
2025-11-24 02:16:43,607 - val mse:12631.607421875, mae:39648.65234375
2025-11-24 02:16:43,608 - Intermediate result: 12631.607 (Index 11)
2025-11-24 02:16:43,608 - Epoch: 12, Steps: 7340 | Lr: 0.0008854 | Train Loss: 0.0643228 | Vali Loss: 0.0642477
2025-11-24 03:35:57,584 - val mse:12631.1494140625, mae:39619.1484375
2025-11-24 03:35:57,585 - Intermediate result: 12631.149 (Index 12)
2025-11-24 03:35:57,585 - Epoch: 13, Steps: 7340 | Lr: 0.0008646 | Train Loss: 0.0643208 | Vali Loss: 0.0642454
2025-11-24 04:55:11,483 - val mse:12630.25, mae:39563.28515625
2025-11-24 04:55:11,483 - Intermediate result: 12630.25 (Index 13)
2025-11-24 04:55:11,484 - Epoch: 14, Steps: 7340 | Lr: 0.0008424 | Train Loss: 0.0643185 | Vali Loss: 0.0642408
2025-11-24 06:14:50,974 - val mse:12636.0556640625, mae:39805.78125
2025-11-24 06:14:50,975 - Intermediate result: 12636.056 (Index 14)
2025-11-24 06:14:50,975 - Epoch: 15, Steps: 7340 | Lr: 0.0008189 | Train Loss: 0.0643184 | Vali Loss: 0.0642703
2025-11-24 07:34:21,980 - val mse:12629.3291015625, mae:39482.00390625
2025-11-24 07:34:21,981 - Intermediate result: 12629.329 (Index 15)
2025-11-24 07:34:21,981 - Epoch: 16, Steps: 7340 | Lr: 0.0007941 | Train Loss: 0.0643171 | Vali Loss: 0.0642361
2025-11-24 08:53:56,559 - val mse:12629.5283203125, mae:39218.328125
2025-11-24 08:53:56,560 - Intermediate result: 12629.528 (Index 16)
2025-11-24 08:53:56,560 - Epoch: 17, Steps: 7340 | Lr: 0.0007681 | Train Loss: 0.0643164 | Vali Loss: 0.0642371
2025-11-24 10:13:29,878 - val mse:12632.345703125, mae:39060.328125
2025-11-24 10:13:29,879 - Intermediate result: 12632.346 (Index 17)
2025-11-24 10:13:29,879 - Epoch: 18, Steps: 7340 | Lr: 0.0007411 | Train Loss: 0.0643159 | Vali Loss: 0.0642514
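Note that the train loss is pinned at ~0.0643 from epoch 2 onward, which often indicates the model has collapsed to a near-constant prediction (e.g. the mean frame). One quick diagnostic, sketched below under the assumption that batches are numpy arrays shaped `(B, T, C, H, W)`, is to compare the model's validation MSE against a trivial copy-last-input-frame baseline: if the trained model is not clearly below this baseline, it has likely collapsed.

```python
import numpy as np

def copy_last_frame_baseline_mse(inputs, targets):
    """MSE of predicting every future frame as the last observed frame.

    inputs:  (B, T_in, C, H, W) observed frames
    targets: (B, T_out, C, H, W) ground-truth future frames
    """
    last = inputs[:, -1:]                            # (B, 1, C, H, W)
    preds = np.repeat(last, targets.shape[1], axis=1)
    return float(np.mean((preds - targets) ** 2))

# Example with random data; shapes match pre_seq=4 / aft_seq=4 from the config.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 3, 32, 32))
y = rng.standard_normal((2, 4, 3, 32, 32))
print(copy_last_frame_baseline_mse(x, y))
```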