Description
Hello, I have recently been trying to reproduce PredFormer's results on Human3.6M; my training log is attached below. The validation loss does not decrease — is this normal? Could you please share your training log? Thanks a lot!
2025-11-23 10:23:45,290 - Environment info:
sys.platform: linux
Python: 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:26:04) [GCC 10.4.0]
CUDA available: True
CUDA_HOME: /usr
NVCC: Build cuda_12.0.r12.0/compiler.32267302_0
GPU 0: Tesla V100-SXM2-32GB
GCC: gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
PyTorch: 2.9.0+cu128
PyTorch compiling details: PyTorch built with:
- GCC 13.3
- C++ Version: 201703
- Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX512
- CUDA Runtime 12.8
- NVCC architecture flags: -gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_100,code=sm_100;-gencode;arch=compute_120,code=sm_120
- CuDNN 91.0.2 (built against CUDA 12.9)
- Built with CuDNN 90.8
- Magma 2.6.1
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=0fabc3ba44823f257e70ce397d989c8de5e362c1, CUDA_VERSION=12.8, CUDNN_VERSION=9.8.0, CXX_COMPILER=/opt/rh/gcc-toolset-13/root/usr/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-dangling-reference -Wno-error=dangling-reference -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, USE_XCCL=OFF, USE_XPU=OFF,
TorchVision: 0.24.0+cu128
OpenCV: 4.12.0
openstl: 0.3.0
2025-11-23 10:23:48,618 -
device: cuda
dist: False
display_step: 10
res_dir: work_dirs
ex_name: human/2025-11-23-10-23_PredFormer_depth3_TSST_sd0.1_dp0.1_256_8_32_lr1e-3_50ep_cos_bs8_ps8_Adamw
tb_dir: logs_tb/03_08
use_gpu: True
fp16: False
torchscript: False
seed: 42
diff_seed: False
fps: False
empty_cache: True
find_unused_parameters: False
broadcast_buffers: True
resume_from: None
auto_resume: False
test: False
inference: False
deterministic: False
launcher: none
local_rank: 0
port: 29500
batch_size: 10
val_batch_size: 16
num_workers: 4
data_root: data
dataname: human
pre_seq_length: 4
aft_seq_length: 4
total_length: 8
use_augment: False
use_prefetcher: False
drop_last: False
method: predformer
config_file: configs/human/PredFormer.py
model_type: None
drop: 0.0
overwrite: True
alpha: 0.1
top_k: 100
epoch: 50
log_step: 1
opt: adamw
opt_eps: None
opt_betas: None
momentum: 0.9
weight_decay: 0.01
clip_grad: None
clip_mode: norm
early_stop_epoch: -1
no_display_method_info: False
sched: cosine
lr: 0.001
lr_k_decay: 1.0
warmup_lr: 1e-05
min_lr: 1e-06
final_div_factor: 10000.0
warmup_epoch: 0
decay_epoch: 100
multi_decay_epoch: [20, 30, 40]
decay_rate: 0.1
filter_bias_and_bn: False
patience: 5
drop_path: 0.0
dropout: 0.0
cutoff: 0
cutmode: standard
drop_schedule: constant
model_config: {'height': 256, 'width': 256, 'num_channels': 3, 'pre_seq': 4, 'after_seq': 4, 'patch_size': 8, 'dim': 256, 'heads': 8, 'dim_head': 32, 'dropout': 0.1, 'attn_dropout': 0.1, 'drop_path': 0.1, 'scale_dim': 4, 'depth': 1, 'Ndepth': 3}
in_shape: [4, 3, 256, 256]
metrics: ['mse', 'mae', 'ssim', 'psnr', 'lpips']
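As a sanity check on the configuration above, the tensor shapes implied by `model_config` can be verified with a few lines of arithmetic (a minimal sketch, independent of the actual PredFormer code):

```python
# Shape arithmetic implied by model_config (pure Python, no torch needed).
height = width = 256
patch_size = 8
num_channels = 3
dim = 256
pre_seq = 4

# 32 * 32 = 1024 patch tokens per frame
num_patches = (height // patch_size) * (width // patch_size)
# 8 * 8 * 3 = 192 values per patch token
token_dim = patch_size * patch_size * num_channels

# token_dim matches Linear(in_features=192, out_features=256)
# in to_patch_embedding; the per-sample token grid fed to the
# transformer blocks is [T, N, D] = (4, 1024, 256).
print(num_patches, token_dim)  # 1024 192
```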
2025-11-23 10:23:49,307 - Model info:
PredFormer_Model(
(to_patch_embedding): Sequential(
(0): Rearrange('b t c (h p1) (w p2) -> b t (h w) (p1 p2 c)', p1=8, p2=8)
(1): Linear(in_features=192, out_features=256, bias=True)
)
(blocks): ModuleList(
(0-2): 3 x PredFormerLayer(
(ts_temporal_transformer): GatedTransformer(
(layers): ModuleList(
(0): ModuleList(
(0): PreNorm(
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(fn): Attention(
(to_qkv): Linear(in_features=256, out_features=768, bias=False)
(to_out): Sequential(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Dropout(p=0.1, inplace=False)
)
)
)
(1): PreNorm(
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(fn): SwiGLU(
(fc1_g): Linear(in_features=256, out_features=1024, bias=True)
(fc1_x): Linear(in_features=256, out_features=1024, bias=True)
(act): SiLU()
(drop1): Dropout(p=0.1, inplace=False)
(norm): Identity()
(fc2): Linear(in_features=1024, out_features=256, bias=True)
(drop2): Dropout(p=0.1, inplace=False)
)
)
(2-3): 2 x DropPath(drop_prob=0.100)
)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(ts_space_transformer): GatedTransformer(
(layers): ModuleList(
(0): ModuleList(
(0): PreNorm(
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(fn): Attention(
(to_qkv): Linear(in_features=256, out_features=768, bias=False)
(to_out): Sequential(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Dropout(p=0.1, inplace=False)
)
)
)
(1): PreNorm(
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(fn): SwiGLU(
(fc1_g): Linear(in_features=256, out_features=1024, bias=True)
(fc1_x): Linear(in_features=256, out_features=1024, bias=True)
(act): SiLU()
(drop1): Dropout(p=0.1, inplace=False)
(norm): Identity()
(fc2): Linear(in_features=1024, out_features=256, bias=True)
(drop2): Dropout(p=0.1, inplace=False)
)
)
(2-3): 2 x DropPath(drop_prob=0.100)
)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(st_space_transformer): GatedTransformer(
(layers): ModuleList(
(0): ModuleList(
(0): PreNorm(
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(fn): Attention(
(to_qkv): Linear(in_features=256, out_features=768, bias=False)
(to_out): Sequential(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Dropout(p=0.1, inplace=False)
)
)
)
(1): PreNorm(
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(fn): SwiGLU(
(fc1_g): Linear(in_features=256, out_features=1024, bias=True)
(fc1_x): Linear(in_features=256, out_features=1024, bias=True)
(act): SiLU()
(drop1): Dropout(p=0.1, inplace=False)
(norm): Identity()
(fc2): Linear(in_features=1024, out_features=256, bias=True)
(drop2): Dropout(p=0.1, inplace=False)
)
)
(2-3): 2 x DropPath(drop_prob=0.100)
)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(st_temporal_transformer): GatedTransformer(
(layers): ModuleList(
(0): ModuleList(
(0): PreNorm(
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(fn): Attention(
(to_qkv): Linear(in_features=256, out_features=768, bias=False)
(to_out): Sequential(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Dropout(p=0.1, inplace=False)
)
)
)
(1): PreNorm(
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(fn): SwiGLU(
(fc1_g): Linear(in_features=256, out_features=1024, bias=True)
(fc1_x): Linear(in_features=256, out_features=1024, bias=True)
(act): SiLU()
(drop1): Dropout(p=0.1, inplace=False)
(norm): Identity()
(fc2): Linear(in_features=1024, out_features=256, bias=True)
(drop2): Dropout(p=0.1, inplace=False)
)
)
(2-3): 2 x DropPath(drop_prob=0.100)
)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
)
(mlp_head): Sequential(
(0): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(1): Linear(in_features=256, out_features=192, bias=True)
)
)
| module | #parameters or shape | #flops |
|---|---|---|
| model | 12.731M | 65.069G |
| to_patch_embedding.1 | 49.408K | 0.201G |
| to_patch_embedding.1.weight | (256, 192) | |
| to_patch_embedding.1.bias | (256,) | |
| blocks | 12.632M | 64.661G |
| blocks.0 | 4.211M | 21.554G |
| blocks.0.ts_temporal_transformer | 1.053M | 4.319G |
| blocks.0.ts_space_transformer | 1.053M | 6.458G |
| blocks.0.st_space_transformer | 1.053M | 6.458G |
| blocks.0.st_temporal_transformer | 1.053M | 4.319G |
| blocks.1 | 4.211M | 21.554G |
| blocks.1.ts_temporal_transformer | 1.053M | 4.319G |
| blocks.1.ts_space_transformer | 1.053M | 6.458G |
| blocks.1.st_space_transformer | 1.053M | 6.458G |
| blocks.1.st_temporal_transformer | 1.053M | 4.319G |
| blocks.2 | 4.211M | 21.554G |
| blocks.2.ts_temporal_transformer | 1.053M | 4.319G |
| blocks.2.ts_space_transformer | 1.053M | 6.458G |
| blocks.2.st_space_transformer | 1.053M | 6.458G |
| blocks.2.st_temporal_transformer | 1.053M | 4.319G |
| mlp_head | 49.856K | 0.207G |
| mlp_head.0 | 0.512K | 5.243M |
| mlp_head.0.weight | (256,) | |
| mlp_head.0.bias | (256,) | |
| mlp_head.1 | 49.344K | 0.201G |
| mlp_head.1.weight | (192, 256) | |
| mlp_head.1.bias | (192,) | |
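The per-module parameter counts in the table can be cross-checked by hand for the two Linear layers whose weight shapes are listed explicitly (a quick arithmetic sketch, not tied to the repo code):

```python
# Parameter count of a Linear layer: in_features * out_features + bias terms.
def linear_params(in_f, out_f, bias=True):
    return in_f * out_f + (out_f if bias else 0)

patch_embed = linear_params(192, 256)  # to_patch_embedding.1
head = linear_params(256, 192)         # mlp_head.1
print(patch_embed, head)  # 49408 49344 -> 49.408K and 49.344K in the table
```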
2025-11-23 11:43:26,482 - val mse:12629.8505859375, mae:39280.828125
2025-11-23 11:43:26,483 - Intermediate result: 12629.851 (Index 0)
2025-11-23 11:43:26,484 - Epoch: 1, Steps: 7340 | Lr: 0.0010000 | Train Loss: 0.0475791 | Vali Loss: 0.0642387
2025-11-23 13:02:46,002 - val mse:12634.5791015625, mae:39763.70703125
2025-11-23 13:02:46,002 - Intermediate result: 12634.579 (Index 1)
2025-11-23 13:02:46,003 - Epoch: 2, Steps: 7340 | Lr: 0.0009990 | Train Loss: 0.0643604 | Vali Loss: 0.0642628
2025-11-23 14:22:23,409 - val mse:12628.8076171875, mae:39400.453125
2025-11-23 14:22:23,410 - Intermediate result: 12628.808 (Index 2)
2025-11-23 14:22:23,410 - Epoch: 3, Steps: 7340 | Lr: 0.0009961 | Train Loss: 0.0643471 | Vali Loss: 0.0642334
2025-11-23 15:41:59,496 - val mse:12628.958984375, mae:39326.36328125
2025-11-23 15:41:59,497 - Intermediate result: 12628.959 (Index 3)
2025-11-23 15:41:59,498 - Epoch: 4, Steps: 7340 | Lr: 0.0009912 | Train Loss: 0.0643505 | Vali Loss: 0.0642342
2025-11-23 17:01:32,213 - val mse:12646.9404296875, mae:40085.94140625
2025-11-23 17:01:32,214 - Intermediate result: 12646.94 (Index 4)
2025-11-23 17:01:32,215 - Epoch: 5, Steps: 7340 | Lr: 0.0009843 | Train Loss: 0.0643392 | Vali Loss: 0.0643257
2025-11-23 18:21:05,215 - val mse:12629.5, mae:39498.28125
2025-11-23 18:21:05,216 - Intermediate result: 12629.5 (Index 5)
2025-11-23 18:21:05,216 - Epoch: 6, Steps: 7340 | Lr: 0.0009756 | Train Loss: 0.0643353 | Vali Loss: 0.0642370
2025-11-23 19:40:41,376 - val mse:12646.1982421875, mae:38698.20703125
2025-11-23 19:40:41,377 - Intermediate result: 12646.198 (Index 6)
2025-11-23 19:40:41,378 - Epoch: 7, Steps: 7340 | Lr: 0.0009649 | Train Loss: 0.0643307 | Vali Loss: 0.0643219
2025-11-23 21:00:16,221 - val mse:12638.4287109375, mae:38869.02734375
2025-11-23 21:00:16,222 - Intermediate result: 12638.429 (Index 7)
2025-11-23 21:00:16,222 - Epoch: 8, Steps: 7340 | Lr: 0.0009525 | Train Loss: 0.0643322 | Vali Loss: 0.0642824
2025-11-23 22:19:48,960 - val mse:12631.87890625, mae:39641.765625
2025-11-23 22:19:48,961 - Intermediate result: 12631.879 (Index 8)
2025-11-23 22:19:48,961 - Epoch: 9, Steps: 7340 | Lr: 0.0009382 | Train Loss: 0.0643300 | Vali Loss: 0.0642491
2025-11-23 23:38:47,267 - val mse:12635.4140625, mae:39795.7109375
2025-11-23 23:38:47,267 - Intermediate result: 12635.414 (Index 9)
2025-11-23 23:38:47,268 - Epoch: 10, Steps: 7340 | Lr: 0.0009222 | Train Loss: 0.0643266 | Vali Loss: 0.0642670
2025-11-24 00:57:45,815 - val mse:12628.7724609375, mae:39377.21484375
2025-11-24 00:57:45,816 - Intermediate result: 12628.772 (Index 10)
2025-11-24 00:57:45,816 - Epoch: 11, Steps: 7340 | Lr: 0.0009046 | Train Loss: 0.0643241 | Vali Loss: 0.0642333
2025-11-24 02:16:43,607 - val mse:12631.607421875, mae:39648.65234375
2025-11-24 02:16:43,608 - Intermediate result: 12631.607 (Index 11)
2025-11-24 02:16:43,608 - Epoch: 12, Steps: 7340 | Lr: 0.0008854 | Train Loss: 0.0643228 | Vali Loss: 0.0642477
2025-11-24 03:35:57,584 - val mse:12631.1494140625, mae:39619.1484375
2025-11-24 03:35:57,585 - Intermediate result: 12631.149 (Index 12)
2025-11-24 03:35:57,585 - Epoch: 13, Steps: 7340 | Lr: 0.0008646 | Train Loss: 0.0643208 | Vali Loss: 0.0642454
2025-11-24 04:55:11,483 - val mse:12630.25, mae:39563.28515625
2025-11-24 04:55:11,483 - Intermediate result: 12630.25 (Index 13)
2025-11-24 04:55:11,484 - Epoch: 14, Steps: 7340 | Lr: 0.0008424 | Train Loss: 0.0643185 | Vali Loss: 0.0642408
2025-11-24 06:14:50,974 - val mse:12636.0556640625, mae:39805.78125
2025-11-24 06:14:50,975 - Intermediate result: 12636.056 (Index 14)
2025-11-24 06:14:50,975 - Epoch: 15, Steps: 7340 | Lr: 0.0008189 | Train Loss: 0.0643184 | Vali Loss: 0.0642703
2025-11-24 07:34:21,980 - val mse:12629.3291015625, mae:39482.00390625
2025-11-24 07:34:21,981 - Intermediate result: 12629.329 (Index 15)
2025-11-24 07:34:21,981 - Epoch: 16, Steps: 7340 | Lr: 0.0007941 | Train Loss: 0.0643171 | Vali Loss: 0.0642361
2025-11-24 08:53:56,559 - val mse:12629.5283203125, mae:39218.328125
2025-11-24 08:53:56,560 - Intermediate result: 12629.528 (Index 16)
2025-11-24 08:53:56,560 - Epoch: 17, Steps: 7340 | Lr: 0.0007681 | Train Loss: 0.0643164 | Vali Loss: 0.0642371
2025-11-24 10:13:29,878 - val mse:12632.345703125, mae:39060.328125
2025-11-24 10:13:29,879 - Intermediate result: 12632.346 (Index 17)
2025-11-24 10:13:29,879 - Epoch: 18, Steps: 7340 | Lr: 0.0007411 | Train Loss: 0.0643159 | Vali Loss: 0.0642514
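Note that the train loss is pinned at ~0.0643 from epoch 2 onward, which often indicates the model has collapsed to a near-constant prediction (e.g. the mean frame). One quick diagnostic, sketched below under the assumption that batches are numpy arrays shaped `(B, T, C, H, W)`, is to compare the model's validation MSE against a trivial copy-last-input-frame baseline: if the trained model is not clearly below this baseline, it has likely collapsed.

```python
import numpy as np

def copy_last_frame_baseline_mse(inputs, targets):
    """MSE of predicting every future frame as the last observed frame.

    inputs:  (B, T_in, C, H, W) observed frames
    targets: (B, T_out, C, H, W) ground-truth future frames
    """
    last = inputs[:, -1:]                            # (B, 1, C, H, W)
    preds = np.repeat(last, targets.shape[1], axis=1)
    return float(np.mean((preds - targets) ** 2))

# Example with random data; shapes match pre_seq=4 / aft_seq=4 from the config.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 3, 32, 32))
y = rng.standard_normal((2, 4, 3, 32, 32))
print(copy_last_frame_baseline_mse(x, y))
```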