I get negative vae loss #34

@Yitian-Li

I'm trying to train a model on my own dataset, but I get a negative VAE loss, which seems quite strange.
Could you help me with this? Thanks!
VAE training log with factor rot:

/usr/local/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:524: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Namespace(batch_size=64, cross_entropy_loss=False, datapath='../datasets/data/coeff', epochs=600, factor='rot', gpu=0, lr=0.0001, lr_epochs=150, lr_fac=0.5, output_path='./weights', root_folder='.', val=False, write_iteration=600)
46975
2022-04-14 10:43:10.860388: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-04-14 10:43:11.042504: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: NVIDIA Tesla P40 major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:08:00.0
totalMemory: 23.88GiB freeMemory: 22.99GiB
2022-04-14 10:43:11.042551: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2022-04-14 10:43:11.427630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-04-14 10:43:11.427679: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0
2022-04-14 10:43:11.427686: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N
2022-04-14 10:43:11.427814: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22300 MB memory) -> physical GPU (device: 0, name: NVIDIA Tesla P40, pci bus id: 0000:08:00.0, compute capability: 6.1)
Date: 2022-04-14 10:43:26       Epoch: [Stage 1][0/600] Loss: 2.7066.
Date: 2022-04-14 10:43:40       Epoch: [Stage 1][1/600] Loss: 2.4838.
Date: 2022-04-14 10:43:55       Epoch: [Stage 1][2/600] Loss: 2.2725.
Date: 2022-04-14 10:44:10       Epoch: [Stage 1][3/600] Loss: 2.0638.
Date: 2022-04-14 10:44:24       Epoch: [Stage 1][4/600] Loss: 1.8573.
Date: 2022-04-14 10:44:39       Epoch: [Stage 1][5/600] Loss: 1.6532.
Date: 2022-04-14 10:44:54       Epoch: [Stage 1][6/600] Loss: 1.4517.
Date: 2022-04-14 10:45:09       Epoch: [Stage 1][7/600] Loss: 1.2532.
Date: 2022-04-14 10:45:24       Epoch: [Stage 1][8/600] Loss: 1.0580.
Date: 2022-04-14 10:45:39       Epoch: [Stage 1][9/600] Loss: 0.8668.
Date: 2022-04-14 10:45:54       Epoch: [Stage 1][10/600]        Loss: 0.6800.
Date: 2022-04-14 10:46:08       Epoch: [Stage 1][11/600]        Loss: 0.4984.
Date: 2022-04-14 10:46:23       Epoch: [Stage 1][12/600]        Loss: 0.3226.
Date: 2022-04-14 10:46:38       Epoch: [Stage 1][13/600]        Loss: 0.1537.
Date: 2022-04-14 10:46:53       Epoch: [Stage 1][14/600]        Loss: -0.0077.
Date: 2022-04-14 10:47:07       Epoch: [Stage 1][15/600]        Loss: -0.1606.
Date: 2022-04-14 10:47:22       Epoch: [Stage 1][16/600]        Loss: -0.3031.
Date: 2022-04-14 10:47:37       Epoch: [Stage 1][17/600]        Loss: -0.4346.
Date: 2022-04-14 10:47:52       Epoch: [Stage 1][18/600]        Loss: -0.5539.
Date: 2022-04-14 10:48:07       Epoch: [Stage 1][19/600]        Loss: -0.6600.
Date: 2022-04-14 10:48:21       Epoch: [Stage 1][20/600]        Loss: -0.7503.
Date: 2022-04-14 10:48:37       Epoch: [Stage 1][21/600]        Loss: -0.8247.
Date: 2022-04-14 10:48:51       Epoch: [Stage 1][22/600]        Loss: -0.8824.
Date: 2022-04-14 10:49:06       Epoch: [Stage 1][23/600]        Loss: -0.9245.
Date: 2022-04-14 10:49:20       Epoch: [Stage 1][24/600]        Loss: -0.9519.
Date: 2022-04-14 10:49:35       Epoch: [Stage 1][25/600]        Loss: -0.9678.
Date: 2022-04-14 10:49:50       Epoch: [Stage 1][26/600]        Loss: -0.9839.
Date: 2022-04-14 10:50:05       Epoch: [Stage 1][27/600]        Loss: -1.0732.
Date: 2022-04-14 10:50:20       Epoch: [Stage 1][28/600]        Loss: -1.2186.
Date: 2022-04-14 10:50:34       Epoch: [Stage 1][29/600]        Loss: -1.2832.
Date: 2022-04-14 10:50:49       Epoch: [Stage 1][30/600]        Loss: -1.3243.
Date: 2022-04-14 10:51:04       Epoch: [Stage 1][31/600]        Loss: -1.3485.
Date: 2022-04-14 10:51:19       Epoch: [Stage 1][32/600]        Loss: -1.3644.
Date: 2022-04-14 10:51:34       Epoch: [Stage 1][33/600]        Loss: -1.3820.
Date: 2022-04-14 10:51:49       Epoch: [Stage 1][34/600]        Loss: -1.3819.

VAE training log with factor gamma:

root@train-disco3-0:/data1/DiscoFaceGAN/vae# python demo.py --datapath ../datasets/data/coeff --factor gamma
/usr/local/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:524: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Namespace(batch_size=64, cross_entropy_loss=False, datapath='../datasets/data/coeff', epochs=600, factor='gamma', gpu=0, lr=0.0001, lr_epochs=150, lr_fac=0.5, output_path='./weights', root_folder='.', val=False, write_iteration=600)
46975
2022-04-14 10:42:52.898209: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-04-14 10:42:53.063619: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: NVIDIA Tesla P40 major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:08:00.0
totalMemory: 23.88GiB freeMemory: 23.22GiB
2022-04-14 10:42:53.063673: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2022-04-14 10:42:53.419503: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-04-14 10:42:53.419554: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0
2022-04-14 10:42:53.419561: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N
2022-04-14 10:42:53.419681: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22532 MB memory) -> physical GPU (device: 0, name: NVIDIA Tesla P40, pci bus id: 0000:08:00.0, compute capability: 6.1)
Date: 2022-04-14 10:43:07       Epoch: [Stage 1][0/600] Loss: 23.9197.
Date: 2022-04-14 10:43:22       Epoch: [Stage 1][1/600] Loss: 21.9015.
Date: 2022-04-14 10:43:36       Epoch: [Stage 1][2/600] Loss: 19.9280.
Date: 2022-04-14 10:43:50       Epoch: [Stage 1][3/600] Loss: 17.9590.
Date: 2022-04-14 10:44:04       Epoch: [Stage 1][4/600] Loss: 15.9933.
Date: 2022-04-14 10:44:18       Epoch: [Stage 1][5/600] Loss: 14.0305.
Date: 2022-04-14 10:44:32       Epoch: [Stage 1][6/600] Loss: 12.0710.
Date: 2022-04-14 10:44:46       Epoch: [Stage 1][7/600] Loss: 10.1152.
Date: 2022-04-14 10:45:01       Epoch: [Stage 1][8/600] Loss: 8.1635.
Date: 2022-04-14 10:45:15       Epoch: [Stage 1][9/600] Loss: 6.2167.
Date: 2022-04-14 10:45:29       Epoch: [Stage 1][10/600]        Loss: 4.2754.
Date: 2022-04-14 10:45:44       Epoch: [Stage 1][11/600]        Loss: 2.3405.
Date: 2022-04-14 10:45:58       Epoch: [Stage 1][12/600]        Loss: 0.4129.
Date: 2022-04-14 10:46:12       Epoch: [Stage 1][13/600]        Loss: -1.5061.
Date: 2022-04-14 10:46:27       Epoch: [Stage 1][14/600]        Loss: -3.4153.
Date: 2022-04-14 10:46:41       Epoch: [Stage 1][15/600]        Loss: -5.3130.
Date: 2022-04-14 10:46:55       Epoch: [Stage 1][16/600]        Loss: -7.1975.
Date: 2022-04-14 10:47:10       Epoch: [Stage 1][17/600]        Loss: -9.0669.
Date: 2022-04-14 10:47:23       Epoch: [Stage 1][18/600]        Loss: -10.9186.
Date: 2022-04-14 10:47:38       Epoch: [Stage 1][19/600]        Loss: -12.7502.
Date: 2022-04-14 10:47:52       Epoch: [Stage 1][20/600]        Loss: -14.5583.
Date: 2022-04-14 10:48:07       Epoch: [Stage 1][21/600]        Loss: -16.3394.
Date: 2022-04-14 10:48:21       Epoch: [Stage 1][22/600]        Loss: -18.0895.
Date: 2022-04-14 10:48:36       Epoch: [Stage 1][23/600]        Loss: -19.8038.
