
GPU hang error when running a model #3

@helloworlde

Description


Hi, thanks so much for your documentation. I'm running deepseek-r1:1.5b with Ollama on an AMD 8845HS with a 780M GPU, following your document, but I get a GPU hang error after several rounds of conversation:

HW Exception by GPU node-1 (Agent handle: 0x7e6eb7d0bb40) reason :GPU Hang
  • Hardware:

    • CPU: AMD 8845HS
    • GPU: 780M with 16GB VRAM
    • Memory: DDR5 5600Mhz 48G
    • OS: LXC Container in PVE 8.3
  • docker-compose:

services:
  ollama:
    image: ollama/ollama:rocm
    container_name: ollama
    restart: unless-stopped
    devices:
      - "/dev/kfd"
      - "/dev/dri"
    volumes:
      - ./data:/root/.ollama
    environment:
      # note: with list-form environment entries, quotes are passed literally, so they are omitted here
      - OLLAMA_ORIGINS=chrome-extension://*,moz-extension://*
      - HSA_OVERRIDE_GFX_VERSION=11.0.0
      - HCC_AMDGPU_TARGETS=gfx1103
      - OLLAMA_LLM_LIBRARY=rocm_v60002
      - OLLAMA_DEBUG=1 
    ports:
      - "11434:11434"  
  • error message:
ollama  | time=2025-02-24T09:45:48.152Z level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/root/.ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc
ollama  | time=2025-02-24T09:45:48.152Z level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/root/.ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc
ollama  | llama_model_loader: loaded meta data with 26 key-value pairs and 339 tensors from /root/.ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc (version GGUF V3 (latest))
ollama  | llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
ollama  | llama_model_loader: - kv   0:                       general.architecture str              = qwen2
ollama  | llama_model_loader: - kv   1:                               general.type str              = model
ollama  | llama_model_loader: - kv   2:                               general.name str              = DeepSeek R1 Distill Qwen 1.5B
ollama  | llama_model_loader: - kv   3:                           general.basename str              = DeepSeek-R1-Distill-Qwen
ollama  | llama_model_loader: - kv   4:                         general.size_label str              = 1.5B
ollama  | llama_model_loader: - kv   5:                          qwen2.block_count u32              = 28
ollama  | llama_model_loader: - kv   6:                       qwen2.context_length u32              = 131072
ollama  | llama_model_loader: - kv   7:                     qwen2.embedding_length u32              = 1536
ollama  | llama_model_loader: - kv   8:                  qwen2.feed_forward_length u32              = 8960
ollama  | llama_model_loader: - kv   9:                 qwen2.attention.head_count u32              = 12
ollama  | llama_model_loader: - kv  10:              qwen2.attention.head_count_kv u32              = 2
ollama  | llama_model_loader: - kv  11:                       qwen2.rope.freq_base f32              = 10000.000000
ollama  | llama_model_loader: - kv  12:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
ollama  | llama_model_loader: - kv  13:                          general.file_type u32              = 15
ollama  | llama_model_loader: - kv  14:                       tokenizer.ggml.model str              = gpt2
ollama  | llama_model_loader: - kv  15:                         tokenizer.ggml.pre str              = qwen2
ollama  | llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
ollama  | llama_model_loader: - kv  17:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
ollama  | llama_model_loader: - kv  18:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
ollama  | llama_model_loader: - kv  19:                tokenizer.ggml.bos_token_id u32              = 151646
ollama  | llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 151643
ollama  | llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 151643
ollama  | llama_model_loader: - kv  22:               tokenizer.ggml.add_bos_token bool             = true
ollama  | llama_model_loader: - kv  23:               tokenizer.ggml.add_eos_token bool             = false
ollama  | llama_model_loader: - kv  24:                    tokenizer.chat_template str              = {% if not add_generation_prompt is de...
ollama  | llama_model_loader: - kv  25:               general.quantization_version u32              = 2
ollama  | llama_model_loader: - type  f32:  141 tensors
ollama  | llama_model_loader: - type q4_K:  169 tensors
ollama  | llama_model_loader: - type q6_K:   29 tensors
ollama  | llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
ollama  | llm_load_vocab: special tokens cache size = 22
ollama  | llm_load_vocab: token to piece cache size = 0.9310 MB
ollama  | llm_load_print_meta: format           = GGUF V3 (latest)
ollama  | llm_load_print_meta: arch             = qwen2
ollama  | llm_load_print_meta: vocab type       = BPE
ollama  | llm_load_print_meta: n_vocab          = 151936
ollama  | llm_load_print_meta: n_merges         = 151387
ollama  | llm_load_print_meta: vocab_only       = 1
ollama  | llm_load_print_meta: model type       = ?B
ollama  | llm_load_print_meta: model ftype      = all F32
ollama  | llm_load_print_meta: model params     = 1.78 B
ollama  | llm_load_print_meta: model size       = 1.04 GiB (5.00 BPW) 
ollama  | llm_load_print_meta: general.name     = DeepSeek R1 Distill Qwen 1.5B
ollama  | llm_load_print_meta: BOS token        = 151646 '<|beginofsentence>'
ollama  | llm_load_print_meta: EOS token        = 151643 '<|endofsentence>'
ollama  | llm_load_print_meta: PAD token        = 151643 '<|endofsentence>'
ollama  | llm_load_print_meta: LF token         = 148848 'Ĭ'
ollama  | llm_load_print_meta: EOG token        = 151643 '<|endofsentence>'
ollama  | llm_load_print_meta: max token length = 256
ollama  | llama_model_load: vocab only - skipping tensors
ollama  | time=2025-02-24T09:45:48.455Z level=DEBUG source=routes.go:1457 msg="chat request" images=0 prompt="You are a professional, authentic machine translation engine.\n\nYou are about to translate text from an article. Title: “OLLAMA_ORIGINS=chrome-extension://etc does not work · Issue #1686 · ollama/ollama”, Summary: {{imt_theme}}\n\nThis content may include the following terms {{imt_terms}}. Please handle these terms carefully.<|User|>; 把下一行文本作为纯文本输入,并将其翻译为简体中文,, if the text contains html tags, please consider after translate, where the tags should be in translated result, meanwhile keep the result fluently.仅输出翻译。如果某些内容无需翻译(如专有名词、代码等),则保持原文不变。不要解释,输入文本:\nOLLAMA_ORIGINS=chrome-extension://etc does not work #1686<|Assistant|>"
ollama  | time=2025-02-24T09:45:48.457Z level=DEBUG source=routes.go:1457 msg="chat request" images=0 prompt="You are a professional, authentic machine translation engine.\n\nYou are about to translate text from an article. Title: “OLLAMA_ORIGINS=chrome-extension://etc does not work · Issue #1686 · ollama/ollama”, Summary: {{imt_theme}}\n\nThis content may include the following terms {{imt_terms}}. Please handle these terms carefully.<|User|>; 把下一行文本作为纯文本输入,并将其翻译为简体中文,, if the text contains html tags, please consider after translate, where the tags should be in translated result, meanwhile keep the result fluently.仅输出翻译。如果某些内容无需翻译(如专有名词、代码等),则保持原文不变。不要解释,输入文本:\nOLLAMA_ORIGINS=chrome-extension://etc does not work · Issue #1686 · ollama/ollama<|Assistant|>"
ollama  | time=2025-02-24T09:45:48.458Z level=DEBUG source=cache.go:99 msg="loading cache slot" id=0 cache=0 prompt=174 used=0 remaining=174
// ... part of chat message
ollama  | time=2025-02-24T09:45:48.776Z level=DEBUG source=cache.go:99 msg="loading cache slot" id=1 cache=0 prompt=183 used=0 remaining=183
ollama  | time=2025-02-24T09:45:48.776Z level=DEBUG source=cache.go:99 msg="loading cache slot" id=2 cache=0 prompt=190 used=0 remaining=190
ollama  | time=2025-02-24T09:45:48.776Z level=DEBUG source=cache.go:99 msg="loading cache slot" id=3 cache=0 prompt=168 used=0 remaining=168
ollama  | HW Exception by GPU node-1 (Agent handle: 0x7e6eb7d0bb40) reason :GPU Hang
ollama  | time=2025-02-24T09:45:50.472Z level=DEBUG source=sched.go:407 msg="context for request finished"
ollama  | time=2025-02-24T09:45:50.472Z level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/root/.ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc refCount=14
ollama  | time=2025-02-24T09:45:50.472Z level=DEBUG source=sched.go:407 msg="context for request finished"
ollama  | time=2025-02-24T09:45:50.472Z level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/root/.ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc refCount=13
ollama  | time=2025-02-24T09:45:50.472Z level=DEBUG source=sched.go:407 msg="context for request finished"
  • rocminfo output:
rocminfo              
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.14
Runtime Ext Version:     1.6
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 7 8845HS w/ Radeon 780M Graphics
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 7 8845HS w/ Radeon 780M Graphics
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   5137                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            16                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Memory Properties:       
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    32638500(0x1f20624) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    32638500(0x1f20624) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    32638500(0x1f20624) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 4                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    32638500(0x1f20624) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1103                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon Graphics                
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      32(0x20) KB                        
    L2:                      2048(0x800) KB                     
  Chip ID:                 6400(0x1900)                       
  ASIC Revision:           12(0xc)                            
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2700                               
  BDFID:                   50432                              
  Internal Node ID:        1                                  
  Compute Unit:            12                                 
  SIMDs per CU:            2                                  
  Shader Engines:          1                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Memory Properties:       APU
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 40                                 
  SDMA engine uCode::      21                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16319248(0xf90310) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    16319248(0xf90310) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1103         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***        
  • os
uname -a  
Linux dev 6.8.12-4-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-4 (2024-11-06T15:04Z) x86_64 x86_64 x86_64 GNU/Linux
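For reference, `HSA_OVERRIDE_GFX_VERSION=11.0.0` in the compose file above makes the ROCm runtime treat the gfx1103 iGPU as gfx1100, since ROCm ships no gfx1103 kernels; the rocminfo dump above still reports the native gfx1103 ISA, which is expected. A tiny POSIX-shell sketch of that version-to-target mapping (the mapping convention is my understanding of how the override string is read, not something printed by the tools):

```shell
# Convert an HSA_OVERRIDE_GFX_VERSION value (major.minor.step) into the
# gfx target it impersonates, e.g. 11.0.0 -> gfx1100.
# (The step digit is hexadecimal for targets like gfx90a; not handled here.)
override="11.0.0"                  # value from the compose file above
major=${override%%.*}              # "11"
rest=${override#*.}                # "0.0"
minor=${rest%%.*}                  # "0"
step=${rest#*.}                    # "0"
echo "gfx${major}${minor}${step}"  # prints: gfx1100
```

So a gfx1103 device running under an 11.0.0 override is the usual 780M workaround, and the mismatch between the override and rocminfo's reported ISA is not itself the error.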

Did you encounter the same issue, or can you give me any pointers to fix it? Thank you so much.
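One detail I noticed in the log: four cache slots (id=0 through id=3) are loaded concurrently in the same fraction of a second right before the hang, because the page fires several translation requests at once. As an experiment I can try serializing requests with OLLAMA_NUM_PARALLEL (a standard Ollama environment variable; whether concurrency is actually what triggers the hang on gfx1103 is only a guess):

```yaml
services:
  ollama:
    environment:
      # Guess: serialize requests so only one batch hits the iGPU at a time.
      - OLLAMA_NUM_PARALLEL=1
```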
