Ubuntu 24.04 GPU driver crash fixed

ubuntu

On Ubuntu 24.04 LTS, I was seeing a crash when using the NVIDIA driver. It appears to have been fixed recently, so I decided to check.

Crash details (dmesg)

After installing the Ubuntu 24.04 LTS desktop environment on a PC with an NVIDIA graphics card, running sudo dmesg shows the following boot log.

$ sudo dmesg
(The output is very long, so only the part relevant to the crash is shown.)

[  154.960660] caller os_map_kernel_space+0xf4/0x120 [nvidia] mapping multiple BARs
[  166.232331] ------------[ cut here ]------------
[  166.232339] simple-framebuffer simple-framebuffer.0: drm_WARN_ON(map->is_iomem)
[  166.232360] WARNING: CPU: 2 PID: 1331 at drivers/gpu/drm/drm_gem_shmem_helper.c:319 drm_gem_shmem_vmap+0x1a5/0x1e0
[  166.232374] Modules linked in: snd_seq_dummy snd_hrtimer qrtr nvidia_uvm(POE) nvidia_drm(POE) nvidia_modeset(POE) binfmt_misc snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel nvidia(POE) snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec intel_rapl_msr intel_rapl_common cfg80211 gpio_ich x86_pkg_temp_thermal snd_hda_core intel_powerclamp joydev input_leds mei_pxp mei_hdcp at24 video coretemp snd_hwdep wmi snd_pcm kvm_intel snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd i2c_i801 kvm soundcore irqbypass mei_me mei rapl lpc_ich i2c_smbus mac_hid intel_cstate msr parport_pc ppdev lp parport dm_multipath efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 hid_generic uas usbhid usb_storage hid crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 r8169 realtek xhci_pci ahci
[  166.232490]  xhci_pci_renesas libahci crypto_simd cryptd
[  166.232498] CPU: 2 PID: 1331 Comm: Xorg Tainted: P           OE      6.8.0-41-generic #41-Ubuntu
[  166.232503] Hardware name: MouseComputer Co.,Ltd. H61MU-S01 (MS-7680)/H61MU-S01 (MS-7680), BIOS V22.1B2 10/25/2011
[  166.232506] RIP: 0010:drm_gem_shmem_vmap+0x1a5/0x1e0
[  166.232512] Code: 4c 8b 6f 50 4d 85 ed 75 03 4c 8b 2f e8 64 a0 ec ff 48 c7 c1 0e 85 e2 8f 4c 89 ea 48 c7 c7 4d 60 e2 8f 48 89 c6 e8 1b d2 43 ff <0f> 0b 48 8b 83 f0 00 00 00 4c 89 e6 48 8b 38 e8 c7 45 f5 ff b8 fb
[  166.232516] RSP: 0018:ffffa9a2417f39f8 EFLAGS: 00010246
[  166.232521] RAX: 0000000000000000 RBX: ffff8af943780800 RCX: 0000000000000000
[  166.232524] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  166.232526] RBP: ffffa9a2417f3a18 R08: 0000000000000000 R09: 0000000000000000
[  166.232528] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8af941834cc8
[  166.232530] R13: ffff8af9418ec9c0 R14: ffff8af941834cc8 R15: ffff8af941834cc8
[  166.232533] FS:  000076d98e08fac0(0000) GS:ffff8afc6f500000(0000) knlGS:0000000000000000
[  166.232537] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  166.232540] CR2: 0000646fe9d956f0 CR3: 000000010acf8001 CR4: 00000000000606f0
[  166.232543] Call Trace:
[  166.232547]  <TASK>
[  166.232551]  ? show_regs+0x6d/0x80
[  166.232558]  ? __warn+0x89/0x160
[  166.232566]  ? drm_gem_shmem_vmap+0x1a5/0x1e0
[  166.232570]  ? report_bug+0x17e/0x1b0
[  166.232576]  ? handle_bug+0x51/0xa0
[  166.232583]  ? exc_invalid_op+0x18/0x80
[  166.232605]  ? asm_exc_invalid_op+0x1b/0x20
[  166.232621]  ? drm_gem_shmem_vmap+0x1a5/0x1e0
[  166.232629]  ? drm_gem_shmem_vmap+0x1a5/0x1e0
[  166.232633]  drm_gem_shmem_object_vmap+0x9/0x20
[  166.232639]  drm_gem_vmap+0x26/0x80
[  166.232644]  drm_gem_vmap_unlocked+0x2b/0x50
[  166.232648]  drm_gem_fb_vmap+0x40/0x150
[  166.232656]  drm_gem_begin_shadow_fb_access+0x25/0x40
[  166.232662]  drm_atomic_helper_prepare_planes.part.0+0x142/0x1e0
[  166.232668]  drm_atomic_helper_prepare_planes+0x5d/0x70
[  166.232674]  drm_atomic_helper_commit+0x84/0x160
[  166.232680]  drm_atomic_commit+0x99/0xd0
[  166.232687]  ? __pfx___drm_printfn_info+0x10/0x10
[  166.232692]  drm_atomic_helper_set_config+0x82/0xd0
[  166.232698]  drm_mode_setcrtc+0x535/0x8b0
[  166.232707]  ? __pfx_drm_mode_setcrtc+0x10/0x10
[  166.232712]  drm_ioctl_kernel+0xbc/0x120
[  166.232718]  drm_ioctl+0x2d4/0x550
[  166.232723]  ? __pfx_drm_mode_setcrtc+0x10/0x10
[  166.232731]  __x64_sys_ioctl+0xa3/0xf0
[  166.232738]  x64_sys_call+0x143b/0x25c0
[  166.232743]  do_syscall_64+0x7f/0x180
[  166.232748]  ? count_memcg_events.constprop.0+0x2a/0x50
[  166.232755]  ? handle_mm_fault+0xad/0x380
[  166.232761]  ? irqentry_exit_to_user_mode+0x7e/0x260
[  166.232767]  ? irqentry_exit+0x43/0x50
[  166.232772]  ? exc_page_fault+0x94/0x1b0
[  166.232777]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[  166.232783] RIP: 0033:0x76d98e524ded
[  166.232801] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
[  166.232804] RSP: 002b:00007fffa45c4ff0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  166.232809] RAX: ffffffffffffffda RBX: 0000646fe9d956e0 RCX: 000076d98e524ded
[  166.232811] RDX: 00007fffa45c5080 RSI: 00000000c06864a2 RDI: 0000000000000016
[  166.232814] RBP: 00007fffa45c5040 R08: 0000000000000000 R09: 0000646fea2b5fd0
[  166.232816] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffa45c5080
[  166.232819] R13: 00000000c06864a2 R14: 0000000000000016 R15: 0000646fe9cec998
[  166.232824]  </TASK>
[  166.232826] ---[ end trace 0000000000000000 ]---
[  170.102761] rfkill: input handler disabled
[  206.968995] systemd-journald[414]: /var/log/journal/7fc2462810e74b9d90feb61551522fdc/user-1000.journal: Journal file uses a different sequence number ID, rotating.
[  207.464428] rfkill: input handler enabled
[  210.450849] ------------[ cut here ]------------
[  210.450857] simple-framebuffer simple-framebuffer.0: drm_WARN_ON(map->is_iomem)
[  210.450880] WARNING: CPU: 3 PID: 1846 at drivers/gpu/drm/drm_gem_shmem_helper.c:319 drm_gem_shmem_vmap+0x1a5/0x1e0
[  210.450894] Modules linked in: snd_seq_dummy snd_hrtimer qrtr nvidia_uvm(POE) nvidia_drm(POE) nvidia_modeset(POE) binfmt_misc snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel nvidia(POE) snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec intel_rapl_msr intel_rapl_common cfg80211 gpio_ich x86_pkg_temp_thermal snd_hda_core intel_powerclamp joydev input_leds mei_pxp mei_hdcp at24 video coretemp snd_hwdep wmi snd_pcm kvm_intel snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd i2c_i801 kvm soundcore irqbypass mei_me mei rapl lpc_ich i2c_smbus mac_hid intel_cstate msr parport_pc ppdev lp parport dm_multipath efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 hid_generic uas usbhid usb_storage hid crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 r8169 realtek xhci_pci ahci
[  210.451009]  xhci_pci_renesas libahci crypto_simd cryptd
[  210.451018] CPU: 3 PID: 1846 Comm: Xorg Tainted: P        W  OE      6.8.0-41-generic #41-Ubuntu
[  210.451023] Hardware name: MouseComputer Co.,Ltd. H61MU-S01 (MS-7680)/H61MU-S01 (MS-7680), BIOS V22.1B2 10/25/2011
[  210.451026] RIP: 0010:drm_gem_shmem_vmap+0x1a5/0x1e0
[  210.451032] Code: 4c 8b 6f 50 4d 85 ed 75 03 4c 8b 2f e8 64 a0 ec ff 48 c7 c1 0e 85 e2 8f 4c 89 ea 48 c7 c7 4d 60 e2 8f 48 89 c6 e8 1b d2 43 ff <0f> 0b 48 8b 83 f0 00 00 00 4c 89 e6 48 8b 38 e8 c7 45 f5 ff b8 fb
[  210.451036] RSP: 0018:ffffa9a2414977e0 EFLAGS: 00010246
[  210.451040] RAX: 0000000000000000 RBX: ffff8af940fb0400 RCX: 0000000000000000
[  210.451043] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  210.451045] RBP: ffffa9a241497800 R08: 0000000000000000 R09: 0000000000000000
[  210.451048] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8af949dd84c8
[  210.451050] R13: ffff8af9418ec9c0 R14: ffff8af949dd84c8 R15: ffff8af949dd84c8
[  210.451053] FS:  00007ebc5f4abac0(0000) GS:ffff8afc6f580000(0000) knlGS:0000000000000000
[  210.451057] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  210.451059] CR2: 000064de0dad84d8 CR3: 0000000114ef0004 CR4: 00000000000606f0
[  210.451063] Call Trace:
[  210.451066]  <TASK>
[  210.451071]  ? show_regs+0x6d/0x80
[  210.451078]  ? __warn+0x89/0x160
[  210.451085]  ? drm_gem_shmem_vmap+0x1a5/0x1e0
[  210.451090]  ? report_bug+0x17e/0x1b0
[  210.451095]  ? handle_bug+0x51/0xa0
[  210.451103]  ? exc_invalid_op+0x18/0x80
[  210.451108]  ? asm_exc_invalid_op+0x1b/0x20
[  210.451118]  ? drm_gem_shmem_vmap+0x1a5/0x1e0
[  210.451123]  drm_gem_shmem_object_vmap+0x9/0x20
[  210.451128]  drm_gem_vmap+0x26/0x80
[  210.451134]  drm_gem_vmap_unlocked+0x2b/0x50
[  210.451138]  drm_gem_fb_vmap+0x40/0x150
[  210.451146]  drm_gem_begin_shadow_fb_access+0x25/0x40
[  210.451152]  drm_atomic_helper_prepare_planes.part.0+0x142/0x1e0
[  210.451158]  drm_atomic_helper_prepare_planes+0x5d/0x70
[  210.451164]  drm_atomic_helper_commit+0x84/0x160
[  210.451170]  drm_atomic_commit+0x99/0xd0
[  210.451177]  ? __pfx___drm_printfn_info+0x10/0x10
[  210.451182]  drm_atomic_helper_set_config+0x82/0xd0
[  210.451188]  drm_mode_setcrtc+0x535/0x8b0
[  210.451197]  ? __pfx_drm_mode_setcrtc+0x10/0x10
[  210.451202]  drm_ioctl_kernel+0xbc/0x120
[  210.451209]  drm_ioctl+0x2d4/0x550
[  210.451213]  ? __pfx_drm_mode_setcrtc+0x10/0x10
[  210.451221]  __x64_sys_ioctl+0xa3/0xf0
[  210.451242]  x64_sys_call+0x143b/0x25c0
[  210.451251]  do_syscall_64+0x7f/0x180
[  210.451263]  ? _nv000722kms+0xe0/0xe0 [nvidia_modeset]
[  210.451313]  ? _nv010260rm+0x52/0xa0 [nvidia]
[  210.451931]  ? check_heap_object+0x186/0x1e0
[  210.451938]  ? ptep_set_access_flags+0x4a/0x70
[  210.451945]  ? wp_page_reuse+0x95/0xc0
[  210.451951]  ? do_wp_page+0xed/0x490
[  210.451956]  ? handle_pte_fault+0x1be/0x1d0
[  210.451961]  ? __handle_mm_fault+0x653/0x790
[  210.451967]  ? __count_memcg_events+0x6b/0x120
[  210.451972]  ? count_memcg_events.constprop.0+0x2a/0x50
[  210.451978]  ? handle_mm_fault+0xad/0x380
[  210.451983]  ? irqentry_exit_to_user_mode+0x7e/0x260
[  210.451999]  ? irqentry_exit+0x43/0x50
[  210.452004]  ? exc_page_fault+0x94/0x1b0
[  210.452010]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[  210.452017] RIP: 0033:0x7ebc5f924ded
[  210.452034] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
[  210.452038] RSP: 002b:00007fffae7f1250 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  210.452043] RAX: ffffffffffffffda RBX: 000064de0dc0f770 RCX: 00007ebc5f924ded
[  210.452046] RDX: 00007fffae7f12e0 RSI: 00000000c06864a2 RDI: 0000000000000017
[  210.452049] RBP: 00007fffae7f12a0 R08: 0000000000000000 R09: 000064de0da599d0
[  210.452051] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffae7f12e0
[  210.452054] R13: 00000000c06864a2 R14: 0000000000000017 R15: 000064de0db66a28
[  210.452059]  </TASK>
[  210.452062] ---[ end trace 0000000000000000 ]---
[  211.989394] rfkill: input handler disabled

The first lines of each trace show that the crash is GPU-related: it is reported as a kernel WARNING raised in drivers/gpu/drm/drm_gem_shmem_helper.c.
[ 166.232339] simple-framebuffer simple-framebuffer.0: drm_WARN_ON(map->is_iomem)
[ 166.232360] WARNING: CPU: 2 PID: 1331 at drivers/gpu/drm/drm_gem_shmem_helper.c:319 drm_gem_shmem_vmap+0x1a5/0x1e0
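
To pull just these header lines out of a long dmesg dump, something like the following may help. It is only a minimal sketch: find_drm_warnings and count_drm_warnings are hypothetical helper names made up for this post, and the pattern matches only the warning shown above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of any existing tool).
# Both read dmesg text from the given file(s), or from stdin when no file is given.

# Print only the WARNING header lines raised in drm_gem_shmem_helper.c.
find_drm_warnings() {
    grep -E 'WARNING: .*drm_gem_shmem_helper\.c' "$@"
}

# Print how many such WARNING lines were logged (prints 0 when there are none).
count_drm_warnings() {
    # grep -c exits non-zero when the count is 0, so mask the exit status.
    grep -cE 'WARNING: .*drm_gem_shmem_helper\.c' "$@" || true
}

# Example usage against the live kernel log:
#   sudo dmesg | find_drm_warnings
#   sudo dmesg | count_drm_warnings
```

On the log above, the count would be one per "cut here" block, which makes it easy to compare before and after an update.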

Pages describing the bug and the fix

  • Page describing the bug

The following page has information about the bug.
Bug #2062426 “simple-framebuffer: swiotlb buffer is full” : Bugs : nvidia-graphics-drivers-550 package : Ubuntu (launchpad.net)
Comment #12 on that page says the following.

This might be fixed by ubuntu-drivers-common in Oracular now (via bug 2060268).
  • Bug 2060268 and the package containing the fix

Bug #2060268 “Phantom “Unknown Display” shown in Settings after …” : Bugs : linux package : Ubuntu (launchpad.net)
Comment #94 on the page linked above says the following.

You don't have the fix installed. The fix is in:

  ubuntu-drivers-common 1:0.9.7.6ubuntu3.1

and not in:

  ubuntu-drivers-common 1:0.9.7.6ubuntu3
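
To see which of the two versions is installed, the package version can be compared against the fixed one quoted above. The following is a rough sketch, assuming a Debian-style system where dpkg and dpkg-query are available; FIXED_VERSION, has_fix, and installed_version are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Version that contains the fix, as quoted in comment #94 of bug 2060268.
FIXED_VERSION='1:0.9.7.6ubuntu3.1'

# Hypothetical helper: succeeds when the given version is at least FIXED_VERSION.
# dpkg --compare-versions follows Debian version ordering (epoch, upstream, revision),
# and with "ge" an empty version counts as earlier than any version.
has_fix() {
    dpkg --compare-versions "$1" ge "$FIXED_VERSION"
}

# Hypothetical helper: print the installed version of ubuntu-drivers-common
# (prints nothing when the package is not installed).
installed_version() {
    dpkg-query -W -f='${Version}' ubuntu-drivers-common 2>/dev/null
}

# Example usage:
#   if has_fix "$(installed_version)"; then
#       echo "fix is installed"
#   else
#       echo "fix is NOT installed"
#   fi
```

Using dpkg --compare-versions rather than a plain string comparison matters here, because the two versions differ only by the trailing ".1".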

Result with ubuntu-drivers-common 1:0.9.7.6ubuntu3.1

Here is the dmesg output after installing ubuntu-drivers-common 1:0.9.7.6ubuntu3.1. The crash no longer appears.

[   30.498863] nvidia: loading out-of-tree module taints kernel.
[   30.498876] nvidia: module license 'NVIDIA' taints kernel.
[   30.498878] Disabling lock debugging due to kernel taint
[   30.498881] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[   30.498882] nvidia: module license taints kernel.
[   30.594165] nvidia-nvlink: Nvlink Core is being initialized, major device number 238

[   30.594839] nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=io+mem
[   31.899000] EXT4-fs (sdb2): mounted filesystem ff83d6c5-4ddc-4c0b-9b23-b04fe166f7f5 r/w with ordered data mode. Quota mode: none.
[   32.148313] audit: type=1400 audit(1725778313.354:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="1password" pid=809 comm="apparmor_parser"
[   32.148323] audit: type=1400 audit(1725778313.354:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name=4D6F6E676F444220436F6D70617373 pid=811 comm="apparmor_parser"
[   32.148326] audit: type=1400 audit(1725778313.354:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="Discord" pid=810 comm="apparmor_parser"
[   32.148328] audit: type=1400 audit(1725778313.354:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="QtWebEngineProcess" pid=812 comm="apparmor_parser"
[   32.151036] audit: type=1400 audit(1725778313.356:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="brave" pid=813 comm="apparmor_parser"
[   32.151046] audit: type=1400 audit(1725778313.357:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="buildah" pid=814 comm="apparmor_parser"
[   32.151049] audit: type=1400 audit(1725778313.357:8): apparmor="STATUS" operation="profile_load" profile="unconfined" name="busybox" pid=815 comm="apparmor_parser"
[   32.151259] audit: type=1400 audit(1725778313.357:9): apparmor="STATUS" operation="profile_load" profile="unconfined" name="cam" pid=816 comm="apparmor_parser"
[   32.153002] audit: type=1400 audit(1725778313.359:10): apparmor="STATUS" operation="profile_load" profile="unconfined" name="ch-checkns" pid=817 comm="apparmor_parser"
[   32.153241] audit: type=1400 audit(1725778313.359:11): apparmor="STATUS" operation="profile_load" profile="unconfined" name="ch-run" pid=818 comm="apparmor_parser"
[   33.289940] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  470.256.02  Thu May  2 14:37:44 UTC 2024
[   33.329715] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  470.256.02  Thu May  2 14:50:40 UTC 2024
[   33.411326] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[   33.411334] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
[   33.667027] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[   33.703383] nvidia-uvm: Loaded the UVM driver, major device number 236.
[   38.531323] NET: Registered PF_QIPCRTR protocol family
[   38.718626] loop6: detected capacity change from 0 to 8
[   39.689629] kauditd_printk_skb: 119 callbacks suppressed
[   39.689633] audit: type=1400 audit(1725778320.895:131): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/snapd/snap-confine" pid=1169 comm="apparmor_parser"
[   39.695514] audit: type=1400 audit(1725778320.901:132): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/snapd/snap-confine//mount-namespace-capture-helper" pid=1169 comm="apparmor_parser"
[   41.614468] RTL8211DN Gigabit Ethernet r8169-0-400:00: attached PHY driver (mii_bus:phy_addr=r8169-0-400:00, irq=MAC)
[   41.865855] r8169 0000:04:00.0 enp4s0: Link is Down
[   42.806809] audit: type=1400 audit(1725778324.012:133): apparmor="DENIED" operation="capable" class="cap" profile="/usr/sbin/cupsd" pid=1248 comm="cupsd" capability=12  capname="net_admin"
[   44.006273] r8169 0000:04:00.0 enp4s0: Link is Up - 1Gbps/Full - flow control off
[  156.627790] resource: resource sanity check: requesting [mem 0x00000000000c0000-0x00000000000fffff], which spans more than PCI Bus 0000:00 [mem 0x000a0000-0x000dffff window]
[  156.627799] caller os_map_kernel_space+0xf4/0x120 [nvidia] mapping multiple BARs
[  169.808190] rfkill: input handler disabled
[  183.274769] systemd-journald[415]: /var/log/journal/7fc2462810e74b9d90feb61551522fdc/user-1000.journal: Journal file uses a different sequence number ID, rotating.
[  183.791583] rfkill: input handler enabled
[  188.125410] rfkill: input handler disabled