[ 2687.794321] opal: Hardware platform error: Unrecoverable HMI exception #174

Open
pridhiviraj opened this issue Apr 22, 2018 · 4 comments

@pridhiviraj
pridhiviraj commented Apr 22, 2018

Running the TOD error recovery stress test hits a platform error, followed by a system reboot.

[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:12 T:1: TFMR(2a12000980a84000) Timer Facility Error
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:12 T:2: TFMR(2a12000980a84000) Timer Facility Error
[ 2853.104254431,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:22 T:1: TFMR(2a12000980a94000) Timer Facility Error
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:7 T:0: TFMR(2a12000980a85000) Timer Facility Error
[ 2853.104254431,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
[ 2853.104254431,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:15 T:1: TFMR(2a12000980a94000) Timer Facility Error
[ 2853.104254431,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
[ 2853.104254431,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:23 T:3: TFMR(2a12000980a84000) Timer Facility Error
[ 2853.104254431,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:15 T:2: TFMR(2a12000980a94000) Timer Facility Error
[ 2853.104254431,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:23 T:2: TFMR(2a12000980a84000) Timer Facility Error
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:23 T:1: TFMR(2a12000980a85000) Timer Facility Error
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:23 T:0: TFMR(2a12000980a84000) Timer Facility Error
[ 2853.104254431,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:20 T:0: TFMR(2a12000980a85000) Timer Facility Error
[    1.039373791,3] CHIPTOD: TB "Not Set" TOD in error state
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:13 T:2: TFMR(2a12000980a84000) Timer Facility Error
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:20 T:3: TFMR(2a12000980a85000) Timer Facility Error
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:20 T:1: TFMR(2a12000980a84000) Timer Facility Error
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:20 T:2: TF2000980a84000) Timer Facility Error
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:13 T:1: TFMR(2a12000980a84000) Timer Facility Error
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:14 T:2: TFMR(2a12000980a84000) Timer Facility Error
[ 2853.104254431,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
[    1.039373791,4] HMI: Failed to get TB in running state! CPU=85f, TFMR=2a12800980a04000
[ 2853.104254431,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:14 T:3: TFMR(2a12000980a84000) Timer Facility Error
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:14 T:0: TFMR(2a12000980a84000) Timer Facility Error
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:14 T:1: TFMR(2a12000980a84000) Timer Facility Error
[    1.039373791,4] HMI: Failed to get TB in running state! CPU=85d, TFMR=2a12800980a04000
[    1.039373791,4] HMI: Failed to get TB in running state! CPU=85c, TFMR=2a12800980a04000
[    1.039373791,4] HMI: Failed to get TB in running state! CPU=85e, TFMR=2a12800980a04000
[ 2853.104254431,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:6 T:1: TFMR(2a12000980a84000) Timer Facility Error
[ 2853.104254431,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
[ 2853.104254431,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:6 T:0: TFMR(2a12000980a84000) Timer Facility Error
[ 2853.104254431,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:6 T:2: TFMR(2a12000980a84000) Timer Facility Error
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:7 T:2: TFMR(2a12000980a84000) Timer Facility Error
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:7 T:1: TFMR(2a12000980a84000) Timer Facility Error
[ 2853.104254431,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:7 T:3: TFMR(2a12000980a84000) Timer Facility Error
[ 2687.794321] opal: Hardware platform error: Unrecoverable HMI exception
[ 2688.909240] WARNING: CPU: 118 PID: 904 at /build/linux-GKZ1fU/linux-4.15.0/kernel/sched/core.c:1189 set_task_cpu+0x240/0x250
[ 2688.909365] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables devlink iptable_filter input_leds joydev idt_89hpesx mac_hid ofpart cmdlinepart powernv_flash at24 ipmi_powernv ipmi_devintf uio_pdrv_genirq ipmi_msghandler mtd uio opal_prd vmx_crypto ibmpowernv kvm_hv kvm sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure ast i2c_algo_bit hid_generic
[ 2688.910069]  ttm mpt3sas drm_kms_helper syscopyarea sysfillrect sysimgblt usbhid fb_sys_fops nvme hid crct10dif_vpmsum raid_class crc32c_vpmsum drm nvme_core i40e aacraid scsi_transport_sas
[ 2688.910249] CPU: 118 PID: 904 Comm: kworker/118:1 Not tainted 4.15.0-18-generic #19-Ubuntu
[ 2688.910327] Workqueue: events hmi_event_handler
[ 2688.910377] NIP:  c00000000014d6e0 LR: c00000000014e30c CTR: c00000000015a240
[ 2688.910450] REGS: c000203974b26f50 TRAP: 0700   Not tainted  (4.15.0-18-generic)
[ 2688.910536] MSR:  900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28008444  XER: 00000000
[ 2688.910634] CFAR: c00000000014d54c SOFTE: 0 
[ 2688.910634] GPR00: c00000000014e30c c000203974b271d0 c0000000016eae00 c000003e982d5900 
[ 2688.910634] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[ 2688.910634] GPR08: c000000001721ee0 0000000000000000 0000000000000000 0000000000005d04 
[ 2688.910634] GPR12: 0000000028008244 c00000000fad1200 c00000000013c788 c0002039759510c0 
[ 2688.910634] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000203993860000 
[ 2688.910634] GPR20: c000203994a235c0 0000000000000000 0000000000000000 c000203974b27350 
[ 2688.910634] GPR24: c000003e982d5d28 c00000000171dd78 c0000000011d8580 0000000000000000 
[ 2688.910634] GPR28: 0000000000000004 0000000000000000 0000000000000000 c000003e982d5900 
[ 2688.911257] NIP [c00000000014d6e0] set_task_cpu+0x240/0x250
[ 2688.911307] LR [c00000000014e30c] try_to_wake_up+0x1bc/0x660
[ 2688.911367] Call Trace:
[ 2688.911397] [c000203974b271d0] [c0000000011d8580] runqueues+0x0/0xc00 (unreliable)
[ 2688.911472] [c000203974b27210] [c00000000014e30c] try_to_wake_up+0x1bc/0x660
[ 2688.911548] [c000203974b27290] [c0000000001725d8] autoremove_wake_function+0x28/0x70
[ 2688.911622] [c000203974b272c0] [c000000000171b60] __wake_up_common+0xd0/0x200
[ 2688.911697] [c000203974b27330] [c000000000171d4c] __wake_up_common_lock+0xbc/0x110
[ 2688.911771] [c000203974b273c0] [c00000000018ea40] wake_up_klogd_work_func+0x60/0xc0
[ 2688.911846] [c000203974b273f0] [c000000000295d10] irq_work_run_list+0xb0/0x100
[ 2688.911937] [c000203974b27430] [c0000000001b7220] update_process_times+0x60/0x90
[ 2688.912011] [c000203974b27460] [c0000000001cef54] tick_sched_handle.isra.5+0x34/0xd0
[ 2688.912084] [c000203974b27490] [c0000000001cf050] tick_sched_timer+0x60/0xe0
[ 2688.912158] [c000203974b274d0] [c0000000001b7db4] __hrtimer_run_queues+0x144/0x370
[ 2688.912232] [c000203974b27550] [c0000000001b8d0c] hrtimer_interrupt+0xfc/0x350
[ 2688.912307] [c000203974b27620] [c0000000000248f0] __timer_interrupt+0x90/0x260
[ 2688.912382] [c000203974b27670] [c000000000024d08] timer_interrupt+0x98/0xe0
[ 2688.912446] [c000203974b276a0] [c000000000009014] decrementer_common+0x114/0x120
[ 2688.912522] --- interrupt: 901 at replay_interrupt_return+0x0/0x4
[ 2688.912522]     LR = arch_local_irq_restore+0x74/0x90
[ 2688.912619] [c000203974b27990] [c000203974b279d0] 0xc000203974b279d0 (unreliable)
[ 2688.912695] [c000203974b279b0] [c000000000cff180] _raw_spin_unlock_irqrestore+0x40/0xa0
[ 2688.912770] [c000203974b279d0] [c00000000057d5fc] pstore_dump+0x31c/0x3e0
[ 2688.918818] [c000203974b27b10] [c00000000018ec64] kmsg_dump+0x134/0x1a0
[ 2688.925755] [c000203974b27b70] [c0000000000a2f14] pnv_platform_error_reboot+0x94/0x110
[ 2688.934081] [c000203974b27be0] [c0000000000a842c] hmi_event_handler+0x1bc/0x1c0
[ 2688.941023] [c000203974b27c90] [c000000000133958] process_one_work+0x298/0x5a0
[ 2688.947953] [c000203974b27d20] [c000000000133cf8] worker_thread+0x98/0x630
[ 2688.954876] [c000203974b27dc0] [c00000000013c928] kthread+0x1a8/0x1b0
[ 2688.961814] [c000203974b27e30] [c00000000000b528] ret_from_kernel_thread+0x5c/0xb4
[ 2688.968751] Instruction dump:
[ 2688.971525] 7faa3670 7d4a0194 57a706be 7d4a07b4 794a1f24 7d28502a 7d293c36 71290001 
[ 2688.979834] 4082fe80 60000000 60000000 60420000 <0fe00000> 4bfffe6c 60000000 60420000 
[ 2688.988160] ---[ end trace c3fbb130bac0f441 ]---
[ 2855.374688141,0] OPAL: Reboot requested due to Platform error.
[ 2855.377232501,3] OPAL: Reboot requested due to Platform error.[ 2855.380086941,5] Software initiated checkstop disabled.
[ 2689.3640[ 2855.383112058,5] OPAL: Reboot request...
36] opal: Reboot type 1 not supported for Unrecoverable HMI exception


--== Welcome to Hostboot  ==--

  2.77291|secure|SecureROM valid - enabling functionality
  5.37968|Ignoring boot flags, incorrect version 0x0
  5.38625|Booting from SBE side 0 on master proc=00050000
  5.58026|ISTEP  6. 5 - host_init_fsi
  5.68847|ISTEP  6. 6 - host_set_ipl_parms
  5.94385|ISTEP  6. 7 - host_discover_targets
  7.74862|HWAS|PRESENT> DIMM[03]=F0F0000000000000
  7.74863|HWAS|PRESENT> Proc[05]=8800000000000000
  7.74864|HWAS|PRESENT> Core[07]=03FFCFCFCF0F0000
  7.94847|ISTEP  6. 8 - host_update_master_tpm
 15.48415|SECURE|Security Access Bit> 0xC000000000000000

OPAL level:

[   70.104446560,5] OPAL skiboot-v5.11-70-g5307c0ec7899-pc34e21f starting...
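
For readers following the register dumps above, here is a minimal Python sketch that lists which bits are set in the HMER value from the log (`0x0840000000000000`), using IBM bit numbering where bit 0 is the most significant bit. The helper `set_bits_ibm` and the table `HMER_BIT_NAMES` are hypothetical names introduced only for illustration. The two bit names are an assumption recalled from skiboot's `include/processor.h` and should be checked against the source tree in use.

```python
def set_bits_ibm(value, width=64):
    """Return the IBM-numbered positions of all set bits in value.

    IBM (big-endian) numbering: bit 0 is the most significant bit.
    """
    return [i for i in range(width) if value & (1 << (width - 1 - i))]


# ASSUMPTION: names recalled from skiboot's include/processor.h;
# verify against the actual header before relying on them.
HMER_BIT_NAMES = {
    4: "TFAC_ERROR (timer facility error)",
    9: "XSCOM_DONE",
}

# HMER value copied from the "Received HMI interrupt" lines in the log.
hmer = 0x0840000000000000
for bit in set_bits_ibm(hmer):
    print(bit, HMER_BIT_NAMES.get(bit, "unknown"))

# The same helper can show which TFMR bits differ between a
# "Timer Facility Error" line and a "Failed to get TB in running state" line.
tfmr_error = 0x2A12000980A84000
tfmr_failed = 0x2A12800980A04000
print(set_bits_ibm(tfmr_error ^ tfmr_failed))
```

For the values above, the HMER has bits 4 and 9 set, and the two TFMR values differ in bits 16 and 44. The TFMR bits are deliberately left unnamed here.
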
@maheshsal

It looks like the TB resync failed, and hence so did the HMI recovery. A TOD error generates an HMI on all cores. Can you reproduce this issue with a TB error that generates an HMI on a single core, and loop it to see after how many iterations the TB fails to resync?
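
To check from a captured console log whether a given run raised the HMI on every core or only on one, something like the following Python sketch could tally the "Timer Facility Error" lines per chip and core. The function name `tally_tfac_errors` and the regular expression are illustrative assumptions based on the line format in the log above, not part of skiboot or any test suite.

```python
import re
from collections import Counter

# Matches OPAL log lines of the form seen in this issue, for example:
# "... P:8 C:12 T:1: TFMR(2a12000980a84000) Timer Facility Error"
TFAC_LINE = re.compile(
    r"P:(\d+) C:(\d+) T:(\d+): TFMR\(([0-9a-f]+)\) Timer Facility Error"
)


def tally_tfac_errors(log_text):
    """Count Timer Facility Error lines per (chip, core) pair."""
    counts = Counter()
    for match in TFAC_LINE.finditer(log_text):
        chip, core, _thread, _tfmr = match.groups()
        counts[(int(chip), int(core))] += 1
    return counts


# Three lines copied from the first log in this issue.
sample = """\
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:12 T:1: TFMR(2a12000980a84000) Timer Facility Error
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:12 T:2: TFMR(2a12000980a84000) Timer Facility Error
[ 2853.104254431,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc1]: P:8 C:22 T:1: TFMR(2a12000980a94000) Timer Facility Error
"""

for (chip, core), n in sorted(tally_tfac_errors(sample).items()):
    print(f"chip {chip} core {core}: {n} error line(s)")
```

A tally with a single `(chip, core)` key would indicate a single-core TB error, while many keys would match the all-cores TOD error case. Lines truncated by console interleaving will not match the pattern and are skipped.
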

@pridhiviraj

pridhiviraj commented Apr 23, 2018

@maheshsal Yes, after roughly 20k TB errors on a single core it hits the same issue (the loop count below stops at 18520).

0000080000000000
18517
0000080000000000
18518
0000080000000000
18519
0000080000000000
18520

[console-pexpect]#
[console-pexpect]#[    1.374114245,3] CHIPTOD: TB "Not Set" TOD in error state
[18014398511.108890053,4] HMI: Failed to get TB in running state! CPU=41, TFMR=2a12800980a44000
[18014398511.108890053,4] HMI: Failed to get TB in running state! CPU=43, TFMR=2a12800980a44000
[18014398511.108890053,4] HMI: Failed to get TB in running state! CPU=40, TFMR=2a12800980a44000
[18014398511.108890053,4] HMI: Failed to get TB in running state! CPU=42, TFMR=2a12800980a44000
[66901.841558] opal: Hardware platform error: Unrecoverable HMI exception
[66903.108438] WARNING: CPU: 42 PID: 826 at /build/linux-GKZ1fU/linux-4.15.0/kernel/sched/core.c:1189 set_task_cpu+0x240/0x250
[66903.108541] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter devlink ip6_tables iptable_filter joydev input_leds mac_hid ofpart idt_89hpesx cmdlinepart powernv_flash mtd ipmi_powernv ipmi_devintf vmx_crypto at24 uio_pdrv_genirq uio ipmi_msghandler ibmpowernv opal_prd kvm_hv kvm sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure ast i2c_algo_bit ttm hid_generic
[66903.109196]  drm_kms_helper mpt3sas syscopyarea sysfillrect sysimgblt fb_sys_fops nvme usbhid crct10dif_vpmsum raid_class hid crc32c_vpmsum drm nvme_core i40e aacraid scsi_transport_sas
[66903.109352] CPU: 42 PID: 826 Comm: kworker/42:1 Not tainted 4.15.0-18-generic #19-Ubuntu
[66903.109424] Workqueue: events hmi_event_handler
[66903.109471] NIP:  c00000000014d6e0 LR: c00000000014e30c CTR: c00000000015a240
[66903.109538] REGS: c0000000fe66af50 TRAP: 0700   Not tainted  (4.15.0-18-generic)
[66903.109603] MSR:  9000000002823033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 28008424  XER: 00000000
[66903.109707] CFAR: c00000000014d54c SOFTE: 0 
[66903.109707] GPR00: c00000000014e30c c0000000fe66b1d0 c0000000016eae00 c00020395bcac900 
[66903.109707] GPR04: 0000000000000040 0000000000000040 0000000000000000 0000000000000000 
[66903.109707] GPR08: c000000001721ee0 0000000000000000 0000000000000008 00000000000748d0 
[66903.109707] GPR12: 0000000028008224 c00000000fa9ce00 c00000000013c788 c000003fe0b0ca80 
[66903.109707] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000003fee840000 
[66903.109707] GPR20: c000003fefa035c0 0000000000000000 0000000000000000 c0000000fe66b350 
[66903.109707] GPR24: c00020395bcacd28 c00000000171dd78 c0000000011d8580 0000000000000000 
[66903.109707] GPR28: 0000000000000004 0000000000000040 0000000000000040 c00020395bcac900 
[66903.110286] NIP [c00000000014d6e0] set_task_cpu+0x240/0x250
[66903.110332] LR [c00000000014e30c] try_to_wake_up+0x1bc/0x660
[66903.110387] Call Trace:
[66903.110415] [c0000000fe66b1d0] [c0000000011d8580] runqueues+0x0/0xc00 (unreliable)
[66903.110484] [c0000000fe66b210] [c00000000014e30c] try_to_wake_up+0x1bc/0x660
[66903.110554] [c0000000fe66b290] [c0000000001725d8] autoremove_wake_function+0x28/0x70
[66903.110622] [c0000000fe66b2c0] [c000000000171b60] __wake_up_common+0xd0/0x200
[66903.110691] [c0000000fe66b330] [c000000000171d4c] __wake_up_common_lock+0xbc/0x110
[66903.110760] [c0000000fe66b3c0] [c00000000018ea40] wake_up_klogd_work_func+0x60/0xc0
[66903.110830] [c0000000fe66b3f0] [c000000000295d10] irq_work_run_list+0xb0/0x100
[66903.110901] [c0000000fe66b430] [c0000000001b7220] update_process_times+0x60/0x90
[66903.110970] [c0000000fe66b460] [c0000000001cef54] tick_sched_handle.isra.5+0x34/0xd0
[66903.111037] [c0000000fe66b490] [c0000000001cf050] tick_sched_timer+0x60/0xe0
[66903.111120] [c0000000fe66b4d0] [c0000000001b7db4] __hrtimer_run_queues+0x144/0x370
[66903.111188] [c0000000fe66b550] [c0000000001b8d0c] hrtimer_interrupt+0xfc/0x350
[66903.111258] [c0000000fe66b620] [c0000000000248f0] __timer_interrupt+0x90/0x260
[66903.111327] [c0000000fe66b670] [c000000000024d08] timer_interrupt+0x98/0xe0
[66903.111387] [c0000000fe66b6a0] [c000000000009014] decrementer_common+0x114/0x120
[66903.111460] --- interrupt: 901 at replay_interrupt_return+0x0/0x4
[66903.111460]     LR = arch_local_irq_restore+0x74/0x90
[66903.111551] [c0000000fe66b990] [c0000000fe66b9d0] 0xc0000000fe66b9d0 (unreliable)
[66903.111622] [c0000000fe66b9b0] [c000000000cff180] _raw_spin_unlock_irqrestore+0x40/0xa0
[66903.111691] [c0000000fe66b9d0] [c00000000057d5fc] pstore_dump+0x31c/0x3e0
[66903.118016] [c0000000fe66bb10] [c00000000018ec64] kmsg_dump+0x134/0x1a0
[66903.124952] [c0000000fe66bb70] [c0000000000a2f14] pnv_platform_error_reboot+0x94/0x110
[66903.131898] [c0000000fe66bbe0] [c0000000000a842c] hmi_event_handler+0x1bc/0x1c0
[66903.140209] [c0000000fe66bc90] [c000000000133958] process_one_work+0x298/0x5a0
[66903.147152] [c0000000fe66bd20] [c000000000133cf8] worker_thread+0x98/0x630
[66903.154074] [c0000000fe66bdc0] [c00000000013c928] kthread+0x1a8/0x1b0
[66903.159636] [c0000000fe66be30] [c00000000000b528] ret_from_kernel_thread+0x5c/0xb4
[66903.167942] Instruction dump:
[66903.170729] 7faa3670 7d4a0194 57a706be 7d4a07b4 794a1f24 7d28502a 7d293c36 71290001 
[66903.177660] 4082fe80 60000000 60000000 60420000 <0fe00000> 4bfffe6c 60000000 60420000 
[66903.185965] ---[ end trace 739c2a2d69ccdc2b ]---
[67070.201983351,0] OPAL: Reboot requested due to Platform error.
[67070.205232792,3] OPAL: Reboot requested due to Platform error.[67070.208087613,5] Software initiated checkstop disabled.
[6[67070.210438942,5] OPAL: Reboot request...
6903.561762] opal: Reboot type 1 not supported for Unrecoverable HMI exception

@pridhiviraj

pridhiviraj commented Apr 23, 2018

On a second re-create, at about the same count, the system hung this time.

[   90.410845] Severe Hypervisor Maintenance interrupt [Recovered]
[   90.416395]  ErroLOCK ERROR: Unlocked non-owned lock @0x30303698 (state: 0x0000004000000001)
[18014398766.211691469,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
[18014398766.211691469,7] HMI: [Loc: UOPWR.1302LDA-Node0-Proc0]: P:0 C:16 T:3: TFMR(2a12000980a44000) Timer Facility Error
[  256.481751932,3] OPAL exiting with locks held, token=145 retval=0
[  256.481756229,3]   core/lock.c:216
[  256.481758491,3]   core/lock.c:216
[18014398766.211691469,0] Aborting!
r detail: TiCPU 0040 Backtrace:
 S: 0000000031e07910 R: 000000003001a4f8 E ._abort+0x4c
 S: 0000000031e07990 R: 0000000030017e88 E .lock_error+0x64
 S: 0000000031e07a10 R: 0000000030017988 E .unlock+0x60
 S: 0000000031e07a80 R: 0000000030017b40 E .lock_caller+0xf4
 S: 0000000031e07b30 R: 0000000030039b60 E .__uart_do_poll+0x40
 S: 0000000031e07c20 R: 000000003001ba70 E .opal_run_pollers+0x168
 S: 0000000031e07ca0 R: 000000003001bafc E .opal_poll_events+0x74
 S: 0000000031e07d20 R: 00000000300051e4 E opal_entry+0x134
 --- OPAL call token: 0xa caller R1: 0xc000003fe6a77a30 ---
mer facility experienced an error
[   90.422007] 	HMER: 0840000000000000
[   90.426216] 	TFMR: 2a12000980a44000
[   90.428871] Severe Hypervisor Maintenance interrupt [Recovered]
[   90.435800]  Error detail: Timer facility experienced an error
[   90.441469] 	HMER: 0840000000000000
[   90.444120] 	TFMR: 2a12000980a54000
[   90.448281] Severe Hypervisor Maintenance interrupt [Recovered]
[   90.453833]  Error detail: Timer facility experienced an error
[   90.459507] 	HMER: 0840000000000000
[   90.463537] 	TFMR: 2a12000980a54000
[   90.466319] Severe Hypervisor Maintenance interrupt [Recovered]
[   90.473242]  Error detail: Timer facility experienced an error
[   90.478941] 	HMER: 0840000000000000
[   90.481570] 	TFMR: 2a12000980a44000
[   90.485722] Severe Hypervisor Maintenance interrupt [Recovered]
[   90.491279]  Error detail: Timer facility experienced an error
[   90.496955] 	HMER: 0840000000000000
[   9[  256.476915661,0] Assert fail: core/mem_region.c:444:lock_held_by_me(&region->free_list_lock)
0.500979] 	TFMR: 2a12000980a44000
[   90.503752] Severe Hypervisor Maintenance interrupt [Recovered]
[   90.509306]  Error detail: Timer facility experienced an error
[   90.516383] 	HMER: 0840000000000000
[   90.519016] 	TFMR: 2a12000980a44000
[   90.523163] Severe Hypervisor Maintenance interrupt [Recovered]
[   90.528719]  Error detail: Timer facility experienced an error
[   90.534420] 	HMER: 0840000000000000
[   90.538418] 	TFMR: 2a12000980a54000
[   90.541197] Severe Hypervisor Maintenance interrupt [Recovered]
[   90.546747]  Error detail: Timer facility experienced an error
[   90.553675] 	HMER: 0840000000000000
[   90.556448] 	TFMR: 2a12000980a44000
[   90.560606] Severe Hypervisor Maintenance interrupt [Recovered]
[   90.566279]  Error detail: Timer facility experienced an error
[   90.571708] 	HMER: 0840000000000000
[   90.574484] 	TFMR: 2a12000980a54000
[   90.578635] Severe Hypervisor Maintenance interrupt [Recovered]
[   90.584245]  Error detail: Timer facility experienced an error
[   90.589738] 	HMER: 0840000000000000
[   90.593888] 	TFMR: 2a12000980a54000
[   90.596664] Severe Hypervisor Maintenance interrupt [Recovered]
[   90.603734]  Error detail: Timer facility experienced an error
[   90.609148] 	HMER: 0840000000000000
[   90.611916] 	TFMR: 2a12000980a44000
[   90.616103] Severe Hypervisor Maintenance interrupt [Recovered]
[   90.621632]  Error detail: Timer facility experienced an error
[   90.627178] 	HMER: 0840000000000000
[   90.631331] 	TFMR: 2a12000980a44000
[   90.634145] Severe Hypervisor Maintenance interrupt [Recovered]
[   90.641035]  Error detail: Timer facility experienced an error
[   90.646593] 	HMER: 0840000000000000
[   90.649362] 	TFMR: 2a12000980a54000
[   90.653689] Severe Hypervisor Maintenance interrupt [Recovered]
[   90.659068]  Error detail: Timer facility experienced an error
[   90.664621] 	HMER: 0840000000000000
[   90.668769] 	TFMR: 2a12000980a44000
[   90.671584] Severe Hypervisor Maintenance interrupt [Recovered]
[   90.677099]  Error detail: Timer facility experienced an error
[   90.684024] 	HMER: 0840000000000000
[   90.686806] 	TFMR: 2a12000980a44000
[   90.691016] Severe Hypervisor Maintenance interrupt [Recovered]
[   90.696514]  Error detail: Timer facility experienced an error
[   90.702060] 	HMER: 0840000000000000
[   90.706207] 	TFMR: 2a12000980a54000
[   90.708994] Severe Hypervisor Maintenance interrupt [Recovered]
[   90.714674]  Error detail: Timer facility experienced an error
[   90.720088] 	HMER: 0840000000000000
[   90.724239] 	TFMR: 2a12000980a44000
[   90.727020] Severe Hypervisor Maintenance interrupt [Recovered]
[   90.733942]  Error detail: Timer facility experienced an error
[   90.739503] 	HMER: 0840000000000000
[   90.742275] 	TFMR: 2a12000980a44000
[   90.746464] Severe Hypervisor Maintenance interrupt [Recovered]
[   90.751982]  Error detail: Timer facility experienced an error
[   90.757531] 	HMER: 0840000000000000
[   90.761680] 	TFMR: 2a12000980a44000
[   90.764552] Severe Hypervisor Maintenance interrupt [Recovered]
[   90.771385]  [  256.476915661,3] HMI: Rendez-vous stage 1 timeout, CPU 0x43 waiting for thread 0 (sptr=00001ccc)
[18014398766.211691469,3] HMI: Rendez-vous stage 1 timeout, CPU 0x43 waiting for thread 1 (sptr=00001ccc)
[  256.476915661,3] HMI: Rendez-vous stage 1 timeout, CPU 0x43 waiting for thread 2 (sptr=00001ccc)
[18014398514.386672589,3] HMI: Rendez-vous stage 1 timeout, CPU 0x43 waiting for thread 0 (sptr=00002ccc)
[18014398514.386672589,3] HMI: Rendez-vous stage 1 timeout, CPU 0x43 waiting for thread 1 (sptr=00002ccc)
[    5.139896781,3] HMI: Rendez-vous stage 1 timeout, CPU 0x43 waiting for thread 2 (sptr=00002ccc)
[    5.139896781,3] HMI: Rendez-vous stage 1 timeout, CPU 0x43 waiting for thread 0 (sptr=00003ccc)
[    5.139896781,3] HMI: Rendez-vous stage 1 timeout, CPU 0x43 waiting for thread 1 (sptr=00003ccc)
[18014398514.386672589,3] HMI: Rendez-vous stage 1 timeout, CPU 0x43 waiting for thread 2 (sptr=00003ccc)
[    5.139896781,4] HMI: Failed to get TB in running state! CPU=43, TFMR=2a12000980a44000

count:

0000080000000000
18282
0000080000000000
18283
0000080000000000

@ghost
ghost commented Apr 23, 2018 via email
