inventory collection should ignore errors that mean a slot has no valid caboose #8044

Open
@wfchandler

After performing the upgrade to R14, a PSC that was added to the rack at a later date is reporting an error in inventory collection:

Error response from SP: failed to read data from the caboose

Full details from inventory show:

collection: 757558e3-243f-4667-88f6-f55cb90f1529
collector:  9c9e005d-29e9-42b0-b294-4b22cab216dc (likely a Nexus instance)
started:    2025-04-24T18:33:13.338Z
done:       2025-04-24T18:33:44.923Z
errors:     2
  error 0: MGS "http://[fd00:1122:3344:106::2]:12225": SP SpIdentifier { slot: 1, type_: Power }: caboose Stage0Next: Error Response: status: 503 Service Unavailable; headers: {"content-type": "application/json", "x-request-id": "ba7e4c15-52e0-4e1e-8eb6-0dce689cc5e5", "content-length": "242", "date": "Thu, 24 Apr 2025 18:33:33 GMT"}; value: Error { error_code: Some("SpCommunicationFailed"), message: "error communicating with SP SpIdentifier { typ: Power, slot: 1 }: Error response from SP: failed to read data from the caboose", request_id: "ba7e4c15-52e0-4e1e-8eb6-0dce689cc5e5" }
  error 1: MGS "http://[fd00:1122:3344:10a::2]:12225": SP SpIdentifier { slot: 1, type_: Power }: caboose Stage0Next: Error Response: status: 503 Service Unavailable; headers: {"content-type": "application/json", "x-request-id": "8ebb0096-e188-42f5-a282-0b71b93065f2", "content-length": "242", "date": "Thu, 24 Apr 2025 18:33:36 GMT"}; value: Error { error_code: Some("SpCommunicationFailed"), message: "error communicating with SP SpIdentifier { typ: Power, slot: 1 }: Error response from SP: failed to read data from the caboose", request_id: "8ebb0096-e188-42f5-a282-0b71b93065f2" }
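
One way the requested behavior could look, as a minimal sketch only: treat the "failed to read data from the caboose" response as "this slot has no valid caboose" and skip it, while still reporting every other failure. `CabooseOutcome` and `classify` are hypothetical names, and matching on the message text is an assumption made for illustration; a real fix would more likely key off a typed error than a string.

```rust
/// Outcome of asking MGS for one caboose during inventory collection.
/// Hypothetical type, not taken from the actual code base.
#[derive(Debug, PartialEq)]
enum CabooseOutcome {
    /// The caboose was read (its contents are elided in this sketch).
    Found,
    /// The slot holds no valid caboose: nothing to record, not an error.
    Absent,
    /// Any other failure, which should still be reported.
    Failed(String),
}

/// Classify the result of a caboose read.
/// Assumption: the error is only available as the message text shown above.
fn classify(result: Result<(), String>) -> CabooseOutcome {
    const NO_CABOOSE: &str = "failed to read data from the caboose";
    match result {
        Ok(()) => CabooseOutcome::Found,
        Err(msg) if msg.contains(NO_CABOOSE) => CabooseOutcome::Absent,
        Err(msg) => CabooseOutcome::Failed(msg),
    }
}
```

With this shape, the two errors above would be classified as `Absent` and dropped from the collection's error list, while a timeout or an unrelated SP failure would still surface as `Failed`.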

SP and RoT state:

support@oxz_switch0:~$ pilot sp exec -e 'state' BRM11230005
Apr 24 18:46:32.025 INFO creating SP handle on interface psc1, component: faux-mgs
Apr 24 18:46:32.026 INFO initial discovery complete, addr: [fe80::aa40:25ff:fe06:169%77]:11111, interface: psc1, component: faux-mgs
Apr 24 18:46:32.032 INFO V2(SpStateV2 { hubris_archive_id: [2, 11, 186, 167, 122, 72, 124, 190], serial_number: [66, 82, 77, 49, 49, 50, 51, 48, 48, 48, 53, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], model: [57, 49, 51, 45, 48, 48, 48, 48, 48, 48, 51, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], revision: 9, base_mac_address: [168, 64, 37, 6, 1, 104], power_state: A2, rot: Ok(RotStateV2 { active: B, persistent_boot_preference: B, pending_persistent_boot_preference: None, transient_boot_preference: None, slot_a_sha3_256_digest: Some([35, 129, 249, 66, 225, 121, 128, 142, 202, 145, 227, 186, 213, 51, 169, 122, 134, 226, 5, 120, 63, 37, 92, 222, 242, 115, 167, 176, 195, 32, 4, 79]), slot_b_sha3_256_digest: Some([228, 73, 120, 241, 95, 148, 75, 64, 9, 80, 222, 176, 55, 159, 77, 166, 126, 247, 49, 56, 141, 16, 11, 69, 254, 89, 249, 4, 60, 212, 88, 249]) }) }), component: faux-mgs
hubris archive: 020bbaa77a487cbe
serial number: BRM11230005
model: 913-0000003
revision: 9
base MAC address: a8:40:25:06:01:68
power state: A2
rot: Ok(RotStateV2 {
active: B,
persistent_boot_preference: B,
pending_persistent_boot_preference: None,
transient_boot_preference: None,
slot_a_sha3_256_digest: Some("2381f942e179808eca91e3bad533a97a86e205783f255cdef273a7b0c320044f"),
slot_b_sha3_256_digest: Some("e44978f15f944b400950deb0379f4da67ef731388d100b45fe59f9043cd458f9"),
}

)
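
For anyone cross-checking the raw `SpStateV2` log line against the decoded summary, the mapping can be sketched as follows. This assumes the text fields are NUL-padded ASCII and the IDs and digests are printed as lowercase hex with two digits per byte; `decode_padded` and `to_hex` are hypothetical helper names, not functions from faux-mgs.

```rust
/// Decode a fixed-width, NUL-padded text field such as `serial_number`
/// by keeping only the bytes before the first NUL.
fn decode_padded(bytes: &[u8]) -> String {
    let end = bytes.iter().position(|&b| b == 0).unwrap_or(bytes.len());
    String::from_utf8_lossy(&bytes[..end]).into_owned()
}

/// Render raw bytes as lowercase hex, zero-padded to two digits per byte.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```

Applied to the values above, the leading bytes `[66, 82, 77, 49, 49, 50, 51, 48, 48, 48, 53, 0, ...]` decode to `BRM11230005`, and `[2, 11, 186, 167, 122, 72, 124, 190]` renders as `020bbaa77a487cbe`.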

support@oxz_switch0:~$ pilot sp exec -e 'rot-boot-info' BRM11230005
Apr 24 18:48:08.415 INFO creating SP handle on interface psc1, component: faux-mgs
Apr 24 18:48:08.416 INFO initial discovery complete, addr: [fe80::aa40:25ff:fe06:169%77]:11111, interface: psc1, component: faux-mgs
Apr 24 18:48:08.424 INFO V3(RotStateV3 { active: B, persistent_boot_preference: B, pending_persistent_boot_preference: None, transient_boot_preference: None, slot_a_fwid: Sha3_256([23, 81, f9, 42, e1, 79, 80, 8e, ca, 91, e3, ba, d5, 33, a9, 7a, 86, e2, 5, 78, 3f, 25, 5c, de, f2, 73, a7, b0, c3, 20, 4, 4f]), slot_b_fwid: Sha3_256([e4, 49, 78, f1, 5f, 94, 4b, 40, 9, 50, de, b0, 37, 9f, 4d, a6, 7e, f7, 31, 38, 8d, 10, b, 45, fe, 59, f9, 4, 3c, d4, 58, f9]), stage0_fwid: Sha3_256([d4, d2, ad, ff, 3f, e9, 1e, 6b, dd, 86, 18, b4, 6f, 2b, 42, f9, 14, 2e, 25, 52, 2f, 8b, 8f, 24, 2d, d6, 23, 36, a6, b3, f7, 49]), stage0next_fwid: Sha3_256([a7, ff, c6, f8, bf, 1e, d7, 66, 51, c1, 47, 56, a0, 61, d6, 62, f5, 80, ff, 4d, e4, 3b, 49, fa, 82, d8, a, 4b, 80, f8, 43, 4a]), slot_a_status: Ok(()), slot_b_status: Ok(()), stage0_status: Ok(()), stage0next_status: Err(FirstPageErased) }), component: faux-mgs
 RotBootInfo { V3(RotStateV3 {
active: B,
persistent_boot_preference: B,
pending_persistent_boot_preference: None,
transient_boot_preference: None,
slot_a_fwid: Fwid::Sha3_256("2381f942e179808eca91e3bad533a97a86e205783f255cdef273a7b0c320044f"),
slot_b_fwid: Fwid::Sha3_256("e44978f15f944b400950deb0379f4da67ef731388d100b45fe59f9043cd458f9"),
stage0_fwid: Fwid::Sha3_256("d4d2adff3fe91e6bdd8618b46f2b42f9142e25522f8b8f242dd62336a6b3f749"),
stage0next_fwid: Fwid::Sha3_256("a7ffc6f8bf1ed76651c14756a061d662f580ff4de43b49fa82d80a4b80f8434a"),
slot_a_status: Ok(()),
slot_b_status: Ok(()),
stage0_status: Ok(()),
stage0next_status: Err(FirstPageErased)
})}
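
The `stage0next_status: Err(FirstPageErased)` above lines up with the failing `Stage0Next` caboose read. An alternative to ignoring the error after the fact would be to skip the caboose request for any slot whose status is not `Ok`. The sketch below is illustrative only: it assumes the collector has the per-slot status in hand before reading cabooses, and `SlotStatus` and `slots_to_read` are hypothetical names that model just the two status values seen in this report.

```rust
/// Simplified stand-in for the per-slot status in `rot-boot-info` output.
/// Only the two values seen in this report are modeled.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SlotStatus {
    Ok,
    FirstPageErased,
}

/// Return the names of the slots whose caboose is worth requesting,
/// preserving input order and dropping any slot that is not `Ok`.
fn slots_to_read<'a>(statuses: &[(&'a str, SlotStatus)]) -> Vec<&'a str> {
    statuses
        .iter()
        .filter(|(_, status)| *status == SlotStatus::Ok)
        .map(|(name, _)| *name)
        .collect()
}
```

For this PSC, that would request cabooses for slots A, B, and Stage0, and never issue the Stage0Next read that produced the 503.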

Labels: bug
