The dataset has four files:
train.zip / test.zip contain the log files, each named by the serial_number of a DIMM and
collected via mcelog, with 23 columns. The columns are defined as follows:
| Field | Type | Description |
| --- | --- | --- |
| cpuid | integer | The CPU ID; note that a server attaches multiple CPUs |
| channelid | integer | The channel ID; note that a CPU has multiple channels |
| dimmid | integer | The DIMM ID; note that a channel attaches multiple DIMMs |
| rankid | integer | The rank ID, ranging from 0 to 1; each DIMM has 1 or 2 ranks |
| deviceid | integer | The device ID, ranging from 0 to 17; each DIMM has multiple devices |
| bankgroupid | integer | The bank group ID of the DRAM |
| bankid | integer | The bank ID of the DRAM |
| rowid | integer | The row ID of the DRAM |
| columnid | integer | The column ID of the DRAM |
| retryrderrlogparity | integer | The parity info on retried reads, in decimal format |
| retryrderrlog | integer | Logged info on retried reads, in decimal format, used to validate error types and parity. Hint: if the bitwise AND of retryrderrlog and 0x0001 equals 1 (i.e., retryrderrlog & 0x0001 == 1), RETRY_RD_ERR_LOG_PARITY is valid. |
| burst_info | integer | The decoded parity error bits in memory DQ and Beat |
| error_type | integer | The error type, including read and scrubbing errors |
| log_time | string | The time when the error was detected, as a timestamp |
| manufacturter | category | The server manufacturer, in anonymized format |
| model | category | The CPU model, in anonymized format |
| PN | category | The part number of the DIMM, in anonymized format |
| Capacity | integer | The capacity of the DIMM |
| FrequencyMHz | integer | Base frequency of the CPU resource, in MHz |
| MaxSpeedMHz | integer | Maximum frequency of the CPU resource |
| McaBank | category | Machine Check Architecture bank code of the CPU |
| memory_type | string | The type of DIMM, e.g., DDR4 |
| region | category | The region of the server location, in anonymized format |
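The hint for retryrderrlog can be sketched in a few lines of Python. This is a minimal illustration, not the organizers' code: the helper name `parity_is_valid`, the constant `PARITY_VALID_MASK`, and the sample records are hypothetical, and only the bit test itself comes from the description above.

```python
# Bit 0 of retryrderrlog signals that RETRY_RD_ERR_LOG_PARITY is valid.
PARITY_VALID_MASK = 0x0001


def parity_is_valid(retryrderrlog: int) -> bool:
    """Return True when the lowest bit of retryrderrlog is set."""
    return (retryrderrlog & PARITY_VALID_MASK) == 1


# Hypothetical rows; real values come from the mcelog-derived log files.
records = [
    {"retryrderrlog": 5, "retryrderrlogparity": 1024},  # 0b101, bit 0 set
    {"retryrderrlog": 4, "retryrderrlogparity": 2048},  # 0b100, bit 0 clear
]

# Keep the parity field only for rows where the log marks it as valid.
valid_parities = [
    r["retryrderrlogparity"]
    for r in records
    if parity_is_valid(r["retryrderrlog"])
]
```

Filtering on this bit before using retryrderrlogparity avoids treating stale parity values as meaningful features.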
failure_ticket.csv lists the failures of each DIMM in this dataset, with 3 columns.
The columns are defined as follows:
| Field | Type | Description |
| --- | --- | --- |
| serial_number | string | The DIMM ID |
| failure_time | string | The time when the DIMM failure happened, as a timestamp |
| serial_number_type | string | The server type of the failure, either A or B |
Note that we have anonymized the DIMM manufacturer, part number, etc. to prevent sensitive
information from being inferred.
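As a rough sketch of how the failure tickets might be indexed by DIMM, the snippet below groups rows by serial_number. It assumes the file is comma-separated with a header row that uses the three field names above; neither detail is confirmed by this description. The function name `load_failures` and the sample rows are hypothetical, and failure_time is kept as a string because its exact format is not specified here.

```python
import csv
import io
from collections import defaultdict


def load_failures(csv_text: str) -> dict:
    """Map each serial_number to a list of (failure_time, serial_number_type)."""
    failures = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        failures[row["serial_number"]].append(
            (row["failure_time"], row["serial_number_type"])
        )
    return dict(failures)


# Hypothetical content standing in for failure_ticket.csv.
sample = (
    "serial_number,failure_time,serial_number_type\n"
    "sn_1xx,1708987825,A\n"
    "sn_2xx,1716206512,B\n"
)

failures_by_sn = load_failures(sample)
```

For the real file, the text could be read with `open("failure_ticket.csv", newline="").read()` and passed to the same function.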
submission.csv includes the serial_number (sn_name), serial_number_type and
prediction_timestamp, where prediction_timestamp is the time at which a DIMM failure is
predicted. The submission format allows multiple timestamp predictions for the same DIMM, as
shown below:
| sn_name | prediction_timestamp | serial_number_type |
| --- | --- | --- |
| sn_1xx | 1708987825 | A |
| sn_1xx | 1708987853 | A |
| sn_2xx | 1716206512 | B |
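A submission in this layout could be produced along the following lines. This is a sketch under assumptions: the header row, comma delimiter, and column order are inferred from the example table rather than stated by the organizers, and the helper name `format_submission` is hypothetical. The rows reuse the placeholder values shown above.

```python
import csv
import io

# Column order taken from the example table; assumed, not confirmed.
COLUMNS = ["sn_name", "prediction_timestamp", "serial_number_type"]


def format_submission(predictions) -> str:
    """Render (sn_name, prediction_timestamp, serial_number_type) rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    for sn_name, timestamp, sn_type in predictions:
        writer.writerow([sn_name, int(timestamp), sn_type])
    return buffer.getvalue()


# A DIMM may appear more than once, with one row per predicted timestamp.
predictions = [
    ("sn_1xx", 1708987825, "A"),
    ("sn_1xx", 1708987853, "A"),
    ("sn_2xx", 1716206512, "B"),
]

submission_text = format_submission(predictions)
```

Writing `submission_text` to a file named submission.csv gives one line per prediction, matching the table above.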