Go Wiki: LinuxKernelSignalVectorBug

Introduction

If you arrived at this page because of a message like this printed by a Go program,

runtime: note: your Linux kernel may be buggy
runtime: note: see https://golang.com.tw/wiki/LinuxKernelSignalVectorBug
runtime: note: mlock workaround for kernel bug failed with errno <number>

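then you are using a Linux kernel that may have a bug. This kernel bug may have corrupted memory in your Go program, and may have caused your Go program to crash.

If you understand why your program crashed, you can ignore this page.

Otherwise, this page explains what the kernel bug is and includes a C program that you can use to check whether your kernel has the bug.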

Bug description

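A bug was introduced in Linux kernel version 5.2: if a signal is delivered to a thread, and delivering the signal requires faulting in pages of the thread's signal stack, then the AVX YMM registers may be corrupted upon returning from the signal to the program. If the program was executing a function that uses the YMM registers, that function can behave unpredictably.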

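The bug only occurs on systems with an x86 processor. It affects programs written in any language, but only programs that receive signals. Among programs that receive signals, it is more likely to affect those that use an alternate signal stack. It only affects programs that use the YMM registers. In Go programs in particular, the bug usually shows up as memory corruption, because Go programs primarily use the YMM registers to copy one memory buffer to another.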

The bug was reported to the Linux kernel developers and was quickly fixed. The fix was not backported to the Linux kernel 5.2 series. The bug is fixed in Linux kernel versions 5.3.15, 5.4.2, and 5.5 and later.

The bug is only present if the kernel was compiled with GCC 9 or later.

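The bug is present in vanilla Linux kernel versions 5.2.x for any x, 5.3.0 through 5.3.14, and 5.4.0 and 5.4.1. However, many distributions that ship these kernel versions have in fact backported the patch, which is very small. In addition, some distributions still compile their kernels with GCC 8, in which case the kernel does not have the bug.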

In other words, even if your kernel is in the vulnerable range, there is a good chance that it is not affected by the bug.

Bug test

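To test whether your kernel has the bug, you can run the following C program. On a buggy kernel it will fail almost immediately. On a kernel without the bug it will run for 60 seconds and then exit with status 0.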

// Build with: gcc -pthread test.c
//
// This demonstrates an issue where AVX state becomes corrupted when a
// signal is delivered where the signal stack pages aren't faulted in.
//
// There appear to be three necessary ingredients, which are marked
// with "!!!" below:
//
// 1. A thread doing AVX operations using YMM registers.
//
// 2. A signal where the kernel must fault in stack pages to write the
//    signal context.
//
// 3. Context switches. Having a single task isn't sufficient.

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <sys/wait.h>

static int sigs;

static stack_t altstack;
static pthread_t tid;

static void die(const char* msg, int err) {
  if (err != 0) {
    fprintf(stderr, "%s: %s\n", msg, strerror(err));
  } else {
    fprintf(stderr, "%s\n", msg);
  }
  exit(EXIT_FAILURE);
}

void handler(int sig __attribute__((unused)),
             siginfo_t* info __attribute__((unused)),
             void* context __attribute__((unused))) {
  sigs++;
}

void* sender(void *arg) {
  int err;

  for (;;) {
    usleep(100);
    err = pthread_kill(tid, SIGWINCH);
    if (err != 0)
      die("pthread_kill", err);
  }
  return NULL;
}

void dump(const char *label, unsigned char *data) {
  printf("%s =", label);
  for (int i = 0; i < 32; i++)
    printf(" %02x", data[i]);
  printf("\n");
}

void doAVX(void) {
  unsigned char input[32];
  unsigned char output[32];

  // Set input to a known pattern.
  for (int i = 0; i < sizeof input; i++)
    input[i] = i;
  // Mix our PID in so we detect cross-process leakage, though this
  // doesn't appear to be what's happening.
  pid_t pid = getpid();
  memcpy(input, &pid, sizeof pid);

  while (1) {
    for (int i = 0; i < 1000; i++) {
      // !!! Do some computation we can check using YMM registers.
      asm volatile(
        "vmovdqu %1, %%ymm0;"
        "vmovdqa %%ymm0, %%ymm1;"
        "vmovdqa %%ymm1, %%ymm2;"
        "vmovdqa %%ymm2, %%ymm3;"
        "vmovdqu %%ymm3, %0;"
        : "=m" (output)
        : "m" (input)
        : "memory", "ymm0", "ymm1", "ymm2", "ymm3");
      // Check that input == output.
      if (memcmp(input, output, sizeof input) != 0) {
        dump("input ", input);
        dump("output", output);
        die("mismatch", 0);
      }
    }

    // !!! Release the pages of the signal stack. This is necessary
    // because the error happens when copy_fpstate_to_sigframe enters
    // the failure path that handles faulting in the stack pages.
    // (mmap with MAP_FIXED also works.)
    //
    // (We do this here to ensure it doesn't race with the signal
    // itself.)
    if (madvise(altstack.ss_sp, altstack.ss_size, MADV_DONTNEED) != 0)
      die("madvise", errno);
  }
}

void doTest() {
  // Create an alternate signal stack so we can release its pages.
  void *altSigstack = mmap(NULL, SIGSTKSZ, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
  if (altSigstack == MAP_FAILED)
    die("mmap failed", errno);
  altstack.ss_sp = altSigstack;
  altstack.ss_size = SIGSTKSZ;
  if (sigaltstack(&altstack, NULL) < 0)
    die("sigaltstack", errno);

  // Install SIGWINCH handler.
  struct sigaction sa = {
    .sa_sigaction = handler,
    .sa_flags = SA_ONSTACK | SA_RESTART,
  };
  sigfillset(&sa.sa_mask);
  if (sigaction(SIGWINCH, &sa, NULL) < 0)
    die("sigaction", errno);

  // Start thread to send SIGWINCH.
  int err;
  pthread_t ctid;
  tid = pthread_self();
  if ((err = pthread_create(&ctid, NULL, sender, NULL)) != 0)
    die("pthread_create sender", err);

  // Run test.
  doAVX();
}

void *exiter(void *arg) {
  sleep(60);
  exit(0);
}

int main() {
  int err;
  pthread_t ctid;

  // !!! We need several processes to cause context switches. Threads
  // probably also work. I don't know if the other tasks also need to
  // be doing AVX operations, but here we do.
  int nproc = sysconf(_SC_NPROCESSORS_ONLN);
  for (int i = 0; i < 2 * nproc; i++) {
    pid_t child = fork();
    if (child < 0) {
      die("fork failed", errno);
    } else if (child == 0) {
      // Exit if the parent dies.
      prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
      doTest();
    }
  }

  // Exit after a while.
  if ((err = pthread_create(&ctid, NULL, exiter, NULL)) != 0)
    die("pthread_create exiter", err);

  // Wait for a failure.
  int status;
  if (wait(&status) < 0)
    die("wait", errno);
  if (status == 0)
    die("child unexpectedly exited with success", 0);
  fprintf(stderr, "child process failed\n");
  exit(1);
}

What to do

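If your kernel version is in the range that may contain the bug, run the C program above to see whether it fails. If it fails, your kernel is buggy. You should upgrade to a newer kernel. There is no workaround for this bug.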

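Go programs built with Go 1.14 will attempt to mitigate the bug by using the mlock system call to lock the signal stack pages into memory. This works because the bug only occurs when a signal stack page has to be faulted in. However, this use of mlock can fail. If you see the message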

runtime: note: mlock workaround for kernel bug failed with errno 12

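then errno 12 (also known as ENOMEM) means that mlock failed because the system limits the amount of memory a process may lock. If you can raise the limit, the program may succeed. This can be done using ulimit -l. When running a program in a Docker container, you can raise the limit by invoking docker with the option --ulimit memlock=67108864.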

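If you cannot raise the mlock limit, you can make it less likely that the bug will interfere with your program by setting the environment variable GODEBUG=asyncpreemptoff=1 when running the Go program. However, this only makes memory corruption less likely, because it reduces the number of signals that your program receives. The bug is still present, and memory corruption may still occur.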

Questions?

Please ask on the mailing list golang-nuts@googlegroups.com, or on any Go forum as described at Questions.

Details

For more details on how the bug affects Go programs, and how it was detected and understood, see #35777 and #35326.

This content is part of the Go Wiki.