源码文件结构

这里我们直接在AFL的网站上下载最新的代码之后，即得到如下的文件夹

lyyl@lyyl:~/Desktop/afl/afl-2.52b$ ls
afl-analyze    afl-as.c   afl-clang++  afl-fuzz.c  afl-gcc.c     afl-plot       afl-tmin     alloc-inl.h  debug.h       experimental   libtokencap  qemu_mode            testcases
afl-analyze.c  afl-as.h   afl-cmin     afl-g++     afl-gotcpu    afl-showmap    afl-tmin.c   as           dictionaries  hash.h         llvm_mode    QuickStartGuide.txt  test-instr.c
afl-as         afl-clang  afl-fuzz     afl-gcc     afl-gotcpu.c  afl-showmap.c  afl-whatsup  config.h     docs          libdislocator  Makefile     README               types.h
lyyl@lyyl:~/Desktop/afl/afl-2.52b$

这里主要的代码位于afl-fuzz.c这个文件中是整个afl的核心代码。这里我们先来看一下各个模块的主要功能以及作用

afl-as.c，afl-as.h，afl-gcc.c	普通的插桩模式，gcc和clang都适用。
llvm_mode	llvm插桩模式，只适用于clang
qemu_mode	qemu插桩模式的，针对的是二进制文件
afl-fuzz.c	fuzzer实现的核心代码，是整个AFL的核心
libdislocator	简单的内存检测的工具
libtokencap	语法的关键字提取并生成相应的字典文件
afl-analyze.c	对测试样例的字典进行分析
afl-cmin	对fuzzing用到的语料库进行一个精简的操作
afl-tmin.c	对fuzzing中用到的测试用例进行最小化的操作
afl-gotcpu.c	统计CPU的占用率
afl-plot	绘制报告图标
afl-showmap.c	打印目标程序单轮fuzz之后的tuple也就是路径信息
afl-whatsup	各个并行的例程的fuzzing结果统计
alloc-inl.h	定义带有检测功能的内存分配以及释放的操作
config.h	配置信息
debug.h	跟提示信息相关的一些宏定义
Hash.h	hash函数的实现以及定义
Types.h	部分的类型以及宏定义
test-instr.c	作为测试的目标程序
Docs	项目的一些相关的文档
experimental	一些新特性的实验研究

Main函数

这里我们先来看一下main函数，该函数定义在afl-fuzz.c文件中，整个main函数共分为5个部分

参数处理：主要是根据用户启动afl二进制程序是传入的参数进行相应的mode等配置设置
初始化：获取特殊环境变量的值、设置输出文件夹、设置共享内存、读取测试用例等
Dry Run：第一次运行，即首轮Fuz
Main Loop：循环Fuzz运行
Exit：Fuzz结束之后的保存测试结果等

参数处理

这里是通过一个while循环来实现参数处理逻辑的。主要的代码如

main函数参数处理部分源码

while ((opt = getopt(argc, argv, "+i:o:f:m:t:T:dnCB:S:M:x:Q")) > 0)

    switch (opt) {

      case 'i': /* input dir */

        if (in_dir) FATAL("Multiple -i options not supported");
        in_dir = optarg;

        if (!strcmp(in_dir, "-")) in_place_resume = 1;

        break;

      case 'o': /* output dir */

        if (out_dir) FATAL("Multiple -o options not supported");
        out_dir = optarg;
        break;

      case 'M': { /* master sync ID */

          u8* c;

          if (sync_id) FATAL("Multiple -S or -M options not supported");
          sync_id = ck_strdup(optarg);

          if ((c = strchr(sync_id, ':'))) {

            *c = 0;

            if (sscanf(c + 1, "%u/%u", &master_id, &master_max) != 2 ||
                !master_id || !master_max || master_id > master_max ||
                master_max > 1000000) FATAL("Bogus master ID passed to -M");

          }

          force_deterministic = 1;

        }

        break;

      case 'S': 

        if (sync_id) FATAL("Multiple -S or -M options not supported");
        sync_id = ck_strdup(optarg);
        break;

      case 'f': /* target file */

        if (out_file) FATAL("Multiple -f options not supported");
        out_file = optarg;
        break;

      case 'x': /* dictionary */

        if (extras_dir) FATAL("Multiple -x options not supported");
        extras_dir = optarg;
        break;

      case 't': { /* timeout */

          u8 suffix = 0;

          if (timeout_given) FATAL("Multiple -t options not supported");

          if (sscanf(optarg, "%u%c", &exec_tmout, &suffix) < 1 ||
              optarg[0] == '-') FATAL("Bad syntax used for -t");

          if (exec_tmout < 5) FATAL("Dangerously low value of -t");

          if (suffix == '+') timeout_given = 2; else timeout_given = 1;

          break;

      }

      case 'm': { /* mem limit */

          u8 suffix = 'M';

          if (mem_limit_given) FATAL("Multiple -m options not supported");
          mem_limit_given = 1;

          if (!strcmp(optarg, "none")) {

            mem_limit = 0;
            break;

          }

          if (sscanf(optarg, "%llu%c", &mem_limit, &suffix) < 1 ||
              optarg[0] == '-') FATAL("Bad syntax used for -m");

          switch (suffix) {

            case 'T': mem_limit *= 1024 * 1024; break;
            case 'G': mem_limit *= 1024; break;
            case 'k': mem_limit /= 1024; break;
            case 'M': break;

            default:  FATAL("Unsupported suffix or bad syntax for -m");

          }

          if (mem_limit < 5) FATAL("Dangerously low value of -m");

          if (sizeof(rlim_t) == 4 && mem_limit > 2000)
            FATAL("Value of -m out of range on 32-bit systems");

        }

        break;

      case 'd': /* skip deterministic */

        if (skip_deterministic) FATAL("Multiple -d options not supported");
        skip_deterministic = 1;
        use_splicing = 1;
        break;

      case 'B': /* load bitmap */

        /* This is a secret undocumented option! It is useful if you find
           an interesting test case during a normal fuzzing process, and want
           to mutate it without rediscovering any of the test cases already
           found during an earlier run.

           To use this mode, you need to point -B to the fuzz_bitmap produced
           by an earlier run for the exact same binary... and that's it.

           I only used this once or twice to get variants of a particular
           file, so I'm not making this an official setting. */

        if (in_bitmap) FATAL("Multiple -B options not supported");

        in_bitmap = optarg;
        read_bitmap(in_bitmap);
        break;

      case 'C': /* crash mode */

        if (crash_mode) FATAL("Multiple -C options not supported");
        crash_mode = FAULT_CRASH;
        break;

      case 'n': /* dumb mode */

        if (dumb_mode) FATAL("Multiple -n options not supported");
        if (getenv("AFL_DUMB_FORKSRV")) dumb_mode = 2; else dumb_mode = 1;

        break;

      case 'T': /* banner */

        if (use_banner) FATAL("Multiple -T options not supported");
        use_banner = optarg;
        break;

      case 'Q': /* QEMU mode */

        if (qemu_mode) FATAL("Multiple -Q options not supported");
        qemu_mode = 1;

        if (!mem_limit_given) mem_limit = MEM_LIMIT_QEMU;

        break;

      default:

        usage(argv[0]);

    }

这里我们可以将AFL的参数总结如下

-i, -o：分别更改全局变量in_dir以及out_dir的值。表示的就是输入文件夹以及输出文件夹的位置。
-M，-S：更改全局变量sync_id 的值，用来进行并行设置
-f：指定临时的输出文件，更改全局变量out_file 的值。
-x：更改全局变量extras_dir 的值，表示的是用户指定的字典
-t：%u%c 的格式分别读取到全局变量exec_tmout 与局部变量suffix 变量中，最终改变的是timeout_given 的值
-m：%llu%c 的格式分别读取到全局变量mem_limit 与局部变量suffix 中，这里最小的内存不能小于5M，最大不超过2000M
-d：更改全局变量skip_deterministic 与use_splicing 的值，表示的是跳过确定性变异过程
-B：更改全局变量in_bitmap 的值，表示指定bitmap，并将其读取到in_bitmap 中。这表示我们可以选择自己该兴趣的bitmap然后从其开始进行变异
-C：更改全局变量crash_mode 的值，这里我们可以将一个崩溃的测试用例输入到AFL中，此时不发生崩溃的或者执行路径没有明显变化的变异样本将会被丢弃
-n：更改全局变量dumb_mode 的值，表示不再进行插桩
-T：指定banner
-Q：更改全局变量qemu_mode 的值，表示的是进入到qemu插桩模式中

初始化

main函数中初始化相关部分代码如下

if (optind == argc || !in_dir || !out_dir) usage(argv[0]);

  setup_signal_handlers();
  check_asan_opts();

  if (sync_id) fix_up_sync();

  if (!strcmp(in_dir, out_dir))
    FATAL("Input and output directories can't be the same");

  if (dumb_mode) {

    if (crash_mode) FATAL("-C and -n are mutually exclusive");
    if (qemu_mode)  FATAL("-Q and -n are mutually exclusive");

  }

  if (getenv("AFL_NO_FORKSRV"))    no_forkserver    = 1;
  if (getenv("AFL_NO_CPU_RED"))    no_cpu_meter_red = 1;
  if (getenv("AFL_NO_ARITH"))      no_arith         = 1;
  if (getenv("AFL_SHUFFLE_QUEUE")) shuffle_queue    = 1;
  if (getenv("AFL_FAST_CAL"))      fast_cal         = 1;

  if (getenv("AFL_HANG_TMOUT")) {
    hang_tmout = atoi(getenv("AFL_HANG_TMOUT"));
    if (!hang_tmout) FATAL("Invalid value of AFL_HANG_TMOUT");
  }

  if (dumb_mode == 2 && no_forkserver)
    FATAL("AFL_DUMB_FORKSRV and AFL_NO_FORKSRV are mutually exclusive");

  if (getenv("AFL_PRELOAD")) {
    setenv("LD_PRELOAD", getenv("AFL_PRELOAD"), 1);
    setenv("DYLD_INSERT_LIBRARIES", getenv("AFL_PRELOAD"), 1);
  }

  if (getenv("AFL_LD_PRELOAD"))
    FATAL("Use AFL_PRELOAD instead of AFL_LD_PRELOAD");

  save_cmdline(argc, argv);

  fix_up_banner(argv[optind]);

  check_if_tty();

  get_core_count();

#ifdef HAVE_AFFINITY
  bind_to_free_cpu();
#endif /* HAVE_AFFINITY */

  check_crash_handling();
  check_cpu_governor();

  setup_post();
  setup_shm();
  init_count_class16();

  setup_dirs_fds();
  read_testcases();
  load_auto();

  pivot_inputs();

  if (extras_dir) load_extras(extras_dir);

  if (!timeout_given) find_timeout();

  detect_file_args(argv + optind + 1);

  if (!out_file) setup_stdio_file();

  check_binary(argv[optind]);

  start_time = get_cur_time();

  if (qemu_mode)
    use_argv = get_qemu_argv(argv[0], argv + optind, argc - optind);
  else
    use_argv = argv + optind;

在这一部分将会检查前面我们设置的参数是否存在某些冲突或者错误，例如这里的输入输出文件夹必须要要指定。

之后调用setup_shm 函数进行了共享内存的设置，这里是通过shmget 函数进行共享内存的分配的。并初始化了virgin_bits,virgin_tmout,virgin_crash,trace_bits等变量。

测试用例读取

在初始化过程中还会存在一个比较重要的函数就是read_testcases ，函数的功能就是从输入文件夹中读取所有的测试用例，并将其加入到队列中。函数的关键部分的代码如下

nl_cnt = scandir(in_dir, &nl, NULL, alphasort);
if (shuffle_queue && nl_cnt > 1) { // shuffle_queue通过环境变量进行定义的

  ACTF("Shuffling queue...");
  shuffle_ptrs((void**)nl, nl_cnt); // 随机打乱顺序

}

for (i = 0; i < nl_cnt; i++) {

    struct stat st;

    u8* fn = alloc_printf("%s/%s", in_dir, nl[i]->d_name);
    u8* dfn = alloc_printf("%s/.state/deterministic_done/%s", in_dir, nl[i]->d_name);

    u8  passed_det = 0;

    free(nl[i]); /* not tracked */
 
    if (lstat(fn, &st) || access(fn, R_OK))
      PFATAL("Unable to access '%s'", fn);

    /* This also takes care of . and .. */

    if (!S_ISREG(st.st_mode) || !st.st_size || strstr(fn, "/README.txt")) {

      ck_free(fn);
      ck_free(dfn);
      continue;

    }

    if (st.st_size > MAX_FILE) 
      FATAL("Test case '%s' is too big (%s, limit is %s)", fn,
            DMS(st.st_size), DMS(MAX_FILE));

    /* Check for metadata that indicates that deterministic fuzzing
       is complete for this entry. We don't want to repeat deterministic
       fuzzing when resuming aborted scans, because it would be pointless
       and probably very time-consuming. */

    if (!access(dfn, F_OK)) passed_det = 1;
    ck_free(dfn);

    add_to_queue(fn, st.st_size, passed_det);

  }

这里的scandir 函数调用的作用是扫描in_dir的目录结构，经过第三个参数也就是select函数的挑选之后，按照第四个参数给定的排序规则存储到第二个参数指向的内存空间中。这里我们的排序参数是alphasort 其就是按照字母顺序进行一个排序。

接着就是判断是否要随机打乱顺序。之后对获取得到的文件路径依次调用access函数看是否可以访问，以及是否可以访问对应的in_dir/.state/deterministic_done/file_name 文件，如果这个文件存在那么这里表示的是确定性变异已经完成了，此时这个文件送入队列的时候就会标记passed_det 表示跳过确定性变异的阶段。

读取完毕测试用例之后即会调用load_auto 函数在in_dir/.state/auto_extras/auto_%06u 所表示的文件，对其调用maybe_add_auto 函数按照规则加入到字典中。

pivot_inputs 函数用于在输出文件夹中创建测试用例的硬链接。这里的命名规则如下

// 如果SIMPLE_FILE定义了
"in_dir/queue/id:%06u" % id
// 如果SIMPLE_FILE没有定义
"in_dir/queue/id:%06u,orig:%s" % id, use_name

如果测试用例标记了跳过确定性变异，那么就生成对应的out_dir/queue/.state/deterministic_done/file_name 文件。最后调用nuke_resume_dir 函数删除out_dir/_resume/.state/ 下面的所有的临时文件夹。

后续还包含设置启动时间，检查目标程序的信息等步骤。

Dry Run

在一切都准备好之后，接下来就开始跑第一轮的fuzz，代码如下

main函数DryRun部分

perform_dry_run(use_argv);

cull_queue();

show_init_stats();

seek_to = find_start_position();

write_stats_file(0, 0, 0);
save_auto();

if (stop_soon) goto stop_fuzzing;

/* Woop woop woop */

if (!not_on_tty) {
  sleep(4);
  start_time += 4000;
  if (stop_soon) goto stop_fuzzing;
}

这里首先调用的是perform_dry_run 函数，该函数就会依次读取queue中的内容，对其中的每一个测试文件调用calibrate_case 函数进行校准。此时目标程序至少执行一次测试用例。然后根据返回结果来判断发生错误的类型。

接着调用的是cull_queue 函数，该函数的作用是精简队列，也就是执行优胜者策略，选出最受欢迎的优胜者测试用例。

接着调用show_init_stats 函数输出这一轮的Fuzz相关的信息。输出完毕之后调用find_start_position 函数从out_dir/fuzzer_stats 中cur_path 找到当前的测试用例，并从这个测试用例开始。接着调用write_stats_file 函数更新out_dir/fuzzer_stats 文件，这里我们可以看到out_dir/fuzzer_stats 文件中存储的信息

out_dir/fuzzer_stats 文件内容

fprintf(f, "start_time        : %llu\n"
           "last_update       : %llu\n"
           "fuzzer_pid        : %u\n"
           "cycles_done       : %llu\n"
           "execs_done        : %llu\n"
           "execs_per_sec     : %0.02f\n"
           "paths_total       : %u\n"
           "paths_favored     : %u\n"
           "paths_found       : %u\n"
           "paths_imported    : %u\n"
           "max_depth         : %u\n"
           "cur_path          : %u\n" /* Must match find_start_position() */
           "pending_favs      : %u\n"
           "pending_total     : %u\n"
           "variable_paths    : %u\n"
           "stability         : %0.02f%%\n"
           "bitmap_cvg        : %0.02f%%\n"
           "unique_crashes    : %llu\n"
           "unique_hangs      : %llu\n"
           "last_path         : %llu\n"
           "last_crash        : %llu\n"
           "last_hang         : %llu\n"
           "execs_since_crash : %llu\n"
           "exec_timeout      : %u\n"
           "afl_banner        : %s\n"
           "afl_version       : " VERSION "\n"
           "target_mode       : %s%s%s%s%s%s%s\n"
           "command_line      : %s\n",
           start_time / 1000, get_cur_time() / 1000, getpid(),
           queue_cycle ? (queue_cycle - 1) : 0, total_execs, eps,
           queued_paths, queued_favored, queued_discovered, queued_imported,
           max_depth, current_entry, pending_favored, pending_not_fuzzed,
           queued_variable, stability, bitmap_cvg, unique_crashes,
           unique_hangs, last_path_time / 1000, last_crash_time / 1000,
           last_hang_time / 1000, total_execs - last_crash_execs,
           exec_tmout, use_banner,
           qemu_mode ? "qemu " : "", dumb_mode ? " dumb " : "",
           no_forkserver ? "no_forksrv " : "", crash_mode ? "crash " : "",
           persistent_mode ? "persistent " : "", deferred_mode ? "deferred " : "",
           (qemu_mode || dumb_mode || no_forkserver || crash_mode ||
            persistent_mode || deferred_mode) ? "" : "default",
           orig_cmdline);
           /* ignore errors */

更新完毕文件内容之后即调用save_auto 函数保存这一轮产生的Token的信息。这些Token和词法分析中的Token概念相似，用来作为字典指导变异。

Main Loop

前面需要首先执行一次Fuzz的目的实际上就是去判断我们输入的测试用例的情况，如果存在某些错误的话，那么这里将会直接Stop。执行完毕首轮Fuzz之后接下来就进入到了Main Loop中，也就是Fuzz的主逻辑中。

main函数中Main Loop部分代码

while (1) {

    u8 skipped_fuzz;

    cull_queue();

    if (!queue_cur) {

      queue_cycle++;
      current_entry     = 0;
      cur_skipped_paths = 0;
      queue_cur         = queue;

      while (seek_to) {
        current_entry++;
        seek_to--;
        queue_cur = queue_cur->next;
      }

      show_stats();

      if (not_on_tty) {
        ACTF("Entering queue cycle %llu.", queue_cycle);
        fflush(stdout);
      }

      /* If we had a full queue cycle with no new finds, try
         recombination strategies next. */

      if (queued_paths == prev_queued) {

        if (use_splicing) cycles_wo_finds++; else use_splicing = 1;

      } else cycles_wo_finds = 0;

      prev_queued = queued_paths;

      if (sync_id && queue_cycle == 1 && getenv("AFL_IMPORT_FIRST"))
        sync_fuzzers(use_argv);

    }

    skipped_fuzz = fuzz_one(use_argv);

    if (!stop_soon && sync_id && !skipped_fuzz) {
      
      if (!(sync_interval_cnt++ % SYNC_INTERVAL))
        sync_fuzzers(use_argv);

    }

    if (!stop_soon && exit_1) stop_soon = 2;

    if (stop_soon) break;

    queue_cur = queue_cur->next;
    current_entry++;

  }

首先这里会调用cull_queue 函数，也就是在每次循环之前都会进行一下队列的精简。

接着存在一个对queue_cur 的判断，如果queue_cur 是空的话，那么就表示当前的队列已经遍历完毕了因此这里需要初始化到队列的首部进行重新遍历。在进行队列初始化的时候会判断当前队列和上一个队列是否完全相同，如果完全相同的话，那么就表示这一轮的Fuzz是没有效果的，这里就需要进行变异策略的重组。

接着调用fuzz_one 函数对目标程序执行一次Fuzz，Fuzz结束之后会首先去判断stop_soon 的值也就是是否结束Fuzz，如果不结束的话那么就移动queue_cur 队列指针指向下一个测试用例，继续执行循环。

这里有两种方式去设置stop_soon 的值，一种是程序错误，另一种是用户发送某些信号例如ctrl-c，通过handler对stop_soon 进行了设置。

Exit

这一部分就是退出AFL的过程了，这里就保存一些相关的文件例如bitmap等，然后关闭程序

main函数Exit部分代码

write_bitmap();
write_stats_file(0, 0, 0);
save_auto();

stop_fuzzing:

SAYF(CURSOR_SHOW cLRD "\n\n+++ Testing aborted %s +++\n" cRST,
     stop_soon == 2 ? "programmatically" : "by user");

/* Running for more than 30 minutes but still doing first cycle? */

if (queue_cycle == 1 && get_cur_time() - start_time > 30 * 60 * 1000) {

  SAYF("\n" cYEL "[!] " cRST
         "Stopped during the first cycle, results may be incomplete.\n"
         "    (For info on resuming, see %s/README.)\n", doc_path);

}

fclose(plot_file);
destroy_queue();
destroy_extras();
ck_free(target_path);
ck_free(sync_id);

alloc_report();

OKF("We're done here. Have a nice day!\n");

exit(0);

这里会先保存bitmap、以及当前的fuzz的状态以及Token的信息。接着释放一些全局变量的内存，之后调用exit进行退出。

DryRun与MainLoop代码中的一些关键函数

前面我们只是分析了一下main函数的整体的流程，对于一些细节的部分和一些关键的函数，我们这里再分析一下

calibrate_case 函数

前面我们说到在进行第一次Fuzz Run的时候也就是DryRun的时候首先会调用perform_dry_run 函数去对测试用例进行校准。这里我们需要看一下这个测试用例的校准的函数。

calibrate_case函数

/* Calibrate a new test case. This is done when processing the input directory
   to warn about flaky or otherwise problematic test cases early on; and when
   new paths are discovered to detect variable behavior and so on. */

static u8 calibrate_case(char** argv, struct queue_entry* q, u8* use_mem,
                         u32 handicap, u8 from_queue) {

  static u8 first_trace[MAP_SIZE];

  u8  fault = 0, new_bits = 0, var_detected = 0,
      first_run = (q->exec_cksum == 0);

  u64 start_us, stop_us;

  s32 old_sc = stage_cur, old_sm = stage_max;
  u32 use_tmout = exec_tmout;
  u8* old_sn = stage_name;

  /* Be a bit more generous about timeouts when resuming sessions, or when
     trying to calibrate already-added finds. This helps avoid trouble due
     to intermittent latency. */

  if (!from_queue || resuming_fuzz)
    use_tmout = MAX(exec_tmout + CAL_TMOUT_ADD,
                    exec_tmout * CAL_TMOUT_PERC / 100);

  q->cal_failed++;

  stage_name = "calibration";
  stage_max  = fast_cal ? 3 : CAL_CYCLES;

  /* Make sure the forkserver is up before we do anything, and let's not
     count its spin-up time toward binary calibration. */

  if (dumb_mode != 1 && !no_forkserver && !forksrv_pid)
    init_forkserver(argv);

  if (q->exec_cksum) memcpy(first_trace, trace_bits, MAP_SIZE);

  start_us = get_cur_time_us();

  for (stage_cur = 0; stage_cur < stage_max; stage_cur++) {

    u32 cksum;

    if (!first_run && !(stage_cur % stats_update_freq)) show_stats();

    write_to_testcase(use_mem, q->len);

    fault = run_target(argv, use_tmout);

    /* stop_soon is set by the handler for Ctrl+C. When it's pressed,
       we want to bail out quickly. */

    if (stop_soon || fault != crash_mode) goto abort_calibration;

    if (!dumb_mode && !stage_cur && !count_bytes(trace_bits)) {
      fault = FAULT_NOINST;
      goto abort_calibration;
    }

    cksum = hash32(trace_bits, MAP_SIZE, HASH_CONST);

    if (q->exec_cksum != cksum) {

      u8 hnb = has_new_bits(virgin_bits);
      if (hnb > new_bits) new_bits = hnb;

      if (q->exec_cksum) {

        u32 i;

        for (i = 0; i < MAP_SIZE; i++) {

          if (!var_bytes[i] && first_trace[i] != trace_bits[i]) {

            var_bytes[i] = 1;
            stage_max    = CAL_CYCLES_LONG;

          }

        }

        var_detected = 1;

      } else {

        q->exec_cksum = cksum;
        memcpy(first_trace, trace_bits, MAP_SIZE);

      }

    }

  }

  stop_us = get_cur_time_us();

  total_cal_us     += stop_us - start_us;
  total_cal_cycles += stage_max;

  /* OK, let's collect some stats about the performance of this test case.
     This is used for fuzzing air time calculations in calculate_score(). */

  q->exec_us     = (stop_us - start_us) / stage_max;
  q->bitmap_size = count_bytes(trace_bits);
  q->handicap    = handicap;
  q->cal_failed  = 0;

  total_bitmap_size += q->bitmap_size;
  total_bitmap_entries++;

  update_bitmap_score(q);

  /* If this case didn't result in new output from the instrumentation, tell
     parent. This is a non-critical problem, but something to warn the user
     about. */

  if (!dumb_mode && first_run && !fault && !new_bits) fault = FAULT_NOBITS;

abort_calibration:

  if (new_bits == 2 && !q->has_new_cov) {
    q->has_new_cov = 1;
    queued_with_cov++;
  }

  /* Mark variable paths. */

  if (var_detected) {

    var_byte_count = count_bytes(var_bytes);

    if (!q->var_behavior) {
      mark_as_variable(q);
      queued_variable++;
    }

  }

  stage_name = old_sn;
  stage_cur  = old_sc;
  stage_max  = old_sm;

  if (!first_run) show_stats();

  return fault;

}

这里会首先调用init_forkserver 函数来确保forkserver已经开启，至于这里的forkserver 是什么这里还没有分析。

如果这里不是第一次运行的话，那么就将trace_bits 拷贝到first_trace 的内存空间中。
之后会执行一个for循环，也就是对测试用例的校准执行的不只一次。最大执行的次数为stage_max

之后会调用write_to_testcase 函数将测试用例的内容写入到测试文件中，这里写入的内容use_mem 其实就是函数调用之前读取的队列中的测试用例的内容。这里写入的测试文件是通过out_file 这个变量进行控制的。

之后就会调用run_target 函数启动目标程序执行测试用例。这里启动的方式有两种

如果处于dump mode，或者forkserver没有开启的话，那么这里由此函数所在的线程fork启动目标程序执行测试用例。
如果forkserver开启，那么这里只需要给forkserver发送消息，由forkserver fork产生一个新的进程及启动目标程序。当前进程和forkserver之间采用pipe管道进行通信

执行Fuzz过程中的路径信息会保存在trace_bits 中。在执行结束之后会对trace_bits 的内存空间做一个hash32 ，也就是会去判断在本次执行过程中有没有产生新的路径，这里之前路径的hash信息保存在q->exec_cksum 中。这里如果在执行过程中产生了新的路径，那么这里会首先调用has_new_bits 函数，函数这里会去判断当前的路径信息和之前的路径信息virgin_bits 是否产生了不同，如果产生了变化的话就会更新virgin_bits 的信息。注意到这里产生新路径的判断方式即现在的bit位不是0，但是之前的bits位0xff（这里表示的是之前的bit位是0，也就是不存在这样的元组）那么这里就产生了新的路径

这里需要提前了解一下路径信息的存储方式，这里实际上在白皮书上讲到过，也就是对于A→B这样的路径来说，这里存储的是shared_mem[(A>>1)^B]++ ，也就是存在路径之后相应的bits位就肯定不是0了。

那么就会分为以下两种情况进行处理

如果q->exec_cksum 为空，也就是第一次运行这个测试用例，那么就更新q->exec_cksum 以及将trace_bits 拷贝到first_trace 中
如果不为空，那么确实产生了新的路径信息。那么这里就会将测试的论述调大即设置stage_max 的值为CAL_CYCLES_LONG

当当前的测试用例的stage_max 轮测试都跑完之后，即调用update_bitmap_score 函数来计算当前测试用例的分数。

如果产生了新的路径，那么标记这个测试用例为variable ，测试之后的结果如果new_bits 那么就表示有新路径产生。

cull_queue

该函数是样本选择的重要函数，也是AFL中优胜者策略的重要组成部分，我们看一下这个函数

/* The second part of the mechanism discussed above is a routine that
   goes over top_rated[] entries, and then sequentially grabs winners for
   previously-unseen bytes (temp_v) and marks them as favored, at least
   until the next run. The favored entries are given more air time during
   all fuzzing steps. */

static void cull_queue(void) {

  struct queue_entry* q;
  static u8 temp_v[MAP_SIZE >> 3];
  u32 i;

  if (dumb_mode || !score_changed) return;

  score_changed = 0;

  memset(temp_v, 255, MAP_SIZE >> 3);

  queued_favored  = 0;
  pending_favored = 0;

  q = queue;

  while (q) {
    q->favored = 0;
    q = q->next;
  }

  /* Let's see if anything in the bitmap isn't captured in temp_v.
     If yes, and if it has a top_rated[] contender, let's use it. */

  for (i = 0; i < MAP_SIZE; i++)
    if (top_rated[i] && (temp_v[i >> 3] & (1 << (i & 7)))) {

      u32 j = MAP_SIZE >> 3;

      /* Remove all bits belonging to the current entry from temp_v. */

      while (j--) 
        if (top_rated[i]->trace_mini[j])
          temp_v[j] &= ~top_rated[i]->trace_mini[j];

      top_rated[i]->favored = 1;
      queued_favored++;

      if (!top_rated[i]->was_fuzzed) pending_favored++;

    }

  q = queue;

  while (q) {
    mark_as_redundant(q, !q->favored);
    q = q->next;
  }

}

这里首先初始化了队列，将每个测试用例的favored 成员变量置为0。接着遍历整个路径元组，对于每一个路径，这里AFL认为第一次遍历得到的优胜者测试用例是更加受欢迎的，也就是favored 。在update_bitmap_score中我们可以知道这里trace_mini 数组中是用1bit包含了路径有没有被命中的信息。那么这里temp_v 表示的就是当前测试用例没有被命中的路径数组的信息。也就是说这里永远会选择第一个遍历得到的优胜者测试用例。将其favored 成员变量设置为1。对于受欢迎的优胜者测试用例，如果没有被Fuzz过，那么这里将会通过pending_favored计算此类的测试用例的数量，以便在之后优先Fuzz。

/* Mark / unmark as redundant (edge-only). This is not used for restoring state,
   but may be useful for post-processing datasets. */

static void mark_as_redundant(struct queue_entry* q, u8 state) {

  u8* fn;
  s32 fd;

  if (state == q->fs_redundant) return;

  q->fs_redundant = state;

  fn = strrchr(q->fname, '/');
  fn = alloc_printf("%s/queue/.state/redundant_edges/%s", out_dir, fn + 1);

  if (state) {

    fd = open(fn, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0) PFATAL("Unable to create '%s'", fn);
    close(fd);

  } else {

    if (unlink(fn)) PFATAL("Unable to remove '%s'", fn);

  }

  ck_free(fn);

}

从上面的分析中我们实际上可以看到并不是所有的优胜者测试用例都是受欢迎的，对于不太受欢迎的优胜者测试用例，那么这里会调用mark_as_redundant 函数将其放入到queue/.state/redundant_edges 文件夹下面，同理，如果被重新欢迎了，那么就将其从这个路径中删除。

fuzz_one

前面我们说到在第一次Fuzz也就是Dry Run执行完毕之后，即进入到了Main Loop中，每一次循环都会首先执行cull_queue 进行队列的精简，然后调用fuzz_one 函数开始本次的Fuzz。我们来看一下fuzz_one函数，这个函数非常的长，大概有2k行代码。这里如果Fuzz执行成功之后即返回0，如果跳过当前的测试用例或者Fuzz执行失败那么这里返回1。

按照策略跳过某些测试用例

这里首先会按照一定的概率跳过一些测试用例，这里代码如下

if (pending_favored) {

  /* If we have any favored, non-fuzzed new arrivals in the queue,
     possibly skip to them at the expense of already-fuzzed or non-favored
     cases. */

  if ((queue_cur->was_fuzzed || !queue_cur->favored) &&
      UR(100) < SKIP_TO_NEW_PROB) return 1; // 这里UR的宏定义实际上就是产生一个随机数

} else if (!dumb_mode && !queue_cur->favored && queued_paths > 10) {

  /* Otherwise, still possibly skip non-favored cases, albeit less often.
     The odds of skipping stuff are higher for already-fuzzed inputs and
     lower for never-fuzzed entries. */

  if (queue_cycle > 1 && !queue_cur->was_fuzzed) {

    if (UR(100) < SKIP_NFAV_NEW_PROB) return 1; // 75%

  } else {

    if (UR(100) < SKIP_NFAV_OLD_PROB) return 1; // 95%

  }

}

这里如果pending_favored 的值大于0，也就是队列中存在受欢迎的优胜者测试用例还没有被Fuzz过。那么这里如果当前的测试用例已经被fuzz过了或者当前的测试用例不是受欢迎的优胜者，那么这里将会存在99%的概率跳过这个测试用例。

如果队列中所有的受欢迎的优胜者测试用例都被Fuzz过了，那么这里对于队列中的不感兴趣的测试用例，如果当前的测试用例没有被fuzz过，那么将会存在75%的概率跳过这个测试用例；否则以95%的概率跳过这个测试用例。

剪枝

接着来要干的事情就是将测试用例读取到内存中。这里实际上是分配了两个内存空间，也就是in_buf/origin_in以及out_buf，其中in_buf/origin_in则是将测试用例以文件映射的方式mmap到了一个地址空间中。而out_buf则是按照文件的大小申请了相应大小的内存。

接下来对之前我们调用calibrate_case 函数对测试用例的校准结果进行了判断，如果校准的失败次数小于CAL_CHANCES 也就是3次，那么这里将会重新调用calibrate_case 进行校准。

接下来即进行剪枝的操作

/************
 * TRIMMING *
 ************/

if (!dumb_mode && !queue_cur->trim_done) { // dumb_mode表示不再进行插桩

  u8 res = trim_case(argv, queue_cur, in_buf);

  if (res == FAULT_ERROR)
    FATAL("Unable to execute target application");

  if (stop_soon) {
    cur_skipped_paths++;
    goto abandon_entry;
  }

  /* Don't retry trimming, even if it failed. */

  queue_cur->trim_done = 1;

  if (len != queue_cur->len) len = queue_cur->len;

}

memcpy(out_buf, in_buf, len);

也就是这里如果当前的测试用例没有被剪枝过的话，那么这里将会调用trim_case 函数进行剪枝的操作。注意到的是这里必须要在插桩模式下才会执行剪枝的操作。

剪枝结束之后，这里将当前的测试用例写会到磁盘中，然后更新其对应的trace_bits 和top_rated数组。剪枝结束之后更新out_buf的内容为新的in_buf的内容。

打分

剪枝结束之后对当前的测试用例调用calculate_score 函数进行打分。这一部分的代码如下

/*********************
 * PERFORMANCE SCORE *
 *********************/

orig_perf = perf_score = calculate_score(queue_cur);

/* Skip right away if -d is given, if we have done deterministic fuzzing on
   this entry ourselves (was_fuzzed), or if it has gone through deterministic
   testing in earlier, resumed runs (passed_det). */

if (skip_deterministic || queue_cur->was_fuzzed || queue_cur->passed_det)
  goto havoc_stage;

/* Skip deterministic fuzzing if exec path checksum puts this out of scope
   for this master instance. */

if (master_max && (queue_cur->exec_cksum % master_max) != master_id - 1)
  goto havoc_stage;

doing_det = 1;

这里如果在启动参数中设置了跳过确定性变异，或者当前的测试用例已经被fuzz过了或者说测试用例被标记了要跳过当前测试用例的确定性变异的过程的话，那么这里就会跳过确定性变异的过程进入到随机变异的过程中。这里我们看一下calculate_score 函数

calculate_score函数内容

/* Calculate case desirability score to adjust the length of havoc fuzzing.
   A helper function for fuzz_one(). Maybe some of these constants should
   go into config.h. */

static u32 calculate_score(struct queue_entry* q) {

  u32 avg_exec_us = total_cal_us / total_cal_cycles;
  u32 avg_bitmap_size = total_bitmap_size / total_bitmap_entries;
  u32 perf_score = 100;

  /* Adjust score based on execution speed of this path, compared to the
     global average. Multiplier ranges from 0.1x to 3x. Fast inputs are
     less expensive to fuzz, so we're giving them more air time. */

  if (q->exec_us * 0.1 > avg_exec_us) perf_score = 10;
  else if (q->exec_us * 0.25 > avg_exec_us) perf_score = 25;
  else if (q->exec_us * 0.5 > avg_exec_us) perf_score = 50;
  else if (q->exec_us * 0.75 > avg_exec_us) perf_score = 75;
  else if (q->exec_us * 4 < avg_exec_us) perf_score = 300;
  else if (q->exec_us * 3 < avg_exec_us) perf_score = 200;
  else if (q->exec_us * 2 < avg_exec_us) perf_score = 150;

  /* Adjust score based on bitmap size. The working theory is that better
     coverage translates to better targets. Multiplier from 0.25x to 3x. */

  if (q->bitmap_size * 0.3 > avg_bitmap_size) perf_score *= 3;
  else if (q->bitmap_size * 0.5 > avg_bitmap_size) perf_score *= 2;
  else if (q->bitmap_size * 0.75 > avg_bitmap_size) perf_score *= 1.5;
  else if (q->bitmap_size * 3 < avg_bitmap_size) perf_score *= 0.25;
  else if (q->bitmap_size * 2 < avg_bitmap_size) perf_score *= 0.5;
  else if (q->bitmap_size * 1.5 < avg_bitmap_size) perf_score *= 0.75;

  /* Adjust score based on handicap. Handicap is proportional to how late
     in the game we learned about this path. Latecomers are allowed to run
     for a bit longer until they catch up with the rest. */

  if (q->handicap >= 4) {

    perf_score *= 4;
    q->handicap -= 4;

  } else if (q->handicap) {

    perf_score *= 2;
    q->handicap--;

  }

  /* Final adjustment based on input depth, under the assumption that fuzzing
     deeper test cases is more likely to reveal stuff that can't be
     discovered with traditional fuzzers. */

  switch (q->depth) {

    case 0 ... 3:   break;
    case 4 ... 7:   perf_score *= 2; break;
    case 8 ... 13:  perf_score *= 3; break;
    case 14 ... 25: perf_score *= 4; break;
    default:        perf_score *= 5;

  }

  /* Make sure that we don't go over limit. */

  if (perf_score > HAVOC_MAX_MULT * 100) perf_score = HAVOC_MAX_MULT * 100;

  return perf_score;

}

从打分的规则来看（分数越高越好），我们可以看到执行速度越快、发现路径的数量越多、最近发现新路径的以及发现的路径越深的则分数更高。这里分数越高的测试用例在之后的随机变异也就是RANDOM HAVOC 阶段将会执行变异越多的轮次。也就是分数越高的随机变异的机会就越多。

变异

这里之所以代码这么长，大部分的代码都是集中在这个变异策略上。这里一共存在六种变异的策略。这里不同变异策略的详细实现还没有进行分析

SIMPLE BITFLIP
ARITHMETIC INC/DEC
INTERESTING VALUES
DICTIONARY STUFF
RANDOM HAVOC
SPLICING

这里在执行完毕每一次的变异策略之后都会调用common_fuzz_stuff 函数来将变异之后的样本投递到目标程序当中。并对结果进行一个评估。

trim_case

这里只是一个简单的方法进行剪枝的操作。

trim_case源码

/* Trim all new test cases to save cycles when doing deterministic checks. The
   trimmer uses power-of-two increments somewhere between 1/16 and 1/1024 of
   file size, to keep the stage short and sweet. */

static u8 trim_case(char** argv, struct queue_entry* q, u8* in_buf) {

  static u8 tmp[64];
  static u8 clean_trace[MAP_SIZE];

  u8  needs_write = 0, fault = 0;
  u32 trim_exec = 0;
  u32 remove_len;
  u32 len_p2;

  /* Although the trimmer will be less useful when variable behavior is
     detected, it will still work to some extent, so we don't check for
     this. */

  if (q->len < 5) return 0;

  stage_name = tmp;
  bytes_trim_in += q->len;

  /* Select initial chunk len, starting with large steps. */

  len_p2 = next_p2(q->len); // 获取最接近的2^n 的n值

  remove_len = MAX(len_p2 / TRIM_START_STEPS, TRIM_MIN_BYTES); // 16  4 也就是这里最大值是4字节或者文件大小的1/4

  /* Continue until the number of steps gets too high or the stepover
     gets too small. */

  while (remove_len >= MAX(len_p2 / TRIM_END_STEPS, TRIM_MIN_BYTES)) {

    u32 remove_pos = remove_len;

    sprintf(tmp, "trim %s/%s", DI(remove_len), DI(remove_len));

    stage_cur = 0;
    stage_max = q->len / remove_len;

    while (remove_pos < q->len) {

      u32 trim_avail = MIN(remove_len, q->len - remove_pos);
      u32 cksum;

      write_with_gap(in_buf, q->len, remove_pos, trim_avail); // 将removs_pos开启长度为trim_avail的内容跳过，其余部分写入到out_fd中

      fault = run_target(argv, exec_tmout);
      trim_execs++;

      if (stop_soon || fault == FAULT_ERROR) goto abort_trimming;

      /* Note that we don't keep track of crashes or hangs here; maybe TODO? */

      cksum = hash32(trace_bits, MAP_SIZE, HASH_CONST);

      /* If the deletion had no impact on the trace, make it permanent. This
         isn't perfect for variable-path inputs, but we're just making a
         best-effort pass, so it's not a big deal if we end up with false
         negatives every now and then. */

      if (cksum == q->exec_cksum) { // 如果对路径没有影响，那么这里即删除这部分内容

        u32 move_tail = q->len - remove_pos - trim_avail;

        q->len -= trim_avail;
        len_p2  = next_p2(q->len);

        memmove(in_buf + remove_pos, in_buf + remove_pos + trim_avail, 
                move_tail);

        /* Let's save a clean trace, which will be needed by
           update_bitmap_score once we're done with the trimming stuff. */

        if (!needs_write) {

          needs_write = 1;
          memcpy(clean_trace, trace_bits, MAP_SIZE);

        }

      } else remove_pos += remove_len;

      /* Since this can be slow, update the screen every now and then. */

      if (!(trim_exec++ % stats_update_freq)) show_stats();
      stage_cur++;

    }

    remove_len >>= 1;

  }

  /* If we have made changes to in_buf, we also need to update the on-disk
     version of the test case. */

  if (needs_write) {

    s32 fd;

    unlink(q->fname); /* ignore errors */

    fd = open(q->fname, O_WRONLY | O_CREAT | O_EXCL, 0600);

    if (fd < 0) PFATAL("Unable to create '%s'", q->fname);

    ck_write(fd, in_buf, q->len, q->fname);
    close(fd);

    memcpy(trace_bits, clean_trace, MAP_SIZE);
    update_bitmap_score(q);

  }

abort_trimming:

  bytes_trim_out += q->len;
  return fault;

}

这里只对大文件进行剪枝操作，小于5字节大小的文件不会执行剪枝操作。这里首先是设置了一个大步即一开始的remove_len 按照这个长度划分文件为一个个的块，依次尝试删除这个块，如果删除之后对文件执行之后的路径没有任何影响的话，那么就将这种删除进行保留。所有的块遍历结束之后这里将remove_len>>1

如果文件内容发生了改变，那么这里就将更改之后的文件写入到磁盘中，并且这里会更新起相应的trace_bits 及其对应的分数。

update_bitmap_score

该函数的主要作用就是找出一个测试用例，该测试用例对于某组路径触发的消耗最小。

/* When we bump into a new path, we call this to see if the path appears
   more "favorable" than any of the existing ones. The purpose of the
   "favorables" is to have a minimal set of paths that trigger all the bits
   seen in the bitmap so far, and focus on fuzzing them at the expense of
   the rest.

   The first step of the process is to maintain a list of top_rated[] entries
   for every byte in the bitmap. We win that slot if there is no previous
   contender, or if the contender has a more favorable speed x size factor. */

static void update_bitmap_score(struct queue_entry* q) {

  u32 i;
  u64 fav_factor = q->exec_us * q->len;

  /* For every byte set in trace_bits[], see if there is a previous winner,
     and how it compares to us. */

  for (i = 0; i < MAP_SIZE; i++)

    if (trace_bits[i]) {

       if (top_rated[i]) {

         /* Faster-executing or smaller test cases are favored. */

         if (fav_factor > top_rated[i]->exec_us * top_rated[i]->len) continue;

         /* Looks like we're going to win. Decrease ref count for the
            previous winner, discard its trace_bits[] if necessary. */

         if (!--top_rated[i]->tc_ref) {
           ck_free(top_rated[i]->trace_mini);
           top_rated[i]->trace_mini = 0;
         }

       }

       /* Insert ourselves as the new winner. */

       top_rated[i] = q;
       q->tc_ref++;

       if (!q->trace_mini) {
         q->trace_mini = ck_alloc(MAP_SIZE >> 3);
         minimize_bits(q->trace_mini, trace_bits);
       }

       score_changed = 1;

     }

}

这里的做法就是对于给定的测试用例，依次按照bytes遍历其trace_bits ，如果之前对于当前路径有一个最优的测试用例，那么这里就会进行性能比较，涉及到两个元素也就是执行时间以及文件大小，如果当前的测试用例更优的话，那么就更新top_rated 元组中的数据，表示对于这组路径来说当前的测试用例的开销是最小的。

这里实际上反映的是AFL的优胜者策略，也就是执行时间越短、文件大小越小的测试用例对于当前的元组来说就是一个优胜者，这里tc_ref成员变量表示的是当前的测试用例被认为是优胜者的次数。如果是优胜者那么就会将其trace_bits进行压缩，只保留命中信息而去掉命中次数的信息，也就是trace_mini 。

common_fuzz_stuff

该函数在每一次变异结束之后都会被调用用来执行变异之后的测试用例。我们看一下这个函数。

/* Write a modified test case, run program, process results. Handle
   error conditions, returning 1 if it's time to bail out. This is
   a helper function for fuzz_one(). */

EXP_ST u8 common_fuzz_stuff(char** argv, u8* out_buf, u32 len) {

  u8 fault;

  if (post_handler) {

    out_buf = post_handler(out_buf, &len);
    if (!out_buf || !len) return 0;

  }

  write_to_testcase(out_buf, len);

  fault = run_target(argv, exec_tmout);

  if (stop_soon) return 1;

  if (fault == FAULT_TMOUT) {

    if (subseq_tmouts++ > TMOUT_LIMIT) {
      cur_skipped_paths++;
      return 1;
    }

  } else subseq_tmouts = 0;

  /* Users can hit us with SIGUSR1 to request the current input
     to be abandoned. */

  if (skip_requested) {

     skip_requested = 0;
     cur_skipped_paths++;
     return 1;

  }

  /* This handles FAULT_ERROR for us: */

  queued_discovered += save_if_interesting(argv, out_buf, len, fault);

  if (!(stage_cur % stats_update_freq) || stage_cur + 1 == stage_max)
    show_stats();

  return 0;

}

这里的这个post_handler 函数不知道是做什么的。这里首先会调用write_to_testcase 函数来将测试用例写入到磁盘文件中。接着调用run_target 函数来启动目标程序执行这个测试用例。

测试用例执行完毕中之后，这里会调用save_if_interesting 函数来决定是否保存当前的这个变异之后的测试用例。

save_if_interesting

该函数就是用来决定是否保存输入的测试用例的，在变异之后的样本测试中会触发调用。我们来看一下这个函数。

save_if_interesting 函数源码

/* Check if the result of an execve() during routine fuzzing is interesting,
   save or queue the input test case for further analysis if so. Returns 1 if
   entry is saved, 0 otherwise. */

static u8 save_if_interesting(char** argv, void* mem, u32 len, u8 fault) {

  u8  *fn = "";
  u8  hnb;
  s32 fd;
  u8  keeping = 0, res;

  if (fault == crash_mode) {

    /* Keep only if there are new bits in the map, add to queue for
       future fuzzing, etc. */

    if (!(hnb = has_new_bits(virgin_bits))) {
      if (crash_mode) total_crashes++;
      return 0;
    }    

#ifndef SIMPLE_FILES

    fn = alloc_printf("%s/queue/id:%06u,%s", out_dir, queued_paths,
                      describe_op(hnb));

#else

    fn = alloc_printf("%s/queue/id_%06u", out_dir, queued_paths);

#endif /* ^!SIMPLE_FILES */

    add_to_queue(fn, len, 0);

    if (hnb == 2) {
      queue_top->has_new_cov = 1;
      queued_with_cov++;
    }

    queue_top->exec_cksum = hash32(trace_bits, MAP_SIZE, HASH_CONST);

    /* Try to calibrate inline; this also calls update_bitmap_score() when
       successful. */

    res = calibrate_case(argv, queue_top, mem, queue_cycle - 1, 0);

    if (res == FAULT_ERROR)
      FATAL("Unable to execute target application");

    fd = open(fn, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0) PFATAL("Unable to create '%s'", fn);
    ck_write(fd, mem, len, fn);
    close(fd);

    keeping = 1;

  }

  switch (fault) {

    case FAULT_TMOUT:

      /* Timeouts are not very interesting, but we're still obliged to keep
         a handful of samples. We use the presence of new bits in the
         hang-specific bitmap as a signal of uniqueness. In "dumb" mode, we
         just keep everything. */

      total_tmouts++;

      if (unique_hangs >= KEEP_UNIQUE_HANG) return keeping;

      if (!dumb_mode) {

#ifdef __x86_64__
        simplify_trace((u64*)trace_bits);
#else
        simplify_trace((u32*)trace_bits);
#endif /* ^__x86_64__ */

        if (!has_new_bits(virgin_tmout)) return keeping;

      }

      unique_tmouts++;

      /* Before saving, we make sure that it's a genuine hang by re-running
         the target with a more generous timeout (unless the default timeout
         is already generous). */

      if (exec_tmout < hang_tmout) {

        u8 new_fault;
        write_to_testcase(mem, len);
        new_fault = run_target(argv, hang_tmout);

        /* A corner case that one user reported bumping into: increasing the
           timeout actually uncovers a crash. Make sure we don't discard it if
           so. */

        if (!stop_soon && new_fault == FAULT_CRASH) goto keep_as_crash;

        if (stop_soon || new_fault != FAULT_TMOUT) return keeping;

      }

#ifndef SIMPLE_FILES

      fn = alloc_printf("%s/hangs/id:%06llu,%s", out_dir,
                        unique_hangs, describe_op(0));

#else

      fn = alloc_printf("%s/hangs/id_%06llu", out_dir,
                        unique_hangs);

#endif /* ^!SIMPLE_FILES */

      unique_hangs++;

      last_hang_time = get_cur_time();

      break;

    case FAULT_CRASH:

keep_as_crash:

      /* This is handled in a manner roughly similar to timeouts,
         except for slightly different limits and no need to re-run test
         cases. */

      total_crashes++;

      if (unique_crashes >= KEEP_UNIQUE_CRASH) return keeping;

      if (!dumb_mode) {

#ifdef __x86_64__
        simplify_trace((u64*)trace_bits);
#else
        simplify_trace((u32*)trace_bits);
#endif /* ^__x86_64__ */

        if (!has_new_bits(virgin_crash)) return keeping;

      }

      if (!unique_crashes) write_crash_readme();

#ifndef SIMPLE_FILES

      fn = alloc_printf("%s/crashes/id:%06llu,sig:%02u,%s", out_dir,
                        unique_crashes, kill_signal, describe_op(0));

#else

      fn = alloc_printf("%s/crashes/id_%06llu_%02u", out_dir, unique_crashes,
                        kill_signal);

#endif /* ^!SIMPLE_FILES */

      unique_crashes++;

      last_crash_time = get_cur_time();
      last_crash_execs = total_execs;

      break;

    case FAULT_ERROR: FATAL("Unable to execute target application");

    default: return keeping;

  }

  /* If we're here, we apparently want to save the crash or hang
     test case, too. */

  fd = open(fn, O_WRONLY | O_CREAT | O_EXCL, 0600);
  if (fd < 0) PFATAL("Unable to create '%s'", fn);
  ck_write(fd, mem, len, fn);
  close(fd);

  ck_free(fn);

  return keeping;

}

这里是根据在调用目标程序执行变异之后的测试用例时所返回的结果分类进行分析的。

返回的结果是crash_mode 。这里在run_target 中没有找到相应的返回结果的情况。这里我们找到crash_mode的赋值的位置只有当我们设置-C参数的时候，crash_mode会被赋值为FAULT_CRASH 。这里针对crash_mode的解释可以看参数处理部分。这里如果返回的结果是crash_mode的话，首先会调用has_new_bits 函数来判断是否产生了新路径。这里再次复习一下产生新路径的判断方式就是trace_bits中是否有新的Bytes不为0。如果产生了新的路径，此时就会将这个测试用例加入到队列中，调用calibrate_case 函数来校准测试用例，最后将变异之后的样本写入到磁盘文件中，注意到这里写入到的磁盘文件是在queue也就是队列的目录下，会参与接下来的Fuzz。注意到这里的命名也是特殊设计的，其保留了当前测试用例得到的方式。
返回的结果是FAULT_TMOUT，也就是超时，这里如果是dump mode那么就会保存下来所有的测试用例。实际上这里的超时并不是那么受欢迎。但是这里还是去判断是否产生了新路径。没有产生新路径的话，那么这里就直接返回了。如果产生了新的路径，这里会再次执行一遍目标程序来保证是真正的超时，如果再次时候的时候返回的结果是CRASH，那么这里进入到crash的逻辑中进行处理。否则这里接下来就会将该测试用例写入到hang目录下。
返回的结果是FAULT_CRASH，也就是此时出现了Crash，如果不是dumb_mode 的话，那么这里还是会判断是否产生了新的路径，如果没有产生新的路径的话，这里直接返回了。否则这里将会将当前的测试用例保存到crashes目录下

has_new_bits

这实际上是一个很重要的函数，用来判断在运行过程中是否产生了新的路径，是基于路径覆盖的指导变异的基础。关键部分的代码如下

if (unlikely(*current) && unlikely(*current & *virgin)) {

      if (likely(ret < 2)) {

        u8* cur = (u8*)current;
        u8* vir = (u8*)virgin;

        /* Looks like we have not found any new bytes yet; see if any non-zero
           bytes in current[] are pristine in virgin[]. */

#ifdef __x86_64__

        if ((cur[0] && vir[0] == 0xff) || (cur[1] && vir[1] == 0xff) ||
            (cur[2] && vir[2] == 0xff) || (cur[3] && vir[3] == 0xff) ||
            (cur[4] && vir[4] == 0xff) || (cur[5] && vir[5] == 0xff) ||
            (cur[6] && vir[6] == 0xff) || (cur[7] && vir[7] == 0xff)) ret = 2;
        else ret = 1;

#else

        if ((cur[0] && vir[0] == 0xff) || (cur[1] && vir[1] == 0xff) ||
            (cur[2] && vir[2] == 0xff) || (cur[3] && vir[3] == 0xff)) ret = 2;
        else ret = 1;

#endif /* ^__x86_64__ */

      }

      *virgin &= ~*current;

    }

这里我们看到virgin 表示的是运行之前的bitsmap，而current表示的则是运行之后的bitsmap。这里==操作符的优先级要高于&& 。也就是说ret=2的时候表示的是产生了新的路径。而ret=1则表示的是命令某一条路径的次数发生了变化。这里只要产生了新的路径就会立即返回，也就是说命中某一条路径的次数的变化的优先级要低于产生新的路径。

这里需要注意的是命中次数并不是我们平常理解的命中次数，而是存在一个桶的概念，只有命中次数从一个桶转换到另一个桶的时候才会认为是命中次数的变化。例如从[32 ... 127] 转换到[128 ... 255] 。

变异策略

通过前面的分析这里我们知道一共存在有六种变异策略。

SIMPLE BITFLIP
ARITHMETIC INC/DEC
INTERESTING VALUES
DICTIONARY STUFF
RANDOM HAVOC
SPLICING

这里我们依次进行一下分析

SIMPLE BITFLIP

从名称这里也可以看出来是一个简单的bit翻转，这里对bit翻转操作的定义如下

#define FLIP_BIT(_ar, _b) do { \
    u8* _arf = (u8*)(_ar); \
    u32 _bf = (_b); \
    _arf[(_bf) >> 3] ^= (128 >> ((_bf) & 7)); \
  } while (0)

这里传入的第一个参数是out_buf也就是要变异的测试用例文件内容，而第二个参数传入的是stage_cur 也就是当前变异的轮数，该值从0增涨到特定的长度。我们看到这里实际上每一次变异都是以8bit也就是以一个bit为单位的。

实际上这里在执行SIMPLE BITFLIP变异的时候这里并不只是执行一次FLIP_BIT函数，而是分为好几轮。

bitflip 1/1

bitflip 1/1这一部分的代码如下

/* Single walking bit. */

stage_short = "flip1";
stage_max   = len << 3;
stage_name  = "bitflip 1/1";

stage_val_type = STAGE_VAL_NONE;

orig_hit_cnt = queued_paths + unique_crashes;

prev_cksum = queue_cur->exec_cksum;

for (stage_cur = 0; stage_cur < stage_max; stage_cur++) {

  stage_cur_byte = stage_cur >> 3;

  FLIP_BIT(out_buf, stage_cur);

  if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;

  FLIP_BIT(out_buf, stage_cur);

  /* While flipping the least significant bit in every byte, pull of an extra
     trick to detect possible syntax tokens. In essence, the idea is that if
     you have a binary blob like this:

     xxxxxxxxIHDRxxxxxxxx

     ...and changing the leading and trailing bytes causes variable or no
     changes in program flow, but touching any character in the "IHDR" string
     always produces the same, distinctive path, it's highly likely that
     "IHDR" is an atomically-checked magic value of special significance to
     the fuzzed format.

     We do this here, rather than as a separate stage, because it's a nice
     way to keep the operation approximately "free" (i.e., no extra execs).
     
     Empirically, performing the check when flipping the least significant bit
     is advantageous, compared to doing it at the time of more disruptive
     changes, where the program flow may be affected in more violent ways.

     The caveat is that we won't generate dictionaries in the -d mode or -S
     mode - but that's probably a fair trade-off.

     This won't work particularly well with paths that exhibit variable
     behavior, but fails gracefully, so we'll carry out the checks anyway.

    */

  if (!dumb_mode && (stage_cur & 7) == 7) {

    u32 cksum = hash32(trace_bits, MAP_SIZE, HASH_CONST);

    if (stage_cur == stage_max - 1 && cksum == prev_cksum) {

      /* If at end of file and we are still collecting a string, grab the
         final character and force output. */

      if (a_len < MAX_AUTO_EXTRA) a_collect[a_len] = out_buf[stage_cur >> 3];
      a_len++;

      if (a_len >= MIN_AUTO_EXTRA && a_len <= MAX_AUTO_EXTRA)
        maybe_add_auto(a_collect, a_len);

    } else if (cksum != prev_cksum) {

      /* Otherwise, if the checksum has changed, see if we have something
         worthwhile queued up, and collect that if the answer is yes. */

      if (a_len >= MIN_AUTO_EXTRA && a_len <= MAX_AUTO_EXTRA)
        maybe_add_auto(a_collect, a_len);

      a_len = 0;
      prev_cksum = cksum;

    }

    /* Continue collecting string, but only if the bit flip actually made
       any difference - we don't want no-op tokens. */

    if (cksum != queue_cur->exec_cksum) {

      if (a_len < MAX_AUTO_EXTRA) a_collect[a_len] = out_buf[stage_cur >> 3];        
      a_len++;

    }

  }

}

new_hit_cnt = queued_paths + unique_crashes;

stage_finds[STAGE_FLIP1]  += new_hit_cnt - orig_hit_cnt;
stage_cycles[STAGE_FLIP1] += stage_max;

这一部分的主要作用实际上是分析Token，例如固定文件结构前面的magic number。因为我们知道如果目标程序只处理特定类型的文件的话，那么在处理之前就会首先去判断文件开始的magic number以确定文件的类型。如果在Fuzz过程中magic number被改变，那么相较于magic number正确的情况就会产生比较大的路径变化，并且由于magic number一般为四字节，因此这里改变前四个字节中的任意一个实际上最后得到的路径信息都是相同的。因此这里通过前期的bit翻转来确定某些Token以改进AFL对固定结构的Fuzz效果。

这里如果对一个文件来说我们知道其固定字节的长度不超过4字节，那么这里我们就可以更改MAX_AUTO_EXTRA全局变量，不过这就需要重新编译AFL。

这里每次变异结束之后都会进行一个复原，至于上面说的路径信息的比对则是在每8次翻转之后才进行一个操作。也就是这里对每一个字节的最低bit位进行翻转的结果进行一个判断。如果得到的路径信息和之前的checksum不同，并且接下来连续的几个字节的bit翻转之后的路径信息相同，那么就将这几个字节加入到字典中。

bitflip 2/1

bitflip 2/1这一部分的代码如下

/* Two walking bits. */

stage_name  = "bitflip 2/1";
stage_short = "flip2";
stage_max   = (len << 3) - 1;

orig_hit_cnt = new_hit_cnt;

for (stage_cur = 0; stage_cur < stage_max; stage_cur++) {

  stage_cur_byte = stage_cur >> 3;

  FLIP_BIT(out_buf, stage_cur);
  FLIP_BIT(out_buf, stage_cur + 1);

  if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;

  FLIP_BIT(out_buf, stage_cur);
  FLIP_BIT(out_buf, stage_cur + 1);

}

new_hit_cnt = queued_paths + unique_crashes;

stage_finds[STAGE_FLIP2]  += new_hit_cnt - orig_hit_cnt;
stage_cycles[STAGE_FLIP2] += stage_max;

从这里我们其实可以看到，这次是每次翻转2bit，步长是1bit。也就是这里开始进入到了真正的bit翻转的变异过程。

bitflip 4/1

/* Four walking bits. */

stage_name  = "bitflip 4/1";
stage_short = "flip4";
stage_max   = (len << 3) - 3;

orig_hit_cnt = new_hit_cnt;

for (stage_cur = 0; stage_cur < stage_max; stage_cur++) {

  stage_cur_byte = stage_cur >> 3;

  FLIP_BIT(out_buf, stage_cur);
  FLIP_BIT(out_buf, stage_cur + 1);
  FLIP_BIT(out_buf, stage_cur + 2);
  FLIP_BIT(out_buf, stage_cur + 3);

  if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;

  FLIP_BIT(out_buf, stage_cur);
  FLIP_BIT(out_buf, stage_cur + 1);
  FLIP_BIT(out_buf, stage_cur + 2);
  FLIP_BIT(out_buf, stage_cur + 3);

}

new_hit_cnt = queued_paths + unique_crashes;

stage_finds[STAGE_FLIP4]  += new_hit_cnt - orig_hit_cnt;
stage_cycles[STAGE_FLIP4] += stage_max;

步长为1bit，然后每次翻转4bit。

bitflip 8/8

实际上这个函数用来初始化eff_map，该map用来指导之后的变异。

bitflip 8/8部分的代码如下

/* Effector map setup. These macros calculate:

     EFF_APOS      - position of a particular file offset in the map.
     EFF_ALEN      - length of a map with a particular number of bytes.
     EFF_SPAN_ALEN - map span for a sequence of bytes.

   */

#define EFF_APOS(_p)          ((_p) >> EFF_MAP_SCALE2)
#define EFF_REM(_x)           ((_x) & ((1 << EFF_MAP_SCALE2) - 1))
#define EFF_ALEN(_l)          (EFF_APOS(_l) + !!EFF_REM(_l))
#define EFF_SPAN_ALEN(_p, _l) (EFF_APOS((_p) + (_l) - 1) - EFF_APOS(_p) + 1)

  /* Initialize effector map for the next step (see comments below). Always
     flag first and last byte as doing something. */

  eff_map    = ck_alloc(EFF_ALEN(len));
  eff_map[0] = 1;

  if (EFF_APOS(len - 1) != 0) {
    eff_map[EFF_APOS(len - 1)] = 1;
    eff_cnt++;
  }

  /* Walking byte. */

  stage_name  = "bitflip 8/8";
  stage_short = "flip8";
  stage_max   = len;

  orig_hit_cnt = new_hit_cnt;

  for (stage_cur = 0; stage_cur < stage_max; stage_cur++) {

    stage_cur_byte = stage_cur;

    out_buf[stage_cur] ^= 0xFF;

    if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;

    /* We also use this stage to pull off a simple trick: we identify
       bytes that seem to have no effect on the current execution path
       even when fully flipped - and we skip them during more expensive
       deterministic stages, such as arithmetics or known ints. */

    if (!eff_map[EFF_APOS(stage_cur)]) {

      u32 cksum;

      /* If in dumb mode or if the file is very short, just flag everything
         without wasting time on checksums. */

      if (!dumb_mode && len >= EFF_MIN_LEN)
        cksum = hash32(trace_bits, MAP_SIZE, HASH_CONST);
      else
        cksum = ~queue_cur->exec_cksum;

      if (cksum != queue_cur->exec_cksum) {
        eff_map[EFF_APOS(stage_cur)] = 1;
        eff_cnt++;
      }

    }

    out_buf[stage_cur] ^= 0xFF;

  }

  /* If the effector map is more than EFF_MAX_PERC dense, just flag the
     whole thing as worth fuzzing, since we wouldn't be saving much time
     anyway. */

  if (eff_cnt != EFF_ALEN(len) &&
      eff_cnt * 100 / EFF_ALEN(len) > EFF_MAX_PERC) {

    memset(eff_map, 1, EFF_ALEN(len));

    blocks_eff_select += EFF_ALEN(len);

  } else {

    blocks_eff_select += eff_cnt;

  }

  blocks_eff_total += EFF_ALEN(len);

  new_hit_cnt = queued_paths + unique_crashes;

  stage_finds[STAGE_FLIP8]  += new_hit_cnt - orig_hit_cnt;
  stage_cycles[STAGE_FLIP8] += stage_max;

  /* Two walking bytes. */

  if (len < 2) goto skip_bitflip;

首先创建了一个eff_map，其大小为测试用例的字节数。然后这里将第一个元素和最后一个元素设置为1，其余设置为0。

那么接下来就进入到循环中，也就是不断的变异中，这里我们看到此时步长是1字节，每次翻转一个字节。翻转之后即执行common_fuzz_stuff 函数，在执行完毕之后会根据eff_map 中对应字节来判断，在第一次运行的时候实际上eff_map 相应的位置表示的就是当前index对应的字节是否被翻转过。如果没有被翻转过的话，那么这里就会检查路径信息。如果本次变异之后的测试用例导致了路径信息的变化，那么就将eff_map 置为1。

实际上这里eff_map 表就表示测试用例中有价值的字节的位置。如果其对应的eff_map 中的值为1的话，那么就认为这个字节的变异是有意义的，如果当前的字节是没有意义的话，那么就会在之后的变异过程中跳过对这个字节的变异。

但是这里存在两个特例

如果当前测试用例文件大小小于128字节，那么这里就会将当前测试用例的所有字节标记为有意义的。
如果当前测试用例文件的90%及之上的字节都是有意义的，那么整个文件的所有字节都是有意义的。

bitflip 16/8

注意到从这里开始，如果当前的测试用例文件小于2字节的话，那么接下来的变异都不会继续执行了。

bitflip 16/8的代码如下

stage_name  = "bitflip 16/8";
stage_short = "flip16";
stage_cur   = 0;
stage_max   = len - 1;

orig_hit_cnt = new_hit_cnt;

for (i = 0; i < len - 1; i++) {

  /* Let's consult the effector map... */

  if (!eff_map[EFF_APOS(i)] && !eff_map[EFF_APOS(i + 1)]) {
    stage_max--;
    continue;
  }

  stage_cur_byte = i;

  *(u16*)(out_buf + i) ^= 0xFFFF;

  if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
  stage_cur++;

  *(u16*)(out_buf + i) ^= 0xFFFF;

}

new_hit_cnt = queued_paths + unique_crashes;

stage_finds[STAGE_FLIP16]  += new_hit_cnt - orig_hit_cnt;
stage_cycles[STAGE_FLIP16] += stage_max;

if (len < 4) goto skip_bitflip;

这里就会用到之前我们计算的eff_map 的信息，这里我们看到此时的步长是1字节，每次翻转2字节的内容。对于eff_map 中连续为0的两个字节，那么不会对这两个字节执行变异的操作。

bitflip 32/8

这里如果测试用例的文件小于4字节的话，那么接下来的变异都不会继续执行了

bitflip 32/8

/* Four walking bytes. */

stage_name  = "bitflip 32/8";
stage_short = "flip32";
stage_cur   = 0;
stage_max   = len - 3;

orig_hit_cnt = new_hit_cnt;

for (i = 0; i < len - 3; i++) {

  /* Let's consult the effector map... */
  if (!eff_map[EFF_APOS(i)] && !eff_map[EFF_APOS(i + 1)] &&
      !eff_map[EFF_APOS(i + 2)] && !eff_map[EFF_APOS(i + 3)]) {
    stage_max--;
    continue;
  }

  stage_cur_byte = i;

  *(u32*)(out_buf + i) ^= 0xFFFFFFFF;

  if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
  stage_cur++;

  *(u32*)(out_buf + i) ^= 0xFFFFFFFF;

}

new_hit_cnt = queued_paths + unique_crashes;

stage_finds[STAGE_FLIP32]  += new_hit_cnt - orig_hit_cnt;
stage_cycles[STAGE_FLIP32] += stage_max;

这里的逻辑实际上和bitflip 16/8相同，只不过这里的步长是1字节，每次翻转4字节的内容。而对于eff_map 中连续为0的四个字节，那么就不会对这四个字节执行变异的操作。

ARITHMETIC INC/DEC

这里如果在AFL启动的时候设置了AFL_NO_ARITH 环境变量的话，那么这里就不会执行本次变异。

arith 8/8

arith 8/8 部分的代码如下

/* 8-bit arithmetics. */

stage_name  = "arith 8/8";
stage_short = "arith8";
stage_cur   = 0;
stage_max   = 2 * len * ARITH_MAX;

stage_val_type = STAGE_VAL_LE;

orig_hit_cnt = new_hit_cnt;

for (i = 0; i < len; i++) {

  u8 orig = out_buf[i];

  /* Let's consult the effector map... */

  if (!eff_map[EFF_APOS(i)]) {
    stage_max -= 2 * ARITH_MAX;
    continue;
  }

  stage_cur_byte = i;

  for (j = 1; j <= ARITH_MAX; j++) {

    u8 r = orig ^ (orig + j);

    /* Do arithmetic operations only if the result couldn't be a product
       of a bitflip. */

    if (!could_be_bitflip(r)) {

      stage_cur_val = j;
      out_buf[i] = orig + j;

      if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
      stage_cur++;

    } else stage_max--;

    r =  orig ^ (orig - j);

    if (!could_be_bitflip(r)) {

      stage_cur_val = -j;
      out_buf[i] = orig - j;

      if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
      stage_cur++;

    } else stage_max--;

    out_buf[i] = orig;

  }

}

new_hit_cnt = queued_paths + unique_crashes;

stage_finds[STAGE_ARITH8]  += new_hit_cnt - orig_hit_cnt;
stage_cycles[STAGE_ARITH8] += stage_max;

/* 16-bit arithmetics, both endians. */

if (len < 2) goto skip_arith;

我们看到这里的步长是1字节，每次对1字节进行算数运算。这里的算数运算的逻辑如下

首先这里会根据eff_map来对当前字节是否有价值进行变异进行一个判断，如果有价值的话，那么就会执行如下的变异策略

1 2	u8 r = orig ^ (orig + j); r = orig ^ (orig - j);

这里的orig 表示的是当前需要进行运算的字节，这里的j表示的是一个算数，该算数从0递增到ARITH_MAX=35 。这里计算得到结果之后会调用could_be_bitflip 函数对这个字节能否执行bit翻转进行一个判断。这里判断的逻辑是运算之后的字节的值不和之前bit翻转时候的值相同，以减少重复。

arith 16/8

这里的逻辑也是相对比较简单的，这里的步长还是1字节，而对于每两个字节都对其尝试了-35,35之内的数值的加减，同时这里考虑了大小端。当然这里也使用了eff_map 中的信息

arith 32/8

与arith 16/8 相同，只不过这里变成了四字节的-35,35之内的数值的加减

INTERESTING VALUES

这里在全局变量中定义了一些感兴趣的数值，实际上这些数值表示了一些临界值。

static s8  interesting_8[]  = { INTERESTING_8 };
static s16 interesting_16[] = { INTERESTING_8, INTERESTING_16 };
static s32 interesting_32[] = { INTERESTING_8, INTERESTING_16, INTERESTING_32 };

#define INTERESTING_8 \
  -128,          /* Overflow signed 8-bit when decremented  */ \
  -1,            /*                                         */ \
   0,            /*                                         */ \
   1,            /*                                         */ \
   16,           /* One-off with common buffer size         */ \
   32,           /* One-off with common buffer size         */ \
   64,           /* One-off with common buffer size         */ \
   100,          /* One-off with common buffer size         */ \
   127           /* Overflow signed 8-bit when incremented  */

interest 8/8

interest 8/8 代码如下

stage_name  = "interest 8/8";
stage_short = "int8";
stage_cur   = 0;
stage_max   = len * sizeof(interesting_8);

stage_val_type = STAGE_VAL_LE;

orig_hit_cnt = new_hit_cnt;

/* Setting 8-bit integers. */

for (i = 0; i < len; i++) {

  u8 orig = out_buf[i];

  /* Let's consult the effector map... */

  if (!eff_map[EFF_APOS(i)]) {
    stage_max -= sizeof(interesting_8);
    continue;
  }

  stage_cur_byte = i;

  for (j = 0; j < sizeof(interesting_8); j++) {

    /* Skip if the value could be a product of bitflips or arithmetics. */

    if (could_be_bitflip(orig ^ (u8)interesting_8[j]) ||
        could_be_arith(orig, (u8)interesting_8[j], 1)) {
      stage_max--;
      continue;
    }

    stage_cur_val = interesting_8[j];
    out_buf[i] = interesting_8[j];

    if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;

    out_buf[i] = orig;
    stage_cur++;

  }

}

new_hit_cnt = queued_paths + unique_crashes;

stage_finds[STAGE_INTEREST8]  += new_hit_cnt - orig_hit_cnt;
stage_cycles[STAGE_INTEREST8] += stage_max;

/* Setting 16-bit integers, both endians. */

if (no_arith || len < 2) goto skip_interest;

这里就是将测试用例中的某一个字节替换为上述定义的一些字节，这里是每次替换1字节。那么在替换之前还是会检查是否和之前的变异重复。

interest 16/8

和interest 8/8 相同，只不过这里每次替换两字节。

interest 32/8

同理，这里每次替换四字节

DICTIONARY STUFF

在这一个阶段将会采用字典中的数据来替换测试用例中的某些数据。这里一共分为三种模式，如下

user extras (over)

user extras (over) 代码部分如下

if (!extras_cnt) goto skip_user_extras;

  /* Overwrite with user-supplied extras. */

  stage_name  = "user extras (over)";
  stage_short = "ext_UO";
  stage_cur   = 0;
  stage_max   = extras_cnt * len;

  stage_val_type = STAGE_VAL_NONE;

  orig_hit_cnt = new_hit_cnt;

  for (i = 0; i < len; i++) {

    u32 last_len = 0;

    stage_cur_byte = i;

    /* Extras are sorted by size, from smallest to largest. This means
       that we don't have to worry about restoring the buffer in
       between writes at a particular offset determined by the outer
       loop. */

    for (j = 0; j < extras_cnt; j++) {

      /* Skip extras probabilistically if extras_cnt > MAX_DET_EXTRAS. Also
         skip them if there's no room to insert the payload, if the token
         is redundant, or if its entire span has no bytes set in the effector
         map. */

      if ((extras_cnt > MAX_DET_EXTRAS && UR(extras_cnt) >= MAX_DET_EXTRAS) ||
          extras[j].len > len - i ||
          !memcmp(extras[j].data, out_buf + i, extras[j].len) ||
          !memchr(eff_map + EFF_APOS(i), 1, EFF_SPAN_ALEN(i, extras[j].len))) {

        stage_max--;
        continue;

      }

      last_len = extras[j].len;
      memcpy(out_buf + i, extras[j].data, last_len);

      if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;

      stage_cur++;

    }

    /* Restore all the clobbered memory. */
    memcpy(out_buf + i, in_buf + i, last_len);

  }

  new_hit_cnt = queued_paths + unique_crashes;

  stage_finds[STAGE_EXTRAS_UO]  += new_hit_cnt - orig_hit_cnt;
  stage_cycles[STAGE_EXTRAS_UO] += stage_max;

这里我们看到该部分的操作就是，步长为1字节，然后依次将字典中的字符串替换到out_buf+index的位置。这里收到eff_map的影响。

并且这里会检查字典中的Token的数量，如果其大于MAX_DET_EXTRAS=200 ，那么对Token就会存在一个概率，计算如下

1	UR(extras_cnt) >= MAX_DET_EXTRAS // 这里UR就是产生一个0-extras_cnt的随机值。

也就是这个概率随着字典中Token数量的增大而减小。

user extras (insert)

那么这里就是插入操作，也就是步长为1字节，将字典中的字符串插入到相应的位置，这里很明显不会收到eff_map的影响。并且这里不受字典长度的限制

auto extras (over)

注意到这里的字典从extras 替换为了a_extras ，也就是这一步的字典使用的实际上是我们前面在进行bitflip的时候所自动生成的字典信息。这里将以1字节为步长，将相应位置替换为字典中的字符串。

RANDOM HAVOC

这一部分就是随机变异的内容，在这一步中将会在下面的几种变异方式中随机选择一种

随机选取测试用例中的1bit进行翻转
随机选取测试用例中的1字节替换为interesting_8 中随机选取的一个值
随机选取测试用例中的2字节，随机选取大小端，替换为interesting_16 中随机选取的一个值
随机选取测试用例中的4字节，随机选取大小端，替换为interesting_32 中随机选取的一个值
随机选取测试用例中的1字节，对其减去1-35中的一个随机数
随机选取测试用例中的1字节，对其加上1-35中的一个随机数
随机选取测试用例中的2字节，随机选取大小端，对其减去1-35中的一个随机数
随机选取测试用例中的2字节，随机选取大小端，对其加上1-35中的一个随机数
随机选取测试用例中的4字节，随机选取大小端，对其减去1-35中的一个随机数
随机选取测试用例中的4字节，随机选取大小端，对其加上1-35中的一个随机数
随机选取测试用例中的1字节，将其异或1-255之间的一个随机值
随机选取测试用例中的一段字节，将其删除
随机选取测试用例中的一个位置，以75%的概率插入测试用例中的一段随机位置、随机长度的字符串，以25%的概率插入一段随机数
随机选取测试用例中的一个位置，以75%的概率替换为测试用例中的一段随机位置、随机长度的字符串，以25%的概率替换为一段随机数
如果字典不为空（系统字典+用户字典），那么随机选取一个字典，随机选取字典中的一个字符串，将其替换到测试用例中的随机位置
如果字典不为空（系统字典+用户字典），那么随机选取一个字典，随机选取字典中的一个字符串，将其插入到测试用例中的随机位置

SPLICING

Splicing部分代码如下

/* This is a last-resort strategy triggered by a full round with no findings.
     It takes the current input file, randomly selects another input, and
     splices them together at some offset, then relies on the havoc
     code to mutate that blob. */

retry_splicing:

  if (use_splicing && splice_cycle++ < SPLICE_CYCLES &&
      queued_paths > 1 && queue_cur->len > 1) {

    struct queue_entry* target;
    u32 tid, split_at;
    u8* new_buf;
    s32 f_diff, l_diff;

    /* First of all, if we've modified in_buf for havoc, let's clean that
       up... */

    if (in_buf != orig_in) {
      ck_free(in_buf);
      in_buf = orig_in;
      len = queue_cur->len;
    }

    /* Pick a random queue entry and seek to it. Don't splice with yourself. */

    do { tid = UR(queued_paths); } while (tid == current_entry);

    splicing_with = tid;
    target = queue;

    while (tid >= 100) { target = target->next_100; tid -= 100; }
    while (tid--) target = target->next;

    /* Make sure that the target has a reasonable length. */

    while (target && (target->len < 2 || target == queue_cur)) {
      target = target->next;
      splicing_with++;
    }

    if (!target) goto retry_splicing;

    /* Read the testcase into a new buffer. */

    fd = open(target->fname, O_RDONLY);

    if (fd < 0) PFATAL("Unable to open '%s'", target->fname);

    new_buf = ck_alloc_nozero(target->len);

    ck_read(fd, new_buf, target->len, target->fname);

    close(fd);

    /* Find a suitable splicing location, somewhere between the first and
       the last differing byte. Bail out if the difference is just a single
       byte or so. */

    locate_diffs(in_buf, new_buf, MIN(len, target->len), &f_diff, &l_diff);

    if (f_diff < 0 || l_diff < 2 || f_diff == l_diff) {
      ck_free(new_buf);
      goto retry_splicing;
    }

    /* Split somewhere between the first and last differing byte. */

    split_at = f_diff + UR(l_diff - f_diff);

    /* Do the thing. */

    len = target->len;
    memcpy(new_buf, in_buf, split_at);
    in_buf = new_buf;

    ck_free(out_buf);
    out_buf = ck_alloc_nozero(len);
    memcpy(out_buf, in_buf, len);

    goto havoc_stage;

  }

这一部分做的事测试用例之间的切割和重组。

具体来说，这里首先是随机选取了另一个测试用例，然后调用locate_diffs 函数，在这两个测试用例的第一个不同的字节到最后一个不同的字节之间随机选取一个位置，将这两个测试用例分别分为两个部分，将当前测试用例的第一部分和随机选取的测试用例的第二部分进行一个重组。

这里的locate_diffs 实际上就是循环遍历长度最小的测试用例，然后字符之间进行比较。

路径信息的获取和分析

这里前面实际上一直存在一个问题，就是问什么我们调用完run_target之后，就能够获取得到运行过程中的路径信息呢。这里我们先来看一下初始化时候的函数也就是setup_shm 函数

/* Configure shared memory and virgin_bits. This is called at startup. */

EXP_ST void setup_shm(void) {

  u8* shm_str;

  if (!in_bitmap) memset(virgin_bits, 255, MAP_SIZE);

  memset(virgin_tmout, 255, MAP_SIZE);
  memset(virgin_crash, 255, MAP_SIZE);

  shm_id = shmget(IPC_PRIVATE, MAP_SIZE, IPC_CREAT | IPC_EXCL | 0600);

  if (shm_id < 0) PFATAL("shmget() failed");

  atexit(remove_shm);

  shm_str = alloc_printf("%d", shm_id);

  /* If somebody is asking us to fuzz instrumented binaries in dumb mode,
     we don't want them to detect instrumentation, since we won't be sending
     fork server commands. This should be replaced with better auto-detection
     later on, perhaps? */

  if (!dumb_mode) setenv(SHM_ENV_VAR, shm_str, 1);

  ck_free(shm_str);

  trace_bits = shmat(shm_id, NULL, 0);
  
  if (!trace_bits) PFATAL("shmat() failed");

}

shmget 函数会初始化一段MAP_SIZE 大小的共享内存。这里IPC_PRIVATE 的值是0，那么这里就会创建新的共享对象，这里返回的结果是共享内存的标识符。IPC_CREAT|IPC_EXCL 标识符表示的则是如果内核中不存在键值与key相等的共享内存，那么这里将会建立一个新的消息队列，如果存在这样的共享内存那么就会报错。

注意到这里如果不是dumb_mode 的话，那么就会将获取得到的共享内存的标识写入到环境变量中即SHM_ENV_VAR 。

之后将调用shmat函数将创建的共享内存对象映射到调用进程的地址空间中，也就是trace_bits 指向的内存空间，这里看到是我们保存路径信息的地方。

这里在每次调用run_target开始Fuzz的时候会首先调用如下的代码

1	memset(trace_bits, 0, MAP_SIZE);

即将trace_bits 信息全部清空。那么之后无论是该进程调用fork还是和forkserver进行通信的话，就没有再操作trace_bits 这一块内存空间了。

要想知道接下来的操作，我们这里还需要再看一下afl-as.c 文件的内容

插桩部分源码分析

这一部分的主要代码是afl-as.c

什么时候进行插桩

插桩位置1-基本块的开始

从前面的源码文件结构中我们实际上可以知道，该部分的代码实现的主要就是插桩功能。Main函数的逻辑比较简单，这里主要功能的实现就是add_instrumentation 函数，这里我们看一下函数的功能，这里首先从input_file 中读取数据，将修改后的文件的内容写入到modified_file 中。

这里会依次的读取文件中的每一行的内容。在符合条件的情况下，会按照64位或者32位的方式插入一段代码，如下

if (!pass_thru && !skip_intel && !skip_app && !skip_csect && instr_ok &&
    instrument_next && line[0] == '\t' && isalpha(line[1])) {

  fprintf(outf, use_64bit ? trampoline_fmt_64 : trampoline_fmt_32,
          R(MAP_SIZE));

  instrument_next = 0;
  ins_lines++;

}
fputs(line, outf);

那么这里是在哪写地方进行插入呢，这里实际上只有在.text段才会执行插入的操作，如下

if (!strncmp(line + 2, "text\n", 5) ||
    !strncmp(line + 2, "section\t.text", 13) ||
    !strncmp(line + 2, "section\t__TEXT,__text", 21) ||
    !strncmp(line + 2, "section __TEXT,__text", 21)) {
  instr_ok = 1;
  continue; 
}

我们这里看一下其他的成员变量的赋值过程，以了解在什么时候执行插入的操作，首先这里pass_thru 成员变量

1
2
3

if (strncmp(input_file, tmp_dir, strlen(tmp_dir)) &&
    strncmp(input_file, "/var/tmp/", 9) &&
    strncmp(input_file, "/tmp/", 5)) pass_thru = 1;

这里实际上是在之前的edit_params 函数中触发调用的，这里我们可以看到只是判断了一下输入文件的路径。一般情况下这里pass_thru 的值应该是0。接下来我们看一下skip_intel 成员变量

1 2	if (strstr(line, ".intel_syntax")) skip_intel = 1; if (strstr(line, ".att_syntax")) skip_intel = 0;

这里表示的是接下来的汇编的格式，这里att_syntax表示的是AT&T格式的汇编代码，而intel_syntax则表示的是Intel格式的汇编代码，这里我们看到实际上只有针对AT&T格式的汇编代码才会执行插入的操作。接下来我们看一下skip_app 成员变量

if (line[0] == '#' || line[1] == '#') {

  if (strstr(line, "#APP")) skip_app = 1;
  if (strstr(line, "#NO_APP")) skip_app = 0;

}

这里在汇编文件中，#APP 和#NO_APP 这两个标签中间包裹的内容实际上是我们在代码中插入的內联汇编的代码，也就是这里不会对內联汇编进行插入操作。我们这里再来看一下skip_csect

if (strstr(line, ".code")) {

  if (strstr(line, ".code32")) skip_csect = use_64bit;
  if (strstr(line, ".code64")) skip_csect = !use_64bit;

}

这里如果是64位的话，那么对汇编中的32位的汇编代码不进行插入，同理对于32位程序，汇编中的64位汇编代码不会进行插入。instrument_next 这个变量则是一个关键的变量，实际上这里由于我们处理的是汇编文件，在产生汇编的时候，这里基本块已经都分配好了，instrument_next 则表示的是下一次是否插入，其赋值则是通过查找当前是否是一个块的开始来判断的，这里代码如下

if (strstr(line, ":")) {

    if (line[0] == '.') {

#endif /* __APPLE__ */

      /* .L0: or LBB0_0: style jump destination */

#ifdef __APPLE__

//...

#else

      /* Apple: .L<num> / .LBB<num> */

      if ((isdigit(line[2]) || (clang_mode && !strncmp(line + 1, "LBB", 3)))
          && R(100) < inst_ratio) {

#endif /* __APPLE__ */

        /* An optimization is possible here by adding the code only if the
           label is mentioned in the code in contexts other than call / jmp.
           That said, this complicates the code by requiring two-pass
           processing (messy with stdin), and results in a speed gain
           typically under 10%, because compilers are generally pretty good
           about not generating spurious intra-function jumps.

           We use deferred output chiefly to avoid disrupting
           .Lfunc_begin0-style exception handling calculations (a problem on
           MacOS X). */

        if (!skip_next_label) instrument_next = 1; else skip_next_label = 0;

      }

    } else {

      /* Function label (always instrumented, deferred mode). */

      instrument_next = 1;
  
    }
}

注意到这里并不是所有的基本块都会进行代码的插入，实际上这里存在一个概率inst_ratio 虽然默认的值是100，但是我们可以更改这个值。注意这里判断的实际上就是切分的Block，我们这里可以看一个例子，源码如下

例子的源码

#include <stdio.h>

int main() {
    int i = 0;
    {
        int i = 1;
        printf("%d\n", i);
    }
    return 0;
}

注意到这里只有在保留调试信息的时候才会保留本地符号也就是.L开头的符号，我们这里生成汇编的命令如下

1	gcc -g -m64 -S -o test.s test.c

生成的部分汇编如下

main:
.LFB0:
        .file 1 "test.c"
        .loc 1 3 12
        .cfi_startproc
        endbr64
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        subq    $16, %rsp
        .loc 1 4 9
        movl    $0, -8(%rbp)
.LBB2:
        .loc 1 6 13
        movl    $1, -4(%rbp)
        .loc 1 7 9
        movl    -4(%rbp), %eax
        movl    %eax, %esi
        leaq    .LC0(%rip), %rdi
        movl    $0, %eax
        call    printf@PLT
.LBE2:
        .loc 1 9 12
        movl    $0, %eax
        .loc 1 10 1
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE0:

另一种形式的汇编

.L3:
        .loc 1 8 9 discriminator 3
        movl    -4(%rbp), %eax
        movl    %eax, %esi
        leaq    .LC0(%rip), %rdi
        movl    $0, %eax
        call    printf@PLT
        .loc 1 5 24 discriminator 3
        addl    $1, -4(%rbp)
.L2:
        .loc 1 5 5 discriminator 1
        cmpl    $4, -4(%rbp)
        jle     .L3
        .loc 1 10 12
        movl    $0, %eax
        .loc 1 11 1
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc

这里我们需要注意的是这里call并不会作为基本块划分的条件。同时这里条件跳转也不会进行基本块的划分。

也就是这里.LFB0表示的是一个函数的开始部分，同时应该也表示一个基本块的开始，也就是Local Function Begin，后面的0表示的是函数的index的值。那么下面的.LBB2 则就表示的是一个基本块的开始，同时index=2。也就是这里在每个基本块的开始进行了代码的插入。

插桩位置2-条件跳转之后

同时这里还存在另一处的插桩代码如下

if (line[0] == '\t') {

  if (line[1] == 'j' && line[2] != 'm' && R(100) < inst_ratio) {

    fprintf(outf, use_64bit ? trampoline_fmt_64 : trampoline_fmt_32,
            R(MAP_SIZE));

    ins_lines++;

  }

  continue;

}

也就是这里对与条件跳转，在条件跳转之后有一定的几率去进行插桩，这里几率默认的则是100。这里只在条件跳转语句之后插桩是没有问题的，因为跳转的目标一定是一个基本块的开始，我们之前已经在其起始位置插桩完成了。

插桩位置3-结尾

1 2	if (ins_lines) fputs(use_64bit ? main_payload_64 : main_payload_32, outf);

注意到这里在整个文件处理完毕之后还会插入一段内容。

插桩的内容

这里通过前面的分析我们知道一共插入了两种内容，分别是trampoline_fmt_64 以及main_payload_64 这里从插入的位置来看，我们实际上就可以推测出trampoline_fmt_64 负责跳转到路径记录的函数，而main_payload_64 则包含了一些初始化或者记录路径信息的函数。这里我们分别来看一下，需要注意的是这里只是针对64位下的代码。

trampoline_fmt_64

static const u8* trampoline_fmt_64 =

  "\n"
  "/* --- AFL TRAMPOLINE (64-BIT) --- */\n"
  "\n"
  ".align 4\n"
  "\n"
  "leaq -(128+24)(%%rsp), %%rsp\n"
  "movq %%rdx,  0(%%rsp)\n"
  "movq %%rcx,  8(%%rsp)\n"
  "movq %%rax, 16(%%rsp)\n"
  "movq $0x%08x, %%rcx\n"
  "call __afl_maybe_log\n"
  "movq 16(%%rsp), %%rax\n"
  "movq  8(%%rsp), %%rcx\n"
  "movq  0(%%rsp), %%rdx\n"
  "leaq (128+24)(%%rsp), %%rsp\n"
  "\n"
  "/* --- END --- */\n"
  "\n";

这里逻辑很简单，即调用了__afl_maybe_log函数，我们看一下这个函数

"__afl_maybe_log:\n"
"\n"
"  lahf\n"
"  seto %al\n"
"\n"
"  /* Check if SHM region is already mapped. */\n"
"\n"
"  movl  __afl_area_ptr, %edx\n"
"  testl %edx, %edx\n"
"  je    __afl_setup\n"
"\n"

lahf 指令是将EFLAGS 寄存器数组的低字节拷贝到AH寄存器中。seto 指令则是如果出现溢出的话，那么就将al设置为1。接着判断了__afl_area_ptr 的值是否被设置了。如果没有设置的话，那么就会跳转到__afl_setup 部分去执行，我们看一下

__afl_setup

1 2	".AFL_SHM_ENV:\n" " .asciz \"" SHM_ENV_VAR "\"\n"

"__afl_setup:\n"
"\n"
"  /* Do not retry setup if we had previous failures. */\n"
"\n"
"  cmpb $0, __afl_setup_failure(%rip)\n"
"  jne __afl_return\n"
"\n"
"  /* Check out if we have a global pointer on file. */\n"
"\n"
#ifndef __APPLE__
"  movq  __afl_global_area_ptr@GOTPCREL(%rip), %rdx\n"
"  movq  (%rdx), %rdx\n"
#else
"  movq  __afl_global_area_ptr(%rip), %rdx\n"
#endif /* !^__APPLE__ */
"  testq %rdx, %rdx\n"
"  je    __afl_setup_first\n"
"\n"
"  movq %rdx, __afl_area_ptr(%rip)\n"
"  jmp  __afl_store\n" 
"\n"

这里AFL_SHM_ENV 是环境变量的字符串。这里如果__afl_area_ptr设置了，那么就跳转到__afl_store 部分去执行，如果没有设置，那么就会执行如下的代码

"__afl_setup_first:\n"
"\n"
"  /* Save everything that is not yet saved and that may be touched by\n"
"     getenv() and several other libcalls we'll be relying on. */\n"
"\n"
"  leaq -352(%rsp), %rsp\n"
"\n"
"  movq %rax,   0(%rsp)\n"
"  movq %rcx,   8(%rsp)\n"
"  movq %rdi,  16(%rsp)\n"
"  movq %rsi,  32(%rsp)\n"
"  movq %r8,   40(%rsp)\n"
"  movq %r9,   48(%rsp)\n"
"  movq %r10,  56(%rsp)\n"
"  movq %r11,  64(%rsp)\n"
"\n"
"  movq %xmm0,  96(%rsp)\n"
"  movq %xmm1,  112(%rsp)\n"
"  movq %xmm2,  128(%rsp)\n"
"  movq %xmm3,  144(%rsp)\n"
"  movq %xmm4,  160(%rsp)\n"
"  movq %xmm5,  176(%rsp)\n"
"  movq %xmm6,  192(%rsp)\n"
"  movq %xmm7,  208(%rsp)\n"
"  movq %xmm8,  224(%rsp)\n"
"  movq %xmm9,  240(%rsp)\n"
"  movq %xmm10, 256(%rsp)\n"
"  movq %xmm11, 272(%rsp)\n"
"  movq %xmm12, 288(%rsp)\n"
"  movq %xmm13, 304(%rsp)\n"
"  movq %xmm14, 320(%rsp)\n"
"  movq %xmm15, 336(%rsp)\n"
"\n"
"  /* Map SHM, jumping to __afl_setup_abort if something goes wrong. */\n"
"\n"
"  /* The 64-bit ABI requires 16-byte stack alignment. We'll keep the\n"
"     original stack ptr in the callee-saved r12. */\n"
"\n"
"  pushq %r12\n"
"  movq  %rsp, %r12\n"
"  subq  $16, %rsp\n"
"  andq  $0xfffffffffffffff0, %rsp\n"
"\n"
"  leaq .AFL_SHM_ENV(%rip), %rdi\n"
CALL_L64("getenv")
"\n"
"  testq %rax, %rax\n"
"  je    __afl_setup_abort\n"
"\n"
"  movq  %rax, %rdi\n"
CALL_L64("atoi")
"\n"
"  xorq %rdx, %rdx   /* shmat flags    */\n"
"  xorq %rsi, %rsi   /* requested addr */\n"
"  movq %rax, %rdi   /* SHM ID         */\n"
CALL_L64("shmat")
"\n"
"  cmpq $-1, %rax\n"
"  je   __afl_setup_abort\n"
"\n"
"  /* Store the address of the SHM region. */\n"
"\n"
"  movq %rax, %rdx\n"
"  movq %rax, __afl_area_ptr(%rip)\n"
"\n"
#ifdef __APPLE__
"  movq %rax, __afl_global_area_ptr(%rip)\n"
#else
"  movq __afl_global_area_ptr@GOTPCREL(%rip), %rdx\n"
"  movq %rax, (%rdx)\n"
#endif /* ^__APPLE__ */
"  movq %rax, %rdx\n"
"\n"

在保存了一些寄存器数组之后，这里即调用了getenv 函数来获取AFL_SHM_ENV 的值，这里我们通过前面的分析实际上这里AFL_SHM_ENV保存的是共享内存的id值。拿到id值之后这里即调用shmat 函数，将其映射到了本次的进程内存中。注意到这里的内存地址直接保存到了__afl_area_ptr 函数中。也就是这里__afl_area_ptr实际上表示的就是路径信息的内存空间。

接下来会启动fork_server。这里会调用fork函数

__afl_forkserver

这里父进程会一直陷入到循环中，通过pipe管道来接受消息。也就是前面和Fuzz进行通信的部分。

__afl_store

这里子进程则会开始记录信息的过程，我们看一下这一部分的代码

"__afl_store:\n"
"\n"
"  /* Calculate and store hit for the code location specified in ecx. There\n"
"     is a double-XOR way of doing this without tainting another register,\n"
"     and we use it on 64-bit systems; but it's slower for 32-bit ones. */\n"
"\n"
#ifndef COVERAGE_ONLY
"  movl __afl_prev_loc, %edi\n"
"  xorl %ecx, %edi\n"
"  shrl $1, %ecx\n"
"  movl %ecx, __afl_prev_loc\n"
#else
"  movl %ecx, %edi\n"
#endif /* ^!COVERAGE_ONLY */
"\n"
#ifdef SKIP_COUNTS
"  orb  $1, (%edx, %edi, 1)\n"
#else
"  incb (%edx, %edi, 1)\n"
#endif /* ^SKIP_COUNTS */
"\n"
"__afl_return:\n"
"\n"
"  addb $127, %al\n"
"  sahf\n"
"  ret\n"

这里我们看到路径的上一个基本块被保存在__afl_prev_loc 全局变量中，而当前基本块的ID则保存在ecx中，实际上这里的ecx的值是通过我们在插桩的时候传入进来的，如下

1 2	fprintf(outf, use_64bit ? trampoline_fmt_64 : trampoline_fmt_32, R(MAP_SIZE));

也就是这里传入了一个MAP_SIZE大小的随机数作为当前基本块的一个ID。这里我们就可以看到路径记录方面的内容。如果这里不对路径命中进行计数的话，那么这里只会将命中的路径的值设置为1。如果需要统计计数的话，那么这里则会直接对相应位置采取加的操作。

命中次数的计算

这里前面我们分析到，命中一组路径元组的话，其在trace_bits中相应位置的值就会增加，但是这并不表示最终的命中次数的结果。每一次run_target 函数执行完毕获取得到trace_bits之后都会进行一个处理，即调用classify_counts 函数

static inline void classify_counts(u64* mem) {

  u32 i = MAP_SIZE >> 3;

  while (i--) {

    /* Optimize for sparse bitmaps. */

    if (unlikely(*mem)) {

      u16* mem16 = (u16*)mem;

      mem16[0] = count_class_lookup16[mem16[0]];
      mem16[1] = count_class_lookup16[mem16[1]];
      mem16[2] = count_class_lookup16[mem16[2]];
      mem16[3] = count_class_lookup16[mem16[3]];

    }

    mem++;

  }

}

static u16 count_class_lookup16[65536];

EXP_ST void init_count_class16(void) {

  u32 b1, b2;

  for (b1 = 0; b1 < 256; b1++) 
    for (b2 = 0; b2 < 256; b2++)
      count_class_lookup16[(b1 << 8) + b2] = 
        (count_class_lookup8[b1] << 8) |
        count_class_lookup8[b2];

}

实际上这里和32位下面的处理是相同的。这里32位的count_class如下

static const u8 count_class_lookup8[256] = {

  [0]           = 0,
  [1]           = 1,
  [2]           = 2,
  [3]           = 4,
  [4 ... 7]     = 8,
  [8 ... 15]    = 16,
  [16 ... 31]   = 32,
  [32 ... 127]  = 64,
  [128 ... 255] = 128

};

也就是模糊了命中次数这个概念，而是用一个范围来表示命中次数。这个范围在AFL表示为一个桶。也就是命中次数从一个桶转换到另一个桶才会被认为命中次数发生了改变。

AFL总结

这里对于一些无关的参数处理或者说初始化我们跳过，只关注核心的部分

首先这里进行共享内存的初始化，通过调用shmget 分配了MAP_SIZE大小的共享内存，分别被AFL进程和目标进程映射到各自的地址空间中，作为记录路径信息的内存空间。这里记录路径信息的时候对于每个基本块都随机分配了一个MAP_SIZE中的随机值。而对于A→B这样的路径元组其对应的MAP中的位置为shared_mem[(A>>1)^B] 。

这里目标进程通过插桩的方式插入相应的路径记录信息的代码，以及Forserver的代码。
初始化完成之后，对于所有的测试用例会首先进行第一轮的Fuzz，也就是DryRun，第一次运行的目的是对测试用例进行校准，初始化所有测试用例的当前的路径信息，并依据这个路径信息执行优胜者策略。这里优胜者测试用例的选择是对于某一个特定的路径元组例如A→B，执行时间最短并且文件最小的则是当前路径元组的优胜者。但是并不是所有的优胜者都会被欢迎。AFL只会欢迎第一次看到的优胜者测试用例。
接下来就是主Fuzz循环，每次循环之前都会执行样本选择，也就是执行优胜者策略。之后按照相应的策略选择当前这一轮要执行Fuzz的测试用例样本，具体的策略如下
1. 如果队列中存在还没有被Fuzz的受欢迎的优胜者测试用例
  1. 当前的测试用例已经被Fuzz过了或者当前的测试用例不是受欢迎的优胜者测试用例，那么将会有99%的概率跳过这个测试用例
  也就是这里会以很大的概率去寻找还没有被Fuzz的受欢迎的优胜者测试用例
2. 如果队列中所有受欢迎的优胜者测试用例都被Fuzz了
  1. 如果当前的测试用例是受欢迎的优胜者，那么选择当前的测试用例
  2. 如果当前的测试用例不是受欢迎的优胜者并且测试用例的总数大于10
    1. 当前的测试用例没有被Fuzz过，75%的概率跳过这个测试用例
    2. 当前测试用例被Fuzz过，95%概率跳过这个测试用例
对当前测试用例执行剪枝操作，主要目的是减少测试用例文件的大小。具体的策略是首先选择一个大的步长，依次删除测试用例文件中相应的部分，观察路径信息是否发生变化，如果没有发生变化那么就将这一块内容从测试用例中删除。之后依次减少步长，直到达到最小步长，之后将更改之后的文件写入到磁盘文件，更新优胜者队列。更改之后的文件作为结下来要Fuzz的文件

5字节一下的文件不会执行剪枝操作。
之后对当前的这个测试用例进行打分，执行速度越快、发现路径的数量越多、最近发现新路径的以及发现的路径越深的则分数更高，在后续的随机变异中将会获取得到更多的执行轮数。
接下来就是变异的操作了。这里会涉及到6种类型的遍历。分别是bit翻转、算数加减、interesting数值运算、字典替换、随机变异、不同测试用例之间的拼接。这里我们要注意几个变异策略的作用
1. 第一次变异即步长为1，每次翻转1bit是用来探测固定结构的，也就是探测类似于Magic Number等。这里如果前几个字节的最后一个bit翻转之后所得到的路径信息相同，那么我们就可以认为这几个字节为一个Token，也就是固定字节。
2. 步长为1字节，每次翻转1字节则是用来初始化eff_map的，该map会指导后续的遍历过程。也就是对于翻转后路径信息不变的字节，那么在之后的变异过程中也不会再对这个字节进行变异。
3. INTERESTING VALUES实际上也是一种算数运算操作，不过这里的算数则是一些边界值
每次变异都会将变异的文件写入到磁盘文件中，之后执行目标程序。如果变异之后的样本产生了新的路径或者路径元组命中次数发生了变化，那么这里就会将变异之后的文件写入到测试用例文件夹中。如果没有发生hangs或者crash，那么就会将这个变异之后的测试用例加入到队列中。之后继续从第4步开始执行直至退出。