2020-02-21 23:46:44

Node Source Code Analysis: BootstrapNode

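We continue with the second step of bootstrapping. If the first step can be summed up as initializing the internal module system, this step initializes the Node environment. That covers properties on the process object and some global objects and functions. It also covers actually running our JS code and resolving its third-party module dependencies.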

BootstrapNode

Back in node.cc, find the implementation of BootstrapNode.

MaybeLocal<Value> Environment::BootstrapNode() {
  EscapableHandleScope scope(isolate_);

  Local<Object> global = context()->Global();
  // TODO(joyeecheung): this can be done in JS land now.
  global->Set(context(), FIXED_ONE_BYTE_STRING(isolate_, "global"), global)
      .Check();

  // process, require, internalBinding, primordials
  std::vector<Local<String>> node_params = {
      process_string(),
      require_string(),
      internal_binding_string(),
      primordials_string()};
  std::vector<Local<Value>> node_args = {
      process_object(),
      native_module_require(),
      internal_binding_loader(),
      primordials()};

  MaybeLocal<Value> result = ExecuteBootstrapper(
      this, "internal/bootstrap/node", &node_params, &node_args);
  // ...
}

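First, it defines a `global` property on the context's global object (V8's global proxy), and the value of that property is the global object itself. This property gives JS code access to the underlying global object and lets it modify that object. It is the `global` object we use in Node.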
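The vm module can imitate the effect of that C++ call. A fresh context from vm.createContext has only the standard ECMAScript globals and no `global` property until something defines one. This only illustrates the idea; it is not how Node runs its own bootstrap.

```javascript
const vm = require('vm');

// A brand-new V8 context has no `global` property until something defines it.
const context = vm.createContext({});
const before = vm.runInContext('typeof global', context);

// Mimic what BootstrapNode does from C++ with
// global->Set(context(), "global", global): make `global` refer to the global object itself.
vm.runInContext('globalThis.global = globalThis', context);
const after = vm.runInContext('global === globalThis', context);

console.log(before); // 'undefined'
console.log(after);  // true
```
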

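Next, the arguments for ExecuteBootstrapper are prepared again. Compared with the previous step, process and primordials are unchanged. What changed are internalBinding and require. They are now the internalBinding and nativeModuleRequire functions that internal/bootstrap/loaders prepared and exported in the previous step.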
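Conceptually, ExecuteBootstrapper compiles the named built-in file as the body of a function and then calls that function. The parameter names come from node_params and the arguments come from node_args. The sketch below imitates that pairing in plain JavaScript with the Function constructor. It is only an analogy, because the real compilation happens in C++ through V8. The names executeBootstrapper and fakeProcess are hypothetical.

```javascript
// Hypothetical analog of ExecuteBootstrapper: compile `source` as the body of a
// function whose parameter names are `params`, then call it with `args`.
function executeBootstrapper(source, params, args) {
  const fn = new Function(...params, source);
  return fn(...args);
}

const fakeProcess = { title: 'node' };
const result = executeBootstrapper(
  'return process.title + ":" + typeof require;',
  ['process', 'require', 'internalBinding', 'primordials'],
  [fakeProcess, (id) => id, (name) => ({}), {}]
);
console.log(result); // 'node:function'
```

Positional matching does all the work here. The value at index i of the arguments vector becomes the variable named at index i of the parameters vector inside the file.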

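Then internal/bootstrap/node is executed. The file contains a lot of code, but the comments summarizing its core purpose are a good starting point. Here are the key parts.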

// This file is expected not to perform any asynchronous operations itself
// when being executed - those should be done in either
// `lib/internal/bootstrap/pre_execution.js` or in main scripts. The majority
// of the code here focus on setting up the global proxy and the process
// object in a synchronous manner.
// As special caution is given to the performance of the startup process,
// many dependencies are invoked lazily.
// This file is compiled as if it's wrapped in a function with arguments
// passed by node::RunBootstrapping()
/* global process, require, internalBinding */

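So this file only does synchronous work. It focuses on setting up the global proxy and the process object synchronously. The comment also says where asynchronous operations belong: lib/internal/bootstrap/pre_execution.js or the main scripts. It also lists the arguments received by the function that wraps this file. These are the same ones mentioned before: process, require, internalBinding.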

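So what exactly gets set up? I won't analyze the detailed code, since it is not the focus of the startup flow. The comments in the file are clear, so read it if you are interested. In summary, it covers: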

setupPrepareStackTrace
setupProcessObject
setupGlobalProxy
setupBuffer
Bootstrappers for all threads
Set up methods on the process object for all threads
credentials
Setup the callbacks that node::AsyncWrap will call when there are hooks to process
setupTaskQueue
setupTimers
Set the per-Environment callback that will be called when the TrackingTraceStateObserver updates trace state

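At this point both bootstrap steps are complete and the env has been created. Back in node_main_instance.cc, once the env is available, LoadEnvironment is called to start executing the user's code.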

LoadEnvironment

This function is defined in node.cc and is very simple: it just starts main-thread execution.

void LoadEnvironment(Environment* env) {
  CHECK(env->is_main_thread());
  // TODO(joyeecheung): Not all of the execution modes in
  // StartMainThreadExecution() make sense for embedders. Pick the
  // useful ones out, and allow embedders to customize the entry
  // point more directly without using _third_party_main.js
  USE(StartMainThreadExecution(env));
}

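StartMainThreadExecution decides, based on the user's input, which mode to start in, and then runs the startup script for that mode. For example, the inspect argument puts Node into debug mode.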

if (first_argv == "inspect" || first_argv == "debug") {
    return StartExecution(env, "internal/main/inspect");
}

Let's look at the normal mode, which runs a script file such as `node index.js` without a special subcommand.

if (!first_argv.empty() && first_argv != "-") {
  return StartExecution(env, "internal/main/run_main_module");
}
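The two branches above can be restated as a small JavaScript function. This is a hypothetical, heavily simplified version, and pickMainScript is an invented name. The real StartMainThreadExecution also checks other cases, such as worker threads, --eval, --check, the REPL and stdin. Some of those checks sit between these two branches.

```javascript
// Hypothetical, simplified JS rendering of the two branches shown above.
// Every other startup mode is omitted.
function pickMainScript(firstArgv) {
  if (firstArgv === 'inspect' || firstArgv === 'debug') {
    return 'internal/main/inspect';
  }
  if (firstArgv !== '' && firstArgv !== '-') {
    return 'internal/main/run_main_module';
  }
  return null; // REPL, stdin, etc. are handled by branches not shown here
}

console.log(pickMainScript('inspect'));  // 'internal/main/inspect'
console.log(pickMainScript('index.js')); // 'internal/main/run_main_module'
console.log(pickMainScript('-'));        // null
```
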

That leads us to internal/main/run_main_module. Before going into it, let's see how StartExecution invokes this file.

MaybeLocal<Value> StartExecution(Environment* env, const char* main_script_id) {
  EscapableHandleScope scope(env->isolate());
  CHECK_NOT_NULL(main_script_id);

  std::vector<Local<String>> parameters = {
      env->process_string(),
      env->require_string(),
      env->internal_binding_string(),
      env->primordials_string(),
      FIXED_ONE_BYTE_STRING(env->isolate(), "markBootstrapComplete")};

  std::vector<Local<Value>> arguments = {
      env->process_object(),
      env->native_module_require(),
      env->internal_binding_loader(),
      env->primordials(),
      env->NewFunctionTemplate(MarkBootstrapComplete)
          ->GetFunction(env->context())
          .ToLocalChecked()};

  return scope.EscapeMaybe(
      ExecuteBootstrapper(env, main_script_id, &parameters, &arguments));
}

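This is ExecuteBootstrapper again, which we have seen many times. The arguments have barely changed. They are still the core four (process, require, internalBinding and primordials), plus one extra function, markBootstrapComplete. Now let's go into run_main_module to see how our own JS is started.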


'use strict';

const {
  prepareMainThreadExecution
} = require('internal/bootstrap/pre_execution');

prepareMainThreadExecution(true);

markBootstrapComplete();

// Note: this loads the module through the ESM loader if the module is
// determined to be an ES module. This hangs from the CJS module loader
// because we currently allow monkey-patching of the module loaders
// in the preloaded scripts through require('module').
// runMain here might be monkey-patched by users in --require.
// XXX: the monkey-patchability here should probably be deprecated.
require('internal/modules/cjs/loader').Module.runMain(process.argv[1]);

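The code is short, but it carries key information. It works in two steps. First, prepareMainThreadExecution from pre_execution does the last bit of bootstrap work. Once it returns, the extra markBootstrapComplete function passed in the previous step marks the bootstrap as complete.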

Then the runMain method of Module in internal/modules/cjs/loader actually runs our JS. The argument process.argv[1] is the index.js in a command like node index.js.

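Pay special attention to the long comment here. If the entry is determined to be an ES module, it is loaded through the ESM loader. That path still hangs off the CJS loader because users are allowed to monkey-patch the module loaders through require('module'). They do this in scripts preloaded with --require, and runMain itself may be patched this way. So we can expect the preload handling to live in the first step, pre_execution. Let's look at the details: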

function prepareMainThreadExecution(expandArgv1 = false) {
  // Patch the process object with legacy properties and normalizations
  patchProcessObject(expandArgv1);
  setupTraceCategoryState();
  setupInspectorHooks();
  setupWarningHandler();

  // Resolve the coverage directory to an absolute path, and
  // overwrite process.env so that the original path gets passed
  // to child processes even when they switch cwd.
  if (process.env.NODE_V8_COVERAGE) {
    process.env.NODE_V8_COVERAGE =
      setupCoverageHooks(process.env.NODE_V8_COVERAGE);
  }

  // If source-map support has been enabled, we substitute in a new
  // prepareStackTrace method, replacing the default in errors.js.
  if (getOptionValue('--enable-source-maps')) {
    const { prepareStackTrace } =
      require('internal/source_map/prepare_stack_trace');
    const { setPrepareStackTraceCallback } = internalBinding('errors');
    setPrepareStackTraceCallback(prepareStackTrace);
  }

  setupDebugEnv();

  // Print stack trace on `SIGINT` if option `--trace-sigint` presents.
  setupStacktracePrinterOnSigint();

  // Process initial diagnostic reporting configuration, if present.
  initializeReport();
  initializeReportSignalHandlers();  // Main-thread-only.

  initializeHeapSnapshotSignalHandlers();

  // If the process is spawned with env NODE_CHANNEL_FD, it's probably
  // spawned by our child_process module, then initialize IPC.
  // This attaches some internal event listeners and creates:
  // process.send(), process.channel, process.connected,
  // process.disconnect().
  setupChildProcessIpcChannel();

  // Load policy from disk and parse it.
  initializePolicy();

  // If this is a worker in cluster mode, start up the communication
  // channel. This needs to be done before any user code gets executed
  // (including preload modules).
  initializeClusterIPC();

  initializeDeprecations();
  initializeWASI();
  initializeCJSLoader();
  initializeESMLoader();

  const CJSLoader = require('internal/modules/cjs/loader');
  assert(!CJSLoader.hasLoadedAnyUserCJSModule);
  loadPreloadModules();
  initializeFrozenIntrinsics();
}

This is again a lot of initialization work. For now, focus on the part closely tied to the startup flow: initializeCJSLoader.

function initializeCJSLoader() {
  const CJSLoader = require('internal/modules/cjs/loader');
  CJSLoader.Module._initPaths();
  // TODO(joyeecheung): deprecate this in favor of a proper hook?
  CJSLoader.Module.runMain =
    require('internal/modules/run_main').executeUserEntryPoint;
}

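This sets CJSLoader's Module.runMain to executeUserEntryPoint from internal/modules/run_main, the method called at the end of run_main_module. Before going into it, note how the --require option mentioned in the earlier comment is handled. That happens in loadPreloadModules, which is called a few lines later.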

function loadPreloadModules() {
  // For user code, we preload modules if `-r` is passed
  const preloadModules = getOptionValue('--require');
  if (preloadModules && preloadModules.length > 0) {
    const {
      Module: {
        _preloadModules
      },
    } = require('internal/modules/cjs/loader');
    _preloadModules(preloadModules);
  }
}
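A script preloaded with -r runs before the entry file, which is why the loader has to stay monkey-patchable. The sketch below shows the kind of patch such a script might apply. It wraps Module._load, an undocumented but patchable property of require('module'), to record each request. To keep the demo self-contained, the patch is applied and removed inline rather than from a real -r file.

```javascript
const Module = require('module');

// What a hypothetical `node -r ./trace-require.js index.js` preload might do:
// wrap the patchable CJS entry point Module._load and record each request.
const seen = [];
const originalLoad = Module._load;
Module._load = function patchedLoad(request, parent, isMain) {
  seen.push(request);
  return originalLoad.call(this, request, parent, isMain);
};

const pathModule = require('path'); // goes through Module.prototype.require -> Module._load
Module._load = originalLoad;        // undo the patch

console.log(seen.includes('path')); // true
console.log(typeof pathModule.join); // 'function'
```
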

Here CJSLoader's _preloadModules runs the scripts we passed in. Everything from here on points to this loader, so let's look inside it.

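The file is long, so we won't read it in detail. It mainly defines a Module constructor, with several internal methods for handling modules attached to it. We start with the two methods referenced above. The first is _preloadModules, which runs the preload scripts.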

Module._preloadModules = function(requests) {
  if (!ArrayIsArray(requests))
    return;

  // Preloaded modules have a dummy parent module which is deemed to exist
  // in the current working directory. This seeds the search path for
  // preloaded modules.
  const parent = new Module('internal/preload', null);
  try {
    parent.paths = Module._nodeModulePaths(process.cwd());
  } catch (e) {
    if (e.code !== 'ENOENT') {
      throw e;
    }
  }
  for (let n = 0; n < requests.length; n++)
    parent.require(requests[n]);
};

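To run the preload scripts, it first builds a dummy module with the Module constructor. It then looks up and runs each script through require. This module and this require are the same module and require we use in everyday code. Why a dummy module? As the comment says, it seeds the search path for preloaded modules. The dummy parent, internal/preload, is treated as if it lived in the current working directory. Its paths are therefore the node_modules directories from process.cwd() upward, and -r arguments are resolved as though they were required from there. Before our entry file runs, this module requires the preload scripts. If a preload script requires further modules, those are resolved relative to the preload script's own location.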
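Module._nodeModulePaths(process.cwd()) seeds the familiar search list: every node_modules directory from a given directory up to the root. Below is a simplified, POSIX-only sketch of how that list is built. nodeModulePathsSketch is a hypothetical name, and the real function also handles Windows paths.

```javascript
const path = require('path');

// Simplified, POSIX-only sketch of the list Module._nodeModulePaths builds:
// walk from `from` up to `/`, appending `node_modules` at each level,
// but never producing `.../node_modules/node_modules`.
function nodeModulePathsSketch(from) {
  from = path.posix.resolve(from);
  if (from === '/') return ['/node_modules'];
  const parts = from.split('/').filter(Boolean);
  const paths = [];
  for (let i = parts.length; i > 0; i--) {
    if (parts[i - 1] === 'node_modules') continue; // skip node_modules/node_modules
    paths.push('/' + parts.slice(0, i).join('/') + '/node_modules');
  }
  paths.push('/node_modules');
  return paths;
}

console.log(nodeModulePathsSketch('/home/user/app'));
// [ '/home/user/app/node_modules', '/home/user/node_modules',
//   '/home/node_modules', '/node_modules' ]
```
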

Next comes executeUserEntryPoint, which receives our entry file and executes it.

// For backwards compatibility, we have to run a bunch of
// monkey-patchable code that belongs to the CJS loader (exposed by
// `require('module')`) even when the entry point is ESM.
function executeUserEntryPoint(main = process.argv[1]) {
  const resolvedMain = resolveMainPath(main);
  const useESMLoader = shouldUseESMLoader(resolvedMain);
  if (useESMLoader) {
    runMainESM(resolvedMain || main);
  } else {
    // Module._load is the monkey-patchable CJS module loader.
    Module._load(main, null, true);
  }
}

function shouldUseESMLoader(mainPath) {
  const userLoader = getOptionValue('--experimental-loader');
  if (userLoader)
    return true;
  const esModuleSpecifierResolution =
    getOptionValue('--es-module-specifier-resolution');
  if (esModuleSpecifierResolution === 'node')
    return true;
  // Determine the module format of the main
  if (mainPath && mainPath.endsWith('.mjs'))
    return true;
  if (!mainPath || mainPath.endsWith('.cjs'))
    return false;
  const pkg = readPackageScope(mainPath);
  return pkg && pkg.data.type === 'module';
}
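Leaving out the two option checks, the extension and package.json rules of shouldUseESMLoader reduce to the sketch below. shouldUseESMLoaderSketch is hypothetical. Its packageType parameter stands in for readPackageScope(mainPath).data.type, so the logic can be shown without touching the disk.

```javascript
// Simplified sketch of shouldUseESMLoader's extension and package.json checks.
// `packageType` is a hypothetical stand-in for readPackageScope(mainPath).data.type;
// the --experimental-loader and --es-module-specifier-resolution checks are omitted.
function shouldUseESMLoaderSketch(mainPath, packageType) {
  if (mainPath && mainPath.endsWith('.mjs')) return true;
  if (!mainPath || mainPath.endsWith('.cjs')) return false;
  return packageType === 'module';
}

console.log(shouldUseESMLoaderSketch('/app/main.mjs', undefined)); // true
console.log(shouldUseESMLoaderSketch('/app/main.cjs', 'module'));  // false
console.log(shouldUseESMLoaderSketch('/app/main.js', 'module'));   // true
```
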

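It resolves the file path and decides the module type. The decision rests mainly on the file extension, the "type" field of the nearest package.json, or options the user passed. We'll go straight to the ordinary CJS case, which calls Module._load. The function is long, so I'll pick out the key steps.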

  const filename = Module._resolveFilename(request, parent, isMain);

  const cachedModule = Module._cache[filename];
  if (cachedModule !== undefined) {
    updateChildren(parent, cachedModule, true);
    if (!cachedModule.loaded)
      return getExportsForCircularRequire(cachedModule);
    return cachedModule.exports;
  }
  const mod = loadNativeModule(filename, request);
  if (mod && mod.canBeRequiredByUsers) return mod.exports;

// lib/internal/modules/cjs/helpers.js
const { NativeModule } = require('internal/bootstrap/loaders');

function loadNativeModule(filename, request) {
  const mod = NativeModule.map.get(filename);
  if (mod) {
    debug('load native module %s', request);
    mod.compileForPublicLoader();
    return mod;
  }
}

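When there is a parent module, the request is resolved relative to the parent. Our entry file is the first module and has no parent, so its filename is resolved directly. Then the module cache is checked. On a hit, the cached exports are returned, or the partially filled exports if the module is still loading because of a circular require. On a miss, the next stage checks whether the module is a native (built-in) one. If it is a built-in that users are allowed to require, its exports are returned. NativeModule is the object exported by the first bootstrap step, internal/bootstrap/loaders.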
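Ordinary user code can observe the built-in check. The snippet below uses only public APIs: require.resolve, and builtinModules from the module package. It shows that a built-in id resolves to itself without a file lookup. It also shows that internal/* modules, which are not canBeRequiredByUsers, cannot be reached.

```javascript
const { builtinModules } = require('module');

// Built-in ids are resolved to themselves; no file system lookup happens.
console.log(require.resolve('fs')); // 'fs'

// Only modules with canBeRequiredByUsers are exposed; internal/* modules are not.
console.log(builtinModules.includes('fs'));                          // true
console.log(builtinModules.some((id) => id.startsWith('internal/'))); // false

let internalErrorCode = null;
try {
  require('internal/bootstrap/loaders');
} catch (err) {
  internalErrorCode = err.code;
}
console.log(internalErrorCode); // 'MODULE_NOT_FOUND'
```
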

  const module = new Module(filename, parent);

  if (isMain) {
    process.mainModule = module;
    module.id = '.';
  }

  Module._cache[filename] = module;
  if (parent !== undefined) {
    relativeResolveCache[relResolveCacheIdentifier] = filename;
  }

  let threw = true;
  try {
    // Intercept exceptions that occur during the first tick and rekey them
    // on error instance rather than module instance (which will immediately be
    // garbage collected).
    if (enableSourceMaps) {
      try {
        module.load(filename);
      } catch (err) {
        rekeySourceMap(Module._cache[filename], err);
        throw err; /* node-do-not-add-exception-line */
      }
    } else {
      module.load(filename);
    }
    threw = false;
  } finally {
    if (threw) {
      delete Module._cache[filename];
      if (parent !== undefined) {
        delete relativeResolveCache[relResolveCacheIdentifier];
      }
    } else if (module.exports &&
               ObjectGetPrototypeOf(module.exports) ===
                 CircularRequirePrototypeWarningProxy) {
      ObjectSetPrototypeOf(module.exports, PublicObjectPrototype);
    }
  }

  return module.exports;

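If it is not a native module, it comes from a third-party package or from our own files. In that case a Module instance is created with the Module constructor and stored in Module._cache, and then its load method is called. If loading throws, the cache entry is removed again.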
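This part of _load follows three cache rules. The new module goes into the cache before it is loaded. A circular require receives the partially filled exports. The entry is dropped if loading throws. The toy loader below reproduces these rules. Everything in it is hypothetical: files is an in-memory stand-in for the file system, and each module is a function instead of source text.

```javascript
// Toy model of the caching rules in Module._load (not Node's actual code).
// `files` is a hypothetical in-memory "file system": each entry is a function
// that receives (module, require) and fills in module.exports.
const files = {
  '/a.js': (module, require) => {
    module.exports.name = 'a';
    module.exports.fromB = require('/b.js').name;
  },
  '/b.js': (module, require) => {
    const aExports = require('/a.js'); // circular: a is still loading here
    module.exports.name = 'b';
    module.exports.sawA = aExports.name;
  },
  '/bad.js': () => {
    throw new Error('boom');
  },
};

const cache = Object.create(null);

function load(filename) {
  const cached = cache[filename];
  if (cached !== undefined) return cached.exports; // hit, possibly partial
  const mod = { filename, exports: {}, loaded: false };
  cache[filename] = mod; // cached *before* running, like Module._cache
  let threw = true;
  try {
    files[filename](mod, load);
    threw = false;
  } finally {
    if (threw) delete cache[filename]; // a failed load is not kept in the cache
  }
  mod.loaded = true;
  return mod.exports;
}

const a = load('/a.js');
console.log(a);                  // { name: 'a', fromB: 'b' }
console.log(load('/b.js').sawA); // 'a'
```
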

Module.prototype.load = function(filename) {
  debug('load %j for module %j', filename, this.id);

  assert(!this.loaded);
  this.filename = filename;
  this.paths = Module._nodeModulePaths(path.dirname(filename));

  const extension = findLongestRegisteredExtension(filename);
  // allow .mjs to be overridden
  if (filename.endsWith('.mjs') && !Module._extensions['.mjs']) {
    throw new ERR_REQUIRE_ESM(filename);
  }
  Module._extensions[extension](this, filename);
  this.loaded = true;

  const ESMLoader = asyncESM.ESMLoader;
  const url = `${pathToFileURL(filename)}`;
  const module = ESMLoader.moduleMap.get(url);
  // Create module entry at load time to snapshot exports correctly
  const exports = this.exports;
  // Called from cjs translator
  if (module !== undefined && module.module !== undefined) {
    if (module.module.getStatus() >= kInstantiated)
      module.module.setExport('default', exports);
  } else {
    // Preemptively cache
    // We use a function to defer promise creation for async hooks.
    ESMLoader.moduleMap.set(
      url,
      // Module job creation will start promises.
      // We make it a function to lazily trigger those promises
      // for async hooks compatibility.
      () => new ModuleJob(ESMLoader, url, () =>
        new ModuleWrap(url, undefined, ['default'], function() {
          this.setExport('default', exports);
        })
      , false /* isMain */, false /* inspectBrk */)
    );
  }
};

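Here the matching Module._extensions handler is called based on the file extension. The rest of the function deals with ES module interop. It registers the module's exports in the ESM loader's moduleMap and creates the ModuleJob lazily, so that promises are not started too early for async hooks. Let's go straight to the handler for the .js extension.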
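findLongestRegisteredExtension lets a multi-dot extension such as .test.js take priority over .js when a handler is registered for it. Here is a sketch of its logic. The extension table is passed in as a plain object instead of being read from Module._extensions. findLongestRegisteredExtensionSketch is a hypothetical name.

```javascript
const path = require('path');

// Sketch of findLongestRegisteredExtension: scanning dots from the left of the
// basename yields the longest suffix first; fall back to '.js'.
// `extensions` stands in for Module._extensions.
function findLongestRegisteredExtensionSketch(filename, extensions) {
  const name = path.basename(filename);
  let index;
  let startIndex = 0;
  while ((index = name.indexOf('.', startIndex)) !== -1) {
    startIndex = index + 1;
    if (index === 0) continue; // skip dotfiles such as .eslintrc
    const currentExtension = name.slice(index);
    if (extensions[currentExtension]) return currentExtension;
  }
  return '.js';
}

const registered = { '.js': true, '.json': true, '.node': true, '.test.js': true };
console.log(findLongestRegisteredExtensionSketch('/src/a.test.js', registered)); // '.test.js'
console.log(findLongestRegisteredExtensionSketch('/src/a.b.json', registered));  // '.json'
console.log(findLongestRegisteredExtensionSketch('/src/x.foo', registered));     // '.js'
```
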

Module._extensions['.js'] = function(module, filename) {
  if (filename.endsWith('.js')) {
    const pkg = readPackageScope(filename);
    // Function require shouldn't be used in ES modules.
    if (pkg && pkg.data && pkg.data.type === 'module') {
      const parentPath = module.parent && module.parent.filename;
      const packageJsonPath = path.resolve(pkg.path, 'package.json');
      throw new ERR_REQUIRE_ESM(filename, parentPath, packageJsonPath);
    }
  }
  const content = fs.readFileSync(filename, 'utf8');
  module._compile(content, filename);
};

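There is again some error handling, this time for calling require on a .js file that belongs to an ES module package (ERR_REQUIRE_ESM). The main work is reading the file content with the fs module and compiling it with _compile.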
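The ERR_REQUIRE_ESM check depends on readPackageScope finding the nearest package.json above the file. The sketch below shows that upward walk. readPackageScopeSketch is a hypothetical name. The sketch omits the node_modules boundary check and the package.json cache that the real helper has. The demo builds a throwaway package in the OS temp directory.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Simplified sketch of readPackageScope: starting at the file's directory,
// walk upward until a package.json is found. The real helper also stops at
// node_modules boundaries and caches results; both are omitted here.
function readPackageScopeSketch(filename) {
  let dir = path.dirname(filename);
  for (;;) {
    const pjsonPath = path.join(dir, 'package.json');
    if (fs.existsSync(pjsonPath)) {
      return { path: dir, data: JSON.parse(fs.readFileSync(pjsonPath, 'utf8')) };
    }
    const parent = path.dirname(dir);
    if (parent === dir) return false; // reached the file system root
    dir = parent;
  }
}

// Demo: a temporary package marked "type": "module" with a nested .js path.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'pkg-scope-'));
fs.writeFileSync(path.join(root, 'package.json'), JSON.stringify({ type: 'module' }));
const entry = path.join(root, 'src', 'index.js'); // the file itself need not exist

const scope = readPackageScopeSketch(entry);
console.log(scope.path === root); // true
console.log(scope.data.type);     // 'module', so require() of this file would throw ERR_REQUIRE_ESM
fs.rmSync(root, { recursive: true, force: true });
```
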

  // ...
  const compiledWrapper = wrapSafe(filename, content, this);
  // ...
  const dirname = path.dirname(filename);
  const require = makeRequireFunction(this, redirects);
  let result;
  const exports = this.exports;
  const thisValue = exports;
  const module = this;
  if (requireDepth === 0) statCache = new Map();
  if (inspectorWrapper) {
    result = inspectorWrapper(compiledWrapper, thisValue, exports,
                              require, module, filename, dirname);
  } else {
    result = compiledWrapper.call(thisValue, exports, require, module,
                                  filename, dirname);
  }
  // ...

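Skipping the details, here is the key point. wrapSafe wraps the file content into a new function. That function is then called with five arguments available inside the file (exports, require, module, __filename and __dirname), with `this` set to exports. These give us the familiar module.exports, exports, require and so on. Let's see how wrapSafe does the wrapping. As you might guess, it resembles how the bootstrap executor compiled files and bound internalBinding and the other arguments: it again goes through V8.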
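vm.compileFunction is a public API built on the same contextify compileFunction binding that wrapSafe uses. The sketch below uses it to compile a small source string with the five CommonJS wrapper parameters. It then calls the result the way _compile does. compileCJS and the /tmp/demo/hello.js path are hypothetical, and nothing is read from disk.

```javascript
const vm = require('vm');
const path = require('path');

// Compile source text as the body of a function whose parameters are the
// five CommonJS wrapper variables, as the loader's compileFunction path does.
function compileCJS(content, filename) {
  return vm.compileFunction(
    content,
    ['exports', 'require', 'module', '__filename', '__dirname'],
    { filename }
  );
}

const filename = '/tmp/demo/hello.js'; // hypothetical path; nothing is read from disk
const source = [
  "exports.greet = (name) => 'hello ' + name;",
  'module.exports.dir = __dirname;',
  'module.exports.thisIsExports = this === exports;',
].join('\n');

const mod = { exports: {} };
const fn = compileCJS(source, filename);
// Call it like _compile does: `this` is exports, followed by the five arguments.
fn.call(mod.exports, mod.exports, require, mod, filename, path.dirname(filename));

console.log(mod.exports.greet('node')); // 'hello node'
console.log(mod.exports.thisIsExports); // true
```
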

const { compileFunction } = internalBinding('contextify');

function wrapSafe(filename, content, cjsModuleInstance) {
  if (patched) {
    const wrapper = Module.wrap(content);
    return vm.runInThisContext(wrapper, {
      filename,
      lineOffset: 0,
      displayErrors: true,
      importModuleDynamically: async (specifier) => {
        const loader = asyncESM.ESMLoader;
        return loader.import(specifier, normalizeReferrerURL(filename));
      },
    });
  }
  let compiled;
  try {
    compiled = compileFunction(
      content,
      filename,
      0,
      0,
      undefined,
      false,
      undefined,
      [],
      [
        'exports',
        'require',
        'module',
        '__filename',
        '__dirname',
      ]
    );
  } catch (err) {
    if (process.mainModule === cjsModuleInstance)
      enrichCJSError(err);
    throw err;
  }

  const { callbackMap } = internalBinding('module_wrap');
  callbackMap.set(compiled.cacheKey, {
    importModuleDynamically: async (specifier) => {
      const loader = asyncESM.ESMLoader;
      return loader.import(specifier, normalizeReferrerURL(filename));
    }
  });

  return compiled.function;
}

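If the user has monkey-patched the wrapper (assigning Module.wrap or Module.wrapper sets `patched`), the content is wrapped with Module.wrap and run with the vm module. Otherwise it is compiled by compileFunction from the C++ contextify binding, which calls into V8: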
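The patched branch can be tried directly. Module.wrap, an undocumented but long-standing property of require('module'), returns the source wrapped in the CommonJS function wrapper. vm.runInThisContext then evaluates that string to the wrapper function. The /virtual paths are made up for the example.

```javascript
const Module = require('module');
const vm = require('vm');

// Module.wrap surrounds the source with the CommonJS function wrapper.
const wrapped = Module.wrap('module.exports = { answer: 40 + 2, file: __filename };');

// Evaluating that string with vm.runInThisContext yields the wrapper function itself.
const fn = vm.runInThisContext(wrapped, { filename: '/virtual/answer.js' });

const mod = { exports: {} };
fn.call(mod.exports, mod.exports, require, mod, '/virtual/answer.js', '/virtual');
console.log(mod.exports.answer); // 42
console.log(mod.exports.file);   // '/virtual/answer.js'
```
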

  MaybeLocal<Function> maybe_fn = ScriptCompiler::CompileFunctionInContext(
      parsing_context, &source, params.size(), params.data(),
      context_extensions.size(), context_extensions.data(), options,
      v8::ScriptCompiler::NoCacheReason::kNoCacheNoReason, &script);

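Finally, wrapSafe registers an importModuleDynamically callback in the module_wrap binding's callbackMap, which routes dynamic import() inside the CJS file to the ESM loader. Then it returns the compiled function. _compile calls that function, and _load returns the file's exports. This ends the LoadEnvironment flow, and the event loop starts.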

Lastly, let's look at makeRequireFunction in lib/internal/modules/cjs/helpers.js. It builds the require function we actually use.

function makeRequireFunction(mod, redirects) {
  const Module = mod.constructor;

  let require;
  if (redirects) {
    const { resolve, reaction } = redirects;
    const id = mod.filename || mod.id;
    require = function require(path) {
      let missing = true;
      const destination = resolve(path);
      if (destination === true) {
        missing = false;
      } else if (destination) {
        const href = destination.href;
        if (destination.protocol === 'node:') {
          const specifier = destination.pathname;
          const mod = loadNativeModule(specifier, href);
          if (mod && mod.canBeRequiredByUsers) {
            return mod.exports;
          }
          throw new ERR_UNKNOWN_BUILTIN_MODULE(specifier);
        } else if (destination.protocol === 'file:') {
          let filepath;
          if (urlToFileCache.has(href)) {
            filepath = urlToFileCache.get(href);
          } else {
            filepath = fileURLToPath(destination);
            urlToFileCache.set(href, filepath);
          }
          return mod.require(filepath);
        }
      }
      if (missing) {
        reaction(new ERR_MANIFEST_DEPENDENCY_MISSING(id, path));
      }
      return mod.require(path);
    };
  } else {
    require = function require(path) {
      return mod.require(path);
    };
  }

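If redirects are supplied (they come from a policy manifest), require resolves through them. Otherwise it simply calls the require method on the Module instance.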

// Loads a module at the given file path. Returns that module's
// `exports` property.
Module.prototype.require = function(id) {
  validateString(id, 'id');
  if (id === '') {
    throw new ERR_INVALID_ARG_VALUE('id', id,
                                    'must be a non-empty string');
  }
  requireDepth++;
  try {
    return Module._load(id, this, /* isMain */ false);
  } finally {
    requireDepth--;
  }
};

It is a thin wrapper around _load. From here on, the flow is the same as for the entry file.
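The user-facing require passes every call to Module.prototype.require, so its argument checks are visible from any CommonJS file. Here is a small probe; errorCodeOf is a hypothetical helper.

```javascript
// Observe Module.prototype.require's argument checks from user code.
function errorCodeOf(fn) {
  try {
    fn();
  } catch (err) {
    return err.code;
  }
  return null;
}

console.log(errorCodeOf(() => require('')));   // 'ERR_INVALID_ARG_VALUE' (empty id)
console.log(errorCodeOf(() => require(42)));   // 'ERR_INVALID_ARG_TYPE' (validateString)
console.log(errorCodeOf(() => require('fs'))); // null
```
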

-- EOF --

Filed under the category "Front-end Development" and tagged "Node.js".