2020-02-16 19:04:30

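Node Source Code Analysis: BootstrapInternalLoaders

As discussed earlier, Node calls the run method at startup, and within it there is an environment-preparation phase that does a large amount of bootstrap work. At a high level this work splits into two steps. The first is the subject of this article, BootstrapInternalLoaders, which prepares the built-in modules and the native modules. Throughout this article, "built-in modules" means the modules written in C++ and "native modules" means the core modules written in JavaScript. Let's dig into the source for the details.

Bootstrap entry point

The entry point is the CreateMainEnvironment method in node_main_instance.cc.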

std::unique_ptr<Environment> NodeMainInstance::CreateMainEnvironment(
    int* exit_code) {
  *exit_code = 0;  // Reset the exit code to 0

  HandleScope handle_scope(isolate_);

  // TODO(addaleax): This should load a real per-Isolate option, currently
  // this is still effectively per-process.
  if (isolate_data_->options()->track_heap_objects) {
    isolate_->GetHeapProfiler()->StartTrackingHeapObjects(true);
  }

  Local<Context> context;
  if (deserialize_mode_) {
    context =
        Context::FromSnapshot(isolate_, kNodeContextIndex).ToLocalChecked();
    InitializeContextRuntime(context);
    IsolateSettings s;
    SetIsolateErrorHandlers(isolate_, s);
  } else {
    context = NewContext(isolate_);
  }

  CHECK(!context.IsEmpty());
  Context::Scope context_scope(context);

  std::unique_ptr<Environment> env = std::make_unique<Environment>(
      isolate_data_.get(),
      context,
      args_,
      exec_args_,
      static_cast<Environment::Flags>(Environment::kIsMainThread |
                                      Environment::kOwnsProcessState |
                                      Environment::kOwnsInspector));
  env->InitializeLibuv(per_process::v8_is_profiling);
  env->InitializeDiagnostics();

  // TODO(joyeecheung): when we snapshot the bootstrapped context,
  // the inspector and diagnostics setup should after after deserialization.
#if HAVE_INSPECTOR
  *exit_code = env->InitializeInspector({});
#endif
  if (*exit_code != 0) {
    return env;
  }

  if (env->RunBootstrapping().IsEmpty()) {
    *exit_code = 1;
  }

  return env;
}

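It first creates the V8 context, then constructs the Environment and holds it in a std::unique_ptr. Through that pointer it calls InitializeLibuv and InitializeDiagnostics to set up libuv for the event loop and the diagnostics facilities. If Node was built with inspector support (HAVE_INSPECTOR), it also initializes the Inspector. Once everything is ready, it enters the RunBootstrapping method in node.cc, where the bootstrap process formally begins.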

MaybeLocal<Value> Environment::RunBootstrapping() {
  EscapableHandleScope scope(isolate_);

  CHECK(!has_run_bootstrapping_code());

  if (BootstrapInternalLoaders().IsEmpty()) {
    return MaybeLocal<Value>();
  }

  Local<Value> result;
  if (!BootstrapNode().ToLocal(&result)) {
    return MaybeLocal<Value>();
  }

  // Make sure that no request or handle is created during bootstrap -
  // if necessary those should be done in pre-execution.
  // Usually, doing so would trigger the checks present in the ReqWrap and
  // HandleWrap classes, so this is only a consistency check.
  CHECK(req_wrap_queue()->IsEmpty());
  CHECK(handle_wrap_queue()->IsEmpty());

  set_has_run_bootstrapping_code(true);

  return scope.Escape(result);
}

Apart from a few checks, the important work is the two steps mentioned at the start of the article: BootstrapInternalLoaders and BootstrapNode. This article focuses on the former.

Create Binding Loaders

MaybeLocal<Value> Environment::BootstrapInternalLoaders() {
  EscapableHandleScope scope(isolate_);

  // Create binding loaders
  std::vector<Local<String>> loaders_params = {
      process_string(),
      FIXED_ONE_BYTE_STRING(isolate_, "getLinkedBinding"),
      FIXED_ONE_BYTE_STRING(isolate_, "getInternalBinding"),
      primordials_string()};
  std::vector<Local<Value>> loaders_args = {
      process_object(),
      NewFunctionTemplate(binding::GetLinkedBinding)
          ->GetFunction(context())
          .ToLocalChecked(),
      NewFunctionTemplate(binding::GetInternalBinding)
          ->GetFunction(context())
          .ToLocalChecked(),
      primordials()};

  // Bootstrap internal loaders
  Local<Value> loader_exports;
  if (!ExecuteBootstrapper(
           this, "internal/bootstrap/loaders", &loaders_params, &loaders_args)
           .ToLocal(&loader_exports)) {
    return MaybeLocal<Value>();
  }
  CHECK(loader_exports->IsObject());
  Local<Object> loader_exports_obj = loader_exports.As<Object>();
  Local<Value> internal_binding_loader =
      loader_exports_obj->Get(context(), internal_binding_string())
          .ToLocalChecked();
  CHECK(internal_binding_loader->IsFunction());
  set_internal_binding_loader(internal_binding_loader.As<Function>());
  Local<Value> require =
      loader_exports_obj->Get(context(), require_string()).ToLocalChecked();
  CHECK(require->IsFunction());
  set_native_module_require(require.As<Function>());

  return scope.Escape(loader_exports);
}

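Judging by the comments, this function again splits into two smaller steps: Create binding loaders and Bootstrap internal loaders. Start with the first. Its purpose is to prepare two vectors, loaders_params and loaders_args, for use in the second step. They describe the same four items: loaders_params holds the parameter names as strings, and loaders_args holds the corresponding values. The four parameters are: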

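  • process: the process object in Node
  • getLinkedBinding: a function created from a V8 FunctionTemplate, used on the JS side to fetch C++ modules
  • getInternalBinding: plays the same role as getLinkedBinding, but for internal modules
  • primordials: the JS built-in objects that core code commonly uses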

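The ones to focus on are getLinkedBinding and getInternalBinding. In the next step we create the native modules and hand them these two functions, and through them a native module written in JS can call into a built-in module written in C++. Before going into how the two functions are implemented, you can probably guess that they end up fetching built-in modules from some central place. But where exactly are built-in modules stored? To answer that, go back to the very first initialization step at Node startup, InitializeOncePerProcess, which calls InitializeNodeWithArgs.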

int InitializeNodeWithArgs(std::vector<std::string>* argv,
                           std::vector<std::string>* exec_argv,
                           std::vector<std::string>* errors) {
  // Make sure InitializeNodeWithArgs() is called only once.
  CHECK(!init_called.exchange(true));

  // Initialize node_start_time to get relative uptime.
  per_process::node_start_time = uv_hrtime();

  // Register built-in modules
  binding::RegisterBuiltinModules();

  // Make inherited handles noninheritable.
  uv_disable_stdio_inheritance();
  // ...

There is RegisterBuiltinModules. It is implemented in node_binding.cc.

void RegisterBuiltinModules() {
#define V(modname) _register_##modname();
  NODE_BUILTIN_MODULES(V)
#undef V
}

This uses the C++ macro NODE_BUILTIN_MODULES, so let's trace it.

#define NODE_BUILTIN_MODULES(V)                                                \
  NODE_BUILTIN_STANDARD_MODULES(V)                                             \
  NODE_BUILTIN_OPENSSL_MODULES(V)                                              \
  NODE_BUILTIN_ICU_MODULES(V)                                                  \
  NODE_BUILTIN_REPORT_MODULES(V)                                               \
  NODE_BUILTIN_PROFILER_MODULES(V)                                             \
  NODE_BUILTIN_DTRACE_MODULES(V)

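It is further divided into macros for different categories of built-in modules. After macro expansion, RegisterBuiltinModules registers the built-in modules with calls like these: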

void RegisterBuiltinModules() {
  _register_async_wrap();
  _register_buffer();
  _register_cares_wrap();
  // ...
}

So where are these registration functions defined? The comment explains.


// This is used to load built-in modules. Instead of using
// __attribute__((constructor)), we call the _register_<modname>
// function for each built-in modules explicitly in
// binding::RegisterBuiltinModules(). This is only forward declaration.
// The definitions are in each module's implementation when calling
// the NODE_MODULE_CONTEXT_AWARE_INTERNAL.

Let's pick any built-in module file to verify this, for example node_buffer.cc, and jump straight to the end of the file.

NODE_MODULE_CONTEXT_AWARE_INTERNAL(buffer, node::Buffer::Initialize)

Sure enough. The next step is to find the implementation of this macro, which is in node_binding.h.

#define NODE_MODULE_CONTEXT_AWARE_INTERNAL(modname, regfunc)                   \
  NODE_MODULE_CONTEXT_AWARE_CPP(modname, regfunc, nullptr, NM_F_INTERNAL)

Next, trace NODE_MODULE_CONTEXT_AWARE_CPP.

#define NODE_MODULE_CONTEXT_AWARE_CPP(modname, regfunc, priv, flags)           \
  static node::node_module _module = {                                         \
      NODE_MODULE_VERSION,                                                     \
      flags,                                                                   \
      nullptr,                                                                 \
      __FILE__,                                                                \
      nullptr,                                                                 \
      (node::addon_context_register_func)(regfunc),                            \
      NODE_STRINGIFY(modname),                                                 \
      priv,                                                                    \
      nullptr};                                                                \
  void _register_##modname() { node_module_register(&_module); }

This eventually calls the node_module_register function in node_binding.cc.

extern "C" void node_module_register(void* m) {
  struct node_module* mp = reinterpret_cast<struct node_module*>(m);

  if (mp->nm_flags & NM_F_INTERNAL) {
    mp->nm_link = modlist_internal;
    modlist_internal = mp;
  } else if (!node_is_initialized) {
    // "Linked" modules are included as part of the node project.
    // Like builtins they are registered *before* node::Init runs.
    mp->nm_flags = NM_F_LINKED;
    mp->nm_link = modlist_linked;
    modlist_linked = mp;
  } else {
    thread_local_modpending = mp;
  }
}

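Depending on whether the module being registered is internal or linked, it is stored on the modlist_internal or the modlist_linked linked list. (A module registered after Node has initialized takes the third branch and is parked in thread_local_modpending.) We have finally tracked down where built-in modules are stored. As you may have noticed, these two lists correspond exactly to the getInternalBinding and getLinkedBinding functions bound earlier.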
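The registration logic above can be sketched in plain JavaScript. This is only an illustration of the head-insertion pattern, not Node's code: the flag values, the registerModule and listNames helpers, and the my_addon module name are all assumptions made for the example, and the thread_local_modpending branch is left out.

```javascript
// Flag values are illustrative; the real constants live in Node's C++ headers.
const NM_F_LINKED = 1 << 1;
const NM_F_INTERNAL = 1 << 2;

// Heads of the two singly linked lists (hypothetical JS stand-ins).
let modlistInternal = null;
let modlistLinked = null;

// Sketch of node_module_register(): push the record onto the head of a list.
function registerModule(mod) {
  if (mod.flags & NM_F_INTERNAL) {
    mod.link = modlistInternal;
    modlistInternal = mod;
  } else {
    mod.flags = NM_F_LINKED;
    mod.link = modlistLinked;
    modlistLinked = mod;
  }
  return mod;
}

// Walk a list from its head and collect module names in traversal order.
function listNames(head) {
  const names = [];
  for (let mp = head; mp !== null; mp = mp.link) names.push(mp.name);
  return names;
}

registerModule({ name: 'async_wrap', flags: NM_F_INTERNAL, link: null });
registerModule({ name: 'buffer', flags: NM_F_INTERNAL, link: null });
registerModule({ name: 'my_addon', flags: 0, link: null });

console.log(listNames(modlistInternal).join(',')); // buffer,async_wrap
console.log(listNames(modlistLinked).join(','));   // my_addon
```

Because each record is pushed onto the head, the most recently registered module is the first one a later lookup visits.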

The natural next step is to check the details in GetInternalBinding.

void GetInternalBinding(const FunctionCallbackInfo<Value>& args) {
  Environment* env = Environment::GetCurrent(args);

  CHECK(args[0]->IsString());

  Local<String> module = args[0].As<String>();
  node::Utf8Value module_v(env->isolate(), module);
  Local<Object> exports;

  node_module* mod = FindModule(modlist_internal, *module_v, NM_F_INTERNAL);
  if (mod != nullptr) {
    exports = InitModule(env, mod, module);
  }
  // ...
}

It is simply a find followed by an init. Step into FindModule.

inline struct node_module* FindModule(struct node_module* list,
                                      const char* name,
                                      int flag) {
  struct node_module* mp;

  for (mp = list; mp != nullptr; mp = mp->nm_link) {
    if (strcmp(mp->nm_modname, name) == 0) break;
  }

  CHECK(mp == nullptr || (mp->nm_flags & flag) != 0);
  return mp;
}

As expected, it searches this linked list by module name. Once the module is found, look at InitModule.

static Local<Object> InitModule(Environment* env,
                                node_module* mod,
                                Local<String> module) {
  Local<Object> exports = Object::New(env->isolate());
  // Internal bindings don't have a "module" object, only exports.
  CHECK_NULL(mod->nm_register_func);
  CHECK_NOT_NULL(mod->nm_context_register_func);
  Local<Value> unused = Undefined(env->isolate());
  mod->nm_context_register_func(exports, unused, env->context(), mod->nm_priv);
  return exports;
}

It calls the module's nm_context_register_func, which after macro expansion is the Initialize function of each built-in module. Take node_buffer.cc as the example again.

void Initialize(Local<Object> target,
                Local<Value> unused,
                Local<Context> context,
                void* priv) {
  Environment* env = Environment::GetCurrent(context);

  env->SetMethod(target, "setBufferPrototype", SetBufferPrototype);
  env->SetMethodNoSideEffect(target, "createFromString", CreateFromString);

  env->SetMethodNoSideEffect(target, "byteLengthUtf8", ByteLengthUtf8);
  env->SetMethod(target, "copy", Copy);
  env->SetMethodNoSideEffect(target, "compare", Compare);
  env->SetMethodNoSideEffect(target, "compareOffset", CompareOffset);
  env->SetMethod(target, "fill", Fill);
  // ...

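It defines the module's methods on the exports object (the target parameter), and InitModule returns that object. At this point we know how built-in modules are looked up and initialized. Next, let's see how the native modules written in JS get to call them.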

Bootstrap internal loaders

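With the arguments ready, ExecuteBootstrapper is called to run the internal/bootstrap/loaders file. Before going into that file, look at how ExecuteBootstrapper runs it: since we only pass in a string id, there must be a lookup-and-compile step.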

MaybeLocal<Value> ExecuteBootstrapper(Environment* env,
                                      const char* id,
                                      std::vector<Local<String>>* parameters,
                                      std::vector<Local<Value>>* arguments) {
  EscapableHandleScope scope(env->isolate());
  MaybeLocal<Function> maybe_fn =
      NativeModuleEnv::LookupAndCompile(env->context(), id, parameters, env);

  if (maybe_fn.IsEmpty()) {
    return MaybeLocal<Value>();
  }

  Local<Function> fn = maybe_fn.ToLocalChecked();
  MaybeLocal<Value> result = fn->Call(env->context(),
                                      Undefined(env->isolate()),
                                      arguments->size(),
                                      arguments->data());

  // If there was an error during bootstrap, it must be unrecoverable
  // (e.g. max call stack exceeded). Clear the stack so that the
  // AsyncCallbackScope destructor doesn't fail on the id check.
  // There are only two ways to have a stack size > 1: 1) the user manually
  // called MakeCallback or 2) user awaited during bootstrap, which triggered
  // _tickCallback().
  if (result.IsEmpty()) {
    env->async_hooks()->clear_async_id_stack();
  }

  return scope.EscapeMaybe(result);
}

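Sure enough, the string passed in is used as an id: LookupAndCompile looks it up and compiles it into maybe_fn, which is then executed through Call. The NativeModuleEnv::LookupAndCompile called here forwards to NativeModuleLoader::LookupAndCompile in node_native_module.cc, so step into that one.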

MaybeLocal<Function> NativeModuleLoader::LookupAndCompile(
    Local<Context> context,
    const char* id,
    std::vector<Local<String>>* parameters,
    NativeModuleLoader::Result* result) {
  Isolate* isolate = context->GetIsolate();
  EscapableHandleScope scope(isolate);

  Local<String> source;
  if (!LoadBuiltinModuleSource(isolate, id).ToLocal(&source)) {
    return {};
  }

  std::string filename_s = id + std::string(".js");
  Local<String> filename =
      OneByteString(isolate, filename_s.c_str(), filename_s.size());
  Local<Integer> line_offset = Integer::New(isolate, 0);
  Local<Integer> column_offset = Integer::New(isolate, 0);
  ScriptOrigin origin(filename, line_offset, column_offset, True(isolate));

  Mutex::ScopedLock lock(code_cache_mutex_);

  ScriptCompiler::CachedData* cached_data = nullptr;
  {
    auto cache_it = code_cache_.find(id);
    if (cache_it != code_cache_.end()) {
      // Transfer ownership to ScriptCompiler::Source later.
      cached_data = cache_it->second.release();
      code_cache_.erase(cache_it);
    }
  }

  const bool has_cache = cached_data != nullptr;
  ScriptCompiler::CompileOptions options =
      has_cache ? ScriptCompiler::kConsumeCodeCache
                : ScriptCompiler::kEagerCompile;
  ScriptCompiler::Source script_source(source, origin, cached_data);

  MaybeLocal<Function> maybe_fun =
      ScriptCompiler::CompileFunctionInContext(context,
                                               &script_source,
                                               parameters->size(),
                                               parameters->data(),
                                               0,
                                               nullptr,
                                               options);

  // This could fail when there are early errors in the native modules,
  // e.g. the syntax errors
  if (maybe_fun.IsEmpty()) {
    // In the case of early errors, v8 is already capable of
    // decorating the stack for us - note that we use CompileFunctionInContext
    // so there is no need to worry about wrappers.
    return MaybeLocal<Function>();
  }

  Local<Function> fun = maybe_fun.ToLocalChecked();
  // XXX(joyeecheung): this bookkeeping is not exactly accurate because
  // it only starts after the Environment is created, so the per_context.js
  // will never be in any of these two sets, but the two sets are only for
  // testing anyway.

  *result = (has_cache && !script_source.GetCachedData()->rejected)
                ? Result::kWithCache
                : Result::kWithoutCache;
  // Generate new cache for next compilation
  std::unique_ptr<ScriptCompiler::CachedData> new_cached_data(
      ScriptCompiler::CreateCodeCacheForFunction(fun));
  CHECK_NOT_NULL(new_cached_data);

  // The old entry should've been erased by now so we can just emplace
  code_cache_.emplace(id, std::move(new_cached_data));

  return scope.Escape(fun);
}

This function first calls LoadBuiltinModuleSource to obtain the source of the file.

MaybeLocal<String> NativeModuleLoader::LoadBuiltinModuleSource(Isolate* isolate,
                                                               const char* id) {
#ifdef NODE_BUILTIN_MODULES_PATH
  std::string filename = OnDiskFileName(id);

  uv_fs_t req;
  uv_file file =
      uv_fs_open(nullptr, &req, filename.c_str(), O_RDONLY, 0, nullptr);
  CHECK_GE(req.result, 0);
  uv_fs_req_cleanup(&req);

  std::shared_ptr<void> defer_close(nullptr, [file](...) {
    uv_fs_t close_req;
    CHECK_EQ(0, uv_fs_close(nullptr, &close_req, file, nullptr));
    uv_fs_req_cleanup(&close_req);
  });

  std::string contents;
  char buffer[4096];
  uv_buf_t buf = uv_buf_init(buffer, sizeof(buffer));

  while (true) {
    const int r =
        uv_fs_read(nullptr, &req, file, &buf, 1, contents.length(), nullptr);
    CHECK_GE(req.result, 0);
    uv_fs_req_cleanup(&req);
    if (r <= 0) {
      break;
    }
    contents.append(buf.base, r);
  }

  return String::NewFromUtf8(
      isolate, contents.c_str(), v8::NewStringType::kNormal, contents.length());
#else
  const auto source_it = source_.find(id);
  CHECK_NE(source_it, source_.end());
  return source_it->second.ToStringChecked(isolate);
#endif  // NODE_BUILTIN_MODULES_PATH
}

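When NODE_BUILTIN_MODULES_PATH is defined, the source is read from disk with libuv's file system API; otherwise it comes from the in-memory source_ map. LookupAndCompile then calls CompileFunctionInContext, which compiles the source as the body of a function whose parameters carry the names passed in earlier (process, getLinkedBinding and so on). When ExecuteBootstrapper later calls that function with loaders_args, the code in internal/bootstrap/loaders runs and can call C++ functions such as getLinkedBinding. Now it is time to step into the loaders file.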
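The split between parameter names at compile time and argument values at call time can be illustrated with the standard Function constructor. This is only a plain-JavaScript analogue of the idea, not the V8 CompileFunctionInContext API, and the source string, fakeProcess, and the two binding stubs are made up for the example.

```javascript
// Stand-in for the source text of internal/bootstrap/loaders.
const source = `
  const binding = getInternalBinding('buffer');
  return { kind: typeof process, binding };
`;

// Parameter names (cf. loaders_params) are fixed when the function is compiled.
const params = ['process', 'getLinkedBinding', 'getInternalBinding', 'primordials'];
const fn = new Function(...params, source);

// The actual values (cf. loaders_args) are supplied only when it is called.
const fakeProcess = { title: 'sketch' };
const args = [
  fakeProcess,
  (name) => ({ linked: name }),    // hypothetical getLinkedBinding stub
  (name) => ({ internal: name }),  // hypothetical getInternalBinding stub
  { ObjectFreeze: Object.freeze }, // hypothetical primordials subset
];
const result = fn(...args);

console.log(result.kind);             // object
console.log(result.binding.internal); // buffer
```

The source text never has to declare process or getInternalBinding itself; it sees them as ordinary parameters, which is how the loaders file can use them without importing anything.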

The purpose of this file is stated clearly in the comment at its top.

// This file creates the internal module & binding loaders used by built-in
// modules. In contrast, user land modules are loaded using
// lib/internal/modules/cjs/loader.js (CommonJS Modules) or
// lib/internal/modules/esm/* (ES Modules).

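The internal module loader is the loader for native modules written in JS, and it gives them the ability to call built-in modules through the binding loaders. Note that modules users bring in through CommonJS and ES modules go through a different mechanism. That belongs to the second bootstrap step covered in the next article, so it is left aside here. The internal module loader created in this file is also described in a comment: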

// Internal JavaScript module loader:
// - NativeModule: a minimal module system used to load the JavaScript core
//   modules found in lib/**/*.js and deps/**/*.js. All core modules are
//   compiled into the node binary via node_javascript.cc generated by js2c.py,
//   so they can be loaded faster without the cost of I/O. This class makes the
//   lib/internal/*, deps/internal/* modules and internalBinding() available by
//   default to core modules, and lets the core modules require itself via
//   require('internal/bootstrap/loaders') even when this file is not written in
//   CommonJS style.

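With the purpose clear, look at the actual contents. First, the file defines the moduleLoadList property on the process object to record the modules that have been loaded.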

// Set up process.moduleLoadList.
const moduleLoadList = [];
ObjectDefineProperty(process, 'moduleLoadList', {
  value: moduleLoadList,
  configurable: true,
  enumerable: true,
  writable: false
});
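As a rough way to inspect this list from user code, the sketch below groups its entries by the label in front of the module id. The groupByKind helper is hypothetical, and the entry format (a label, a space, then the id) is an assumption based on the strings pushed elsewhere in this file, so the grouping may need adjusting on other Node versions.

```javascript
// Group load-list entries by everything before the last space,
// e.g. "NativeModule internal/errors" is counted under "NativeModule".
function groupByKind(entries) {
  const counts = {};
  for (const entry of entries) {
    const cut = entry.lastIndexOf(' ');
    const kind = cut === -1 ? entry : entry.slice(0, cut);
    counts[kind] = (counts[kind] || 0) + 1;
  }
  return counts;
}

// Synthetic input, so the result does not depend on the running Node version.
const sample = ['NativeModule a', 'NativeModule b', 'Internal Binding buffer'];
console.log(JSON.stringify(groupByKind(sample)));
// {"NativeModule":2,"Internal Binding":1}

// On a real process, fall back to an empty list if the property is absent.
console.log(groupByKind(process.moduleLoadList || []));
```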

Next it defines process.binding and process._linkedBinding, which obtain built-in modules through getInternalBinding (wrapped as internalBinding) and getLinkedBinding respectively.

  process.binding = function binding(module) {
    module = String(module);
    // Deprecated specific process.binding() modules, but not all, allow
    // selective fallback to internalBinding for the deprecated ones.
    if (internalBindingWhitelist.has(module)) {
      return internalBinding(module);
    }
    // eslint-disable-next-line no-restricted-syntax
    throw new Error(`No such module: ${module}`);
  };

  process._linkedBinding = function _linkedBinding(module) {
    module = String(module);
    let mod = bindingObj[module];
    if (typeof mod !== 'object')
      mod = bindingObj[module] = getLinkedBinding(module);
    return mod;
  };
}
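The caching in _linkedBinding is worth a closer look: the C++ lookup runs only on the first request for a name, and later requests get the same object back. The sketch below reproduces that pattern with a hypothetical getLinkedBinding stub that counts its calls; the linkedBinding wrapper and the my_addon name are invented for the example.

```javascript
// Hypothetical stand-in for the C++ getLinkedBinding; counts how often it runs.
const initCount = new Map();
function getLinkedBinding(name) {
  initCount.set(name, (initCount.get(name) || 0) + 1);
  return { name };
}

// Cache keyed by module name, with no prototype so names cannot collide
// with inherited properties.
const bindingObj = Object.create(null);

function linkedBinding(module) {
  module = String(module);
  let mod = bindingObj[module];
  if (typeof mod !== 'object')
    mod = bindingObj[module] = getLinkedBinding(module);
  return mod;
}

const first = linkedBinding('my_addon');
const second = linkedBinding('my_addon');
console.log(first === second);          // true
console.log(initCount.get('my_addon')); // 1
```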

Then it sets up the NativeModule class mentioned in the comment above, which loads the JS modules that js2c.py compiled into node_javascript.cc. Finally, pay attention to what this file returns.

const loaderExports = {
  internalBinding,
  NativeModule,
  require: nativeModuleRequire
};

The roles of internalBinding and NativeModule are already clear, so the thing to look at is how require works. Find nativeModuleRequire.

function nativeModuleRequire(id) {
  if (id === loaderId) {
    return loaderExports;
  }

  const mod = NativeModule.map.get(id);
  // Can't load the internal errors module from here, have to use a raw error.
  // eslint-disable-next-line no-restricted-syntax
  if (!mod) throw new TypeError(`Missing internal module '${id}'`);
  return mod.compileForInternalLoader();
}

It looks up the NativeModule instance for the required id and then calls its compileForInternalLoader method.

  compileForInternalLoader() {
    if (this.loaded || this.loading) {
      return this.exports;
    }

    const id = this.id;
    this.loading = true;

    try {
      const requireFn = this.id.startsWith('internal/deps/') ?
        requireWithFallbackInDeps : nativeModuleRequire;

      const fn = compileFunction(id);
      fn(this.exports, requireFn, this, process, internalBinding, primordials);

      this.loaded = true;
    } finally {
      this.loading = false;
    }

    moduleLoadList.push(`NativeModule ${id}`);
    return this.exports;
  }
}

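This function uses compileFunction, which is provided by the native_module binding on the C++ side, to wrap the JS module in a function that takes six parameters, runs it, and returns its exports. With that, the JS modules and the C++ modules are finally connected. Taking buffer as the example again, open lib/internal/buffer.js: it is now clear where the require and internalBinding functions it calls come from.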
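A minimal sketch of this loader pattern, assuming a hypothetical MiniModule class and two made-up module sources, shows why the loaded and loading flags matter: they let a circular require return the partially filled exports instead of recursing forever. The Function constructor stands in for compileFunction, and only three of the six parameters are modelled.

```javascript
// Hypothetical module sources with a circular dependency between 'a' and 'b'.
const sources = {
  a: `exports.name = 'a'; exports.fromB = require('b').name;`,
  b: `exports.name = 'b'; exports.sawA = typeof require('a').name;`,
};
const loadOrder = [];

class MiniModule {
  static map = new Map();

  constructor(id) {
    this.id = id;
    this.exports = {};
    this.loaded = false;
    this.loading = false;
  }

  compileForInternalLoader() {
    // A module that is loaded, or still loading, hands back its exports as-is.
    if (this.loaded || this.loading) return this.exports;
    this.loading = true;
    try {
      // Stand-in for compileFunction(id): wrap the source in a function.
      const fn = new Function('exports', 'require', 'module', sources[this.id]);
      fn(this.exports, miniRequire, this);
      this.loaded = true;
    } finally {
      this.loading = false;
    }
    loadOrder.push(this.id);
    return this.exports;
  }
}

for (const id of Object.keys(sources)) MiniModule.map.set(id, new MiniModule(id));

function miniRequire(id) {
  const mod = MiniModule.map.get(id);
  if (!mod) throw new TypeError(`Missing internal module '${id}'`);
  return mod.compileForInternalLoader();
}

const modA = miniRequire('a');
console.log(modA.fromB);            // b
console.log(miniRequire('b').sawA); // string
console.log(loadOrder.join(','));   // b,a
```

Module b finishes first because a pauses in the middle of its body while b runs, which is why the load order is b then a.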

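Summary

This article covered how Node prepares its built-in modules and native modules. Module preparation does not end there: when a piece of Node code runs, there are also user-defined modules and third-party modules installed through npm. Those are covered in the next article, on the second bootstrap step.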
