Node Source Code Walkthrough: BootstrapInternalLoaders
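In the previous article we saw that Node calls the run method at startup. Inside run there is a phase that prepares the Environment, and a great deal of bootstrap work happens there. At a high level this work splits into two steps. The first, BootstrapInternalLoaders, is the subject of this article. It prepares the built-in modules (written in C++) and the native modules (Node's own JS modules). Let's dive into the source.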
Bootstrap entry point
The entry point is the CreateMainEnvironment method in node_main_instance.cc:
std::unique_ptr<Environment> NodeMainInstance::CreateMainEnvironment(
    int* exit_code) {
  *exit_code = 0;  // Reset the exit code to 0
  HandleScope handle_scope(isolate_);
  // TODO(addaleax): This should load a real per-Isolate option, currently
  // this is still effectively per-process.
  if (isolate_data_->options()->track_heap_objects) {
    isolate_->GetHeapProfiler()->StartTrackingHeapObjects(true);
  }
  Local<Context> context;
  if (deserialize_mode_) {
    context =
        Context::FromSnapshot(isolate_, kNodeContextIndex).ToLocalChecked();
    InitializeContextRuntime(context);
    IsolateSettings s;
    SetIsolateErrorHandlers(isolate_, s);
  } else {
    context = NewContext(isolate_);
  }
  CHECK(!context.IsEmpty());
  Context::Scope context_scope(context);
  std::unique_ptr<Environment> env = std::make_unique<Environment>(
      isolate_data_.get(),
      context,
      args_,
      exec_args_,
      static_cast<Environment::Flags>(Environment::kIsMainThread |
                                      Environment::kOwnsProcessState |
                                      Environment::kOwnsInspector));
  env->InitializeLibuv(per_process::v8_is_profiling);
  env->InitializeDiagnostics();
  // TODO(joyeecheung): when we snapshot the bootstrapped context,
  // the inspector and diagnostics setup should after after deserialization.
#if HAVE_INSPECTOR
  *exit_code = env->InitializeInspector({});
#endif
  if (*exit_code != 0) {
    return env;
  }
  if (env->RunBootstrapping().IsEmpty()) {
    *exit_code = 1;
  }
  return env;
}
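The method first obtains a V8 context, either deserialized from the snapshot or created with NewContext. It then constructs the Environment, held in the std::unique_ptr env, and calls InitializeLibuv and InitializeDiagnostics on it to set up libuv (which drives the event loop) and the diagnostics facilities. If Node was built with inspector support (HAVE_INSPECTOR), the inspector is initialized as well. Once everything is ready, execution enters Environment::RunBootstrapping in node.cc, which formally starts the bootstrap process.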
MaybeLocal<Value> Environment::RunBootstrapping() {
  EscapableHandleScope scope(isolate_);
  CHECK(!has_run_bootstrapping_code());
  if (BootstrapInternalLoaders().IsEmpty()) {
    return MaybeLocal<Value>();
  }
  Local<Value> result;
  if (!BootstrapNode().ToLocal(&result)) {
    return MaybeLocal<Value>();
  }
  // Make sure that no request or handle is created during bootstrap -
  // if necessary those should be done in pre-execution.
  // Usually, doing so would trigger the checks present in the ReqWrap and
  // HandleWrap classes, so this is only a consistency check.
  CHECK(req_wrap_queue()->IsEmpty());
  CHECK(handle_wrap_queue()->IsEmpty());
  set_has_run_bootstrapping_code(true);
  return scope.Escape(result);
}
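Apart from a few consistency checks, the method consists of the two steps mentioned at the start of this article: BootstrapInternalLoaders and BootstrapNode. BootstrapNode only runs if BootstrapInternalLoaders succeeds. This article focuses on the former.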
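To make that control flow concrete, here is a minimal sketch of the "run the phases in order, bail out on the first failure" shape of RunBootstrapping. It is plain C++ and uses std::optional as a stand-in for V8's MaybeLocal, with an empty value meaning an exception is pending. RunBootstrappingSketch, Phase and the trace vector are hypothetical names for illustration, not Node APIs.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// std::optional stands in for v8::MaybeLocal<Value>: empty means "an exception is pending".
using MaybeValue = std::optional<std::string>;
using Phase = std::function<MaybeValue()>;

// Hypothetical mirror of Environment::RunBootstrapping: run each phase in order,
// stop at the first one that returns an empty value, otherwise return the last result.
MaybeValue RunBootstrappingSketch(const std::vector<Phase>& phases,
                                  std::vector<std::string>* trace) {
  MaybeValue result;
  for (const Phase& phase : phases) {
    result = phase();
    if (!result.has_value()) {
      // Later phases are skipped, like BootstrapNode after a failed
      // BootstrapInternalLoaders.
      return std::nullopt;
    }
    trace->push_back(*result);  // record which phases completed (demo only)
  }
  return result;
}
```

With a failing first phase the second phase is never invoked, which is the same guarantee RunBootstrapping gives BootstrapNode.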
Create Binding Loaders
MaybeLocal<Value> Environment::BootstrapInternalLoaders() {
  EscapableHandleScope scope(isolate_);
  // Create binding loaders
  std::vector<Local<String>> loaders_params = {
      process_string(),
      FIXED_ONE_BYTE_STRING(isolate_, "getLinkedBinding"),
      FIXED_ONE_BYTE_STRING(isolate_, "getInternalBinding"),
      primordials_string()};
  std::vector<Local<Value>> loaders_args = {
      process_object(),
      NewFunctionTemplate(binding::GetLinkedBinding)
          ->GetFunction(context())
          .ToLocalChecked(),
      NewFunctionTemplate(binding::GetInternalBinding)
          ->GetFunction(context())
          .ToLocalChecked(),
      primordials()};
  // Bootstrap internal loaders
  Local<Value> loader_exports;
  if (!ExecuteBootstrapper(
           this, "internal/bootstrap/loaders", &loaders_params, &loaders_args)
           .ToLocal(&loader_exports)) {
    return MaybeLocal<Value>();
  }
  CHECK(loader_exports->IsObject());
  Local<Object> loader_exports_obj = loader_exports.As<Object>();
  Local<Value> internal_binding_loader =
      loader_exports_obj->Get(context(), internal_binding_string())
          .ToLocalChecked();
  CHECK(internal_binding_loader->IsFunction());
  set_internal_binding_loader(internal_binding_loader.As<Function>());
  Local<Value> require =
      loader_exports_obj->Get(context(), require_string()).ToLocalChecked();
  CHECK(require->IsFunction());
  set_native_module_require(require.As<Function>());
  return scope.Escape(loader_exports);
}
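As the comments show, this method is split into two smaller steps: Create binding loaders and Bootstrap internal loaders. The first step prepares two parallel vectors, loaders_params and loaders_args, for the second step to use. They line up entry by entry: loaders_params holds the parameter names as strings, while loaders_args holds the actual values bound to those names. There are four of them:
- process: Node's process object
- getLinkedBinding: a JS function created from a V8 FunctionTemplate that wraps the C++ binding::GetLinkedBinding; it lets JS code fetch C++ modules
- getInternalBinding: works like getLinkedBinding, but wraps binding::GetInternalBinding
- primordials: pristine copies of commonly used JS built-in objects and functions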
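Conceptually, loaders_params and loaders_args behave like the formal parameter list and the argument list of one function call: the names are fixed when the function is compiled, and the values are bound by position when it is called. The toy model below illustrates only that pairing. Value is just std::string here and CompileWithParams is a hypothetical helper; this is not how V8 represents compiled functions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Toy value type: every argument is just a string in this model.
using Value = std::string;
using Scope = std::map<std::string, Value>;
using CompiledFn = std::function<Value(const std::vector<Value>&)>;

// Hypothetical helper: `params` become the formal parameter names of a function
// whose body is `body`; arguments are bound to those names by position at call time.
CompiledFn CompileWithParams(std::vector<std::string> params,
                             std::function<Value(const Scope&)> body) {
  return [params, body](const std::vector<Value>& args) {
    if (args.size() != params.size()) {
      throw std::invalid_argument("arity mismatch");
    }
    Scope scope;
    for (std::size_t i = 0; i < params.size(); ++i) {
      scope[params[i]] = args[i];  // i-th name gets the i-th value
    }
    return body(scope);
  };
}
```

Keeping the names and values in two separate vectors, as BootstrapInternalLoaders does, means the same compiled function could be called again with different values for the same names.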
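The two to focus on are getLinkedBinding and getInternalBinding. In the next step we will create the native modules and pass these two functions in, and it is through them that native modules written in JS can call built-in modules written in C++. Before looking at how these two functions are implemented, you can probably guess that they must eventually fetch built-in modules from some central place. But where exactly are the built-in modules stored? To find out, we need to go back to one of the very first initialization steps at startup, InitializeOncePerProcess, which calls InitializeNodeWithArgs: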
int InitializeNodeWithArgs(std::vector<std::string>* argv,
                           std::vector<std::string>* exec_argv,
                           std::vector<std::string>* errors) {
  // Make sure InitializeNodeWithArgs() is called only once.
  CHECK(!init_called.exchange(true));
  // Initialize node_start_time to get relative uptime.
  per_process::node_start_time = uv_hrtime();
  // Register built-in modules
  binding::RegisterBuiltinModules();
  // Make inherited handles noninheritable.
  uv_disable_stdio_inheritance();
  // ...
Here is RegisterBuiltinModules, implemented in node_binding.cc:
void RegisterBuiltinModules() {
#define V(modname) _register_##modname();
  NODE_BUILTIN_MODULES(V)
#undef V
}
This relies on the C++ macro NODE_BUILTIN_MODULES, so let's track it down:
#define NODE_BUILTIN_MODULES(V) \
  NODE_BUILTIN_STANDARD_MODULES(V) \
  NODE_BUILTIN_OPENSSL_MODULES(V) \
  NODE_BUILTIN_ICU_MODULES(V) \
  NODE_BUILTIN_REPORT_MODULES(V) \
  NODE_BUILTIN_PROFILER_MODULES(V) \
  NODE_BUILTIN_DTRACE_MODULES(V)
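This is further broken down into one macro per category of built-in module. After preprocessing, RegisterBuiltinModules ends up calling one registration function per built-in module: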
void RegisterBuiltinModules() {
  _register_async_wrap();
  _register_buffer();
  _register_cares_wrap();
  // ...
}
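NODE_BUILTIN_MODULES follows the X-macro pattern: the module list is written once and then expanded with different definitions of V. The self-contained sketch below uses a made-up DEMO_BUILTIN_MODULES list. Unlike Node, it also defines the _register_ functions in the same file through a second expansion; in Node each module's own source file provides that definition.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace demo {

// Hypothetical module list written once, in the style of NODE_BUILTIN_MODULES(V).
#define DEMO_BUILTIN_MODULES(V) \
  V(async_wrap)                 \
  V(buffer)                     \
  V(cares_wrap)

std::vector<std::string> registered;  // records the order of registration

// Expansion 1: define one _register_<modname>() function per module.
#define V(modname) \
  void _register_##modname() { registered.push_back(#modname); }
DEMO_BUILTIN_MODULES(V)
#undef V

// Expansion 2: call every registration function, as RegisterBuiltinModules() does.
void RegisterBuiltinModules() {
#define V(modname) _register_##modname();
  DEMO_BUILTIN_MODULES(V)
#undef V
}

}  // namespace demo
```

Adding a module to the list is then a one-line change, and every expansion site picks it up automatically.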
So where are these registration functions defined? The comment explains:
// This is used to load built-in modules. Instead of using
// __attribute__((constructor)), we call the _register_<modname>
// function for each built-in modules explicitly in
// binding::RegisterBuiltinModules(). This is only forward declaration.
// The definitions are in each module's implementation when calling
// the NODE_MODULE_CONTEXT_AWARE_INTERNAL.
Let's verify this with any built-in module, for example node_buffer.cc, and jump straight to the end of the file:
NODE_MODULE_CONTEXT_AWARE_INTERNAL(buffer, node::Buffer::Initialize)
Sure enough. The next step is to find the definition of this macro, which lives in node_binding.h:
#define NODE_MODULE_CONTEXT_AWARE_INTERNAL(modname, regfunc) \
  NODE_MODULE_CONTEXT_AWARE_CPP(modname, regfunc, nullptr, NM_F_INTERNAL)
Next, follow NODE_MODULE_CONTEXT_AWARE_CPP:
#define NODE_MODULE_CONTEXT_AWARE_CPP(modname, regfunc, priv, flags) \
  static node::node_module _module = { \
      NODE_MODULE_VERSION, \
      flags, \
      nullptr, \
      __FILE__, \
      nullptr, \
      (node::addon_context_register_func)(regfunc), \
      NODE_STRINGIFY(modname), \
      priv, \
      nullptr}; \
  void _register_##modname() { node_module_register(&_module); }
The generated _register_<modname> function ends up calling node_module_register in node_binding.cc:
extern "C" void node_module_register(void* m) {
  struct node_module* mp = reinterpret_cast<struct node_module*>(m);
  if (mp->nm_flags & NM_F_INTERNAL) {
    mp->nm_link = modlist_internal;
    modlist_internal = mp;
  } else if (!node_is_initialized) {
    // "Linked" modules are included as part of the node project.
    // Like builtins they are registered *before* node::Init runs.
    mp->nm_flags = NM_F_LINKED;
    mp->nm_link = modlist_linked;
    modlist_linked = mp;
  } else {
    thread_local_modpending = mp;
  }
}
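Depending on whether a module is registered as internal or linked, node_module_register pushes it onto the head of the modlist_internal or the modlist_linked linked list. Modules registered after Node has finished initializing take the third branch and are stored in thread_local_modpending instead. So this is where the built-in modules are kept. As you may have noticed, the two lists correspond exactly to the two functions bound earlier: getInternalBinding searches modlist_internal, and getLinkedBinding searches modlist_linked.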
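The registration logic boils down to pushing onto the head of an intrusive singly linked list, where each node_module carries its own nm_link pointer. The sketch below mirrors the first two branches of node_module_register using a trimmed-down, hypothetical demo_module struct and demo flag constants. The third branch (thread_local_modpending) is left out.

```cpp
#include <cassert>
#include <cstring>

// Trimmed-down, hypothetical stand-in for node::node_module.
struct demo_module {
  const char* nm_modname;
  unsigned nm_flags;
  demo_module* nm_link;  // intrusive "next" pointer
};

constexpr unsigned kDemoInternal = 1u << 0;  // plays the role of NM_F_INTERNAL
constexpr unsigned kDemoLinked = 1u << 1;    // plays the role of NM_F_LINKED

demo_module* demo_modlist_internal = nullptr;  // like modlist_internal
demo_module* demo_modlist_linked = nullptr;    // like modlist_linked

// Mirrors the first two branches of node_module_register: prepend to the matching list.
void demo_module_register(demo_module* mp) {
  if (mp->nm_flags & kDemoInternal) {
    mp->nm_link = demo_modlist_internal;
    demo_modlist_internal = mp;
  } else {
    mp->nm_flags = kDemoLinked;
    mp->nm_link = demo_modlist_linked;
    demo_modlist_linked = mp;
  }
}
```

Because new entries are prepended, the most recently registered module sits at the head of each list, and registration needs no allocation since each module's static node_module struct doubles as the list node.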
The natural next step is to confirm the details in GetInternalBinding:
void GetInternalBinding(const FunctionCallbackInfo<Value>& args) {
  Environment* env = Environment::GetCurrent(args);
  CHECK(args[0]->IsString());
  Local<String> module = args[0].As<String>();
  node::Utf8Value module_v(env->isolate(), module);
  Local<Object> exports;
  node_module* mod = FindModule(modlist_internal, *module_v, NM_F_INTERNAL);
  if (mod != nullptr) {
    exports = InitModule(env, mod, module);
  }
  // ...
}
It is simply a find followed by an init. Let's step into FindModule:
inline struct node_module* FindModule(struct node_module* list,
                                      const char* name,
                                      int flag) {
  struct node_module* mp;
  for (mp = list; mp != nullptr; mp = mp->nm_link) {
    if (strcmp(mp->nm_modname, name) == 0) break;
  }
  CHECK(mp == nullptr || (mp->nm_flags & flag) != 0);
  return mp;
}
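FindModule's lookup is a plain linear walk along nm_link. Here is a standalone sketch of the same loop. The struct and flag names are hypothetical, and where Node's CHECK aborts the process on a flag mismatch, this version returns nullptr instead so that the behavior can be tested.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for node::node_module.
struct demo_module {
  const char* nm_modname;
  unsigned nm_flags;
  demo_module* nm_link;
};

constexpr unsigned kDemoInternal = 1u << 0;  // plays the role of NM_F_INTERNAL

// Same loop as FindModule: follow nm_link until the name matches or the list ends.
// Node CHECKs the flag and aborts on a mismatch; this sketch returns nullptr instead.
demo_module* demo_find_module(demo_module* list, const char* name, unsigned flag) {
  demo_module* mp;
  for (mp = list; mp != nullptr; mp = mp->nm_link) {
    if (std::strcmp(mp->nm_modname, name) == 0) break;
  }
  if (mp != nullptr && (mp->nm_flags & flag) == 0) return nullptr;
  return mp;
}
```

A linear scan is presumably cheap enough here because the list only holds a few dozen entries, and on the JS side internalBinding caches each binding object after its first lookup.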
As expected, FindModule simply searches the list by module name. Once the module is found, InitModule takes over:
static Local<Object> InitModule(Environment* env,
                                node_module* mod,
                                Local<String> module) {
  Local<Object> exports = Object::New(env->isolate());
  // Internal bindings don't have a "module" object, only exports.
  CHECK_NULL(mod->nm_register_func);
  CHECK_NOT_NULL(mod->nm_context_register_func);
  Local<Value> unused = Undefined(env->isolate());
  mod->nm_context_register_func(exports, unused, env->context(), mod->nm_priv);
  return exports;
}
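InitModule calls the module's nm_context_register_func. After macro expansion this is the regfunc passed to NODE_MODULE_CONTEXT_AWARE_INTERNAL, that is, each built-in module's Initialize method. Taking node_buffer.cc as the example again: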
void Initialize(Local<Object> target,
                Local<Value> unused,
                Local<Context> context,
                void* priv) {
  Environment* env = Environment::GetCurrent(context);
  env->SetMethod(target, "setBufferPrototype", SetBufferPrototype);
  env->SetMethodNoSideEffect(target, "createFromString", CreateFromString);
  env->SetMethodNoSideEffect(target, "byteLengthUtf8", ByteLengthUtf8);
  env->SetMethod(target, "copy", Copy);
  env->SetMethodNoSideEffect(target, "compare", Compare);
  env->SetMethodNoSideEffect(target, "compareOffset", CompareOffset);
  env->SetMethod(target, "fill", Fill);
  // ...
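Initialize attaches the module's methods to target, which is the exports object created in InitModule, and InitModule then returns that object. At this point we know how built-in modules are located and initialized. Next, let's see how native modules written in JS get to call these built-in modules.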
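The contract between InitModule and a module's Initialize function can be modeled without V8: InitModule creates an empty exports object, the module's register function fills it, and InitModule returns it. In the sketch below, Exports is a std::map of callables standing in for a JS object, and DemoBufferInitialize is a hypothetical stand-in for node::Buffer::Initialize.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// A std::map of callables stands in for the JS exports object (assumption).
using Exports = std::map<std::string, std::function<int(int, int)>>;
using DemoContextRegisterFunc = void (*)(Exports& target, void* priv);

// Hypothetical stand-in for node_module, keeping only what InitModule touches.
struct demo_module {
  const char* nm_modname;
  DemoContextRegisterFunc nm_context_register_func;
  void* nm_priv;
};

// Plays the role of node::Buffer::Initialize: attach methods onto `target`.
void DemoBufferInitialize(Exports& target, void* /*priv*/) {
  // Three-way comparison returning -1, 0, or 1.
  target["compare"] = [](int a, int b) { return (a > b) - (a < b); };
}

// Mirrors InitModule: create a fresh exports object, let the module fill it, return it.
Exports DemoInitModule(const demo_module& mod) {
  Exports exports;
  mod.nm_context_register_func(exports, mod.nm_priv);
  return exports;
}
```

Because the module only ever sees the target it is handed, InitModule can produce a fresh exports object on every call.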
Bootstrap internal loaders
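With the parameters ready, BootstrapInternalLoaders calls ExecuteBootstrapper to run the internal/bootstrap/loaders file. Before opening that file, let's see how ExecuteBootstrapper runs it. All we pass in is a string id, so there must be a lookup-and-compile step somewhere.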
MaybeLocal<Value> ExecuteBootstrapper(Environment* env,
                                      const char* id,
                                      std::vector<Local<String>>* parameters,
                                      std::vector<Local<Value>>* arguments) {
  EscapableHandleScope scope(env->isolate());
  MaybeLocal<Function> maybe_fn =
      NativeModuleEnv::LookupAndCompile(env->context(), id, parameters, env);
  if (maybe_fn.IsEmpty()) {
    return MaybeLocal<Value>();
  }
  Local<Function> fn = maybe_fn.ToLocalChecked();
  MaybeLocal<Value> result = fn->Call(env->context(),
                                      Undefined(env->isolate()),
                                      arguments->size(),
                                      arguments->data());
  // If there was an error during bootstrap, it must be unrecoverable
  // (e.g. max call stack exceeded). Clear the stack so that the
  // AsyncCallbackScope destructor doesn't fail on the id check.
  // There are only two ways to have a stack size > 1: 1) the user manually
  // called MakeCallback or 2) user awaited during bootstrap, which triggered
  // _tickCallback().
  if (result.IsEmpty()) {
    env->async_hooks()->clear_async_id_stack();
  }
  return scope.EscapeMaybe(result);
}
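Sure enough, ExecuteBootstrapper calls NativeModuleEnv::LookupAndCompile, which delegates to NativeModuleLoader::LookupAndCompile in node_native_module.cc. That function uses the id to find the module's source and compiles it into maybe_fn, which ExecuteBootstrapper then executes with Call, passing loaders_args as the arguments. Let's look at LookupAndCompile: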
MaybeLocal<Function> NativeModuleLoader::LookupAndCompile(
    Local<Context> context,
    const char* id,
    std::vector<Local<String>>* parameters,
    NativeModuleLoader::Result* result) {
  Isolate* isolate = context->GetIsolate();
  EscapableHandleScope scope(isolate);
  Local<String> source;
  if (!LoadBuiltinModuleSource(isolate, id).ToLocal(&source)) {
    return {};
  }
  std::string filename_s = id + std::string(".js");
  Local<String> filename =
      OneByteString(isolate, filename_s.c_str(), filename_s.size());
  Local<Integer> line_offset = Integer::New(isolate, 0);
  Local<Integer> column_offset = Integer::New(isolate, 0);
  ScriptOrigin origin(filename, line_offset, column_offset, True(isolate));
  Mutex::ScopedLock lock(code_cache_mutex_);
  ScriptCompiler::CachedData* cached_data = nullptr;
  {
    auto cache_it = code_cache_.find(id);
    if (cache_it != code_cache_.end()) {
      // Transfer ownership to ScriptCompiler::Source later.
      cached_data = cache_it->second.release();
      code_cache_.erase(cache_it);
    }
  }
  const bool has_cache = cached_data != nullptr;
  ScriptCompiler::CompileOptions options =
      has_cache ? ScriptCompiler::kConsumeCodeCache
                : ScriptCompiler::kEagerCompile;
  ScriptCompiler::Source script_source(source, origin, cached_data);
  MaybeLocal<Function> maybe_fun =
      ScriptCompiler::CompileFunctionInContext(context,
                                               &script_source,
                                               parameters->size(),
                                               parameters->data(),
                                               0,
                                               nullptr,
                                               options);
  // This could fail when there are early errors in the native modules,
  // e.g. the syntax errors
  if (maybe_fun.IsEmpty()) {
    // In the case of early errors, v8 is already capable of
    // decorating the stack for us - note that we use CompileFunctionInContext
    // so there is no need to worry about wrappers.
    return MaybeLocal<Function>();
  }
  Local<Function> fun = maybe_fun.ToLocalChecked();
  // XXX(joyeecheung): this bookkeeping is not exactly accurate because
  // it only starts after the Environment is created, so the per_context.js
  // will never be in any of these two sets, but the two sets are only for
  // testing anyway.
  *result = (has_cache && !script_source.GetCachedData()->rejected)
                ? Result::kWithCache
                : Result::kWithoutCache;
  // Generate new cache for next compilation
  std::unique_ptr<ScriptCompiler::CachedData> new_cached_data(
      ScriptCompiler::CreateCodeCacheForFunction(fun));
  CHECK_NOT_NULL(new_cached_data);
  // The old entry should've been erased by now so we can just emplace
  code_cache_.emplace(id, std::move(new_cached_data));
  return scope.Escape(fun);
}
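The code-cache bookkeeping in LookupAndCompile follows a take-then-replace pattern. Under the mutex, an existing cache entry is removed from the map and handed to the compiler, and after compiling, a freshly generated cache is stored back under the same id. The sketch below reproduces only that bookkeeping: DemoCachedData is a placeholder struct, the compile step itself is elided, and the rejected-cache check is omitted.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Placeholder for ScriptCompiler::CachedData (assumption: just a label).
struct DemoCachedData {
  std::string label;
};

class DemoCodeCache {
 public:
  // Returns true when a cached entry was available for this compile,
  // loosely mirroring Result::kWithCache (the "rejected" check is omitted).
  bool Compile(const std::string& id) {
    std::lock_guard<std::mutex> lock(mutex_);
    std::unique_ptr<DemoCachedData> cached;
    auto it = cache_.find(id);
    if (it != cache_.end()) {
      cached = std::move(it->second);  // take ownership, like release() + erase()
      cache_.erase(it);
    }
    const bool has_cache = cached != nullptr;
    // ... the real code compiles here, consuming `cached` if present ...
    // Store a freshly generated cache for the next compilation of this id.
    cache_.emplace(id, std::make_unique<DemoCachedData>(
                           DemoCachedData{"cache for " + id}));
    return has_cache;
  }

  bool Has(const std::string& id) {
    std::lock_guard<std::mutex> lock(mutex_);
    return cache_.count(id) != 0;
  }

 private:
  std::mutex mutex_;
  std::map<std::string, std::unique_ptr<DemoCachedData>> cache_;
};
```

Erasing the old entry before compiling is what allows the final insertion to be a plain emplace, which is exactly what the comment in the original code points out.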
LookupAndCompile first calls LoadBuiltinModuleSource to obtain the module's source code:
MaybeLocal<String> NativeModuleLoader::LoadBuiltinModuleSource(Isolate* isolate,
                                                               const char* id) {
#ifdef NODE_BUILTIN_MODULES_PATH
  std::string filename = OnDiskFileName(id);
  uv_fs_t req;
  uv_file file =
      uv_fs_open(nullptr, &req, filename.c_str(), O_RDONLY, 0, nullptr);
  CHECK_GE(req.result, 0);
  uv_fs_req_cleanup(&req);
  std::shared_ptr<void> defer_close(nullptr, [file](...) {
    uv_fs_t close_req;
    CHECK_EQ(0, uv_fs_close(nullptr, &close_req, file, nullptr));
    uv_fs_req_cleanup(&close_req);
  });
  std::string contents;
  char buffer[4096];
  uv_buf_t buf = uv_buf_init(buffer, sizeof(buffer));
  while (true) {
    const int r =
        uv_fs_read(nullptr, &req, file, &buf, 1, contents.length(), nullptr);
    CHECK_GE(req.result, 0);
    uv_fs_req_cleanup(&req);
    if (r <= 0) {
      break;
    }
    contents.append(buf.base, r);
  }
  return String::NewFromUtf8(
      isolate, contents.c_str(), v8::NewStringType::kNormal, contents.length());
#else
  const auto source_it = source_.find(id);
  CHECK_NE(source_it, source_.end());
  return source_it->second.ToStringChecked(isolate);
#endif  // NODE_BUILTIN_MODULES_PATH
}
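When NODE_BUILTIN_MODULES_PATH is defined, the source is read from disk with libuv's file system APIs, called synchronously since no callback is passed. Otherwise it is taken from source_, the JS sources embedded in the binary. LookupAndCompile then calls CompileFunctionInContext, which wraps the source in a function whose parameters are the names in loaders_params (process, getLinkedBinding, getInternalBinding and primordials). The resulting function carries the logic of internal/bootstrap/loaders, and because ExecuteBootstrapper calls it with loaders_args, the code inside can call C++ functions such as getLinkedBinding. Now it's time to open the loaders file itself.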
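Looking back at LoadBuiltinModuleSource, its two branches can be sketched with the C standard library in place of libuv. In this sketch, on_disk_dir plays the role of NODE_BUILTIN_MODULES_PATH (a compile-time macro in Node, a runtime argument here), the dir/id.js path layout is an assumption standing in for OnDiskFileName, kEmbeddedSources is a hypothetical stand-in for the js2c-generated source_ map, and a missing module yields std::nullopt where Node would CHECK-fail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <map>
#include <optional>
#include <string>

// Hypothetical embedded sources, standing in for the js2c-generated source_ map.
const std::map<std::string, std::string> kEmbeddedSources = {
    {"internal/bootstrap/loaders", "'use strict'; /* ... */"},
};

// Sketch of the two branches of LoadBuiltinModuleSource.
std::optional<std::string> LoadSourceDemo(
    const std::string& id, const std::optional<std::string>& on_disk_dir) {
  if (on_disk_dir) {
    // On-disk branch: read the file in 4 KiB chunks, like the uv_fs_read loop.
    const std::string path = *on_disk_dir + "/" + id + ".js";  // assumed layout
    std::FILE* f = std::fopen(path.c_str(), "rb");
    if (f == nullptr) return std::nullopt;  // Node CHECKs and aborts instead
    std::string contents;
    char buffer[4096];
    std::size_t n;
    while ((n = std::fread(buffer, 1, sizeof(buffer), f)) > 0) {
      contents.append(buffer, n);
    }
    std::fclose(f);
    return contents;
  }
  // Embedded branch: look the id up in the in-binary source map.
  auto it = kEmbeddedSources.find(id);
  if (it == kEmbeddedSources.end()) return std::nullopt;
  return it->second;
}
```

The on-disk branch is mainly a development convenience (edit lib/ without rebuilding), while the embedded branch avoids file I/O entirely at startup.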
The comment at the top of the file states its purpose plainly:
// This file creates the internal module & binding loaders used by built-in
// modules. In contrast, user land modules are loaded using
// lib/internal/modules/cjs/loader.js (CommonJS Modules) or
// lib/internal/modules/esm/* (ES Modules).
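In other words, this file creates the internal module loader, the loader for native modules written in JS, together with the binding loaders that give those modules access to built-in modules. Note that modules loaded by user code through CommonJS or ES modules go through a different mechanism. That belongs to the second bootstrap step, covered in the next article, so we skip it here. The internal module loader created in this file is also described in a comment: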
// Internal JavaScript module loader:
// - NativeModule: a minimal module system used to load the JavaScript core
// modules found in lib/**/*.js and deps/**/*.js. All core modules are
// compiled into the node binary via node_javascript.cc generated by js2c.py,
// so they can be loaded faster without the cost of I/O. This class makes the
// lib/internal/*, deps/internal/* modules and internalBinding() available by
// default to core modules, and lets the core modules require itself via
// require('internal/bootstrap/loaders') even when this file is not written in
// CommonJS style.
With its purpose established, let's look at the code. It first defines a moduleLoadList property on process that records the modules that have been loaded:
// Set up process.moduleLoadList.
const moduleLoadList = [];
ObjectDefineProperty(process, 'moduleLoadList', {
  value: moduleLoadList,
  configurable: true,
  enumerable: true,
  writable: false
});
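It then defines process.binding and process._linkedBinding. process.binding only accepts modules listed in internalBindingWhitelist and forwards them to internalBinding (which is built on getInternalBinding), while process._linkedBinding calls getLinkedBinding and caches the result in bindingObj: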
{
  // ... (bindingObj is declared in this enclosing block; omitted here)
  process.binding = function binding(module) {
    module = String(module);
    // Deprecated specific process.binding() modules, but not all, allow
    // selective fallback to internalBinding for the deprecated ones.
    if (internalBindingWhitelist.has(module)) {
      return internalBinding(module);
    }
    // eslint-disable-next-line no-restricted-syntax
    throw new Error(`No such module: ${module}`);
  };
  process._linkedBinding = function _linkedBinding(module) {
    module = String(module);
    let mod = bindingObj[module];
    if (typeof mod !== 'object')
      mod = bindingObj[module] = getLinkedBinding(module);
    return mod;
  };
}
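process.binding and process._linkedBinding combine two small ideas: an allow list that gates which names may be requested, and a per-name cache (bindingObj) so that each binding is fetched only once. A C++ model of both follows. The allow-list contents, the fetch counter and all function names are hypothetical; CachedBinding stands in for the cached lookup that process.binding reaches through internalBinding.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical binding fetch; counts calls so the caching effect is observable.
int binding_fetches = 0;
std::string FetchBinding(const std::string& name) {
  ++binding_fetches;
  return "binding:" + name;
}

// Like the bindingObj cache: fetch a binding once, then reuse the stored object.
std::map<std::string, std::string> binding_cache;
const std::string& CachedBinding(const std::string& name) {
  auto it = binding_cache.find(name);
  if (it == binding_cache.end()) {
    it = binding_cache.emplace(name, FetchBinding(name)).first;
  }
  return it->second;
}

// Like process.binding: only allow-listed names fall through to the cached lookup.
const std::set<std::string> kDemoAllowList = {"buffer", "fs"};  // illustrative only
const std::string& ProcessBindingDemo(const std::string& name) {
  if (kDemoAllowList.count(name) == 0) {
    throw std::runtime_error("No such module: " + name);
  }
  return CachedBinding(name);
}
```

Rejected names never reach the fetch, and repeated requests for an allowed name return the same cached object.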
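Next it sets up the NativeModule class, the NativeModule mentioned in the comment earlier, which loads the JS modules that js2c.py compiles into node_javascript.cc. Finally, note what this file returns: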
const loaderExports = {
  internalBinding,
  NativeModule,
  require: nativeModuleRequire
};
The roles of internalBinding and NativeModule are already clear, so what remains is how require works. It is nativeModuleRequire:
function nativeModuleRequire(id) {
  if (id === loaderId) {
    return loaderExports;
  }
  const mod = NativeModule.map.get(id);
  // Can't load the internal errors module from here, have to use a raw error.
  // eslint-disable-next-line no-restricted-syntax
  if (!mod) throw new TypeError(`Missing internal module '${id}'`);
  return mod.compileForInternalLoader();
}
It looks up the NativeModule instance for the requested id and calls its compileForInternalLoader method:
class NativeModule {
  // ...
  compileForInternalLoader() {
    if (this.loaded || this.loading) {
      return this.exports;
    }
    const id = this.id;
    this.loading = true;
    try {
      const requireFn = this.id.startsWith('internal/deps/') ?
        requireWithFallbackInDeps : nativeModuleRequire;
      const fn = compileFunction(id);
      fn(this.exports, requireFn, this, process, internalBinding, primordials);
      this.loaded = true;
    } finally {
      this.loading = false;
    }
    moduleLoadList.push(`NativeModule ${id}`);
    return this.exports;
  }
}
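If the module is already loaded, or is still loading (as happens with a circular require), compileForInternalLoader returns its current exports right away. Otherwise it uses compileFunction, which comes from internalBinding('native_module') and is implemented in C++, to wrap the JS module in a function taking six parameters (exports, require, module, process, internalBinding, primordials). It runs that function and returns the module's exports. With this, JS modules and C++ modules are finally connected. Taking buffer as the example once more, if you open lib/internal/buffer.js, it is now clear where the require and internalBinding functions it calls come from.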
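The loaded/loading flags in compileForInternalLoader are what make repeated and circular requires safe. A module that is already loaded, or still loading further up the call stack, just hands back its current (possibly partial) exports. The sketch below models this in C++ with a hypothetical DemoLoader. Module bodies are lambdas taking (exports, loader) instead of the six-parameter wrapper, and Exports is a std::map<std::string, int>.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using Exports = std::map<std::string, int>;
struct DemoLoader;

struct DemoModule {
  std::string id;
  // Module body: receives its exports and the loader (standing in for require).
  std::function<void(Exports&, DemoLoader&)> body;
  Exports exports;
  bool loaded = false;
  bool loading = false;
};

struct DemoLoader {
  std::map<std::string, DemoModule> map;      // like NativeModule.map
  std::vector<std::string> module_load_list;  // like process.moduleLoadList

  // Mirrors nativeModuleRequire + compileForInternalLoader.
  Exports& Require(const std::string& id) {
    auto it = map.find(id);
    if (it == map.end()) {
      throw std::invalid_argument("Missing internal module '" + id + "'");
    }
    DemoModule& mod = it->second;
    // Cache hit or circular require: return the (possibly partial) exports.
    if (mod.loaded || mod.loading) return mod.exports;
    mod.loading = true;
    try {
      mod.body(mod.exports, *this);
      mod.loaded = true;
    } catch (...) {
      mod.loading = false;  // the JS version resets this in a finally block
      throw;
    }
    mod.loading = false;
    module_load_list.push_back("NativeModule " + id);
    return mod.exports;
  }
};
```

For example, if module a sets one export and then requires b, and b requires a back, b receives a's exports with just that one key. b is also recorded in module_load_list before a, because it finishes loading first.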
Summary
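This article covered how Node prepares its built-in modules and native modules. The module story does not end here, though. When Node runs a program it also has to load the user's own modules and third-party modules installed from npm. That is handled in the second bootstrap step, covered in the next article.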
-- EOF --