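Node Source Code Analysis: BootstrapNode
Now for the second bootstrap step. If the first step can be summed up as initializing the internal modules, this step initializes the Node environment: properties on the process object, some global objects and methods, and then actually running our JS code and resolving its third-party module dependencies.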
BootstrapNode
Back in node.cc, find the implementation of BootstrapNode.
MaybeLocal<Value> Environment::BootstrapNode() {
EscapableHandleScope scope(isolate_);
Local<Object> global = context()->Global();
// TODO(joyeecheung): this can be done in JS land now.
global->Set(context(), FIXED_ONE_BYTE_STRING(isolate_, "global"), global)
.Check();
// process, require, internalBinding, primordials
std::vector<Local<String>> node_params = {
process_string(),
require_string(),
internal_binding_string(),
primordials_string()};
std::vector<Local<Value>> node_args = {
process_object(),
native_module_require(),
internal_binding_loader(),
primordials()};
MaybeLocal<Value> result = ExecuteBootstrapper(
this, "internal/bootstrap/node", &node_params, &node_args);
// ...
}
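First, a global property is set on the context's global object, pointing back at the global object itself. Through this property JS code can modify the underlying global object, which is the global we use in Node.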
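The effect of that self-reference can be observed from user code. The following is a minimal sketch, not part of the Node source; demoSharedValue is a made-up property name used only for illustration.

```javascript
// `global` is a property of the global object that refers back to the object itself.
const isSelfReference = global === globalThis;

// Writing through `global` therefore creates a real property on the global object.
global.demoSharedValue = 42; // hypothetical name, for illustration only
const readBack = globalThis.demoSharedValue;

console.log(isSelfReference, readBack);
```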
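Next come the arguments for ExecuteBootstrapper. Besides process and primordials, which are the same as in the previous step, the new ones are internal_binding and require: the internalBinding and nativeModuleRequire functions that the previous step exported from internal/bootstrap/loaders.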
Then internal/bootstrap/node is executed. There is a lot of code inside, but the header comment summarizes its purpose; here are the key parts.
// This file is expected not to perform any asynchronous operations itself
// when being executed - those should be done in either
// `lib/internal/bootstrap/pre_execution.js` or in main scripts. The majority
// of the code here focus on setting up the global proxy and the process
// object in a synchronous manner.
// As special caution is given to the performance of the startup process,
// many dependencies are invoked lazily.
// This file is compiled as if it's wrapped in a function with arguments
// passed by node::RunBootstrapping()
/* global process, require, internalBinding */
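So this file performs only synchronous work and focuses on setting up the global proxy and the process object. The comment also says where asynchronous operations belong, and lists the arguments that the wrapping function receives, the ones mentioned earlier: process, require, internalBinding.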
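The idea of compiling a file "as if it's wrapped in a function with arguments" can be imitated in plain JavaScript. This is only a sketch under the assumption that the Function constructor is an acceptable stand-in for the V8 compilation done on the C++ side; runAsWrappedFunction, fakeProcess and fakeRequire are hypothetical names, not Node internals.

```javascript
// Hypothetical stand-in for the bootstrapper: compile `source` as the body of a
// function whose parameter names are `paramNames`, then call it with `args`.
function runAsWrappedFunction(source, paramNames, args) {
  const compiled = new Function(...paramNames, source);
  return compiled(...args);
}

// Fake arguments, so the sketch does not touch the real process or require.
const fakeProcess = { title: 'demo' };
const fakeRequire = (id) => `required:${id}`;

// Inside the source, `process` and `require` are just the function's parameters.
const bootstrapSource = `
  process.bootstrapped = true;
  return require('internal/example');
`;

const result = runAsWrappedFunction(
  bootstrapSource,
  ['process', 'require'],
  [fakeProcess, fakeRequire]
);
console.log(result, fakeProcess.bootstrapped);
```

The names visible to the file are decided entirely by the caller, which is why the same mechanism can hand different bindings to different bootstrap files.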
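What exactly gets set up? I won't go through the code in detail since it isn't the focus of the flow; the comments in the file are clear if you want to read further. In summary: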
- setupPrepareStackTrace
- setupProcessObject
- setupGlobalProxy
- setupBuffer
- Bootstrappers for all threads
- Set up methods on the process object for all threads
- credentials
- Setup the callbacks that node::AsyncWrap will call when there are hooks to process
- setupTaskQueue
- setupTimers
- Set the per-Environment callback that will be called when the TrackingTraceStateObserver updates trace state
That completes the two bootstrap steps and the env is created. Back in node_main_instance.cc, once env is obtained, LoadEnvironment is called to start executing the user's code.
LoadEnvironment
This function is defined in node.cc and is very simple: it starts main thread execution.
void LoadEnvironment(Environment* env) {
CHECK(env->is_main_thread());
// TODO(joyeecheung): Not all of the execution modes in
// StartMainThreadExecution() make sense for embedders. Pick the
// useful ones out, and allow embedders to customize the entry
// point more directly without using _third_party_main.js
USE(StartMainThreadExecution(env));
}
StartMainThreadExecution picks a startup mode based on the user's input and runs the main script for that mode. For example, an inspect argument enters debug mode.
if (first_argv == "inspect" || first_argv == "debug") {
return StartExecution(env, "internal/main/inspect");
}
We look at the ordinary case, where a script file is passed and no special mode applies.
if (!first_argv.empty() && first_argv != "-") {
return StartExecution(env, "internal/main/run_main_module");
}
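The two branches quoted above can be restated as a small pure function. This is a sketch covering only those two branches; selectMainScript is a hypothetical name, and every other mode that StartMainThreadExecution handles is deliberately left out and represented by null.

```javascript
// Hypothetical restatement of the two quoted branches; other modes are omitted.
function selectMainScript(firstArgv) {
  if (firstArgv === 'inspect' || firstArgv === 'debug') {
    return 'internal/main/inspect';
  }
  // A non-empty first argument other than "-" is treated as a script to run.
  if (firstArgv !== undefined && firstArgv !== '' && firstArgv !== '-') {
    return 'internal/main/run_main_module';
  }
  return null; // not covered by this sketch
}

console.log(selectMainScript('inspect'), selectMainScript('index.js'));
```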
That leads to internal/main/run_main_module. Before going in, look at how StartExecution invokes this file.
MaybeLocal<Value> StartExecution(Environment* env, const char* main_script_id) {
EscapableHandleScope scope(env->isolate());
CHECK_NOT_NULL(main_script_id);
std::vector<Local<String>> parameters = {
env->process_string(),
env->require_string(),
env->internal_binding_string(),
env->primordials_string(),
FIXED_ONE_BYTE_STRING(env->isolate(), "markBootstrapComplete")};
std::vector<Local<Value>> arguments = {
env->process_object(),
env->native_module_require(),
env->internal_binding_loader(),
env->primordials(),
env->NewFunctionTemplate(MarkBootstrapComplete)
->GetFunction(env->context())
.ToLocalChecked()};
return scope.EscapeMaybe(
ExecuteBootstrapper(env, main_script_id, &parameters, &arguments));
}
The familiar ExecuteBootstrapper again, with almost the same arguments: the usual four (process, require, internalBinding, primordials), plus a markBootstrapComplete function. Now into run_main_module to see how our own JS is started.
'use strict';
const {
prepareMainThreadExecution
} = require('internal/bootstrap/pre_execution');
prepareMainThreadExecution(true);
markBootstrapComplete();
// Note: this loads the module through the ESM loader if the module is
// determined to be an ES module. This hangs from the CJS module loader
// because we currently allow monkey-patching of the module loaders
// in the preloaded scripts through require('module').
// runMain here might be monkey-patched by users in --require.
// XXX: the monkey-patchability here should probably be deprecated.
require('internal/modules/cjs/loader').Module.runMain(process.argv[1]);
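Short, but informative. There are two steps. First, prepareMainThreadExecution from pre_execution does the last of the bootstrap work, and once it returns, the extra function passed in the previous step marks bootstrap as complete.
Then the runMain method of Module in internal/modules/cjs/loader actually runs our JS. The process.argv[1] argument is the index.js in a launch such as node index.js.
Pay attention to the long comment: the module loaders are deliberately monkey-patchable, runMain hangs off the CJS loader even though it routes ES modules to the ESM loader, and users can patch the loaders from scripts preloaded with --require. So the handling of preloaded scripts must live in pre_execution from the first step. Going in: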
function prepareMainThreadExecution(expandArgv1 = false) {
// Patch the process object with legacy properties and normalizations
patchProcessObject(expandArgv1);
setupTraceCategoryState();
setupInspectorHooks();
setupWarningHandler();
// Resolve the coverage directory to an absolute path, and
// overwrite process.env so that the original path gets passed
// to child processes even when they switch cwd.
if (process.env.NODE_V8_COVERAGE) {
process.env.NODE_V8_COVERAGE =
setupCoverageHooks(process.env.NODE_V8_COVERAGE);
}
// If source-map support has been enabled, we substitute in a new
// prepareStackTrace method, replacing the default in errors.js.
if (getOptionValue('--enable-source-maps')) {
const { prepareStackTrace } =
require('internal/source_map/prepare_stack_trace');
const { setPrepareStackTraceCallback } = internalBinding('errors');
setPrepareStackTraceCallback(prepareStackTrace);
}
setupDebugEnv();
// Print stack trace on `SIGINT` if option `--trace-sigint` presents.
setupStacktracePrinterOnSigint();
// Process initial diagnostic reporting configuration, if present.
initializeReport();
initializeReportSignalHandlers(); // Main-thread-only.
initializeHeapSnapshotSignalHandlers();
// If the process is spawned with env NODE_CHANNEL_FD, it's probably
// spawned by our child_process module, then initialize IPC.
// This attaches some internal event listeners and creates:
// process.send(), process.channel, process.connected,
// process.disconnect().
setupChildProcessIpcChannel();
// Load policy from disk and parse it.
initializePolicy();
// If this is a worker in cluster mode, start up the communication
// channel. This needs to be done before any user code gets executed
// (including preload modules).
initializeClusterIPC();
initializeDeprecations();
initializeWASI();
initializeCJSLoader();
initializeESMLoader();
const CJSLoader = require('internal/modules/cjs/loader');
assert(!CJSLoader.hasLoadedAnyUserCJSModule);
loadPreloadModules();
initializeFrozenIntrinsics();
}
Again a lot of initialization. For now, focus on the part most closely tied to the startup flow: initializeCJSLoader.
function initializeCJSLoader() {
const CJSLoader = require('internal/modules/cjs/loader');
CJSLoader.Module._initPaths();
// TODO(joyeecheung): deprecate this in favor of a proper hook?
CJSLoader.Module.runMain =
require('internal/modules/run_main').executeUserEntryPoint;
}
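This sets the runMain method of the CJS loader's Module to executeUserEntryPoint from internal/modules/run_main, which is the method called at the end of run_main_module. Before going in, note the --require handling mentioned in the earlier comment; that is the loadPreloadModules call that follows.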
function loadPreloadModules() {
// For user code, we preload modules if `-r` is passed
const preloadModules = getOptionValue('--require');
if (preloadModules && preloadModules.length > 0) {
const {
Module: {
_preloadModules
},
} = require('internal/modules/cjs/loader');
_preloadModules(preloadModules);
}
}
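This uses the CJS loader's _preloadModules to run the scripts we passed in. Since everything now points at this loader, let's look inside.
The file is large, so skip the details: it mainly defines a Module constructor with a number of internal methods on it. Start with the two methods met above, first _preloadModules, which runs the preload scripts.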
Module._preloadModules = function(requests) {
if (!ArrayIsArray(requests))
return;
// Preloaded modules have a dummy parent module which is deemed to exist
// in the current working directory. This seeds the search path for
// preloaded modules.
const parent = new Module('internal/preload', null);
try {
parent.paths = Module._nodeModulePaths(process.cwd());
} catch (e) {
if (e.code !== 'ENOENT') {
throw e;
}
}
for (let n = 0; n < requests.length; n++)
parent.require(requests[n]);
};
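To run the preload scripts, this first builds a dummy module with the Module constructor and then finds and runs each script through that module's require. The module and require here are the familiar ones from everyday module usage. Why a dummy module? As the comment says, it seeds the search paths for the preloaded modules. You can think of it as a module named internal/preload created at runtime, alongside built-ins such as fs and stream, and deemed to live in the current working directory. Before our file runs, this module requires each preload script. Modules that a preload script requires in turn are resolved relative to that script's own location.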
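The dummy-parent trick can be reproduced from user code with the public module export. This is a sketch, assuming the underscore-prefixed Module._nodeModulePaths helper (the same one quoted above) remains reachable as it has been historically; demo/preload and demo-project are made-up names.

```javascript
const Module = require('module');
const os = require('os');
const path = require('path');

// Pretend the working directory is <tmpdir>/demo-project (it need not exist:
// the search paths are computed from the string alone).
const baseDir = path.join(os.tmpdir(), 'demo-project');

// A parentless module whose search paths are seeded from that directory,
// mirroring what _preloadModules does with process.cwd().
const dummyParent = new Module('demo/preload', null);
dummyParent.paths = Module._nodeModulePaths(baseDir);

// Requests go through the dummy parent's require, just like the preload scripts.
const loadedOs = dummyParent.require('os');

console.log(dummyParent.paths);
```

The first entry is the node_modules directory directly under baseDir, followed by one entry per ancestor directory.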
Next is executeUserEntryPoint, which receives our file.
// For backwards compatibility, we have to run a bunch of
// monkey-patchable code that belongs to the CJS loader (exposed by
// `require('module')`) even when the entry point is ESM.
function executeUserEntryPoint(main = process.argv[1]) {
const resolvedMain = resolveMainPath(main);
const useESMLoader = shouldUseESMLoader(resolvedMain);
if (useESMLoader) {
runMainESM(resolvedMain || main);
} else {
// Module._load is the monkey-patchable CJS module loader.
Module._load(main, null, true);
}
}
function shouldUseESMLoader(mainPath) {
const userLoader = getOptionValue('--experimental-loader');
if (userLoader)
return true;
const esModuleSpecifierResolution =
getOptionValue('--es-module-specifier-resolution');
if (esModuleSpecifierResolution === 'node')
return true;
// Determine the module format of the main
if (mainPath && mainPath.endsWith('.mjs'))
return true;
if (!mainPath || mainPath.endsWith('.cjs'))
return false;
const pkg = readPackageScope(mainPath);
return pkg && pkg.data.type === 'module';
}
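The extension and package type checks in shouldUseESMLoader can be restated as a standalone function. This is a simplified sketch: looksLikeEsmEntry is a hypothetical name, the command-line options are ignored, and the package type is passed in directly instead of being read from package.json.

```javascript
// Simplified sketch: decide ESM vs CJS from the extension, then the package type.
function looksLikeEsmEntry(mainPath, packageType) {
  if (mainPath && mainPath.endsWith('.mjs')) return true;
  if (!mainPath || mainPath.endsWith('.cjs')) return false;
  // Any other extension defers to the "type" field of the enclosing package.json.
  return packageType === 'module';
}

console.log(looksLikeEsmEntry('app.mjs'), looksLikeEsmEntry('app.js', 'module'));
```

Note that the extension wins over the package type: a .cjs entry stays CommonJS even inside a package whose type is module.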
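It resolves the file path and determines the module type, mainly from the file extension, the package type, or user-supplied options. We follow the ordinary CJS path, which calls Module._load. It is long, so here are the key steps.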
const filename = Module._resolveFilename(request, parent, isMain);
const cachedModule = Module._cache[filename];
if (cachedModule !== undefined) {
updateChildren(parent, cachedModule, true);
if (!cachedModule.loaded)
return getExportsForCircularRequire(cachedModule);
return cachedModule.exports;
}
const mod = loadNativeModule(filename, request);
if (mod && mod.canBeRequiredByUsers) return mod.exports;
// lib/internal/modules/cjs/helpers.js
const { NativeModule } = require('internal/bootstrap/loaders');
function loadNativeModule(filename, request) {
const mod = NativeModule.map.get(filename);
if (mod) {
debug('load native module %s', request);
mod.compileForPublicLoader();
return mod;
}
}
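When there is a parent module, the filename can be derived with the help of the parent; ours is the first file and has no parent, so the filename is resolved directly. Then the module cache is checked. On a miss, the next stage checks whether the request is a native module; if it is one that users are allowed to require, its exports are returned. Native modules are the ones exported by the first bootstrap step.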
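The cache check is easy to observe from user code. The sketch below assumes it runs as a CommonJS script; it writes a throwaway module to a temporary directory, and counter.js and demoLoadCount are made-up names.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write a throwaway module that counts how many times its body is executed.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cache-demo-'));
const file = path.join(dir, 'counter.js');
fs.writeFileSync(file, [
  'globalThis.demoLoadCount = (globalThis.demoLoadCount || 0) + 1;',
  'module.exports = { loadedAt: globalThis.demoLoadCount };',
].join('\n'));

const first = require(file);   // cache miss: the file is read, compiled and run
const second = require(file);  // cache hit: the stored exports are returned

// require.cache is the user-visible view of Module._cache, keyed by resolved filename.
const cachedEntry = require.cache[require.resolve(file)];
const sameObject = first === second;
const executions = globalThis.demoLoadCount;

fs.rmSync(dir, { recursive: true, force: true });
console.log(sameObject, executions);
```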
const module = new Module(filename, parent);
if (isMain) {
process.mainModule = module;
module.id = '.';
}
Module._cache[filename] = module;
if (parent !== undefined) {
relativeResolveCache[relResolveCacheIdentifier] = filename;
}
let threw = true;
try {
// Intercept exceptions that occur during the first tick and rekey them
// on error instance rather than module instance (which will immediately be
// garbage collected).
if (enableSourceMaps) {
try {
module.load(filename);
} catch (err) {
rekeySourceMap(Module._cache[filename], err);
throw err; /* node-do-not-add-exception-line */
}
} else {
module.load(filename);
}
threw = false;
} finally {
if (threw) {
delete Module._cache[filename];
if (parent !== undefined) {
delete relativeResolveCache[relResolveCacheIdentifier];
}
} else if (module.exports &&
ObjectGetPrototypeOf(module.exports) ===
CircularRequirePrototypeWarningProxy) {
ObjectSetPrototypeOf(module.exports, PublicObjectPrototype);
}
}
return module.exports;
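If it is not a native module, it is a third-party module or one of our own files, so a Module instance is created with the Module constructor and its load method is called.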
Module.prototype.load = function(filename) {
debug('load %j for module %j', filename, this.id);
assert(!this.loaded);
this.filename = filename;
this.paths = Module._nodeModulePaths(path.dirname(filename));
const extension = findLongestRegisteredExtension(filename);
// allow .mjs to be overridden
if (filename.endsWith('.mjs') && !Module._extensions['.mjs']) {
throw new ERR_REQUIRE_ESM(filename);
}
Module._extensions[extension](this, filename);
this.loaded = true;
const ESMLoader = asyncESM.ESMLoader;
const url = `${pathToFileURL(filename)}`;
const module = ESMLoader.moduleMap.get(url);
// Create module entry at load time to snapshot exports correctly
const exports = this.exports;
// Called from cjs translator
if (module !== undefined && module.module !== undefined) {
if (module.module.getStatus() >= kInstantiated)
module.module.setExport('default', exports);
} else {
// Preemptively cache
// We use a function to defer promise creation for async hooks.
ESMLoader.moduleMap.set(
url,
// Module job creation will start promises.
// We make it a function to lazily trigger those promises
// for async hooks compatibility.
() => new ModuleJob(ESMLoader, url, () =>
new ModuleWrap(url, undefined, ['default'], function() {
this.setExport('default', exports);
})
, false /* isMain */, false /* inspectBrk */)
);
}
};
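This dispatches to the Module._extensions handler registered for the file's extension. Note the ES module bookkeeping that follows it. Look at the handler for the .js extension.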
Module._extensions['.js'] = function(module, filename) {
if (filename.endsWith('.js')) {
const pkg = readPackageScope(filename);
// Function require shouldn't be used in ES modules.
if (pkg && pkg.data && pkg.data.type === 'module') {
const parentPath = module.parent && module.parent.filename;
const packageJsonPath = path.resolve(pkg.path, 'package.json');
throw new ERR_REQUIRE_ESM(filename, parentPath, packageJsonPath);
}
}
const content = fs.readFileSync(filename, 'utf8');
module._compile(content, filename);
};
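There is also error handling for require being used on a .js file that belongs to an ES module package. The main work is reading the file contents with fs and compiling them with _compile.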
// ...
const compiledWrapper = wrapSafe(filename, content, this);
// ...
const dirname = path.dirname(filename);
const require = makeRequireFunction(this, redirects);
let result;
const exports = this.exports;
const thisValue = exports;
const module = this;
if (requireDepth === 0) statCache = new Map();
if (inspectorWrapper) {
result = inspectorWrapper(compiledWrapper, thisValue, exports,
require, module, filename, dirname);
} else {
result = compiledWrapper.call(thisValue, exports, require, module,
filename, dirname);
}
// ...
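Skipping the details: wrapSafe wraps the file contents in a new function, which is then called with five arguments (plus a this value) for the file's code to use: the familiar exports, require, module, __filename and __dirname. First see how wrapSafe does the wrapping. As you might guess, it is similar to how the bootstrap executor compiled code and bound internalBinding and the other arguments, again through V8.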
const { compileFunction } = internalBinding('contextify');
function wrapSafe(filename, content, cjsModuleInstance) {
if (patched) {
const wrapper = Module.wrap(content);
return vm.runInThisContext(wrapper, {
filename,
lineOffset: 0,
displayErrors: true,
importModuleDynamically: async (specifier) => {
const loader = asyncESM.ESMLoader;
return loader.import(specifier, normalizeReferrerURL(filename));
},
});
}
let compiled;
try {
compiled = compileFunction(
content,
filename,
0,
0,
undefined,
false,
undefined,
[],
[
'exports',
'require',
'module',
'__filename',
'__dirname',
]
);
} catch (err) {
if (process.mainModule === cjsModuleInstance)
enrichCJSError(err);
throw err;
}
const { callbackMap } = internalBinding('module_wrap');
callbackMap.set(compiled.cacheKey, {
importModuleDynamically: async (specifier) => {
const loader = asyncESM.ESMLoader;
return loader.import(specifier, normalizeReferrerURL(filename));
}
});
return compiled.function;
}
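If the wrapper has been patched (the patched flag, which indicates that Module.wrap or Module.wrapper has been overridden), the wrapped source is run through the vm module; otherwise it is compiled with compileFunction from the contextify binding, which is written in C++.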
MaybeLocal<Function> maybe_fn = ScriptCompiler::CompileFunctionInContext(
parsing_context, &source, params.size(), params.data(),
context_extensions.size(), context_extensions.data(), options,
v8::ScriptCompiler::NoCacheReason::kNoCacheNoReason, &script)
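A dynamic import callback is also registered for the compiled function through the callbackMap of module_wrap. Calling the compiled function populates the file's exports; with that the LoadEnvironment flow ends and the event loop starts.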
Finally, look at makeRequireFunction, which builds the require we actually end up using. It lives in lib/internal/modules/cjs/helpers.js.
function makeRequireFunction(mod, redirects) {
const Module = mod.constructor;
let require;
if (redirects) {
const { resolve, reaction } = redirects;
const id = mod.filename || mod.id;
require = function require(path) {
let missing = true;
const destination = resolve(path);
if (destination === true) {
missing = false;
} else if (destination) {
const href = destination.href;
if (destination.protocol === 'node:') {
const specifier = destination.pathname;
const mod = loadNativeModule(specifier, href);
if (mod && mod.canBeRequiredByUsers) {
return mod.exports;
}
throw new ERR_UNKNOWN_BUILTIN_MODULE(specifier);
} else if (destination.protocol === 'file:') {
let filepath;
if (urlToFileCache.has(href)) {
filepath = urlToFileCache.get(href);
} else {
filepath = fileURLToPath(destination);
urlToFileCache.set(href, filepath);
}
return mod.require(filepath);
}
}
if (missing) {
reaction(new ERR_MANIFEST_DEPENDENCY_MISSING(id, path));
}
return mod.require(path);
};
} else {
require = function require(path) {
return mod.require(path);
};
}
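If redirects are supplied (the resolve and reaction pair, which judging by ERR_MANIFEST_DEPENDENCY_MISSING come from a policy manifest), the redirect-aware require is used; otherwise require simply calls the require method on the Module instance.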
// Loads a module at the given file path. Returns that module's
// `exports` property.
Module.prototype.require = function(id) {
validateString(id, 'id');
if (id === '') {
throw new ERR_INVALID_ARG_VALUE('id', id,
'must be a non-empty string');
}
requireDepth++;
try {
return Module._load(id, this, /* isMain */ false);
} finally {
requireDepth--;
}
};
A thin wrapper around _load; from here the flow is the same as for the entry file.
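The argument validation at the top of Module.prototype.require is visible from user code. The sketch below assumes it runs as a CommonJS script; tryRequire is a hypothetical helper that returns the error code instead of throwing.

```javascript
// Hypothetical helper: return the error code raised by require(id), or null.
function tryRequire(id) {
  try {
    require(id);
    return null;
  } catch (err) {
    return err.code;
  }
}

// An empty string passes validateString but fails the non-empty check.
const emptyIdCode = tryRequire('');

// A non-string id is rejected by validateString before _load is reached.
const nonStringCode = tryRequire(42);

console.log(emptyIdCode, nonStringCode);
```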