npm's full public dataset is available via the public registry. Using CouchDB replication, you can get a full copy of all metadata, and our terms of use permit downloading copies of tarballs for inspection or experimentation.
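Replication is normally started by asking your own CouchDB server to pull from the registry. The following is a minimal Python sketch built on CouchDB's `POST /_replicate` endpoint. The source URL `https://replicate.npmjs.com/registry`, the local target database, and the credentials are assumptions for illustration, so check npm's current documentation for the actual replication endpoint before relying on it.

```python
import base64
import json
import urllib.request

# Assumption: the registry's CouchDB replication source. Confirm the current
# endpoint in npm's documentation before relying on it.
REGISTRY_SOURCE = "https://replicate.npmjs.com/registry"


def build_replication_doc(source: str, target: str, continuous: bool = True) -> dict:
    """Build the JSON body for CouchDB's POST /_replicate endpoint."""
    return {
        "source": source,
        "target": target,
        "create_target": True,     # create the local database if it is missing
        "continuous": continuous,  # keep following the source after the first pass
    }


def start_replication(couch_url: str, doc: dict, user: str, password: str) -> dict:
    """Submit a replication request to a CouchDB server and return its JSON reply."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(
        f"{couch_url.rstrip('/')}/_replicate",
        data=json.dumps(doc).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Basic auth; adjust to however your CouchDB server is secured.
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical local target: replace host, database name and credentials.
    doc = build_replication_doc(
        REGISTRY_SOURCE, "http://admin:secret@localhost:5984/registry"
    )
    print(json.dumps(doc, indent=2))
    # start_replication("http://localhost:5984", doc, "admin", "secret")
```

A replication started through `/_replicate` is transient and does not survive a CouchDB restart; for a long-lived mirror, the same document can instead be written to CouchDB's `_replicator` database.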
npm's website also makes package metadata available. We allow commercial crawlers such as GoogleBot to index this content. At our discretion, we also allow experimental crawlers to access the site, provided they keep their request rate at 1 request per second or less. At that rate, indexing all packages would take 3 days, so if you want a full copy of our metadata, replication is always faster: it provides the full data in an hour or two and thereafter stays in sync automatically.
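An experimental crawler can enforce the 1 request per second ceiling on its own side. At that rate, 3 days corresponds to about 3 × 86,400 = 259,200 requests. The following minimal Python sketch shows one way to pace requests. `Throttle` and `crawl` are hypothetical names, not part of any npm tool, and the clock and sleep functions are injectable so the pacing can be checked without real waiting.

```python
import time
from typing import Callable, Iterable, List, Optional


class Throttle:
    """Ensures successive wait() calls return at least `interval` seconds apart."""

    def __init__(
        self,
        interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time the previous request was allowed

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)  # block until a full interval has passed
                now = self._clock()
        self._last = now


def crawl(urls: Iterable[str], fetch: Callable[[str], object], throttle: Throttle) -> List[object]:
    """Fetch each URL in order, pausing before each request as the throttle requires."""
    results = []
    for url in urls:
        throttle.wait()
        results.append(fetch(url))
    return results
```

Using a monotonic clock keeps the pacing correct even if the system time is adjusted while the crawler runs.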
If you do not want to install CouchDB to manage replication, we provide open source software that makes it easy to sync with the registry's public feed.
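Following a public feed without running CouchDB usually means paging through a CouchDB-style `_changes` feed and remembering the last sequence seen. The following minimal Python sketch illustrates that idea; it is not npm's own tool. The base URL is an assumption, and `changes_url`, `parse_changes` and `follow` are hypothetical helper names. The `since`, `limit`, `results` and `last_seq` fields follow CouchDB's `_changes` API.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, List, Tuple

# Assumption: base URL of the registry's CouchDB-style feed; verify before use.
FEED_BASE = "https://replicate.npmjs.com/registry"


def changes_url(base: str, since, limit: int = 100) -> str:
    """Build a _changes query that resumes after sequence `since`."""
    query = urllib.parse.urlencode({"since": since, "limit": limit})
    return f"{base.rstrip('/')}/_changes?{query}"


def parse_changes(page: dict) -> Tuple[List[str], object]:
    """Return the changed document ids (package names) and the sequence to resume from."""
    names = [row["id"] for row in page.get("results", [])]
    return names, page["last_seq"]


def fetch_json(url: str) -> dict:
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def follow(base: str, since=0, fetch: Callable[[str], dict] = fetch_json,
           max_pages: int = 10) -> Tuple[List[str], object]:
    """Page through the feed until it is empty or max_pages is reached."""
    seen: List[str] = []
    for _ in range(max_pages):
        names, since = parse_changes(fetch(changes_url(base, since)))
        if not names:
            break  # caught up with the feed
        seen.extend(names)
    return seen, since
```

Persisting the returned sequence lets a later run resume where the previous one stopped instead of starting from the beginning.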
If you attempt to access package metadata by crawling the npm website at high velocity, we reserve the right to rate-limit or ban your IP address, user agent, or both.