【2025】Using NPX to Generate llms.txt Files

Author: LoRA Time: 10 Mar 2025 1016

Clean, high-quality text data is crucial for AI training and data analysis. generate-llmstxt, a package from Firecrawl that is run with npx, extracts structured text directly from a website and generates llms.txt and llms-full.txt files for use with LLMs. This article covers its installation, usage, and optimization tips to help you extract LLM training data efficiently.

What is generate-llmstxt?

generate-llmstxt is an NPX package that uses the Firecrawl API to convert web pages into structured text files for LLM training or data analysis.

Output files

  • llms.txt: a summary built from the key information extracted from the crawled pages

  • llms-full.txt: the full crawled text, suitable for deeper AI training

Default storage location

  • public/llms.txt

  • public/llms-full.txt

How to generate LLMs.txt using generate-llmstxt?

Method 1: Provide the API key directly on the command line

 npx generate-llmstxt --api-key YOUR_FIRECRAWL_API_KEY

Method 2: Store the API key in a .env file

Create a .env file in the project root directory and add the following:

 FIRECRAWL_API_KEY=your_api_key_here

Then run:

 npx generate-llmstxt
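
The .env setup can also be scripted. The following is a minimal sketch, assuming a POSIX shell run from the project root; the key value is a placeholder, and the .gitignore step is an added precaution that assumes the project uses Git (it is not something generate-llmstxt requires).

```shell
# Add the key to .env only if no FIRECRAWL_API_KEY line exists yet.
# "your_api_key_here" is a placeholder; substitute your real key.
grep -q '^FIRECRAWL_API_KEY=' .env 2>/dev/null \
  || printf 'FIRECRAWL_API_KEY=%s\n' 'your_api_key_here' >> .env

# Assumption: the project uses Git. Ignore .env so the key is never committed.
grep -qxF '.env' .gitignore 2>/dev/null || echo '.env' >> .gitignore
```

Both commands are safe to run more than once: each appends its line only when that line is missing.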

Common options

Parameter | Effect | Default value
-k, --api-key <key> | Firecrawl API key (can be omitted when using .env) | Required
-u, --url <url> | The target website URL to crawl | https://example.com
-m, --max-urls <number> | Maximum number of pages to crawl (1-100) | 50
-o, --output-dir <path> | The output directory | public
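
Since --max-urls only accepts values from 1 to 100, it can help to validate the number before starting a crawl. The sketch below is one way to do that, assuming a POSIX shell; build_cmd is a hypothetical helper (not part of generate-llmstxt) that only prints the command it would run, using the defaults from the table above.

```shell
# build_cmd URL [MAX_URLS] [OUT_DIR]
# Prints the npx command for the given options, or fails on an invalid page limit.
# Hypothetical helper for illustration; defaults mirror the options table.
build_cmd() {
  url=$1
  max=${2:-50}
  out=${3:-public}

  # Reject anything that is not a plain non-negative integer.
  case $max in
    ''|*[!0-9]*) echo "max-urls must be an integer" >&2; return 1 ;;
  esac

  # Enforce the documented 1-100 range.
  if [ "$max" -lt 1 ] || [ "$max" -gt 100 ]; then
    echo "max-urls must be between 1 and 100" >&2
    return 1
  fi

  echo "npx generate-llmstxt -u $url -m $max -o $out"
}

build_cmd https://your-website.com 20 content/llms
# prints: npx generate-llmstxt -u https://your-website.com -m 20 -o content/llms
```

Printing the command instead of running it keeps the check free of side effects; once the output looks right, it can be copied to the terminal.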

Example usage

Run directly (using the default output directory public/)

 npx generate-llmstxt -k your_api_key -u https://your-website.com -m 20

Using a .env file (no --api-key needed)

 npx generate-llmstxt -u https://your-website.com -m 20

Specify a custom output directory

 npx generate-llmstxt -k your_api_key -u https://your-website.com -o custom/output/path

Use a .env file with a custom output directory

 npx generate-llmstxt -u https://your-website.com -o content/llms
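
After a run, it is worth confirming that both files actually landed in the output directory. The following is a small sketch, assuming a POSIX shell; check_outputs is a hypothetical helper, and it checks only that each file exists and is non-empty, not that its content is correct.

```shell
# check_outputs [DIR]
# Reports whether llms.txt and llms-full.txt exist and are non-empty in DIR.
# DIR defaults to "public", the default output directory.
check_outputs() {
  dir=${1:-public}
  rc=0
  for f in llms.txt llms-full.txt; do
    if [ -s "$dir/$f" ]; then
      echo "ok: $dir/$f"
    else
      echo "missing or empty: $dir/$f"
      rc=1
    fi
  done
  return "$rc"
}

# Example: check the custom directory from the last command above.
check_outputs content/llms || echo "run generate-llmstxt first"
```

The function returns a non-zero status when either file is missing, so it can also gate later steps in a build script.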

Example generated files

llms.txt example (summary version)

 # LLMs.txt - AI Training Summary Data
 - Website Name: Your Website
 - Topic: Artificial Intelligence Data Processing
 - Key Points:
   1. Provide data crawling API
   2. Suitable for LLM training
   3. Support text analysis

llms-full.txt example (full version)

 # LLMs-Full.txt - Full Text Data
 ## Website Title: Your Website - AI Data Extraction
 Website provides an automated way to convert web page content into LLM training data. Its API allows users to crawl text and generate structured summary and full-text data...

Requirements

  • Node.js 14 or later

  • A valid Firecrawl API key, supplied on the command line or in a .env file
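
The Node.js requirement can be checked from the terminal before running the tool. This is a minimal sketch, assuming a POSIX shell and that node --version prints a string such as v18.17.0; node_major and meets_requirement are hypothetical helper names.

```shell
# node_major VERSION: extract the major version, e.g. v18.17.0 -> 18
node_major() {
  ver=${1#v}          # drop the leading "v"
  echo "${ver%%.*}"   # keep everything before the first dot
}

# meets_requirement VERSION: succeed if the major version is 14 or higher
meets_requirement() {
  [ "$(node_major "$1")" -ge 14 ]
}

if command -v node >/dev/null 2>&1; then
  if meets_requirement "$(node --version)"; then
    echo "Node.js version OK"
  else
    echo "Node.js 14+ required"
  fi
else
  echo "node not found on PATH"
fi
```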

Conclusion

With generate-llmstxt, you can crawl web content and generate structured text data for LLM training. It produces both a summary (llms.txt) and the complete text (llms-full.txt), so you can use whichever fits your AI workload.

Try npx generate-llmstxt now to improve AI training efficiency!