Commit f2a570a

Merge pull request #55 from ScrapeGraphAI/htmlify-integration
add htmlify service
2 parents 787df8f + 854f338 commit f2a570a

22 files changed: +3771 -0 lines changed

scrapegraph-js/README.md

Lines changed: 178 additions & 0 deletions
@@ -55,6 +55,91 @@ const prompt = 'What does the company do?';

## 🎯 Examples

### Scrape - Get HTML Content

#### Basic Scrape

```javascript
import { scrape } from 'scrapegraph-js';

const apiKey = 'your-api-key';
const url = 'https://example.com';

(async () => {
  try {
    const response = await scrape(apiKey, url);
    console.log('HTML content:', response.html);
    console.log('Status:', response.status);
  } catch (error) {
    console.error('Error:', error);
  }
})();
```

#### Scrape with Heavy JavaScript Rendering

```javascript
import { scrape } from 'scrapegraph-js';

const apiKey = 'your-api-key';
const url = 'https://example.com';

(async () => {
  try {
    const response = await scrape(apiKey, url, {
      renderHeavyJs: true
    });
    console.log('HTML content with JS rendering:', response.html);
  } catch (error) {
    console.error('Error:', error);
  }
})();
```

#### Scrape with Custom Headers

```javascript
import { scrape } from 'scrapegraph-js';

const apiKey = 'your-api-key';
const url = 'https://example.com';

(async () => {
  try {
    const response = await scrape(apiKey, url, {
      headers: {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
        'Cookie': 'session=123'
      }
    });
    console.log('HTML content with custom headers:', response.html);
  } catch (error) {
    console.error('Error:', error);
  }
})();
```

#### Get Scrape Request Status

```javascript
import { getScrapeRequest } from 'scrapegraph-js';

const apiKey = 'your-api-key';
const requestId = 'your-request-id';

(async () => {
  try {
    const response = await getScrapeRequest(apiKey, requestId);
    console.log('Request status:', response.status);
    if (response.status === 'completed') {
      console.log('HTML content:', response.html);
    }
  } catch (error) {
    console.error('Error:', error);
  }
})();
```
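Since a scrape request may still be `processing` on the first check, a small polling loop is often useful. The sketch below is illustrative, not part of the library: the status-fetching function is injected so the loop can be exercised without network access, and in real use you would pass `(id) => getScrapeRequest(apiKey, id)`. It assumes the documented `status` values (`completed`, `processing`, `failed`).

```javascript
// Illustrative helper (not part of scrapegraph-js): poll a scrape request
// until it completes or fails. `getStatus` is injected for testability;
// in practice pass (id) => getScrapeRequest(apiKey, id).
async function waitForScrape(getStatus, requestId, { intervalMs = 1000, maxAttempts = 30 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await getStatus(requestId);
    if (response.status === 'completed') return response;
    if (response.status === 'failed') throw new Error(response.error || 'scrape failed');
    // Wait before checking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`scrape request ${requestId} did not complete in time`);
}
```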
### Scraping Websites

#### Basic Scraping
@@ -395,6 +480,99 @@ const feedbackText = 'This is a test feedback message.';

})();
```

## 🔧 Available Functions

### Scrape

#### `scrape(apiKey, url, options)`

Converts a webpage into HTML format with optional JavaScript rendering.

**Parameters:**
- `apiKey` (string): Your ScrapeGraph AI API key
- `url` (string): The URL of the webpage to convert
- `options` (object, optional): Configuration options
  - `renderHeavyJs` (boolean, optional): Whether to render heavy JavaScript (default: `false`)
  - `headers` (object, optional): Custom headers to send with the request

**Returns:** Promise that resolves to an object containing:
- `html`: The HTML content of the webpage
- `status`: Request status (`'completed'`, `'processing'`, or `'failed'`)
- `scrape_request_id`: Unique identifier for the request
- `error`: Error message if the request failed

**Example:**
```javascript
const response = await scrape(apiKey, 'https://example.com', {
  renderHeavyJs: true,
  headers: { 'User-Agent': 'Custom Agent' }
});
```
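The response fields above can be handled with a simple status switch. The following is an illustrative sketch over the documented fields, not library code:

```javascript
// Illustrative sketch (not part of scrapegraph-js): turn a scrape response,
// shaped as documented above, into a short human-readable summary.
function summarizeScrape(response) {
  switch (response.status) {
    case 'completed':
      return `completed: ${response.html.length} characters of HTML`;
    case 'processing':
      return `processing: check again with id ${response.scrape_request_id}`;
    case 'failed':
      return `failed: ${response.error}`;
    default:
      return `unknown status: ${response.status}`;
  }
}
```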
#### `getScrapeRequest(apiKey, requestId)`

Retrieves the status or result of a previous scrape request.

**Parameters:**
- `apiKey` (string): Your ScrapeGraph AI API key
- `requestId` (string): The unique identifier for the scrape request

**Returns:** Promise that resolves to the request result object.

**Example:**
```javascript
const result = await getScrapeRequest(apiKey, 'request-id-here');
```

### Smart Scraper

#### `smartScraper(apiKey, url, prompt, schema, numberOfScrolls, totalPages, cookies)`

Extracts structured data from websites using AI-powered scraping.

**Parameters:**
- `apiKey` (string): Your ScrapeGraph AI API key
- `url` (string): The URL of the website to scrape
- `prompt` (string): Natural language prompt describing what to extract
- `schema` (object, optional): Zod schema for structured output
- `numberOfScrolls` (number, optional): Number of scrolls for infinite scroll pages
- `totalPages` (number, optional): Number of pages to scrape
- `cookies` (object, optional): Cookies for authentication
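With seven positional parameters, call sites can become hard to read. A hypothetical convenience wrapper (not part of the library) that maps named options onto the positional signature above might look like this; the scraper function is injected so the argument mapping itself can be tested without a network call:

```javascript
// Hypothetical wrapper (not part of scrapegraph-js): accept named options and
// forward them to smartScraper's positional parameters in the documented order.
// `scraperFn` is injected for testability; in practice pass the library's
// smartScraper function.
function callSmartScraper(scraperFn, apiKey, url, prompt, options = {}) {
  const { schema, numberOfScrolls, totalPages, cookies } = options;
  return scraperFn(apiKey, url, prompt, schema, numberOfScrolls, totalPages, cookies);
}
```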
### Search Scraper

#### `searchScraper(apiKey, prompt, url, numResults, headers, outputSchema)`

Searches and extracts information from multiple web sources using AI.

### Crawl API

#### `crawl(apiKey, url, prompt, dataSchema, extractionMode, cacheWebsite, depth, maxPages, sameDomainOnly, sitemap, batchSize)`

Starts a crawl job to extract structured data from a website and its linked pages.

### Markdownify

#### `markdownify(apiKey, url, headers)`

Converts a webpage into clean, well-structured markdown format.

### Agentic Scraper

#### `agenticScraper(apiKey, url, steps, useSession, userPrompt, outputSchema, aiExtraction)`

Performs automated actions on webpages using step-by-step instructions.
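A small pre-flight check can give clearer errors than a rejected API request. The sketch below is hypothetical (not part of the library) and assumes `steps` is a non-empty array of natural-language instruction strings; the exact accepted format is documented in the SDK's own examples.

```javascript
// Hypothetical pre-flight check (not part of scrapegraph-js), assuming `steps`
// is a non-empty array of instruction strings. Throws a descriptive error
// before any network call is made.
function validateSteps(steps) {
  if (!Array.isArray(steps) || steps.length === 0) {
    throw new Error('steps must be a non-empty array');
  }
  steps.forEach((step, i) => {
    if (typeof step !== 'string' || step.trim() === '') {
      throw new Error(`step ${i} must be a non-empty string`);
    }
  });
  return steps;
}
```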
### Utility Functions

#### `getCredits(apiKey)`

Retrieves your current credit balance and usage statistics.

#### `sendFeedback(apiKey, requestId, rating, feedbackText)`

Submits feedback for a specific request.
## 📚 Documentation

For detailed documentation, visit [docs.scrapegraphai.com](https://docs.scrapegraphai.com)

0 commit comments