Get details of a single crawler

GET /api/v2/organizations/{organization}/projects/{project}/crawlers/{crawler}

Authorizations

Parameters

Path Parameters

organization
required
string

Organization identifier

project
required
string

Project identifier

crawler
required
string
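
The three path parameters slot directly into the endpoint path. The following is a minimal sketch of building that URL and issuing the GET with the Python standard library. The base URL `https://api.example.com` is a placeholder, and the Bearer-token header is an assumption, since the Authorizations section does not name a scheme; `crawler_url` and `get_crawler` are hypothetical helper names.

```python
import json
import urllib.request
from urllib.parse import quote

# Hypothetical base URL; substitute the real API host.
BASE_URL = "https://api.example.com"


def crawler_url(organization: str, project: str, crawler: str) -> str:
    """Build the endpoint URL, percent-encoding each path parameter."""
    org, proj, crawl = (quote(p, safe="") for p in (organization, project, crawler))
    return f"{BASE_URL}/api/v2/organizations/{org}/projects/{proj}/crawlers/{crawl}"


def get_crawler(organization: str, project: str, crawler: str, token: str) -> dict:
    """Fetch a single crawler. The Bearer scheme is an assumption."""
    request = urllib.request.Request(
        crawler_url(organization, project, crawler),
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Percent-encoding each segment separately keeps an identifier that contains a space or slash from altering the path structure.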

Responses

200

The request has succeeded.

object
id
required

Crawler ID

integer
456
name

Crawler name

string
Test Crawler
project_id
required

Project ID

integer
789
uuid
required

Crawler UUID

string
550e8400-e29b-41d4-a716-446655440000
config
required

Crawler configuration (YAML)

string
domain: test-domain.com\nconfig:\n max_html: 100\n browser_mode: false
domain
required

Crawler domain

string
test-domain.com
domain_verified

Domain verification status

integer
1
urls_list

URLs list (YAML)

string
single_url:\n - /\n - /about\n - /contact
webhook_url

Webhook URL for notifications

string
https://example.com/webhook
webhook_auth_header

Authorization header for webhook

string
Bearer token123
webhook_extra_vars

Extra variables for webhook

string
key1=value1&key2=value2
browser_mode

Browser mode enabled

boolean
workers

Number of concurrent workers

integer
2
delay

Delay between requests in seconds

number format: float
4
depth

Maximum crawl depth

integer
-1
max_hits

Maximum total requests

integer
0
max_html

Maximum HTML pages

integer
50
status_ok

HTTP status codes for content capture

Array<integer>
[
200
]
user_agent

Custom user agent

string
Mozilla/5.0...
max_errors

Maximum errors before stopping

integer
100
start_urls

Starting URLs

Array<string>
[
"/",
"/blog"
]
urls

URLs list

Array<string>
[
"/",
"/about"
]
headers

Custom headers

object
key
additional properties
string
{
"Authorization": "Bearer token"
}
exclude

URL patterns to exclude

Array<string>
[
"/admin/*"
]
include

URL patterns to include

Array<string>
[
"/blog/*"
]
sitemap

Sitemap configuration

Array<object>
object
url

Sitemap URL

string
/sitemap.xml
recursive

Recursively follow sitemap links

boolean
true
[
{
"url": "/sitemap.xml",
"recursive": true
}
]
allowed_domains

Allowed domains

Array<string>
[
"example.com"
]
assets

Asset harvesting configuration

object
network_intercept

Network intercept configuration for asset collection

object
enabled

Enable network intercept

boolean
true
timeout

Request timeout in seconds

integer
30
execute_js

Execute JavaScript during asset collection

boolean
parser

Parser configuration for asset extraction

object
enabled

Enable parser

boolean
true
{
"network_intercept": {
"enabled": true,
"timeout": 30,
"execute_js": false
},
"parser": {
"enabled": true
}
}
created_at

Creation timestamp

string format: date-time
2024-01-20T09:15:00Z
updated_at

Last update timestamp

string format: date-time
2024-10-11T16:45:00Z
deleted_at

Deletion timestamp

string format: date-time
nullable
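
As a rough illustration of consuming the 200 response, the sketch below checks the fields marked required above and parses the date-time strings. The helper names are hypothetical, the sample body reuses the example values from this page, and treating a non-null `deleted_at` as "deleted" is an assumption about the API's semantics.

```python
from datetime import datetime
from typing import Optional

# Fields marked "required" in the 200 response schema.
REQUIRED_FIELDS = ("id", "project_id", "uuid", "config", "domain")


def missing_required(crawler: dict) -> list:
    """Return the required field names absent from a response body."""
    return [name for name in REQUIRED_FIELDS if name not in crawler]


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse a date-time string such as 2024-01-20T09:15:00Z; None passes through."""
    if value is None:
        return None
    # fromisoformat() rejects a trailing "Z" before Python 3.11, so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_deleted(crawler: dict) -> bool:
    """Assumption: a non-null deleted_at means the crawler was deleted."""
    return parse_timestamp(crawler.get("deleted_at")) is not None


# Sample body built from the example values documented above.
sample = {
    "id": 456,
    "project_id": 789,
    "uuid": "550e8400-e29b-41d4-a716-446655440000",
    "config": "domain: test-domain.com\nconfig:\n max_html: 100\n browser_mode: false",
    "domain": "test-domain.com",
    "created_at": "2024-01-20T09:15:00Z",
    "deleted_at": None,
}
```

Optional fields such as `name` or `webhook_url` may be absent, so reading them with `dict.get` is safer than indexing.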

400

The server could not understand the request due to invalid syntax.

object
message
required

Error message

string
The requested resource was not found
error
required

Error flag

boolean
true

403

Access is forbidden.

object
message
required

Error message

string
The requested resource was not found
error
required

Error flag

boolean
true
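
Both error responses share the same shape: a `message` string and an `error` boolean. One possible way to surface them in client code is sketched below; `CrawlerApiError` and `raise_for_error` are hypothetical names, and any status other than 200 is treated as a failure, which is an assumption beyond the two codes documented here.

```python
class CrawlerApiError(Exception):
    """Hypothetical exception carrying the HTTP status and the API message."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def raise_for_error(status: int, body: dict) -> dict:
    """Return the body on 200; otherwise raise CrawlerApiError."""
    if status == 200:
        return body
    # 400 and 403 bodies carry "message" (string) and "error" (boolean).
    message = body.get("message", "unknown error")
    raise CrawlerApiError(status, message)
```

Keeping the status code on the exception lets callers distinguish a malformed request (400) from a permissions problem (403) without parsing the message text.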