Update a crawler

PATCH
/api/v2/organizations/{organization}/projects/{project}/crawlers/{crawler}

Authorizations

Parameters

Path Parameters

organization
required
string

Organization identifier

project
required
string

Project identifier

crawler
required
string

Crawler identifier
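
The three path parameters above slot into the endpoint path. A minimal sketch of building that URL follows; the base URL `https://api.example.com` and the helper name `crawler_url` are placeholders chosen for illustration, since this page only gives the path.

```python
from urllib.parse import quote

# Hypothetical base URL; this reference only documents the path.
BASE_URL = "https://api.example.com"


def crawler_url(organization: str, project: str, crawler: str,
                base_url: str = BASE_URL) -> str:
    """Build the update-crawler URL, percent-encoding each path parameter."""
    parts = {"organization": organization, "project": project, "crawler": crawler}
    for name, value in parts.items():
        if not value:
            # All three path parameters are marked as required.
            raise ValueError(f"{name} is required")
    return (
        f"{base_url.rstrip('/')}/api/v2"
        f"/organizations/{quote(organization, safe='')}"
        f"/projects/{quote(project, safe='')}"
        f"/crawlers/{quote(crawler, safe='')}"
    )
```

Encoding each segment with `safe=''` keeps a stray `/` or space in an identifier from changing the shape of the path.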

Request Body required

object
mode

WAF operation mode

string
default: report
Allowed values: report, block
paranoia_level

OWASP paranoia level

integer
default: 1
>= 1 <= 4
allow_rules

WAF rule IDs to allow/whitelist

Array<string>
allow_ip

IP addresses to allow

Array<string>
block_ip

IP addresses to block

Array<string>
block_asn

Autonomous system numbers (ASNs) to block

Array<string>
block_ua

User agent patterns to block

Array<string>
block_referer

Referer patterns to block

Array<string>
notify_slack

Slack webhook URL for notifications

string
https://hooks.slack.com/services/XXX
notify_slack_hits_rpm

Minimum hits per minute to trigger Slack notification

integer
100
notify_email

Email addresses for notifications

Array<string>
httpbl

Project Honey Pot HTTP:BL configuration

object
httpbl_enabled

Enable HTTP:BL

boolean
block_suspicious

Block suspicious IPs

boolean
block_harvester

Block email harvesters

boolean
block_spam

Block spam sources

boolean
block_search_engine

Block search engines

boolean
httpbl_key

HTTP:BL API key

string
block_lists

Enable predefined block lists

object
user_agent

Block known bad user agents

boolean
referer

Block known bad referers

boolean
ip

Block known bad IPs

boolean
ai

Block AI crawlers

boolean
thresholds

Rate limiting thresholds

Array<object>
object
type

Threshold type

string
Allowed values: ip, header, waf_hit_by_ip
rps

Requests per second limit (for ip/header)

integer
10
hits

Hit count limit (for waf_hit_by_ip)

integer
10
minutes

Time window in minutes (for waf_hit_by_ip)

integer
5
cooldown

Cooldown period in seconds

integer
30
mode

Threshold enforcement mode

string
default: disabled
Allowed values: disabled, report, block
value

Header name (for header type)

string
nullable
notify_slack

Slack webhook for this threshold

string
nullable
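
Each entry in `thresholds` carries fields that only make sense for certain `type` values: `rps` for `ip`/`header`, `hits` and `minutes` for `waf_hit_by_ip`, and `value` for `header`. The sketch below builds one entry and rejects mismatched fields on the client side. The helper name `make_threshold` is hypothetical, and the strictness is an assumption; the API itself may accept or ignore such combinations.

```python
from typing import Any, Dict, Optional

ALLOWED_TYPES = {"ip", "header", "waf_hit_by_ip"}
ALLOWED_MODES = {"disabled", "report", "block"}


def make_threshold(type_: str, mode: str = "disabled", *,
                   rps: Optional[int] = None, hits: Optional[int] = None,
                   minutes: Optional[int] = None, cooldown: Optional[int] = None,
                   value: Optional[str] = None,
                   notify_slack: Optional[str] = None) -> Dict[str, Any]:
    """Build one thresholds entry, checking fields against the threshold type."""
    if type_ not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {sorted(ALLOWED_TYPES)}")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}")
    # Client-side assumption: treat type-specific fields on the wrong type as errors.
    if rps is not None and type_ not in ("ip", "header"):
        raise ValueError("rps applies to ip and header thresholds")
    if (hits is not None or minutes is not None) and type_ != "waf_hit_by_ip":
        raise ValueError("hits and minutes apply to waf_hit_by_ip thresholds")
    if value is not None and type_ != "header":
        raise ValueError("value (header name) applies to header thresholds")

    threshold: Dict[str, Any] = {"type": type_, "mode": mode}
    optional = {"rps": rps, "hits": hits, "minutes": minutes,
                "cooldown": cooldown, "value": value, "notify_slack": notify_slack}
    # Leave unset fields out so the server-side defaults can apply.
    threshold.update({k: v for k, v in optional.items() if v is not None})
    return threshold
```

For example, `make_threshold("waf_hit_by_ip", "report", hits=10, minutes=5, cooldown=30)` yields an entry using the example values listed above.
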
name

Crawler name

string
Test Crawler
domain

Domain to crawl

string
test-domain.com
browser_mode

Enable browser mode

boolean
execute_js

Execute JavaScript during asset collection (only when browser_mode is enabled)

boolean
true
urls

URLs to crawl

Array<string>
[
"/",
"/about",
"/contact"
]
start_urls

Starting URLs for crawl

Array<string>
[
"/",
"/blog"
]
headers

Custom headers

object
key
additional properties
string
{
"Authorization": "Bearer token123",
"X-Custom-Header": "value"
}
exclude

URL patterns to exclude (regex)

Array<string>
[
"/admin/*",
"/private/*"
]
include

URL patterns to include (regex)

Array<string>
[
"/blog/*",
"/products/*"
]
webhook_url

Webhook URL for notifications

string
https://example.com/webhook
webhook_auth_header

Authorization header for webhook

string
Bearer token123
webhook_extra_vars

Extra variables for webhook

string
key1=value1&key2=value2
workers

Number of concurrent workers (verified domains only)

integer
>= 1 <= 20
4
delay

Delay between requests in seconds (verified domains only)

number format: float
<= 10
0.25
depth

Maximum crawl depth, -1 for unlimited (verified domains only)

integer
>= -1
-1
max_hits

Maximum total requests, 0 for unlimited (verified domains only)

integer
1000
max_html

Maximum HTML pages, 0 for unlimited (verified domains only)

integer
100
status_ok

HTTP status codes that will result in content being captured and pushed to Quant (verified domains only)

Array<integer>
[
200,
201
]
sitemap

Sitemap configuration (verified domains only)

Array<object>
object
[
{
"url": "/sitemap.xml",
"recursive": true
}
]
allowed_domains

Allowed domains for multi-domain crawling, automatically enables merge_domains (verified domains only)

Array<string>
[
"example.com",
"assets.example.com"
]
user_agent

Custom user agent, only when browser_mode is false (verified domains only)

string
Mozilla/5.0...
assets

Asset harvesting configuration (verified domains only)

object
{
"network_intercept": {
"enabled": true,
"timeout": 30
}
}
max_errors

Maximum errors before stopping crawl (verified domains only)

integer
1000
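
As a rough illustration of the request body above, the following sketch checks the documented numeric ranges and prepares a `PATCH` request without sending it. Several details are assumptions rather than documented facts: the JSON content type, the `Bearer` authorization scheme (the Authorizations section of this page is empty), and the helper names `check_limits` and `build_patch`.

```python
import json
import urllib.request
from typing import Any, Dict

# Ranges copied from the schema above as (minimum, maximum); None means unbounded.
LIMITS = {
    "paranoia_level": (1, 4),
    "workers": (1, 20),
    "delay": (None, 10),
    "depth": (-1, None),
}


def check_limits(body: Dict[str, Any]) -> None:
    """Raise ValueError if a field falls outside its documented range."""
    for field, (low, high) in LIMITS.items():
        if field not in body:
            continue
        value = body[field]
        if low is not None and value < low:
            raise ValueError(f"{field} must be >= {low}, got {value}")
        if high is not None and value > high:
            raise ValueError(f"{field} must be <= {high}, got {value}")


def build_patch(url: str, body: Dict[str, Any], token: str) -> urllib.request.Request:
    """Prepare (but do not send) a PATCH request carrying only the given fields."""
    check_limits(body)
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/json",   # assumed content type
            "Authorization": f"Bearer {token}",   # assumed auth scheme
        },
    )
```

The prepared request can be passed to `urllib.request.urlopen` once the URL and credentials are real. Validating before sending gives a clearer local error than waiting for a 400 response.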

Responses

200

The request has succeeded.

object
mode

WAF operation mode

string
default: report
Allowed values: report, block
paranoia_level

OWASP paranoia level

integer
default: 1
>= 1 <= 4
allow_rules

WAF rule IDs to allow/whitelist

Array<string>
allow_ip

IP addresses to allow

Array<string>
block_ip

IP addresses to block

Array<string>
block_asn

Autonomous system numbers (ASNs) to block

Array<string>
block_ua

User agent patterns to block

Array<string>
block_referer

Referer patterns to block

Array<string>
notify_slack

Slack webhook URL for notifications

string
https://hooks.slack.com/services/XXX
notify_slack_hits_rpm

Minimum hits per minute to trigger Slack notification

integer
100
notify_email

Email addresses for notifications

Array<string>
httpbl

Project Honey Pot HTTP:BL configuration

object
httpbl_enabled

Enable HTTP:BL

boolean
block_suspicious

Block suspicious IPs

boolean
block_harvester

Block email harvesters

boolean
block_spam

Block spam sources

boolean
block_search_engine

Block search engines

boolean
httpbl_key

HTTP:BL API key

string
block_lists

Enable predefined block lists

object
user_agent

Block known bad user agents

boolean
referer

Block known bad referers

boolean
ip

Block known bad IPs

boolean
ai

Block AI crawlers

boolean
thresholds

Rate limiting thresholds

Array<object>
object
type

Threshold type

string
Allowed values: ip, header, waf_hit_by_ip
rps

Requests per second limit (for ip/header)

integer
10
hits

Hit count limit (for waf_hit_by_ip)

integer
10
minutes

Time window in minutes (for waf_hit_by_ip)

integer
5
cooldown

Cooldown period in seconds

integer
30
mode

Threshold enforcement mode

string
default: disabled
Allowed values: disabled, report, block
value

Header name (for header type)

string
nullable
notify_slack

Slack webhook for this threshold

string
nullable
id

Crawler ID

integer
456
name

Crawler name

string
Test Crawler
project_id

Project ID

integer
789
uuid

Crawler UUID

string
550e8400-e29b-41d4-a716-446655440000
config

Crawler configuration (YAML)

string
domain: test-domain.com\nconfig:\n max_html: 100\n browser_mode: false
domain

Crawler domain

string
test-domain.com
domain_verified

Domain verification status

integer
1
urls_list

URLs list (YAML)

string
single_url:\n - /\n - /about\n - /contact
created_at

Creation timestamp

string format: date-time
2024-01-20T09:15:00Z
updated_at

Last update timestamp

string format: date-time
2024-10-11T16:45:00Z
deleted_at

Deletion timestamp

string format: date-time
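
The 200 response mixes an integer flag (`domain_verified`) with `date-time` strings. A minimal sketch of turning a decoded response into friendlier Python values is shown below. It assumes that `domain_verified` of `1` means verified and that a missing or null `deleted_at` means the crawler is not deleted; neither is stated on this page, and the helper names are hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse a date-time string such as 2024-10-11T16:45:00Z."""
    if not value:
        return None
    # fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_crawler(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Pick out a few fields of the 200 response and convert their types."""
    return {
        "id": payload.get("id"),
        "name": payload.get("name"),
        "domain": payload.get("domain"),
        # Assumption: 1 means the domain is verified, 0 (or absent) means it is not.
        "verified": bool(payload.get("domain_verified")),
        "updated_at": parse_timestamp(payload.get("updated_at")),
        # Assumption: deleted_at is null or absent for crawlers that still exist.
        "deleted": payload.get("deleted_at") is not None,
    }
```

The `verified` flag is worth checking after an update, because many of the request fields are marked "verified domains only".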

400

The server could not understand the request due to invalid syntax.

object
message
required

Error message

string
The requested resource was not found
error
required

Error flag

boolean
true

403

Access is forbidden.

object
message
required

Error message

string
The requested resource was not found
error
required

Error flag

boolean
true
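
Both error responses share the same shape: a `message` string and an `error` flag. The sketch below shows one way a client might turn a status code and raw body into either the decoded crawler object or an exception. `CrawlerApiError` and `handle_response` are hypothetical names, and treating every non-200 status as an error is an assumption, since only 200, 400, and 403 are documented here.

```python
import json
from typing import Any, Dict


class CrawlerApiError(Exception):
    """Raised when the API answers with a non-200 status."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def handle_response(status: int, raw_body: bytes) -> Dict[str, Any]:
    """Return the decoded body on 200, otherwise raise CrawlerApiError."""
    payload = json.loads(raw_body.decode("utf-8")) if raw_body else {}
    if status == 200:
        return payload
    # Error bodies are documented as {"message": ..., "error": true}.
    message = payload.get("message", "unknown error")
    raise CrawlerApiError(status, message)
```

Callers can then branch on `err.status` to distinguish an invalid request (400) from a permissions problem (403).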