Smart Crawling (Advanced)

Smart Crawling is an advanced mode that runs in the browser, so it can see and interact with the page just like a user. This helps WebSync capture content from modern sites where simple link-following misses important content or grabs too much navigation noise.

What it can do:

  • Read the page DOM and extract the main content instead of menus and sidebars.
  • Click buttons like “Load more” to reveal additional items.
  • Scroll to trigger lazy-loaded content.
  • Navigate pagination and multi-step layouts.
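
As a rough illustration of the first capability, the sketch below separates main content from navigation noise. It is not WebSync's implementation: Smart Crawling works against the live DOM in a browser, whereas this example parses a static HTML string with Python's standard library so that it stays self-contained. The tag list, class name, and sample markup are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Tags treated as navigation noise in this sketch (an assumed list, not WebSync's).
NOISE_TAGS = {"nav", "aside", "header", "footer", "script", "style"}


class MainContentExtractor(HTMLParser):
    """Collects text that is not nested inside any noise tag."""

    def __init__(self):
        super().__init__()
        self.noise_depth = 0  # how many noise tags we are currently inside
        self.chunks = []      # text fragments kept as main content

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.noise_depth > 0:
            self.noise_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.noise_depth == 0:
            self.chunks.append(text)


def extract_main_text(html: str) -> str:
    """Return the page text with menus, sidebars, and similar regions dropped."""
    parser = MainContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: a menu, the real content, and a sidebar.
SAMPLE_PAGE = (
    "<nav><a href='/'>Home</a></nav>"
    "<main><h1>Guide</h1><p>Install the tool.</p></main>"
    "<aside>Related links</aside>"
)

print(extract_main_text(SAMPLE_PAGE))
```

Here the menu link and the sidebar text are discarded, and only the heading and paragraph inside the main region are kept.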

A crawl spec is a small set of rules that tells Smart Crawling how to handle a specific site. It defines:

  • Which links to follow.
  • Where the main content lives on the page.
  • Which UI interactions are needed to reveal all content.

You do not need to configure crawl specs yourself. WebSync applies them automatically when available.
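
To make the idea concrete, a crawl spec can be pictured as a small data structure covering the three concerns above. The sketch below is purely illustrative: WebSync's actual spec format is not shown on this page, and every name here (`CrawlSpec`, `follow_patterns`, `content_selector`, `interactions`, the selectors, and the example URLs) is hypothetical.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from urllib.parse import urlparse


@dataclass
class CrawlSpec:
    """Hypothetical shape of a crawl spec; not WebSync's real schema."""

    # Which links to follow: glob patterns matched against the URL path.
    follow_patterns: list[str]
    # Where the main content lives: a CSS selector for the content container.
    content_selector: str
    # Which UI interactions are needed to reveal all content, in order.
    interactions: list[dict] = field(default_factory=list)

    def should_follow(self, url: str) -> bool:
        """Return True if the URL's path matches any follow pattern."""
        path = urlparse(url).path
        return any(fnmatch(path, pattern) for pattern in self.follow_patterns)


# Example spec for an imaginary documentation site.
docs_spec = CrawlSpec(
    follow_patterns=["/docs/*"],
    content_selector="main article",
    interactions=[
        {"action": "click", "selector": "button.load-more"},
        {"action": "scroll", "to": "bottom"},
    ],
)

for url in ("https://example.com/docs/intro", "https://example.com/pricing"):
    print(url, docs_spec.should_follow(url))
```

In this sketch, only URLs under `/docs/` would be followed, the content would be read from the element matching `main article`, and a "Load more" click followed by a scroll would run before extraction.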

Examples of sites with built-in crawl specs

We maintain crawl specs for popular patterns and platforms, including:

  • Documentation sites (docs-style navigation and deep link trees).
  • Blog platforms (post lists, categories, pagination).
  • Knowledge bases and help centers (nested topics and sidebars).

If your site fits one of these patterns, Smart Crawling typically produces much cleaner results than a generic crawl, with more of the main content and less navigation noise.

If you have a site that is not captured well by generic crawling, let us know. We can add or improve a crawl spec for it.

Submit a request here: https://tally.so/r/nGkYRL