Configuration & Settings
These settings control how WebSync discovers pages, extracts content, and sends sources to NotebookLM.
Include and exclude filters (regex)
Filters are regular expressions that decide which URLs are allowed or blocked during a crawl.
Use them to:
- Limit crawling to specific paths (for example, only `/docs/`).
- Exclude noise such as `/tag/`, `/search/`, or `/category/`.
Example patterns:
- Include only docs: `https?://example\.com/docs/.*`
- Exclude tags and search: `https?://example\.com/(tag|search)/.*`
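As a rough illustration of how include and exclude patterns combine, the TypeScript sketch below checks a URL against both lists. It assumes that an empty include list means "allow everything" and that an exclude match always wins; these rules, and the names `UrlFilters` and `isAllowed`, are hypothetical rather than WebSync's documented behavior. The dot in `example\.com` is escaped because an unescaped `.` matches any character.

```typescript
// Minimal sketch of include/exclude URL filtering.
// ASSUMPTION: a URL passes when it matches at least one include pattern (or no
// include patterns are set) and matches no exclude pattern. WebSync's actual
// precedence rules are not specified here and may differ.

interface UrlFilters {
  include: string[]; // regex sources
  exclude: string[];
}

function isAllowed(url: string, filters: UrlFilters): boolean {
  // RegExp.test() matches anywhere in the string unless the pattern is anchored with ^
  const matches = (pattern: string) => new RegExp(pattern).test(url);
  const included = filters.include.length === 0 || filters.include.some(matches);
  const excluded = filters.exclude.some(matches);
  return included && !excluded;
}

// In a string literal, "\\." becomes \. in the regex: a literal dot.
const docsOnly: UrlFilters = { include: ["https?://example\\.com/docs/.*"], exclude: [] };
const noTags: UrlFilters = { include: [], exclude: ["https?://example\\.com/(tag|search)/.*"] };

console.log(isAllowed("https://example.com/docs/intro", docsOnly)); // true
console.log(isAllowed("https://example.com/blog/post", docsOnly)); // false
console.log(isAllowed("https://example.com/tag/news", noTags)); // false
console.log(isAllowed("https://example.com/blog/post", noTags)); // true
```

Because `RegExp.test()` is not anchored, a pattern can match in the middle of a URL; prefix it with `^` if it must match from the start.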
Crawl limits
Max depth
The maximum link depth (number of link hops) from the start page.
Examples:
- `0`: only the current page.
- `1`: the current page plus the pages it links to directly.
Max pages to crawl
The maximum number of pages WebSync will discover and fetch during a crawl. This controls crawl size and runtime.
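The two crawl limits can be pictured as a breadth-first traversal that stops following links at `maxDepth` and stops fetching at `maxPages`. This is a sketch under assumptions: the hypothetical `crawl` function walks an in-memory `LinkGraph` instead of fetching real pages, and WebSync's actual traversal order is not documented here.

```typescript
// Sketch: breadth-first crawl bounded by maxDepth and maxPages.
// ASSUMPTION: `LinkGraph` and `crawl` are hypothetical; an in-memory map of
// page -> outgoing links stands in for fetching and parsing real pages.

type LinkGraph = Map<string, string[]>;

function crawl(start: string, links: LinkGraph, maxDepth: number, maxPages: number): string[] {
  const visited = new Set<string>([start]); // never queue the same URL twice
  const fetched: string[] = [];
  const queue: Array<{ url: string; depth: number }> = [{ url: start, depth: 0 }];

  while (queue.length > 0 && fetched.length < maxPages) {
    const { url, depth } = queue.shift()!;
    fetched.push(url); // stands in for fetching the page
    if (depth >= maxDepth) continue; // do not follow links past the depth limit
    for (const next of links.get(url) ?? []) {
      if (!visited.has(next)) {
        visited.add(next);
        queue.push({ url: next, depth: depth + 1 });
      }
    }
  }
  return fetched;
}

const site = new Map<string, string[]>([
  ["/", ["/a", "/b"]],
  ["/a", ["/a1", "/"]], // back-link to "/" is skipped because it was already visited
  ["/b", []],
  ["/a1", []],
]);

console.log(JSON.stringify(crawl("/", site, 0, 1000))); // ["/"]
console.log(JSON.stringify(crawl("/", site, 1, 1000))); // ["/","/a","/b"]
console.log(JSON.stringify(crawl("/", site, 2, 2))); // ["/","/a"] (page cap reached)
```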
Max sources to import
The maximum number of sources sent to NotebookLM. This can be lower than the number of pages crawled, for example when you import only a subset of pages or when short pages are merged.
Parsing methods
Choose how WebSync extracts content from HTML:
- Raw: send the HTML as-is.
- parse5: extract text nodes from the HTML using the parse5 library.
- node-html-markdown: convert the HTML to Markdown.
- defuddle (default): extract structured content using defuddle.
Posting methods
Choose how sources are sent to NotebookLM:
- Link: send the URL only (NotebookLM attempts extraction).
- Content: send the extracted content (best for pages that require JavaScript rendering).
- Auto: WebSync chooses the best method for each page.
Page merging and deduplication
- Merging: very short pages are concatenated into a single source to reduce noise.
- Deduplication: not supported.
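One plausible way to merge short pages and then apply the "Max sources to import" limit is sketched below. The `minChars` threshold, the separator placed between merged pages, and merging consecutive short pages in crawl order are all assumptions for illustration; this document only states that very short pages are concatenated into a single source.

```typescript
// Sketch: merge very short pages into combined sources, then cap at maxSources.
// ASSUMPTIONS: `minChars`, the "\n\n---\n\n" separator, and the order of
// operations are hypothetical, not WebSync's documented behavior.

interface Page { url: string; title: string; text: string; }
interface Source { name: string; text: string; }

function buildSources(pages: Page[], minChars: number, maxSources: number): Source[] {
  const sources: Source[] = [];
  let buffer: Page[] = []; // short pages waiting to be merged
  let bufferLen = 0;

  const flush = () => {
    if (buffer.length === 0) return;
    sources.push({
      name: buffer.map((p) => p.title).join(" + "),
      text: buffer.map((p) => p.text).join("\n\n---\n\n"),
    });
    buffer = [];
    bufferLen = 0;
  };

  for (const page of pages) {
    if (page.text.length >= minChars) {
      sources.push({ name: page.title, text: page.text }); // long enough on its own
    } else {
      buffer.push(page);
      bufferLen += page.text.length;
      if (bufferLen >= minChars) flush(); // merged chunk is now long enough
    }
  }
  flush(); // leftover short pages become one final source
  return sources.slice(0, maxSources); // apply the import limit last
}

const pages: Page[] = [
  { url: "https://example.com/a", title: "A", text: "aaaaaaaaaa" }, // 10 chars
  { url: "https://example.com/b", title: "B", text: "bbb" },
  { url: "https://example.com/c", title: "C", text: "cccc" },
  { url: "https://example.com/d", title: "D", text: "dd" },
];

console.log(JSON.stringify(buildSources(pages, 6, 300).map((s) => s.name))); // ["A","B + C","D"]
console.log(JSON.stringify(buildSources(pages, 6, 2).map((s) => s.name))); // ["A","B + C"]
```

In this sketch four crawled pages become three sources, which is why the number of sources can end up lower than the number of pages crawled.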
Source naming
Source name template
You can use these variables in the template:
- `{title}`
- `{url}`
- `{timestamp:format?}`
- `{date:format?}`
Use `||` for fallbacks. For example, `{ title || url }` uses the page URL when the title is empty.
The optional format is a date-fns format string, for example `{date:MM/dd/yyyy}`.
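To make the template syntax concrete, here is a small renderer sketch. It treats `||` as "use the first non-empty value" and splits `name:format` at the first colon so formats such as `HH:mm` survive. The default output for `{date}` and `{timestamp}` without a format is an assumption, and the tiny `formatDate` helper (UTC only, tokens `yyyy`, `MM`, `dd`, `HH`, `mm`, `ss`) is a stand-in for date-fns `format`, which supports many more tokens and formats in local time.

```typescript
// Sketch: render a source name template such as "{ title || url }".
// ASSUMPTIONS: `PageInfo`, `renderName`, the parsing rules, and the defaults for
// unformatted {date}/{timestamp} are hypothetical. `formatDate` is a minimal
// stand-in for date-fns `format` (UTC, six tokens only).

interface PageInfo { title: string; url: string; fetchedAt: Date; }

const pad = (n: number) => String(n).padStart(2, "0");

function formatDate(d: Date, fmt: string): string {
  return fmt.replace(/yyyy|MM|dd|HH|mm|ss/g, (token) => {
    switch (token) {
      case "yyyy": return String(d.getUTCFullYear());
      case "MM": return pad(d.getUTCMonth() + 1); // months are 0-based in JS
      case "dd": return pad(d.getUTCDate());
      case "HH": return pad(d.getUTCHours());
      case "mm": return pad(d.getUTCMinutes());
      default: return pad(d.getUTCSeconds()); // "ss"
    }
  });
}

// Resolve one alternative like "title" or "date:MM/dd/yyyy" to a string.
function resolve(expr: string, page: PageInfo): string {
  const colon = expr.indexOf(":"); // split at the first colon only
  const name = (colon === -1 ? expr : expr.slice(0, colon)).trim();
  const fmt = colon === -1 ? undefined : expr.slice(colon + 1).trim();
  switch (name) {
    case "title": return page.title;
    case "url": return page.url;
    case "date": return formatDate(page.fetchedAt, fmt ?? "yyyy-MM-dd");
    case "timestamp": return fmt ? formatDate(page.fetchedAt, fmt) : page.fetchedAt.toISOString();
    default: return ""; // unknown variables render as empty
  }
}

function renderName(template: string, page: PageInfo): string {
  return template.replace(/\{([^{}]+)\}/g, (_match, body: string) => {
    // "a || b": return the first alternative that resolves to a non-empty string
    for (const alt of body.split("||")) {
      const value = resolve(alt, page);
      if (value !== "") return value;
    }
    return "";
  });
}

const page: PageInfo = {
  title: "",
  url: "https://example.com/docs/intro",
  fetchedAt: new Date(Date.UTC(2024, 2, 5, 14, 7, 9)),
};

console.log(renderName("{ title || url }", page)); // https://example.com/docs/intro
console.log(renderName("{title} - {date:MM/dd/yyyy}", { ...page, title: "Intro" })); // Intro - 03/05/2024
console.log(renderName("{timestamp:yyyy-MM-dd HH:mm}", page)); // 2024-03-05 14:07
```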
Import flow controls
Automatically start importing after crawl
When enabled, WebSync skips the audit step and immediately sends all sources to NotebookLM. This setting is enabled by default.
Recommended defaults
- Max depth: 3
- Max pages to crawl: 1000
- Max sources to import: 300 (adjust based on your NotebookLM plan)
- Parsing method: defuddle
- Posting method: auto
- Auto-start import after crawl: on
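For reference, the recommended defaults can be written as a typed settings object. The type names, field names, and string values (such as `"defuddle"` or `"auto"`) are hypothetical; this document does not specify how WebSync stores its settings. The union types simply list the parsing and posting methods described above.

```typescript
// Sketch: recommended defaults as a typed object.
// ASSUMPTION: `WebSyncSettings` and all field names are hypothetical, not
// WebSync's real storage format.

type ParsingMethod = "raw" | "parse5" | "node-html-markdown" | "defuddle";
type PostingMethod = "link" | "content" | "auto";

interface WebSyncSettings {
  maxDepth: number; // link hops from the start page
  maxPages: number; // pages discovered and fetched per crawl
  maxSources: number; // sources sent to NotebookLM
  parsingMethod: ParsingMethod;
  postingMethod: PostingMethod;
  autoStartImport: boolean; // skip the audit step after crawling
}

const recommendedDefaults: WebSyncSettings = {
  maxDepth: 3,
  maxPages: 1000,
  maxSources: 300, // adjust based on your NotebookLM plan
  parsingMethod: "defuddle",
  postingMethod: "auto",
  autoStartImport: true,
};

console.log(recommendedDefaults.parsingMethod); // defuddle
```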