Overview
Scraping a website is the easiest way to load a bunch of information into an agent for it to reference. However, it's also the easiest way to feed too much information into your agent. Websites typically contain a lot of text, and despite what you may have heard, more doesn't always mean better when it comes to knowledge bases for agents. When possible, use document uploads or text inputs instead of web scraping.
Websites are re-scraped every 24 hours, so any changes made to your website will be reflected in your Agent within a day.
Initiate Web Scrape - URL
Before putting a URL in, decide what information from your website you want to be available in your Knowledge Document. I know your gut reaction is going to be "All of it!", but this is likely a bad idea.
Your webpage has a ton of information, and a lot of it just isn't useful in a document designed for product/service descriptions and Frequently Asked Questions. You probably have funnels on your page and lots of buttons that say "Book Now" or "Schedule a Call." These words can add information to your prompt that causes serious issues, like your bookings failing.
You can scrape certain pages by adding their direct URL into this field so you can cherry-pick the information that you want available.
Add your website's Product or FAQ URL and click Analyze
We will show the favicon found so that you can ensure the site information was processed correctly.
Thorough, Standard, and Quick Detail Levels
Select how much "detail" you would like to have in the scraped site. Detail in this case does not mean "accuracy of the information." It is more like "scope": by selecting these options, you are affecting the breadth and depth parameters, or how many pages the scrape will cover. You can also choose a customized set of pages to pull into the scrape (this can be done after the scrape as well, as you test to improve accuracy):
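As a rough illustration of what breadth and depth mean here, the sketch below shows how a detail level could translate into limits on a breadth-first crawl. The level names, page counts, depth numbers, and the fetch_links helper are all assumptions made for the example, not CloseBot's actual settings or implementation.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

# Illustrative only: these level names and limits are assumptions,
# not CloseBot's actual configuration.
DETAIL_LEVELS = {
    "quick":    {"max_depth": 1, "max_pages": 10},
    "standard": {"max_depth": 2, "max_pages": 50},
    "thorough": {"max_depth": 3, "max_pages": 200},
}

def crawl(start_url, level, fetch_links):
    """Breadth-first crawl bounded by the chosen detail level.

    `fetch_links(url)` is a stand-in for whatever fetches a page and
    returns the links found on it.
    """
    limits = DETAIL_LEVELS[level]
    seen = {start_url}
    queue = deque([(start_url, 0)])   # (url, depth)
    pages = []

    while queue and len(pages) < limits["max_pages"]:
        url, depth = queue.popleft()
        pages.append(url)
        if depth >= limits["max_depth"]:
            continue                   # depth limit reached for this branch
        for link in fetch_links(url):
            link = urljoin(url, link)
            # stay on the same site and skip pages already queued
            if urlparse(link).netloc == urlparse(start_url).netloc and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

In this sketch, a quicker level stays close to the starting URL, while a more thorough level follows links further and gathers more pages, which is also why it tends to pull in more noise.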
File Status
After you start a web scrape, you will watch the file go through three statuses (uploaded files only go through Processing and Live; see the sketch after this list):
Queued for Scraping (CloseBot will soon scrape the site)
Processing (CloseBot is indexing the file to be used by AI)
Live (CloseBot is ready to use this in conversations on the attached sources)
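For illustration only, here is a minimal sketch of that lifecycle as a status check loop. The ScrapeStatus enum and the get_status callable are hypothetical stand-ins for what you see in the Knowledge Library, not a CloseBot API.

```python
import time
from enum import Enum

class ScrapeStatus(Enum):
    QUEUED = "Queued for Scraping"   # waiting to be scraped
    PROCESSING = "Processing"        # being indexed for use by the AI
    LIVE = "Live"                    # ready to use in conversations

def wait_until_live(get_status, poll_seconds=10):
    """Poll a status-returning callable until the document is Live.

    `get_status` is a placeholder for however you check the document's
    status (e.g. refreshing the Knowledge Library page).
    """
    while (status := get_status()) is not ScrapeStatus.LIVE:
        print(f"Current status: {status.value}")
        time.sleep(poll_seconds)
    print("Document is Live and can be used by its attached sources.")
```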
Connect to your Source
Just like other documents, make sure you attach the web scrape to the correct source after you initiate it. The Knowledge Doc must be attached for a source to be able to access the information within the document. After starting a scrape, you can do this right away:
Or you can do it to a scrape that is already live:
Advanced
Deduplication
We also "deduplicate" when pulling links for a scrape: if we have already scraped a unique page, we skip that link when deciding which links to scrape next, so each layer covers only unique pages.
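Here is a minimal sketch of that idea, assuming a layer-by-layer crawl: a running set of already-scraped URLs is checked before a link is added to the next layer. The function and the fetch_links helper are illustrative assumptions, not CloseBot's exact code.

```python
def collect_links_for_layer(pages_in_layer, already_scraped, fetch_links):
    """Gather links for the next crawl layer, skipping pages seen before.

    `already_scraped` is the running set of URLs that earlier layers
    (or earlier scrapes) have covered; `fetch_links` is a stand-in for
    whatever extracts links from a page.
    """
    next_layer = []
    for page in pages_in_layer:
        for link in fetch_links(page):
            if link in already_scraped:
                continue              # deduplicate: skip pages we already have
            already_scraped.add(link)
            next_layer.append(link)
    return next_layer
```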
Document Uploads/Limits
We have a hard limit of 6MB per document uploaded. This is a system limitation and is unlikely to ever get higher. We will also stop a scrape once the scraped information hits 6MB.
Besides being a system limitation, this also prevents someone from running up your (or their own) storage costs by trying to upload something like a video into the Knowledge Library.
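As a sketch of what stopping at the limit could look like, the example below accumulates scraped page text and stops before the 6MB cap is exceeded. The accumulate_pages function and fetch_text helper are hypothetical; only the 6MB figure comes from the documentation above.

```python
MAX_DOCUMENT_BYTES = 6 * 1024 * 1024   # 6MB hard limit per document

def accumulate_pages(pages, fetch_text):
    """Append scraped page text until the 6MB document limit is reached.

    `fetch_text(url)` is a placeholder for whatever returns a page's text.
    """
    chunks, total = [], 0
    for url in pages:
        text = fetch_text(url)
        size = len(text.encode("utf-8"))
        if total + size > MAX_DOCUMENT_BYTES:
            break                        # stop the scrape at the 6MB point
        chunks.append(text)
        total += size
    return "\n\n".join(chunks)
```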
Summary
Scraping a website is very easy, and CloseBot will automatically re-scrape the site each day to keep the data up to date, but that is really where the benefits end. Using Upload or Create Text File to add to your knowledge documents is much more controllable and will produce better results than the web scraper.








