Leave That Thing Alone Blog

16 Aug. 2005

Verity K2 Spider creates a searchable collection from dynamic webpages. Unlike basic Verity indexing, the Verity Spider actually crawls your website. This means you can index and search dynamic, database-driven webpages along with static webpages. I had a hard time finding all the CF7 spider info I needed in one spot, so here is an example of how to set up a Verity Spider collection in ColdFusion 7.

(I have created a spider search example page that indexes a small number of dynamic webpages, and you can download the example vspider code and ColdFusion cfsearch code.)

Before you start, visit http://www.macromedia.com/go/50f419a and download the Verity Spider styles. Verity Spider will not work without that style folder. For this demo, just download the files and place them in the style folder (the directory passed to -style in the command below). You do not need to follow their entire example, although it is very close to this one.

Once you have added that style folder, it’s time to create a collection. The vspider utility creates the spider collection. On Windows, vspider is located here: [ColdFusion7Root]\verity\k2\_nti40\bin\

vspider must be configured and run from the command line, or on Windows from a BAT file. Here is an example of a BAT file that will spider an entire localhost website. Note: the command must be on ONE LINE.

C:\CFusionMX7\verity\k2\_nti40\bin\vspider -style C:\CFusionMX7\verity\Data\stylesets\ColdFusionVspider\ -collection C:\CFusionMX7\verity\collections\recycle -start http://localhost/ -cgiok -abspath -reparse -indmimeinclude text/* -indmimeexclude text/css

Options explanation:
  -style: required; location of the Verity Spider K2 style directory
  -collection: location where the collection will be created
  -start: URL where the spider starts crawling
  -cgiok: index CGI-type webpages (URLs with query strings), e.g. index.cfm?event=this&other=that
  -indmimeinclude text/*: index text-based files only
  -indmimeexclude text/css: do not index CSS files
This is only a small subset of the available options. For a complete list of vspider options, visit: http://livedocs.macromedia.com/coldfusion/7/htmldocs/00001780.htm
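
If you would rather trigger the spider from a ColdFusion page (for example from a scheduled task) than from a BAT file, the same options can be passed through the cfexecute tag. This is an untested sketch, not a recommended setup: it assumes the account running the ColdFusion service is allowed to execute vspider and write to the collection directory, and that a 600 second timeout is long enough for your site. The variable names verityRoot, spiderArgs and spiderOutput are made up for the example.

```cfml
<!--- Paths copied from the BAT example above; adjust for your install --->
<cfset verityRoot = "C:\CFusionMX7\verity" />

<!--- Same options as the BAT file, joined into one line --->
<cfset spiderArgs = "-style #verityRoot#\Data\stylesets\ColdFusionVspider\"
      & " -collection #verityRoot#\collections\recycle"
      & " -start http://localhost/"
      & " -cgiok -abspath -reparse"
      & " -indmimeinclude text/* -indmimeexclude text/css" />

<!--- Assumption: 600 seconds is enough. With the default timeout of 0,
      cfexecute does not wait for the program to finish. --->
<cfexecute name="#verityRoot#\k2\_nti40\bin\vspider.exe"
      arguments="#spiderArgs#"
      timeout="600"
      variable="spiderOutput" />

<!--- Show whatever vspider printed, escaped for HTML --->
<cfoutput><pre>#HTMLEditFormat(spiderOutput)#</pre></cfoutput>
```

Because the argument string is built in code, the "one line" requirement is handled by the concatenation rather than by careful editing in Notepad.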

If you want to spider a subdirectory rather than the entire website, modify the -start option and add the -include option:

-start http://localhost/yourDIR/ -include /yourDIR/*

If your command line gets too long, you can also reference an ini file containing settings. In the example below, the ini file would contain all the other options.

vspider -style C:\CFusionMX7\verity\Data\stylesets\ColdFusionVspider\ -collection C:\CFusionMX7\verity\collections\recycle -cmdfile vertiyconfig.ini

When run, vspider will index all the pages matching the options you give it. It will bring up a window that looks like this:
[Screenshot: view of vspider running]

Once vspider has run, there should be a set of folders and files in the directory you assigned with the -collection option in the vspider command.

Now we need to index the collection so the cfsearch tag can be used. At this point the collection exists, but ColdFusion cannot search it. Open the CF admin and go to the “Verity Collections” page. Add a new englishx collection; in the example below the collection name is “recycle”. Note that the Path does not include the collection name itself; that part is added from the collection Name.
[Screenshot: creating collection in cfadmin]

Now click “create collection” to index that collection. When it is complete, you should see the number of documents in the collection. That’s it: the collection is indexed and ready for the cfsearch tag. Once indexed, the collection should look like this in the CF admin:
[Screenshot: created collection]
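
The document count can also be checked from code instead of the CF admin. Here is a minimal sketch, assuming the collection was registered under the name “recycle” as above. The query name allCollections is only an illustration.

```cfml
<!--- List every collection registered with ColdFusion --->
<cfcollection action="list" name="allCollections" />

<!--- Print the name, document count and path of the "recycle" collection --->
<cfoutput query="allCollections">
   <cfif allCollections.name EQ "recycle">
      #allCollections.name#: #allCollections.doccount# documents (#allCollections.path#)<br />
   </cfif>
</cfoutput>
```

If the count shows 0, the spider probably did not index anything, and it is worth re-running vspider and reading its output before touching the search page.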

Now it’s time to create a search page using the cfsearch tag. ColdFusion 7 can now add spelling suggestions and highlight matched words, so let’s do that too. Here is a very basic piece of ColdFusion code that searches a collection called “recycle”. With suggestions="1", it offers a spelling suggestion when the search finds 1 result or fewer, and it returns the suggestion in a status structure called “info”.

<cfparam name="form.searchtext" default="cans" />

<cfsearch collection="recycle"
      criteria="#form.searchtext#"
      name="searchResults"
      suggestions="1"
      contexthighlightbegin="<span class=""hilite"">"
      contexthighlightend="</span>"
      contextpassages="1"
      contextbytes="400"
      status="info" />


<cfif info.found LTE 1 AND isDefined('info.SuggestedQuery')>
   <!--- spelling suggestion --->
Did you mean: <cfoutput>#info.SuggestedQuery#</cfoutput>?
</cfif>

<cfoutput query="searchResults">
   <h4>#searchResults.currentrow#</h4>
   Title: #title#<br />
   URL: #URL#<br />
   Score: #score#<br />
   Summary: #context#<br />
</cfoutput>
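
The search code above reads form.searchtext, so it needs a form that posts that field. Here is a minimal sketch, assuming the search code lives in a page called search.cfm (a hypothetical name) and that you want the highlighted words styled through the hilite class used in contexthighlightbegin. The yellow background is only an example style.

```cfml
<!--- Default to an empty search box on first load --->
<cfparam name="form.searchtext" default="" />

<!--- Style for the <span class="hilite"> wrapper that cfsearch adds --->
<style type="text/css">
   span.hilite { background-color: yellow; }
</style>

<!--- Post the search text back to the (hypothetical) search page --->
<cfoutput>
<form action="search.cfm" method="post">
   <input type="text" name="searchtext" value="#HTMLEditFormat(form.searchtext)#" />
   <input type="submit" value="Search" />
</form>
</cfoutput>
```

HTMLEditFormat keeps whatever the visitor typed from breaking the value attribute when the form is redisplayed.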

Above is a very basic example. Here is a live example of a verity spider search page.

Download code from live example.

The vspider command above should get you up and running with a K2 collection. Once you create a collection, you will probably need to tweak the options for your needs. If you have questions, please leave a comment or email me.

Comments

An article about vspider has been added to Macromedia's site: http://www.macromedia.com/devnet/coldfusion/articles/vspider.html


Can anyone tell me if VSpider has the capacity to "exclude" certain folders in a site when indexing? I have read and read, and I can't find out anywhere. I have a friend who says that it can, but cannot find that capability addressed anywhere.

That is all I want to do, for now. I've created a regular Verity Search ...have created a collection, indexed that collection and used CFSearch to search the collection. My problem is that folders and files are being delivered in the search results that I don't want delivered.

Will VSpider help this? Can anyone help me? Please?

thanks, WCW


Thanks for your help so far. I have found Vspider.exe, but when I run it by double-clicking or through Start/Run, it comes up and disappears so quickly that I cannot access it. Is there something I'm missing? Sorry to be so dumb. How am I supposed to configure it if it pops open and disappears? I am so sorry to be at so basic a level here.

thanks, WCW


I am feeling like I'm in the twilight zone. I am trying to create a new collection, and now, typing (I think) the exact same thing into the command line that I did the last time, I cannot get VSpider to create a collection. When I attempt to index what might have been run in CF Administrator, it lists 0 files indexed. When VSpider runs, it begins, then pauses, and begins the run, but disappears much too quickly. I am beginning to wonder whether it really worked last time.

When I did a regular Verity collection using cfindex, it indexed over 10,000 files, and the VSpider run I did a couple of days ago only indexed 685 files. Granted, I don't get so much garbage in the search.

What can I be doing wrong? It worked just yesterday.

Also, I have a collection that's been indexed, and I've cleaned up my Website. Can I use CF Administrator to "purge" the collection and re-index it?

So frustrated, and today's my deadline.

Here's what I'm writing in the command line. Have tried writing a .bat file AND typing by hand into the command line. And it's all in one line in notepad and I've saved the file in the C:\CFusionMX7\verity\k2\_nti40\bin\vspider directory. Have tried 127.0.0.1 and localhost.

C:\CFusionMX7\verity\k2\_nti40\bin\vspider -style C:\CFusionMX7\verity\Data\stylesets\ColdFusionVspider\ -collection C:\CFusionMX7\verity\collections\wusfsite6 -start http://127.0.0.1/ -cgiok -abspath -reparse -indmimeinclude text/* -indmimeexclude text/css

Help? WCW


I redirected the default web site in IIS to a named site I had set up. This caused vspider to fail, as "localhost" was getting redirected to "MyNameHost", and even though it was all on the same server, vspider baulked at the idea! This took half a day to figure out, so I thought I would post it in case someone else comes across this. Every time I indexed, it came back with "0" docs and a "skipping due to host" error.