With Scripted Index you can write, update, and maintain incremental indexing options without the need to log in. The search robot reads instructions from a text file that is hosted on your server.
To use Scripted Index, use the Scripted Incremental Index Configuration page to specify the URL of a script file (a plain text file) that is located on your server, for example, https://www.mysite.com/indexlist.txt. As your site changes, you can add command blocks to the text file either manually or automatically (with a script triggered by the arrival of information from a news feed, stock ticker, or other altered file).
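As a rough illustration of the automatic route, the following Python sketch appends one command block to the script file each time it is called. The function name `append_block`, the file path, and the exact `seconds <number>` line syntax are assumptions made for this example; only the existence of a seconds command and the blank-line block separator come from this document.

```python
import time

def append_block(script_path, commands, now=None):
    """Append one command block to the script file and return the block text.

    Hypothetical helper: the 'seconds <epoch>' syntax is an assumption based
    on the description of the seconds command in this document.
    """
    stamp = int(time.time() if now is None else now)
    lines = [f"seconds {stamp}"] + list(commands)
    # The trailing blank line is the separator that ends the block.
    block = "\n".join(lines) + "\n\n"
    # Mode "a" keeps older blocks at the top and the newest at the bottom.
    with open(script_path, "a", encoding="utf-8", newline="") as handle:
        handle.write(block)
    return block

if __name__ == "__main__":
    # Example call, e.g. from a job triggered by a news-feed update.
    append_block("indexlist.txt", ["update https://www.mysite.com/news/"])
```

Appending, rather than rewriting the file, keeps the blocks in the oldest-first order that the script file expects.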
When the scripted incremental index begins, the search robot reads the text file and runs the new commands that are found in that file. By default, the search robot processes only the new commands, which are determined by the file date. Unless you check Clear Date at the time you configure Scripted Index, the search robot “remembers” the date-specifier of the most recently processed block.
The script file that you specify in the URL is a plain text file that is located on your server. You can use carriage returns, line feeds, or both for the end-of-line sequence. A blank line contains zero or more white space characters followed by an end-of-line sequence. All commands are case-insensitive.
The text file is organized in blocks that describe the information that the search robot uses when it performs a scripted incremental index.
Blocks are ordered by date, with the oldest blocks at the top of the text file and the most recent blocks at the bottom. Each block begins with a single line that holds a date-command and its date-specifier, continues with one or more commands, and ends with a blank-line separator (see the sample script file later in this section).
A leading zero is required for all ordinal dates lower than the 10th when using the HTTP 1.1 style. For example, November 6th is 06 Nov, not 6 Nov.
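One way to avoid hand-typing the date-specifier is to generate it. The sketch below uses Python's standard `email.utils.format_datetime`, which always pads the day of the month to two digits. The helper names `date_line` and `seconds_line` are hypothetical, and writing the time zone as GMT (rather than a zone such as PST) is an assumption about what the search robot accepts.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def date_line(moment):
    """Return a 'date' line in HTTP 1.1 style (two-digit day, GMT)."""
    utc_moment = moment.astimezone(timezone.utc)
    return "date " + format_datetime(utc_moment, usegmt=True)

def seconds_line(moment):
    """Return the same moment as an assumed 'seconds <epoch>' line."""
    return f"seconds {int(moment.timestamp())}"

november_6 = datetime(2004, 11, 6, 8, 49, 37, tzinfo=timezone.utc)
print(date_line(november_6))  # date Sat, 06 Nov 2004 08:49:37 GMT
```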
Command | Description |
---|---|
date-command | The first line of each block starts with one of two date commands: date or seconds. |
date-specifier | The date-specifier typically records either the ordinal date and time (date command) or the time in epoch seconds (seconds command) at which the block was added to the file. The search robot "remembers" the date-specifier of the most recently processed block and only indexes information that it considers to be "newer." (Real time does not matter to the search robot; what matters is the time relative to previously processed times.) After the search robot reads a block with a date-specifier of 10:00 p.m., for example, it does not read any blocks that record times before 10:00 p.m., regardless of when the index operation runs. In a worst-case scenario, you might mistakenly enter the year "2040" instead of "2004" in your date-specifier. In that case, the search robot indexes the 2040 block during the next indexing operation and then refuses to read any other blocks (unless one post-dates 2040). If this happens, remove all previously processed blocks from the text file, click Clear Date, and then push it live. |
comment line | Begin comment lines with the "#" character. Each comment must be on a line of its own; end-of-line comments are not allowed. A comment line is not considered a blank line, and it can appear anywhere in a block, even before a date or seconds command. |
action-command | Each text block can contain as many action commands as you want. The action-command options correspond to those for standard incremental indexing; the sample script file in this section uses add, delete, update, include, exclude, and exclude-date. |
See also About URL Masks.
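The "remembering" behavior described in the table can be modeled in a few lines of Python. This is a simplified sketch of the documented rules, not the search robot's actual implementation: the function names are hypothetical, the `seconds <number>` syntax is assumed, and date lines are parsed with the standard `email.utils.parsedate_to_datetime`.

```python
from email.utils import parsedate_to_datetime

def parse_blocks(text):
    """Split script text into (epoch_seconds, [action lines]) blocks."""
    blocks, current = [], None
    for raw in text.splitlines():          # handles CR, LF, and CRLF endings
        line = raw.strip()
        if not line:                       # a blank line closes the block
            if current is not None:
                blocks.append(current)
            current = None
            continue
        if line.startswith("#"):           # comments are skipped, not blank
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.lower()          # commands are case-insensitive
        if keyword in ("date", "seconds"):
            if current is not None:        # tolerate a missing separator
                blocks.append(current)
            if keyword == "date":
                stamp = parsedate_to_datetime(rest.strip()).timestamp()
            else:
                stamp = float(rest)
            current = (stamp, [])
        elif current is not None:
            current[1].append(line)
    if current is not None:
        blocks.append(current)
    return blocks

def select_new_blocks(blocks, last_seen):
    """Return the blocks newer than last_seen, plus the updated last_seen."""
    new = []
    for stamp, commands in blocks:
        if last_seen is None or stamp > last_seen:
            new.append((stamp, commands))
            last_seen = stamp
    return new, last_seen

SCRIPT = """\
seconds 100
add https://www.mysite.com/a.html

seconds 2208988800
update https://www.mysite.com/b.html

seconds 200
delete https://www.mysite.com/c.html
"""

# The second block carries a date in 2040, so the third block is skipped.
new, last_seen = select_new_blocks(parse_blocks(SCRIPT), None)
print(len(new), last_seen)
```

Passing `None` as `last_seen` plays the role of Clear Date in this model: every block is considered again from the top of the file.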
In the following script file example, the search robot processes the blocks provided that the date-specifiers post-date the date-specifier of the most recently processed block. If that is the case, then the following indexing operations occur:
Deletes y2k-problems.html from the index.
Adds no-y2k-problems.html to the search index and does not follow any of the links for no-y2k-problems.html.
While crawling, exclude URLs that match housewares.html and lightfixtures.html from the search index.
Include all other directories and documents under www.mydomain.com.
Update all documents within the products and information directories, crawling and indexing all subsidiary links that have changed since the last indexing operation.
While crawling, exclude URLs in the archive section of the website if they are dated on or before January 1, 1999.
Exclude URLs that match housewares.html and lightfixtures.html from the search index.
Index files in the help directory, but do not crawl or index any links from those files.
Crawl and index any other files encountered for www.mydomain.com.
# Start of file.
# Added by John Smith
date Thu, 01 Jan 2004 16:05:53 PST
exclude https://www.mydomain.com/housewares.html
exclude https://www.mydomain.com/lightfixtures.html
include https://www.mydomain.com/
delete https://www.mydomain.com/y2k-problems.html
add https://www.mydomain.com/no-y2k-problems.html nofollow

date Fri, 02 Jan 2004 20:19:08 PST
# Added by the wire service updater
exclude-date 1999-01-01 https://www.mydomain.com/archive server-date
exclude https://www.mydomain.com/housewares.html
exclude https://www.mydomain.com/lightfixtures.html
include https://www.mydomain.com/help/ nofollow
include https://www.mydomain.com/
# no add files, just update existing files
# update all files in the "products" directory
update https://www.mydomain.com/products/
# update all files in the "information" directory
update regexp ^https://www\.mydomain\.com/information/.*$

# End of file.
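Because blocks must run from oldest to newest, it can help to check a script file before publishing it. The following Python sketch is one possible check, under the assumption that a date line looks like the ones in the sample above; the function name `out_of_order_dates` is hypothetical, and seconds lines are not examined.

```python
from email.utils import parsedate_to_datetime

def out_of_order_dates(script_text):
    """Return (line_number, line) pairs for date lines that are not later
    than the date line before them."""
    problems, previous = [], None
    for number, raw in enumerate(script_text.splitlines(), start=1):
        line = raw.strip()
        # Commands are case-insensitive; 'exclude-date' does not match here.
        if not line.lower().startswith("date "):
            continue
        stamp = parsedate_to_datetime(line[5:].strip()).timestamp()
        if previous is not None and stamp <= previous:
            problems.append((number, line))
        previous = stamp
    return problems

sample = (
    "date Fri, 02 Jan 2004 20:19:08 PST\n"
    "update https://www.mydomain.com/products/\n"
    "\n"
    "date Thu, 01 Jan 2004 16:05:53 PST\n"
    "delete https://www.mydomain.com/y2k-problems.html\n"
)
for number, line in out_of_order_dates(sample):
    print(f"line {number} is out of order: {line}")
```

An empty result means every date line is later than the one before it, which matches the ordering the script file expects.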
You can specify a script that you have created that writes, updates, and maintains an incremental index, without the need to log in. The search robot reads instructions from the text file that is hosted on your server to perform the incremental index.
To configure a scripted incremental index
On the product menu, click Index > Scripted Index > Configuration.
On the Scripted Incremental Index Configuration page, in the Script File URL, enter the URL to the text file script that is located on your server.
See About Scripted Index.
(Optional) Check Clear Date if you do not want the search robot to “remember” the date-specifier of the most recently processed block.
By default, the search robot processes only new blocks of commands that are found in the text file, which is determined by the file’s date. If you do not want the default, check Clear Date.
Click Save Changes.
(Optional) Do one of the following:
Click History to revert any changes that you have made.
Click Live.
Click Push Live.
You can schedule scripted incremental indexing to occur at regular intervals throughout the day.
The base time that you select is local according to the time zone that is configured in Account Settings.
See Configuring your account settings.
Web servers are often scheduled to go down for maintenance in the middle of the night. If your server is down during a scheduled index time, the indexing process will fail. Be sure that you select a time of day when your web server is available.
The index schedule only applies to your live index; you cannot schedule staged incremental indexes.
To set the scripted incremental index schedule for a live website
You can use Scripted Incremental Index to index “pieces” of your live or staged website, such as a collection of frequently changed pages, all without the need to log in.
To use this feature, be sure that you have configured a scripted incremental index text file.
See Configuring a scripted incremental index.
To run a scripted incremental index of a live or staged website
On the product menu, do one of the following:
Click Scripted Index Now.
(Optional) If indexing errors occurred, click View Errors to view the associated log.
When a live full scripted index or a staged full scripted index is complete, you can view its associated log to troubleshoot any errors that occurred.
You cannot export or save logs. However, the log remains available for viewing until the next index operation runs.
To view the incremental index log of a live or staged website
On the product menu, do one of the following:
Click Index > Scripted Index > Live Log.
Click Index > Scripted Index > Staged Log.
On the log page, at the top or bottom, do any of the following:
Use the navigation options First, Prev, Next, Last, or Go to line to move through the log.
Use the display options Errors only, Wrap line, or Show to refine what you see.