This Appendix discusses the Web page keyword searching that is available from the "Search" feature at the upper-right of many Ohio University pages, including the Front Door. There are three reasons for including this discussion here:
The way we build the index has implications for your work as a pagemaster. In order for your pages to be effectively indexed, so that people who are not familiar with the Ohio University Web site can find them more readily, you need to follow the four technical content standards numbered 12 through 15 in Chapter I.
Individual pagemasters can use the same search engine to provide searches that are restricted to their own Web pages, as discussed below.
"Inquiring minds want to know." - This is, after all, an educational institution, so if you are "terminally curious," read on.
With keyword searching for Web pages, we are asking the system to do a lot of work behind the scenes in order to generate a few KB of text for display. Ordinary HTML pages are simply a collection of data bytes that the server has to send down the wire to the browser for display. On the other hand, keyword searching means running a query program on the server, which has to search through a large database to find the pages the viewer is interested in. Building the indexes ahead of time requires a lot of disk I/O, as does each search, and then both CPU cycles and disk I/O are needed to calculate the score for each "hit" page and sort the hits in rank order.
The process is both machine- and labor-intensive, so for the sake of efficiency, we will not create separate indexes for every portion of the Web. Instead, the query software permits restricting the displayed results to those that are within a limited realm (for example, limiting the results to pages whose URLs start with "http://www.ohio.edu/technology/"). The next section describes how you can use the new software to build a realm-limited search into your pages.
The old ThunderStone Search Appliance has been permanently disconnected from the network, as of November 3, 2008.
We have replaced it with the Google Search Appliance ("GSA"), whose use is described here. Updating your existing custom search to use the GSA instead of the ThunderStone machine requires replacement of the FORM and INPUT fields in your old HTML code with those documented below.
If the only search you have on your pages is a copy of the one used in the upper-right corner of the Front Door (which searches the entire Ohio University web presence using a FORM tag with action="http://www.ohio.edu/progtools/searchRoute.cfm"), then you will not need to make any changes at all: we have updated that code already. It is the custom searches, as described below, that require updating to use the GSA.
The custom search method described here is intended to permit any Ohio University pagemaster to include, on any of his or her pages, a search option that will quickly return only those hits that are part of that subsite. You can control the pages that are searched in two ways: specifying the "collection" to be used (i.e., which of the specific databases that the search engine maintains should be used for the search), and specifying the "realm" to be reported on (i.e., the initial parts of the URL that all of your subsite's pages have in common).
At this time there are two collections available:
"technology" includes all pages in the groups listed below, most of which are maintained by central information technologists. This collection will be expanded to include subsites maintained by distributed information technologists, as those subsites are brought to our attention. The starting points for this collection currently include:
http://www.ohio.edu/policy/ [but including only the IT policies]
"default_collection" includes all indexed Ohio University Web servers.
Therefore, for example, this method can be used to search only those pages whose URLs start with "http://www.ohio.edu/perspectives/" (using the "default_collection" collection), but it cannot be used to create a combined search of all pages whose URLs start with either "http://www.ohio.edu/perspectives/" or "http://www.ohio.edu/researchnews/", because there is no combination of one collection and one realm that will include all of those pages and no others. If you need to create such a complex search, please contact the Office of Information Technology, at 593-1222, or by E-mail to email@example.com, to determine whether an existing collection will work or whether the search engine would have to be reconfigured to create a new collection.
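To make the single-collection, single-realm case concrete, here is a minimal sketch of the two scope fields for the "perspectives" example above. It shows only the "site" and "as_sitesearch" inputs, and it assumes they sit inside the complete search form given in the steps below, with every other field left unchanged.

```html
<!-- Scope fields only: place these inside the full search form from the steps below. -->

<!-- Collection to search: all indexed Ohio University Web servers. -->
<input type="hidden" name="site" value="default_collection">

<!-- Realm: report only pages whose URLs start with this prefix. -->
<input type="hidden" name="as_sitesearch" value="http://www.ohio.edu/perspectives/">
```

As described above, this method pairs one collection with one realm, so there is no way to add "http://www.ohio.edu/researchnews/" as a second realm in the same form.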
There are several steps to building your own custom search:
Identify a "collection" that includes all of your pages (choosing the collection that includes the fewest other pages will speed your search slightly), and identify the "realm" you will specify to restrict the search to only your pages. Typically this will be the full URL up to the point where the pages vary. For example, "http://www.ohio.edu/pagemasters" would specify a realm that includes all of the Pagemasters Toolbox pages.
Including a terminal slash restricts the search results to include only pages at that level, not in any sub-subsites. For example, specifying a realm of "http://www.ohio.edu/pagemasters/" would exclude http://www.ohio.edu/pagemasters/memo85/append4.html, which would have been included without the terminal slash.
Use your mouse to select the HTML code displayed here, and copy it (the bold highlighting is intended to ease identification of those parts of the code when completing steps 4, 5, and 6, below):
<form method="get" action="http://google.ohio.edu/search">
<input type="hidden" name="sort" value="date:D:L:d1">
<input type="hidden" name="entqr" value="0">
<input type="hidden" name="ud" value="1">
<input type="hidden" name="client" value="ou_front">
<input type="hidden" name="output" value="xml_no_dtd">
<input type="hidden" name="proxystylesheet" value="ou_front">
<input type="hidden" name="ie" value="UTF-8">
<input type="hidden" name="oe" value="UTF-8">
<input type="hidden" name="as_dt" value="i">
<input type="hidden" name="site" value="default_collection">
<input type="hidden" name="as_sitesearch" value="http://www.ohio.edu/pagemasters">
Click here and type to enter your search keywords:
<input size="25" name="q" value="" maxlength="255"> <input type="submit" name="btnG" value="Search">
</form>
Go to your page editor, open the page you want to add the search into, view the HTML code if that isn't the default, position the insertion point appropriately, and paste.
Special instructions apply for CommonSpot users.
There need be no particular relationship between the location of the page where you have placed the search form and the pages that are being searched. The code determines, through the combination of the "site" and "as_sitesearch" values, which pages will be searched. Of course, if the search does not cover the subsite that the page is part of, then you should have nearby visible text that describes what pages will be searched.
Find the hidden "site" input tag in the HTML you just pasted into your file, and if necessary change the value from "default_collection" to the appropriate collection for your pages, as you decided in step 1 (the part to change is displayed in bold type, above).
Find the hidden "as_sitesearch" input tag in the HTML you just pasted into your file. If the collection you have specified includes no other pages than the ones you want to search, remove that entire tag. If the collection you have specified does include other pages, change the value from "http://www.ohio.edu/pagemasters" to the appropriate realm for your pages, as you decided in step 1 (the part to change is displayed in bold type, above). Be sure to include or exclude the terminal slash as appropriate for the results you want, according to the commentary in the second paragraph of step 1.
Revise the prompt text as appropriate (the part to change is displayed in bold type, above). In some situations a much more terse prompt would be appropriate.
Save the modified page and test the search.
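As a quick reference for the terminal-slash choice made in steps 1 and 5, the two variants of the "as_sitesearch" field can be sketched side by side. The behavior noted in the comments is the one this Appendix describes; the second variant is commented out here because a form should carry only one of them.

```html
<!-- Variant A, no terminal slash: results may include pages in sub-subsites,
     such as http://www.ohio.edu/pagemasters/memo85/append4.html -->
<input type="hidden" name="as_sitesearch" value="http://www.ohio.edu/pagemasters">

<!-- Variant B, terminal slash: results are limited to pages at that level,
     so the memo85 page above is excluded. Use this INSTEAD of Variant A. -->
<!-- <input type="hidden" name="as_sitesearch" value="http://www.ohio.edu/pagemasters/"> -->
```

When testing, searching for a keyword that you know appears only in a sub-subsite page is a simple way to confirm which variant you have in place.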
The HTML given in step 2, above, will produce the results on the demo page (go there and try the search now to see what the results page is like).
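For comparison, here is a sketch of what the form might look like after steps 4 through 6 for a search of the entire "technology" collection. It assumes that the collection contains no pages other than the ones you want searched, so the "as_sitesearch" tag is removed entirely as step 5 directs. The prompt text "Search IT pages:" is only an illustrative choice, not wording taken from any existing page.

```html
<form method="get" action="http://google.ohio.edu/search">
<input type="hidden" name="sort" value="date:D:L:d1">
<input type="hidden" name="entqr" value="0">
<input type="hidden" name="ud" value="1">
<input type="hidden" name="client" value="ou_front">
<input type="hidden" name="output" value="xml_no_dtd">
<input type="hidden" name="proxystylesheet" value="ou_front">
<input type="hidden" name="ie" value="UTF-8">
<input type="hidden" name="oe" value="UTF-8">
<input type="hidden" name="as_dt" value="i">
<!-- Step 4: collection changed from "default_collection" to "technology". -->
<input type="hidden" name="site" value="technology">
<!-- Step 5: no "as_sitesearch" tag, because the whole collection is searched. -->
<!-- Step 6: a terse prompt (hypothetical wording). -->
Search IT pages:
<input size="25" name="q" value="" maxlength="255"> <input type="submit" name="btnG" value="Search">
</form>
```

If your pages are only part of a collection, keep the "as_sitesearch" tag instead and set its value to your realm.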