US20130133046A1
2013-05-23
13/727,430
2012-12-26
The embodiments described herein generally relate to a method and system for enabling a client to configure and control the crawling function available through a crawl configuration Web service. A client is able to configure and control the crawling function by defining the URL space of the crawl. Such space may be defined by configuring the starting point(s) and other properties of the crawl. The client further configures the crawling function by creating and configuring a content source and/or a crawl rule. Further, a client defines authentication information applicable to the crawl to enable the discovery and retrieval of electronic documents requiring authentication and/or authorization information for access thereof. A protocol governs the format, structure and syntax (using a Web Services Description Language schema) of messages for communicating to and from the Web crawler through an application programming interface on a server hosting the crawler application.
H04L63/08 » CPC main
Network architectures or network communication protocols for network security for supporting authentication of entities communicating through a packet data network
This application is a continuation application of U.S. patent application Ser. No. 12/766,703, filed on Apr. 23, 2010, and entitled "SEARCH SERVICE ADMINISTRATION WEB SERVICE PROTOCOL," which claims the benefit of U.S. Provisional Application Ser. No. 61/285,931, filed on Dec. 11, 2009, and entitled, "SEARCH SERVICE ADMINISTRATION WEB SERVICE PROTOCOL." The entireties of the aforementioned applications are incorporated herein by reference.
A portion of the disclosure of this patent document may contain material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever. The following notice shall apply to this document: Copyright © 2010, Microsoft Corp.
Computer users access networks, such as the Internet, on a frequent basis for the discovery and retrieval of information and data. For example, particular types of content and data are searched for and accessed on the World Wide Web ("the Web"), in which the Web is a system on the Internet of interlinked electronic documents. The Web provides access to a multitude of Web sites, in which a Web site is a collection of related Web pages or other digital resources associated with a common Uniform Resource Locator (URL). Each Web site typically has multiple files and related resources that are held by a Web server and may be accessed through a network, including the Internet and/or a local area network (LAN). A Web server stores and distributes electronic documents associated with the particular Web site hosted by the Web server. These electronic documents may also include embedded hyperlinks, or other links, that reference other electronic documents, data, Web pages, Web sites, etc. Electronic documents are distributed in a format, such as Hypertext Markup Language (HTML), for example.
With the plethora of data available on the Web, computer applications have been developed to "crawl" the multitude of Web documents stored on the numerous Web servers connected to the Web to search for particular documents and/or data for retrieval. A "crawl" process thus includes traversing the URL space, in which links in the electronic documents are discovered and followed as well. Given the vast amounts of data available on the Web, such "Web crawling" may be nearly boundless and time-consuming. As a result, an exorbitant number of documents may be retrieved, causing network bandwidth to be consumed unnecessarily while hampering resource efficiency. Further, if the number of documents retrieved with regard to a particular search is particularly large, a user may not have the time or resources to carefully filter through such documents to find meaningful information. Further yet, some data may not be retrieved at all if authentication or authorization requirements at particular Web sites prevent the crawling of related Web documents. Consequently, electronic documents with valuable information to the user may be missed altogether.
Although specific problems have been addressed in this Background, this disclosure is not intended in any way to be limited to solving those specific problems.
Embodiments generally relate to enabling a client, such as a client computer, to configure and control the Web crawling function provided by a crawling application of a search service application, in which such crawling application is referred to herein generally as a "Web search service." For example, in embodiments, a client configures and controls the crawling function of the index server provided by the Search Services of MICROSOFT OFFICE SHAREPOINT SERVER 2007 produced by MICROSOFT CORPORATION of Redmond, Wash. An index server is a server having the task of crawling, among other tasks. Configuring and controlling the Web crawling function allows a client to define the space crawled by the index server, such as by defining the Uniform Resource Locator (URL) space of the crawl. The URL space is defined, for example, by configuring the starting point(s) and restriction rule(s), or crawl rule(s), for the crawl. In particular embodiments, a content source is defined, in which a content source specifies the type of content to be crawled and the start addresses, e.g., URL addresses, for the content. A crawl rule is defined in embodiments as a set of preferences applicable to a specific URL or range of URLs and is used to include or exclude items in the crawl and/or to specify the content access account to use when crawling the URL or range of URLs. The configuration and control of the Web crawling function is also accomplished in embodiments by allowing the client to define authentication information, e.g., credentials, for use during a crawl to allow access to certain electronic documents, for example, or other data. Configuring a crawl also enables the client to control such features as when the crawl occurs, the duration of the crawl, etc.
The configuration and control of the Web search service by the client is enabled in embodiments by providing an application programming interface for receiving specific method calls, and providing responses thereto, for invoking the functionality of the Web search service to create specific parameters for the URL spaces, crawl rules, and credential data, for example. The format and procedures governing the transmittal and receipt of data at such application programming interface is provided by a protocol, such as the Search Service Administration Web Service protocol in accordance with embodiments disclosed herein. Inputs are received from a client for such method calls to the Web search service. The Web search service processes such inputs to configure the crawl function and sends a response to the client.
This Summary is provided to introduce a selection of concepts in a simplified form that is further described below in the Detailed Description. This Summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used in any way as to limit the scope of the claimed subject matter.
Embodiments of the present disclosure may be more readily described by reference to the accompanying drawings in which like numerals refer to like items.
FIG. 1 illustrates an example logical representation of an environment or system for configuring the crawling function of a search service application in accordance with an embodiment of the present disclosure.
FIG. 2 depicts a logical representation of example functional component modules for an index server hosting the search service application depicted in FIG. 1 for configuring the crawling function in accordance with an embodiment of the present disclosure.
FIG. 3 illustrates an object hierarchy for a crawler application of the search service application depicted in FIGS. 1 and 2 in accordance with an embodiment of the present disclosure.
FIG. 4 depicts an example user interface showing data entry fields for allowing a user to configure a crawling function of a crawler application in accordance with an embodiment of the present disclosure.
FIG. 5 is a flow diagram illustrating the operational characteristics of a process for searching for an electronic document(s) by executing a crawler application in accordance with an embodiment of the present disclosure.
FIG. 6 depicts a flow diagram illustrating the operational characteristics of a process for configuring a crawling function of a crawler application in accordance with an embodiment of the present disclosure.
FIG. 7 is a flow diagram illustrating the operational characteristics of a process for creating and configuring a content source in accordance with an embodiment of the present disclosure.
FIG. 8 depicts a flow diagram illustrating the operational characteristics of a process for configuring a crawl rule in accordance with an embodiment of the present disclosure.
FIG. 9 depicts an example computing system upon which embodiments of the present disclosure may be implemented.
This disclosure will now more fully describe example embodiments with reference to the accompanying drawings, in which specific embodiments are shown. Other aspects may, however, be embodied in many different forms and the inclusion of specific embodiments in this disclosure should not be construed as limiting such aspects to the embodiments set forth herein. Rather, the embodiments depicted in the drawings are included to provide a disclosure that is thorough and complete and which fully conveys the intended scope to those skilled in the art. Dashed lines may be used to show optional components or operations.
Embodiments generally relate to enabling a client to configure, and thus control, the crawling function of a search service application server having crawl functionality. A "crawl" process involves traversing a URL space, for example, for content in electronic documents associated with the URL space that satisfies search criteria. Links, such as hyperlinks, embedded in the electronic documents are also discovered and followed. However, according to embodiments, a given crawl is configured to prevent it from following links outside of desired boundaries of the URL space. Such control is accomplished by defining restriction rules, or "crawl rules," to restrict the boundaries of the crawl. In other embodiments, a crawl is configured not to restrict its reach but, rather, to allow it to access content by providing it with necessary authentication and/or authorization information applicable to such content.
In accordance with embodiments, the noted configuration and control of the Web search service by a client is enabled by providing an application programming interface (API) on the protocol server hosting the Web search service, or crawler application. The API processes request messages and response messages, such as messages related to configuration changes of the crawler function of the Web search service, so as to facilitate interaction between a "protocol client," for example, and the Web search service. In embodiments, the API processes request messages to put such messages in the proper format, structure, and syntax for calling the functionality of the Web search service. Such format, structure, and syntax are governed by a protocol, such as the Search Service Administration Web Service protocol according to embodiments disclosed herein. Specific methods and rules are thus defined, in accordance with the Search Service Administration Web Service protocol, for creating specific parameters for the URL spaces, crawl rules, content sources, and credential data, for example. The Search Service Administration Web Service protocol thus provides the formatting and syntax to use for defining the specific methods and rules for calling the functionality of the Web search service, providing responses thereto, and/or otherwise generally handling communications between the protocol client and a protocol server hosting the Web search service. The Search Service Administration Web Service protocol is offered for example purposes as a type of protocol to be used in accordance with embodiments disclosed herein. Other types and names of protocols providing the same or similar functionality may be used without departing from the spirit and scope of the present disclosure.
In an embodiment, the protocol client is a software module with functionality, for example, for: presenting a user interface (UI) for receiving configuration requests and input and displaying configuration response messages, receiving inputs for a configuration request, formatting the received input data into a request message, transmitting the message to the Web server hosting the Web search service, and receiving and processing any received responses from the Web search service, for example. In accordance with embodiments disclosed herein, the protocol client is stored on the client computer and is executed by the client computer's processor. In another embodiment, the protocol client is downloaded at the client computer through a Web browser, such as MICROSOFT INTERNET EXPLORER produced by MICROSOFT CORPORATION of Redmond, Wash. Various types of browsers can be used in accordance with embodiments disclosed herein. Further, while embodiments disclosed herein relate to crawls of Web sites, for example, the content to be crawled can also include specific systems, such as MICROSOFT SHAREPOINT sites, file systems, and internal and external Web sites.
In an embodiment, the protocol client presents a user interface for allowing a user to enter inputs for configuring a crawl function. For example, in an embodiment, a user indicates that for a given crawl, the crawl rule is "case sensitive," meaning that only URLs of matching links (matching uppercase and lowercase letters) are crawled. In an embodiment, the user interface for receiving configuration inputs from a user is obtained through a Web search service Web site that is identified by a URL known by the protocol client. Authentication and authorization for accessing the Web search service Web site is performed through the protocol client, and the use of any number of types of authentication and authorization protocols can be used in accordance with embodiments disclosed herein. In another embodiment, the user interface for receiving configuration inputs from a user is stored in memory on the client computer itself or removable storage means and is retrieved by clicking an icon, box, etc., to launch a Web crawl.
Upon receiving the user inputs, the protocol client formats the message, in accordance with the Search Service Administration Web Service protocol, for example, with the proper format and syntax for communicating to the Web crawler. The protocol client also uses other protocols for formatting the message for transmittal to the Web service. For example, an appropriate messaging protocol and an appropriate transport protocol are used for transmitting the message to the Web service. An example of a messaging protocol for formatting a request for transmittal to the Web search service is the Simple Object Access Protocol (SOAP) messaging protocol. An example of a transport protocol for transmitting the formatted request is the Hypertext Transfer Protocol Secure (HTTPS) protocol. These protocols are offered by way of example only. Any number of types of messaging and transport protocols can be used in embodiments disclosed herein.
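As a rough illustration of the formatting step described above, the following Python sketch wraps an operation name and its parameters in a SOAP 1.1 envelope that could then be sent over HTTPS. The service namespace, the parameter names `path` and `type`, and the helper name `build_request` are hypothetical and chosen only for illustration; the disclosure does not specify them here.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace; the real one is not given in the text.
SERVICE_NS = "http://example.invalid/search-admin"


def build_request(operation: str, params: dict[str, str]) -> bytes:
    """Wrap an operation and its parameters in a SOAP 1.1 envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        # Each parameter becomes a child element of the operation element.
        ET.SubElement(op, f"{{{SERVICE_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


# Parameter names below are assumptions, not taken from the protocol.
message = build_request(
    "AddCrawlRule",
    {"path": "http://site/private/*", "type": "ExclusionRule"},
)
```

The resulting bytes would form the body of an HTTPS POST to the protocol server; the transport step is omitted because the endpoint address is not part of the text.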
In accordance with embodiments disclosed herein, upon receipt of a message requesting configuration of the crawl function, the Web search service processes the message and determines whether the configuration request is allowable. If allowable, the Web search service makes the appropriate configuration to the crawl function and sends a response message, in accordance with the Search Service Administration Web Service protocol, and through the application programming interface, to the protocol client indicating the configuration change, for example. If the configuration request is not allowable (for example, by specifying a restriction rule that already exists or conflicts with another rule), the Web search service sends a response message to the protocol client indicating a fault message or other error or indication that the requested configuration was not made or is not available.
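The allow-or-fault decision described above might be sketched as follows. This is a minimal model under stated assumptions: the `CrawlerConfig` class, the `handle_add_crawl_rule` function, the rule type strings, and the dictionary-shaped responses are all hypothetical stand-ins, and the only conflict checked is a duplicate rule path.

```python
from dataclasses import dataclass, field


@dataclass
class CrawlerConfig:
    # Maps a rule path to "include" or "exclude" (hypothetical representation).
    crawl_rules: dict[str, str] = field(default_factory=dict)


def handle_add_crawl_rule(config: CrawlerConfig, path: str, rule_type: str) -> dict:
    """Apply the request if allowable; otherwise return a fault response."""
    if rule_type not in ("include", "exclude"):
        return {"status": "fault", "reason": "unknown rule type"}
    if path in config.crawl_rules:
        # A rule that already exists makes the request not allowable.
        return {"status": "fault", "reason": "rule already exists"}
    config.crawl_rules[path] = rule_type
    # The success response echoes the configuration change that was made.
    return {"status": "ok", "rule": {"path": path, "type": rule_type}}


config = CrawlerConfig()
first = handle_add_crawl_rule(config, "http://site/private/*", "exclude")
second = handle_add_crawl_rule(config, "http://site/private/*", "exclude")
```

Here `first` reports the change and `second` reports a fault, mirroring the two response paths in the paragraph above.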
An example logical environment or system 100 for enabling a protocol client to configure and control a crawl function of a crawler application hosted by a protocol server, or Web server, is shown in FIG. 1 in accordance with embodiments disclosed herein. A protocol server 102 hosting a search service application with a crawler application (a "Web search service") is connected to network 108 for enabling the crawler application to search the content, e.g., Web sites, held by Web servers 110, 112, and 116. In embodiments, any number of Web servers 110, 112, and 116 can be used, as shown by ellipses 114. Each Web site typically has multiple files and resources held by a Web server, such as Web servers 110, 112, 114, and 116. Web servers 110, 112, 114, and 116 distribute electronic documents associated with the particular Web sites. Storage capabilities for storing such electronic documents are shown by the databases adjacent to, or attached to, Web servers 110, 112, and 116. In embodiments, such storage means are housed within the Web servers. Protocol server 102 hosting the Web search service is also connected to client computer 104 through network 106. Client computer 104 is thus able to send configuration requests for controlling the crawling function of the Web search service through network 106 and is able to receive response messages regarding such configuration requests through network 106 as well. Any type of client computer 104 can be used in accordance with embodiments disclosed herein.
Logical environment 100 is not limited to any particular implementation and instead embodies any computing environment upon which the functionality of the environment described herein may be practiced. Further, networks 106 and 108, although shown as individual single networks, may be any types of networks conventionally understood by those of ordinary skill in the art. In accordance with an example embodiment, the network may be the global network (e.g., the Internet or World Wide Web, i.e., "Web" for short). It may also be a local area network, e.g., intranet, or a wide area network. In accordance with embodiments, communications over networks 106 and 108 occur according to one or more standard packet-based formats, e.g., H.323, IP, Ethernet, and/or ATM.
Further, any type of environment or system can be used in accordance with embodiments of the present disclosure. FIG. 1 is offered as an example only for purposes of understanding the teachings of the embodiments disclosed herein. For example, FIG. 1 shows servers 102, 110, 112, 114, and 116. However, embodiments also cover any type of server, separate servers, server farm, or other message server. Further yet, FIG. 1 shows client computer 104. However, any type of small computer device can be used without departing from the spirit and scope of the embodiments disclosed herein. Indeed, environment or system 100 represents a valid way of practicing embodiments disclosed herein but is in no way intended to limit the scope of the present disclosure. Further, the example network environment 100 may be considered in terms of the specific components described, e.g., protocol server, client computer, etc., or, alternatively, may be considered in terms of the analogous modules corresponding to such units.
Although only one client computer 104 is shown, for example, another embodiment provides for multiple small computer devices to communicate with Web server 102. In an embodiment, each small computer device communicates with the network 106, or, in other embodiments, multiple and separate networks communicate with the small computer devices. In yet another embodiment, each small computer device communicates with a separate network.
While FIG. 1 shows example environment or system 100 for configuring a crawl function on the Web search service, FIG. 2 illustrates example software functional modules 200 corresponding to such computing units for enabling such configuration by a protocol client in accordance with embodiments disclosed herein. Storage means 214 are also depicted in FIG. 2. These storage means and functional modules are offered by way of example only. Numerous types of modules, components, or storage means can be used in accordance with embodiments disclosed herein. At client computer 104, browser 202 retrieves protocol client 204. For example, in an embodiment, protocol client 204 is downloaded at client computer 104 through a Web browser, such as MICROSOFT INTERNET EXPLORER, produced by MICROSOFT CORPORATION of Redmond, Wash. Any type of browser can be used in accordance with embodiments disclosed herein. In other embodiments, the protocol client is already downloaded or otherwise available in non-removable or removable memory associated with client computer 104. Protocol client 204 receives input data from a user in accordance with embodiments. In other embodiments, protocol client 204 receives input data from another computing device(s) and/or computer program(s). Upon receiving input data indicating the configuration desired, protocol client 204 formats such data into a request message for Web search service 208. Configuration request message 216 is transmitted across network 106 to protocol server 206.
Protocol server 206, which may also be referred to as an "index server," "Web server," or "server" in general, hosts Web search service 208. As discussed above, Web search service 208 is a crawler application in embodiments disclosed herein. In further embodiments, the crawler application is part of a general search application. While configuration request message 216 is transmitted to Web search service 208 hosted by protocol server 206, a general search request (not shown) is transmitted to search engine 210 in protocol server 206 in other embodiments disclosed herein. For example, specific search criteria are provided in a search request to protocol server 206, and search engine 210 uses index 212 to determine if any electronic documents in index 212 satisfy the specified search criteria. Electronic documents are cataloged in index 212 during a crawl, in which the protocol server 206, or index server 206, produces data structures including an index catalog (not shown) and metadata index (not shown) with regard to retrieved electronic document(s). In an embodiment, where an index catalog already exists, index server 206 produces entries in the index catalog (not shown) to reflect information regarding the retrieved electronic document(s). The index catalog and metadata index are then used in later search requests to efficiently respond to such search queries. In other embodiments, where electronic documents are not found in index 212, Web search service 208 is invoked to "crawl" the Web, for example, in search of electronic documents satisfying the search criteria.
In embodiments where a configuration request message 216 is sent from protocol client 204, API 220 on protocol server 206 acts as the interface between protocol client 204 and protocol server 206. API 220 thus processes the configuration request message 216, determines that it is a type of message that is appropriate for the Web search service, e.g., a request to perform a crawl or configuration of a crawl function in accordance with the Search Service Administration Web Service protocol, for example, and, according to embodiments, puts it into a format understandable by the Web search service 208. Upon receipt of configuration request message 216, Web search service 208 determines whether the configuration request is allowable and configures the crawl function if the request is allowable. Web search service 208 produces a response message 218 to the configuration request and uses API 220 to facilitate communication of the response message 218, in accordance with the Search Service Administration Web Service protocol, to protocol client 204. Configuration response message 218 is then transmitted over network 106 to protocol client 204. In embodiments, such configuration response message 218 includes the configuration information made to the crawl function. In other embodiments, configuration response message 218 includes fault information, indicating, for example, that an authorization has failed or the configuration request could not be otherwise performed.
While FIG. 2 shows Web search service 208, FIG. 3 depicts object hierarchy 300 maintained by protocol server 206 and representing the state of the protocol, such as the Search Service Administration Web Service protocol for Web search service 208. As discussed above, the Search Service Administration Web Service protocol governs the format, structure, and syntax of messages to communicate to the Web crawler. Properties of the objects shown in object hierarchy 300 affect the behavior of protocol server 206 during crawl processes. The top level of object hierarchy 300 is crawler application 208. As noted, crawler application 208 may also be referred to as Web search service 208 (labeled as such in FIG. 2). In embodiments, one instance of crawler application 208 exists per search service application. An example embodiment disclosed herein provides for the following properties of crawler application 208 (as illustrated in FIG. 3), in accordance with the Search Service Administration Web Service protocol:
| Value | Meaning |
| 0x00000001 | Internal pause, not initiated by the protocol client. |
| 0x00000002 | Internal pause, not initiated by the protocol client. |
| 0x00000004 | Paused for back-up/restore. |
| 0x00000008 | Paused for query component initialization. |
| 0x00000010 | Internal pause, not initiated by the protocol client. |
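Because the pause values listed above are distinct powers of two, one plausible reading is that they can be combined into a single bitmask. The sketch below decodes such a mask under that assumption; the text does not state that the values are combinable, and the member names and the `describe` helper are hypothetical labels, not names from the protocol.

```python
from enum import IntFlag


class PauseReason(IntFlag):
    # Member names are illustrative; only the numeric values come from the table.
    INTERNAL_1 = 0x00000001
    INTERNAL_2 = 0x00000002
    BACKUP_RESTORE = 0x00000004
    QUERY_COMPONENT_INIT = 0x00000008
    INTERNAL_3 = 0x00000010


def describe(value: int) -> list[str]:
    """List the names of every pause reason bit set in value."""
    return [flag.name for flag in PauseReason if value & flag]


# A mask with the back-up/restore and query-initialization bits set.
reasons = describe(0x0000000C)
```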
In addition, FIG. 3 depicts "Content Source" 304. A content source is a set of options for specifying the type of content to be crawled and the start addresses, e.g., URL addresses, for the content to be indexed. A content source thus includes a plurality of start addresses from which to start a crawl, in accordance with embodiments disclosed herein. In an embodiment, crawler application 208 depicted in FIG. 3 includes content source objects 304. In another embodiment, crawler application 208 includes zero content source objects. The content source objects 304 represent content sources used to start a crawl. The following example properties apply to Content Source 304 in accordance with embodiments disclosed herein:
| Value | Meaning |
| 1 | Normal |
| 2 | High. When picking the next URL to crawl from the crawl queue, the protocol server, in an embodiment, gives preferential consideration to URLs discovered from crawling high priority content sources over URLs discovered from crawling normal priority content sources. |
| Value | Meaning |
| 0 | Enables specifying settings that control the depth of crawl for a Web site based on start address server, host hops and page depth |
| 1 | Enables specifying settings that control the depth of crawl for a Web site based on discovering everything under the hostname for each start address or only crawling the site collection of each start address |
| 2 | Lotus Notes database |
| 3 | File shares |
| 4 | Exchange public folders |
| 5 | Custom |
| 6 | Legacy<2> Business Data Catalog |
| 8 | Custom search connector |
| 9 | Business Data Connectivity (BDC) |
| Value | Meaning |
| CrawlVirtualServers | The entire Web applications pointed to by start addresses are crawled. |
| CrawlSites | Only the specific sites pointed by the start addresses are crawled without enumerating all sites in the Web application. |
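The priority values listed above (1 for normal, 2 for high) could drive URL selection along the lines of the following sketch. It interprets "preferential consideration" as strict priority ordering with first-in, first-out order inside each priority level, which is an assumption; the `Priority` enum and `CrawlQueue` class are hypothetical names.

```python
import heapq
from enum import IntEnum
from itertools import count


class Priority(IntEnum):
    NORMAL = 1
    HIGH = 2


class CrawlQueue:
    """Crawl queue that hands out URLs from high priority sources first."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._order = count()  # tie-breaker that preserves insertion order

    def push(self, url: str, priority: Priority) -> None:
        # Negate the priority so the highest priority is popped first.
        heapq.heappush(self._heap, (-int(priority), next(self._order), url))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]


queue = CrawlQueue()
queue.push("http://site/a.htm", Priority.NORMAL)
queue.push("http://site/b.htm", Priority.HIGH)
queue.push("http://site/c.htm", Priority.NORMAL)
picked = [queue.pop(), queue.pop(), queue.pop()]
```

A real server might instead weight the choice rather than always preferring high priority URLs; the text leaves that detail open.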
In further embodiments, crawler application 208 includes an ordered collection of zero or more crawl rule objects 302. Crawl rules define the URL space of the crawl. For example, crawl rules are used to restrict the URL space of the crawl in certain embodiments. When a link is discovered in a crawl, the crawl rule(s) is checked to determine if the item should be included or excluded from the crawl. In embodiments, crawl rules in crawl rule object 302 contain one or more wildcard expressions for determining matches against the URLs of discovered links. In such determinations, all characters in a discovered link are matched exactly against the crawl rule expression, with the exception of the wildcard characters. According to embodiments disclosed herein, the "*" and "?" wildcard characters are allowed in defining crawl rules with wildcards. Where wildcard characters and expressions are used, embodiments provide for determining the crawl behavior of a link according to the first rule found to match the link. In further embodiments, a crawl rule specifies authentication parameters for accessing items matching certain URLs. The following example properties apply to crawl rule object 302 according to embodiments of the present disclosure:
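Separately from the property list, the first-match wildcard evaluation described in this paragraph can be sketched in Python as shown below. The sketch assumes the conventional meanings of the wildcards ("*" matches any run of characters, "?" matches exactly one), assumes case-insensitive matching unless a rule is marked case sensitive, and assumes a link is included when no rule matches; none of these defaults is stated in the text. The function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str, case_sensitive: bool = False) -> re.Pattern:
    """Translate a crawl rule expression into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")  # any run of characters (assumed meaning)
        elif ch == "?":
            parts.append(".")   # exactly one character (assumed meaning)
        else:
            parts.append(re.escape(ch))  # every other character matches exactly
    flags = re.DOTALL if case_sensitive else re.DOTALL | re.IGNORECASE
    return re.compile("".join(parts), flags)


def decide(url: str, rules: list[tuple[str, bool]], default: bool = True) -> bool:
    """Return True to crawl the link, using the first rule that matches."""
    for pattern, include in rules:
        if wildcard_to_regex(pattern).fullmatch(url):
            return include
    return default  # assumed behavior when no rule matches


# Ordered rules: (expression, include?). Order matters because the first match wins.
rules = [
    ("http://site/private/*", False),
    ("http://site/*", True),
]
```

With these rules, `decide("http://site/private/a.htm", rules)` excludes the link even though the second rule also matches it, because the exclusion rule comes first.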
In embodiments disclosed herein, crawler application 208 also includes an anchor content source 308, which represents the status of an anchor crawl. An anchor crawl, according to an embodiment, is the process of adding the text that is included with a hyperlink to a full-text index catalog. The text included with a hyperlink describes the target content of the hyperlink in embodiments. This text is referred to as "anchor text," for example. Further, a "full-text index catalog" is defined in embodiments as a collection of full-text index components and other files organized in a specific directory structure and containing the data needed to perform queries. In turn, a full-text index component is defined in an embodiment as a set of files that contain all of the index keys extracted from a set of items, in which an index key is a key referencing a record in a content index file or a scope index file and consisting of an index key string and a property identifier. Properties of the anchor content source 308 include the following according to embodiments of the present disclosure:
Anchor content source object 308 thus allows crawler application 208 to track the start and end times of an anchor crawl, according to embodiments described herein.
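To make the notion of anchor text concrete, the following sketch collects the text enclosed by each hyperlink in an HTML document, paired with the link target. It only illustrates what an anchor crawl gathers; how the text is written into the full-text index catalog is not modeled, and the `AnchorTextCollector` class is a hypothetical name.

```python
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collect (target URL, anchor text) pairs from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.anchors: list[tuple[str, str]] = []
        self._href: str | None = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Links without an href leave _href as None and are ignored.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


collector = AnchorTextCollector()
collector.feed('<p>See <a href="http://site/doc.htm">the annual report</a>.</p>')
```

After the call to `feed`, `collector.anchors` holds the target URL together with the text that describes it.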
According to the embodiment depicted in FIG. 3, crawler application 208 also includes zero or more crawl mapping objects 306. In an embodiment, a crawl mapping is a mapping of an access URL and a display URL of an item. Protocol server 206 uses the access URL of a crawled item to obtain the item from a content source, including an item repository, for example. Further, protocol server 206 uses the display URL as a URL of the item to store in a metadata index. The display URL is the address of the item according to embodiments. Protocol server 206 returns the display URL of the item to a client, e.g., a user, in response to a search query requesting such item. During a crawl process, each item's access URL and display URL are checked against the crawl mapping objects, which contain a Source property and a Target property. A match occurs if any prefix of the URL covering complete path segments equals the Source property or Target property of the mapping. If more than one mapping matches the URL, the mapping that matches the longest prefix is used. As an example, http://site/pathseg1/pathseg2/file.htm matches http://site, or http://site/pathseg1, or http://site/pathseg1/pathseg2, but does not match http://site/pathse or http://saite/pathseg1/path. If the access URL matches the Source property of the mapping, the matching prefix is replaced by the Target property to construct the display URL, while preserving the suffix of the URL. In embodiments, the crawl mappings collection does not allow mappings with duplicate Source or Target properties:
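The prefix matching and display URL construction described in this paragraph can be sketched as follows, using the paragraph's own example URLs. Only the access-URL-to-display-URL direction is shown. The function names are hypothetical, comparison is case sensitive, and an access URL with no matching mapping is returned unchanged; the last two points are assumptions rather than statements from the text.

```python
def is_segment_prefix(url: str, prefix: str) -> bool:
    """True if prefix covers complete path segments at the start of url."""
    prefix = prefix.rstrip("/")
    return url == prefix or url.startswith(prefix + "/")


def to_display_url(access_url: str, mappings: list[tuple[str, str]]) -> str:
    """Replace the longest matching Source prefix with its Target."""
    matches = [(s, t) for s, t in mappings if is_segment_prefix(access_url, s)]
    if not matches:
        return access_url  # assumed behavior when no mapping applies
    source, target = max(matches, key=lambda m: len(m[0].rstrip("/")))
    suffix = access_url[len(source.rstrip("/")):]  # preserved part of the URL
    return target.rstrip("/") + suffix


url = "http://site/pathseg1/pathseg2/file.htm"
# (Source, Target) pairs; the Target values are made up for illustration.
mappings = [
    ("http://site", "http://public"),
    ("http://site/pathseg1", "http://public/docs"),
]
display = to_display_url(url, mappings)
```

Both mappings match the access URL, so the longer Source prefix, `http://site/pathseg1`, is the one that gets replaced.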
Returning to FIG. 2, crawler application 208 (Web search service 208) receives configuration request messages from, and transmits configuration response messages (including fault messages) to, protocol client 204. Examples of these configuration messages include the following, which are communicated in operations, such as Web Services Description Language (WSDL) operations for example, according to embodiments of the present disclosure and in accordance with the format and syntax of messages communicated to the Web crawler as governed by the Search Service Administration Web Service protocol:
| WSDL Operation | Description |
| AddAdvancedCrawlRule | This operation is used to create a new crawl rule for the crawler application. It allows two more parameters to be specified than the AddCrawlRule operation. |
| AddContentSource | This operation is used to create a new content source in the crawler application. |
| AddCrawlMapping | This operation creates a new crawl mapping for the crawler application. |
| AddCrawlRule | This operation is used to create a new crawl rule for the crawler application. |
| AddExtension | This operation is used to add a file extension to the file extensions collection contained in the crawler application. |
| CatalogPauseStatus | This operation is used to retrieve the pauseReason property of the crawler application. |
| ClearExtensionList | This operation is used to empty the list of file extensions recognized by the index server. |
| EditContentSource | This operation is used to edit the content source properties in the crawler application. |
| GetConnectorProperty | This operation is used to retrieve a previously stored value from the propertyBag collection of the crawler application. |
| GetContentSources | This operation is used to get information about all the content sources for the specified project of the crawler application. |
| GetContentState | This operation is used to retrieve the states and various properties of the crawler application. |
| GetCrawlMappings | This method is used to retrieve all crawl mappings existing in the crawler application. |
| GetCrawlRuleList | This operation is used to retrieve the crawler application's list of crawl rules. |
| GetExtensionList | This operation is used to retrieve the crawler application's list of file extensions. |
| GetVersion | This operation is used to retrieve the configuration version of the crawler application. |
| IncreaseRegistryVersion | This operation is used to increase the registry version of the crawler application by one. |
| IncrementVersion | This operation is used to increase the configuration version of the crawler application by one. |
| IsAnchorCrawlIdle | This operation is used to check if an anchor crawl in the crawler application is in progress. |
| IsCaseSensitiveURL | This operation is used to check if the crawler application treats the specified URL in a case sensitive manner. |
| IsCatalogPauseCompleted | This operation is used to check if the action of pausing all crawls on the crawler application for the specified reason has been completed. |
| IsDeleteCrawlInProgress | This operation is used to check if a delete crawl in the crawler application is in progress. |
| IsExtensionIncludeList | This operation is used to determine whether the file extensions list in the crawler application is an inclusion list or an exclusion list. |
| ListKnownLotusNotesDatabases | This operation is used to retrieve a list of known Lotus Notes database names for a given Lotus Notes server name. |
| PauseCrawl | This operation is used to pause a crawl of a content source of the crawler application. |
| RefreshAnchorContentSource | This method is used to retrieve the current status of the anchor content source of the crawler application. |
| RefreshContentSource | This operation is used to retrieve the current status of a content source from the crawler application. |
| RemoveContentSource | This operation is used to remove a content source from the crawler application. |
| RemoveCrawlMapping | This method is used to remove a crawl mapping from the crawler application. |
| RemoveCrawlRule | This operation removes a crawl rule from the crawler application. |
| RemoveExtension | This operation is used to remove a file extension from the extensions list defined for the crawler application. |
| ResumeCrawl | This operation is used to resume a crawl of a content source of the crawler application. |
| SetConnectorProperty | This operation is used to store a value in the propertyBag collection of the crawler application. |
| SetContentSourcesMetadata | This operation is used to set the metadata property associated with the crawler application. This metadata string, in embodiments, is intended for protocol client use only; the protocol server stores it without interpreting it. Once set, the metadata string can be obtained by calling the GetContentSources operation. |
| SetCrawlRuleCredentials | This operation is used to configure the authentication method and crawl account for a crawl rule. |
| SetCrawlRuleCredentials2 | This operation is used to configure the authentication method and crawl account for a crawl rule. |
| SetCrawlRulePriority | This operation is used to modify the order of the crawl rules in the ordered collection of the crawl rules in the crawler application. |
| SetDefaultGatheringAccount | This operation is used to set the default crawl account for the crawler application. |
| SetIsExtensionIncludeList | This operation is used to set whether the list of file extensions in the crawler application is an inclusion list or an exclusion list. |
| SetRetryLimit | This operation is used to set the retry limit for the crawler application. |
| StartCrawl | This operation is used to start a crawl of a content source of the crawler application. |
| StartRankingUpdate | This operation is used to start the anchor crawl of the anchor content source. |
| StopCrawl | This operation is used to stop a crawl of a content source of the crawler application. |
| TestCrawlRule | This operation is used to check if a specified URL matches the specified crawl rule. |
| TestCrawlRules | This operation is used to find the first crawl rule in the crawler application's crawl rules collection that matches a specified URL. |
| UpdateCrawlRule | This operation is used to update a crawl rule for the crawler application. |
| ValidateScheduleTrigger | This operation is used to validate that a trigger, as specified in [MS-TSCH], 2.4.2.11 Triggers, can be used to schedule a crawl. [MS-TSCH] ("Task Scheduler Service Remoting Protocol Specification," MICROSOFT CORPORATION of Redmond, Washington, © 2010 MICROSOFT CORPORATION) is incorporated by reference herein in its entirety. |
| WaitForInProgressAnchorCrawlToComplete | This operation is used to wait until no anchor crawl is in progress. |
While the above messages are included in WSDL operations in embodiments described herein, "WSDL" operations are offered by way of example only. Other languages can be used for such operations in accordance with other embodiments disclosed herein and without departing from the spirit and scope of the present disclosure.
The messages transmitted between protocol client 204 and protocol server 206 thus allow for configuring the crawling function of Web search service 208, in accordance with embodiments described herein. In embodiments, these messages comprise a structure, format and syntax consistent with the Search Service Administration Web Service protocol. For example, the AddAdvancedCrawlRule operation allows protocol client 204 to create a new crawl rule, including specifying parameters for the rule, for crawler application 208. With the AddAdvancedCrawlRule operation, protocol client 204 sends an ISearchApplicationAdminWebService_AddAdvancedCrawlRule_InputMessage request message 216. Protocol server 206 responds with an ISearchApplicationAdminWebService_AddAdvancedCrawlRule_OutputMessage response message 218 as follows:
| <wsdl:operation name="AddAdvancedCrawlRule"> |
|   <wsdl:input |
|     wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/AddAdvancedCrawlRule" |
|     message="tns:ISearchApplicationAdminWebService_AddAdvancedCrawlRule_InputMessage"/> |
|   <wsdl:output |
|     wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/AddAdvancedCrawlRuleResponse" |
|     message="tns:ISearchApplicationAdminWebService_AddAdvancedCrawlRule_OutputMessage"/> |
| </wsdl:operation> |
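The operation listings in this disclosure follow a regular naming pattern for action values and message names. The following sketch, offered by way of example only, derives those names from an operation name. The function `operation_names` and the constant names are hypothetical; the pattern itself is inferred from the listings shown herein rather than stated as a rule.

```python
NAMESPACE = "http://tempuri.org/"
INTERFACE = "ISearchApplicationAdminWebService"


def operation_names(operation: str) -> dict[str, str]:
    """Derive the action values and WSDL message names for one operation."""
    base = f"{NAMESPACE}{INTERFACE}/{operation}"
    return {
        "input_action": base,
        # The output action appends "Response" to the input action.
        "output_action": base + "Response",
        "input_message": f"{INTERFACE}_{operation}_InputMessage",
        "output_message": f"{INTERFACE}_{operation}_OutputMessage",
    }


for key, value in operation_names("AddAdvancedCrawlRule").items():
    print(f"{key}: {value}")
```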
Before responding, protocol server 206 determines whether the request message is allowable. For example, in embodiments, the request is not allowable if the path specified is not a valid regular expression or, in other embodiments, if the length of the path exceeds a maximum number of characters. The following response message applies in embodiments:
Examples of the request, or input, message for configuring the crawl function with the AddAdvancedCrawlRule operation include:
ISearchApplicationAdminWebService_AddAdvancedCrawlRule_InputMessage
The requested WSDL message for the AddAdvancedCrawlRule WSDL operation.
The SOAP action value is:
The SOAP body contains the AddAdvancedCrawlRule element.
Input data for such a request message includes, in embodiments, the following:
The input data for the AddAdvancedCrawlRule WSDL operation.
| <xs:element name="AddAdvancedCrawlRule"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="versionIn" type="xs:int"/> |
|       <xs:element minOccurs="0" name="currentUser" nillable="true" type="xs:string"/> |
|       <xs:element minOccurs="0" name="isIncludeRule" type="xs:boolean"/> |
|       <xs:element minOccurs="0" name="isAdvancedRegularExpression" type="xs:boolean"/> |
|       <xs:element minOccurs="0" name="caseSensitiveURL" type="xs:boolean"/> |
|       <xs:element minOccurs="0" name="path" nillable="true" type="xs:string"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
| Value | Meaning |
| 0 | Inclusion rule. URLs matching the path are included in the crawl. |
| 1 | Exclusion rule. URLs matching the path are not included in the crawl. |
In turn, examples of the response, or output, message for the AddAdvancedCrawlRule operation include:
ISearchApplicationAdminWebService_AddAdvancedCrawlRule_OutputMessage
The response WSDL message for the AddAdvancedCrawlRule method.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/AddAdvancedCrawlRuleResponse
The SOAP body contains the AddAdvancedCrawlRuleResponse element.
The result data for the response message for the AddAdvancedCrawlRule operation includes, for example:
| <xs:element name="AddAdvancedCrawlRuleResponse"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="AddAdvancedCrawlRuleResult" nillable="true" type="xs:string"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
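By way of illustration, a protocol client could assemble the body of an AddAdvancedCrawlRule request along the following lines. This is a minimal sketch under stated assumptions, not the client of the present disclosure: a SOAP 1.1 envelope is assumed, the element namespace is assumed to be http://tempuri.org/ to mirror the action value, the fifth element is written as caseSensitiveURL on the assumption that this is the intended name, and the function name `build_add_advanced_crawl_rule` is hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # assumption: SOAP 1.1
SVC_NS = "http://tempuri.org/"  # assumption: mirrors the action namespace


def build_add_advanced_crawl_rule(
    version_in: int,
    current_user: str,
    is_include_rule: bool,
    is_advanced_regular_expression: bool,
    case_sensitive_url: bool,
    path: str,
) -> bytes:
    """Serialize an AddAdvancedCrawlRule request with elements in schema order."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    operation = ET.SubElement(body, f"{{{SVC_NS}}}AddAdvancedCrawlRule")
    fields = [
        ("versionIn", str(version_in)),
        ("currentUser", current_user),
        # xs:boolean values are serialized as lowercase "true" / "false".
        ("isIncludeRule", str(is_include_rule).lower()),
        ("isAdvancedRegularExpression", str(is_advanced_regular_expression).lower()),
        ("caseSensitiveURL", str(case_sensitive_url).lower()),
        ("path", path),
    ]
    for name, value in fields:
        ET.SubElement(operation, f"{{{SVC_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8")


request = build_add_advanced_crawl_rule(
    3, "DOMAIN\\admin", True, False, False, "http://site/private/*"
)
print(request.decode("utf-8"))
```

The elements are emitted in the order given by the xs:sequence of the input data, since a sequence constrains element order.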
In another embodiment, the configuration request involves retrieving a previously stored value from a "propertyBag" collection of crawler application 208 through the use of the "GetConnectorProperty" message operation. As shown in FIG. 3, the "propertyBag" is a property of crawler application 208 and includes a collection of name/value pairs for storing arbitrary values. The "propertyBag" property thus allows a value to be retrieved using the name with which it was stored. An example configuration request message 216 for this operation includes: ISearchApplicationAdminWebService_GetConnectorProperty_InputMessage. An example configuration response message 218 includes: ISearchApplicationAdminWebService_GetConnectorProperty_OutputMessage. An example input/output operation is as follows:
| <wsdl:operation name="GetConnectorProperty"> |
|   <wsdl:input |
|     wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/GetConnectorProperty" |
|     message="tns:ISearchApplicationAdminWebService_GetConnectorProperty_InputMessage"/> |
|   <wsdl:output |
|     wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/GetConnectorPropertyResponse" |
|     message="tns:ISearchApplicationAdminWebService_GetConnectorProperty_OutputMessage"/> |
| </wsdl:operation> |
The following response messages, e.g., fault messages, are provided by protocol server 206 in accordance with embodiments disclosed herein:
In embodiments, input data for the GetConnectorProperty operation include, for example:
| <xs:element name="GetConnectorProperty"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="name" nillable="true" type="xs:string"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
Output data for the configuration response message includes, in embodiments, the following for the GetConnectorProperty message operation:
The result data for the GetConnectorProperty WSDL operation.
| <xs:element name="GetConnectorPropertyResponse"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="GetConnectorPropertyResult" nillable="true" type="xs:string"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
Further, the action values for the GetConnectorProperty operation, using the SOAP protocol for example, include in embodiments:
ISearchApplicationAdminWebService_GetConnectorProperty_InputMessage
The requested WSDL message for the GetConnectorProperty WSDL operation.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/GetConnectorProperty
The SOAP body contains the GetConnectorProperty element.
ISearchApplicationAdminWebService_GetConnectorProperty_OutputMessage
The response WSDL message for the GetConnectorProperty method.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/GetConnectorPropertyResponse
The SOAP body contains the GetConnectorPropertyResponse element.
In other embodiments, configuration request message 216 includes an IncreaseRegistryVersion message for increasing the registry version of crawler application 208 by a value, such as by "one," for example. Request message 216 includes, for example: ISearchApplicationAdminWebService_IncreaseRegistryVersion_InputMessage. Response message 218 includes, for example: ISearchApplicationAdminWebService_IncreaseRegistryVersion_OutputMessage. An example input/output operation is as follows:
| <wsdl:operation name="IncreaseRegistryVersion"> |
|   <wsdl:input |
|     wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/IncreaseRegistryVersion" |
|     message="tns:ISearchApplicationAdminWebService_IncreaseRegistryVersion_InputMessage"/> |
|   <wsdl:output |
|     wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/IncreaseRegistryVersionResponse" |
|     message="tns:ISearchApplicationAdminWebService_IncreaseRegistryVersion_OutputMessage"/> |
| </wsdl:operation> |
With this type of configuration request, protocol server 206, in embodiments, increases the registryVersion of crawler application 208 by one, for example. If an error exists in incrementing the registryVersion, protocol server 206 sends a FaultException<ExceptionDetail> message to protocol client 204. The following input and output message and input/output data are examples of operations for this configuration type:
ISearchApplicationAdminWebService_IncreaseRegistryVersion_InputMessage
The requested WSDL message for the IncreaseRegistryVersion WSDL operation.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/IncreaseRegistryVersion
The SOAP body contains the IncreaseRegistryVersion element.
The response WSDL message for the IncreaseRegistryVersion method.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/IncreaseRegistryVersionResponse
The SOAP body contains the IncreaseRegistryVersionResponse element.
The input data for the IncreaseRegistryVersion WSDL operation.
| <xs:element name="IncreaseRegistryVersion"> |
|   <xs:complexType> |
|     <xs:sequence/> |
|   </xs:complexType> |
| </xs:element> |
The result data for the IncreaseRegistryVersion WSDL operation.
| <xs:element name="IncreaseRegistryVersionResponse"> |
|   <xs:complexType> |
|     <xs:sequence/> |
|   </xs:complexType> |
| </xs:element> |
Embodiments also provide for the "IncrementVersion" message operation, in which the configuration version of crawler application 208 is increased by a value, such as by "one," for example. Request message 216 for such an operation includes, for example: ISearchApplicationAdminWebService_IncrementVersion_InputMessage. Protocol server 206 responds with response message 218: ISearchApplicationAdminWebService_IncrementVersion_OutputMessage. An example input/output operation is as follows:
| <wsdl:operation name="IncrementVersion"> |
|   <wsdl:input |
|     wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/IncrementVersion" |
|     message="tns:ISearchApplicationAdminWebService_IncrementVersion_InputMessage"/> |
|   <wsdl:output |
|     wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/IncrementVersionResponse" |
|     message="tns:ISearchApplicationAdminWebService_IncrementVersion_OutputMessage"/> |
| </wsdl:operation> |
Fault messages are sent by protocol server 206 in the following situations, for example:
The following input and output messages and input/output data are examples of operations for this IncrementVersion configuration of crawler application 208, in accordance with embodiments herein:
ISearchApplicationAdminWebService_IncrementVersion_InputMessage
The requested WSDL message for the IncrementVersion WSDL operation.
The SOAP action value is:
The SOAP body contains the IncrementVersion element.
ISearchApplicationAdminWebService_IncrementVersion_OutputMessage
The response WSDL message for the IncrementVersion method.
The SOAP action value is:
The SOAP body contains the IncrementVersionResponse element.
The input data for the IncrementVersion WSDL operation.
| <xs:element name="IncrementVersion"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="versionIn" type="xs:int"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
The result data for the IncrementVersion WSDL operation.
| <xs:element name="IncrementVersionResponse"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="IncrementVersionResult" type="xs:int"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
In other embodiments, a configuration request is made to store a value in the "propertyBag" collection of crawler application 208. For example, the "SetConnectorProperty" message operation allows for such storage with a configuration request message 216 of: ISearchApplicationAdminWebService_SetConnectorProperty_InputMessage. In turn, a configuration response message 218, for example, is: ISearchApplicationAdminWebService_SetConnectorProperty_OutputMessage. An example input/output operation is as follows:
| <wsdl:operation name="SetConnectorProperty"> |
|   <wsdl:input |
|     wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/SetConnectorProperty" |
|     message="tns:ISearchApplicationAdminWebService_SetConnectorProperty_InputMessage"/> |
|   <wsdl:output |
|     wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/SetConnectorPropertyResponse" |
|     message="tns:ISearchApplicationAdminWebService_SetConnectorProperty_OutputMessage"/> |
| </wsdl:operation> |
Protocol server 206 responds with fault messages in embodiments described as follows, for example:
The following input and output message and input/output data are examples of operations for this type of configuring of crawler application 208 by setting connector properties, as described in embodiments herein:
ISearchApplicationAdminWebService_SetConnectorProperty_InputMessage
The requested WSDL message for the SetConnectorProperty WSDL operation.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/SetConnectorProperty
The SOAP body contains the SetConnectorProperty element.
ISearchApplicationAdminWebService_SetConnectorProperty_OutputMessage
The response WSDL message for the SetConnectorProperty method.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/SetConnectorPropertyResponse
The SOAP body contains the SetConnectorPropertyResponse element.
The input data for the SetConnectorProperty WSDL operation.
| <xs:element name="SetConnectorProperty"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="name" nillable="true" type="xs:string"/> |
|       <xs:element minOccurs="0" name="value" nillable="true" type="xs:string"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
name: The name for which to store the corresponding value. In embodiments, the name is less than or equal to 16369 characters.
value: The value to store. In embodiments, the value is less than or equal to 4000 characters.
The result data for the SetConnectorProperty WSDL operation.
| <xs:element name="SetConnectorPropertyResponse"> |
|   <xs:complexType> |
|     <xs:sequence/> |
|   </xs:complexType> |
| </xs:element> |
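The behavior of the propertyBag collection under the SetConnectorProperty and GetConnectorProperty operations can be pictured with a small in-memory sketch. It is offered by way of example only: the class `PropertyBag` and its method names are hypothetical, and raising `ValueError` merely stands in for whatever fault message the protocol server would return, which is not specified here. The length limits are the ones stated above for name and value.

```python
from typing import Optional

MAX_NAME_LENGTH = 16369   # limit stated for the name element
MAX_VALUE_LENGTH = 4000   # limit stated for the value element


class PropertyBag:
    """In-memory stand-in for the propertyBag collection of name/value pairs."""

    def __init__(self) -> None:
        self._values: dict[str, str] = {}

    def set_connector_property(self, name: str, value: str) -> None:
        # ValueError is a placeholder for a fault message (assumption).
        if len(name) > MAX_NAME_LENGTH:
            raise ValueError("name exceeds the maximum length")
        if len(value) > MAX_VALUE_LENGTH:
            raise ValueError("value exceeds the maximum length")
        self._values[name] = value

    def get_connector_property(self, name: str) -> Optional[str]:
        # Returns None for an unknown name, echoing the nillable result element.
        return self._values.get(name)


bag = PropertyBag()
bag.set_connector_property("notes.server", "server01")
print(bag.get_connector_property("notes.server"))
```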
Further embodiments provide for configuring the crawler function of crawler application 208 by defining credentials for authentication purposes for the crawl rule. For example, the "SetCrawlRuleCredentials2" message operation is used in accordance with embodiments disclosed herein to configure the authentication method and crawl account for a crawl rule. A crawl account is a user account having access to the content traversed by a crawl component, according to embodiments. For such an authentication configuration, the following configuration request message 216 is used in embodiments: ISearchApplicationAdminWebService_SetCrawlRuleCredentials2_InputMessage. A configuration response message 218 includes the following in embodiments: ISearchApplicationAdminWebService_SetCrawlRuleCredentials2_OutputMessage. An example input/output operation for this configuration request and response is as follows:
| <wsdl:operation name="SetCrawlRuleCredentials2"> |
|   <wsdl:input |
|     wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/SetCrawlRuleCredentials2" |
|     message="tns:ISearchApplicationAdminWebService_SetCrawlRuleCredentials2_InputMessage"/> |
|   <wsdl:output |
|     wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/SetCrawlRuleCredentials2Response" |
|     message="tns:ISearchApplicationAdminWebService_SetCrawlRuleCredentials2_OutputMessage"/> |
| </wsdl:operation> |
Further, the following messages are sent and/or actions taken based on the request message received, in accordance with embodiments disclosed herein:
The following input and output message and input/output data are examples of operations for this configuring of crawler application 208 by setting credentials to configure the authentication method and crawl account for a crawl rule, as described in embodiments herein:
The requested WSDL message for the SetCrawlRuleCredentials2 WSDL operation.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/SetCrawlRuleCredentials2
The SOAP body contains the SetCrawlRuleCredentials2 element.
The response WSDL message for the SetCrawlRuleCredentials2 method.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/SetCrawlRuleCredentials2Response
The SOAP body contains the SetCrawlRuleCredentials2Response element.
The input data for the SetCrawlRuleCredentials2 WSDL operation.
| <xs:element name="SetCrawlRuleCredentials2"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="versionIn" type="xs:int"/> |
|       <xs:element minOccurs="0" name="currentUser" nillable="true" type="xs:string"/> |
|       <xs:element minOccurs="0" name="path" nillable="true" type="xs:string"/> |
|       <xs:element minOccurs="0" name="authType" type="xs:int"/> |
|       <xs:element minOccurs="0" name="authString1" nillable="true" type="xs:string"/> |
|       <xs:element minOccurs="0" name="authString2" nillable="true" type="xs:string"/> |
|       <xs:element minOccurs="0" name="authString3" nillable="true" type="xs:string"/> |
|       <xs:element minOccurs="0" name="authString4" nillable="true" type="xs:string"/> |
|       <xs:element minOccurs="0" name="lastModified" type="xs:dateTime"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
| Value | Meaning |
| 0 | Default access |
| 1 | Integrated Windows authentication |
| 2 | Basic authentication |
| 3 | Authentication using certificates |
| 4 | Forms authentication |
| 5 | Cookie based authentication |
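The value table above, which presumably enumerates the authType element of the SetCrawlRuleCredentials2 input data, maps naturally onto an enumeration on the client side. The following is a minimal sketch by way of example only; the names `AuthType` and `parse_auth_type`, and the member names, are hypothetical labels for the listed meanings.

```python
from enum import IntEnum


class AuthType(IntEnum):
    """Authentication methods keyed by the integer values in the table above."""

    DEFAULT_ACCESS = 0
    INTEGRATED_WINDOWS = 1
    BASIC = 2
    CERTIFICATE = 3
    FORMS = 4
    COOKIE = 5


def parse_auth_type(value: int) -> AuthType:
    """Convert a raw integer to an AuthType, rejecting values not in the table."""
    try:
        return AuthType(value)
    except ValueError:
        raise ValueError(f"unknown authType value: {value}") from None


print(parse_auth_type(2).name)
```

Validating the integer before it is placed in a request message keeps out-of-range values from reaching the protocol server.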
The result data for the SetCrawlRuleCredentials2 WSDL operation.
| <xs:element name="SetCrawlRuleCredentials2Response"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="SetCrawlRuleCredentials2Result" type="xs:int"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
A further message operation in embodiments includes the "IsCaseSensitiveURL" operation for checking if the crawler application treats the URL specified for the crawl in a case-sensitive manner. Configuration request message 216 includes the following example input message: ISearchApplicationAdminWebService_IsCaseSensitiveURL_InputMessage. The following configuration response message 218 applies in an embodiment: ISearchApplicationAdminWebService_IsCaseSensitiveURL_OutputMessage. Protocol server 206 returns the following response messages, for example, depending on the success of the particular configuration request for case-sensitivity:
The following input and output message and input/output data are examples of operations for this case-sensitivity configuring of crawler application 208, as described in embodiments herein:
| <wsdl:operation name="IsCaseSensitiveURL"> |
|   <wsdl:input |
|     wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/IsCaseSensitiveURL" |
|     message="tns:ISearchApplicationAdminWebService_IsCaseSensitiveURL_InputMessage"/> |
|   <wsdl:output |
|     wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/IsCaseSensitiveURLResponse" |
|     message="tns:ISearchApplicationAdminWebService_IsCaseSensitiveURL_OutputMessage"/> |
| </wsdl:operation> |
ISearchApplicationAdminWebService_IsCaseSensitiveURL_InputMessage
The requested WSDL message for the IsCaseSensitiveURL WSDL operation.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/IsCaseSensitiveURL
The SOAP body contains the IsCaseSensitiveURL element.
ISearchApplicationAdminWebService_IsCaseSensitiveURL_OutputMessage
The response WSDL message for the IsCaseSensitiveURL method.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/IsCaseSensitiveURLResponse
The SOAP body contains the IsCaseSensitiveURLResponse element.
The input data for the IsCaseSensitiveURL WSDL operation.
| <xs:element name="IsCaseSensitiveURL"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="strURL" nillable="true" type="xs:string"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
strURL: A single URL or UNC path. In embodiments, this is present.
The result data for the IsCaseSensitiveURL WSDL operation.
| <xs:element name="IsCaseSensitiveURLResponse"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="IsCaseSensitiveURLResult" type="xs:boolean"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
| Value | Meaning |
| false | Case insensitive manner. |
| true | Case sensitive manner. |
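A protocol client receiving the IsCaseSensitiveURL response could extract the boolean result as sketched below. The sketch is offered by way of example only: the function name `parse_case_sensitivity_result` is hypothetical, the sample envelope assumes SOAP 1.1 and an element namespace of http://tempuri.org/, and the element is located by local name so that the sketch does not depend on those namespace assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_case_sensitivity_result(xml_text: str) -> Optional[bool]:
    """Return the IsCaseSensitiveURLResult value, or None if the element is absent."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Strip any "{namespace}" prefix to compare local names only.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "IsCaseSensitiveURLResult":
            text = (element.text or "").strip()
            # xs:boolean permits "true"/"1" and "false"/"0".
            if text in ("true", "1"):
                return True
            if text in ("false", "0"):
                return False
            raise ValueError(f"not an xs:boolean value: {text!r}")
    return None  # minOccurs="0": the result element may be omitted


sample = (
    '<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"><s:Body>'
    '<IsCaseSensitiveURLResponse xmlns="http://tempuri.org/">'
    "<IsCaseSensitiveURLResult>true</IsCaseSensitiveURLResult>"
    "</IsCaseSensitiveURLResponse></s:Body></s:Envelope>"
)
print(parse_case_sensitivity_result(sample))
```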
In further embodiments, configuration request message 216, "WaitForInProgressAnchorCrawlToComplete," is used to configure the crawl function of crawler application 208 to wait until no anchor crawl is in progress before proceeding with another crawl. An example input/output operation for "WaitForInProgressAnchorCrawlToComplete" is as follows:
| <wsdl:operation name="WaitForInProgressAnchorCrawlToComplete"> |
|   <wsdl:input |
|     wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/WaitForInProgressAnchorCrawlToComplete" |
|     message="tns:ISearchApplicationAdminWebService_WaitForInProgressAnchorCrawlToComplete_InputMessage"/> |
|   <wsdl:output |
|     wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/WaitForInProgressAnchorCrawlToCompleteResponse" |
|     message="tns:ISearchApplicationAdminWebService_WaitForInProgressAnchorCrawlToComplete_OutputMessage"/> |
| </wsdl:operation> |
The requested WSDL message for the WaitForInProgressAnchorCrawlToComplete WSDL operation.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/WaitForInProgressAnchorCrawlToComplete
The SOAP body contains the WaitForInProgressAnchorCrawlToComplete element.
The response WSDL message for the WaitForInProgressAnchorCrawlToComplete method.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/WaitForInProgressAnchorCrawlToCompleteResponse
The SOAP body contains the WaitForInProgressAnchorCrawlToCompleteResponse element.
The input data for the WaitForInProgressAnchorCrawlToComplete WSDL operation.
| <xs:element name="WaitForInProgressAnchorCrawlToComplete"> |
|   <xs:complexType> |
|     <xs:sequence/> |
|   </xs:complexType> |
| </xs:element> |
| <xs:element name="WaitForInProgressAnchorCrawlToCompleteResponse"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="WaitForInProgressAnchorCrawlToCompleteResult" type="xs:boolean"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
Further, the response message includes the following status information, for example:
| Value | Meaning |
| false | The operation was not successful. The protocol server encountered an error while waiting for an anchor crawl to complete. |
| true | The operation was successful. An anchor crawl was not in progress or has been completed after the protocol server received the request message. |
Embodiments also provide for determining whether an anchor crawl in crawler application 208 is in progress. Input request message 216 for such a determination includes, for example: ISearchApplicationAdminWebService_IsAnchorCrawlIdle_InputMessage. An example response message includes: ISearchApplicationAdminWebService_IsAnchorCrawlIdle_OutputMessage. An example input/output operation is as follows:
| <wsdl:operation name="IsAnchorCrawlIdle"> |
|   <wsdl:input |
|     wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/IsAnchorCrawlIdle" |
|     message="tns:ISearchApplicationAdminWebService_IsAnchorCrawlIdle_InputMessage"/> |
|   <wsdl:output |
|     wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/IsAnchorCrawlIdleResponse" |
|     message="tns:ISearchApplicationAdminWebService_IsAnchorCrawlIdle_OutputMessage"/> |
| </wsdl:operation> |
The requested WSDL message for the IsAnchorCrawlIdle WSDL operation.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/IsAnchorCrawlIdle
The SOAP body contains the IsAnchorCrawlIdle element.
The response WSDL message for the IsAnchorCrawlIdle method.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/IsAnchorCrawlIdleResponse
The SOAP body contains the IsAnchorCrawlIdleResponse element.
The input data for the IsAnchorCrawlIdle WSDL operation.
| <xs:element name="IsAnchorCrawlIdle"> |
|   <xs:complexType> |
|     <xs:sequence/> |
|   </xs:complexType> |
| </xs:element> |
| <xs:element name="IsAnchorCrawlIdleResponse"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="IsAnchorCrawlIdleResult" type="xs:boolean"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
| Value | Meaning | |
| false | An anchor crawl is in progress. | |
| true | An anchor crawl is not in progress. | |
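As a hedged illustration of the IsAnchorCrawlIdle exchange described above, the following Python sketch builds a request envelope and reads the boolean result from a response. It assumes SOAP 1.1 envelopes and that the message elements belong to the http://tempuri.org/ namespace (inferred from the action URIs, not stated in this passage). The helper names build_request and parse_boolean_result and the sample response are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # ASSUMPTION: SOAP 1.1
SERVICE_NS = "http://tempuri.org/"  # ASSUMPTION: inferred from the action URIs
ACTION_BASE = SERVICE_NS + "ISearchApplicationAdminWebService/"


def build_request(operation: str) -> Tuple[str, bytes]:
    """Return (SOAP action, envelope bytes) for an operation whose input is an empty sequence."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    return ACTION_BASE + operation, ET.tostring(envelope, encoding="utf-8")


def parse_boolean_result(response_xml: bytes, operation: str) -> Optional[bool]:
    """Read <operation>Result; return None when the optional element (minOccurs=0) is absent."""
    root = ET.fromstring(response_xml)
    result = root.find(f".//{{{SERVICE_NS}}}{operation}Result")
    if result is None or result.text is None:
        return None
    # xs:boolean allows the lexical forms true/false and 1/0.
    return result.text.strip() in ("true", "1")


# Hypothetical response used only to exercise the parser.
sample_response = (
    '<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"><s:Body>'
    '<IsAnchorCrawlIdleResponse xmlns="http://tempuri.org/">'
    "<IsAnchorCrawlIdleResult>true</IsAnchorCrawlIdleResult>"
    "</IsAnchorCrawlIdleResponse></s:Body></s:Envelope>"
).encode("utf-8")

action, payload = build_request("IsAnchorCrawlIdle")
idle = parse_boolean_result(sample_response, "IsAnchorCrawlIdle")
```

Under these assumptions, a result of true means no anchor crawl is in progress, matching the value table above. The same two helpers would apply to WaitForInProgressAnchorCrawlToComplete, which also takes an empty sequence and returns a boolean.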
Further, embodiments also provide for “SetProperty” and “GetProperty” configuration request operations, in which a particular property of a crawl is set and retrieved. For example, such input and output messages are used in embodiments for such configuration operations:
| <xs:element name="SetProperty"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="propertyName" nillable="true" type="xs:string" /> |
|       <xs:element minOccurs="0" name="value" nillable="true" type="xs:anyType" /> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
| <xs:element name="SetPropertyResponse"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="SetPropertyResult" type="xs:boolean" /> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
| <xs:element name="GetProperty"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="propertyName" nillable="true" type="xs:string" /> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
| <xs:element name="GetPropertyResponse"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="GetPropertyResult" nillable="true" type="xs:anyType" /> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
In further embodiments, configuration operations 216 and 218 are used to set and retrieve one or more properties of the content source. For example, “SetContentSourceProperty” and “GetContentSourceProperty” messages are used to set and retrieve, respectively, properties of the content source for a particular crawl. In embodiments, request and response messages for such configuration operations include:
| </xs:element> |
| <xs:element name="SetContentSourceProperty"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="versionIn" type="xs:int" /> |
|       <xs:element minOccurs="0" name="contentSource" type="xs:int" /> |
|       <xs:element minOccurs="0" name="propertyName" nillable="true" type="xs:string" /> |
|       <xs:element minOccurs="0" name="value" nillable="true" type="xs:anyType" /> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
| <xs:element name="SetContentSourcePropertyResponse"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element xmlns:q22="http://schemas.datacontract.org/2004/07/Microsoft.Office.Server.Search.Internal.Administration" minOccurs="0" name="SetContentSourcePropertyResult" nillable="true" type="q22:ContentSourceDynamicPropsInternal" /> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
| <xs:element name="GetContentSourceProperty"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="versionIn" type="xs:int" /> |
|       <xs:element minOccurs="0" name="contentSource" type="xs:int" /> |
|       <xs:element minOccurs="0" name="propertyName" nillable="true" type="xs:string" /> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
| <xs:element name="GetContentSourcePropertyResponse"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="GetContentSourcePropertyResult" nillable="true" type="xs:anyType" /> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
Further embodiments include example configuration operations 216 and 218 as follows:
| <wsdl:operation name="EditContentSource"> |
|   <wsdl:input wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/EditContentSource" message="tns:ISearchApplicationAdminWebService_EditContentSource_InputMessage"/> |
|   <wsdl:output wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/EditContentSourceResponse" message="tns:ISearchApplicationAdminWebService_EditContentSource_OutputMessage"/> |
| </wsdl:operation> |
ISearchApplicationAdminWebService_EditContentSource_InputMessage
The requested WSDL message for the EditContentSource WSDL operation.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/EditContentSource
The SOAP body contains the EditContentSource element.
ISearchApplicationAdminWebService_EditContentSource_OutputMessage
The response WSDL message for the EditContentSource method.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/EditContentSourceResponse
The SOAP body contains the EditContentSourceResponse element.
The input data for the EditContentSource WSDL operation.
| <xs:element name="EditContentSource"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="versionIn" type="xs:int"/> |
|       <xs:element minOccurs="0" name="currentUser" nillable="true" type="xs:string"/> |
|       <xs:element minOccurs="0" name="id" type="xs:int"/> |
|       <xs:element minOccurs="0" name="name" nillable="true" type="xs:string"/> |
|       <xs:element minOccurs="0" name="metadata" nillable="true" type="xs:string"/> |
|       <xs:element minOccurs="0" name="hostDepth" type="xs:int"/> |
|       <xs:element minOccurs="0" name="enumerationDepth" type="xs:int"/> |
|       <xs:element minOccurs="0" name="followDirectories" type="xs:boolean"/> |
|       <xs:element minOccurs="0" name="startAddresses" nillable="true" xmlns:q16="http://schemas.microsoft.com/2003/10/Serialization/Arrays" type="q16:ArrayOfstring"/> |
|       <xs:element minOccurs="0" name="fullCrawlTrigger" nillable="true" type="xs:base64Binary"/> |
|       <xs:element minOccurs="0" name="incCrawlTrigger" nillable="true" type="xs:base64Binary"/> |
|       <xs:element minOccurs="0" name="crawlPriority" type="xs:int"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
| Value | Meaning |
| 1 | Normal |
| 2 | High |
| <xs:element name="EditContentSourceResponse"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="EditContentSourceResult" type="xs:int"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
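To make the shape of an EditContentSource request concrete, the following Python sketch serializes a subset of the scalar input elements in the order given by the schema sequence. It is a minimal sketch under stated assumptions: the http://tempuri.org/ element namespace is inferred from the action URIs, the Value/Meaning table (1 = Normal, 2 = High) is assumed to describe the crawlPriority element, and the names CrawlPriority, FIELD_ORDER, and build_edit_content_source are hypothetical.

```python
import xml.etree.ElementTree as ET
from enum import IntEnum

SERVICE_NS = "http://tempuri.org/"  # ASSUMPTION: inferred from the action URIs


class CrawlPriority(IntEnum):
    """ASSUMPTION: the Value/Meaning table applies to crawlPriority."""
    NORMAL = 1
    HIGH = 2


# Scalar elements in schema sequence order. startAddresses, fullCrawlTrigger
# and incCrawlTrigger are omitted here; that is allowed because every
# element in the sequence has minOccurs="0".
FIELD_ORDER = [
    "versionIn", "currentUser", "id", "name", "metadata",
    "hostDepth", "enumerationDepth", "followDirectories", "crawlPriority",
]


def build_edit_content_source(**fields) -> ET.Element:
    """Build the EditContentSource body element from keyword arguments."""
    unknown = set(fields) - set(FIELD_ORDER)
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    if "crawlPriority" in fields:
        # Raises ValueError for anything other than 1 or 2.
        fields["crawlPriority"] = int(CrawlPriority(fields["crawlPriority"]))
    root = ET.Element(f"{{{SERVICE_NS}}}EditContentSource")
    for name in FIELD_ORDER:
        if name not in fields:
            continue  # optional element left out
        value = fields[name]
        child = ET.SubElement(root, f"{{{SERVICE_NS}}}{name}")
        # bool must be checked before the generic str() path: xs:boolean
        # uses lowercase "true"/"false".
        child.text = ("true" if value else "false") if isinstance(value, bool) else str(value)
    return root


element = build_edit_content_source(
    versionIn=3, id=7, followDirectories=True, crawlPriority=CrawlPriority.HIGH
)
```

Emitting children in FIELD_ORDER rather than argument order matters because xs:sequence is order-sensitive.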
| <wsdl:operation name="AddExtension"> |
|   <wsdl:input wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/AddExtension" message="tns:ISearchApplicationAdminWebService_AddExtension_InputMessage"/> |
|   <wsdl:output wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/AddExtensionResponse" message="tns:ISearchApplicationAdminWebService_AddExtension_OutputMessage"/> |
| </wsdl:operation> |
ISearchApplicationAdminWebService_AddExtension_InputMessage
The requested WSDL message for the AddExtension WSDL operation.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/AddExtension
The SOAP body contains the AddExtension element.
ISearchApplicationAdminWebService_AddExtension_OutputMessage
The response WSDL message for the AddExtension method.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/AddExtensionResponse
The SOAP body contains the AddExtensionResponse element.
The input data for the AddExtension WSDL operation.
| <xs:element name="AddExtension"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="versionIn" type="xs:int"/> |
|       <xs:element minOccurs="0" name="currentUser" nillable="true" type="xs:string"/> |
|       <xs:element minOccurs="0" name="ext" nillable="true" type="xs:string"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
The result data for the AddExtension WSDL operation.
| <xs:element name="AddExtensionResponse"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="AddExtensionResult" type="xs:int"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
| <wsdl:operation name="RemoveExtension"> |
|   <wsdl:input wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/RemoveExtension" message="tns:ISearchApplicationAdminWebService_RemoveExtension_InputMessage"/> |
|   <wsdl:output wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/RemoveExtensionResponse" message="tns:ISearchApplicationAdminWebService_RemoveExtension_OutputMessage"/> |
| </wsdl:operation> |
ISearchApplicationAdminWebService_RemoveExtension_InputMessage
The requested WSDL message for the RemoveExtension WSDL operation.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/RemoveExtension
The SOAP body contains the RemoveExtension element.
ISearchApplicationAdminWebService_RemoveExtension_OutputMessage
The response WSDL message for the RemoveExtension method.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/RemoveExtensionResponse
The SOAP body contains the RemoveExtensionResponse element.
The input data for the RemoveExtension WSDL operation.
| <xs:element name="RemoveExtension"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="versionIn" type="xs:int"/> |
|       <xs:element minOccurs="0" name="currentUser" nillable="true" type="xs:string"/> |
|       <xs:element minOccurs="0" name="ext" nillable="true" type="xs:string"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
The result data for the RemoveExtension WSDL operation.
| <xs:element name="RemoveExtensionResponse"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="RemoveExtensionResult" type="xs:int"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
| <wsdl:operation name="ClearExtensionList"> |
|   <wsdl:input wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/ClearExtensionList" message="tns:ISearchApplicationAdminWebService_ClearExtensionList_InputMessage"/> |
|   <wsdl:output wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/ClearExtensionListResponse" message="tns:ISearchApplicationAdminWebService_ClearExtensionList_OutputMessage"/> |
| </wsdl:operation> |
ISearchApplicationAdminWebService_ClearExtensionList_InputMessage
The requested WSDL message for the ClearExtensionList WSDL operation.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/ClearExtensionList
The SOAP body contains the ClearExtensionList element.
ISearchApplicationAdminWebService_ClearExtensionList_OutputMessage
The response WSDL message for the ClearExtensionList method.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/ClearExtensionListResponse
The SOAP body contains the ClearExtensionListResponse element.
The input data for the ClearExtensionList WSDL operation.
| <xs:element name="ClearExtensionList"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="versionIn" type="xs:int"/> |
|       <xs:element minOccurs="0" name="currentUser" nillable="true" type="xs:string"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
The result data for the ClearExtensionList WSDL operation.
| <xs:element name="ClearExtensionListResponse"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="ClearExtensionListResult" type="xs:int"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
| <wsdl:operation name="AddCrawlMapping"> |
|   <wsdl:input wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/AddCrawlMapping" message="tns:ISearchApplicationAdminWebService_AddCrawlMapping_InputMessage"/> |
|   <wsdl:output wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/AddCrawlMappingResponse" message="tns:ISearchApplicationAdminWebService_AddCrawlMapping_OutputMessage"/> |
| </wsdl:operation> |
ISearchApplicationAdminWebService_AddCrawlMapping_InputMessage
The requested WSDL message for the AddCrawlMapping WSDL operation.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/AddCrawlMapping
The SOAP body contains the AddCrawlMapping element.
ISearchApplicationAdminWebService_AddCrawlMapping_OutputMessage
The response WSDL message for the AddCrawlMapping method.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/AddCrawlMappingResponse
The SOAP body contains the AddCrawlMappingResponse element.
The input data for the AddCrawlMapping WSDL operation.
| <xs:element name="AddCrawlMapping"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="versionIn" type="xs:int"/> |
|       <xs:element minOccurs="0" name="currentUser" nillable="true" type="xs:string"/> |
|       <xs:element minOccurs="0" name="source" nillable="true" type="xs:string"/> |
|       <xs:element minOccurs="0" name="target" nillable="true" type="xs:string"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
The result data for the AddCrawlMapping WSDL operation.
| <xs:element name="AddCrawlMappingResponse"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="AddCrawlMappingResult" type="xs:int"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
| <wsdl:operation name="AddContentSource"> |
|   <wsdl:input wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/AddContentSource" message="tns:ISearchApplicationAdminWebService_AddContentSource_InputMessage"/> |
|   <wsdl:output wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/AddContentSourceResponse" message="tns:ISearchApplicationAdminWebService_AddContentSource_OutputMessage"/> |
| </wsdl:operation> |
ISearchApplicationAdminWebService_AddContentSource_InputMessage
The requested WSDL message for the AddContentSource WSDL operation.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/AddContentSource
The SOAP body contains the AddContentSource element.
ISearchApplicationAdminWebService_AddContentSource_OutputMessage
The response WSDL message for the AddContentSource method.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/AddContentSourceResponse
The SOAP body contains the AddContentSourceResponse element.
The input data for the AddContentSource WSDL operation.
| <xs:element name="AddContentSource"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="versionIn" type="xs:int"/> |
|       <xs:element minOccurs="0" name="currentUser" nillable="true" type="xs:string"/> |
|       <xs:element minOccurs="0" name="type" type="xs:int"/> |
|       <xs:element minOccurs="0" name="wssCrawlStyle" xmlns:q9="http://schemas.datacontract.org/2004/07/Microsoft.Office.Server.Search.Administration" type="q9:SharePointCrawlBehavior"/> |
|       <xs:element minOccurs="0" name="name" nillable="true" type="xs:string"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
| Value | Meaning |
| 0 | Enables specifying settings that control the depth of crawl for a Web site based on start address server, host hops and page depth |
| 1 | Enables specifying settings that control the depth of crawl for a Web site based on discovering everything under the hostname for each start address or only crawling the site collection of each start address |
| 2 | Lotus Notes database |
| 3 | File shares |
| 4 | Exchange public folders |
| 5 | Custom |
| 6 | Legacy Business Data Catalog |
| 8 | Custom search connector |
| 9 | Business Data Connectivity (BDC) |
The result data for the AddContentSource WSDL operation.
| <xs:element name="AddContentSourceResponse"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="AddContentSourceResult" type="xs:int"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
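A client that sends AddContentSource may want to validate the integer it places in the request before transmission. The following Python sketch encodes the Value/Meaning table above as an enumeration, assuming that the table describes the type element of the AddContentSource input. The member names and the parse_content_source_type helper are hypothetical labels chosen for this example; only the integer values come from the table.

```python
from enum import IntEnum


class ContentSourceType(IntEnum):
    """Integer values from the table; member names are hypothetical."""
    WEB_SITE_BY_DEPTH = 0   # depth controlled by host hops and page depth
    WEB_SITE_BY_HOST = 1    # everything under the hostname, or site collection only
    LOTUS_NOTES = 2
    FILE_SHARES = 3
    EXCHANGE_PUBLIC_FOLDERS = 4
    CUSTOM = 5
    LEGACY_BUSINESS_DATA_CATALOG = 6
    # 7 does not appear in the table and is therefore rejected below.
    CUSTOM_SEARCH_CONNECTOR = 8
    BUSINESS_DATA_CONNECTIVITY = 9


def parse_content_source_type(value: int) -> ContentSourceType:
    """Map an integer to a listed content source type or raise ValueError."""
    try:
        return ContentSourceType(value)
    except ValueError:
        raise ValueError(f"{value} is not a listed content source type") from None


file_share = parse_content_source_type(3)
```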
| <wsdl:operation name="RemoveCrawlMapping"> |
|   <wsdl:input wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/RemoveCrawlMapping" message="tns:ISearchApplicationAdminWebService_RemoveCrawlMapping_InputMessage"/> |
|   <wsdl:output wsam:Action="http://tempuri.org/ISearchApplicationAdminWebService/RemoveCrawlMappingResponse" message="tns:ISearchApplicationAdminWebService_RemoveCrawlMapping_OutputMessage"/> |
| </wsdl:operation> |
ISearchApplicationAdminWebService_RemoveCrawlMapping_InputMessage
The requested WSDL message for the RemoveCrawlMapping WSDL operation.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/RemoveCrawlMapping
The SOAP body contains the RemoveCrawlMapping element.
ISearchApplicationAdminWebService_RemoveCrawlMapping_OutputMessage
The response WSDL message for the RemoveCrawlMapping method.
The SOAP action value is:
http://tempuri.org/ISearchApplicationAdminWebService/RemoveCrawlMappingResponse
The SOAP body contains the RemoveCrawlMappingResponse element.
The input data for the RemoveCrawlMapping WSDL operation.
| <xs:element name="RemoveCrawlMapping"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="versionIn" type="xs:int"/> |
|       <xs:element minOccurs="0" name="currentUser" nillable="true" type="xs:string"/> |
|       <xs:element minOccurs="0" name="source" nillable="true" type="xs:string"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
The result data for the RemoveCrawlMapping WSDL operation.
| <xs:element name="RemoveCrawlMappingResponse"> |
|   <xs:complexType> |
|     <xs:sequence> |
|       <xs:element minOccurs="0" name="RemoveCrawlMappingResult" type="xs:int"/> |
|     </xs:sequence> |
|   </xs:complexType> |
| </xs:element> |
While the above describes and defines the format and syntax of the messages for communicating with the Web crawler, according to embodiments governed by the Search Service Administration Web Service protocol, Appendix A herein, entitled “Embodiment of Web Services Description Language (“WSDL”) Schema,” includes a full example WSDL schema illustrating the structure, format, and syntax of the messages, including those described above, for communicating to the Web crawler. As shown, the syntax of the structures uses example Extensible Markup Language (“XML”) schema, as well as WSDL. Appendix A is incorporated herein in full.
While FIG. 2 shows the configuration request and response messages for configuring the crawl function of crawler application 208, FIG. 4 illustrates an example user interface 400 for allowing a user to enter inputs for the configuration requests 216 illustrated in FIG. 2, in accordance with embodiments disclosed herein. In an embodiment, protocol client 204 presents user interface 400 for allowing a user to enter data for configuring the crawl function of crawler application 208. User interface 400 is displayed on the user interface of client computer 104, for example. Configuration options 402 are displayed to the user in user interface 400. While a single user interface 400 is shown in FIG. 4, multiple user interfaces can be used in accordance with embodiments disclosed herein. User interface 400 prompts a user to create a new crawl rule 404 by allowing the user to mark checkbox 406. Further, a user configures the crawl function by marking checkbox 408 to indicate that only URLs of matching links (including case-sensitive considerations) should be crawled, according to embodiments disclosed herein. In an embodiment, user interface 400 allows a user to configure the crawl function by defining authentication information for the crawl by marking checkbox 410 to indicate that the user desires to provide credentials for the crawl rule. For example, the user is prompted to enter credentials, such as a password 412 by way of example only, in data entry field 414. User interface 400 is offered for purposes of illustration only. Any type of user interfaces can be used in accordance with embodiments disclosed herein. In other embodiments, no user interface is used, and configuration requests and input data are instead provided directly by another computing device, a computer program, etc.
While FIG. 4 shows an example user interface for allowing a user to provide input data for configuring the crawl function of crawler application 208, FIG. 5 depicts the operational steps 500 for requesting a search for a specific type of information and determining whether a configuration request applies to a Web crawl. Start operation 502 is initiated, and process 500 proceeds to obtain the current configuration version of crawler application operation 503, in which the crawler application is launched in response to an indication by a user, protocol client, etc. For example, a protocol client obtains the current configuration version of the Web search service. Next, a request is made to search for an electronic document 504. This search requests a specific type of information and/or provides specific search criteria. A search engine, such as search engine 210 in FIG. 2, receives the search request and determines 506 whether any electronic documents satisfying the request are indexed in an index catalog, such as index 212 in FIG. 2. Such indexing allows for fast search responses and is based on previous Web crawling results. If the index contains an item(s) matching the search criteria, process 500 branches YES to receive list of documents/content sources 508, in which a list of electronic documents/content sources matching the search criteria is provided to, and received by, the protocol client, for example. Next, the user, or client, determines 510 whether any documents/content sources should be requested from the list received of indexed documents at operation 508. If no indexed documents are desired, process 500 branches NO to end operation 516, and process 500 terminates. If an indexed document(s)/content source(s) from the list is desired, process 500 branches YES to request document operation 512. In response to request operation 512, the electronic document(s) is then received at operation 514. Process 500 then terminates at end operation 516.
Returning to operation 506, if no electronic documents satisfying the search request are indexed in index 212, for example, process 500 branches NO to request to start crawl operation 518. In initiating the Web crawl 518, the user is prompted to determine whether to configure the crawl function 520. If NO, the crawl proceeds, and the client, or user, for example, receives results of the crawl at receive operation 522. If the client desires to configure the crawl function, process 500 branches YES to provide input data 524. The crawl then proceeds, and the client receives the results of the crawl at operation 522. Process 500 then terminates at end operation 516. Although FIG. 5 depicts the prompting of a user to configure a crawl function, it should be noted that this illustration is offered by way of example only in accordance with an embodiment disclosed herein. In other embodiments, no prompt is given, and the client, instead, initiates a configuration request. In still other embodiments, such configuration request occurs before any Web crawling begins for a particular search. Further, while FIG. 5 separates the steps for determining whether an index includes a document/content source satisfying search criteria from the steps for starting a Web crawl (if no document/content source satisfying the search criteria is indexed), in other embodiments, a request to start a crawl operation also applies to retrieving documents/content sources from an index. In other words, a Web crawl is initiated to discover both documents/content sources in the index of the protocol server and external Web sites in accordance with other embodiments disclosed herein. FIG. 5 is merely an example of possible operational characteristics for requesting a search and configuring a crawl function in accordance with embodiments disclosed herein. Operational steps depicted may be combined into other steps, or additional steps may be added, for example.
While FIG. 5 depicts the operational steps for prompting a client as to whether configuration of a crawl function is desired, FIG. 6 illustrates the operational steps for sending a configuration request from a protocol client, such as protocol client 204 in FIG. 2, to a protocol server, such as protocol server 206 in FIG. 2, in accordance with embodiments disclosed herein. Start operation 602 is initiated and process 600 proceeds to a protocol client obtaining the current configuration version of a Web search service 604, or crawler application. For example, a Web site for the Web search service is accessed in accordance with an embodiment. The protocol client knows the address of the Web site for the Web search service in one embodiment and accesses such Web site upon an indication to invoke or launch the Web search service. As an example, a user “clicks” on a box, icon, or other component indicating that the Web search service should be launched. This box, icon, or other component, in embodiments, is on a user's desktop, within another application, etc. Numerous types of ways to launch the Web search service can be used in accordance with embodiments of the present disclosure. In an embodiment, the protocol client accesses the Web search service by connecting to it by its network address, in which any authentication and/or authorization methods are automatically executed for access by the protocol client. Upon obtaining the current configuration version of the Web search service, or crawler application, the protocol client presents a user interface for initiating a crawl by the Web search service or configuring a crawl 606. It should be noted that steps 604 and 606 are offered by way of example only. For example, in other embodiments, the protocol client automatically displays the user interface or other interface mechanism for enabling configuration of the crawl function. 
The protocol client thus presents the user interface for requesting Web crawling prior to any received indication to launch the Web search service. In still other embodiments, the protocol client does not display such user interface. Rather, the user interface is displayed by other software running in conjunction with the protocol client. In yet further embodiments, the protocol client displays the user interface for requesting Web crawling and/or configuring of the crawl function in response to a user launching the Web search service by directly accessing a Web site for such service.
Next, the user interface, such as user interface 400, prompts the user to determine whether the user would like to configure the crawl function 608. If the user (or other client, such as computing device or computer program) desires to configure the crawl function, process 600 branches YES to receive input data for configuring the crawl function 610. Examples of such input data are shown with respect to the sample message operations discussed above. Upon receiving the input data, the protocol client formats the input data into a configuration request for communication to the API on the protocol server and sends the request for configuring the crawl function 612 to the protocol server. After the protocol server (through the Web search service 208, for example, depicted in FIG. 2) processes the configuration request, the protocol client receives a response 614 from the protocol server. The protocol client parses the response to determine whether it indicates a fault exception or other error 616. If an error is detected, process 600 branches YES to indicate to the user, for example, that revised input data is needed 618 to make the configuration request allowable. Process 600 then proceeds to receive input data 610 operation. If no error is indicated, process 600 terminates at end operation 620. Returning to operation 608, if it is not desired to configure the crawl function, process 600 proceeds to crawl or search index operation 609, in which a crawl process or other search process is conducted by the crawler application or search engine, respectively. FIG. 6 is merely an example of possible operational characteristics for requesting a search and configuring a crawl function in accordance with embodiments disclosed herein. Operational steps depicted may be combined into other steps, or additional steps may be added, for example.
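The request, response, and error-handling loop of operations 610 through 618 can be sketched as follows. This is a minimal Python sketch, not the disclosed implementation: it assumes SOAP 1.1 fault elements signal the "fault exception or other error" of operation 616, and the names is_fault, configure_crawl, and fake_send, along with the canned responses, are hypothetical stand-ins for a real transport.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, List, Optional

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # ASSUMPTION: SOAP 1.1


def is_fault(response_xml: str) -> bool:
    """Operation 616: parse the response and look for a SOAP Fault element."""
    root = ET.fromstring(response_xml)
    return root.find(f".//{{{SOAP_NS}}}Fault") is not None


def configure_crawl(inputs: Iterable[str], send: Callable[[str], str]) -> Optional[str]:
    """Send each candidate request until one is accepted.

    Each item in `inputs` stands for input data received at operation 610
    and already formatted as a configuration request (operation 612).
    """
    for request in inputs:
        response = send(request)      # operations 612 and 614
        if not is_fault(response):    # operation 616
            return response           # no error: process ends (620)
        # Operation 618: revised input is needed, so try the next candidate.
    return None


# Hypothetical canned responses and transport used to exercise the loop.
FAULT_RESPONSE = (
    f'<s:Envelope xmlns:s="{SOAP_NS}"><s:Body><s:Fault>'
    "<faultcode>s:Client</faultcode><faultstring>bad input</faultstring>"
    "</s:Fault></s:Body></s:Envelope>"
)
OK_RESPONSE = f'<s:Envelope xmlns:s="{SOAP_NS}"><s:Body><Ok/></s:Body></s:Envelope>'

sent: List[str] = []


def fake_send(request: str) -> str:
    """Stand-in transport: rejects any request containing the word 'bad'."""
    sent.append(request)
    return FAULT_RESPONSE if "bad" in request else OK_RESPONSE


result = configure_crawl(["bad rule", "good rule"], fake_send)
```

Passing the transport in as a callable keeps the loop testable without a protocol server; a real client would replace fake_send with an HTTP call that sets the SOAP action for the chosen operation.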
While FIG. 6 depicts the operational steps for sending a configuration request from a protocol client to a protocol server, FIG. 7 illustrates the operational steps for creating and configuring a content source, in accordance with embodiments disclosed herein. Start operation 702 is initiated, and process 700 proceeds to obtain the current configuration version of the crawler application 704. In an embodiment, the protocol client obtains the current configuration version of the crawler application in response to an indication by a user to perform a crawl or configure an existing crawl function by using the “GetVersion” operation, for example. After obtaining the current configuration version of the crawler application, process 700 proceeds to request the list of existing content sources 706, in which indexed content sources, for example, are compiled into a list according to embodiments. In other embodiments, the list of content sources already exists and no compilation is required. In accordance with an embodiment disclosed herein, the list of content sources is obtained by using the “GetContentSources” operation. Next, the protocol client receives the list of existing content sources 708, and adds a new content source 710, such as through an “AddContentSource” input message. After adding a new content source, the content source properties are updated in embodiments at update operation 712. For example, the protocol client requests to edit the content source and an “EditContentSource” input message is relayed to the crawler application through the use of an API on the protocol server. The protocol client then receives a response message to the configuration request for updating the content source properties at receive operation 714, and process 700 terminates at end operation 716. FIG. 7 is merely an example of possible operational characteristics for creating and configuring a content source in accordance with embodiments disclosed herein. 
Operational steps depicted may be combined into other steps, or additional steps may be added, for example.
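The ordering of operations 704 through 714 can be sketched as a single client routine. The Python sketch below is illustrative only and rests on several assumptions that this passage does not confirm: that the value returned by GetVersion is what the versionIn element expects, that GetContentSources takes no parameters, and that the integer AddContentSourceResult identifies the new content source. The invoke callable, the canned results, and all function names are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

# An `invoke` callable takes an operation name and its input elements and
# returns the operation's result. It stands in for the SOAP transport.
Invoke = Callable[[str, Dict[str, Any]], Any]


def create_and_configure_content_source(
    invoke: Invoke, user: str, name: str, source_type: int, settings: Dict[str, Any]
) -> Any:
    """Run the FIG. 7 sequence and return the EditContentSource result."""
    version = invoke("GetVersion", {})              # operation 704
    invoke("GetContentSources", {})                 # operations 706 and 708
    new_id = invoke(                                # operation 710
        "AddContentSource",
        {"versionIn": version, "currentUser": user, "type": source_type, "name": name},
    )
    # ASSUMPTION: the integer result of AddContentSource is the identifier
    # expected by the `id` element of EditContentSource.
    return invoke(                                  # operations 712 and 714
        "EditContentSource",
        {"versionIn": version, "currentUser": user, "id": new_id, "name": name, **settings},
    )


# Hypothetical recording transport with canned results.
calls: List[Tuple[str, Dict[str, Any]]] = []
CANNED = {"GetVersion": 5, "GetContentSources": [], "AddContentSource": 42, "EditContentSource": 6}


def fake_invoke(operation: str, params: Dict[str, Any]) -> Any:
    calls.append((operation, params))
    return CANNED[operation]


outcome = create_and_configure_content_source(
    fake_invoke, "EXAMPLE\\admin", "Intranet", 0, {"crawlPriority": 2}
)
```

A crawl rule could be configured with the same pattern by substituting the operations named for FIG. 8.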
Further, while FIG. 7 illustrates the operational steps for creating and configuring a content source, FIG. 8 depicts the operational steps for configuring a crawl rule, such as by setting credentials for an authentication method using message “SetCredentials2,” as disclosed herein in accordance with embodiments of the present disclosure. Process 800 is initiated at start operation 802 and proceeds to obtain the current configuration version of the crawler application 804, such as by “GetVersion” operation according to an embodiment. To configure a crawl rule, the protocol client next requests the current list of crawl rules 806, such as by “GetCrawlRuleList” operation, and receives the list at operation 808. After obtaining the current list of crawl rules, process 800 proceeds to create a new crawl rule 810, such as through “AddAdvancedCrawlRule” operation, according to embodiments. The client, or user, then updates one or more properties of the new crawl rule 812, such as through the “UpdateCrawlRule” operation, and process 800 proceeds to set crawl rule credentials 814 for defining an authentication method for the crawl. In an embodiment, the “SetCrawlRuleCredentials2” operation is used to configure the authentication method and crawl account for the new crawl rule. Process 800 then terminates at end operation 816. FIG. 8 is merely an example of possible operational characteristics for configuring a crawl rule in accordance with embodiments disclosed herein. Operational steps depicted may be combined into other steps, or additional steps may be added, for example.
Finally, FIG. 9 illustrates an example computing system 900 upon which embodiments disclosed herein may be implemented. A computer system 900, such as client computer 104, server 102, or other computing device, which has at least one processor 902 and a system memory 904, is depicted in accordance with embodiments disclosed herein, such as to configure a crawl function as shown in FIG. 1. For example, according to embodiments, memory 904 comprises an index, existing content sources, and/or current crawl rules. In its most basic configuration, computing system 900 is illustrated in FIG. 9 by dashed line 906. System 900 may include additional storage (removable and/or non-removable) including, but not limited to, magnetic or optical disks or tape. Such additional storage is illustrated in FIG. 9 by removable storage 908 and non-removable storage 910. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media.
Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. System memory 904, removable storage 908 and non-removable storage 910 are all examples of computer storage media (i.e., memory storage). Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store information and which can be accessed by system 900. Any such computer storage media may be part of system 900. Depending on the configuration and type of computing device, memory 904 may be volatile, non-volatile or some combination of the two. The illustration in FIG. 9 is intended in no way to limit the scope of the present disclosure.
Communication media may be embodied by computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media, in accordance with an embodiment, includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
System 900 may also contain communications connection(s) 916 that allow the device to communicate with other devices. Additionally, in accordance with an embodiment of the present disclosure, system 900 may have input device(s) 914 such as a keyboard, mouse, pen, voice input device, touch input device, etc., for example to input content into the fields of the UI provided by a corresponding UI module (not shown) on client computer 104. Output device(s) 912 such as a display, speakers, printer, etc. may also be included; such devices may be used to display the UI for viewing configuration options, receiving input data for a crawl or configuration request, etc., in accordance with embodiments. All of these devices are well known in the art and need not be discussed at length here.
Having described embodiments of the present disclosure with reference to the figures above, it should be appreciated that numerous modifications may be made to the embodiments that will readily suggest themselves to those skilled in the art and which are encompassed within the scope and spirit of the present disclosure and as defined in the appended claims. Indeed, while embodiments have been described for purposes of this disclosure, various changes and modifications may be made which are well within the scope of the present disclosure.
Similarly, although this disclosure has used language specific to structural features, methodological acts, and computer-readable media containing such acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific structure, acts, features, or media described herein. Rather, the specific structures, features, acts, and/or media described above are disclosed as example forms of implementing the claims. Aspects of embodiments allow for multiple client computers, multiple protocol servers, multiple networks, etc. Or, in other embodiments, a single client computer, a single protocol server, and a single network are used. One skilled in the art will recognize other embodiments or improvements that are within the scope and spirit of the present disclosure. Therefore, the specific structure, acts, or media are disclosed as example embodiments of implementing the present disclosure. The disclosure is defined by the appended claims.
1. A computer-implemented method for cataloging authenticated crawl results, the method comprising:
receiving data indicative of a configuration request message, wherein the data indicative of the configuration request message comprises authentication credentials and one or more content sources;
initiating a crawl based on the one or more content sources, wherein the crawl uses the authentication credentials to access content associated with the one or more content sources; and
cataloging data in an index based on the crawl.
2. The method of claim 1, wherein the authentication credentials comprise one or more cookies.
3. The method of claim 1, wherein the authentication credentials comprise one or more certificates.
4. The method of claim 1, wherein the authentication credentials comprise at least one form-based authentication.
5. The method of claim 1, wherein the authentication credentials comprise at least one plain-text password.
6. The method of claim 1, wherein the index comprises a full-text catalog.
7. The method of claim 1, wherein the index comprises metadata relating to the content.
8. A computer-readable storage medium, wherein the medium does not consist of a propagated signal, the medium storing computer-executable instructions for performing a method for cataloging authenticated crawl results, the method comprising:
receiving data indicative of a configuration request message, wherein the data indicative of the configuration request message comprises authentication credentials and one or more content sources;
initiating a crawl based on the one or more content sources, wherein the crawl uses the authentication credentials to access content associated with the one or more content sources; and
cataloging data in an index based on the crawl.
9. The computer-readable medium of claim 8, wherein the authentication credentials comprise one or more cookies.
10. The computer-readable medium of claim 8, wherein the authentication credentials comprise one or more certificates.
11. The computer-readable medium of claim 8, wherein the authentication credentials comprise at least one form-based authentication.
12. The computer-readable medium of claim 8, wherein the authentication credentials comprise at least one plain-text password.
13. The computer-readable medium of claim 8, wherein the index comprises a full-text catalog.
14. The computer-readable medium of claim 8, wherein the index comprises metadata relating to the content.
15. A system for cataloging authenticated crawl results, the system comprising:
a server in a network environment configured to receive data indicative of a configuration request message, wherein the data indicative of the configuration request message comprises authentication credentials and one or more content sources;
the server in network communication with one or more computing devices, the server configured to send information that assists the one or more computing devices in a crawl, wherein the information comprises data indicative of the one or more content sources and data indicative of the authentication credentials, and
a database configured to catalog data in an index based on the crawl.
16. The system of claim 15, wherein the network communication is substantially constant.
17. The system of claim 15, wherein the data indicative of the authentication credentials comprise one or more of the following: at least one form-based authentication, at least one plain-text password, or cookies.
18. The system of claim 15, wherein the data indicative of the authentication credentials comprise information related to accessing content associated with the one or more content sources.
19. The system of claim 15, wherein the data comprises one or more full-text data related to the one or more content sources.
20. The system of claim 15, wherein the data comprises one or more metadata relating to the one or more content sources.