Yandex.XML Developer's guide 22.05.2018



Yandex.XML. Developer's guide. Version 1.2
Document build date: 22.05.2018.
This volume is a part of Yandex technical documentation.
Yandex helpdesk site: http://help.yandex.ru
© 2008-2018 Yandex LLC. All rights reserved.

Copyright Disclaimer

Yandex (and its applicable licensor) has exclusive rights to all results of intellectual activity, and to the means of individualization equated to them, that are used for the development, support, and usage of the Yandex.XML service. These may include, but are not limited to, computer programs (software), databases, images, texts, other works and inventions, utility models, trademarks, service marks, and commercial denominations. The copyright is protected under the provisions of Part 4 of the Russian Civil Code and international laws.

You may use Yandex.XML or its components only within the scope granted by the Terms of Use of Yandex.XML or by an appropriate Agreement.

Any infringement of the exclusive rights of the copyright owner is punishable under Russian civil, administrative, or criminal law.

Contact information

Yandex LLC
http://www.yandex.com
Phone: +7 495 739 7000
Email: [email protected]
Address: 16 L'va Tolstogo St., Moscow, Russia 119021

Contents

Overview
Restrictions and requirements
Getting started
Registration
Request for search results
    GET requests
    POST requests
    Response format
        request
        response
Request for limits for the next day
    Response format
Formatting results
Protection from robots
Questions and answers
    What is XSLT?
    Notifications
    IP address
    Additional search features
    Encoding
Appendices
    Validating XML files
    Error codes
    Search regions
Feedback


Overview

Yandex.XML is a service that lets you send queries to the Yandex search engine and get responses in XML format.

This document covers restrictions and requirements for using the service, basic steps for getting started and registering, formats of search queries and responses, and answers to common questions.

The document is intended for developers who need to set up a search across a website, group of sites, or the Internet.

Restrictions and requirements

Yandex.XML provides access to Russian, Turkish, and Worldwide types of search. The desired search type is selected during registration.

The search type determines the ranking formula, the set of documents that are searched (the search base), and the restrictions that are applied to usage of Yandex.XML.

The following types of restrictions are applied:

• A limit on the number of IP addresses that are associated with the account (one by default).

• Daily limits on the number of search queries sent. If the IP address changes, the limit applies to the total number of queries sent from all network addresses.

The following table provides information about how restrictions depend on the search type and other conditions.

Condition: Telephone number is not confirmed.
• “Russian” search type: Restrictions cannot be changed.
• “Turkish” search type: 10 search queries per day.
• “Worldwide” search type: 10 search queries per day.

Condition: Telephone number is confirmed. Restriction: one telephone number may be confirmed no more than once, and only for a single account.
• “Russian” search type: Restrictions cannot be changed.
• “Turkish” search type: 10,000 search queries per day.
• “Worldwide” search type: 10,000 search queries per day.

Condition: The website is registered in the Yandex.Webmaster service.
• “Russian” search type: The number of search queries allowed is determined individually for each user. Restrictions depend on the sites registered in Yandex.Webmaster. Hourly limits and limits on requests per second are also applied.
• “Turkish” search type: Restrictions cannot be changed.
• “Worldwide” search type: Restrictions cannot be changed.

License agreement
• “Russian” search type: https://legal.yandex.ru/xml
• “Turkish” search type: https://legal.yandex.com.tr/xml
• “Worldwide” search type: https://legal.yandex.com/xml

Changing restrictions (increasing the maximum number of queries per day and IP addresses allowed)
• “Russian” search type: Become a partner of the Yandex Advertising Network.
• “Turkish” search type: Contact a Yandex representative and discuss how to extend your use of Yandex.XML features.
• “Worldwide” search type: Contact a Yandex representative and discuss how to extend your use of Yandex.XML features.


To get information about additional features of Yandex.XML and how to get access to them, contact a Yandex representative.

For each search query, no more than 1000 results are returned.

When using the service, follow the requirements for formatting results and the recommendations for protection from robots.

Hourly limits for the “Russian” search type

For the “Russian” search type, additional hourly limits may be imposed that are calculated as percentages of the daily query limit.

After registration, information about hourly limits is available on the Information on restrictions page.

For example, suppose the daily limit on the number of queries for a site is 1000. During each hour in the period from 7:00 to 19:00, no more than 5% of this limit (50 queries) can be sent.

Even if no search queries were sent from the account in the period from 0:00 to 7:00, no more than 50 queries can be sent during each hour from 7:00 to 19:00. In total, no more than 600 queries can be sent over this period.
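The arithmetic in this example can be reproduced with a short sketch. The function names are hypothetical, the 5% share and the 7:00 to 19:00 window are the example values quoted above rather than fixed service constants, and rounding down is an assumption.

```python
def hourly_cap(daily_limit: int, percent: float) -> int:
    """Per-hour cap as a percentage of the daily limit (rounded down; an assumption)."""
    return int(daily_limit * percent / 100)


def window_total(cap: int, start_hour: int, end_hour: int) -> int:
    """Total queries allowed between start_hour and end_hour at `cap` queries per hour."""
    return cap * (end_hour - start_hour)


# Example values from the text: daily limit 1000, 5% per hour, window 7:00 to 19:00.
cap = hourly_cap(1000, 5)
total = window_total(cap, 7, 19)
print(cap, total)
```

With the example values, the cap is 50 queries per hour and the 12-hour window allows 600 queries in total.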

Restrictions on the number of queries per second for the “Russian” search type

For the “Russian” search type, additional limits may be applied to the number of requests sent per second (RPS).

The number of queries allowed per second depends on the hourly limit and is calculated using a formula.

[Formula and symbol names not preserved in this copy of the document.]

Exceeding this threshold might lead to errors (up until the next second).

Getting started

To set up and start using Yandex.XML, follow these steps:

1. Register the IP address that you plan to send search requests from.

2. Send a test request.

Make sure that requests are sent successfully from the specified IP address:

• Send a request from the service's interface. The interface should be accessed from the computer that is assigned the IP address specified during registration.

• Form a GET request and send it from the computer that is assigned the IP address that was specified during registration. For example, if during registration the field URL for queries displayed the string “https://yandex.ru/search/xml?user=test-yandex&key=09.31114:e650g7j”, you would use the following GET request:

https://yandex.ru/search/xml?user=test-yandex&key=09.31114:e650g7j&query=yandex

3. Check the received XML document.

The response should correspond to the specified format and should not contain errors.


Note: If there are no results for the search string, an error with the code “15” is acceptable.

4. Only for the “Russian” search type. Register your sites in the Yandex.Webmaster service. After registration, individual restrictions are determined for the current user.

5. Only for the “Russian” search type. Review the daily and hourly restrictions on the Information about restrictions page.

Yandex.XML can only be used on sites for which the current user is the main owner in Yandex.Webmaster. If necessary, ask the site owner to assign you the appropriate role.

6. Configure request parameters.

The GET and POST methods are supported.

7. Review the response format.

8. Set up response handling.

For formatting search results, you must comply with the design requirements.

9. If necessary. Request information about hourly restrictions for the next 24 hours.

10. Optional. Set up protection from robots.
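The test request in step 2 can be sketched as follows. This is a minimal illustration, not an official client: the helper name is hypothetical, and the base string is the sample "URL for queries" value from step 2, which you would replace with the value shown for your own account.

```python
from urllib.parse import quote


def build_test_url(url_for_queries: str, query_text: str) -> str:
    """Append a percent-encoded query parameter to the registered base URL."""
    return f"{url_for_queries}&query={quote(query_text, safe='')}"


# Sample base string from step 2 (not a working credential).
base = "https://yandex.ru/search/xml?user=test-yandex&key=09.31114:e650g7j"
print(build_test_url(base, "yandex"))
```

The resulting URL must be requested from the computer whose IP address was registered, as described in step 2.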

Registration

To register on the Yandex.XML service, follow these steps:

1. Open the registration page (https://xml.yandex.com/settings.xml).

Authentication in Yandex.Passport is required. If necessary, register first.

2. Review the value in URL for queries:

• For GET requests, this is the base part of the address that request parameters are appended to.

• For POST requests, this is the URL to send the request body to.

3. Fill in the fields on the form:

Field Description

Main IP-address The unique network address of the computer that will be sending search queries.

To set the IP address of the computer you are using for registration, use the value shown in Your current IP-address is.

Search type The selected value determines the set of documents that are searched (the search base), the ranking formula, and usage restrictions.

Email notifications The email address to send notifications to.

List of events Choose events that notifications should be sent for.

Notification language The language to use for delivering messages about selected events.

4. Review the terms of the license agreement. The terms depend on the search type you selected.

5. Confirm your agreement (check the I accept the terms of License Agreement box).

6. Save the information you entered (the Save button is available only when you have filled in both Email notifications and I accept the terms of License Agreement).

If necessary, registration data can be edited on the Settings page.


Request for search results

Yandex.XML supports two ways of sending a search request: GET and POST.

The response format is the same for both supported methods.

Attention! To use algorithms for protection from robots, the request must pass information about the IP address and the "spravka" cookie for the query author.

GET requests

Attention! Special characters that are passed as parameter values must be replaced with the appropriate escape sequences for percent-encoding. For example, instead of the equal sign (“=”), the escape sequence “%3D” must be used.

Request format

https://yandex.<domain>/search/xml
  ? user=<user name>
  & key=<API key>
  & query=<search query text>
  & [lr=<ID of the search country/region>]
  & [l10n=<notification language>]
  & [sortby=<type of sorting>]
  & [filter=<filter type>]
  & [maxpassages=<number of passages>]
  & [groupby=<parameters for grouping results>]
  & [page=<page number>]
  & [showmecaptcha=<yes>]

user User name. Must match the login for Yandex.Passport that was set during registration.

key Value of the API key that was issued during registration.

query Text of the search query. Instead of special symbols, the corresponding escape sequences must be used.

The query has the following restrictions: a maximum length of 400 characters and a maximum of 40 words.

lr Supported only for “Russian” and “Turkish” search types.

ID of the country or region to search. Determines the rules for ranking documents. For example, if we pass the value “11316” in this parameter (Novosibirsk region), when generating search results, a formula is used that is defined for the Novosibirsk region.

A list of IDs of common countries and regions is provided in the appendix.

l10n The notification language for the search response. It affects the text that is passed in the found-docs-human tag, as well as in error messages.

Acceptable values depend on the type of search used:

• “Russian (yandex.ru)”: “ru” (Russian), “uk” (Ukrainian), “be” (Belarusian), “kk” (Kazakh). If omitted, notifications are sent in Russian.

• “Turkish (yandex.com.tr)”: supports only the value “tr” (Turkish).

• “Worldwide (yandex.com)”: supports only the value “en” (English).


sortby Rules for sorting search results. Possible values:

• “rlv” — By relevancy.

• “tm” — By time when the document was changed.

If omitted, results are sorted by relevancy.

When sorting by change time, the parameter may contain the order attribute, which sets the order for sorting documents. Possible values:

• “descending”: forward (from most recent to oldest). Used by default.

• “ascending”: reverse (from oldest to most recent).

Format: sortby=<sorting type>.order%3D<sorting order>. For example, for reverse sorting by date, use the following construction: sortby=tm.order%3Dascending.

filter Rules for filtering search results (excluding documents from search results based on one of the rules). Possible values:

• “none”: filtering is disabled. The output includes any documents, regardless of their content.

• “moderate”: moderate filtering. The output excludes documents that fall into the “adults only” category, if the search is not explicitly directed at finding these types of resources.

• “strict”: family filter. Regardless of the search query, the output excludes documents that fall into the “adults only” category, as well as those that contain foul language.

If the parameter is omitted, moderate filtering is used.

maxpassages The maximum number of passages that can be used when creating a snippet for the document. A passage is an excerpt from a found document that contains the query words. Passages are used for creating snippets, which are textual annotations to found documents.

Acceptable values: 1 to 5. The search result may contain fewer passages than the value set for this parameter.

If the parameter is omitted, no more than four passages with the query text are returned for each document.

groupby Set of parameters that define the rules for grouping results. Grouping is used to put documents from the same domain in a container. Within the container, documents are ranked using the sorting rules defined in the sortby parameter. Results passed to the container can be used for including several documents from the same domain in search output.

Parameters are separated by periods and set in the format:

attr%3D<service attribute>.mode%3D<type of grouping>.groups-on-page%3D<number of groups per page>.docs-in-group%3D<number of documents per group>

You can find a description of the parameters mode, attr, groups-on-page and docs-in-group in the section POST requests.

page Number of the requested page in the search output. This determines the range of document positions returned for the request. Numbering starts from zero (the first page corresponds to the value “0”).

For example, if the number of documents returned on a page is equal to “n”, and the value “p” is passed in the parameter, the search results will include documents that fall within the range of output positions from p*n+1 to p*n+n inclusively.

If the parameter is omitted, the first page of search output is returned.

showmecaptcha Initiates user verification for possible protection from robots.

The only value used is “yes”.
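Two of the rules above lend themselves to a quick client-side check: the query restrictions (400 characters, 40 words) and the position range for a page (p*n+1 to p*n+n). The sketch below is illustrative only; the function names are hypothetical, and counting words by splitting on whitespace and measuring length before percent-encoding are assumptions, since the guide does not define either.

```python
MAX_QUERY_CHARS = 400  # maximum query length stated for the query parameter
MAX_QUERY_WORDS = 40   # maximum number of words stated for the query parameter


def check_query(text: str) -> list[str]:
    """Return the violated restrictions (an empty list means the query passes)."""
    problems = []
    if len(text) > MAX_QUERY_CHARS:
        problems.append("too long")
    if len(text.split()) > MAX_QUERY_WORDS:  # whitespace splitting is an assumption
        problems.append("too many words")
    return problems


def page_positions(p: int, n: int) -> tuple[int, int]:
    """First and last output positions for zero-based page p with n documents per page."""
    return p * n + 1, p * n + n


print(check_query("yandex"))
print(page_positions(1, 5))
```

For page=1 with five documents per page, the range covers output positions 6 through 10.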

Sample GET request

The following request returns the second page of search results for the query “<table>” for the user “xml-search-user”. Search type: Russian (yandex.ru). Results are grouped by domain. Each group contains three documents, and five groups can be returned per page.


https://yandex.ru/search/xml?user=xml-search-user&key=03.44583456:c876e1b098gh65khg834ggg1jk4ll9j8&query=%3Ctable%3E&groupby=attr%3Dd.mode%3Ddeep.groups-on-page%3D5.docs-in-group%3D3&maxpassages=3&page=1
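A URL like the sample above can be assembled with Python's standard urllib.parse module, which applies the percent-encoding required for GET requests. This is a sketch under assumptions: the helper names are hypothetical, the user and key are the sample values from this guide, and the colon is left unencoded only to match how the sample key is written.

```python
from urllib.parse import quote, urlencode


def groupby_value(attr: str, mode: str, groups_on_page: int, docs_in_group: int) -> str:
    """Join grouping parameters with periods; '=' is percent-encoded later."""
    return (f"attr={attr}.mode={mode}"
            f".groups-on-page={groups_on_page}.docs-in-group={docs_in_group}")


def build_get_url(domain: str, params: dict) -> str:
    """Build a search URL for yandex.<domain> with percent-encoded parameter values."""
    # quote_via=quote encodes spaces as %20; safe=":" keeps the key's colon literal.
    return f"https://yandex.{domain}/search/xml?" + urlencode(params, safe=":", quote_via=quote)


url = build_get_url("ru", {
    "user": "xml-search-user",                              # sample user from the guide
    "key": "03.44583456:c876e1b098gh65khg834ggg1jk4ll9j8",  # sample key from the guide
    "query": "<table>",
    "groupby": groupby_value("d", "deep", 5, 3),
    "maxpassages": 3,
    "page": 1,
})
print(url)
```

Passing the raw values and letting the library encode them avoids double-encoding mistakes such as writing “%3D” by hand and then encoding the percent sign again.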

POST requests

Attention! Special characters that are passed as parameter values in the request body must be replaced with the appropriate escape sequences for XML-encoding. For example, instead of the ampersand sign (“&”), the escape sequence “&amp;” must be used.

Request URL

https://yandex.<domain>/search/xml
  ? user=<user name>
  & key=<API key>
  & filter=<filter type>
  & [lr=<search region ID>]
  & [l10n=<notification language>]
  & [showmecaptcha=<yes>]

user User name. Must match the login for Yandex.Passport that was set during registration.

key Value of the API key that was issued during registration.

filter Rules for filtering search results (excluding documents from search results based on one of the rules). Possible values:

• “none”: filtering is disabled. The output includes any documents, regardless of their content.

• “moderate”: moderate filtering. The output excludes documents that fall into the “adults only” category, if the search is not explicitly directed at finding these types of resources.

• “strict”: family filter. Regardless of the search query, the output excludes documents that fall into the “adults only” category, as well as those that contain foul language.

If the parameter is omitted, moderate filtering is used.

lr Supported only for “Russian” and “Turkish” search types.

ID of the country or region to search. Determines the rules for ranking documents. For example, if we pass the value “11316” in this parameter (Novosibirsk region), when generating search results, a formula is used that is defined for the Novosibirsk region.

A list of IDs of common countries and regions is provided in the appendix.

l10n The notification language for the search response. It affects the text that is passed in the found-docs-human tag, as well as in error messages.

Acceptable values depend on the type of search used:

• “Russian (yandex.ru)”: “ru” (Russian), “uk” (Ukrainian), “be” (Belarusian), “kk” (Kazakh). If omitted, notifications are sent in Russian.

• “Turkish (yandex.com.tr)”: supports only the value “tr” (Turkish).

• “Worldwide (yandex.com)”: supports only the value “en” (English).

showmecaptcha Initiates user verification for possible protection from robots.

The only value used is “yes”.


Request body format

<?xml version="1.0" encoding="XML file encoding"?>
<request>
  <!--Grouping tag-->
  <query>
    <!--Search query text-->
  </query>
  <sortby>
    <!--Type of sorting for search results-->
  </sortby>
  <groupings>
    <!--Grouping parameters in child tags-->
    <groupby attr="d" mode="deep" groups-on-page="10" docs-in-group="1" />
  </groupings>
  <page>
    <!--Number of the requested page in search results-->
  </page>
</request>

Parameter Description

request Grouping tag. Child tags contain parameters of the search query.

query Text of the search query. Instead of special symbols, the corresponding escape sequences must be used.

The query has the following restrictions: a maximum length of 400 characters and a maximum of 40 words.

sortby Rules for sorting search results. Possible values:

• “rlv” — By relevancy.

• “tm” — By time when the document was changed.

If omitted, results are sorted by relevancy.

When sorting by change time, the parameter may contain the order attribute, which sets the order for sorting documents. Possible values:

• “descending” — Forward (from most recent to oldest). Used by default.

• “ascending” — Reverse (from oldest to most recent).

maxpassages The maximum number of passages that can be used when creating a snippet for the document. A passage is an excerpt from a found document that contains the query words. Passages are used for creating snippets, which are textual annotations to found documents.

Acceptable values: 1 to 5. The search result may contain fewer passages than the value set for this parameter.

If the parameter is omitted, no more than four passages with the query text are returned for each document.

page Number of the requested page in the search output. This determines the range of document positions returned for the request. Numbering starts from zero (the first page corresponds to the value “0”).

For example, if the number of documents returned on a page is equal to “n”, and the value “p” is passed in the parameter, the search results will include documents that fall within the range of output positions from p*n+1 to p*n+n inclusively.

If the parameter is omitted, the first page of search output is returned.

groupings Grouping tag. The child tag contains parameters for grouping results.

groupby Set of parameters that define the rules for grouping results. Grouping is used to put documents from the same domain in a container. Within the container, documents are ranked using the sorting rules defined in the sortby parameter. Results passed to the container can be used for including several documents from the same domain in search output.


Contains the following attributes:

• mode — Grouping method. Possible values:

• “flat”: flat grouping. Each group contains a single document. Passed with an empty value for the attr parameter (“" "”).

• “deep”: grouping by domain. Each group contains documents from a single domain. Passed with the value “d” for the attr parameter.

If the parameter is not defined, flat grouping is used.

• attr — Utility attribute. Depends on the value of the mode attribute.

• groups-on-page: maximum number of groups that can be returned per page of search results. Acceptable values: 1 to 100.

• docs-in-group: maximum number of documents that can be returned per group. Acceptable values: 1 to 3.
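The attribute rules for groupby can be checked before a request is built. The following is a minimal sketch with a hypothetical helper; the default values for groups-on-page and docs-in-group are illustrative (taken from the request body template), and treating the "empty value" for flat grouping as an empty string is an assumption.

```python
def groupby_attrs(mode: str = "flat", groups_on_page: int = 10, docs_in_group: int = 1) -> dict:
    """Validate grouping settings and return them as groupby attribute strings."""
    if mode not in ("flat", "deep"):
        raise ValueError("mode must be 'flat' or 'deep'")
    if not 1 <= groups_on_page <= 100:
        raise ValueError("groups-on-page must be from 1 to 100")
    if not 1 <= docs_in_group <= 3:
        raise ValueError("docs-in-group must be from 1 to 3")
    # attr depends on mode: "d" for grouping by domain, empty for flat grouping.
    attr = "d" if mode == "deep" else ""
    return {
        "attr": attr,
        "mode": mode,
        "groups-on-page": str(groups_on_page),
        "docs-in-group": str(docs_in_group),
    }


print(groupby_attrs("deep", 10, 3))
```

The returned dictionary uses string values so that it can be passed directly as XML attributes.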

Tip: If necessary, use the XML feed validator in the Yandex.Webmaster service. Detailed information about validation is provided in the appendix.

Sample POST request

The request and request URL shown below return the third page of search results for the query “<table>” for the user “xml-search-user”. Results are sorted by time when the document was changed. Search type: Russian (yandex.ru). Results are grouped by domain. Each group contains three documents, and ten groups can be returned per page. The maximum number of passages per document is two. The service returns an XML file in UTF-8 encoding.

Request URL:

https://yandex.ru/search/xml?user=xml-search-user&key=03.44583456:c876e1b098gh65khg834ggg1jk4ll9j8

Request body:

<?xml version="1.0" encoding="UTF-8"?>
<request>
  <query>%3Ctable%3E</query>
  <sortby>tm</sortby>
  <maxpassages>2</maxpassages>
  <page>2</page>
  <groupings>
    <groupby attr="d" mode="deep" groups-on-page="10" docs-in-group="3" />
  </groupings>
</request>
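A request body of this shape can be generated with Python's standard xml.etree.ElementTree module, which applies XML escaping to the query text automatically, in line with the Attention note on XML-encoding for POST requests. This is a sketch, not an official client: the helper name is hypothetical, the Content-Type header is an assumption (the guide does not specify one), and the request object is only constructed here, not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_post_body(query, sortby="rlv", maxpassages=None, page=0, groupby=None) -> bytes:
    """Serialize search parameters into a UTF-8 XML request body."""
    request = ET.Element("request")
    ET.SubElement(request, "query").text = query  # "<", ">" and "&" are XML-escaped on output
    ET.SubElement(request, "sortby").text = sortby
    if maxpassages is not None:
        ET.SubElement(request, "maxpassages").text = str(maxpassages)
    ET.SubElement(request, "page").text = str(page)
    groupings = ET.SubElement(request, "groupings")
    ET.SubElement(groupings, "groupby", groupby or {"attr": "", "mode": "flat"})
    return ET.tostring(request, encoding="UTF-8", xml_declaration=True)


body = build_post_body(
    "<table>", sortby="tm", maxpassages=2, page=2,
    groupby={"attr": "d", "mode": "deep", "groups-on-page": "10", "docs-in-group": "3"},
)

# Sample user and key from the guide; the Content-Type value is an assumption.
url = "https://yandex.ru/search/xml?user=xml-search-user&key=03.44583456:c876e1b098gh65khg834ggg1jk4ll9j8"
req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/xml"}, method="POST")
print(body.decode("utf-8"))
```

To actually send the request, pass req to urllib.request.urlopen from the registered IP address.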

Response format

In response to the search request, Yandex.XML returns an XML file in UTF-8 encoding that contains the search results.


Restriction: No more than 1000 results are returned for each search query. Depending on the value of the docs-in-group attribute, each result may contain from one to three documents. The maximum number of pages with search results is determined by the number of document groups returned on each page (the value of the groups-on-page attribute). For example, if the groups-on-page attribute is passed with the value “10”, no more than 100 pages containing search results can be returned.
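The page ceiling in this restriction follows directly from the 1000-result cap. The sketch below uses a hypothetical helper; rounding up when groups-on-page does not divide 1000 evenly is an assumption, since the guide only gives the evenly divisible example.

```python
MAX_RESULTS = 1000  # maximum number of results returned per search query


def max_pages(groups_on_page: int) -> int:
    """Upper bound on result pages for a given groups-on-page value."""
    if not 1 <= groups_on_page <= 100:
        raise ValueError("groups-on-page must be from 1 to 100")
    # Ceiling division; rounding up for non-divisors is an assumption.
    return -(-MAX_RESULTS // groups_on_page)


print(max_pages(10))
```

A client can use this bound to stop paging instead of requesting pages that cannot contain results.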

Files consist of the grouping tags request (general information about query parameters) and response (results of processing the search query).

Below you will find the general structure of a resulting XML document with sample values.

Attention! This structure is for illustrative purposes. It contains mutually exclusive elements.

<?xml version="1.0" encoding="utf-8"?>
<yandexsearch version="1.0">
<request>
  <query>yandex</query>
  <page>0</page>
  <sortby order="descending" priority="no">rlv</sortby>
  <maxpassages>2</maxpassages>
  <groupings>
    <groupby attr="d" mode="deep" groups-on-page="10" docs-in-group="3" curcateg="-1" />
  </groupings>
</request>
<response date="20120928T103130">
  <error code="15">Sorry, there are no results for this search</error>
  <reqid>1348828873568466-1289158387737177180255457-3-011-XML</reqid>
  <found priority="phrase">206775197</found>
  <found priority="strict">206775197</found>
  <found priority="all">206775197</found>
  <found-human>207 million pages found</found-human>
  <misspell>
    <rule>Misspell</rule>
    <source-text>yande<hlword>xx</hlword></source-text>
    <text>yandex</text>
  </misspell>
  <reask>
    <rule>Misspell</rule>
    <source-text><hlword>yn</hlword>dex</source-text>
    <text-to-show>yandex</text-to-show>
    <text>yandex</text>
  </reask>
  <results>
    <grouping attr="d" mode="deep" groups-on-page="10" docs-in-group="3" curcateg="-1">
      <found priority="phrase">45094</found>
      <found priority="strict">45094</found>
      <found priority="all">45094</found>
      <found-docs priority="phrase">192685602</found-docs>
      <found-docs priority="strict">192685602</found-docs>
      <found-docs priority="all">192685602</found-docs>
      <found-docs-human>193 million pages found</found-docs-human>
      <page first="1" last="10">0</page>
      <group>
        <categ attr="d" name="UngroupVital223.ru" />
        <doccount>34</doccount>
        <relevance priority="all" />
        <doc id="ZD831E1113BCFDD95">
          <relevance priority="phrase" />
          <url>https://www.yandex.ru/</url>
          <domain>www.yandex.ru</domain>
          <title>&quot;<hlword>Yandex</hlword>&quot; search engine and internet portal</title>
          <headline>Search the entire internet based on the user's region.</headline>
          <modtime>20060814T040000</modtime>
          <size>26938</size>
          <charset>utf-8</charset>
          <passages>
            <passage><hlword>Yandex</hlword> —a search engine...</passage>
          </passages>
          <properties>
            <_PassagesType>0</_PassagesType>
            <lang>ru</lang>
          </properties>
          <mime-type>text/html</mime-type>
          <saved-copy-url>https://hghltd.yandex.net/yandbtm?text=yandex&amp;url=https%3A%2F%2Fwww.yandex.ru%2F&amp;fmode=inject&amp;mime=html&amp;l10n=ru&amp;sign=e3737561fc3d1105967d1ce619dbd3c7&amp;keyno=0</saved-copy-url>
        </doc>
      </group>
    </grouping>
  </results>
</response>
</yandexsearch>
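A response with this structure can be read using Python's standard xml.etree.ElementTree module. The sketch below is illustrative: the function name and the returned dictionary layout are hypothetical, the embedded sample is a trimmed version of the structure above, and returning early when an error tag is present is a simplification (the guide says the error tag is mutually exclusive with other tags only in certain cases).

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<yandexsearch version="1.0">
<response date="20120928T103130">
  <reqid>1348828873568466-1289158387737177180255457-3-011-XML</reqid>
  <results>
    <grouping attr="d" mode="deep" groups-on-page="10" docs-in-group="3" curcateg="-1">
      <group>
        <doccount>34</doccount>
        <doc id="ZD831E1113BCFDD95">
          <url>https://www.yandex.ru/</url>
          <domain>www.yandex.ru</domain>
          <title>&quot;<hlword>Yandex</hlword>&quot; search engine and internet portal</title>
        </doc>
      </group>
    </grouping>
  </results>
</response>
</yandexsearch>"""


def parse_response(xml_bytes: bytes) -> dict:
    """Extract the error (if any) and the found documents from a response."""
    response = ET.fromstring(xml_bytes).find("response")
    error = response.find("error")
    if error is not None:
        # Simplification: treat any error as a response without documents.
        return {"error_code": error.get("code"), "error_text": error.text, "docs": []}
    docs = []
    for doc in response.iterfind("results/grouping/group/doc"):
        title = doc.find("title")
        docs.append({
            "url": doc.findtext("url"),
            "domain": doc.findtext("domain"),
            # itertext() flattens the hlword highlighting tags into plain text.
            "title": "".join(title.itertext()) if title is not None else None,
        })
    return {"error_code": None, "error_text": None, "docs": docs}


parsed = parse_response(SAMPLE.encode("utf-8"))
print(parsed["docs"])
```

Error code “15” (no results) can then be handled as an empty result list rather than as a failure, as the note in Getting started suggests.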

request

Generalized information about request parameters. May be omitted if the response contains errors.

The request tags are described in the table below.

Tags in the request group

query
Description: Text of the search query that was passed.
Attributes: None.

page
Description: Number of the search results page returned. Numbering starts from zero (the first page corresponds to the value “0”).
Attributes: None.

sortby
Description: Parameters for sorting results. Possible values:
• “rlv”: by relevancy.
• “tm”: by time when the document was changed.
Attributes:
• order: Sorting order. The “descending” value (forward) is used by default. When sorting by change time, it can take the value “ascending” (reverse).
• priority: For service use. Takes the value "no".

maxpassages
Description: Maximum number of passages that can be passed in a single search result.
Attributes: None.

groupings
Description: Grouping tag. Contains grouping parameters in the groupby tag.
Attributes: None.

groupby
Description: Grouping parameters for found search results.
Attributes:
• mode: Grouping method.
• attr: For service use.
• groups-on-page: Maximum number of groups that can be returned per page of search results.
• docs-in-group: Maximum number of documents that can be returned per group. Any group may contain fewer documents than the value set in this parameter.
• curcateg: For service use. Takes the value “-1”.

The following example shows the content of the request grouping tag returned for the query:

https://yandex.com.tr/search/xml?lr=983&l10n=en&user=xml-search-user&key=03.79031114:b631r9j587dkl4jko987hgg7bn2kl8a2&query=%22%22has%20sample%20applications%20for%20the%20most%20popular%20programming%22&sortby=tm&maxpassages=2&groupby=attr%3Dd.mode%3Ddeep.groups-on-page%3D5.docs-in-group%3D3&maxpassages=3&page=1

<request>
  <query>"has sample applications for the most popular programming"</query>
  <page>1</page>
  <sortby order="descending" priority="no">tm</sortby>
  <maxpassages>2</maxpassages>
  <groupings>
    <groupby attr="d" mode="deep" groups-on-page="5" docs-in-group="3" curcateg="-1"/>
  </groupings>
</request>

response

Results of processing the search query for which information is provided in the request child tags.

Contains the date attribute: the date and time of the request (UTC) in the format <year><month><day>T<hour><minute><second>.

Consists of the following sections:

• General information about search results.

• The misspell / reask block.

• The results block.
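The date attribute described above can be converted to a timezone-aware value with Python's standard datetime module. The function name is hypothetical; the format string corresponds to <year><month><day>T<hour><minute><second>, and the UTC timezone is attached because the guide states that the value is given for UTC.

```python
from datetime import datetime, timezone


def parse_response_date(value: str) -> datetime:
    """Parse a response date such as 20120928T103130 as a UTC datetime."""
    return datetime.strptime(value, "%Y%m%dT%H%M%S").replace(tzinfo=timezone.utc)


print(parse_response_date("20120928T103130").isoformat())
# prints 2012-09-28T10:31:30+00:00
```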

General information about search results

The tags for the block with general information about search results are shown in the table below.

Tags with general information about search results

Description Attributes

error Error description. Present only when the search request is processed incorrectly (for example, for an empty request or incorrect parameters). In certain cases, it is mutually exclusive of other tags in the response grouping tag.

code — Error code.

reqid Unique request ID. None.

found Approximation of the number of documents found for the query.

priority — For service use. Possible values:

• “phrase”

• “strict”

• “all”

found-human A string in the language corresponding to the search type selected. Contains information about the number of documents found and accompanying information.

None.

The misspell / reask block

Optional. Present if a typo was found (misspell) or corrected (reask) in the query.

The block tags are presented in the table below.

Tags for the misspell / reask blocks

Description Attributes

misspell Grouping.

Contains information about a possible typo in the search query.

None.

reask Grouping.

Contains information about corrections made to the source query before searching for documents.

None.

rule The type of error found in the query.

Possible values:

• “Misspell” — Typo.

• “KeyboardLayout” — Wrong keyboard layout.

• “Volapyuk” — Query made in Russian using English transliteration. Used if the search type is set to “Russian (yandex.ru)”.

None.

source-text Source text of the query.

The fragment of the search query that presumably contains an error is highlighted by the hlword tag.

None.

text-to-show Optional (only for the reask grouping tag).

Contains the corrected text of the search query. In most cases it matches the value passed in the text tag.

None.

text Corrected text of the search query. None.
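The tags in this table can be read with a standard XML parser. The Python sketch below assumes that the misspell or reask tag is a direct child of response; the sample fragment is made up for illustration and is not taken from a real response.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment built from the tag descriptions in the table.
SAMPLE = """<response date="20180522T101530">
  <reask>
    <rule>Misspell</rule>
    <source-text>yandex <hlword>serch</hlword></source-text>
    <text-to-show>yandex search</text-to-show>
    <text>yandex search</text>
  </reask>
</response>"""


def read_correction(response):
    """Return a dict describing the misspell or reask block, or None."""
    for kind in ("misspell", "reask"):
        block = response.find(kind)
        if block is None:
            continue
        source = block.find("source-text")
        return {
            "kind": kind,
            "rule": block.findtext("rule"),
            # itertext() flattens the nested hlword highlighting.
            "source": "".join(source.itertext()) if source is not None else None,
            # Fragments that presumably contain the error.
            "suspect": [h.text for h in source.iter("hlword")] if source is not None else [],
            "text": block.findtext("text"),
        }
    return None


info = read_correction(ET.fromstring(SAMPLE))
print(info)
```

A partner page could use the "kind" field to choose between a "did you mean" hint (misspell) and a notice that the query was already corrected (reask).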

The results block

Optional. Present if results were found for the query.

The block tags are presented in the table below.

Tags for the results block

Description Attributes

results Grouping. Child tags contain information about search parameters and found documents.

None.

grouping Grouping. Child tags contain information about search parameters and found documents.

Attributes reflect the grouping rules for found documents.

• mode — Grouping method.

• attr — For service use. Depends on the value of the mode attribute.

• groups-on-page — Number of groups that can be returned per page of search results.

• docs-in-group — Number of documents that can be returned per group.

• curcateg — For service use. Takes the value “-1”.

found Estimated number of groups formed.

priority — For service use. Possible values:

• “phrase”

• “strict”

• “all”

found-docs Approximation of the number of documents found for the query.

A more precise estimate compared to the value passed in the found tag for the block with general information about search results.

priority — For service use. Possible values:

• “phrase”

• “strict”

• “all”

found-docs-human A string in the language corresponding to the search type selected. Contains information about the number of documents found and accompanying information.

The value that is passed should be used when formatting search results.

None.

page Number of the search results page returned. Numbering starts from zero (the first page corresponds to the value “0”).

• first — Ordinal number of the first group with search results that is displayed on the page.

• last — Ordinal number of the last group with search results that is displayed on the page.

group Grouping.

Each group tag contains information about a found group of documents.

None.

categ Identifying data about a group of found documents.

• attr — For service use. Must match the value passed in the request.

• name — Unique group ID.

doccount Approximation of the number of documents that are used for forming the group.

Documents that potentially may be included in the group are ranked according to the request conditions (the sortby parameter). Depending on the value of the docs-in-group parameter, from one to three of the first documents are included in the group.

None.

relevance For service use. priority — For service use.

doc Grouping.

Each doc tag contains information about a found document.

Depending on the value of the docs-in-group parameter, each group can contain from one to three of the doc grouping tags.

name — Unique ID of a found document.

url Address of a found document. None.

domain The domain that the found document is in. None.

title Title of the found document.

Words that are in the search query are highlighted with the hlword tag.

None.

headline Optional. Document summary.

It is created using the HTML meta tag containing the name attribute with the “description” value.

None.

modtime Date and time the document was changed, in the format:

<year><month><day>T<hour><minute><second>

Attention! This tag is optional and may be omitted.

None.

size Size of the found document, in bytes. None.

charset Encoding of the found document. None.

passages Grouping tag that contains a list of document passages. None.

passage Passage with the document summary.

Words that are in the search query are highlighted with the hlword tag.

The maximum number of passages to be passed in a single passages tag is defined by the value of the maxpassages parameter for the search request.

None.

mime-type The document type in accordance with RFC 2046. None.

properties Grouping tag that contains document properties. None.

_PassagesType Passage type. Possible values:

• “0” — Standard passage (created from the document text).

• “1” — Passage based on the link text. It is used if the document was found via a link.

None.


lang Optional.

Document language.

None.

saved-copy-url Address of a saved copy of the document. None.
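To show how the tags of the results block fit together, here is a Python sketch that walks the groups and documents of a response. The nesting path response/results/grouping/group/doc is an assumption based on the tag descriptions above, and the sample document and its values are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical response assembled from the tags described in the table.
SAMPLE = """<yandexsearch version="1.0">
  <response date="20180522T101530">
    <results>
      <grouping attr="d" mode="deep" groups-on-page="5" docs-in-group="3" curcateg="-1">
        <page first="1" last="1">0</page>
        <group>
          <categ attr="d" name="example.com"/>
          <doccount>12</doccount>
          <doc>
            <url>https://example.com/a</url>
            <domain>example.com</domain>
            <title>Sample <hlword>applications</hlword></title>
            <passages>
              <passage>Has sample <hlword>applications</hlword> for Python.</passage>
            </passages>
          </doc>
        </group>
      </grouping>
    </results>
  </response>
</yandexsearch>"""


def flat_text(element):
    """Join an element's text, dropping hlword markup but keeping its words."""
    return "".join(element.itertext()).strip() if element is not None else ""


def iter_documents(root):
    """Yield (group name, url, title, passages) for every doc in the results block."""
    for group in root.iterfind("./response/results/grouping/group"):
        categ = group.find("categ")
        name = categ.get("name") if categ is not None else None
        for doc in group.iterfind("doc"):
            passages = [flat_text(p) for p in doc.iterfind("passages/passage")]
            yield name, doc.findtext("url"), flat_text(doc.find("title")), passages


docs = list(iter_documents(ET.fromstring(SAMPLE)))
for group_name, url, title, passages in docs:
    print(group_name, url, title, passages)
```

Optional tags such as headline, modtime, and lang may be absent, so code that reads them should check for None in the same way as flat_text does.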


Request for limits for the next day

Returns information about restrictions on the number of queries that can be sent each hour.

The response contains information for each hour in the next 24 hours.

Note: Hourly limits are only applied to the “russian” type of search.

Request format

https://yandex.<domain>/search/xml ? action=limits-info & user=<user name> & key=<key>

user Username. Must match the Yandex.Passport username that was set during registration.

key Value of the API key that was issued during registration.

Sample request

This request returns information about hourly limits restricting the number of search queries that can be sent by the “xml-search-user” user during the next 24 hours:

https://yandex.ru/search/xml?action=limits-info&user=xml-search-user&key=03.44583456:c876e1b098gh65khg834ggg1jk4ll9j8

Response format

In response to a request for hourly limits, Yandex.XML returns an XML file in UTF-8 encoding.

Note:

• If the allowed number of queries in one of the hours is exceeded, the excess queries are subtracted from the same hour the next day. These excesses are calculated when generating the response.

• Hourly limits are only applied to the “russian” type of search. For the other types of search, the service returns information for each hour about the daily limits on the allowed number of queries.

Below you will find the general structure of a resulting XML document with sample values.

<yandexsearch version="1.0">
<response>
<limits>
<time-interval from="2014-07-22 20:00:00 +0000" to="2014-07-22 21:00:00 +0000">500</time-interval>
<time-interval from="2014-07-22 21:00:00 +0000" to="2014-07-22 22:00:00 +0000">450</time-interval>
<time-interval from="2014-07-22 22:00:00 +0000" to="2014-07-22 23:00:00 +0000">590</time-interval>
<time-interval from="2014-07-22 23:00:00 +0000" to="2014-07-23 00:00:00 +0000">600</time-interval>
<time-interval from="2014-07-23 00:00:00 +0000" to="2014-07-23 01:00:00 +0000">300</time-interval>
<time-interval from="2014-07-23 01:00:00 +0000" to="2014-07-23 02:00:00 +0000">200</time-interval>
<time-interval from="2014-07-23 02:00:00 +0000" to="2014-07-23 03:00:00 +0000">500</time-interval>
<time-interval from="2014-07-23 03:00:00 +0000" to="2014-07-23 04:00:00 +0000">500</time-interval>
<time-interval from="2014-07-23 04:00:00 +0000" to="2014-07-23 05:00:00 +0000">500</time-interval>
<time-interval from="2014-07-23 05:00:00 +0000" to="2014-07-23 06:00:00 +0000">100</time-interval>
<time-interval from="2014-07-23 06:00:00 +0000" to="2014-07-23 07:00:00 +0000">100</time-interval>
<time-interval from="2014-07-23 07:00:00 +0000" to="2014-07-23 08:00:00 +0000">100</time-interval>
<time-interval from="2014-07-23 08:00:00 +0000" to="2014-07-23 09:00:00 +0000">100</time-interval>
<time-interval from="2014-07-23 09:00:00 +0000" to="2014-07-23 10:00:00 +0000">200</time-interval>
<time-interval from="2014-07-23 10:00:00 +0000" to="2014-07-23 11:00:00 +0000">300</time-interval>
<time-interval from="2014-07-23 11:00:00 +0000" to="2014-07-23 12:00:00 +0000">300</time-interval>
<time-interval from="2014-07-23 12:00:00 +0000" to="2014-07-23 13:00:00 +0000">300</time-interval>
<time-interval from="2014-07-23 13:00:00 +0000" to="2014-07-23 14:00:00 +0000">300</time-interval>
<time-interval from="2014-07-23 14:00:00 +0000" to="2014-07-23 15:00:00 +0000">300</time-interval>
<time-interval from="2014-07-23 15:00:00 +0000" to="2014-07-23 16:00:00 +0000">300</time-interval>
<time-interval from="2014-07-23 16:00:00 +0000" to="2014-07-23 17:00:00 +0000">400</time-interval>
<time-interval from="2014-07-23 17:00:00 +0000" to="2014-07-23 18:00:00 +0000">500</time-interval>
<time-interval from="2014-07-23 18:00:00 +0000" to="2014-07-23 19:00:00 +0000">500</time-interval>
<time-interval from="2014-07-23 19:00:00 +0000" to="2014-07-23 20:00:00 +0000">600</time-interval>
</limits>
</response>
</yandexsearch>

Tag Description Attributes

response Grouping. None.

limits Grouping.

Contains entries about hourly limits on the allowed number of search queries.

None.

time-interval The number of search queries that can be sent during the specified time interval.

The boundaries of the time interval are defined by attributes.

• from — Date and time (inclusive) of the start of the time interval the limit applies to.

• to — Date and time (exclusive) of the end of the time interval the limit applies to.

Data format in attributes:

YYYY-MM-DD HH:MM:SS +HHMM

“HHMM” specifies the offset relative to UTC0.

Attention! At this time, information about hourly limits is output for UTC0.
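The limits response can be turned into a lookup table with a few lines of Python. This sketch assumes the structure shown in the sample response above; parse_limits and limit_at are hypothetical helper names. The from boundary is treated as inclusive and the to boundary as exclusive, as described in the table.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Shortened copy of the sample response (two intervals instead of 24).
SAMPLE = """<yandexsearch version="1.0"><response><limits>
<time-interval from="2014-07-22 20:00:00 +0000" to="2014-07-22 21:00:00 +0000">500</time-interval>
<time-interval from="2014-07-22 21:00:00 +0000" to="2014-07-22 22:00:00 +0000">450</time-interval>
</limits></response></yandexsearch>"""

# YYYY-MM-DD HH:MM:SS +HHMM
STAMP = "%Y-%m-%d %H:%M:%S %z"


def parse_limits(xml_text):
    """Return a list of (start, end, allowed_queries) tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iterfind("./response/limits/time-interval"):
        start = datetime.strptime(item.get("from"), STAMP)
        end = datetime.strptime(item.get("to"), STAMP)
        rows.append((start, end, int(item.text)))
    return rows


def limit_at(rows, moment):
    """Return the limit for the interval containing moment, or None."""
    for start, end, limit in rows:
        if start <= moment < end:  # from is inclusive, to is exclusive
            return limit
    return None


rows = parse_limits(SAMPLE)
print(limit_at(rows, datetime(2014, 7, 22, 21, 30, tzinfo=timezone.utc)))  # 450
```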


Formatting results

When formatting search results, you must adhere to the rules described in the License for use of the Yandex.XML service. The license differs for Russian, Turkish, and Worldwide search types.

Every page generated using Yandex.XML must contain:

• A link to the Yandex home page, formatted as a logo.

• Text with information about the number of documents found (“NNN pages found”). The information about the number of documents found is passed in the found-docs-human tag in the XML file with search results.

The links to logos that must be used depending on the background color, along with formatting examples, are provided in the table below.

Background color Logo Formatting example

Black/dark Download.

White font and a red letter “Y”. Transparent background.

Red Download.

White font. Transparent background.

White/light Download.

Black font and a red letter “Y”. Transparent background.

Protection from robots

Search queries can be submitted not only by users, but by robots as well. A flood of queries from robots may cause you to exceed the limits applied to use of Yandex.XML.

To prevent unauthorized access to the search by robots, a security algorithm is used. If it is suspected that a query was submitted by a robot, a CAPTCHA is returned instead of search results (see this Wikipedia article about CAPTCHA).

To use the algorithm for protection from robots, the partner must pass information about the IP address and the "spravka" cookie for the request's author. The "spravka" cookie is generated on the Yandex.XML side and is returned the first time the user accesses search results. In the value that is received, the partner must replace the domain with their own, and then add the following string to the search response:

Set-Cookie: spravka=...

Information about the IP address and the "spravka" cookie is passed in the request header in the format:

X-Real-Ip: 99.999.999.99
Cookie: spravka=<value passed from Yandex>
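A request carrying these two headers might be prepared as follows. This is a sketch using the Python standard library; make_search_request is a hypothetical helper, the IP address is a documentation-range placeholder, and the key and cookie values are placeholders rather than real credentials. The request object is only built here, not sent.

```python
import urllib.request


def make_search_request(url, user_ip, spravka=None):
    """Attach the end user's IP address and, if known, the spravka cookie."""
    headers = {"X-Real-Ip": user_ip}
    if spravka:
        # Forward the cookie value previously issued by Yandex.XML.
        headers["Cookie"] = f"spravka={spravka}"
    return urllib.request.Request(url, headers=headers)


req = make_search_request(
    "https://yandex.ru/search/xml?user=xml-search-user&key=PLACEHOLDER&query=test",
    user_ip="203.0.113.7",
    spravka="PLACEHOLDER-COOKIE-VALUE",
)
print(req.header_items())
```

Note that urllib normalizes the capitalization of header names when storing them; HTTP header names are case-insensitive, so this does not change the meaning of the request.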

The diagram below illustrates the steps performed for protection from robots.


1. The user sends a query to the Yandex.XML partner.

2. The search query is sent to the Yandex.XML service. The request must match the specified format.

3. Yandex.XML initiates the algorithm for protection from robots. The values of the IP address and "spravka" cookie (if previously issued) are used for verification.

Possible results of verification:

• The request was probably not sent by a robot. The process continues to step 13.

• The request was probably sent by a robot. The decision is made to display a CAPTCHA.

4. Yandex.XML returns the partner an XML file in the following format:

<?xml version="1.0" encoding="utf-8"?>
<yandexsearch version="1.0">
<response>
  <error code="100">Robot request</error>
</response>
<captcha-img-url>http://captcha.image.gif</captcha-img-url>
<captcha-key>CAPTCHA ID number</captcha-key>
<captcha-status>Status</captcha-status>
</yandexsearch>

5. The user is returned a page containing a CAPTCHA.

6. The user sends the CAPTCHA value to the partner.

7. The partner sends the CAPTCHA value obtained from the user via a GET request in the following format:

https://yandex.ru/xcheckcaptcha?key=<CAPTCHA ID number>&rep=<CAPTCHA value entered by user>

8. The value received is checked by the Yandex.XML service. If the CAPTCHA value was entered incorrectly, the process returns to step 4. In addition, the captcha-status parameter is passed with the value “failed”.

9. If the CAPTCHA value was entered correctly, Yandex.XML issues the user a "spravka" cookie and passes it to the partner in the header with the following format:

HTTP/1.1 200 OK
Set-Cookie: spravka=<cookie value>

If the request passed to Yandex.XML in step 1 was saved successfully, the process continues to step 12.

10. The partner lets the user enter a query.

11. The user sends a query to the Yandex.XML partner.

12. The search query is sent to the Yandex.XML service. Along with the request, the user's IP address and "spravka" cookie are passed.

13. Yandex.XML processes the search query and generates results.

14. An XML file with search results is returned to the partner.

15. The partner returns the processed response to the user. If Yandex.XML issued a "spravka" cookie in step 9, it is saved on the user's computer.

Tip: To try out how this flow works, use this script.
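On the partner side, steps 4 and 7 of the flow above can be sketched in Python as follows. The function names are hypothetical; the XML layout and the xcheckcaptcha URL are taken from the formats shown in those steps, and nothing is sent over the network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# The response format from step 4, with its placeholder values.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<yandexsearch version="1.0">
<response><error code="100">Robot request</error></response>
<captcha-img-url>http://captcha.image.gif</captcha-img-url>
<captcha-key>CAPTCHA ID number</captcha-key>
<captcha-status>Status</captcha-status>
</yandexsearch>"""


def read_captcha(xml_bytes):
    """Return CAPTCHA details if the response carries error code 100, else None."""
    root = ET.fromstring(xml_bytes)
    error = root.find("./response/error")
    if error is None or error.get("code") != "100":
        return None
    return {
        "image": root.findtext("captcha-img-url"),
        "key": root.findtext("captcha-key"),
        "status": root.findtext("captcha-status"),
    }


def check_url(captcha_key, user_answer):
    """Build the step 7 GET URL that submits the user's answer."""
    query = urlencode({"key": captcha_key, "rep": user_answer})
    return "https://yandex.ru/xcheckcaptcha?" + query


captcha = read_captcha(SAMPLE.encode("utf-8"))
print(captcha["image"], check_url(captcha["key"], "12345"))
```

When read_captcha returns a value, the partner shows the image to the user (step 5); a status of “failed” indicates that the previous answer was rejected (step 8).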

Verifying correct CAPTCHA display

To get familiar with the response format returned by Yandex.XML when a CAPTCHA is displayed, send a request (the value of the query parameter of the search request) with the following string: “e48a2b93de1740f48f6de0d45dc4192a”.

The following GET request can be used by the user “xml-search-user” for reviewing the response format returned when a CAPTCHA is displayed:

wget -q --header="X-Real-Ip: 127.0.0.1" -SO- 'https://yandex.ru/search/xml?user=xml-search-user&key=03.44583456:c876e1b098gh65khg834ggg1jk4ll9j8&query=e48a2b93de1740f48f6de0d45dc4192a&showmecaptcha=yes'


Questions and answers

What is XSLT?

XSLT is a language for converting and rendering XML documents and is a part of the set of XSL recommendations.

Detailed information about the XSLT language is provided in the following documents:

• Extensible Stylesheet Language (XSL).

• XSL Transformations (XSLT).

Notifications

What are notifications?

Notifications are a service for automatically sending email when problems arise during use of Yandex.XML. The email address, settings for sending notifications, and thresholds are all set during registration.

What should I do if I get a notification that the number of requests has sharply decreased?

The table below shows possible reasons for a decreased number of requests, how to diagnose them, and recommended solutions.

Reason Diagnostic methods Recommended solutions

Decreased number of searches performed. For example, this may occur due to natural variation in the number of visitors depending on the day of the week or the time of day.

Review the site usage statistics for days of the week and time of day.

Increase the notification threshold on the Settings page.

Unavailability or partial availability of Yandex.XML on the website.

Try submitting several search queries yourself. Check the accuracy of the results that are returned.

Check whether the request format is correct.

What should I do if I get a notification that the number of requests has sharply increased?

The table below shows possible reasons for an increased number of queries, how to diagnose them, and recommended solutions.

Reason Diagnostic methods Recommended solutions

Increased number of searches performed. For example, this may occur due to natural variation in the number of visitors depending on the day of the week or the time of day.

Review the site usage statistics for days of the week and time of day.

Increase the notification threshold on the Settings page.

DoS attack. Check the server log files for data suggesting a DoS attack.

What should I do if I get a notification that there were no requests for a 24-hour period?

Check how the search is working on the site.

If statistics show a sharp decrease in the number of queries made from the site, it is possible that this is due to the search not working correctly.

What should I do if I get a notification that the number of requests is approaching the daily limit?

Review the restrictions applied to the service and the ways to increase them. Contact a Yandex representative to discuss details of extending search features.

IP address

Why is an IP address required for registration?

The IP address, in combination with a Yandex.Passport account, is used for identifying a Yandex.XML user. The results of user identification determine the restrictions applied to service usage.

How do I find out what my IP address is?

The way to determine the IP address depends on the type of computer being used to access the Yandex.XML service.

Device type Possible methods for detecting the IP address

Server

• Ask your provider for the IP address.

• Set up a remote connection to the server and run the ipconfig command (Windows) or ifconfig (Unix).

• Run the ping <server name> command from the command line of a personal computer.

Personal computer

• Use the Yandex.Internetometer.

• If static addresses are used, ask your service provider.

Note: If a modem is used, the IP address can change each time a connection is made.

The IP address being registered is in use

The table below shows possible reasons and solutions.

Reason Possible solution

An open proxy server is being used to access Yandex.XML.

Use your Internet provider's proxy server.

A modem is being used to access the Internet.

Your provider assigns a dynamic IP address, which can change each time you connect. Try disconnecting and reconnecting to the Internet.

The service is being accessed from a server.

Obtain a dedicated IP address.

Additional search features

Setting up site search

To restrict the search to the website only, use the host operator.

Syntax:

<query text> host:<URL of the site to search on>


The following request is used for searching for the phrase “search settings” on the website https://help.yandex.ru/:

search settings host:help.yandex.ru

Restricting the search to a region or category

To restrict the search to documents that are relevant to a particular region or category, use the cat operator.

Syntax:

<query text> cat:<adjusted ID of the region or category>

For the value of the cat operator, pass the adjusted value of the region ID (added to “11000000”) or category ID (added to “9000000”).

The request may specify multiple regions and categories. To do this, use the logical operators “AND” (“&&”) and “OR” (“|”).

The following request is used for searching for the word “meat” in documents that are relevant to the category “bodybuilding nutrition” (ID “3783”) in the city of “Samara” (ID “51”):

meat cat:11000051 && cat:9003783
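The adjusted values are plain sums: 51 + 11000000 = 11000051 for the region and 3783 + 9000000 = 9003783 for the category. A small Python sketch of the same arithmetic follows; the helper names are hypothetical, and the operator is written in its unescaped form (two ampersands).

```python
REGION_OFFSET = 11000000    # added to a region ID
CATEGORY_OFFSET = 9000000   # added to a category ID


def cat_region(region_id):
    """Return the cat operator for a region ID."""
    return f"cat:{REGION_OFFSET + region_id}"


def cat_category(category_id):
    """Return the cat operator for a category ID."""
    return f"cat:{CATEGORY_OFFSET + category_id}"


# Samara has ID 51; "bodybuilding nutrition" has ID 3783.
query = f"meat {cat_region(51)} && {cat_category(3783)}"
print(query)  # meat cat:11000051 && cat:9003783
```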

Search in results

To set up a search in results, use the && operator.

Syntax:

(<original query text>) && (<query text to search for in results>)

The following request is used to search for documents with the phrase “manual transmission” in results for the query “autos”:

(autos) && (manual transmission)

Encoding

How to correctly set encoding for a request being sent?

The request encoding is set in the header of the XML file:

<?xml version="1.0" encoding="<encoding>"?>

Which encoding is used for sending the search response?

The XML file with search results is sent in UTF-8 encoding. To convert it to a different encoding, you can use a library such as libiconv or the Convert::Cyrillic module.

Incorrect characters in the response

In most cases, incorrect characters in the response are the result of sending the request at the socket level.

Possible solutions:

• Use HTTP version 1.0 instead of 1.1.

• Use a higher-level type of interface.

• Configure handling for chunked responses.
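For reference, HTTP/1.1 chunked transfer encoding splits the body into pieces, each preceded by its size in hexadecimal; code that reads the socket directly sees those size lines as stray characters. The guide does not show how to handle this, so the following Python function is only a minimal sketch of a decoder. It ignores trailers and assumes the whole body is already in memory.

```python
def decode_chunked(raw):
    """Decode an HTTP/1.1 chunked body (trailers are ignored)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        # The size line is hexadecimal and may carry ";extension" parameters.
        size = int(raw[pos:line_end].split(b";", 1)[0], 16)
        if size == 0:
            break  # a zero-size chunk marks the end of the body
        start = line_end + 2
        body += raw[start:start + size]
        pos = start + size + 2  # skip the CRLF that closes the chunk
    return bytes(body)


print(decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"))  # b'Wikipedia'
```

Higher-level HTTP clients perform this decoding automatically, which is why switching to one is listed among the solutions.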


Appendices

Validating XML files

To prevent incorrect processing of search queries, at the testing stage we strongly recommend validating XML files that are generated for requests using the POST method.

You can validate XML files in the Yandex.Webmaster service. For validation, the XML request schema is used.

To validate a file, follow these steps:

1. Open the XML feed validator page.

2. Select others schemas → link in the Standard validation schema group.

3. Set the value “https://download.cdn.yandex.net/tech/ru/xml/doc/dg/files/request.xsd” in Specify the link to your XSD schema.

4. Set one of the ways to pass the contents of the XML document in the Feed for validation field.

5. Press the Check button.

If the XML file complies with the schema, the message “XML complies with the XSD schema” is returned. If inconsistencies are discovered, the validator returns information about the line where you should look for an error.

Error codes

When search requests are processed incorrectly, the server response contains the error tag.

Format:

<error code="error code">Error description text</error>

The table below lists codes and descriptions for common errors that occur when processing search requests.

Error code Description

1 The query text (the value passed in the query element) contains a syntactical error.

For example, a query was sent that contained only two slash symbols in a row (“//”).

2 An empty search query was defined (an empty value was passed in the query element).

15 There are no search results for the specified search query.

18 The XML file cannot be validated, or invalid request parameters are set. Possible reasons:

• Incorrect tags or tag values were passed.

• The request body contains non-escaped special characters. For example, the ampersand symbol (“&”), and so on.

• The requested page lies beyond the first 1000 search results. For example, if each page contains 10 results, this error will be returned when attempting to request page 101 and further in results.

19 The search query contains incompatible parameters (for example, incompatible values for the groupings element).

20 The reason for the error is unknown. If the error persists, contact the support service.

31 The user is not registered on the service.

32 Limit exceeded for the number of queries allowed per day. Review the information about restrictions and choose a suitable method for increasing your daily quota.


33 The IP address that the search request was sent from does not match the one(s) set during registration.

34 The user is not registered on the Yandex.Passport service.

37 Error in request parameters. Mandatory parameters may have been omitted, or mutually exclusive parameters may have been defined.

42 The key that was issued during registration contains an error. Check whether the correct address is used for sending requests.

43 The version of the key that was issued during registration contains an error. Check whether the correct address is used for sending requests.

44 The address that requests are sent to is no longer supported. Correct the value to match the address that was given during registration.

48 The search type that was specified during registration does not match the search type that is being used for requesting data. Reset the domain that is being used to the correct domain. For corrections, use the URL for sending requests.

55 The number of requests sent during a second (RPS) exceeded the allowed value.

100 The request was most likely sent by a robot. When this error appears, a CAPTCHA must be returned to the user.
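One possible way to act on these codes in client code is sketched below in Python. The split into "ok", "empty", "captcha", and raised exceptions is a design assumption made for this example, not something the service prescribes; YandexXmlError and check_response are hypothetical names. The error tag is looked up inside response, as in the CAPTCHA response format.

```python
import xml.etree.ElementTree as ET

NO_RESULTS = 15         # no search results for the query
CAPTCHA_REQUIRED = 100  # request was most likely sent by a robot


class YandexXmlError(Exception):
    """Raised for error codes that the caller cannot handle as a normal outcome."""

    def __init__(self, code, message):
        super().__init__(f"error {code}: {message}")
        self.code = code
        self.message = message


def check_response(xml_text):
    """Classify a response as 'ok', 'empty', or 'captcha', or raise YandexXmlError."""
    root = ET.fromstring(xml_text)
    error = root.find("./response/error")
    if error is None:
        return "ok"
    code = int(error.get("code"))
    if code == NO_RESULTS:
        return "empty"
    if code == CAPTCHA_REQUIRED:
        return "captcha"
    raise YandexXmlError(code, (error.text or "").strip())


sample = '<yandexsearch version="1.0"><response><error code="15">Not found</error></response></yandexsearch>'
print(check_response(sample))  # empty
```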

Search regions

The region to give preference to when generating search results is defined by the value of the lr parameter of the search query. Countries, federal subjects, and cities can be specified as the region.

A list of IDs for commonly used countries is provided in the table below.

ID Country

225 Russia

187 Ukraine

149 Belarus

159 Kazakhstan

See also

Other popular regions


Feedback
