
Internal binary protocol


The internal protocol is the one used by old-fashioned Sphinx clients; it is implemented by various language APIs such as php-api, python-api, c-api and so on.

From the beginning the internal protocol was never documented; instead, developers provided ready-to-use libraries which implemented the protocol and gave access to Manticore's functionality. That API reference, expressed mainly in PHP terms, is still available in the documentation. However, supporting the ready-made implementations was eventually deemed too burdensome, so implementation of new commands and flavours of the protocol was frozen, and customers were recommended to use the 'sphinxql' protocol instead, leaving the internal protocol solely for communication between 'master' and 'agent' hosts in distributed indexes. This is the exact text you may find in the header of any native API library provided by Manticore Software:

// WARNING!!!
//
// As of 2015, we strongly recommend to use either SphinxQL or REST APIs
// rather than the native SphinxAPI.
//
// While both the native SphinxAPI protocol and the existing APIs will
// continue to exist, and perhaps should not even break (too much), exposing
// all the new features via multiple different native API implementations
// is too much of a support complication for us.
//
// That said, you're welcome to overtake the maintenance of any given
// official API, and remove this warning ;)

The purpose of this document is close to that message: it describes the native API protocol in human language, in order to enable community members to write their own implementations.

Times change, and nowadays the native API implemented in PHP does not look as fast as, say, talking to the daemon with a mysql client or the PDO library over the sphinxql protocol. That is quite understandable, since working with the native API implies low-level work with raw bytes, chars, byte order and so on. That is really simple by nature, but quite complex for an interpreted language like PHP. The PDO client, in turn, performs all these low-level operations using internal routines written in compiled languages like C or C++, and is therefore quite fast. Even the fact that mysql is actually a textual protocol which NEEDS to be parsed back into native representation still does not make it slower than the native API, where each individual string or float is manually packed into a blob by interpreted PHP code.

However, a compiled implementation still shows a 20-30% gain of the native API over sphinxql. That may not seem so useful, since many developers now prefer light interpreted languages, but among them we now also have Go, which is easy yet compiled and can reap all the benefits of the native API. Some examples below will therefore be provided using its syntax.

Apart from the 'client' API, Manticore also uses a special extended dialect of the protocol which comes into play with distributed indexes; that is the 'agent' flavour, the protocol which a head daemon (one which has distributed indexes in its config) uses to speak with remote daemons (agents). It is relevant only to the 'search' command, and the extension provides some special fields and attributes which help the head daemon perform grouping and calculate aggregate values like 'avg'. Details are given in the documentation on the 'search' command below.

The internal protocol is described starting from the lowest primitives and moving on to more complex packets and entities.

Primitives

The protocol uses bytes (octets), which are organized into numbers and other entities. Below, the terms byte and octet are considered the same and mean just an 8-bit integer number.

Byte order

Unless explicitly noted, all numbers are sent in network byte order, i.e. big-endian. For example, the 32-bit DWORD 0xDEADBEEF will be sent as the sequence of octets:

0    1    2    3
0xDE 0xAD 0xBE 0xEF

That is done by well-known functions like htonl (in C/C++), or by the binary.BigEndian byte order of the encoding/binary package (Go).
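For illustration, a minimal Go sketch writing that DWORD in network byte order:

package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

func main() {
	var buf bytes.Buffer
	// htonl equivalent: most significant octet goes first
	binary.Write(&buf, binary.BigEndian, uint32(0xDEADBEEF))
	fmt.Printf("% X\n", buf.Bytes()) // prints: DE AD BE EF
}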

Integers

  • byte (BYTE) = 1 octet
  • short (WORD) = 2 octets
  • unsigned integer (DWORD) = 4 octets
  • signed integer (int) = 4 octets
  • unsigned long (uint64) = 8 octets
  • signed long (int64) = 8 octets

Floats

  • float = 4 octets

Internally floats are encoded according to the IEEE-754 standard. To send them over the network, floats are aliased to DWORDs and then sent as DWORDs. The cast may be done by a union like:

/// float vs dword conversion
inline DWORD sphF2DW ( float f )	{ union { float f; DWORD d; } u; u.f = f; return u.d; }

or, in Go lang:

d:=math.Float32bits(f)
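The receiving side does the symmetric cast; a sketch assuming raw holds the 4 octets of the DWORD read from the wire:

d := binary.BigEndian.Uint32(raw) // decode the big-endian DWORD
f := math.Float32frombits(d)      // reinterpret the bits as an IEEE-754 float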

Arrays

All blobs of unknown size are prefixed with a 32-bit signed integer containing the number of objects. For example, the array of 3 bytes {1, 2, 40} will be sent as a sequence of 7 octets:

0  1  2  3  4  5  6   
00 00 00 03 01 02 28

where the first 4 octets are occupied by the integer, in big-endian order, and then the objects (bytes) themselves follow.

In the following description terms like int array, uint64 array, byte array are used to define arrays of values of the mentioned type. So, when you see uint64 array, it means that the following octets first contain a 32-bit signed int with the quantity of uint64 numbers, followed by the octets representing the numbers themselves, as:

00 00 00 02
00 00 00 00 00 00 00 22
00 00 00 00 00 00 00 33

which represents uint64 ar[2] = {0x22ULL, 0x33ULL}

They may also be called array of ints, array of bytes, which is obviously the same.

You may also see a clause like array of objects, or objects array, where only the quantity of objects is known, but not their format. In such cases the format is described separately.
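To illustrate, a sketch of a Go helper (the name writeUint64Array is ours) serializing a uint64 array as described:

import (
	"encoding/binary"
	"io"
)

// writeUint64Array writes the 32-bit big-endian count, then the values,
// each as 8 big-endian octets.
func writeUint64Array(w io.Writer, vals []uint64) error {
	if err := binary.Write(w, binary.BigEndian, int32(len(vals))); err != nil {
		return err
	}
	return binary.Write(w, binary.BigEndian, vals)
}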

Strings

Strings are internally expressed as arrays of octets, and so they are sent the same way as any other array. By default all strings are represented in UTF-8 encoding. Note that the length of a string in codepoints (chars) is not necessarily the same as its length in octets (bytes). They match for ASCII characters, but not for national texts, like Russian, Chinese, etc. When sending over the network, the length of a string is measured exactly in octets, NOT in codepoints!

Say, the string 'hello' will be sent as 9 octets:

0  1  2  3  4  5  6  7  8
00 00 00 05 68 65 6C 6C 6F
.  .  .  .  h  e  l  l  o

The string 'привет' will be sent as 16 octets:

0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
00 00 00 0C D0 BF D1 80 D0 B8 D0 B2 D0 B5 D1 82
.  .  .  .  п     р     и     в     е     т
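A matching Go sketch for strings (the helper name writeString is ours); conveniently, Go's len() on a string already counts the octets of the UTF-8 encoding, not codepoints, so writeString(w, "привет") produces exactly the 16 octets above:

// writeString writes the octet count, then the raw UTF-8 bytes.
func writeString(w io.Writer, s string) error {
	if err := binary.Write(w, binary.BigEndian, int32(len(s))); err != nil {
		return err
	}
	_, err := io.WriteString(w, s)
	return err
}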

Communication

Clients talk to the Manticore daemon over a socket connection, which may be either a unix socket or an internet TCP socket. In the latter case the daemon internally works with TCPv4 addresses; however, if the system provides a mapping between ipv6 and ipv4 addresses, ipv6 will also be accepted.

The connection is session-based; within a session, communication is dialogue-based; i.e. the client sends a command, receives an answer, then maybe sends another, receives another answer, and so on.

During a session Manticore uses 2 different kinds of packages. They are

  1. Handshake
  2. Command

Handshake

The handshake is the initial message which has to be sent and responded to right after establishing the stream connection. Each side sends a DWORD with its version, and the opposite side receives and checks it. The handshake is the only exchange where values may be sent in any byte order, i.e. network or little-endian, and the opposite side must be aware of this and check both cases.

A couple of valid examples:

  1. Usual workflow

    server says> 00 00 00 01
    client says> 00 00 00 01
    server receives < 00 00 00 01
    client receives < 00 00 00 01	
    
  2. Interleaved workflow

    server says> 00 00 00 01
    client receives < 00 00 00 01
    client says> 00 00 00 01
    server receives < 00 00 00 01
    
  3. Heterogeneous arch workflow

    server says> 00 00 00 01
    client says> 00 00 00 01
    server receives < 00 00 00 01
    client receives < 01 00 00 00	
    

(in the last example the server sent 0x00000001, but the client received 0x01000000, and that is also valid).

The only valid version for both server and client in the handshake is currently 0x00000001.
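A minimal Go sketch of the client side of the handshake (address, port and error handling are illustrative):

conn, err := net.Dial("tcp", "127.0.0.1:9312")
if err != nil {
	log.Fatal(err)
}
// send our version
binary.Write(conn, binary.BigEndian, uint32(1))
// read the daemon's version, accepting either byte order
var his [4]byte
if _, err := io.ReadFull(conn, his[:]); err != nil {
	log.Fatal(err)
}
if binary.BigEndian.Uint32(his[:]) != 1 && binary.LittleEndian.Uint32(his[:]) != 1 {
	log.Fatal("unexpected handshake version")
}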

After the handshake the connection may be regarded as established. Note, however, that if the daemon can't accept the connection because of a maxed-out connection pool, it will send a retry error command right after the handshake and then drop the connection. So, after the handshake the client must expect more data from the daemon than just the 4 bytes of the handshake response, and react accordingly if it receives the retry command. That is critically important: after the daemon drops the connection because of the 'maxed out' state, it will not receive any more data, so if the client tries to send its command despite the possible error, it will have no luck and may even freeze for a while until the streaming protocol recognizes the dropped endpoint.

Retry response

This response is sent by the daemon right after the handshake in case it can't serve the connection right now. The error is not critical for the usual workflow; however, it is critical for the current connection.

The response takes about 50 octets and has the structure:

WORD 0x0002 // status code SEARCHD_RETRY
WORD 0x0000 // version of the status code (0)
DWORD nBytes // total length of following payload
string payload_error_message // the message with reason  

Typical error messages hardcoded in the daemon's sources are

  1. "maxed out, dismissing client"
  2. "server maxed out, retry in a second"
  3. "failed to create worker thread"

The first and second messages are the same, just coming from different workers (the first comes from the thread pool, the second from plain threads); they have the same meaning. The third one is somewhat worse, since it originates not from the 'max_children' config limitation, but from system thread creation, so its effect is unpredictable (a retry may or may not help).

For example, a full session with the maxed-out message may look like:

client > 00 00 00 01 // handshake
daemon > 00 00 00 01 // answer to handshake
daemon > 00 02 00 00 // 'retry' status code and version
daemon > 00 00 00 20 // length of payload (that is string length and string chars which follows)
daemon > 00 00 00 1C // string length (28 chars, big-endian)
daemon > "maxed out, dismissing client"
< daemon drops connection >

If the retry response has not arrived and the connection has not been dropped, the session is regarded as healthy, and from that moment the client and daemon may send messages to each other. Usually the client sends a message with the necessary command, and then the daemon answers with a response message (status code and body).

NOTE: the handshake answer is ALWAYS sent; the retry response MAY be sent.

Message

All messages going from client to daemon and from daemon to client are organized in a common structure with code, version and payload. That is:

 WORD code      // command (from client) or status (from daemon)
 WORD version   // version (0 for error and retry statuses)
 DWORD nBytes   // size of following payload (in octets)
 ... (blob of nBytes octets follow)

code is specific to the command or response. The client puts the desired command code there; the daemon, in turn, answers with a status code. version is critically important for commands which the client sends to the daemon. This WORD is treated as a fixed-point number, MAJOR.MINOR, where the major part is the high byte of the WORD and the minor is the low byte. I.e., command version 1.5 is encoded as WORD 0x0105. The daemon has an internal table of the commands it supports, and on receiving a command message it will always look into this table and check the version of the incoming command. The major version number MUST match between daemon and client. The whole version of the incoming command MUST be less than or equal to the one the daemon supports.

When answering back, the response also contains a version field. For error codes it is usually '0'. For real success codes it usually contains the version of the command from the daemon's internal support table. Usually this number does not need to be interpreted and may be safely ignored.
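A sketch of composing such a message in Go (helper name and signature are ours):

// writeMessage sends one protocol message: code, MAJOR.MINOR version,
// payload size, then the payload itself.
func writeMessage(w io.Writer, code uint16, major, minor byte, payload []byte) error {
	hdr := struct {
		Code    uint16
		Version uint16 // e.g. version 1.5 packs to 0x0105
		Length  uint32
	}{code, uint16(major)<<8 | uint16(minor), uint32(len(payload))}
	if err := binary.Write(w, binary.BigEndian, hdr); err != nil {
		return err
	}
	_, err := w.Write(payload)
	return err
}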

Optimization

The simplest session consists of the handshake exchange, then one message from the client followed by the answer message from the daemon, and then the connection is closed by both sides. So, the dialogue itself takes at least 4 packages (a pair for the handshake, a pair for query/answer), and that is on top of the package exchange during TCP session establishment. That may look OK if you have just one query, but with thousands it does not sound optimal. Let's describe a few things which may help.

  1. Since the usual workflow implies exchanging small packets, it is highly desirable to switch off the TCP Nagle algorithm to avoid unnecessary delays. Consider using TCP_NODELAY on your client connection (see the sketch after this list).

  2. Among others, the daemon supports the special command 'PERSIST', which explicitly says that you're going to send many commands next. In this case the connection will NOT be dropped after the first command; the client can keep the opened socket and reuse it for the following queries. That leads to a flow where all network communication consists only of 'query' and 'response' packets, without TCP establishment and handshake exchange each time. The downside of this approach is that the daemon keeps a permanently opened connection, and since the number of available TCP ports is limited, you may not be able to serve as many clients simultaneously as you would like. Also, the number of simultaneous sessions is limited by the 'max_children' value in the daemon's config.

  3. The client may merge the handshake and command message into one blob and send it in one TCP packet, or even in the initial TCP SYN packet using the TCP Fast Open technique. That is also supported, and in practice gives the same effect as using persistent connections, apart from the fact that the handshake eats 4 bytes of that package.
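For point 1, a sketch of an explicit TCP_NODELAY setup in Go (note Go's net package already enables it by default on TCP connections, so the call mainly documents the intent; in languages where the default is off it is essential):

conn, err := net.Dial("tcp", "127.0.0.1:9312") // address is an example
if err != nil {
	log.Fatal(err)
}
if tcp, ok := conn.(*net.TCPConn); ok {
	tcp.SetNoDelay(true) // ship small query/response packets immediately
}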

Protocol details

Here are the details of all commands, status codes and specific payloads.

Status codes

The daemon has 4 different status codes:

  1. SEARCHD_OK = 0, ///< general success, command-specific reply follows
  2. SEARCHD_ERROR = 1, ///< general failure, error message follows
  3. SEARCHD_RETRY = 2, ///< temporary failure, error message follows, client should retry later
  4. SEARCHD_WARNING = 3 ///< general success, warning message and command-specific reply follow

For example, if a client wants to ping the server using the API's 'ping' command, the exchange will look like:

client > 00 09 01 00    // command ping (09), version 1.0 (=0x100)
         00 00 00 04    // size of payload (4 octets)
         DE AD BE EF    // payload for ping command (just a DWORD which will be send back)
daemon > 00 00 01 00    // status (ok) and version (0x100, not used)
         00 00 00 04    // size of payload (4 octets)
         DE AD BE EF    // body of payload (command-specific, for ping is DWORD)

If, say, the 'ping' command returned a warning (that is not actually possible for this command, but suppose it for the sake of example):

client > 00 09 01 00    // command ping (09), version 1.0
         00 00 00 04    // size of payload (4 octets)
         DE AD BE EF    // payload for ping command (just a DWORD which will be send back)
daemon > 00 03 01 00    // status (warning) and version (0x100, not used)
         00 00 00 0D    // size of payload (13 octets)
         00 00 00 09 'a warning' // warning string
         DE AD BE EF    // usual body of payload (command-specific, for ping is DWORD)

That is, the payload starts with one (1) string carrying the warning message, and then the usual command-specific response follows.

Just a note: in the example, 00 00 00 09 'a warning' is the length followed by the sequence of chars: 0x00, 0x00, 0x00, 0x09, 'a', ' ', 'w', 'a', 'r', 'n', 'i', 'n', 'g', each being the ASCII code of the corresponding character, in total 4 (quantity) + 9 (chars) octets. That is exactly the number expressed by the size of the payload, i.e. 13, or 0x00 0x00 0x00 0x0D above, expressed as a big-endian DWORD. Further examples follow the same model without explicit explanation each time.
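Putting the pieces together, a hedged Go sketch of the ping exchange above (command code 9, version 1.0; conn is assumed to have already passed the handshake):

// request: header (code, version, size) plus one DWORD of payload
binary.Write(conn, binary.BigEndian, uint16(9))          // SEARCHD_COMMAND_PING
binary.Write(conn, binary.BigEndian, uint16(0x0100))     // version 1.0
binary.Write(conn, binary.BigEndian, uint32(4))          // payload size
binary.Write(conn, binary.BigEndian, uint32(0xDEADBEEF)) // cookie to be echoed

// response: header plus the echoed DWORD
var status, version uint16
var size, cookie uint32
binary.Read(conn, binary.BigEndian, &status)  // 0 = SEARCHD_OK
binary.Read(conn, binary.BigEndian, &version) // may be safely ignored
binary.Read(conn, binary.BigEndian, &size)    // 4 for the plain OK answer
binary.Read(conn, binary.BigEndian, &cookie)  // 0xDEADBEEF again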

For an error message it will look like:

client > 00 09 02 00    // command ping (09), version 2.0
         00 00 00 04    // size of payload (4 octets)
         DE AD BE EF    // payload for ping command (just a DWORD which will be send back)
daemon > 00 01 00 00    // status (error) and version (0, not used)
         00 00 00 3E    // size of payload (62 octets)
         00 00 00 3A `major command version mismatch (expected v.1.x, got v.2.0)` // error string

The retry error response with its possible text messages has already been described above.

Supported command codes and versions

Here is the list of all commands which Manticore supports. Some of them are purely internal and not exposed to the network; some are used only in master-agent communication; and some are public.

All commands are wrapped in the messages described above, which implies that the necessary prefixes with command code, version, and payload size are always attached. So, below only the payloads are described, assuming the header is present.

Public API commands

These commands are exposed and implemented by existing sphinx API packages.

SEARCHD_COMMAND_SEARCH		= 0 // version 1.33 (0x121)
SEARCHD_COMMAND_EXCERPT		= 1 // version 1.4 (0x104)
SEARCHD_COMMAND_UPDATE		= 2 // version 1.3 (0x103)
SEARCHD_COMMAND_KEYWORDS	= 3 // version 1.1 (0x101)
SEARCHD_COMMAND_PERSIST		= 4 // special, not versioned
SEARCHD_COMMAND_STATUS		= 5 // version 1.1 (0x101)
SEARCHD_COMMAND_FLUSHATTRS	= 7 // version 1.0 (0x100)

Private API commands

These commands are mostly internal and used in master-agent communication (when you invoke the corresponding commands on distributed indexes). However, they're not 'secret', just dropped from support, so you're welcome to implement them and share your implementation.

SEARCHD_COMMAND_SPHINXQL	= 8  // version 1.0 (0x100)
SEARCHD_COMMAND_PING		= 9  // version 1.0 (0x100)
SEARCHD_COMMAND_UVAR		= 11 // version 1.0 (0x100)
SEARCHD_COMMAND_JSON		= 16 // version 1.0 (0x100)
SEARCHD_COMMAND_CALLPQ 		= 17 // version 1.0 (0x100)
SEARCHD_COMMAND_CLUSTERPQ	= 18 // version 1.0 (0x100)
SEARCHD_COMMAND_GETFIELD	= 19 // version 1.0 (0x100)

Not exposed commands

Since these commands are not currently implemented for networking, they have no assigned version. However, if they are implemented later, they will most probably take version 1.0 (0x100).

SEARCHD_COMMAND_DELETE		= 10
SEARCHD_COMMAND_INSERT		= 12
SEARCHD_COMMAND_REPLACE		= 13
SEARCHD_COMMAND_COMMIT		= 14
SEARCHD_COMMAND_SUGGEST		= 15

SEARCHD_COMMAND_SEARCH command payload for version 1.33

This is the main command of the daemon; it performs the goal for which the whole software was created, namely full-text search. It is also the most complex in structure.

The structure of the payload will be described from the outer layer down to the innermost details.

At the top level the search command consists of these parts:

DWORD master_version
array of queries

master_version

The first field, master_version, selects the dialect of the search command. Namely, Manticore distinguishes at least between 'public API' clients and 'distributed index' clients.

API clients usually perform queries which return results for an end-user: an explicitly defined schema, expected fields, tokens, weights and so on. JSON attributes (if any) are returned to the client as formatted text.

Remote distributed indexes, in turn, may expect additional service fields which help to group and calculate the final result from resultsets arriving from different remote agents. Also, JSONs (if any) are returned in the internal BSON form, which is not human-readable but allows faster processing.

Legal values for master_version are:

0. The remote client is a usual public API client.

All values >0 mean that the remote client is a remote distributed index (i.e. another instance of searchd which has the current host mentioned in an agent section of its config).

The generic difference between the 'client' and 'agent' dialects, as already mentioned, is the extended schema which the daemon provides for 'agent' queries, in contrast to the exact schema requested by a 'client'. The simplest example to illustrate it is a query like:

SELECT a, b from idx order by c

Here we want exactly and only the columns a and b, and this is what the daemon will return to us in 'client' mode. However, imagine that you're the one who actually computes the query, and you also see the order by c at the end. Having a strict column set of only a and b, you just can't calculate order by c. So in this case the remote client states that it is not just a client but an agent, and the daemon, seeing this, will also provide column c in order to make the whole query computable.

Another example:

SELECT avg(a) from idx group by b

From the client's end we expect only a column of numbers, which is the result of the avg(a) calculation. But imagine what is needed on a distributed index with, say, 2 agents. Apart from the avg(a) number from each of them, the head daemon also needs the value of b in order to perform the grouping, and the number of matches in every group, in order to correctly combine the two already calculated averages. So, in this case the daemon will also provide the columns @groupby, @distinct, and @count.

This is quite a clear difference, and since it may be useful even for end-point clients, Manticore has a simple mechanism to tell the remote host that it expects the response in agent mode. To achieve this you need to start the column select list with the double asterisk *,*. That is, SELECT a FROM idx ORDER BY b declares the client side and will return only column a, while SELECT *,*,a FROM idx ORDER BY b explicitly signals that we are a remote distributed agent, and the daemon will return columns a and b to us.

This special mark *,* may be used instead of providing a non-zero master_version value in case you want just a simple thing like the additional columns.

The concrete values of master_version described below also influence the command message format, sometimes providing other things than just extra columns. Unless specially noted, the concrete numbers must be treated as threshold values, literally as 'starting from version N'.

1. The query contains a collation: one of LIBC_CI (default), LIBC_CS, UTF8_GENERAL_CI and BINARY. Details are described below.

2. The query contains an outer ORDER BY clause and maybe also a limit. That is used in calculating subselects. The daemon will sort the resultset to make grouping of results easier for the outer agent.

5. Filters in the query have a flag for equal bounds. That makes it possible to pass not only 'less' or 'greater', but also 'less or equal' and 'greater or equal' filters.

6. The query contains a limit for the 'N best group by' clause. That clause existed in SphinxQL but wasn't known to the API protocol, so it is an extension.

13. Filters in the query define MVAFUNC, which may be 'ANY' (default), but also 'NONE' and 'ALL'.

14. The query contains a UDF plugin name and the plugin's options. That makes it possible for the agent to use a UDF ranker.

15. The query contains items and reference items for FACETs, which make FACET queries work in the distributed case. Also, filters in the query pass separate 'equal min' and 'equal max' properties (as opposed to the single 'equal edge' of v.5), and also 'open left' and 'open right' to define unlimited edges.

16. The query contains the 'expand_keywords' flag.

Below are descriptions of the concrete fields of search queries, with their variants and versions.

array of queries

Search queries may be batched together; i.e. you may send not just one but several searches in one packet, and then expect the same number of answers in the response. This not only helps avoid extra network traffic, but also enables various internal optimizations. Most prominent is the common sub-trees optimization: when you have one select list which produces a set of matches, and then different group-by/order-by clauses which produce different numbers for faceted search from this set of matches. Such a set of queries may obviously be optimized by calculating the matches only once and then filtering/ordering them differently. And that is exactly what is achieved by packing multiple similar queries into one batch command. As usual, the array begins with the quantity of objects inside; for queries it just tells how many queries follow in the batch. It is mandatory (but for a single query it is, obviously, 1).

Search query

A search query is a complex structure containing different fields. They follow one by one, octet-aligned, and they are:

 1. DWORD qflags
 2. int offset
 3. int limit
 4. int mode
 5. int ranker
 6. [string ranker_expression]
 7. int sort
 8. string sort_by // as '@weight desc'
 9. string query_or_rawquery
10. int array of weights // m.b. empty
11. string list_of_indexes
12. int range64 // always 1
13. uint64 min_docid // as 0
14. uint64 max_docid // as DOCID_MAX
15. filters array // m.b. empty
16. int group_func
17. string group_by
18. int max_matches
19. string group_sort // as '@groupby desc'
20. int cutoff
21. int retry_count
22. int retry_delay
23. string group_distinct
24. int has_geoanchor
25. [ string geolatattr ]
26. [ string geolongattr ]
27. [ float fgeolatitude ]
28. [ float fgeolongitude ]
29. array of index weights // m.b. empty
30. DWORD query_timeout
31. array of field weights // m.b. empty
32. string comment
33. array of overrides // m.b. empty
34. string select_list
35. [ int max_predicted_msecs ]
36. string "" // deprecated, now simple a gap
37. int 0
38. int 0
39. int has_outer
40. [ master-agent extensions ]
41. string query_token_filter_lib ("")
42. string query_token_filter_name ("")
43. string query_token_filter_opts ("")
44. array of filter-tree
45. [ more master-agent extensions ]

(all detailed descriptions are placed below; so, if something in this structure is not immediately clear, just scroll down and investigate the details)

1. DWORD qflags

That is a bit field which describes the following query flags and properties:

QFLAG_REVERSE_SCAN	= 1 // 'reverse_scan', direction of full-scan
QFLAG_SORT_KBUFFER	= 2 // 'sort_method'; pq when 0, kbuffer when set
QFLAG_MAX_PREDICTED_TIME = 4 // `max_predicted_time` option. If set, value 35 (int max_predicted_msecs) in search query is present
QFLAG_SIMPLIFY		= 8 // 'boolean_simplify' flag
QFLAG_PLAIN_IDF		= 16 // idf is `plain`, otherwise `normalized`
QFLAG_GLOBAL_IDF	= 32 // 'global_idf' option. Use global idf or not
QFLAG_NORMALIZED_TF	= 64 // idf is `tfidf_normalized`, or `tfidf_unnormalized`.
QFLAG_LOCAL_DF		= 128 // 'local_df' flag
QFLAG_LOW_PRIORITY	= 256 // 'low_priority' option
QFLAG_FACET		= 512 // whether this is part of facet batch query
QFLAG_FACET_HEAD	= 1024 // if it is main facet query
QFLAG_JSON_QUERY	= 2048 // if it is JSON or API query.

The meanings of reverse_scan, sort_method, max_predicted_time, boolean_simplify, idf, global_idf, local_df and low_priority are described among the options of the SELECT syntax in the SphinxQL reference. The only thing which may need clarifying is idf, since it is described as a comma-separated list of IDF computation flags. That list is mapped onto the QFLAG_PLAIN_IDF and QFLAG_NORMALIZED_TF values, so that OPTION idf='normalized,tfidf_normalized' maps to bit QFLAG_PLAIN_IDF not set and QFLAG_NORMALIZED_TF set.

QFLAG_MAX_PREDICTED_TIME is not only a bit in the flags, but also directly rules whether param 35 of the search query is present or not. If the bit is not set, the param is skipped and param 36 (the emulated sub-query) follows right after param 34 (select_list); otherwise param 35 is present and expresses the max predicted search time.

QFLAG_FACET and QFLAG_FACET_HEAD link the queries of a batch which perform FACET searches (see the FACET command in the SELECT syntax of SphinxQL). The 'FACET_HEAD' flag marks the main query, and then one or more queries in the batch may also be marked by the 'FACET' flag. Such a combination 'links' the head query with its children and organizes one faceted query. The flags are mutually exclusive; i.e. a query may be either a facet head, or a facet member, or neither, but not head and member simultaneously. Member queries differ from the head query only by select list; the rest of the settings are just copied from head to children.

QFLAG_JSON_QUERY indicates that the query (that is, the ft-clause usually met in WHERE MATCH('...')) is a JSON query, i.e. it is written as JSON with the syntax described in the HTTP API reference in the json/search - Fulltext queries section. A usual query, in contrast, has the syntax described in the Searching - Extended query syntax section of the docs. Since both are ultimately represented as plain text, only this flag actually indicates whether the query contains search or JSON syntax.

2. int offset
3. int limit

These define pagination of the resultset: limit shows the number of matches to return, offset sets the offset into the server-side result set. Internally it is also implicitly adjusted when the hidden key 'divide_remote_ranges' is in play.

4. int mode

That is the matching mode, described in matching modes. The actual numeric values are:

SPH_MATCH_ALL = 0
SPH_MATCH_ANY = 1
SPH_MATCH_PHRASE = 2
SPH_MATCH_BOOLEAN = 3
SPH_MATCH_EXTENDED = 4
SPH_MATCH_FULLSCAN = 5
SPH_MATCH_EXTENDED2 = 6

5. int ranker

That is the ranking mode, described in available built-in rankers. The actual numeric values are:

SPH_RANK_PROXIMITY_BM25 = 0, // default
SPH_RANK_BM25 = 1
SPH_RANK_NONE = 2
SPH_RANK_WORDCOUNT = 3
SPH_RANK_PROXIMITY = 4
SPH_RANK_MATCHANY = 5
SPH_RANK_FIELDMASK = 6
SPH_RANK_SPH04 = 7
SPH_RANK_EXPR = 8
SPH_RANK_EXPORT = 9
SPH_RANK_PLUGIN = 10

All the rankers are described in the docs, except the last two:

SPH_RANK_EXPORT is an export ranker that emits BM25 as the weight, but computes and exports all the factors. It may be useful for research purposes, e.g. exporting the data for machine learning.

SPH_RANK_PLUGIN uses ranking by a UDF plugin; it needs extra details, namely the UDF plugin name and options. These fields are provided only in master-agent extension v.14, so using this kind of ranking assumes a master_version >=14 in the header, and thus implies the agent flavour of the command.

6. string ranker_expression

That is an optional value; it is present in the query only if the ranker is SPH_RANK_EXPR(=8) or SPH_RANK_EXPORT(=9). In the first case the string contains the ranking formula, as described in the Available built-in rankers section.

7. int sort

That is the sorting mode according to sorting modes. The concrete numeric values for the modes are:

SPH_SORT_RELEVANCE = 0
SPH_SORT_ATTR_DESC = 1
SPH_SORT_ATTR_ASC = 2
SPH_SORT_TIME_SEGMENTS = 3
SPH_SORT_EXTENDED = 4
SPH_SORT_EXPR = 5

8. string sort_by

That is the sorting clause. It may be empty, but it is mandatory (can't be skipped). It provides the parameter for all sorting modes except SPH_SORT_RELEVANCE. By default it may be set to @weight desc.

9. string query_or_rawquery

That string is mainly intended for logging purposes, since the query itself at this moment is usually already parsed and cannot be restored back. The query may be raw (taken 'as is' from the original statement), or 'cooked' (possibly transformed during the legacy matching modes fixup). The general rule is: for JSON query types (ones with the QFLAG_JSON_QUERY flag set) we provide the 'cooked' query; for all the rest, the raw one (if non-empty), or the 'cooked' one (if raw is empty). In the case of the agent flavour, the query may be used to determine the list of extra fields which have to be returned together with the explicitly requested columns.

10. array of weights

That is an optional array of integers representing field weights. First (as usual) comes the array size, then the declared number of integers, each defining a user-specified per-field weight. If the array is skipped (i.e. the first injected value is the byte sequence 00 00 00 00, indicating a zero-length array), a default weight of 1 is assumed for all fields.

11. string list_of_indexes

That is the list of indexes to query. It contains a string with comma-separated names of the indexes. It may contain one or more indexes which are assumed to exist on the daemon. The order of the indexes is important, at least in Manticore v.2, since it assumes the 'main+delta' searching schema with kill-lists: indexes earlier in the list are targets of the kill-lists of later ones. This field may also use wildcards, like "*", which means 'all indexes'. Index names consist of latin letters (a-z), numbers (0-9) and underscore (_). Everything else is considered a separator. Also, index names must not start with an underscore.

12. int range64
13. uint64 min_docid
14. uint64 max_docid
15. filters array

These are the filters. Originally, in ancient times, Sphinx supported only filtering by DocID, which was a 32-bit integer. A bit later 64-bit DocIDs appeared, together with a special flag 'id64' to distinguish between them. The protocol was built in an evolutionary way: it first included filters by DocID, and other types of filters were added later. So, let's consider all these values together, keeping their historical meaning in mind.

range64 is treated as a boolean. Zero means that min_docid and max_docid are DWORDs (i.e. 4 octets each, representing unsigned int values). Non-zero means that they are uint64 (i.e. 8 octets each). Historically, the difference between 'id64' and non-'id64' builds was exactly the size of document ids. Manticore has now switched entirely to the 'id64' flavour, so range64 is ALWAYS =1.

min_docid and max_docid define global values, originally used by the SetIDRange API function. Nowadays you may set these values to '0' for min_docid and 'DOCID_MAX' for max_docid. That says that filtering by DocID is not necessary at all.

Today these values are just a historical artefact; you may use them to define range filtering by DocumentID, but you can do the very same thing by providing a filter in the array of filters below. When speaking to remote agents, Manticore itself now uses the hardcoded values 1 for range64, 0 for min_docid and 0xFFFFFFFFFFFFFFFF for max_docid. That combination simply means 'there is no filter here'.

Non-default values of min_docid and max_docid are nevertheless still interpreted, and they produce a ranged filter by @id which is appended to the array of filters defined below.

Consider always injecting the default values here and using an explicitly defined range filter, to keep all the filtering uniform.

array of filters defines the filters. First (as in any array) comes an integer expressing the number of filters; it may be zero (no filters). Then the filters themselves (if any) follow, one by one.

Each filter has the following structure:

string attribute_name
DWORD filter_type
<filter-specific data>
DWORD exclude_flag
[master-specific data]

attribute_name is, obviously, the name of the attribute to filter by.

filter_type is one of these values:

SPH_FILTER_VALUES		= 0,	///< filter by integer values set
SPH_FILTER_RANGE		= 1,	///< filter by integer range
SPH_FILTER_FLOATRANGE	= 2,	///< filter by float range
SPH_FILTER_STRING		= 3,	///< filter by string value
SPH_FILTER_NULL			= 4,	///< filter by NULL
SPH_FILTER_USERVAR		= 5,	///< filter by @uservar
SPH_FILTER_STRING_LIST	= 6,	///< filter by string list
SPH_FILTER_EXPRESSION	= 7		///< filter by expression

The <filter-specific data> differs for each value and is described below.

SPH_FILTER_VALUES provides filtering by an integer value set. The filter-specific data is the values, passed as an array of uint64. The maximum size of that array is limited by Manticore's max_filter_values configuration option; by default it is 4096.

SPH_FILTER_RANGE filters by an integer range. The filter-specific data contains a pair of uint64 numbers:

uint64 MinValue
uint64 MaxValue

The numbers represent the left and right edges of the range.

SPH_FILTER_FLOATRANGE filters by a float range. The filter-specific data contains a pair of floats:

float fMinValue
float fMaxValue

The floats represent the left and right edges of the range.

SPH_FILTER_STRING filters by a string value. The filter-specific data is one string.

SPH_FILTER_NULL filters by null/not null. The filter-specific data is one octet, treated as a boolean flag (0=false, !0=true). A true flag filters by 'is null', false by 'is not null'.

NOTE! This particular filter interacts with the exclude_flag described below. I.e. if you want to filter by 'is not NULL', you may use either the IsNull(false) filter or !IsNull(true). Both will work; however, note that using exclude_flag for inversion adds a special 'inverter' filter into the chain, so the daemon will calculate two filters as Inv(IsNull(true)). Using the parameter, in turn, just calculates IsNull(false), which takes one step less and so is a bit faster.

SPH_FILTER_USERVAR filters by the value of a user-defined variable. The filter-specific data is one string, representing the name of the variable.

SPH_FILTER_STRING_LIST filters by a set of strings. The filter-specific data is an array of strings.

SPH_FILTER_EXPRESSION filters by an expression, which is already passed in attribute_name. So there is no filter-specific data for this filter.

exclude_flag represents a boolean value: 0 - no inversion, !0 - invert the result. The flag causes an 'inverter' filter to be created and added in front of the whole filter. It may be used wherever necessary; however, read the note on the SPH_FILTER_NULL type if you want to use it with that type of filter.

For specific master_version values the filter also contains the following fields:

DWORD HasEqualEdges // for v>=5 to v<15
DWORD HasEqualMin // v >=15
DWORD HasEqualMax // v >=15
DWORD OpenLeft // v >=15
DWORD OpenRight // v >=15
DWORD eMvaFunc // v >=13

HasEqualEdges (boolean) forces ranged filters to include their bounds, so 'less' becomes 'less or equal' and 'greater' becomes 'greater or equal'. Note that from v.15 this field is removed, and HasEqualMin and HasEqualMax are used INSTEAD of it!

HasEqualMin and HasEqualMax are the same, but rule the left and right bounds separately and assume a master_version of at least 15. Starting from that version ONLY these values exist, and HasEqualEdges is ABSENT!

OpenLeft and OpenRight provide filters without a left or right bound. In that case the filter's 'min' or 'max' value is just ignored.

eMvaFunc sets the type of the MVA or string-list folding function, which may be one of:

SPH_MVAFUNC_NONE = 0,
SPH_MVAFUNC_ANY = 1
SPH_MVAFUNC_ALL = 2

The meaning of 'NONE' is usually the same as 'ANY', but it also fires a warning suggesting the explicit ANY()/ALL() function.

After all explicitly defined filters, the daemon will also add an @id SPH_FILTER_RANGE filter in case min_docid (param 13) is not 0, or max_docid (param 14) is not 0xFFFFFFFFFFFFFFFF (aka DOCID_MAX).
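To make the layout concrete, a sketch serializing one SPH_FILTER_RANGE filter in the plain client flavour (reusing the writeString helper sketched in the Strings section; master-specific fields omitted):

func writeRangeFilter(w io.Writer, attr string, min, max uint64, exclude bool) error {
	if err := writeString(w, attr); err != nil { // string attribute_name
		return err
	}
	binary.Write(w, binary.BigEndian, uint32(1)) // filter_type = SPH_FILTER_RANGE
	binary.Write(w, binary.BigEndian, min)       // uint64 MinValue
	binary.Write(w, binary.BigEndian, max)       // uint64 MaxValue
	var excl uint32
	if exclude {
		excl = 1
	}
	// DWORD exclude_flag; master-specific fields would follow in agent flavour
	return binary.Write(w, binary.BigEndian, excl)
}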

16. int group_func

This is the grouping function. The possible values are described in the Searching - Grouping (clustering) search results section of the docs. The concrete numeric values are:

SPH_GROUPBY_DAY = 0
SPH_GROUPBY_WEEK = 1
SPH_GROUPBY_MONTH = 2
SPH_GROUPBY_YEAR = 3
SPH_GROUPBY_ATTR = 4
// SPH_GROUPBY_ATTRPAIR = 5
SPH_GROUPBY_MULTIPLE = 6

SPH_GROUPBY_ATTRPAIR is an invalid choice and will fire an error if you use it.

SPH_GROUPBY_MULTIPLE means grouping on multiple attribute values.

17. string group_by

This is the name of the column to be used with the grouping function from the previous param. In the case of grouping by attr (value 4) it is the name of the attribute. In the case of grouping by 'multiple' (value 6) it is several names separated by commas.

18. int max_matches

That is the maximum number of matches to retrieve. It shows how many matches the daemon will keep in RAM while searching. You can't retrieve more than this number in a client application.

19. string group_sort

That is how to sort the groups in the result set. Usually it is set to the '@groupby desc' string.

20. int cutoff

The cutoff to stop searching at. It tells the daemon to forcibly stop the search query once this number of matches has been found and processed. It is intended for advanced performance control.

21. int retry_count

That param defines how many retries are acceptable. It is not useful for local indexes; however, if the daemon you connect to serves distributed indexes, it may help to recover if one of the remotes fails.

Apart from concrete 0 or positive numbers, it may also be set to '-1', which means 'use the default value from the config'.

22. int retry_delay

That represents the delay, in milliseconds, between retries. It works only with remote agents and is not used for local searches. Also, apart from concrete numbers, this param may be set to '-1', which means 'use the default value from the config'.

23. string group_distinct

Sets the attribute name for the per-group distinct value count calculation. Available only for grouping queries. For a query like:

select ... COUNT(DISTINCT foo), count(*) as @count from idx group by fee order by @count desc

'foo' is the value which goes into the group_distinct param, and 'fee' is the one for group_by. Also, group_sort from this clause will take the '@count desc' value.

24. int has_geoanchor

That is a boolean, representing whether the next 4 values (geolatattr, geolongattr, fgeolatitude, fgeolongitude) are present in the query or not. If it is 0, the values are skipped, and immediately after this param follow the params starting from 29.

25. string geolatattr
26. string geolongattr

These two strings (optional, depending on the has_geoanchor value) set the names of the attributes which contain the latitude and longitude values.

27. float fgeolatitude
28. float fgeolongitude

These two floats (optional, depending on the has_geoanchor value) set the coordinates of the anchor point for the geodistance search. These values must be expressed in radians.

29. array of index weights

That is an array of objects of:

string index_name
int index_weight

The array may be empty; in this case all weights are assumed to be '1'. Otherwise an integer weight is assigned to each named index. This value originates from the API call SetIndexWeights, so you may refer to that function for details.

30. DWORD query_timeout

That is the max query time, in milliseconds. A local search query will be stopped once that much time has elapsed.

31. array of field weights

That is an array of objects of:

string field_name
int field_weight

Binds per-field weights by name. The match ranker can be affected by per-field weights, so you may manage ranking by assigning different non-default weights to different full-text fields.

The array may be empty.

32. string comment

That is just a comment which will be written into the log together with the query.

33. array of overrides

The meaning of overrides is described under the API function 'SetOverride()'. Internally overrides are represented as an array of objects of:

string attribute_name
DWORD attribute_type
array of overrided_attributes

attribute_name is obvious; attribute_type is an internal type. There are many of them, but here we are especially interested in the values

SPH_ATTR_FLOAT = 5
SPH_ATTR_BIGINT = 6

overrided_attribute, in turn, is an object which depends on attribute_type, and it is:

uint64 docid
DWORD/uint64/float overriding_value

Here docid is also obvious, and overriding_value is a number, with the type:

float - for attribute_type==SPH_ATTR_FLOAT
uint64 - for attribute_type==SPH_ATTR_BIGINT
DWORD - for the rest.

The array of overrides may be empty. Moreover, since the feature is declared 'deprecated', prefer to always keep it empty.

34. string select_list

That is the select list, i.e. the list of columns/expressions you want to retrieve with the query. Consider reading the API reference - General query settings - SetSelect description for syntax details.

35. int max_predicted_msecs

That is an optional parameter; it is present in the query ONLY if the QFLAG_MAX_PREDICTED_TIME bit is set in the qflags param at the beginning of the query.

The meaning is described in Configuration reference - searchd program configuration options - predicted_time_costs section.

36. string outer_orderby
37. int outer_offset
38. int outer_limit
39. int has_outer

These values define the params of an outer query, or subselect. Read the SphinxQL reference - SELECT syntax - Subselects section for details. Namely, outer_orderby represents the sorting clause, outer_offset and outer_limit represent the LIMIT values, and has_outer is a boolean representing whether the query has a subselect at all. Yes, this layout looks suboptimal; it would be better to have the has_outer flag first and make the other values optional depending on it, but it is already made as described.

40. master-agent extensions

This section carries different values depending on master_version. They are totally absent if you make a 'client' connection, but different blobs follow if your connection is of the agent flavour, depending on the version of the extension. The values are inclusive, so if you have master_version=10, it will include all the fields for versions 1, 2 and 6, but exclude the fields for v.14 and above.

master_version>=1 adds the field:

DWORD eCollation

eCollation may be one of these values:

SPH_COLLATION_LIBC_CI = 0
SPH_COLLATION_LIBC_CS = 1
SPH_COLLATION_UTF8_GENERAL_CI = 2
SPH_COLLATION_BINARY = 3

The default value is SPH_COLLATION_LIBC_CI, i.e. 0.

master_version>=2 adds the fields

string outer_orderby
[ int outer_limit ]

outer_orderby is described above as param 36. This is a bit messy: originally the outer subselect was intended only for internal master-agent communication, so it was added as extension 2. Later it was decided to make it available to all API clients, so the params were also exposed to everybody as params 36-38. If both param 36 and this extended value are present, the value of the extension is used.

outer_limit is a dupe of param 38; however, it is present in the extension ONLY if param 39 has_outer is true, otherwise it is skipped.

master_version>=6 adds the field

int groupbylimit

That represents the value of N for N-best-groupby, i.e. when clauses like 'GROUP 10 BY foo' are used - here groupbylimit=10.

master_version>=14 adds the fields

string udf_ranker
string udf_ranker_opts

These params provide the necessary values when ranker=SPH_RANK_PLUGIN is used. So, for now, using this ranker assumes a daemon-agent connection, and it may not be used from a simple client-flavour connection.

41. string query_token_filter_lib
42. string query_token_filter_name
43. string query_token_filter_opts

These params provide a custom token filter plugin for the query. They represent the name of the library, the name of the filter, and the options. A query-time token filter is created on each search for every involved index and lets you implement a custom tokenizer that produces tokens from text by your own rules. These values originate from OPTION token_filter=... of the SELECT clause.

44. array of filter-tree items

The filter tree is the new generation of filters, which allows complex expressions. The tree is built over the existing array of filters (see param 15) and refers to them by index. Each filter-tree item consists of:

int iLeft
int iRight
int iFilterItem
int bOr

iLeft and iRight represent the tree connections to other leaves; iFilterItem refers to a filter clause described in the array of filters (param 15); the boolean bOr flag shows that the node represents the logical operator 'OR'.

The array may be empty.

45. more master-agent extensions

master_version>=15 adds these fields:

array of query items
array of ref query items

These are used mostly by facets. Query items come from the parsed select-list, and it is the list of columns of this query. Ref query items is the original select-list prior to its replacement by facets. These two arrays help to perform the multi-query optimization even with facets, since they hold the 'common' and 'specific' schemas separately, and so do not confuse the strict criteria for rejecting the common optimization.

Both 'query item' and 'ref query item' have the same structure:

	string alias
	string expression
	DWORD eAggrFunc

alias and expression may be illustrated by the chunk SELECT expression AS alias ... . eAggrFunc is for aggregate columns, and it may be one of:

SPH_AGGR_NONE = 0
SPH_AGGR_AVG = 1
SPH_AGGR_MIN = 2
SPH_AGGR_MAX = 3
SPH_AGGR_SUM = 4
SPH_AGGR_CAT = 5

They express that the column may be an aggregate function instead of an exact value. SPH_AGGR_NONE means a plain value (no aggregation). SPH_AGGR_AVG, SPH_AGGR_MIN, SPH_AGGR_MAX and SPH_AGGR_SUM correspond to the 'AVG()', 'MIN()', 'MAX()' and 'SUM()' aggregates, and SPH_AGGR_CAT corresponds to the 'GROUP_CONCAT()' function.

master_version>=16 adds this field:

DWORD eExpandKeywords

This value represents how keyword expansion has to be done during the query. Possible values are:

QUERY_OPT_DEFAULT = 0
QUERY_OPT_DISABLED = 1
QUERY_OPT_ENABLED = 2
QUERY_OPT_MORPH_NONE = 3

The 'disabled' and 'enabled' values are set when the query option 'expand_keywords' is set to 0 or 1. The 'morph_none' value comes into play when the option 'morphology' is set to 'none'. The default value means that the actual behaviour will be ruled by the index-side setting from the config.

master_version>=17 adds this field:

array of index hints

An index hint is a pair of values:

	DWORD eHint
	string sId

where eHint is one of:

INDEX_HINT_USE = 0
INDEX_HINT_FORCE = 1
INDEX_HINT_IGNORE = 2

and sId is the name of a column.

Right now hints are not yet used; however, they're parsed from sphinxql clauses like select id from test where id>0 ignore index(id). Here there is one hint, INDEX_HINT_IGNORE, with sId id. In the future these hints will be used to manage secondary indexes over attributes for tuning query execution.

This value is the very last of the SEARCHD_COMMAND_SEARCH command payload.

SEARCHD_COMMAND_SEARCH response payload version 1.33

Answering SEARCHD_COMMAND_SEARCH, the daemon sends a message with the appropriate status code and the following payload.

The payload contains one or several responses, according to the number of queries in the batch sent in the command. The responses follow one after another, without a prefixed count, since the number is the same as the number of queries and is therefore already known to the remote side.

Each response has the following structure:

DWORD result_status

result_status is one of:

SEARCHD_OK = 0
SEARCHD_ERROR = 1
SEARCHD_WARNING = 3

These values look similar to the status codes in the global message header; however, they ARE different and in this context relate only to the search query. This status code is sent as a DWORD, i.e. 4 octets, as opposed to the global message's status code, which occupies a WORD (i.e. 2 octets). Be aware of this difference!

In case of the SEARCHD_ERROR code, the rest of the payload is:

string error_msg

and that's all. After the string error_msg, the blob for the current response is finished.

In case of SEARCHD_WARNING, the first part of the response is:

string warning_msg

and immediately after warning_msg follows the response itself, as in the case of SEARCHD_OK.

In case of SEARCHD_OK no extra fields are present, and then the response to the command follows; it contains these parts:

  1. Schema
  2. Matches
  3. Global properties
  4. Stats (io/cpu/predicted)
  5. Agent-specific properties
  6. Word stats.
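Before walking through these parts, a sketch of the status branching just described (r is the response reader, readString a hypothetical helper reading one length-prefixed string):

var queryStatus uint32 // per-query status is a DWORD, not a WORD!
binary.Read(r, binary.BigEndian, &queryStatus)
switch queryStatus {
case 1: // SEARCHD_ERROR: one string with the message, and this response ends
	msg, _ := readString(r)
	return errors.New(msg)
case 3: // SEARCHD_WARNING: one string, then the normal response body follows
	warn, _ := readString(r)
	log.Println("warning:", warn)
}
// SEARCHD_OK (0): the response body follows immediately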
1. Schema

The schema contains the following parts:

array of strings // fields
array of attributes

The first array is just the names of the ft-fields in the schema. The second contains structures like:

string attr_name
DWORD attr_type

attr_name is also just the name of a column. attr_type is a value representing the type of the column. Possible values are:

SPH_ATTR_INTEGER = 1			///< unsigned 32-bit integer
SPH_ATTR_TIMESTAMP = 2			///< this attr is a timestamp
SPH_ATTR_BOOL = 4			///< this attr is a boolean bit field
SPH_ATTR_FLOAT = 5			///< floating point number (IEEE 32-bit)
SPH_ATTR_BIGINT = 6			///< signed 64-bit integer
SPH_ATTR_STRING = 7			///< string (binary; in-memory)
SPH_ATTR_POLY2D = 9			///< vector of floats, 2D polygon (see POLY2D)
SPH_ATTR_TOKENCOUNT	= 11		///< field token count, 32-bit integer
SPH_ATTR_JSON = 12			///< JSON subset; converted, packed, and stored as string
SPH_ATTR_UINT32SET = 0x40000001UL	///< MVA, set of unsigned 32-bit integers
SPH_ATTR_INT64SET = 0x40000002UL	///< MVA, set of signed 64-bit integers
SPH_ATTR_MAPARG = 1000
SPH_ATTR_FACTORS = 1001			///< packed search factors (binary, in-memory, pooled)
SPH_ATTR_JSON_FIELD	= 1002		///< points to particular field in JSON column subset
SPH_ATTR_FACTORS_JSON = 1003
SPH_ATTR_STORED_FIELD = 1008		///< string (binary; in-memory) which is stored field, since master-agent extension v. 18.

Note that for master_version<4 all columns of SPH_ATTR_JSON_FIELD will be rendered to strings and returned as SPH_ATTR_STRING. For master_version<3, SPH_ATTR_JSON will also be rendered and returned as SPH_ATTR_STRING. Otherwise these values will contain BSON blobs, that is 'binary json'. This format is also described in this document.

So, you'll never see 'SPH_ATTR_JSON' or 'SPH_ATTR_JSON_FIELD' in responses received in client mode.

2. Matches

Matches follow in this format:

int num_of_matches
int use_64bits // ==1
matches of quantity `num_of_matches`

That looks like an array, with the exception that between the number of elements and the elements themselves stands the use_64bits param, which is an integer and always 1 (meaning we use 64-bit document ids).

Each match has this format:

uint64 document_id
int weight
attributes

attributes is a blob with several attributes; their quantity is NOT sent with every match as for usual arrays, but is instead taken from the already received and parsed schema.

Each attribute has its own format according to its type defined in the schema.

SPH_ATTR_UINT32SET is sent as an array of DWORDs

SPH_ATTR_INT64SET is sent as an array of uint64s

SPH_ATTR_FLOAT is sent as a single float

SPH_ATTR_BIGINT is sent as a single uint64

SPH_ATTR_JSON is sent as an array of octets representing the value in BSON format.

SPH_ATTR_STRING is sent as a string, possibly with a json mark at the end; see the note below.

SPH_ATTR_FACTORS and SPH_ATTR_FACTORS_JSON are sent as an array of octets representing the value as a packed factors blob, which is described in a separate section.

SPH_ATTR_JSON_FIELD is sent as:

byte Bson_type
array of octets // BSON of value

SPH_ATTR_STORED_FIELD is the same as SPH_ATTR_STRING. This type indicates that the string is not an attribute (and so may NOT be used in any expressions/filters), but just the content of a stored field.

The rest of the attributes may be represented and sent as a single DWORD value.

NOTE: for SPH_ATTR_STRING the daemon gives textual strings (not null-terminated). However, sometimes the value is literally json and may be processed accordingly. Say, if you expose the string somewhere as the value of a json attribute, you may either escape it according to the rules of json escaping, or put it in directly, if it is json. Such a dual string may be returned, for example, when you display the list of queries stored in a percolate index: they may be either a ql query, stored via the sphinxql 'insert' routine, or a json query, stored via the REST API. To mark such values, instead of a plain string the daemon actually sends an array of bytes where the last 2 bytes are the sequence 0x00 0x00 for true json, or 0x00 0x01 for a true string. So, when processing a received string you must first check the second-to-last byte, and if it is zero, consider the actual string's length to be 2 bytes less than the number of bytes that came from the daemon. Then check the last byte: if it is 0, this is json; if it is 1, it is a simple plain-text string. If the second-to-last byte is non-zero, consider the whole attribute a plain-text string.
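A sketch of decoding that marker in Go (blob is assumed to hold the attribute's raw bytes, with the length prefix already consumed):

// classifyString strips the optional 2-byte marker and reports whether
// the value is json (0x00 0x00) or a plain string (0x00 0x01).
func classifyString(blob []byte) (text []byte, isJSON bool) {
	n := len(blob)
	if n >= 2 && blob[n-2] == 0x00 {
		return blob[:n-2], blob[n-1] == 0x00
	}
	return blob, false // no marker: treat the whole attribute as plain text
}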

3. Global properties

They are:

int total // total num of matches transferred
int total_found // total found num of matches
int query_time // query time in milliseconds

4. io/cpu stats

This section is sent only in agent mode (when the query was issued with a non-zero master_version value) and is skipped in the client dialect. It contains the following data:

byte uStatMask
[iostats] // present if uStatMask & 1 is non-zero
[uint64 iCpuTime] // present if uStatMask & 2 is non-zero
[uint64 predicted_time] // present if uStatMask & 4 is non-zero

uStatMask indicates which exact sections follow. A zero value means 'nothing', in which case the next section of the response follows immediately. Otherwise the least significant bit indicates the presence of iostats, the second bit points to cpustats, and the third to predicted_time.

iostats, if uStatMask&1 is non-zero, is represented as:

uint64 iReadTime // in microseconds
DWORD iReadOps
uint64 iReadBytes
uint64 iWriteTime // in microseconds
DWORD iWriteOps
uint64 iWriteBytes

The meaning of the fields is obvious from their names; the only thing to note is that the iReadTime and iWriteTime fields are expressed in microseconds.

iCpuTime is present if uStatMask&2 is non-zero; it is expressed in microseconds.

predicted_time is present if uStatMask&4 is non-zero; it is expressed in microseconds.
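A sketch of this conditional parsing in Go (r is the response reader):

var statMask byte
binary.Read(r, binary.BigEndian, &statMask)
if statMask&1 != 0 { // iostats follow
	var ios struct {
		ReadTime   uint64 // microseconds
		ReadOps    uint32
		ReadBytes  uint64
		WriteTime  uint64 // microseconds
		WriteOps   uint32
		WriteBytes uint64
	}
	binary.Read(r, binary.BigEndian, &ios)
}
if statMask&2 != 0 { // cpu time follows
	var cpuTimeUS uint64
	binary.Read(r, binary.BigEndian, &cpuTimeUS)
}
if statMask&4 != 0 { // predicted time follows
	var predictedUS uint64
	binary.Read(r, binary.BigEndian, &predictedUS)
}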

5. Agent-specific properties

This section is sent only in agent mode for specific versions, and is skipped in the client dialect.

For master_version>=7 these properties are sent:

DWORD iFetchedDocs
DWORD iFetchedHits

They show the raw number of docs and hits processed during matching and may be helpful for estimating index size.

For master_version>=8 this additional property is sent:

DWORD iSkips

That is the number of internal Skip() calls during matching.

6. Word stats

That is the final part of the SEARCHD_COMMAND_SEARCH result payload, and it contains:

array of wordstats

For clients a wordstat is:

string word
DWORD docs
DWORD hits

For agents, indicated by a non-zero master_version, a wordstat also contains one extra byte:

string word // as for client
DWORD docs  // as for client
DWORD hits  // as for client
byte 0 // empty statistic for expanded terms

If the original command contains more than one query in the batch, the next responses are placed after the whole response. Once the number of expected responses is reached, the payload is done.

SEARCHD_COMMAND_EXCERPT command payload version 1.4

This is second big command of the daemon; it performs snippets generation, and it is also quite complex.

Structure of payload will be described from outer layer down to mostly internals.

Exceprts (or, 'snippets') payload contains following fields:

int field_mode // not used, skipped
int iFlags
string sIndex
string sWords
string sBeforeMatch
string sAfterMatch
string sChunkSeparator
int iLimit
int iAround
int iLimitPassages
int iLimitWords
int iPassageId
string sStripMode
string sPassageSPZ
array of strings // queries

The first param, field_mode, is deprecated and ignored; however, you still have to send an integer there.

iFlags is a bit mask. Individual bits have the following values:

EXCERPT_FLAG_REMOVESPACES		= 1 // 1-st bit
EXCERPT_FLAG_EXACTPHRASE		= 2 // 2-nd bit
EXCERPT_FLAG_SINGLEPASSAGE		= 4 // 3-rd bit
EXCERPT_FLAG_USEBOUNDARIES		= 8 // 4-th bit
EXCERPT_FLAG_WEIGHTORDER		= 16 // 5-th bit
EXCERPT_FLAG_QUERY				= 32 // 6-th bit
EXCERPT_FLAG_FORCE_ALL_WORDS	= 64 // 7-th bit
EXCERPT_FLAG_LOAD_FILES			= 128 // 8-th bit
EXCERPT_FLAG_ALLOW_EMPTY		= 256 // 9-th bit
EXCERPT_FLAG_EMIT_ZONES			= 512 // 10-th bit
EXCERPT_FLAG_FILES_SCATTERED	= 1024 // 11-th bit
EXCERPT_FLAG_FORCEPASSAGES		= 2048 // 12-th bit

The sStripMode string may have one of the values 'none', 'index', 'strip', or 'retain'.

The sPassageSPZ string may be one of 'sentence', 'paragraph', or 'zone', and may also be an empty string.

All these values are well described in the API reference - Additional functionality - BuildExcerpts section; refer to it for explanations of their meanings.

queries are either the texts for snippet generation, or paths to files with the texts; the concrete meaning is ruled by the LOAD_FILES flag in the iFlags bitmask. A serialization sketch follows below.
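Here is a hedged Go sketch of serializing this payload. The helper writeString follows the protocol's string primitive (DWORD length, then bytes, big-endian); the option values used here (limit 256, around 5, etc.) are just illustrative defaults, not mandated by the proto:

import (
	"bytes"
	"encoding/binary"
)

// writeString emits one protocol string: DWORD length followed by the bytes.
func writeString(buf *bytes.Buffer, s string) {
	binary.Write(buf, binary.BigEndian, uint32(len(s)))
	buf.WriteString(s)
}

// buildExcerptPayload serializes the snippets request described above.
func buildExcerptPayload(flags uint32, index, words string, docs []string) []byte {
	buf := new(bytes.Buffer)
	binary.Write(buf, binary.BigEndian, uint32(0)) // field_mode, ignored
	binary.Write(buf, binary.BigEndian, flags)     // iFlags
	writeString(buf, index)                        // sIndex
	writeString(buf, words)                        // sWords
	writeString(buf, "<b>")                        // sBeforeMatch
	writeString(buf, "</b>")                       // sAfterMatch
	writeString(buf, " ... ")                      // sChunkSeparator
	for _, v := range []uint32{256, 5, 0, 0, 1} {  // iLimit, iAround, iLimitPassages, iLimitWords, iPassageId
		binary.Write(buf, binary.BigEndian, v)
	}
	writeString(buf, "index") // sStripMode
	writeString(buf, "")      // sPassageSPZ
	binary.Write(buf, binary.BigEndian, uint32(len(docs))) // array of strings: size first
	for _, d := range docs {
		writeString(buf, d)
	}
	return buf.Bytes()
}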

SEARCHD_COMMAND_EXCERPT response payload version 1.4

The response of the snippets command is quite simple and consists of several strings. The number of values corresponds to the number of queries passed to the command.

Each string represents the snippet for one query.

SEARCHD_COMMAND_UPDATE command payload version 1.3

This command updates given attribute values in given documents. Updates are possible for numbers (integers, floats) and MVA attributes. Updating strings and jsons is not supported, at least in the current version of the protocol. However, you can update numbers saved inside json attributes by using a locator as the attribute name, e.g. json.attr[4].intvalue

The update command payload contains the following parts:

string indexes
DWORD num_of_attrs
DWORD flags
attr_descriptors[num_of_attrs] // schema
array of update // new values to assign

indexes contains the names of the indexes to update. It may be one or several indexes, including distributed ones. In the latter case the updates will be applied to all local and remote agents (with the very same API command being sent by the head master to the agent daemons). Rules for naming and enumerating indexes here are the same as for the SEARCHD_COMMAND_SEARCH command, except that for updates you can't use wildcards.

num_of_attrs contains, namely, the number of attributes you want to update. It is also the quantity of the attr_descriptors described below.

flags is a bit field. For now only one bit (the LSB) is used; it indicates whether to ignore non-existent attributes or to warn about them. It may be useful when you simultaneously update a list of indexes using one 'fat' package containing new values for all possible attributes, but some of the indexes just don't have them in their schema.

attr_descriptors is actually an array, but its size is not sent separately; it is determined by the num_of_attrs param instead. Each attr_descriptor in the array contains:

string attr_name
DWORD mva_flag

where attr_name is, namely, the name of the attribute to update, and mva_flag is regarded as a boolean (0=false, !0=true) and indicates that this attribute is actually an MVA attribute.

Next follows the array of update. That is a usual array, prefixed with its size. Each update record in the array contains:

uint64 document_id
update_blobs[num_of_attrs]

document_id is obvious.

update_blobs is actually an array of update_blob, but its size is not sent with every blob; it is determined by the num_of_attrs param instead. The structure of an update_blob depends on whether the corresponding attribute has mva_flag set or not.

For attributes whose mva_flag is true (i.e. !=0), the update_blob is an array of DWORD.

For attributes whose mva_flag is false (i.e. ==0), the update_blob is a single DWORD value.

NOTE that SEARCHD_COMMAND_UPDATE can update attributes only by specifying direct document-ids; updates by condition, such as 'where docid<1000', are NOT possible with this command on the daemon side. Also note that the update payload can hold only 32-bit values. For example, when updating 64-bit MVAs only the low DWORD will be written, and the high DWORD will be explicitly set to 0.

If you want to perform more complex updates, consider doing it with the private command SEARCHD_COMMAND_SPHINXQL, wrapping the necessary clause into that command. A serialization sketch follows below.
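For illustration, a hedged Go sketch of serializing this payload for plain (non-MVA) attributes; it reuses writeString from the excerpt example, and the function name is ours:

// buildUpdatePayload serializes an update of `attrs` (all non-MVA) for the
// given documents; each document maps to one DWORD value per attribute.
func buildUpdatePayload(indexes string, attrs []string, updates map[uint64][]uint32) []byte {
	buf := new(bytes.Buffer)
	writeString(buf, indexes)
	binary.Write(buf, binary.BigEndian, uint32(len(attrs))) // num_of_attrs
	binary.Write(buf, binary.BigEndian, uint32(0))          // flags: warn about missing attrs
	for _, name := range attrs {                            // attr_descriptors
		writeString(buf, name)
		binary.Write(buf, binary.BigEndian, uint32(0)) // mva_flag = false
	}
	binary.Write(buf, binary.BigEndian, uint32(len(updates))) // size of the update array
	for id, values := range updates {
		binary.Write(buf, binary.BigEndian, id) // uint64 document_id
		for _, v := range values {              // one DWORD per attribute
			binary.Write(buf, binary.BigEndian, v)
		}
	}
	return buf.Bytes()
}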

SEARCHD_COMMAND_UPDATE response payload version 1.3

The response of the update command is a single int containing the number of updated documents.

SEARCHD_COMMAND_KEYWORDS command payload version 1.1

Keywords is used to extract tokens from a query using the tokenizer settings of a given index.

The command's payload contains the following parts:

string query
string index
int bNeedStats
int bFoldLemmas
int bFoldBlended
int bFoldWildcards
int ExpansionLimit

query is the query to tokenize.

index is the index whose tokenizer settings will be used.

bNeedStats is a boolean (0=false, !0=true); it means that for each keyword a pair of ints will also be returned - the number of docs and hits.

bFoldLemmas makes the lemmatizer return only one normalized token, even if it provides many lemmas (which is usual for non-trivial forms). For example, for a token like 'синей' with Russian AOT morphology, with folding only 'синий' will be returned; without folding the lemmas 'синий', 'синеть', and 'синь' will be returned.

bFoldBlended makes blended tokens be returned as a single normalized token. I.e. for a token like 're@bus' with blend char '@', with folded blended only 're@bus' will be returned; without it the result will be 're@bus', 're', and 'bus'.

bFoldWildcards makes normalized tokens keep their wildcards. Without this flag wildcards will be unfolded and may return many tokens. For example, 'qu*' will produce just 'qu*' with folded wildcards, but a bunch of expansions ('quorum', etc.) when unfolded.

ExpansionLimit works together with wildcards and helps to limit the expansion when a too-short wildcard token would otherwise return millions of expansions.

SEARCHD_COMMAND_KEYWORDS response payload version 1.1

The response of the keywords command is an array of keyword.

Each keyword contains:

string tokenized
string normalized
int querypos
[int Docs] // optional, sent only if `bNeedStats` was set
[int Hits] // optional, sent only if `bNeedStats` was set

tokenized is just a token cut from your query.

normalized is the result of normalization, ruled by the provided flags.

querypos is the position of the original token in the query. For folded blends the sequence of positions might contain gaps in place of the blended parts; e.g. 're@bus fe@bus' will return qpos 1 for 're@bus' and qpos 3 for 'fe@bus'.

The optional Docs and Hits contain the number of documents and positions, respectively.

SEARCHD_COMMAND_PERSIST command payload

Persist is a special command which tells the daemon NOT to close the connection after subsequent commands, but to keep it alive and expect new commands on the same connection. This command has to be sent right after the handshake in order to set 'persistent mode' for the whole following session. The persist command has no version, so you have to pass any WORD (say, 0) as its value.

NOTE that a persist request will not necessarily be honored by the daemon. If a 'max_children' limit is set in the daemon's config, and the current connection would hit that limit, the connection will not become persistent and will be closed right after the command. In this case, if non-persistent mode of work is ok for you, you have to connect again and perform your query.

Also, a persistent connection may be closed by timeout. In this case you, again, have to establish a new connection before continuing your work.

So, when working in persistent mode you always have to check whether the connection is still alive before using it, and re-establish it if not.

The persist payload contains a single int which is interpreted as a boolean (0=false, !0=true) and means either to enter (true) or to exit (false) persistent mode.

SEARCHD_COMMAND_PERSIST response payload

The persist command is the only one which receives no response at all - the daemon doesn't send any response message to it. So, you must not wait for a response to this command.

SEARCHD_COMMAND_STATUS command payload version 1.1

Status returns the status of the daemon. As payload you have to send a single DWORD global_status which will be interpreted as a boolean (0=false, !0=true), indicating whether you want to get the global daemon status (true) or the meta of the last query (false).

Apart from a direct call via the API, this command is also invoked when you run console 'searchd' with the '--status' param. In that case the daemon will parse the config, locate an appropriate listen directive, and send this very command to the listening point with global_status set to 1 (true).

SEARCHD_COMMAND_STATUS response payload version 1.1

The response of the status command contains a table of strings with the following structure:

int num_of_rows
int num_of_columns // =2
string[num_of_rows*num_of_columns] values

That is, two integers containing the dimensions of the table, and then string values placed one-by-one, in the quantity of the table's capacity (that is, number of rows * number of columns). Every odd string contains the name of a variable; every even one contains its value.

num_of_rows depends on the request and may vary. Expect to receive and process about 30-100 rows.

num_of_columns is always 2. I.e. you'll receive key-value pairs, one value per key.

Refer to SphinxQL reference - SHOW STATUS syntax for data available in global status.

Refer to SphinxQL reference - SHOW META syntax for data available in last meta status.

In addition to the data returned by the 'SHOW META' sphinxql command, the API command also returns values for predicted_time and dist_predicted_time.

In contrast to the sphinxql statements 'show status' and 'show meta', the API command has no way to pass an additional 'like' filtering string, so the response payload will always contain all available lines of status or meta. A decoding sketch follows below.
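A hedged Go sketch of decoding this table into key-value pairs; readString is the reading counterpart of writeString above (the helper names are ours):

// readString reads one protocol string: DWORD length followed by the bytes.
func readString(r io.Reader) (string, error) {
	var n uint32
	if err := binary.Read(r, binary.BigEndian, &n); err != nil {
		return "", err
	}
	b := make([]byte, n)
	if _, err := io.ReadFull(r, b); err != nil {
		return "", err
	}
	return string(b), nil
}

// parseStatus decodes the status/meta table; num_of_columns is always 2.
func parseStatus(r io.Reader) (map[string]string, error) {
	var rows, cols uint32
	if err := binary.Read(r, binary.BigEndian, &rows); err != nil {
		return nil, err
	}
	if err := binary.Read(r, binary.BigEndian, &cols); err != nil {
		return nil, err
	}
	out := make(map[string]string, rows)
	for i := uint32(0); i < rows; i++ {
		key, err := readString(r)
		if err != nil {
			return nil, err
		}
		value, err := readString(r)
		if err != nil {
			return nil, err
		}
		out[key] = value
	}
	return out, nil
}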

SEARCHD_COMMAND_FLUSHATTRS command payload version 1.0

Flush all in-memory attribute updates in all the active disk indexes to disk.

The command has zero payload, so you have nothing to send apart from the message with that command and 0 as the size of its payload.

SEARCHD_COMMAND_FLUSHATTRS response payload version 1.0

The response of SEARCHD_COMMAND_FLUSHATTRS contains a single int tag value, which identifies the resulting on-disk state (basically, the number of actual disk attribute saves performed since the daemon's startup).

Private API commands

These commands are mostly internal and used in master-agent communication (when you invoke corresponding commands on distributed indexes). However, they're not 'secret', just excluded from support, so you're welcome to implement them and share your implementation.

SEARCHD_COMMAND_SPHINXQL command payload version 1.0

Sphinxql is a special 'proxy' command which allows you to invoke and execute sphinxql queries over the binary API protocol. In contrast to a real SphinxQL session, each such command is stateless and intended only for sphinxql statements which do not imply state.

For example, you can perform a command like a complex conditional update or delete; you may execute 'select' - but you can't properly execute 'show meta' after such a select. However, a temporary session persists during the call, so you may execute a 'select ...;show meta' multistatement and get the expected result.

The payload of the command is a single string sqlquery, containing any query, or ';'-separated batch of queries, you want to execute, written in sphinxql syntax.

SEARCHD_COMMAND_SPHINXQL response payload version 1.0

The sphinxql response contains a raw dump of the mysql proto response. In a real sphinxql session a mysql client would receive these bytes from the network, interpret the blob, and show you the result(s) accordingly. But with the API command you just get this raw blob with the proto dump and have to parse/interpret it yourself. A small reference for this dump is placed at the end of this document.

SEARCHD_COMMAND_PING command payload version 1.0

The ping command is a simple 'ping'. It has a single int cookie as payload. You can send any integer here.

This command may be used to evaluate the daemon's responsiveness and measure, if necessary, the 'pure delay' of the protocol interface. Since the command does nothing, you can measure the pure time of the networking and protocol roundtrip. This command is issued by default by the head daemon to all the agents to gauge their responsiveness. That is done with a period of ha_ping_interval, which is 1 second by default.

SEARCHD_COMMAND_PING response payload version 1.0

The ping command returns a single int cookie value, the one which was passed in the command.

SEARCHD_COMMAND_UVAR command payload version 1.0

The SEARCHD_COMMAND_UVAR command is used by the head daemon to distribute user-defined variables, which are usually used in different kinds of filtering with the 'IN' clause.

Look at SphinxQL reference - SET syntax to learn about user variables. Here we speak about the statement which sets a so-called 'global user variable' and has the general syntax

SET [INDEX index_name] GLOBAL @user_variable_name = (int_val1 [, int_val2, ...])

When you invoke it without specifying an index, only the local variable will be set. However, when you specify an index by name, the head daemon will determine whether the index is distributed and has remote agents, and if so, will send the values to the agents over the network using this very command.

The command contains the following data:

string var_name
int num_values
array of octets // values as VLB-LE-packed delta-encoded blob

var_name is the name of the user variable we want to set.

num_values is the quantity of values in the list we want to assign to the var_name uservar.

Then follows a blob (array of octets) with the packed values.

This blob is produced from the original array of uint64, which represents the actual list of values of the variable. Before sending, the array first has to be sorted. Then its values have to be transformed using delta-encoding. Finally these deltas need to be packed into the blob one-by-one with VLB LE compression.

Example:

  • Let the original values be the uint64 numbers 1000, 40, 2. Written 'as is' they would occupy 24 octets.

  • After sorting we have the sequence of uint64: 2, 40, 1000.

  • After delta-encoding, we have the sequence: 2, 38, 960.

  • After VLB little-endian compression, we have the sequence of octets: 2, 38, 192, 7.

So finally, from 3 uint64 values (which are 24 octets) we get a blob of only 4 octets to be sent over the network.
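The whole packing pipeline fits in a few lines of Go (a sketch; the function name is ours):

import "sort"

// packUvarValues sorts the values, delta-encodes them, and packs each delta
// with VLB little-endian compression (low 7-bit group first; the high bit of
// each octet means "more octets follow").
func packUvarValues(values []uint64) []byte {
	sorted := append([]uint64(nil), values...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	var blob []byte
	var prev uint64
	for _, v := range sorted {
		delta := v - prev
		prev = v
		for delta > 0x7F {
			blob = append(blob, byte(delta&0x7F)|0x80)
			delta >>= 7
		}
		blob = append(blob, byte(delta))
	}
	return blob
}

Feeding it the example above, packUvarValues([]uint64{1000, 40, 2}) yields exactly the octets 2, 38, 192, 7.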

SEARCHD_COMMAND_UVAR response payload version 1.0

The response to UVAR has only a single int in the payload, which, interpreted as a boolean, indicates whether the command succeeded (1) or not (0). (Actually, if the response status is SEARCHD_OK, this value is always 1, i.e. 'success'.)

SEARCHD_COMMAND_JSON command payload version 1.0

JSON is a special 'proxy' command which allows you to invoke and execute REST queries over the binary API protocol instead of HTTP. The payload contains two strings:

string endpoint
string request

endpoint contains the endpoint's address, the one usually written in the URL after the host root when you issue such a query via the HTTP proto.

request is the request in JSON format.

The whole HTTP API reference section will give you an idea of what kinds of queries you can pass via this wrapper.

SEARCHD_COMMAND_JSON response payload version 1.0

The response of JSON consists of two fields:

string endpoint
array of octets // result

endpoint is actually the same as the one that came in the command;

result contains a dump of the answer. By nature that is the same kind of 'raw dump' as in the sphinxql case. But since the HTTP proto is textual, you may expect to find in this blob a simple textual JSON object with the result of your query.

SEARCHD_COMMAND_CALLPQ command payload version 1.0

Call PQ is an internal function of Manticore which was implemented and exposed over the network to make it work with distributed indexes.

First read the Searching - Percolate query section, and also SphinxQL reference - CALL PQ syntax, to learn about percolate queries.

The CALLPQ command payload contains the following data:

DWORD flags
string sIdAlias
string sIndex
int iShift
array of arrays of bytes // documents

flags is a bitset containing the following masked booleans (a bit set to 1 means 'true', 0 means 'false'):

&1 - need_docs
&2 - need_query
&4 - json_docs
&8 - verbose
&16 - skip_bad_json

need_docs - whether we need to return the list of matched docs (namely, their document-ids)

need_query - whether we need to dump the original stored queries that matched from the index

json_docs - set when the documents are jsons passed as BSON blobs; otherwise they are plain strings.

verbose - whether the daemon needs to also return additional data for the queries; for now that is the time spent by each matched query.

skip_bad_json - indicates what to do if a provided JSON document is invalid: the daemon should either immediately stop the whole call and report an error, or skip the doc and process the rest of the query. Actually, over the API proto this flag only matters for totally empty JSON documents, since initial parsing is already done on the client side, which has already handled all critical errors.

sIdAlias - makes sense only for JSON documents. It says which attribute in such a document contains the documentID value. This value is useful only if you set the need_docs boolean; otherwise these values will never be shown.

sIndex - name of the index we work with. That is one local PQ index, or one distributed index built from local or remote PQ indexes.

iShift - a value which will be added to all document-ids. That works only if the documents contain no ids and so are auto-numbered. Otherwise the provided document-id will always be returned as-is, regardless of this value.

documents - either an array of strings (if json_docs is false), or an array of BSONs (if json_docs is true). Regardless of that flag, it is always safe to treat them as an 'array of arrays of bytes', since both strings and BSON blobs are arrays of octets by nature.

Plain-text documents contain nothing but plain text. Json documents, passed as BSON blobs, may contain several fields and attributes, and so the saved filters of PQ may be applied to these attributes.

SEARCHD_COMMAND_CALLPQ response payload version 1.0

This payload is the answer to the CALLPQ command. It consists of:

DWORD queryflags
array of querydesc
uint64 tmTotal
uint64 tmSetup
int QueriesMatched
int QueriesFailed
int DocsMatched
int TotalQueries
int OnlyTerms
int EarlyOutQueries
array of int QueryDT
string sWarnings

queryflags is a bit mask describing the content of the querydesc descriptors in the following array.

It provides the following masked booleans (1 means 'true', 0 means 'false'):

&1 - has_docs
&2 - dump_queries
&4 - has_docids

querydesc is a structure describing a matched stored query from the PQ index. It contains:

uint64 queryID
[docs] // present only if bit has_docs of flags is set
[query] // present ONLY if bit dump_queries of flags is set

queryID is the id of the stored query which matched.

docs is optional and is present ONLY if the has_docs bit of flags is set (i.e. if flags&1 is not zero).

If has_docids is also set (i.e. if flags&4 is not zero), docs is an array of uint64s containing the list of documents matched by the stored query with this queryID.

If has_docids is NOT set, docs is an array of ints containing the list of ordinal numbers of the matched documents, increased by the value of iShift. A remote client may distribute a big chunk of docs among several agents, and then, using the iShift param, collect the returned data into one solid resultset.

query is optional and is present ONLY if the dump_queries bit of flags is set (i.e. if flags&2 is not zero). It consists, in turn, of:

DWORD uDescFlags
[string sQuery] // optional
[string sTags] // optional
[string sFilters] // optional

uDescFlags is a bit mask describing the presence of the following strings and the format of the query.

It provides the following masked booleans (1 means 'true', 0 means 'false'):

&1 - query_present
&2 - tags_present
&4 - filters_present
&8 - query_is_ql

sQuery is present ONLY if uDescFlags&1 is not zero, and contains the text of the query.

sTags is present ONLY if uDescFlags&2 is not zero, and contains the text of the tags.

sFilters is present ONLY if uDescFlags&4 is not zero, and contains the text of the filters.

query_is_ql indicates that the query is in ql format; otherwise it is json. This flag may be used by the client to format the given query text for the client application. Say, a ql query published in a json resultset may be escaped and placed in a special 'query' string, but pure json queries may be returned 'as is'.

In the simplest case, when queryflags is zero (i.e. no bit is set), a querydesc is just the single uint64 queryID value, and so the whole set of querydescs is just an array of uint64.

Then follow the meta values, returned together with the resultset.

tmTotal represents the total time spent on the call pq command, in microseconds.

tmSetup is the time spent on the initial setup of matching - parsing docs, setting options, etc.; in microseconds.

QueriesMatched - how many stored queries match the document(s)

QueriesFailed - number of failed queries

DocsMatched - how many times the documents match the queries stored in the index

TotalQueries - total number of stored (in the index) queries

OnlyTerms - how many queries in the index consist of terms only; the rest of the queries use extended query syntax

EarlyOutQueries - how many queries were quickly matched and rejected by filters or other conditions

QueryDT - array of execution times, in microseconds, one per query. This array is filled only if you provided the verbose flag in the command; otherwise it is an empty array (but still present, i.e. a single sequence of 00 00 00 00 bytes indicating an empty array).

sWarnings - warnings. That is a bit outside the rules, since warnings usually have to be sent with the status code before the payload; this is a little bug in the CALL PQ payload implementation.

SEARCHD_COMMAND_CLUSTERPQ command payload version 1.0

ClusterPQ is a purely internal command which helps the daemon manage synchronous cluster work, such as initial synchronisation and so on. It is now under active development, so this description is most probably already out-of-date. So, don't implement or use this command! Also consider that it actually makes sense only for master-agent communication, and it may not be useful at all for third-party clients.

Cluster PQ has its own set of commands, encapsulated into the binary API protocol. The common payload of the command is quite simple and contains:

DWORD cluster_command
cluster_command_payload

cluster_command is one of these values:

CLUSTER_DELETE = 0
CLUSTER_FILE_RESERVE = 1
CLUSTER_FILE_SIZE = 2
CLUSTER_FILE_SEND = 3
CLUSTER_INDEX_ADD_LOCAL = 4
CLUSTER_SYNCED = 5

For each cluster_command the concrete payload is different.

CLUSTER_DELETE - has a single string in the payload, which is the name of the cluster. The answer payload contains a single byte =1.

CLUSTER_FILE_RESERVE has payload of:

string sCluster
string sIndex
string sFilename
uint64 fileSize
string sFileHash

Reply to CLUSTER_FILE_RESERVE has payload:

uint64 filesize
string sFileHash
string sIndexPath

CLUSTER_FILE_SIZE is not implemented.

CLUSTER_FILE_SEND has payload of:

string sCluster
string sFilename
uint64 iFileOffset
array of bytes // data

Reply to CLUSTER_FILE_SEND has payload:

uint64 IndexFileSize

CLUSTER_INDEX_ADD_LOCAL has payload of:

string sCluster
string sIndex
string sIndexPath
byte IndexType // == 3, means PERCOLATE
uint64 IndexFileSize
string sFileHash

Reply to CLUSTER_INDEX_ADD_LOCAL has payload:

uint64 IndexFileSize
string sFilePath

CLUSTER_SYNCED has payload of:

string sCluster
string m_sGTID
array of strings // names of indexes

Reply to CLUSTER_SYNCED has payload of single byte value (1).

SEARCHD_COMMAND_GETFIELD command payload version 1.0

GetField is an internal command used to retrieve stored fields at the final stage of a query. Stored fields are just 'a storage'; they were invented to store the texts which are indexed. That storage may be organized in a slow or resource-consuming way and does not assume fast access by the index for any computations. So, if you want to retrieve a stored column with your query, picking the actual stored values will be postponed to the very end of query execution, until the moment when just a few final matches are prepared to be returned as the final result. That is: if you have a thousand matches to process (sort them some way, filter, calculate expressions over them) - you don't need all thousand stored fields to be retrieved at that moment. But when everything is done, and you're going to return, say, 10 matches from that thousand - at this moment you may retrieve the stored fields for exactly that subset of matches, so that the 'heaviness' of the way they are stored will not affect overall resource/time consumption too much.

The GETFIELD command payload contains the following data:

string indexes
array of strings fieldnames
array of int64 docids

indexes is a string with the list of the indexes from which we want to fetch stored fields. Rules for naming and enumerating indexes here are the same as for the SEARCHD_COMMAND_SEARCH command, except that you can't use wildcards.

fieldnames is an array of strings with the names of the stored fields (columns).

docids is the list of the documents for which we want to fetch fields.

SEARCHD_COMMAND_GETFIELD response payload version 1.0

The response of GETFIELD consists of three vectors:

array of int64 docids
array of locators fieldlocators
array of octets // result

docids is just the list of document ids for which data is provided. Note that it is not necessarily the same set of docids as you've provided in the GETFIELD command: it may be a subset of them (if not all stored fields were retrieved), and they also may not be in the same order as in the list in the command payload.

fieldlocators is an array of structures pointing into the bytes of result; each of them has two fields:

DWORD offset
DWORD length

The locators are just a plain vector, without any borders between the fields of different documents. If you requested 3 stored fields for 5 documents, the locators will be just 15 (i.e. 3*5) structs.

result is just a raw set of bytes containing the stored fields. By iterating over the locators and taking chunks from the result blob with the known offset and length, you can extract the whole set of stored fields, as sketched below.
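A hedged Go sketch of that iteration, assuming the locators are laid out document by document (the type and function names are ours):

// fieldLocator mirrors one locator struct from the response.
type fieldLocator struct {
	Offset, Length uint32
}

// sliceFields cuts the raw result blob into per-document field values,
// nFields locators per document.
func sliceFields(locators []fieldLocator, result []byte, nFields int) [][][]byte {
	var docs [][][]byte
	for i := 0; i+nFields <= len(locators); i += nFields {
		row := make([][]byte, 0, nFields)
		for _, loc := range locators[i : i+nFields] {
			row = append(row, result[loc.Offset:loc.Offset+loc.Length])
		}
		docs = append(docs, row)
	}
	return docs
}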

Appendix. Other formats and algorithms

Here is a small reference description of the formats and approaches mentioned in this doc.

Delta encoding

Delta encoding is used when the sequence of the values to store is monotonically non-decreasing. Each value is replaced with its difference (delta) from the previous value.

Example:

	source-sequence = 3, 5, 7, 11, 13, 17, ...
	delta-encoded = 3, 2, 2, 4, 2, 4, ...

The resulting deltas are smaller and compress more efficiently. The very first value is written 'as is'.

VLB compression

VLB compression encodes a fixed-length (32-bit or 64-bit) integer value into a variable-length byte string, depending on the value. The 7 low bits of every octet carry the next 7 bits of the compressed value, and the 8-th bit signals whether more bytes follow. Note that the high bits of the value come first! That is the essential difference from the VLB little-endian compression described below!

Hence, values that take 7 bits (0 to 127, inclusive) are stored using 1 byte, values that fit in 14 bits (128 to 16383) are stored using 2 bytes, etc.

Examples:

	source-value = 0x37
	encoded-value = 0x37
	source-value = 0x12345
	encoded-value = 0x84 0xC6 0x45
		// 0x84 == ( ( 0x12345>>14 ) & 0x7F ) | 0x80
		// 0xC6 == ( ( 0x12345>>7 ) & 0x7F ) | 0x80
		// 0x45 == ( ( 0x12345>>0 ) & 0x7F )
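An encoder for this legacy variant takes just a few lines of Go (a sketch; the function name is ours). Note that it first has to count the 7-bit groups, which is exactly why the little-endian variant below is simpler:

// vlbEncodeBE packs a value high 7-bit group first; every octet except the
// last one has its high bit set.
func vlbEncodeBE(v uint64) []byte {
	groups := 1
	for x := v >> 7; x != 0; x >>= 7 {
		groups++
	}
	out := make([]byte, 0, groups)
	for i := groups - 1; i >= 0; i-- {
		octet := byte(v>>(uint(i)*7)) & 0x7F
		if i != 0 {
			octet |= 0x80 // more octets follow
		}
		out = append(out, octet)
	}
	return out
}

vlbEncodeBE(0x12345) yields 0x84 0xC6 0x45, matching the example above.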

VLB Little-endian compression

VLB little-endian compression encodes a fixed-length (32-bit or 64-bit) integer value into a variable-length byte string, depending on the value. The 7 low bits of every octet carry the next 7 bits of the compressed value, and the 8-th bit signals whether more bytes follow. Note that the low bits of the value come first!

Hence, values that take 7 bits (0 to 127, inclusive) are stored using 1 byte, values that fit in 14 bits (128 to 16383) are stored using 2 bytes, etc. In contrast to the legacy (big-endian) variant, little-endian is a bit simpler to implement, since it doesn't require calculating the final blob length before you start the encoding (see packUvarValues above for an implementation).

Examples:

	source-value = 0x37
	encoded-value = 0x37
	source-value = 0x12345
	encoded-value = 0xC5 0xC6 0x04
		// 0xC5 == ( ( 0x12345>>0 ) & 0x7F ) | 0x80
		// 0xC6 == ( ( 0x12345>>7 ) & 0x7F ) | 0x80
		// 0x04 == ( ( 0x12345>>14 ) & 0x7F )

Binary json

Disclaimer: BSON is not part of the binary API protocol; it is just an internal structure that may sometimes come in a blob from the daemon, or sometimes be provided from the client side (consider a custom 'CALL PQ' implementation which sends structured documents in BSON form). So, from the viewpoint of the binary API protocol, BSON is just a blob, or array of bytes, and it is sent over the network as is, without any conversion like byte order adjustment. One side sends an array of bytes, the other receives it - and the domain of the communication protocol ends at that point; next you need to parse or interpret that blob some way.

So, this section is just a description of BSON as an internal blob format; it is not related to anything mentioned above, such as status codes, messages, byte order and so on. That is totally different.

BSON is 'binary json' - a serialized machine-readable binary format which Manticore uses internally to store and process json documents. It was inspired by the standard BSON developed by MongoDB, adapted to Sphinx requirements, and sometimes simplified. For example, our BSON doesn't support long 128-bit floats. Also, it has a special packed form for fixed-size arrays, and embedded bloom filters to speed up navigation over associative arrays. So, just keep in mind that the BSON described here is NOT the standard BSON, but Manticore's own dialect, and they are NOT mutually compatible.

All numbers which are stored values are kept in native format, that is, explicit little-endian (since the majority of servers run on intel arch, where this order is native).

Floats are also stored in the native IEEE-754 format, as they are used, without any conversions.

All variable-sized data is, if possible, prefixed with its size. The size is assumed to fit in a DWORD (i.e. an unsigned 32-bit integer), but it is stored in a special packed BSON compression format and occupies 1 to 5 octets of storage.

All numbers which are meta-data (sizes of arrays, lengths of strings, etc.) are always stored in the BSON compressed form. The only exception is the Bloom hash at the beginning of associative arrays - since it is random by nature, there is no reason to compress it. Packed numbers are referred to later as packednum.

As a special example - strings in BSON are stored as a packed length followed by the actual string characters. That is quite similar to the binary protocol string, but note that the length is a packed number. Such strings are later referred to as packstr.

For example, 'hello world' in packstr is the sequence 0x0B 'h' 'e' 'l' 'l' 'o' ' ' 'w' 'o' 'r' 'l' 'd', and occupies 12 bytes in total. If the length of the string is more than 251 chars, the length itself will occupy 3+ bytes, and so on.

For associative arrays (objects) BSON also stores a Bloom filter, which helps to quickly reject the whole array if the necessary key is NOT present there.

All nodes are either a named_node or a simple node.

Each node has the following format:

byte node_type
[packstr node_name]
[node_value]

node_name is present ONLY if it is a named_node; otherwise it is absent.

node_value may be present and follow node_type, or may be absent if the value is obvious from the type itself.

node_type is one of:

JSON_EOF = 0
JSON_INT32 = 1
JSON_INT64 = 2
JSON_DOUBLE = 3
JSON_STRING = 4
JSON_STRING_VECTOR = 5
JSON_INT32_VECTOR  = 6
JSON_INT64_VECTOR  = 7
JSON_DOUBLE_VECTOR = 8
JSON_MIXED_VECTOR  = 9
JSON_OBJECT = 10
JSON_TRUE = 11
JSON_FALSE = 12
JSON_NULL = 13
JSON_ROOT = 14

The JSON_EOF node is a terminator. It finishes a json object or the root element, and has neither name nor value.

JSON_TRUE, JSON_FALSE, and JSON_NULL have no node_value, since it is obvious from the type.

JSON_INT32 has an int value.

JSON_INT64 has an int64 value.

JSON_DOUBLE has a double value, that is, a 64-bit float.

JSON_STRING has a packstr value.

JSON_STRING_VECTOR is represented by:

packedint size
packedint length
packstr[length] values // `length` packstr values

size represents the size (in bytes) of the whole array, so that when navigating over BSON you can skip it as a whole using this value.

length represents the number of stored strings, i.e. the length of the array.

values is the set of packstr values, one-by-one.

JSON_INT32_VECTOR is represented by:

packedint length
int[length] values // 32-bit integers of length quantity.

length represents the number of stored numbers, i.e. the length of the array.

Since the size of every number is fixed (32 bits, or 4 bytes), no size field is stored here, unlike JSON_STRING_VECTOR: you can always obtain the size by multiplying length by the element size (=4).

JSON_INT64_VECTOR is represented by:

packedint length
int64[length] values // 64-bit integers of length quantity.

length represents the number of stored numbers, i.e. the length of the array.

Since the size of every number is fixed (64 bits, or 8 bytes), no size field is stored here, unlike JSON_STRING_VECTOR: you can always obtain the size by multiplying length by the element size (=8).

JSON_DOUBLE_VECTOR is represented by:

packedint length
double[length] values // 64-bit doubles (floats) of length quantity.

length represents the number of stored numbers, i.e. the length of the array.

Since the size of every number is fixed (64 bits, or 8 bytes), no size field is stored here, unlike JSON_STRING_VECTOR: you can always obtain the size by multiplying length by the element size (=8).

JSON_MIXED_VECTOR is represented by:

packedint size
packedint length
nodes // values of any type, except `JSON_ROOT` and `JSON_EOF`

size represents the size (in bytes) of the whole array, so that when navigating over BSON you can skip it as a whole using this value.

length represents the number of stored nodes, i.e. the length of the array.

JSON_OBJECT is represented by:

packedint size
DWORD Bloom
packedint length
named_nodes // values of any type, except `JSON_ROOT` and `JSON_EOF`
JSON_EOF

size represents the size (in bytes) of the whole object, so that when navigating over BSON you can skip it as a whole using this value.

Bloom is the Bloom filter hash. If you need to check whether a key is present in the object, you can calculate the Bloom hash of that key and intersect it with this value by logical AND. If the result is different from your calculated hash, then this object definitely does NOT contain a value corresponding to your key. Otherwise you have to linearly walk the array element-by-element and check whether the name of each node equals the key.

length represents the number of stored nodes, i.e. the length of the array.

The root element is the very first node, and it has its own special format:

dword Bloom
named_nodes
byte node_eof // = 0

It differs from JSON_OBJECT in that it has no node_type (it is obviously JSON_ROOT, since it is the very first node), and also no size element. The latter is necessary only to step over a node, but the root element occupies the whole blob, so you never need to step over it; and you can take the size of the payload if you need it.

Bloom hash

Hashing of the keys is done by the following algorithm:

  1. Calculate CRC32 of the string key.
  2. Take ( 1UL<<( uCrc & 31 ) ) + ( 1UL<<( ( uCrc>>8 ) & 31 ) );

In other words, we take the low WORD of the CRC32 and extract two 5-bit numbers from it, as:

000BBBBB 000AAAAA

Then we take a zeroed DWORD and set its A-th and B-th bits to '1'. The final hash value contains at most 2 non-zero bits in pseudo-random positions.

The whole hash mask (bloom filter) is made by combining all member key hashes with the bitwise 'OR' operation.

So, say, for the set of 'key1', 'key2', and 'key3' we first take hash1, hash2, and hash3, and then calculate the common hash as hash1 | hash2 | hash3. This common 32-bit hash is the final bloom filter, and it is placed at the beginning of the root node, or at the beginning of any json object (which is an associative array).

Hack: if you don't want to calculate the bloom mask, you may provide the value 0xFFFFFFFF instead. That will effectively switch off all Bloom filtering. However, searching for any key in that case will be performed by checking all values, and can't be boosted by the Bloom filter.
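In Go the hashing looks roughly like this (a sketch; we assume the CRC32 here is the common IEEE-polynomial one):

import "hash/crc32"

// bloomHash sets (at most) two bits of a zeroed DWORD, their positions taken
// from the two low bytes of the key's CRC32, each masked to 0..31. The '+'
// mirrors the formula above; when both positions coincide the sum carries,
// which is why the result has "at most" 2 non-zero bits.
func bloomHash(key string) uint32 {
	crc := crc32.ChecksumIEEE([]byte(key))
	return uint32(1)<<(crc&31) + uint32(1)<<((crc>>8)&31)
}

// bloomFilter ORs the member key hashes into the whole mask.
func bloomFilter(keys []string) uint32 {
	var mask uint32
	for _, k := range keys {
		mask |= bloomHash(k)
	}
	return mask
}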

BSON number compression

BSON internally compresses DWORD numbers with its own VLB compression. Bits of the number are NOT shifted; instead, the significant bytes of the original number are always written whole, in little-endian order, by the following template:

  • numbers which are >0 and <0x000000FC occupy exactly 1 byte, which is the only significant byte of the number.
  • numbers <0x00010000 occupy 3 bytes: first the prefix 0xFC, then two bytes in little-endian order.
  • numbers <0x01000000 occupy 4 bytes: first the prefix 0xFD, then three significant bytes in little-endian order.
  • the rest of the numbers occupy 5 bytes: first the prefix 0xFE, then the number as is, in little-endian order.
  • a blob starting with 0xFF is illegal.
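A decoder for this template is straightforward; here is a hedged Go sketch returning the value and the number of octets consumed (the function name is ours):

import "encoding/binary"

// bsonUnpackNum decodes one packednum at the start of buf.
func bsonUnpackNum(buf []byte) (value uint32, consumed int) {
	switch b := buf[0]; {
	case b < 0xFC: // the single significant byte is the value itself
		return uint32(b), 1
	case b == 0xFC: // two bytes follow, little-endian
		return uint32(binary.LittleEndian.Uint16(buf[1:3])), 3
	case b == 0xFD: // three bytes follow, little-endian
		return uint32(buf[1]) | uint32(buf[2])<<8 | uint32(buf[3])<<16, 4
	case b == 0xFE: // a full DWORD follows, little-endian
		return binary.LittleEndian.Uint32(buf[1:5]), 5
	default: // 0xFF
		panic("illegal BSON packednum prefix 0xFF")
	}
}

A packstr is then just bsonUnpackNum for the length, followed by that many bytes of the string.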

Hints about BSON

  1. It is not enough to have just a pointer to a BSON node, since you can't immediately determine whether it is a node or a named_node. Even if you read the first byte and extract the type, you don't know whether the next bytes are part of the value(s) itself, or the packstr name of the node. So, if you need such a kind of locator, you have to store the node's type together with a pointer to the real node's value.

  2. If you parse text json into BSON yourself and meet an array, you can always encode it as a 'mixed vector'. However, it is better to try to cast its type to one of the 'fixed-size' arrays. Collect all the types of the nodes, and if, say, all of them are 'string' - pack the result as BSON's string array. If they're numbers, try to determine their range: if all of them are integers and fit into 32 bits - make a vector of int32; otherwise try a vector of int64, or a vector of doubles.

Packed factors

Disclaimer: Packed factors are not part of the binary API protocol; they are just an internal structure that may sometimes come in a blob from the daemon. So, from the viewpoint of the binary API protocol, packed factors are just a blob, or array of bytes, and they are sent over the network as is, without any conversion like byte order adjustment. One side sends an array of bytes, the other receives it - and the domain of the communication protocol ends at that point; next you need to parse or interpret that blob some way.

Packed factors contain different metrics available during ranking of the matches. They may also be passed outside as an attribute of type SPH_ATTR_FACTORS, and in this case they will be provided as a raw blob.

Packed factors are a block of memory containing many DWORDs and floats. In contrast to the numbers of the internal binary sphinx API, they are stored in the raw byte order specific to the runtime system. So, an intel platform will give them to you as little-endian, while another (like MIPS) will give you big-endian blobs; be aware!

The blob has the following format:

DWORD size // size of whole blob in bytes

// document level factors
DWORD doc_bm25
float doc_bm25a
DWORD matched_fields
DWORD doc_word_count

// field level factors
DWORD num_of_fields
DWORD exact_hit_mask[(num_of_fields+31)/32] // bitfield
DWORD exact_order_mask[(num_of_fields+31)/32] // bitfield
struct field {
	DWORD hit_count
	// if hit_count!=0 [
		DWORD id
		DWORD lcs
		DWORD word_count
		float tf_idf
		float min_idf
		float max_idf
		float sum_idf
		int min_hit_pos
		int min_best_span_pos
		int max_window_hits
		int min_gaps
		float atc
		DWORD lccs
		float wlccs
	// ] // end of optional block
} [num_of_fields]

// word level factors
DWORD max_uniq_qpos
struct qpos {
	DWORD keyword_mask
	// if keyword_mask!=0 [
		DWORD id
		int tf
		float idf
	// ] // end of optional block	
} [max_uniq_qpos]

// fields tf factors
DWORD fields
int fields_tf[fields]

Read the overview to learn about the factors.

Sphinxql dumps

Disclaimer: Sphinxql dumps are not part of the binary API protocol; they are just an internal structure that may sometimes come in a blob from the daemon. So, from the viewpoint of the binary API protocol, a sphinxql dump is just a blob, or array of bytes, and it is sent over the network as is, without any conversion like byte order adjustment. One side sends an array of bytes, the other receives it - and the domain of the communication protocol ends at that point; next you need to parse or interpret that blob some way.

A sphinxql dump is a usual dump of mysql proto messages, and you can find the necessary information on the MySQL site. However, Manticore supports the mysql proto only partially, on the most basic level, which is enough to send queries and receive answers, without any modern and sometimes complex things like compression and cryptography.

Manticore implements both parsing of incoming mysql proto messages and sending responses in the mysql proto. Since in the binary API you can meet only responses, encapsulated as blobs of SEARCHD_COMMAND_SPHINXQL, there is no need to describe the format of incoming messages here, so such information is omitted.

MYSQL packed numbers

Mysql uses its own VLB compression for integer numbers. It is prefix-based, so the first byte of the sequence determines the size of the payload and maybe even the value itself.

An integer can consume 1, 3, 4 or 9 bytes depending on its numeric value. A fixed-length integer stores its value as a series of bytes in little-endian order (least significant byte first).

  • Values < 251 are stored as a 1-byte integer
  • Values from 251 up to 65535 (0xFFFF) are stored as 0xFC + 2-byte integer
  • Values from 65536 up to 0xFFFFFF (24 bits) are stored as 0xFD + 3-byte integer
  • Values from 0x1000000 up to 0xFFFFFFFF (32 bits) are stored as 0xFE + 4-byte integer + a 0x00 0x00 0x00 0x00 sequence.

NOTE: the prefix 0xFE actually introduces an 8-byte integer, but Manticore uses only 4 bytes, padding the remaining 4 bytes with zeros.

In the next sections such packed numbers will be referred to as having the mysqlint type.
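A hedged Go sketch of decoding one mysqlint from a buffer, returning the value and the octets consumed (the function name is ours; 0xFB is excluded below since in row data it marks NULL):

// readMysqlInt decodes a length-encoded mysql integer at the start of buf.
func readMysqlInt(buf []byte) (value uint64, consumed int) {
	switch b := buf[0]; {
	case b < 0xFB: // the byte is the value itself
		return uint64(b), 1
	case b == 0xFC: // 2-byte integer follows
		return uint64(binary.LittleEndian.Uint16(buf[1:3])), 3
	case b == 0xFD: // 3-byte integer follows
		return uint64(buf[1]) | uint64(buf[2])<<8 | uint64(buf[3])<<16, 4
	case b == 0xFE: // 8 bytes follow (Manticore fills the high 4 with zeros)
		return binary.LittleEndian.Uint64(buf[1:9]), 9
	default:
		panic("unexpected mysqlint prefix")
	}
}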

MYSQL strings

In the following sections strings are marked in the following ways:

string[EOF] - a string which is the last component of a packet; its length can be calculated as the overall packet length minus the current position.

string[lenenc] - a string prefixed with a mysqlint describing the length of the string.

MYSQL packet

Each packet starts with a packet ID and the size of the following payload. These values are packed into one DWORD, as 0xPPS3S2S1.

The highest byte 0xPP is the packet ID; it is incremented for each subsequent packet, and when it reaches 0xFF it starts again from 0x00. The remaining 24 bits of the value (0xS3S2S1) are the size of the following payload, in bytes. So, one mysql packet may contain a maximum of 0x00FFFFFF bytes + 4 bytes for the header itself.

This DWORD is placed in little-endian order, so in terms of raw bytes you'll see the sequence 0xS1 0xS2 0xS3 0xPP at the beginning of a mysql response.

Using the length part of the header's DWORD you can rapidly step to the next packet.

The blob may contain one or many packets, placed one-by-one.
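Splitting the header is a one-liner in Go (a sketch; the function name is ours):

import "encoding/binary"

// readPacketHeader splits the little-endian header DWORD into the payload
// length (low 24 bits) and the packet ID (high byte).
func readPacketHeader(buf []byte) (payloadLen uint32, packetID byte) {
	h := binary.LittleEndian.Uint32(buf[:4])
	return h & 0x00FFFFFF, byte(h >> 24)
}

With payloadLen known, the next packet starts at buf[4+payloadLen:].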

Manticore uses the following types of messages:

OK packet
DWORD packet_header // described above
byte header // =0x00
mysqlint rows_affected
mysqlint last_insert_id // =0
WORD status
WORD num_of_warnings
string[EOF] message // ok message

The OK packet is the success response to an operation which returns no rowset, like 'insert into', 'set variable', etc.

header for an OK packet is always 0 (that is the flag of an 'OK' packet).

rows_affected - number of rows affected by operation (like '10 rows' inserted, or '1 row' deleted, etc.)

last_insert_id - Manticore always returns 0 (real mysql returns the last ID used in operations with auto-incrementing values).

status - is a bitfield.

Manticore may use 2 bits to indicate:

FLAG_STATUS_AUTOCOMMIT = 2 // auto-commit is enabled
FLAG_MORE_RESULTS = 8 // if packet is just a part of big resultset

The rest of bits are zero.

num_of_warnings - obvious. That is just a number; messages themselves are not placed in this packet.

message - status message.

Example: an OK packet with 0 affected rows, AUTOCOMMIT enabled, and 0 warnings.

07 00 00 02 00 00 00 02 00 00 00
EOF packet
DWORD packet_header // described above
byte header // =0xFE
WORD num_of_warnings
WORD status

The EOF packet looks similar to the OK packet. It terminates any continuous sequence of packets, such as the list of header columns, or the whole resultset.

Note that the header byte 0xFE is the same as the prefix for 32-bit numbers, so this byte right after the packet header may occur both at the beginning of an EOF packet and as part of the column count of a resultset header, where a packed int with that count follows right after the packet header. You must check whether the packet length is less than 9 to make sure that it is an EOF packet.

Example: EOF packet with 0 warnings, AUTOCOMMIT enabled:

05 00 00 05 FE 00 00 02 00
ERROR packet

It contains no rowset, but an error message. Manticore implements this format:

DWORD packet_header // described above
byte header // 0xFF
WORD error_code
string[EOF] message // error message

header for an ERROR packet is always 0xFF.

error_code contains one of:

UNKNOWN_COM_ERROR = 1047
SERVER_SHUTDOWN = 1053
PARSE_ERROR = 1064
FIELD_SPECIFIED_TWICE = 1110
NO_SUCH_TABLE = 1146
TOO_MANY_USER_CONNECTIONS = 1203

message - error message. It begins with a fixed 6-char prefix, depending on error_code:

  • "#08S01" - for SERVER_SHUTDOWN and UNKNOWN_COM_ERROR codes
  • "#42S02" - for NO_SUCH_TABLE code
  • "#42000" - for the rest of the codes (PARSE_ERROR, FIELD_SPECIFIED_TWICE and TOO_MANY_USER_CONNECTIONS)
FIELD packet

That is a packet describing one field (column) of a table: its name, type, and size. It has the following structure:

DWORD packet_header // described above
string[lenenc] def // ="def"
string[lenenc] db // =""
string[lenenc] table // =""
string[lenenc] org_table // =""
string[lenenc] name
string[lenenc] org_name // same as name
byte filler // =12
WORD charset_nr // =0x0021, means utf8
DWORD column_length 
byte type
WORD flags
byte decimals // =0
WORD filler // =0

All fixed values are defined by the mysql proto description and are just mysql-specific. Manticore sets the following fields of the packet:

name - name of the column.

org_name - again, the name of the column. Manticore sets both name and org_name to one and the same string.

type - type of the column. That is one of:

MYSQL_COL_DECIMAL = 0
MYSQL_COL_LONG = 3
MYSQL_COL_FLOAT = 4
MYSQL_COL_LONGLONG = 8
MYSQL_COL_STRING = 254

column_length - length of the column. It depends on the type and is:

  • 11 for MYSQL_COL_LONG
  • 20 for MYSQL_COL_DECIMAL, MYSQL_COL_FLOAT and MYSQL_COL_LONGLONG
  • 255 for MYSQL_COL_STRING

flags - is the logical part of the type. Manticore uses only 1 bit, which describes whether a numeric column is signed or not. So the whole value may be either 0 (column is signed) or 32 (column is unsigned).

RESULTSET

A resultset contains table data from the daemon. It is not one packet, but a sequence of packets, first describing the schema of the resultset, and then the data itself.

The first packet in the sequence determines the number of columns in the resultset. It is tiny and contains just the header and a mysqlint number of columns:

DWORD packet_header // described above
mysqlint column_count

The possible collision between the first 0xFE byte as the flag of an EOF packet and the same byte as part of the mysqlint encoding of column_count has already been described above.

Then follow the FIELD packets, according to the number of columns passed in the first packet.

Then follows an EOF packet.

These 3+ packets determine the table's header. After this header the data itself may follow.

Each data packet contains the values of a whole row of a table and consists of the usual packet_header, followed by the data itself. The data is:

  • NULL values are sent as 0xFB
  • all other values are converted into a string and sent as string[lenenc].

After all data packets follows an EOF packet.

Clone this wiki locally