This is a list of functions you can use in Solr queries, lifted from the official docs[1] so you can find them more easily. (I’ve added several that are missing from the official docs.)
Function | Description | Syntax Examples |
---|---|---|
abs | Returns the absolute value of the specified value or function. | abs(x) abs(-5) |
and | Returns a value of true if and only if all of its operands evaluate to true. | and(not(exists(popularity)),exists(price)): returns true for any document which has a value in the price field, but does not have a value in the popularity field |
“constant” | Specifies a floating point constant. | 1.5 |
currency | Converts a currency into a numeric value usable for sorting – in other words, if you have multiple dollar values, they are converted into a common form, usable for sorting. Requires a currencyField in the index and a source of exchange rates. See official docs for CurrencyField. | currency(10,USD) |
def | def is short for default. Returns the value of field “field”, or if the field does not exist, returns the default value specified. In other words, it yields the first value where exists()==true. | def(rating,5): returns the rating, or 5 if no rating is specified in the doc. def(myfield,1.0): equivalent to if(exists(myfield),myfield,1.0) |
div | Divides one value or function by another. div(x,y) divides x by y. | div(1,y) div(sum(x,100),max(y,1)) |
dist | Return the distance between two vectors (points) in an n-dimensional space. Takes in the power, plus two or more ValueSource instances and calculates the distances between the two vectors. Each ValueSource must be a number. There must be an even number of ValueSource instances passed in and the method assumes that the first half represent the first vector and the second half represent the second vector. | dist(2, x, y, 0, 0): calculates the Euclidean distance between (0,0) and (x,y) for each document dist(1, x, y, 0, 0) : calculates the Manhattan (taxicab) distance between (0,0) and (x,y) for each document dist(2, x,y,z,0,0,0): Euclidean distance between (0,0,0) and (x,y,z) for each document. dist(1,x,y,z,e,f,g) : Manhattan distance between (x,y,z) and (e,f,g) where each letter is a field name |
docfreq(field,val) | Returns the number of documents that contain the term in the field. This is a constant (the same value for all documents in the index). You can quote the term if it’s more complex, or do parameter substitution for the term value. | docfreq(text,'solr') ...&defType=func&q=docfreq(text,$myterm)&myterm=solr |
exists | Returns TRUE if any member of the field exists. | exists(author): returns TRUE for any document that has a value in the “author” field. exists(query(price:5.00)): returns TRUE if “price” matches “5.00”. |
field | Returns the numeric docValues or indexed value of the field with the specified name. In its simplest (single-argument) form, this function can only be used on single-valued fields, and can be called using the name of the field as a string, or, for most conventional field names, by using the field name by itself without the field(...) syntax. When using docValues, an optional 2nd argument can be specified to select the min or max value of a multivalued field. 0 is returned for documents without a value in the field. | These 3 examples are all equivalent: The last form is convenient when your field name is atypical: For multivalued docValues fields: |
hsin | The Haversine distance calculates the distance between two points on a sphere when traveling along the sphere. The values must be in radians. hsin also takes a Boolean argument to specify whether the function should convert its output to radians. | hsin(2, true, x, y, 0, 0) |
idf | Inverse document frequency; a measure of whether the term is common or rare across all documents. Obtained by dividing the total number of documents by the number of documents containing the term, and then taking the logarithm of that quotient. See also tf. | idf(fieldName,'solr'): measures the inverse of the frequency of the occurrence of the term 'solr' in fieldName. |
if | Enables conditional function queries. In if(test,value1,value2), value1 is returned when test evaluates to true, and value2 otherwise. The test expression can be any function that outputs boolean values. Functions returning numeric values also work, in which case 0 is interpreted as false, as do strings, in which case the empty string is interpreted as false. | if(termfreq(cat,'electronics'),popularity,42): checks each document to see whether it contains the term “electronics” in the cat field. If it does, the value of the popularity field is returned; otherwise 42 is returned. |
linear | Implements m*x+c where m and c are constants and x is an arbitrary function. This is equivalent to sum(product(m,x),c), but slightly more efficient as it is implemented as a single function. | linear(x,m,c) linear(x,2,4): returns 2*x+4 |
literal | Returns a constant value | literal(5) – returns “5” |
log | Returns the log base 10 of the specified function. | log(x) |
map | Maps any values of an input function x that fall within min and max inclusive to the specified target. The arguments min and max must be constants. The arguments target and default can be constants or functions. If the value of x does not fall between min and max, then either the value of x is returned, or a default value is returned if specified as a 5th argument. | map(x,min,max,target) map(x,0,0,1): changes any values of 0 to 1. This can be useful in handling default 0 values. |
max | Returns the maximum numeric value of multiple nested functions or constants, which are specified as arguments: max(x,y,...). The max function can also be useful for “bottoming out” another function or field at some specified constant. (Use the optional second argument of the field function to select the maximum value of a single multivalued field.) | max(myfield,myotherfield,0) |
maxdoc | Returns the number of documents in the index, including those that are marked as deleted but have not yet been purged. This is a constant (the same value for all documents in the index). | maxdoc() |
min | Returns the minimum numeric value of multiple nested functions or constants, which are specified as arguments: min(x,y,...). The min function can also be useful for providing an “upper bound” on a function using a constant. (Use the optional second argument of the field function to select the minimum value of a single multivalued field.) | min(myfield,myotherfield,0) |
mod | Divide one number by another, and return the remainder | mod(myfield,7) – returns numbers ranging from 0 to 6 |
ms | Returns milliseconds of difference between its arguments. Dates are relative to the Unix or POSIX time epoch, midnight, January 1, 1970 UTC. Arguments may be the name of an indexed TrieDateField, or date math based on a constant date or NOW. | ms(NOW/DAY) ms(2000-01-01T00:00:00Z) ms(mydatefield) ms(NOW,mydatefield) ms(mydatefield,2000-01-01T00:00:00Z) ms(datefield1,datefield2) |
norm(field) | Returns the “norm” stored in the index for the specified field. This is the product of the index time boost and the length normalization factor, according to the Similarity for the field. | norm(fieldName) |
not | The logically negated value of the wrapped function. | not(exists(author)): TRUE only when exists(author) is false. |
numdocs | Returns the number of documents in the index, not including those that are marked as deleted but have not yet been purged. This is a constant (the same value for all documents in the index). | numdocs() |
or | A logical disjunction. | or(value1,value2): TRUE if either value1 or value2 is true. |
ord | Returns the ordinal of the indexed field value within the indexed list of terms for that field in Lucene index order (lexicographically ordered by Unicode value), starting at 1. In other words, for a given field, all values are ordered lexicographically; this function then returns the offset of a particular value in that ordering. The field must have a maximum of one value per document (not multi-valued). 0 is returned for documents without a value in the field. See also rord. | ord(myIndexedField) Example: if there were only three values (“apple”, “banana”, “pear”) for a particular field X, then ord(X) would be 1 for documents containing “apple”, 2 for documents containing “banana”, etc. |
pow | Raises the specified base to the specified power. pow(x,y) raises x to the power of y. | pow(x,y) pow(x,log(y)) pow(x,0.5): the same as sqrt |
product | Returns the product of multiple values or functions, which are specified in a comma-separated list. mul(...) may also be used as an alias for this function. | product(x,y,...) product(x,2) product(x,y) |
query | Returns the score for the given subquery, or the default value for documents not matching the query. Any type of subquery is supported through either parameter de-referencing $otherparam or direct specification of the query string in the Local Parameters through the v key. | query(subquery,default) q=product(popularity,query({!dismax v='solr rocks'})): returns the product of the popularity and the score of the DisMax query. q=product(popularity,query($qq))&qq={!dismax}solr rocks: equivalent to the previous query, using parameter de-referencing. q=product(popularity,query($qq,0.1))&qq={!dismax}solr rocks: specifies a default score of 0.1 for documents that don’t match the DisMax query. |
recip | Performs a reciprocal function with recip(x,m,a,b) implementing a/(m*x+b) where m, a, and b are constants, and x is any arbitrarily complex function. When a and b are equal, and x>=0, this function has a maximum value of 1 that drops as x increases. Increasing the value of a and b together results in a movement of the entire function to a flatter part of the curve. These properties can make this an ideal function for boosting more recent documents when x is rord(datefield). | recip(myfield,m,a,b) recip(rord(creationDate),1,1000,1000) |
rord | Returns the reverse ordering of that returned by ord. | rord(myDateField) |
scale | Scales values of the function x such that they fall between the specified minTarget and maxTarget inclusive. The current implementation traverses all of the function values to obtain the min and max, so it can pick the correct scale. It cannot distinguish documents that have been deleted from documents that have no value, and uses 0.0 for both cases. This means that if values are normally all greater than 0.0, one can still end up with 0.0 as the min value to map from. In these cases, an appropriate map() function could be used as a workaround to change 0.0 to a value in the real range. | scale(x,minTarget,maxTarget) scale(x,1,2): scales the values of x such that all values will be between 1 and 2 inclusive. |
sqedist | The Square Euclidean distance calculates the 2-norm (Euclidean distance) but does not take the square root, thus saving a fairly expensive operation. It is often the case that applications that care about Euclidean distance do not need the actual distance, but instead can use the square of the distance. There must be an even number of ValueSource instances passed in and the method assumes that the first half represent the first vector and the second half represent the second vector. | sqedist(x_td, y_td, 0, 0) |
sqrt | Returns the square root of the specified value or function. | sqrt(x) sqrt(100) sqrt(sum(x,100)) |
strdist | Calculates the distance between two strings. Uses the Lucene spell checker StringDistance interface and supports all of the implementations available in that package, plus allows applications to plug in their own via Solr’s resource loading capabilities. strdist takes (string1, string2, distance measure). Possible values for distance measure are: jw (Jaro-Winkler); edit (Levenshtein or edit distance); ngram (the NGramDistance; if specified, you can optionally pass in the n-gram size too, which defaults to 2); FQN (the fully qualified class name of an implementation of the StringDistance interface, which must have a no-arg constructor). | strdist("SOLR",id,edit) |
sub | Returns x-y from sub(x,y). | sub(myfield,myfield2) sub(100, sqrt(myfield)) |
sum | Returns the sum of multiple values or functions, which are specified in a comma-separated list. add(...) may be used as an alias for this function. | sum(x,y,...) sum(x,1) sum(x,y) sum(sqrt(x),log(y),z,0.5) |
sumtotaltermfreq | Returns the sum of totaltermfreq values for all terms in the field in the entire index (i.e., the number of indexed tokens for that field). sttf may be used as an alias. | If doc1:(fieldX:A B C) and doc2:(fieldX:A A A A): docFreq(fieldX:A) = 2 (A appears in 2 docs); freq(doc2, fieldX:A) = 4 (A appears 4 times in doc2); totalTermFreq(fieldX:A) = 5 (A appears 5 times across all docs); sumTotalTermFreq(fieldX) = 7 (in fieldX, there are 5 As, 1 B, 1 C) |
termfreq | Returns the number of times the term appears in the field for that document. | termfreq(text,'memory') |
tf | Term frequency; returns the term frequency factor for the given term, using the Similarity for the field. The tf-idf value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word across all documents, which helps to control for the fact that some words are generally more common than others. See also idf. | tf(text,'solr') |
top | Causes the function query argument to derive its values from the top-level IndexReader containing all parts of an index. For example, the ordinal of a value in a single segment will be different from the ordinal of that same value in the complete index. | |
totaltermfreq | Returns the number of times the term appears in the field in the entire index. ttf may be used as an alias. | ttf(text,'memory') |
xor() | Logical exclusive disjunction: one or the other, but not both. | xor(field1,field2): returns TRUE if either field1 or field2 is true; FALSE if both are true. |
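
To make a few of the definitions above concrete, here is a rough Python sketch that mirrors the formulas given in the table for recip, map and dist, and recomputes the term statistics from the two-document example in the sumtotaltermfreq row. This is not Solr code: the helper names (`map_value`, `docfreq_a` and so on) are made up for illustration, and `dist` only handles positive finite powers here.

```python
from collections import Counter


def recip(x, m, a, b):
    """recip(x,m,a,b) as described in the table: a / (m*x + b)."""
    return a / (m * x + b)


def map_value(x, lo, hi, target, default=None):
    """map(x,min,max,target[,default]): values inside [lo, hi] become target."""
    if lo <= x <= hi:
        return target
    # Outside the range: keep x unless a default (5th argument) was given.
    return x if default is None else default


def dist(power, *coords):
    """dist(power, ...): the first half of coords is one point, the second half the other.

    Only positive finite powers are handled in this sketch.
    """
    if len(coords) % 2 != 0:
        raise ValueError("dist needs an even number of coordinates")
    half = len(coords) // 2
    diffs = [abs(p - q) for p, q in zip(coords[:half], coords[half:])]
    return sum(d ** power for d in diffs) ** (1.0 / power)


# Term statistics for the example: doc1 has "A B C", doc2 has "A A A A" in fieldX.
docs = {"doc1": ["A", "B", "C"], "doc2": ["A", "A", "A", "A"]}
counts = {name: Counter(tokens) for name, tokens in docs.items()}

docfreq_a = sum(1 for c in counts.values() if c["A"] > 0)  # docs containing A
termfreq_a_doc2 = counts["doc2"]["A"]                      # A within doc2 only
ttf_a = sum(c["A"] for c in counts.values())               # A across all docs
sttf = sum(sum(c.values()) for c in counts.values())       # all tokens in fieldX

print(recip(0, 1, 1000, 1000), recip(1000, 1, 1000, 1000))
print(map_value(0, 0, 0, 1), map_value(5, 0, 0, 1))
print(dist(2, 3, 4, 0, 0), dist(1, 3, 4, 0, 0))
print(docfreq_a, termfreq_a_doc2, ttf_a, sttf)
```

With a and b both set to 1000, recip starts at 1 for x=0 and has halved by x=1000, which is the "drops as x increases" behaviour the recip row describes.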
- [1] https://cwiki.apache.org/confluence/display/solr/Function+Queries