A Search DSL

Sherub Thakur
7 min readApr 27, 2017

If you have used google, slack, gmail, etc. you would have come across their search feature. A lot of apps support this feature. In this article I will try to create a search like feature using Haskell and Elasticsearch.

Prerequisites

  1. Understanding of Haskell programming language.
  2. Familiarity with Elasticsearch.

Elasticsearch

If you have gone through the ES docs you would have come across their bank query example. This is what we will be using as the base to what we are about to create. The steps to setup ES with bank data are covered in the docs so I won’t be going over them but the basic idea is we have documents of the following structure.

{
"account_number": 0,
"balance": 16623,
"firstname": "Bradshaw",
"lastname": "Mckenzie",
"age": 29,
"gender": "F",
"address": "244 Columbus Place",
"employer": "Euron",
"email": "bradshawmckenzie@euron.com",
"city": "Hobucken",
"state": "CO"
}

And we can query them using something like the following.

GET /bank/_search
{
"query": { "match": { "account_number": 20 } }
}

Which will return us all bank accounts (out of 1000) which have account number equal to 20. As you would have noticed that the search syntax uses JSON! I repeat JSON!! Again JSON!!! You would not want to subject even your developers to write that (let alone users). So we need a developer friendly way to do this. Enter ….

Haskell

Disclaimer: I am a Haskell beginner so the quality of code might be questionable to say the least.

Haskell has a really sweet Elasticsearch client and query DSL, Bloodhound. Like any other haskell library/framework, there aren’t enough tutorials or docs around this library as well. But you can take a look at this example. The basic idea is that instead of writing something like the aforementioned JSON atrocity you can write something like.

TermQuery (Term "account_number" "20") Nothing

which (again) will return all the bank accounts (out of 1000) which have account number equal to 20. I don’t know about you but that is certainly and improvement. Which we can subject developers to work with!

But we are far from the end state we want our users to type something like

account_number:20

which (well again) would return the data for account numbered 20.

Now we need a way to parse the query above to the query above² (I hope you get the point, and agreed shitty joke) and from that Bloodhound will convert that to the query above³. Enter Parsec. According to docs “Parsec is an industrial strength, monadic parser combinator library for Haskell”.

At last we have all the tools we need and we can get to building something with it. YAY!

The Bank Query Language

Ok so first let’s declare the types. All we want for our purpose is to declare some operators that we will use. :, <, <=, >=, > . And the query structure that we will support are string, number, date, and, or . And that is what we are defining below.

data Operator
= B_LessThan
| B_GreaterThan
| B_LessThanEqual
| B_GreaterThanEqual
| B_Equal
deriving (Show, Eq)
data BQL
= B_String Text Text
| B_Num Text Operator Double
| B_Date Text Operator UTCTime
| B_Not BQL
| B_And [BQL]
| B_Or [BQL]
deriving (Show, Eq)

Now let’s define some primitive Parser Combinators that we will find useful. It is more or less like boilerplate and can even be ignored.

-- Lexical token spec
bqlDef :: LanguageDef st
bqlDef =
emptyDef
{ commentStart = ""
, commentEnd = ""
, commentLine = ""
, nestedComments = False
, identStart = letter
, identLetter = alphaNum
, reservedNames = [ "and", "or", "in"]
, reservedOpNames = [ ":", "-", "=", ">", "<", ">=", "<="]
, caseSensitive = False
}
lexer = Token.makeTokenParser bqlDef
reserved = Token.reserved lexer
reservedOp = Token.reservedOp lexer
double = Token.float lexer
integer = Token.integer lexer
int = fromInteger <$> Token.integer lexer
parens = Token.parens lexer
identifier = fmap pack (Token.identifier lexer)

Now let’s get to some slightly more involved Combinators. We are only supporting one kind of date parsing for now of the form yyyy/mm/dd.

date :: Parser UTCTime
date =
let
dayParser = fromGregorian <$> integer <*> (char ‘-’ *> int) <*> (char ‘-’ *> int)
in
UTCTime <$> dayParser <*> pure (secondsToDiffTime 0)

And for string field values we will parse either everything until we see a space or we will parse everything inside quotes.

fieldValue :: Parser Text
fieldValue =
let
anyStr = many (satisfy (not . isSpace))
escape = (:) <$> char ‘\\’ <*> ((:) <$> oneOf “\\\”0nrvtbf” <*> pure [])
nonEscape = noneOf “\\\”\0\n\r\v\t\b\f”
character = fmap return nonEscape <|> escape
parseString = concat <$> between (char ‘“‘) (char ‘“‘) (many character)
in
fmap pack (parseString <|> anyStr)

Now let’s parse the operators that we will be using which is quite self explanatory.

operator :: Parser Operator
operator
= (reservedOp “>” *> pure B_GreaterThan)
<|> (reservedOp “<” *> pure B_LessThan)
<|> (reservedOp “<=” *> pure B_LessThanEqual)
<|> (reservedOp “>=” *> pure B_GreaterThanEqual)
<|> (reservedOp “:” *> pure B_Equal)

Now let’s start with parsing the base cases of the BQL data type.

B_String is an identifier followed by : which is followed by fieldValue .

B_Num is an identifier followed by any operator followed by an integer or a double.

B_Date is an identifier followed by any operator followed by a date.

customString :: Parser BQL
customString = B_String <$> (identifier <* reservedOp “:”) <*> fieldValue
customNum :: Parser BQL
customNum = B_Num <$> identifier <*> operator <*> (try double <|> fmap fromInteger integer)
customDate :: Parser BQL
customDate = B_Date <$> identifier <*> operator <*> date

With that we can create the basic definition for the search query language parens' and not' will be defined later.

bql' :: Parser BQL
bql' = parens'
<|> not'
<|> try customDate
<|> try customNum
<|> customString

Now let’s define the recursive cases in our data type. Here not that we are treating empty spaces as and expressions

-- bql' queries seperated by and or spaces
andQ :: Parser BQL
andQ = B_And <$> sepBy1 bql' (reserved "and" <|> spaces)
-- bql' queries or and queries seperated by or expressions
orQ :: Parser BQL
orQ = B_Or <$> sepBy1 (andQ <|> bql') (reserved "or")
-- `-` followed by a bql query
not' :: Parser BQL
not' = B_Not <$> (reservedOp "-" *> bql')
-- bql query inside of parenthesis
parens' :: Parser BQL
parens' = parens bql

And finally our definition for the query language for which all this was build. Are your ready!!

-- bql is nothing but an or query
bql :: Parser BQL
bql = orQ

Now that we have our query let’s also try to optimize it. i.e remove a lot of redundant and or blocks. Good news is Haskell’s pattern matching makes it a breeze. This code block is quite self explanatory.

optimizebql :: BQL -> BQL
optimizebql (B_Or [query]) = optimizebql query
optimizebql (B_And [query]) = optimizebql query
optimizebql (B_Or ors) = B_Or (map optimizebql ors)
optimizebql (B_And ands) = B_And (map optimizebql ands)
optimizebql (B_Not notq) = B_Not (optimizebql notq)
optimizebql query = query

Lastly we need to parse the mess that the user input’s

parseBQL :: String -> Either ParseError BQL
parseBQL = parse bql ""

And we’re done with parsing.

Bloodhound DSL Conversion

Now that we are able to parse in user input let’s convert that to bloodhound’s DSL and we will be done. yay

getRangeValue :: Operator -> Double -> RangeValue
getRangeValue B_LessThan value =
RangeDoubleLt (LessThan value)
getRangeValue B_GreaterThan value =
RangeDoubleGt (GreaterThan value)
getRangeValue B_LessThanEqual value =
RangeDoubleLte (LessThanEq value)
getRangeValue B_GreaterThanEqual value =
RangeDoubleGte (GreaterThanEq value)
getRangeValue B_Equal value =
RangeDoubleGteLte (GreaterThanEq value) (LessThanEq value)
getDateRange :: Operator -> UTCTime -> RangeValuegetDateRange B_LessThan value =
RangeDateLt (LessThanD value)
getDateRange B_GreaterThan value =
RangeDateGt (GreaterThanD value)
getDateRange B_LessThanEqual value =
RangeDateLte (LessThanEqD value)
getDateRange B_GreaterThanEqual value =
RangeDateGte (GreaterThanEqD value)
getDateRange B_Equal value =
RangeDateGteLte (GreaterThanEqD value) (LessThanEqD value)

And we now for the main query conversion

bqlToES :: BQL -> Query
bqlToES (B_String k v) =
QueryMatchQuery $ mkMatchQuery (FieldName k) (QueryString v)
bqlToES (B_Date k op v) =
QueryRangeQuery $ mkRangeQuery (FieldName k) (getDateRange op v)
bqlToES (B_Num k op v) =
QueryRangeQuery $ mkRangeQuery (FieldName k) (getRangeValue op v)
bqlToES (B_And qs) =
QueryBoolQuery $ mkBoolQuery (map bqlToElastic qs) [] []
bqlToES (B_Or qs) =
QueryBoolQuery $ mkBoolQuery [] [] (map bqlToElastic qs)
bqlToES (B_Not q) =
QueryBoolQuery $ mkBoolQuery [] [bqlToElastic q] []

Now let’s get to executing this query.

queryES' :: (BH IO Reply -> IO Reply) -> IndexName -> BQL -> IO ByteString
queryES' withBH' bankIndex bankQuery =
let
searchQ = bqlToElastic bankQuery
filterQ = Nothing
query = mkSearch (Just searchQ) filterQ
in
fmap responseBody (withBH' $ searchByIndex bankIndex query)
-- quick Hack! Please don't judge.
queryES :: (BH IO Reply -> IO Reply) -> IndexName -> BQL -> IO [Value]
queryES withBH' bankIndex =
fmap (fromRight' . fmap (fromJust . mapM hitSource . hits . searchHits) . eitherDecode) . queryES' withBH' bankIndex

And we are done.

Conclusion

We saw that how easy it is to create DSL’s using haskell and in that effect we created our own DSL around ES’s crappy search syntax with the help of Parsec and Bloodhound.

Resources

  1. You can find all the code and more in the bank-query-es repo.
  2. You can execute some queries here.
  3. There is even a UI written in elm. (NOTE: You will need to allow mixed content for this to work as the connection from github to the place where Haskell code is hosted is over http).

--

--