Search Syntax
Use the ZyLAB ONE search language to search for one or more keywords within a data set. You can search not only the keywords used in text, image or audio files, but also the keywords used in the information about these files (the metadata).
Search Language Techniques
View the ZyLAB search language techniques in the table below.
Operators | | Term Operators | |
---|---|---|---|
AND | | Fuzzy | ~n |
OR | | Wild Cards | ? |
NOT | | | * |
TO | | | [character(s)] |
IN fieldname{query} | | | [character-range] |
BOD, EOS, EOP, EOG, EOD | | | [^] |
Within | W/n | | + |
| W/n/term | | {m,n} |
| /n,m/ | | {m} |
Precedes | P/n | | {m,} |
| P/n/term | | |
Number Range | < | | |
| <= | | |
| = | | |
| <> | | |
| > | Field Filter | fieldname=query |
| >= | | |
Quorum | n of {term, term, ..} | | |
| | Exclude List of Terms from Fuzzy/Wild Card Query | fuzzy/wild card query - {exclude_term_1, ..., exclude_term_n} |
Please note that although some operators are written in capital letters, this is done only for clarity. The search engine does not differentiate between capital and lowercase letters.
There should always be a space between an operator and a keyword; otherwise, the operator and keyword will be seen as one term. These are correct: "NOT term", "not term". These are not correct: "NOTterm", "notterm".
However, when using parentheses to surround the keyword, no spaces are necessary: "NOT(term)". For more information, see the definition of Parentheses.
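The spacing and capitalization rules above can be illustrated with a small Python sketch. This is not the ZyLAB parser: the function `split_query` and the short `OPERATORS` set are hypothetical names chosen for illustration, and only a few operators from the table are included. It shows why "NOTterm" ends up as one term, while "NOT term" and "NOT(term)" are split into an operator and a keyword.

```python
import re

# Hypothetical subset of operators, for illustration only.
OPERATORS = {"AND", "OR", "NOT", "TO"}

def split_query(query):
    """Split a query into (kind, text) pairs on whitespace and parentheses."""
    # Parentheses are returned as separate pieces, so no space is needed around them.
    pieces = re.findall(r"[()]|[^\s()]+", query)
    result = []
    for piece in pieces:
        if piece in ("(", ")"):
            result.append(("PAREN", piece))
        elif piece.upper() in OPERATORS:
            # Operators are matched regardless of capitalization.
            result.append(("OPERATOR", piece.upper()))
        else:
            result.append(("TERM", piece.lower()))
    return result

print(split_query("NOT term"))
print(split_query("NOTterm"))
print(split_query("NOT(term)"))
```

In this sketch, "NOT term" and "not term" produce the same operator and term, whereas "NOTterm" is treated as the single term "notterm".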
- Use the Term operators on the right to enhance your search queries.
- Use the Boolean and Proximity operators on the left (AND, OR, NOT, etc.) to combine terms or phrases, making the search query more effective. Operators can be used to broaden or narrow your search. They can also be used to define your search query more precisely.
- Use the Field filter to search on field names.
- Use the Exclude List to exclude specific terms from your fuzzy or wild card query.
Search Results Explained
Once a search query has been executed, a result list appears. Retrieved terms (occurrences) are highlighted in the files. Of course, to be found, terms need to be present in the file. However, whether a term is retrieved also depends on the settings in the character map, the indexing structure and the tokenizer.
The tokenizer processes all files based on the character map. How this is done is explained here.
The building blocks of a text file are characters, (hyphenated) terms and phrases.
Characters are letters, numbers or symbols like %, @, &, ^, *, etc.
Terms are characters or words; they are unique entries in the dictionary with a separator on either side.
Phrases are two or more terms with no intervening operator.
Hyphenated terms (such as sugar-free) are two or more separate terms, connected with a hyphen. Each part of a hyphenated term has the same token id, given by the tokenizer. When searching for "sugar-free", you will only find instances of "sugar-free". When searching for "sugar free", you will get more results, including "sugar", "free", "sugarfree" and "sugar-free".
Token | I | like | sugar- | free | food | EOS | EOD |
---|---|---|---|---|---|---|---|
Token id | 1 | 2 | 3 | 3 | 4 | x | x |
A token id is the natural number or position of a token, given by the tokenizer. Token ids are used to determine the distance between the terms. Separators do not have token ids.
If a term or combination of terms you are searching for contains a hyphen, that term will be found, even if you did not include a hyphen in your search query. For example, when you search for 'email' or 'e mail', it will also find 'e-mail'. However, 'e-mail' will only retrieve 'e-mail'. In addition, 'e mail' will not find 'email' or the other way around ('email' will not find 'e mail').
The tokenizer extracts text from a file and produces tokens, based on the settings defined in the character map. Tokens can be anything between two separators. Tokens are the identified small parts that form or define a file. Tokens are not terms! For example, hyphenated terms all have the same token id, but are separate terms. And a separator (for example, EOD) can be a token, but not a term.
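The token id assignment from the table above can be sketched in Python. This is a simplified illustration, not the actual ZyLAB tokenizer: the function names `assign_token_ids` and `distance` are hypothetical, the input is assumed to be a single sentence that also ends the document, and measuring distance as the difference between token ids is an assumption based on the statement that token ids determine the distance between terms.

```python
import re

def assign_token_ids(sentence):
    """Return (token, token_id) pairs for one sentence that ends the document."""
    tokens = []
    next_id = 1
    for word in sentence.rstrip(".").split():
        # Split hyphenated words, keeping the hyphen on the leading part ("sugar-", "free").
        for part in re.findall(r"[^-]+-?", word):
            tokens.append((part, next_id))  # all parts of one word share the same id
        next_id += 1
    # Separators are tokens, but they do not get a token id.
    tokens.append(("EOS", None))
    tokens.append(("EOD", None))
    return tokens

def distance(tokens, first, second):
    """Assumed distance measure: absolute difference between two token ids."""
    ids = {tok.rstrip("-").lower(): tid for tok, tid in tokens if tid is not None}
    return abs(ids[first] - ids[second])

tokens = assign_token_ids("I like sugar-free food.")
print(tokens)
print(distance(tokens, "like", "food"))
```

For the example sentence, "sugar-" and "free" both receive token id 3, and EOS and EOD carry no id, matching the table.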
The character map determines which characters are used to separate terms, which characters are indexed, which ones are used for punctuation, etc. All possible characters that can be recognized and searched on are listed in the character map. By default, some characters are not indexed and will not be found unless the default character map is adjusted. How characters are defined in the character map influences the outcome of a search. For example, when brackets are set to be separators, the following text will be identified as 3 terms: 'most definite(ly)'.
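The effect of separator settings can be shown with a short Python sketch. It does not reproduce the real character map format; the function `split_terms` and its `separators` parameter are hypothetical and simply treat every listed character as a term separator.

```python
import re

def split_terms(text, separators):
    """Split text into terms, treating every character in `separators` as a separator."""
    pattern = "[" + re.escape(separators) + "]+"
    # Drop the empty strings that appear when the text starts or ends with a separator.
    return [term for term in re.split(pattern, text) if term]

# Brackets and spaces are separators: three terms.
print(split_terms("most definite(ly)", " ()"))
# Only spaces are separators: the brackets stay inside the second term.
print(split_terms("most definite(ly)", " "))
```

With brackets configured as separators, the text yields the three terms 'most', 'definite' and 'ly'. Without them, 'definite(ly)' remains a single term.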
For more information on the character map and how to configure it, please contact support (http://help.zylab.com).
In addition to the characters that the character map defines as separators, the tokenizer creates separators to mark the beginning of a document (BOD), the end of a sentence (EOS), the end of a paragraph (EOP), the end of a page (EOG) or the end of a document (EOD). You can search for the operators BOD, EOS, EOP, EOG and EOD.
Tip: When searching for EOD, the query returns all files with nothing highlighted. Since each file has an EOD token, it is an easy query to find all files in a data set.
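Why a search for EOD returns every file can be illustrated with a toy index in Python. This is only a sketch under simple assumptions: the functions `build_index` and `search` are hypothetical, files are plain strings split on whitespace, and the real indexing structure is not described here.

```python
def build_index(files):
    """Map each token to the set of file names that contain it."""
    index = {}
    for name, text in files.items():
        # Every file gets an EOD token, even when it contains no text.
        tokens = text.lower().split() + ["EOD"]
        for token in tokens:
            index.setdefault(token, set()).add(name)
    return index

def search(index, term):
    """Look up a term; the EOD separator is matched regardless of capitalization."""
    key = "EOD" if term.upper() == "EOD" else term.lower()
    return sorted(index.get(key, set()))

files = {"a.txt": "sugar free food", "b.txt": "plain text", "c.txt": ""}
index = build_index(files)
print(search(index, "EOD"))
print(search(index, "sugar"))
```

Because each file ends with an EOD token, the EOD lookup lists all three files, including the empty one, while an ordinary term only lists the files that contain it.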