The Odessa query language lets users find word patterns in documents in the Odessa document collection. Not all character patterns are English words (they may contain numbers or other symbols), so they are called terms here. The query language can find short phrases of up to three consecutive terms, or two such phrases close to each other in a document. Odessa supports umlauts, but with very few exceptions the umlauts in the documents have been replaced by their -e equivalents (ae, oe, ue), and ß has been replaced by ss. Thus, umlauts should not be used in query terms.
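As an illustration of the umlaut convention, here is a minimal Python sketch that rewrites a term the way the documents were transcribed before it is used in a query. The helper name normalize_term is hypothetical and is not part of Odessa.

```python
# Hypothetical helper (not part of Odessa): rewrite umlauts and ß the way
# the documents in the collection were transcribed.
_REPLACEMENTS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def normalize_term(term: str) -> str:
    """Return the term with each umlaut or ß replaced by its ASCII equivalent."""
    return "".join(_REPLACEMENTS.get(ch, ch) for ch in term)


print(normalize_term("Müller"))  # Mueller
print(normalize_term("Weiß"))    # Weiss
```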
When we refer to the distance between terms or phrases, remember that we are counting terms. Punctuation and whitespace are ignored, so two terms may be considered adjacent even if they are several lines apart in a document.
If you search for Fred Bieber, you will find only occurrences of Fred Bieber, and not Fred W. Bieber or Bieber, Fred, because you are searching for a particular phrase. If you want to retrieve all three of these, use a proximity search by entering the query Fred w/2 Bieber. This means: find every fred within 2 words of bieber, in either order. What you find will be only as good as your query.
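The way distance is counted can be sketched in Python. This is only an approximation of what Odessa does: the functions tokenize and term_distance are hypothetical, and the sketch assumes that terms are runs of letters, digits, and underscores and that matching ignores case.

```python
import re
from typing import List, Optional


def tokenize(text: str) -> List[str]:
    """Split text into lowercase terms; punctuation and whitespace are dropped.

    Assumption: a term is a run of ASCII letters, digits, or underscores.
    """
    return re.findall(r"[a-z0-9_]+", text.lower())


def term_distance(text: str, first: str, second: str) -> Optional[int]:
    """Smallest distance, counted in terms, between the two terms (either order)."""
    terms = tokenize(text)
    pos_a = [i for i, t in enumerate(terms) if t == first.lower()]
    pos_b = [i for i, t in enumerate(terms) if t == second.lower()]
    if not pos_a or not pos_b:
        return None
    return min(abs(a - b) for a in pos_a for b in pos_b)


print(term_distance("Fred W. Bieber", "Fred", "Bieber"))   # 2
print(term_distance("Bieber, Fred", "Fred", "Bieber"))     # 1
print(term_distance("Fred\n\n\nBieber", "Fred", "Bieber")) # 1
```

All three distances are at most 2, which is why Fred w/2 Bieber retrieves all three forms, while the phrase query Fred Bieber only matches text in which the two terms are adjacent and in that order.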
To make an Odessa query, enter the query into the text box and select the document collection in the listbox beneath it. Some documents may be included in several collections. For example, some land records may also be included in the Prussia-Poland collection.
- A query consists of a single phrase or two phrases in spatial proximity.
- A phrase consists of 1, 2, or 3 consecutive terms.
- A query term must either be all-numeric or begin with a letter, in which case subsequent characters may be letters, numerals, or underscores.
- Query terms may end in the wildcards % and *. Wildcards may appear only at the end of a term, but may be combined there in any order. % matches any single character, and * matches a string of any length, including an empty one.
- At least one term in a query must begin with a letter.
- There are two proximity operators which separate phrases; these have the form w/n and b/n.
phrase1 b/n phrase2 means phrase1 must precede phrase2 by at most n words (at most n-1 words between the phrases).
phrase1 w/n phrase2 means phrase1 must occur within n words of phrase2, in either order (at most n-1 words between the phrases).
- xyz matches all occurrences of xyz
- xyz* matches all terms of all lengths that begin with xyz
- xyz%%% matches all 6-letter terms beginning with xyz
- xyz%%* matches all terms of at least 5 letters that begin with xyz
- In xy*z, x%%yz, and xy%*z, however, % and * would not be interpreted as wildcards because they do not appear at the end of the term
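The wildcard rules above can be sketched as a translation into a regular expression. This is an illustration, not Odessa's implementation: the names term_to_regex and matches are hypothetical, and case-insensitive matching is an assumption.

```python
import re


def term_to_regex(term: str):
    """Translate a query term with trailing wildcards into a compiled regex.

    Only the run of % and * at the very end of the term is treated as
    wildcards; any % or * earlier in the term is kept as a literal character.
    """
    stem = term.rstrip("%*")      # everything before the trailing wildcards
    suffix = term[len(stem):]     # the trailing run of % and *
    wild = "".join("." if ch == "%" else ".*" for ch in suffix)
    # Assumption: matching ignores case.
    return re.compile(re.escape(stem) + wild, re.IGNORECASE)


def matches(term: str, word: str) -> bool:
    """True if the whole word matches the query term."""
    return term_to_regex(term).fullmatch(word) is not None


print(matches("xyz*", "xyz"))       # True
print(matches("xyz%%%", "xyzabc"))  # True
print(matches("xyz%%%", "xyzab"))   # False
print(matches("xyz%%*", "xyza"))    # False
print(matches("xy*z", "xyabcz"))    # False
```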
The first example consists of a single term with no wildcards, and the second example consists of a single phrase of 3 terms. The third example locates all occurrences of terms of 4 characters or more, beginning with Joh, that occur within 3 terms of Naasz. This query would locate Johann and Magdelina (Naasz); Naasz, Johann; and John Naasz, but not John and Magdalena Fischer-Naasz, since there the two query terms are 4 terms apart. The fourth example is similar to the third, except that it would locate phrases like Roger W. Ehrich in a file's copyright header, which contains the word copyright. The next example would find occurrences of the term baerg in files prepared by Elli Wise, since her name would be in the copyright notice preceding the query term. The last example would find any occurrence of Naasz within 2 words of any term beginning with 187, such as a year in the 1870s.
- Catherine the Great
- Joh%* w/3 Naasz
- Rog%% W. Ehrich w/20 copy*
- wise b/100000 baerg
- Naasz w/2 187*
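The proximity examples above can be checked against a small Python sketch of the w/n and b/n operators. It is a sketch under stated assumptions rather than Odessa's actual algorithm: all function names are hypothetical, matching is assumed to ignore case, the distance between two phrases is assumed to be measured from the last term of the earlier phrase to the first term of the later one, and each phrase is passed as a list of terms without punctuation.

```python
import re
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase terms only; punctuation and whitespace are ignored."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def term_regex(term: str):
    """Trailing % matches one character, trailing * matches any string."""
    stem = term.rstrip("%*")
    tail = term[len(stem):]
    wild = "".join("." if ch == "%" else ".*" for ch in tail)
    return re.compile(re.escape(stem.lower()) + wild)


def phrase_spans(terms: List[str], phrase: List[str]) -> List[Tuple[int, int]]:
    """(first, last) term indexes where the phrase matches consecutive terms."""
    patterns = [term_regex(t) for t in phrase]
    spans = []
    for start in range(len(terms) - len(patterns) + 1):
        window = terms[start:start + len(patterns)]
        if all(p.fullmatch(w) for p, w in zip(patterns, window)):
            spans.append((start, start + len(patterns) - 1))
    return spans


def proximity(text: str, phrase1: List[str], op: str, n: int,
              phrase2: List[str]) -> bool:
    """op 'b': phrase1 precedes phrase2 by at most n terms; op 'w': either order."""
    terms = tokenize(text)
    for s1, e1 in phrase_spans(terms, phrase1):
        for s2, e2 in phrase_spans(terms, phrase2):
            if 0 < s2 - e1 <= n:                 # phrase1 comes first
                return True
            if op == "w" and 0 < s1 - e2 <= n:   # phrase2 comes first
                return True
    return False


# Joh%* w/3 Naasz
print(proximity("Johann and Magdelina (Naasz)", ["Joh%*"], "w", 3, ["Naasz"]))      # True
print(proximity("Naasz, Johann", ["Joh%*"], "w", 3, ["Naasz"]))                     # True
print(proximity("John and Magdalena Fischer-Naasz", ["Joh%*"], "w", 3, ["Naasz"]))  # False
```

With b instead of w, the text Naasz, Johann would no longer match, because the first phrase must then come before the second.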
Feel free to try out these examples. Keep in mind, though, that you won't find matches in all file categories.
The output presentation depends on the file type. For free-format text files, results usually appear in context, while for tables, the results usually appear without context. Each data display is preceded by a hyperlink to the source file. The number after the word File: is the file length in bytes.