Elasticsearch takes all of our content, cleans it (stemming, HTML stripping, etc.), and then stores it in an inverted index, a structure that maps each term to the items that contain it.
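As a rough illustration of this cleaning and indexing step, the sketch below strips HTML tags, lowercases and tokenizes the text, applies a deliberately crude plural-stripping "stemmer", and records how often each term appears in each item. All function names are hypothetical. Elasticsearch does this work with configurable analyzers (for example the `html_strip` character filter, the `standard` tokenizer, and stemming token filters), which are far more thorough than this.

```python
import re
from collections import defaultdict

TAG_RE = re.compile(r"<[^>]+>")       # matches HTML tags such as <p> or </p>
TOKEN_RE = re.compile(r"[a-z0-9]+")   # runs of lowercase letters and digits


def naive_stem(token: str) -> str:
    """Crude plural stripping; a stand-in for a real stemmer."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def analyze(text: str) -> list[str]:
    text = TAG_RE.sub(" ", text)               # HTML stripping
    tokens = TOKEN_RE.findall(text.lower())    # lowercase and tokenize
    return [naive_stem(t) for t in tokens]     # stemming


def build_inverted_index(docs: dict[str, str]) -> dict[str, dict[str, int]]:
    """Map each term to {item id: number of occurrences in that item}."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return dict(index)


docs = {"a": "<p>Council Meetings</p>", "b": "Meeting agenda and meeting minutes"}
index = build_inverted_index(docs)
print(index["meeting"])  # {'a': 1, 'b': 2}
```

Because "Meetings" and "meeting" reduce to the same term, a search for either finds both items.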
Search uses three (3) fields:
- _all: essentially one very large string built from all of an item's indexed content (non-module widget content, documents, module items, etc.). A search "hits" on this data based on occurrence and relevancy metrics.
- Title
- Tags
A hit on the title or tags carries more weight (see Search Result Weights) than a hit on the _all string.
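A request that searches these three fields could look roughly like the sketch below, which uses Elasticsearch's `field^boost` syntax to weight title and tag hits with the values listed under Search Result Weights. The `title` and `tags` field names, the function name, and the `most_fields` query type are assumptions; the production query is not documented here. Note also that `_all` exists only in older Elasticsearch versions (it was deprecated in 6.0 and removed in 7.0).

```python
import json


def build_search_query(user_query: str) -> dict:
    """Hypothetical request body matching the three search fields."""
    return {
        "query": {
            "multi_match": {
                "query": user_query,
                # Assumption: "most_fields" adds up the scores of every field
                # that matches, so a title or tag hit raises the score on top
                # of an _all hit.
                "type": "most_fields",
                # field^N boosts that field's score by a factor of N.
                "fields": ["_all", "title^1.5", "tags^1.25"],
            }
        }
    }


print(json.dumps(build_search_query("council agenda"), indent=2))
```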
Search Result Weights
Only two things receive a weighting factor (a boost) in search results: titles and keywords. Everything else is ranked by how frequently the search term appears.
The search tool treats all of an item's content as one long string and counts how many times a term appears in content areas, descriptions, widgets, etc. Items that reference the term more often rank higher. If the term also appears in an item's title or keywords, the item ranks higher still.
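The effect of the boosts can be illustrated with a deliberately simplified model: count how often the term appears in each field and multiply each count by that field's weight, using 1.0 for the _all string and the Title (1.5) and Tags/keywords (1.25) weights this section lists. The field keys and function names are hypothetical, and real Elasticsearch scoring (BM25) also accounts for term rarity and field length, so the numbers are illustrative only.

```python
# Simplified relevance model: raw term counts per field, multiplied by the
# field's boost. Not Elasticsearch's actual BM25 scoring.
BOOSTS = {"all": 1.0, "title": 1.5, "tags": 1.25}


def toy_score(term: str, item: dict[str, list[str]]) -> float:
    """item maps a field name to its list of analyzed terms."""
    return sum(boost * item.get(field, []).count(term)
               for field, boost in BOOSTS.items())


def rank(term: str, items: dict[str, dict[str, list[str]]]) -> list[str]:
    """Return item names ordered from highest to lowest toy score."""
    return sorted(items, key=lambda name: toy_score(term, items[name]), reverse=True)


items = {
    "Parks and Recreation": {"all": ["park", "hour", "park"],
                             "title": ["park", "and", "recreation"]},
    "Newsletter": {"all": ["park", "park", "park"]},
}
print(rank("park", items))  # ['Parks and Recreation', 'Newsletter']
```

Here the item with the term in its title (score 3.5) outranks the one that mentions it more often in the body alone (score 3.0).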
- Title = 1.5
- Tags/keywords = 1.25
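- Decayed modules:
  - Agenda Center
  - Community Connection
  - Community Voice
  - Photo Gallery
  - Real Estate Locator
- Pages do not decay.
- See the Elastic Guide for Decay Functions.
- We use an exponential decay with these settings:
  - "Scale": "21d"
  - "Offset": "21d"
  - "Decay": 0.9
  - In effect, an item within 21 days of its decay date keeps its full score; an item 42 days away (offset + scale) has its score multiplied by 0.9, and the multiplier keeps shrinking exponentially beyond that.
- Each module decays on its own date:
  - Calendar decays on Event Date; Agenda Center decays on Meeting Date.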
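Elasticsearch's exponential decay multiplies a score by exp(λ · max(0, |value − origin| − offset)), where λ = ln(decay) / scale. The sketch below computes that multiplier in days with the settings above and shows what a matching `exp` clause inside a `function_score` query could look like. The `event_date` field name, the `"now"` origin, the inner `match` query, and the function name are assumptions for illustration.

```python
import math
from datetime import date, timedelta

SCALE_DAYS = 21   # "scale": "21d"
OFFSET_DAYS = 21  # "offset": "21d"
DECAY = 0.9       # "decay": 0.9


def exp_decay(item_date: date, origin: date) -> float:
    """Multiplier applied to an item's relevance score (1.0 = no penalty)."""
    distance = abs((item_date - origin).days)
    # No penalty until the item is more than OFFSET_DAYS away from the origin.
    excess = max(0, distance - OFFSET_DAYS)
    # lambda is chosen so the multiplier equals DECAY at OFFSET_DAYS + SCALE_DAYS.
    lam = math.log(DECAY) / SCALE_DAYS
    return math.exp(lam * excess)


# Hypothetical function_score request wrapping a text query with the decay.
DECAY_QUERY = {
    "query": {
        "function_score": {
            "query": {"match": {"_all": "council"}},
            "functions": [
                {"exp": {"event_date": {"origin": "now", "scale": "21d",
                                        "offset": "21d", "decay": 0.9}}}
            ],
        }
    }
}

today = date(2024, 1, 1)
for days in (10, 42, 63, 126):
    print(days, round(exp_decay(today - timedelta(days=days), today), 3))
```

The printed multipliers are roughly 1.0, 0.9, 0.81, and 0.59. Because the distance is absolute, an item 63 days in the future (for example, a far-off Calendar event) is scaled the same as one 63 days in the past.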
Items in Document Center, Pages, Facilities, Form Center, Forms, Staff Directory, FAQs, and Notify Me have no decay date. We believe that dropping items from search results after a fixed amount of time would actually hurt result quality, because a document from 2010 may still be the most accurate and relevant item a site user needs. For these modules, therefore, the date is not a factor in search results.
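The per-module behavior can be sketched as a final step that applies the decay multiplier only when an item's module has a decay date. The module names come from this article; the `published`, `event_date`, and `meeting_date` keys, the function names, and the choice to raise an error for decayed modules whose date field this article does not name are all hypothetical.

```python
import math
from datetime import date

# Modules whose items never decay, per this article.
NO_DECAY_MODULES = {
    "Document Center", "Pages", "Facilities", "Form Center",
    "Forms", "Staff Directory", "FAQs", "Notify Me",
}

# Date each decayed module decays on. Only these two are named in the
# article; the key names are hypothetical.
DECAY_DATE_FIELD = {"Calendar": "event_date", "Agenda Center": "meeting_date"}


def decay_multiplier(days_away: int, scale: int = 21, offset: int = 21,
                     decay: float = 0.9) -> float:
    """Exponential decay with the scale/offset/decay settings in this article."""
    return math.exp(math.log(decay) / scale * max(0, days_away - offset))


def final_score(module: str, item: dict, relevance: float, today: date) -> float:
    if module in NO_DECAY_MODULES:
        return relevance  # the item's age plays no part in ranking
    field = DECAY_DATE_FIELD.get(module)
    if field is None:
        raise ValueError(f"no decay date field mapped for {module!r}")
    days_away = abs((item[field] - today).days)
    return relevance * decay_multiplier(days_away)


today = date(2024, 6, 1)
old_doc = {"published": date(2010, 3, 15)}
old_meeting = {"meeting_date": date(2024, 3, 1)}
print(final_score("Document Center", old_doc, 5.0, today))  # 5.0
print(final_score("Agenda Center", old_meeting, 5.0, today))
```

The 2010 document keeps its full relevance score, while the three-month-old meeting is pushed down.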