Searching Drupal
Damien McKenna
Searching Drupal
- Taken for granted
- Assumption that "it'll just work"
But first.. a story
- Hired by Bonnier for a 3-month Drupal 5 project
- Short development cycle
- Made assumptions
- Made compromises
A story, continued
- One major assumption...
- "Search will work good enough"
- "Tweak later"
- Put another way...
- "Search will work"
A story, continued
- Launched site
- Seemed OK, could find results
- Complaints of search missing content
- 57,000 nodes - articles, images, etc
- 7,000 nodes indexed
A story, continued
- Dug around, asked around
- Mike Anello found it..
A story, continued
- D5 search engine indexing flawed
- Indexing tracks last timestamp, last nid, last comment timestamp...
- Kludgy
- If data was converted, strong chance of missing some nodes
- Out of 57,000 nodes..
- Only indexed about 5,500!
A story, continued
- Dug further
- Solution...
- Use Drupal 6's engine!
- Track each node individually
- Recommended for all D5 sites!
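The difference between the two indexing approaches can be sketched in a few lines. This is a simplified model, not Drupal's actual code: the `Node` class, both function names, and the sample timestamps are assumptions made for illustration. It shows why converted content that keeps old timestamps never gets picked up by a "last timestamp, last nid" watermark, while per-node tracking finds it.

```python
from dataclasses import dataclass


@dataclass
class Node:
    nid: int
    changed: int  # last-modified timestamp


def pick_by_watermark(nodes, last_changed, last_nid):
    """D5-style (simplified): pick only nodes newer than a stored watermark."""
    return [
        n for n in nodes
        if n.changed > last_changed
        or (n.changed == last_changed and n.nid > last_nid)
    ]


def pick_per_node(nodes, indexed_nids):
    """D6-style (simplified): pick every node that has no index record yet."""
    return [n for n in nodes if n.nid not in indexed_nids]


# Nodes 1-3 were created on the site and are already indexed.
existing = [Node(1, 100), Node(2, 110), Node(3, 120)]
# Nodes 4-5 were converted from older data and kept their original timestamps.
converted = [Node(4, 50), Node(5, 60)]
all_nodes = existing + converted

picked_by_watermark = pick_by_watermark(all_nodes, last_changed=120, last_nid=3)
picked_per_node = pick_per_node(all_nodes, indexed_nids={1, 2, 3})

print([n.nid for n in picked_by_watermark])  # []
print([n.nid for n in picked_per_node])      # [4, 5]
```

The watermark has already moved past the converted nodes' timestamps, so they are never queued for indexing. Checking each node against the index has no such blind spot.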
Two parts of search
- Internal
- Search when already on the site
- External
- Search from outside
- Google, etc
- Good amount of overlap
Internal Search
- Logical content hierarchy
- Each element of an item is given a different weight
- Title field most important
- Then body structure - h1, h2, h3, etc
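A minimal sketch of weighted scoring, to show why a word in the title outranks the same word in the body. The weight values, the `FIELD_WEIGHTS` table, and the `score` function are hypothetical and are not Drupal's real numbers or API.

```python
import re

# Hypothetical weights: title counts most, then headings in order, then body.
FIELD_WEIGHTS = {"title": 10, "h1": 5, "h2": 4, "h3": 3, "body": 1}


def score(item, term):
    """Sum weight * occurrences of `term` across the item's fields."""
    term = term.lower()
    total = 0
    for field, weight in FIELD_WEIGHTS.items():
        words = re.findall(r"[a-z0-9]+", item.get(field, "").lower())
        total += weight * words.count(term)
    return total


in_title = {"title": "K2 Apache Recon ski", "body": "All-mountain model."}
in_body = {"title": "Apache Recon", "body": "A ski from K2. Great ski."}

print(score(in_title, "ski"))  # 10
print(score(in_body, "ski"))   # 2
```

One mention in the title beats two mentions in the body, which is the reason to put key words in the Title field.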
Internal Search - Tip
- Put key words in Title field
- SkiNet's Gear Finder
- Title field has ski model name
- Word "ski" nowhere to be found
- Search for "k2 skis" - no results
- Should be: "[make] [model] ski"
- e.g. "K2 Apache Recon ski"
Internal Search - Accuracy
- "ski" vs "skis" vs "skiing"
- Porter Stemmer module
- Breaks search terms down to root form
- e.g. "skis" becomes "ski"
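The idea behind stemming can be sketched with a toy suffix stripper. This is not the Porter algorithm, which has many more rules; `toy_stem` and `matches` are hypothetical names, and the suffix list is an assumption chosen only to cover the "ski" example.

```python
def toy_stem(word):
    """Strip a few common English suffixes. Far cruder than Porter stemming."""
    word = word.lower()
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters so short words are not mangled.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches(query, title):
    """True when every stemmed query word appears among the stemmed title words."""
    title_stems = {toy_stem(w) for w in title.split()}
    return all(toy_stem(w) in title_stems for w in query.split())


print([toy_stem(w) for w in ("ski", "skis", "skiing")])  # ['ski', 'ski', 'ski']
print(matches("k2 skis", "K2 Apache Recon ski"))         # True
print(matches("k2 skis", "K2 Apache Recon"))             # False
```

With stemming, "k2 skis" finds a title containing "ski". Without the word in the title at all, stemming cannot help.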
Internal Search - Configuration
- Standard search configuration
- Taxonomy weighting
- Search Config module
- Limit indexing:
- Works pretty well
Internal Search - Issues
- Limited control over search
- All words handled the same
- Can't limit based on specific fields
Internal Search - Step Forward
- Faceted Classification
- Each content type field selectable
- e.g. product color, book publication date, etc
- Becoming de facto standard..
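Faceting boils down to two operations: counting items per field value, and narrowing the result set by a selected value. A rough sketch follows, using made-up product records; the field names, values, and both helper functions are assumptions, not part of any Drupal module.

```python
from collections import Counter

# Made-up sample records; field names and values are illustrative only.
products = [
    {"title": "Example ski A", "color": "red", "year": 2008},
    {"title": "Example ski B", "color": "blue", "year": 2008},
    {"title": "Example ski C", "color": "red", "year": 2007},
]


def facet_counts(items, field):
    """Count how many items carry each value of `field`."""
    return Counter(item[field] for item in items)


def filter_by(items, **selected):
    """Keep items whose fields equal every selected facet value."""
    return [i for i in items if all(i[f] == v for f, v in selected.items())]


print(dict(facet_counts(products, "color")))  # {'red': 2, 'blue': 1}

red = filter_by(products, color="red")
print(dict(facet_counts(red, "year")))        # {2008: 1, 2007: 1}
```

Every facet selection triggers a fresh round of counting over the remaining items, which hints at why doing this in the site's own database gets expensive as content grows.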
Internal Search - Target.com
Internal Search - Solution 1
Internal Search - Larger Problems
- Faceted Search module very database intensive
- Very slow
- Solution:
- Move search to a separate, external system
- Lots of options...
Internal Search - Solution 2
- LuceneAPI module
- Engine written in PHP
- Requires Zend Framework
- Advanced syntax, facets.. lots of good stuff
- Good solution for medium-sized sites
- May not suit shared hosts, but ideal for going beyond the core search engine without getting involved with Java engines
Internal Search - Solution 3
- Apache Solr module
- Dries uses it!
- Acquia uses it!
- Drupal.org uses it!
- Bonnier uses it! :-)
Internal Search - ApacheSolr
- Lucene in Java
- Runs separately, on another server
- Same infrastructure can be shared with other sites
- Keep Java developers employed ;-)
- Facets, sorting, related content block, multi-site (soon)..
- Best solution for large sites
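Because Solr runs as its own server, a site talks to it over HTTP. The sketch below only builds a query URL and does not contact any server. The host, port, core name, and facet field names are hypothetical; `q`, `wt`, `facet`, and `facet.field` are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Hypothetical host and core name; adjust to the actual Solr installation.
SOLR_SELECT_URL = "http://solr.example.com:8983/solr/drupal/select"


def build_solr_query(keywords, facet_fields):
    """Build a Solr select URL asking for JSON results with facet counts."""
    params = [("q", keywords), ("wt", "json"), ("facet", "true")]
    # One facet.field parameter per field to be counted.
    params += [("facet.field", field) for field in facet_fields]
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_solr_query("k2 skis", ["color", "year"])
print(url)
```

The search work (matching, sorting, facet counting) then happens on the Solr server, not in the site's database.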
External Search - Yay Drupal
External Search - Analytics