Searching Drupal
Damien McKenna
Searching Drupal
- Taken for granted
- Assumption that "it'll just work"
But first.. a story
- Hired by Bonnier for a 3-month Drupal 5 project
- Migrate from proprietary Java CMS
- Short development cycle
- Made compromises
- Made assumptions
A story, continued
- Grand assumption..
- "Search will work good enough"
- "Tweak later"
A story, continued
- Put another way...
- "Search will work"
A story, continued
- Launched site
- Seemed OK, could find results
- Complaints of search missing content
- Total 57,000 nodes - articles, images, etc
- Only 7,000 nodes indexed
A story, continued
- D5 search engine indexing flawed
- Indexing tracks last timestamp, last nid, last comment timestamp...
- If data converted, strong chance of missing some
- Out of 57,000 nodes..
- Only indexed about 7,000!
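The failure mode on this slide can be illustrated with a simplified model. This is a sketch in Python, not Drupal 5's actual indexing code: the `Node` tuple, the `index_batch` function, and the `(changed, nid)` cursor layout are all assumptions made for illustration. It shows how an indexer that only looks forward from the last timestamp it processed never reaches content that is imported later with older timestamps.

```python
from typing import NamedTuple


class Node(NamedTuple):
    nid: int
    changed: int  # last-modified timestamp


def index_batch(nodes, cursor, limit):
    """Return (indexed nids, new cursor) for nodes strictly after the cursor.

    cursor is a (changed, nid) pair describing the last node processed.
    """
    pending = sorted(
        (n for n in nodes if (n.changed, n.nid) > cursor),
        key=lambda n: (n.changed, n.nid),
    )[:limit]
    if pending:
        last = pending[-1]
        cursor = (last.changed, last.nid)
    return [n.nid for n in pending], cursor


# Content created on the live site is indexed normally.
nodes = [Node(1, 1000), Node(2, 1010)]
indexed, cursor = index_batch(nodes, cursor=(0, 0), limit=100)

# A migration then imports legacy content and keeps its original, older timestamps.
nodes += [Node(3, 500), Node(4, 600)]
later, cursor = index_batch(nodes, cursor, limit=100)

print(indexed)  # nids picked up on the first run
print(later)    # nids picked up after the migration
```

In this model the migrated nodes sit behind the cursor, so the second run finds nothing to do even though two nodes were never indexed. Comparing the number of indexed nodes against the total node count is a quick way to spot this.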
Moral of the story
- Don't assume it'll work!
- Take an hour; a few small tweaks go a long way
Two parts of search: Internal
- Search when already on the site
- Note: Only node content
Two parts of search: External
- Search from outside
- Google, Bing, yadda
Two parts of search: Good news!
Search Basics
- Logical content hierarchy
- Body structure - h1, h2, h3, etc
- Lots of fiddly bits
- Drupal SEO book
Internal Search Basics
- search.module
- Title field most important
- Each node element given different weight
- Only node fields considered
External Search Basics
- Page title super important
- Considers everything on the page
- Lots of tricks
Internal Search - Tip
- Put most important words in node Title
- SkiNet.com products
- Search for "k2 skis" - no results
- Title field has ski model name
- Word "ski" nowhere to be found
- Should be: "[make] [model] ski"
- e.g. "K2 Apache Recon ski"
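A rough sketch of why the title format matters, assuming a simple "every query word must appear in the title" match. The `build_title` and `matches_all` helpers are hypothetical and are not part of Drupal or search.module, and the old title shown is a guess at what a model-only title might have looked like.

```python
def build_title(make, model, product_type):
    """Compose a title in the "[make] [model] ski" pattern."""
    return f"{make} {model} {product_type}"


def matches_all(query, title):
    """True when every query word appears as a whole word in the title."""
    title_words = set(title.lower().split())
    return all(word in title_words for word in query.lower().split())


old_title = "K2 Apache Recon"  # hypothetical: model name only, no product type
new_title = build_title("K2", "Apache Recon", "ski")

print(matches_all("k2 ski", old_title))
print(matches_all("k2 ski", new_title))
```

Only the new title matches "k2 ski". The plural "skis" would still miss under this exact-word match, which is the accuracy problem that stemming addresses.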
Internal Search - Accuracy
- "ski" vs "skis" vs "skiing"
- Porter Stemmer module
- Breaks search terms down to root form
- e.g. "skis" becomes "ski"
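To make the idea of a root form concrete, here is a toy suffix stripper in Python. It is deliberately crude and is not the Porter algorithm or the Porter Stemmer module's code; `toy_stem` and its length thresholds are assumptions chosen only so that the three example words collapse to one term.

```python
def toy_stem(word):
    """Very rough suffix stripping; NOT the real Porter algorithm."""
    word = word.lower()
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


terms = ["ski", "skis", "skiing"]
stems = {toy_stem(t) for t in terms}
print(stems)
```

When the same reduction is applied to both the indexed text and the search terms, all three forms end up matching each other.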
Internal Search - Configuration
- admin/settings/search
- Number to index at a time
- Minimum word length
- Content weighting
Internal Search - Improving Config
- Search Config module
- Control Advanced Search fields
- Hide vocabulary filters
- Hide content types
- Disable indexing content types
- Works pretty well
Internal Search - Step Forward
- Faceted Classification
- Each content type field selectable
- e.g. product color, book publication date, etc
- Becoming the de facto standard
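A minimal sketch of what faceting does with a result set, assuming each result is a plain dictionary of field values. The sample data and the `facet_counts` and `apply_facet` helpers are made up for illustration and do not reflect how any Drupal module implements facets.

```python
from collections import Counter


def facet_counts(results, field):
    """Count how many results carry each value of the given field."""
    return Counter(item[field] for item in results if field in item)


def apply_facet(results, field, value):
    """Narrow the results to those whose field equals the chosen value."""
    return [item for item in results if item.get(field) == value]


# Hypothetical search results with per-content-type fields.
results = [
    {"title": "Product A", "type": "product", "color": "red"},
    {"title": "Product B", "type": "product", "color": "blue"},
    {"title": "Product C", "type": "product", "color": "red"},
    {"title": "Book D", "type": "book", "year": 2009},
]

colors = facet_counts(results, "color")
red_only = apply_facet(results, "color", "red")

print(colors)
print([item["title"] for item in red_only])
```

The counts are what a sidebar would display next to each value, and selecting a value narrows the current results instead of starting a new search.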
Internal Search - Target.com
Internal Search - Problems
- Limited control on search
- Won't work:
- apple AND orange
- apple OR orange
- apple AND (orange OR banana)
- No facets
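For reference, this is what the three queries above are expected to mean. The sketch evaluates a nested boolean query against the set of words in one document; the tuple-based query format and the `evaluate` function are assumptions for illustration, not the syntax of any particular search engine.

```python
def evaluate(query, words):
    """Evaluate a nested boolean query against a set of words.

    A query is either a plain string (a single term) or a tuple of the
    form (operator, left, right), where operator is "AND" or "OR".
    """
    if isinstance(query, str):
        return query in words
    op, left, right = query
    if op == "AND":
        return evaluate(left, words) and evaluate(right, words)
    if op == "OR":
        return evaluate(left, words) or evaluate(right, words)
    raise ValueError(f"unknown operator: {op}")


doc = {"apple", "banana"}

q1 = ("AND", "apple", "orange")                    # apple AND orange
q2 = ("OR", "apple", "orange")                     # apple OR orange
q3 = ("AND", "apple", ("OR", "orange", "banana"))  # apple AND (orange OR banana)

for q in (q1, q2, q3):
    print(evaluate(q, doc))
```

A document containing "apple" and "banana" fails the first query and satisfies the other two, which is the kind of distinction users cannot express when only a single default operator is available.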
Internal Search - Solution 1
Internal Search - Problems
- Faceted Search module very database intensive
- Very slow
- Solution:
- Move search to an external system
- Lots of options...
Internal Search - Solution 2
- LuceneAPI module
- Tremendous power
- Simple to install
- All PHP, no crazy extras
- Best option for most sites
Internal Search - LuceneAPI
- Sorting
- Facets
- CCK fields
- Content type
- Taxonomy
- "More like this"
- @cpliakas is awesome!
Internal Search - Two bugs to note
LuceneAPI Installation
LuceneAPI Configuration
- Replace search box
- Minimum word length
- Words to ignore
- Error logging
- Advanced: file permissions
LuceneAPI Content Settings
- admin/settings/luceneapi_node
- Results per page
- Default: AND vs OR
- Tab name
- Hide core search (!!!)
- Exclude content types
- Node access
- Language support
LuceneAPI Content Settings 2
- Performance tab
- "Optimize" button
- Optimize after cron runs
- Caching
- Caching threshold
- Cache max size
- Number to index
- Memory limits
LuceneAPI Content Settings 3
- Content bias!!!
- Change importance level for
- Body, title, author, terms, comment text, HTML tags, sticky, promoted, content type
- Good stuff!!
- No craziness
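The effect of a content bias can be sketched as a weighted sum over fields. This is a simplified model in Python, not LuceneAPI's or Lucene's scoring formula; the `score` function, the weights, and the sample nodes are all hypothetical.

```python
def score(node, term, bias):
    """Sum of (occurrences of term in a field) * (bias for that field)."""
    total = 0.0
    for field, weight in bias.items():
        words = node.get(field, "").lower().split()
        total += weight * words.count(term.lower())
    return total


# Hypothetical bias values: a title match counts five times a body match.
bias = {"title": 5.0, "body": 1.0}

node_a = {"title": "K2 Apache Recon ski", "body": "An all-mountain model."}
node_b = {"title": "Gear roundup", "body": "A ski for every budget, and a ski bag."}

print(score(node_a, "ski", bias))
print(score(node_b, "ski", bias))
```

With these weights a single title match outranks two body matches. Raising or lowering a field's bias shifts that balance without touching the content itself.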
LuceneAPI Index
LuceneAPI Content Settings 4
- Facets..
- Vocabularies
- Author
- Content type
- Display order
- Tweaks
LuceneAPI Content Settings 5
- More Like This..
- Number of items
- Word length
- Fields to work from
- Exclude content types
LuceneAPI Content Settings 6
- Did You Mean..
- admin/settings/luceneapi_dym
- Performance settings
Internal Search - Solution 3
Internal Search - ApacheSolr
- Lucene in Java
- Runs on a separate server
- Keep Java developers employed ;-)
- Lots of what LuceneAPI has
- Requires more infrastructure
- Only for VPS / own server(s)
External Search - Yay Drupal
External Search - Analytics