Most relevant search results #142

@IanTrudel

How would you implement most-relevant-first search results? The following code uses sort_by with some success, but it is somewhat limited. Using sort to compare two elements is difficult because the block receives two Allocation objects containing terms, and I have no idea what to do with those.

Perhaps Picky already has something for that? In any case, I would like to sort results by most relevant, then uri, then body. It would be useful to be able to do some kind of sort { |a, b| ... } to compare two elements.

The code below prioritizes entries whose uri contains the term. It could be an interesting example for the manual (#140).

require("picky")
require("open-uri")
require("net/https")
require("pp")

Picky.logger = Picky::Loggers::Silent.new

# URI.open is needed on Ruby 3.0+, where Kernel#open no longer handles URLs.
doc = URI.open(
      "https://raw.githubusercontent.com/Shoes3/shoes3/master/static/manual-en.txt",
      {ssl_verify_mode: OpenSSL::SSL::VERIFY_NONE}
   ).read

Entry = Struct.new :id, :uri, :body

entry = nil
entries = []
doc.each_line { |n|
   if n =~ /^=+ (.*) =+$/
      entry = Entry.new(entries.size, n.strip, "")
      entries << entry
   elsif n.strip.size > 0 and not entry.nil?
      entry.body += n
   end
}

index = Picky::Index.new :terms do
   indexing removes_characters: %r{[^a-z0-9\s\/\-\_\:\"\&\.]}i,
      splits_text_on:     %r{[\s/\-\_\:\"\&/\.]}
   category :uri, :from => lambda { |doc| doc.uri.dup }
   category :body, :from => lambda { |doc| doc.body.dup }
end
search = Picky::Search.new index do
   searching removes_characters: %r{[^a-z0-9\s\/\-\_\:\"\&\.]}i,
      splits_text_on:     %r{[\s/\-\_\:\"\&/\.]}
end

puts "total entries #{entries.size}"
entries.each { |n| index.add n }

term = "image"

retval = []
results = search.search(term, entries.size, 0)
results.sort_by { |id| entries.detect { |n| n.id == id }.uri =~ /#{term}/ ? 0 : id }
results.ids.each do |id|
   entry = entries.detect { |n| n.id == id }
   retval << entry.uri unless entry.nil?
end
puts "total retval #{retval.uniq.size}"
pp retval.uniq
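
For comparison, the "uri first, then body" ordering described above can be sketched in plain Ruby, independent of Picky. This is only a rough illustration, assuming the ids are available as a plain array (as results.ids returns them in the code above). The rank_ids helper and the sample entries are hypothetical and not part of Picky.

```ruby
# Standalone sketch: this Entry mirrors the struct used in the code above.
Entry = Struct.new(:id, :uri, :body)

# Hypothetical helper (not a Picky API): orders ids so that entries whose
# uri contains the term come first, then entries whose body contains it,
# with the id itself as the final tie-breaker.
def rank_ids(ids, entries, term)
  by_id   = entries.map { |e| [e.id, e] }.to_h   # hash lookup instead of detect
  pattern = /#{Regexp.escape(term)}/i            # case-insensitive, term escaped
  ids.sort_by do |id|
    entry = by_id.fetch(id)
    # Arrays compare element by element, so this key behaves like a
    # multi-criteria sort { |a, b| ... } comparator.
    [entry.uri  =~ pattern ? 0 : 1,
     entry.body =~ pattern ? 0 : 1,
     id]
  end
end

# Made-up sample data for illustration only.
sample = [
  Entry.new(0, "= Intro =",        "about images"),
  Entry.new(1, "= Image =",        "draws an image"),
  Entry.new(2, "= Shapes =",       "ovals"),
  Entry.new(3, "= Image blocks =", "nothing")
]

p rank_ids([0, 1, 2, 3], sample, "image")   # prints [1, 3, 0, 2]
```

Returning an array from sort_by gives the same effect as a two-element comparator without having to deal with the objects that sort yields. Whether this can replace sorting on the Picky results object directly is something I have not verified.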
