[Yanel-dev] Enhancing Yarep Indexer/Searcher interface

Thu Aug 6 12:53:13 CEST 2009

Hi!

Michael Wechner wrote:
> Hi
> 
> At the moment one has the following searcher interface
> 
> Node[] Repository.getSearcher().search(QUERY);

Does QUERY include pagination ATM?

> and with the index the PATH/URL is saved and the FULLTEXT is indexed.
> 
> This is all nice and simple, BUT  .... ;-)
> 
> In most common search engines one receives the following search result 
> structure:
> 
> Title of Document
> Excerpt of Document
> Path/URL of Document
> Mime-Type of Document
> Last Modified of Document

What would the types be there? Probably String for most of them, maybe a 
java.lang.Long timestamp for the last-modified date?

> which means if we also want to provide this, then we need to reparse 
> each Node which has been found, which is not
> so nice (performance wise and also code wise).
> 
> Hence I would suggest that we enhance the Indexer/Searcher interface by 
> adding the fields above and introducing methods like
> 
> Result[] Repository.getSearcher().search(QUERY)

If QUERY does not include paging there, we'd better return a 
java.lang.Iterable<Result>: it makes the API easier to use (e.g. with a 
Java 5 for-each loop) and, more importantly, lets us load the results 
lazily if needed. We would also need startIndex and maxCount parameters, 
even if we do not implement them right away.
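
To make the lazy-loading idea a bit more concrete, here is a minimal 
sketch. Everything in it is an assumption for illustration only: 
LazySearchSketch, HitSource and the search(source, startIndex, maxCount) 
signature are made-up names, not existing Yarep API, and Result is 
reduced to a single getter.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch, not Yarep code: a lazily loaded, paged view over search hits. */
class LazySearchSketch {

    /** Stand-in for the proposed Result type (only one getter shown). */
    interface Result {
        String getTitle();
    }

    /** Assumed backend that knows the hit count and can load a single hit by position. */
    interface HitSource {
        int totalHits();
        Result load(int index);
    }

    /** Returns an Iterable that loads at most maxCount hits, starting at startIndex, only while iterating. */
    static Iterable<Result> search(final HitSource source, final int startIndex, final int maxCount) {
        if (startIndex < 0 || maxCount < 0) {
            throw new IllegalArgumentException("startIndex and maxCount must not be negative");
        }
        return new Iterable<Result>() {
            @Override
            public Iterator<Result> iterator() {
                // Clamp the end of the page to the number of available hits (long math avoids overflow).
                final int end = (int) Math.min((long) startIndex + maxCount, (long) source.totalHits());
                return new Iterator<Result>() {
                    private int cursor = startIndex;

                    @Override
                    public boolean hasNext() {
                        return cursor < end;
                    }

                    @Override
                    public Result next() {
                        if (!hasNext()) {
                            throw new NoSuchElementException();
                        }
                        return source.load(cursor++); // the hit is loaded here, not in search()
                    }

                    @Override
                    public void remove() {
                        throw new UnsupportedOperationException();
                    }
                };
            }
        };
    }
}
```

A caller could then write for (Result r : search(source, 0, 20)) { ... } 
and only the hits that are actually iterated over would be loaded.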

> whereas Result has methods like
> 
> Result.getTitle()
> Result.getExcerpt()
> etc.

We could also use:
String Result.getMetadata(String aDublinCoreOrWhateverRDFpropertyURI)
...or maybe both: the hard-coded ones because API users will most 
probably need them, and the generic one for extensibility?
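
A rough sketch of the "both" variant, to show how the two could fit 
together. It assumes that title and MIME type are stored under the Dublin 
Core "title" and "format" element URIs, and that path, excerpt and 
last-modified are plain fields. ResultSketch, SimpleResult and the 
constant names are hypothetical, not existing Yarep classes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch, not Yarep code: fixed getters plus a generic metadata lookup. */
class ResultSketch {

    // Dublin Core element URIs, used here as metadata keys (an assumption of this sketch).
    static final String DC_TITLE = "http://purl.org/dc/elements/1.1/title";
    static final String DC_FORMAT = "http://purl.org/dc/elements/1.1/format";

    interface Result {
        String getPath();
        String getTitle();
        String getExcerpt();
        String getMimeType();
        Long getLastModified();                  // timestamp in milliseconds, or null if unknown
        String getMetadata(String propertyUri);  // null if the property was not indexed
    }

    static final class SimpleResult implements Result {
        private final String path;
        private final String excerpt;
        private final Long lastModified;
        private final Map<String, String> metadata;

        SimpleResult(String path, String excerpt, Long lastModified, Map<String, String> metadata) {
            this.path = path;
            this.excerpt = excerpt;
            this.lastModified = lastModified;
            // Defensive copy so later changes to the caller's map do not leak into the result.
            this.metadata = Collections.unmodifiableMap(new HashMap<String, String>(metadata));
        }

        @Override
        public String getPath() { return path; }

        @Override
        public String getExcerpt() { return excerpt; }

        @Override
        public Long getLastModified() { return lastModified; }

        // The hard-coded getters are thin shortcuts over the generic lookup.
        @Override
        public String getTitle() { return getMetadata(DC_TITLE); }

        @Override
        public String getMimeType() { return getMetadata(DC_FORMAT); }

        @Override
        public String getMetadata(String propertyUri) { return metadata.get(propertyUri); }
    }
}
```

With this shape, adding a new indexed property would not require an API 
change: callers would just pass its URI to getMetadata().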

> WDYT?
> 
> Thanks
> 
> Michi

HTH,
    Guillaume