[Yanel-dev] crawler
Michael Wechner
michael.wechner at wyona.com
Tue Feb 27 14:14:08 CET 2007
Josias Thöny wrote:
> Hi,
>
> I've had a look at the crawler of Lenya 1.2, and it seems that a few
> features are missing:
>
> basic missing features:
> - download of images
> - download of CSS
> - download of scripts
> - link rewriting
> - limits for max level / max documents
>
> advanced missing features:
> - handling of frames / iframes
> - tidy HTML -> XHTML
> - extraction of body content
> - resolving of links in CSS (background images etc.)
>
> Or am I misunderstanding something...?
no ;-)
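
For illustration only (this is not code from Lenya or Yanel): a minimal
Python sketch of how the "limits for max level / max documents" item from
the list above might work in a breadth-first crawler. The names crawl,
get_links, max_level and max_documents are hypothetical, and get_links
stands in for whatever component fetches a page and extracts its links.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(start_url: str,
          get_links: Callable[[str], Iterable[str]],
          max_level: int = 2,
          max_documents: int = 100) -> List[str]:
    """Breadth-first crawl bounded by link depth and document count.

    get_links is a hypothetical callable that returns the URLs linked
    from a given URL; a real crawler would fetch and parse the page.
    """
    visited: List[str] = []          # documents crawled, in visit order
    seen = {start_url}               # URLs already queued, to avoid repeats
    queue = deque([(start_url, 0)])  # (url, level) pairs

    while queue and len(visited) < max_documents:
        url, level = queue.popleft()
        visited.append(url)
        if level >= max_level:
            continue  # level limit reached: do not follow this page's links
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, level + 1))
    return visited
```

With an in-memory link table such as {"a": ["b", "c"], "b": ["d"]}, calling
crawl("a", lambda u: table.get(u, []), max_level=1) visits "a", "b" and "c"
but not "d", because "d" is two links away from the start page.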
>
> IMHO some of these features are quite essential, because we want to
> use the crawler in Yanel to import complete pages with images and
> everything, not only text content.
>
> The question now is: does it make sense to implement the missing
> features in that crawler, or should we look for an alternative?
Sure, if there is an alternative :-) Is there one?
Thanks
Michi
>
> Josias
>
--
Michael Wechner
Wyona - Open Source Content Management - Apache Lenya
http://www.wyona.com http://lenya.apache.org
michael.wechner at wyona.com michi at apache.org
+41 44 272 91 61