[Yanel-dev] crawler
Josias Thöny
josias.thoeny at wyona.com
Mon Feb 26 23:35:33 CET 2007
Hi,
I've had a look at the crawler of Lenya 1.2, and it seems that a few
features are missing:
Basic missing features:
- download of images
- download of css
- download of scripts
- link rewriting
- limits on maximum crawl depth / maximum number of documents
Advanced missing features:
- handling of frames / iframes
- tidying HTML into XHTML
- extraction of body content
- resolving links in CSS (background images, etc.)
Or am I misunderstanding something...?
IMHO some of these features are essential, because we want to use the
crawler in Yanel to import complete pages, including images and
everything else, not only the text content.
The question now is: does it make sense to implement the missing
features in that crawler, or should we look for an alternative?
Josias