[Yanel-dev] Boost access log file format

Michael Wechner michael.wechner at wyona.com
Tue Jun 29 10:09:36 CEST 2010


Cedric Staub wrote:
> Hello there
>
>   
>> Cedric pointed out to me that it might make sense to have a different 
>> format for the boost access log. It currently reads
>>
>> Date, log-level, URL, realm-id, boost-cookie, user (if available), 
>> Referer, User-Agent, E-Mail (if available)
>>     
>
> A single log entry currently looks like this:
> ---
> http://www.example.org/index.html r:example c:YA-1 ref:null
> ua:Mozilla/4.0 (example)
> ---
>
> While the field:value pairs are actually quite useful and make parsing
> a log entry easier, the problem is that the values in the fields (e.g.
> the user agent) can also contain colons/spaces, making it in certain
> cases impossible to tell where a field starts or ends. 
>
> I suggest we escape the colons and spaces, e.g. by using URL encoding.
> This can easily be done using java.net.URLEncoder on the logging side
> and java.net.URLDecoder on the parsing side.
>
> I would also suggest that we add "url:" to the front of the URL in
> order to make parsing more robust. Right now the parser just assumes
> that the URL is the first field. With the prefix, we could change the
> format later (e.g. move the URL to another position) without breaking
> the parser.
>   

sounds good
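
For illustration, a minimal sketch of what the parsing side could look like
under this proposal, assuming every field (including the URL) is written as
name:value, fields are separated by spaces, and values are URL-encoded as
UTF-8. The class name BoostLogParser and its parse method are hypothetical
and not part of Yanel; this is not the patch discussed in this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not existing Yanel code.
public class BoostLogParser {

    /**
     * Splits an entry on spaces, splits each token at its first colon,
     * and URL-decodes the value. Field order does not matter.
     */
    public static Map<String, String> parse(String entry) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        for (String token : entry.trim().split(" +")) {
            int sep = token.indexOf(':');
            if (sep < 0) {
                continue; // not a name:value pair, skip it
            }
            String name = token.substring(0, sep);
            String encoded = token.substring(sep + 1);
            try {
                fields.put(name, URLDecoder.decode(encoded, "UTF-8"));
            } catch (UnsupportedEncodingException e) {
                // Every Java platform is required to support UTF-8.
                throw new IllegalStateException(e);
            }
        }
        return fields;
    }
}
```

Because encoded values can no longer contain a raw space or colon, splitting
at the first colon of each token is unambiguous, and looking fields up by
name means the URL does not have to stay in first position.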

> Then we'd just have to make sure any module that appends data actually
> uses URL encoding and doesn't just print it verbatim. Maybe we can add
> a convenience function somewhere to make this easier?
>   

you mean for something like email:michi at wyona.org added by the contact 
form resource?
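
Such a convenience function might look roughly like the sketch below,
assuming space-separated name:value fields with UTF-8 URL encoding. The
class name BoostLogFields and the append method are hypothetical; where
such a helper would live in Yanel is left open here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not existing Yanel code.
public class BoostLogFields {

    /**
     * Appends " name:value" to the entry, URL-encoding the value so that
     * it cannot contain a raw space or colon. A null value is logged as
     * the string "null".
     */
    public static StringBuilder append(StringBuilder entry, String name, String value) {
        try {
            if (entry.length() > 0) {
                entry.append(' '); // separator between fields
            }
            String encoded = URLEncoder.encode(String.valueOf(value), "UTF-8");
            return entry.append(name).append(':').append(encoded);
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }
}
```

A resource such as the contact form would then call, for example,
BoostLogFields.append(entry, "email", address) instead of printing the
address verbatim. The helper only encodes the value; field names are
assumed to be plain identifiers chosen by the calling code.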

> Anyway, I will try to reply with a patch implementing this shortly.
>   

Looking forward to your patch :-)

Btw, you might want to consider encryption and decryption of the log
file. I am suggesting this for privacy reasons, among others.

Thanks

Michi
> Greetings
> Cedric
>   


