Imagine that you want to check your WAF logs to figure out if you have any false positives that need to be corrected or any chronic offenders that need to be blacklisted. Doing so is a nightmare.
There are three types of WAF log records:
- Regular log entries, which are similar to Web Proxy log files, with this format:
  - Standard header with space delimiters:
    - Timestamp
    - UTMName
    - Function name (reverseproxy)
  - Message details, a space-delimited list with entries of the form keyword="value"
- Detail entries, with a different format:
  - Standard header (same format as above)
  - Message details, with some space-delimited tokens of the form [keyword value], mixed in with undelimited text strings
- Continuation lines, which are inserted whenever one of the lines is too long. These can be identified because the beginning of the line does not look like a timestamp. (I have read that this happens at 1000 characters.) Because the Regular log entries include cookie text, and detail lines can contain OWASP details, continuation lines are pretty common.
The logs need to be parsed in sequence, because each continuation line needs to be appended to the line that precedes it. Depending on the data, multiple continuation lines are possible.
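A minimal Python sketch of that consolidation step. The timestamp test is an assumption (the pattern below matches the UTM-style YYYY:MM:DD-HH:MM:SS header in my sample); adjust it to whatever your standard header actually starts with.

```python
import re

# A line that starts a new record begins with a timestamp; anything else is a
# continuation of the previous line. The exact timestamp layout is an
# assumption here -- adjust the pattern to match your own standard header.
TIMESTAMP_RE = re.compile(r"^\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}\b")

def consolidate(lines):
    """Yield logical records, gluing continuation lines onto their predecessor."""
    record = None
    for raw in lines:
        line = raw.rstrip("\n")
        if TIMESTAMP_RE.match(line):
            if record is not None:
                yield record
            record = line
        elif record is not None:
            # Continuation line: append it directly to the record it belongs to.
            record += line
        # A continuation with no preceding record would be malformed; it is skipped.
    if record is not None:
        yield record

# Usage:
# with open("reverseproxy.log") as fh:
#     for rec in consolidate(fh):
#         ...
```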
The Detail entries lack important information, so they must be logically connected to the previous Regular log entry. There can be multiple Detail entries for one Regular log entry. After consolidating the continuation lines, I loaded the file into two SQL tables, with a one-to-many relationship between the Regular line and the Detail lines. There is no unique identifier, so I used the SQL NEWID() function to generate a unique identifier that serves as the foreign key linking the two tables.
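The same linking can be sketched client-side in Python before the rows are inserted; uuid4() stands in here for SQL NEWID(), and the column names and the is_detail test are placeholders rather than the real schema.

```python
import uuid

def link_records(records, is_detail):
    """Split consolidated records into regular and detail rows.

    Each regular record gets a generated id (the Python stand-in for SQL
    NEWID()); every detail record that follows it is tagged with that id as
    its foreign key. `is_detail` is a caller-supplied predicate, since the
    exact test depends on your log format.
    """
    regular_rows, detail_rows = [], []
    current_id = None
    for rec in records:
        if is_detail(rec):
            if current_id is None:
                continue  # detail with no parent record: malformed, skip or log it
            detail_rows.append({"regular_id": current_id, "raw": rec})
        else:
            current_id = str(uuid.uuid4())
            regular_rows.append({"id": current_id, "raw": rec})
    return regular_rows, detail_rows

# Usage (the predicate below is only an illustration):
# regs, dets = link_records(consolidate(fh), is_detail=lambda r: "] [pid " in r)
```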
I am not going to elaborate on the Regular lines, because of their similarity to the web log. Custom code is needed to parse them, but they at least follow a pattern that makes such parsing possible.
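For what it is worth, the keyword="value" portion of a Regular line falls out of a one-line regex; the sample keywords below are only illustrative, not the actual field list.

```python
import re

# Regular-entry message details are keyword="value" pairs separated by spaces.
# This assumes values are double-quoted (and may contain spaces) and keys are
# bare words; adjust if your entries also allow unquoted values.
KV_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_regular_details(message):
    """Return the keyword="value" pairs of a Regular entry as a dict."""
    return dict(KV_RE.findall(message))

# Example (illustrative keywords only):
# parse_regular_details('id="0299" severity="warn" srcip="192.0.2.10"')
# -> {'id': '0299', 'severity': 'warn', 'srcip': '192.0.2.10'}
```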
The Detail lines are a different story altogether. After the standard header, the Detail lines have a second header in this format (a parsing sketch follows the list):
- [timestamp] (essentially the same data, but with different format, than the Standard Header timestamp)
- [keyword1:keyword2] (an indication of the alarm type)
- [pid 99999] (the process id, where 99999 is a process number, 4 or 5 digits long in my sample data)
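Assuming the second header really is three bracketed fields in that order, a regex along these lines pulls them apart; the inner layouts are taken from the description above, not from any specification, so treat it as a starting point.

```python
import re

# [timestamp] [keyword1:keyword2] [pid 99999], followed by the free-form rest.
# The pid was 4 or 5 digits in my sample, but \d+ is used to be safe.
SECOND_HEADER_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+"                     # [timestamp]
    r"\[(?P<kw1>[^:\]]+):(?P<kw2>[^\]]+)\]\s+"   # [keyword1:keyword2]
    r"\[pid (?P<pid>\d+)\]\s*"                   # [pid 99999]
    r"(?P<rest>.*)$"                             # everything after the second header
)

def split_second_header(detail_body):
    """Split a Detail entry (after the standard header) into its second-header
    fields and the remaining free-form text. Returns None if it does not match."""
    m = SECOND_HEADER_RE.match(detail_body)
    return m.groupdict() if m else None
```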
After the second header, the contents become pretty random, and can include the following sequence of information:
- Sometimes: A first message text, with embedded spaces and delimited by spaces. In the examples that I have seen, there is a trailing colon before the final space character.
- Sometimes: [client 999.999.999.999:9999], to show the client IP and Port. The word client acts as an identifying token.
- Sometimes: [username], the name of the user, without an identifying token; or [99999], the pid number repeated, again without an identifying token; sometimes [-], whose significance I have not ascertained; and sometimes a code of the form AH99999, where the AH appears to be constant and the number exactly 5 digits. It seems to represent an error code.
- Finally: another message text, undelimited, which may contain embedded tokens of the form [keyword "value"]. Note that although these token values are quote-delimited, the earlier pid and client token values were not quoted. (A token-extraction sketch follows this list.)
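Given how irregular the remainder is, about the best one can do is fish for the pieces that are recognizable and keep the raw text for everything else. The patterns below are guesses based on the tokens described above.

```python
import re

CLIENT_RE = re.compile(r"\[client ([\d.]+):(\d+)\]")   # [client ip:port]
AH_CODE_RE = re.compile(r"\bAH\d{5}\b")                 # AH99999 error codes
KV_TOKEN_RE = re.compile(r'\[(\w+) "([^"]*)"\]')        # [keyword "value"] tokens

def parse_detail_rest(rest):
    """Pull whatever optional pieces happen to be present out of the free-form
    tail of a Detail entry. The full tail is kept in 'rest', since much of it
    will not match anything."""
    out = {}
    m = CLIENT_RE.search(rest)
    if m:
        out["client_ip"], out["client_port"] = m.group(1), m.group(2)
    m = AH_CODE_RE.search(rest)
    if m:
        out["ah_code"] = m.group(0)
    out["tokens"] = dict(KV_TOKEN_RE.findall(rest))
    out["rest"] = rest
    return out
```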
This is based on a small sample. I parsed 6 days of data, identified the 12 different [keyword1:keyword2] pairs in the data, and then examined one log entry of each type.
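Tallying the pairs is straightforward once the second headers are parsed; this assumes each parsed Detail entry is a dict with kw1/kw2 keys, as produced by the earlier second-header sketch.

```python
from collections import Counter

def alarm_type_counts(parsed_details):
    """Count occurrences of each [keyword1:keyword2] pair across parsed
    Detail entries (entries that failed to parse are ignored)."""
    return Counter(
        (d["kw1"], d["kw2"]) for d in parsed_details if d is not None
    )
```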
My conclusion is that we have a log file format which is unreadable to the human eye, and also unreadable to the computer. One would be inclined to think that it was designed to be unusable, because it is hard to imagine anything being this bizarre by accident.
Of course, if you are subject to most any regulatory scheme, you are probably under a requirement to read or parse these log files at least daily, to verify that the defenses are tuned optimally, working correctly, and defending successfully.