After launching UTM, I quickly concluded that I needed tools to parse the UTM log files into a format that could be effectively and efficiently analyzed, and that a SQL database was the obvious choice. All of my experience since then has only confirmed that decision.
Nearly anyone who manages a security device in a professional setting is expected, or more likely required, to collect and use the available log data to ensure that all available information is being used effectively to accomplish security goals. Without a database to analyze that data, the system administrator is flying blind. The raw logs from UTM are too complex to effectively interpret using only manual inspection, and IView lacks the granularity that I think is necessary. After refining my toolkit over several years, it seemed only right to share my work with the community.
I am currently using this toolkit to parse these logs:
- Firewall
- Application Control
- Advanced Persistent Threat
- Intrusion Protection System
- User Authentication
- Webserver Protection
- Web Filter
This solution is not automated -- a real person has to download the logs, extract the zipped file to disk, and then run the SQL Stored Procedures. But I have scripted the 7-zip extract and the SQL load process, and the processes are pretty fast, so it is not burdensome.
The solution bellow is based on Microsoft SQL Server. I used a fully licensed version, but I believe all of the described features are available in the free SQL Express package. A 64-bit implementation is recommended to ensure that you can parse log files larger than 2GB.
I hope it helps some of you as much as it has helped me.
The SQL Option for Text Parsing: OpenRowSet
The most important discovery was this SQL Feature:
OPENROWSET (BULK ‘textfile’ FORMATFILE=’xmlfile’)
This command causes a text file to behave like a read-only SQL Table. It also has very high processing speeds, even with the extensive string parsing needed for this application. The format file defines the layout of the text file. Because all of the important UTM log files follow a similar format, only one XML definition is needed.
<?xml version="1.0"?>
<BCPFORMAT
xmlns="schemas.microsoft.com/.../format"
xmlns:xsi="www.w3.org/.../XMLSchema-instance" >
<RECORD>
<FIELD ID="A1" xsi:type="CharTerm" TERMINATOR=" " />
<FIELD ID="A2" xsi:type="CharTerm" TERMINATOR=" " />
<FIELD ID="A3" xsi:type="CharTerm" TERMINATOR=" " />
<FIELD ID="A4" xsi:type="CharTerm" MAX_LENGTH="7998" TERMINATOR="\n" />
</RECORD>
<ROW>
<COLUMN SOURCE="A1" NAME="EventTime" xsi:type="SQLVARYCHAR" />
<COLUMN SOURCE="A2" NAME="ProxyName" xsi:type="SQLVARYCHAR" />
<COLUMN SOURCE="A3" NAME="LogFunction" xsi:type="SQLVARYCHAR" />
<COLUMN SOURCE="A4" NAME="Message1" xsi:type="SQLVARYCHAR" LENGTH="7998" />
</ROW>
</BCPFORMAT>
The first section of the file lays out the boundaries between text fields, and the second section of the file gives those text fields a SQL name and datatype. This SELECT statement reads the file.
SELECT A.EventTime, A.ProxyName, A.LogFunction, A.Message1
FROM OPENROWSET (BULK ‘textfile’ FORMATFILE=’xmlfile’) as A
Adding Flexibility and Keys
The downside to the command is that OPENROWSET requires string literals for the file names. Consequently, the files must either be placed in a hard-coded location, or a dynamic SQL command must be constructed to convert a file variable into a text constant before the command can be executed.
Additionally, we need to preserve original order, which SQL does not inherently provide. After some false starts, I chose to use a primary key consisting of the file date and a sequence number. To apply the sequence number, I read the text file into a temporary table (indicated by a name starting with “#”). The temporary table contains an IDENTITY field to assign the sequence number. After the text data is loaded into the temporary table, the file date can be extracted from the timestamp of the first data record. I also chose to convert the timestamp into a SQL DATETIME field rather than preserving it as text.
The code fragment below combines all of those features. The stored procedure takes a parameter of the filename to be used. The temporary table is declared, the dynamic SQL statement is used to load the text file into the temporary table, and the file date is extracted in preparation for subsequent processing into a permanent table.
CREATE PROCEDURE [dbo].[Load_Logfile]
@datafile varchar(250)
AS
BEGIN
CREATE TABLE [#TmpSysLog](
[FileSeq] [int] IDENTITY(1,1) NOT NULL,
[EventTime] [varchar](50) NULL,
[ProxyName] [varchar](50) NULL,
[LogFunction] [varchar](50) NULL,
[Message1] [varchar](7999) NULL,
CONSTRAINT [PK_TmpSysLog1] PRIMARY KEY CLUSTERED ( [FileSeq] ASC )
)
set @sqltext3 =' insert into #TmpSysLog select a.* FROM OPENROWSET( BULK ''' + @datafile + ''', ' + ' FORMATFILE = ''\\storage1.bayviewphysicians.com\software\Sophos UTM\weblog1.xml'' ) AS a '
exec(@sqltext3)
select top 1 @sqltext1=eventtime from #TmpSysLog
set @FileDate = cast( substring(@sqltext1,1,4) + '-' + substring(@sqltext1,6,2) + '-' + substring(@sqltext1,9,2) + ' ' + substring(@sqltext1,12,8) as datetime)
… remainder of the stored procedure is not shown at this time.
Parsing the Message field
Nearly all of the interesting log files have message content as a list of space-delimited tokens of the form
keyword1=”value1” keyword2=”value2”
To make this format useful, we need a function to extract the value for a keyword. I created a “Scalar-valued” function for this purpose:
dbo.ParseLog2(a.Message1,keyword,maxlength)
(I will show the code for this function later.) The function looks for the character sequence keyword=”value” and returns a string result containing the value. If it is not found, it returns a null.
SQL is unforgiving if a data field assignment is too long, so I also use the maxlength parameter ensure that the returned string will fit in the permanent data table.
The next step is to identify all of the expected keywords, create a table with those fields declared, and create an insert statement that uses the ParseLog2 function to supply the desired values. A full example procedure appears later in this document.
For most of the log files, I used visual inspection to identify the keywords to be extracted. For the Web Proxy log file, Sophos has provided a complete list in this KB article:
https://community.sophos.com/kb/en-us/126660
The next section explains an approach for finding overlooked keywords.
Anticipating Data surprises
The log file may contain data that you do not expect. For this purpose, I found it necessary to create a Comment field to contain any text that was unused after all of the expected fields had been parsed. The solution to this was a “reverse parse” function, which returns everything except the indicated keyword=value phrase. The logic looks something like this
Set @Comment = @Message1;
set @Comment = dbo.ParseLogRev(@Comment, 'id');
set @Comment = dbo.ParseLogRev(@Comment, 'severity');
set @Comment = dbo.ParseLogRev(@Comment, 'sys');
… and so forth…
To compute the value for @Comment, a SQL Cursor must be used to examine each record of the temporary table one at a time. The @Comment variable is computed first, then included in the field list when the current record is inserted in the permanent table.
Some of the logs have unformatted status messages interspersed in the log data. For these entries, only the header fields and the Comment field will be non-null. To review the parsed data for omitted data fields or other special conditions, you can simply query for records where Comment > ‘’
Finished Procedure
Below is the complete function for parsing a single Firewall log file and loading it into a SQL table. I have chosen the Firewall log because the field list is relatively short and the file is free of other complications. For brevity, I do not show the table definition, but it should be easy to infer from the code below.
CREATE PROCEDURE [dbo].[LoadFirewallLog3]
@datafile varchar(250)
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
Declare @EventTm varchar(30);
Declare @ProxyName varchar(15);
DECLARE @LogFunction varchar(18);
Declare @Message varchar(1000);
declare @Comment varchar(8000);
Declare @FullMessage varchar(2000);
Declare @EventTime DateTime;
Declare @sqltext1 varchar(999)
Declare @sqltext3 varchar(999);
Declare @FileDate datetime;
Declare @FileSeq int;
declare @PriorSeq int;
DECLARE @SeqNum int;
SET @SeqNum = 0;
DECLARE @MainCnt int, @DetlCnt int, @ErrorCnt int, @MoreCnt int;
Set @MainCnt = 0;
Set @DetlCnt = 0;
Set @ErrorCnt = 0;
Set @MoreCnt = 0;
set @PriorSeq = 0;
CREATE TABLE [#TmpSysLog](
[FileSeq] [int] IDENTITY(1,1) NOT NULL,
[EventTime] [varchar](50) NULL,
[ProxyName] [varchar](50) NULL,
[LogFunction] [varchar](50) NULL,
[Message1] [varchar](7999) NULL,
CONSTRAINT [PK_TmpSysLog1] PRIMARY KEY CLUSTERED ( [FileSeq] ASC )
)
set @sqltext3 =' insert into #TmpSysLog select a.* FROM OPENROWSET( BULK ''' + @datafile + ''', ' +
' FORMATFILE = ''\\storage1.bayviewphysicians.com\software\Sophos UTM\weblog1.xml'' ) AS a '
exec(@sqltext3)
select top 1 @sqltext1=eventtime from #TmpSysLog
set @FileDate = cast( substring(@sqltext1,1,4) + '-' + substring(@sqltext1,6,2) + '-' + substring(@sqltext1,9,2) + ' ' + substring(@sqltext1,12,8) as datetime)
DECLARE Firewallxml CURSOR LOCAL FOR
select EventTime, ProxyName, LogFunction, Message1, FileSeq
FROM [#TmpSysLog] AS a
OPEN Firewallxml;
FETCH next from Firewallxml INTO @EventTm, @ProxyName, @LogFunction, @FullMessage, @FileSeq;
WHILE @@FETCH_STATUS = 0
BEGIN
set @EventTime = cast( substring(@EventTm,1,4) + '-' +
substring(@EventTm,6,2) + '-' + substring(@EventTm,9,2) + ' ' +
substring(@EventTm,12,8) as datetime);
Set @MainCnt = @MainCnt + 1;
Set @Comment = @FullMessage;
set @Comment = dbo.ParseLogRev(@Comment, 'id');
set @Comment = dbo.ParseLogRev(@Comment, 'severity');
set @Comment = dbo.ParseLogRev(@Comment, 'sys');
set @Comment = dbo.ParseLogRev(@Comment, 'sub');
set @Comment = dbo.ParseLogRev(@Comment, 'name');
set @Comment = dbo.ParseLogRev(@Comment, 'action');
set @Comment = dbo.ParseLogRev(@Comment, 'fwrule');
set @Comment = dbo.ParseLogRev(@Comment, 'initf');
set @Comment = dbo.ParseLogRev(@Comment, 'srcmac');
set @Comment = dbo.ParseLogRev(@Comment, 'dstmac');
set @Comment = dbo.ParseLogRev(@Comment, 'srcip');
set @Comment = dbo.ParseLogRev(@Comment, 'dstip');
set @Comment = dbo.ParseLogRev(@Comment, 'proto');
set @Comment = dbo.ParseLogRev(@Comment, 'length');
set @Comment = dbo.ParseLogRev(@Comment, 'tos');
set @Comment = dbo.ParseLogRev(@Comment, 'prec');
set @Comment = dbo.ParseLogRev(@Comment, 'ttl');
set @Comment = dbo.ParseLogRev(@Comment, 'type');
set @Comment = dbo.ParseLogRev(@Comment, 'code');
set @Comment = dbo.ParseLogRev(@Comment, 'srcmac');
set @Comment = dbo.ParseLogRev(@Comment, 'dstmac');
set @Comment = dbo.ParseLogRev(@Comment, 'srcport');
set @Comment = dbo.ParseLogRev(@Comment, 'dstport');
set @Comment = dbo.ParseLogRev(@Comment, 'tcpflags');
set @Comment = dbo.ParseLogRev(@Comment, 'mark');
set @Comment = dbo.ParseLogRev(@Comment, 'app');
set @Comment = dbo.ParseLogRev(@Comment, 'outitf');
set @Comment = dbo.ParseLogRev(@Comment, 'threatname');
set @Comment = dbo.ParseLogRev(@Comment, 'status');
set @Comment = dbo.ParseLogRev(@Comment, 'host');
set @Comment = dbo.ParseLogRev(@Comment, 'url');
INSERT INTO dbo.FirewallLog (EventTime, ProxyName, LogFunction,
ItmId, ItmSeverity, ItmName, ItmAction,
ItmFwrule, ItmInitf, ItmSrcmac, ItmDstmac,
ItmSrcIP, ItmDstIP, ItmProto,
ItmLength, ItmTos, ItmPrec, ItmTtl, ItmType, ItmCode,
ItmSrcPort, ItmDstPort, ItmTcpflags,
Comment, ItmMark, ItmAppId, ItmOutItf, FileDate, FileSeq,
ItmThreatName, ItmStatus, ItmHost, ItmUrl)
values
(@EventTime, @ProxyName, @LogFunction,
dbo.ParseLog2(@FullMessage, 'id',4),
dbo.ParseLog2(@FullMessage, 'severity',5),
dbo.ParseLog2(@FullMessage, 'name',75),
dbo.ParseLog2(@FullMessage, 'action',20),
dbo.ParseLog2(@FullMessage, 'fwrule',5),
dbo.ParseLog2(@FullMessage, 'initf',6),
dbo.ParseLog2(@FullMessage, 'srcmac',17),
dbo.ParseLog2(@FullMessage, 'dstmac',17),
dbo.ParseLog2(@FullMessage, 'srcip',40),
dbo.ParseLog2(@FullMessage, 'dstip',40),
dbo.ParseLog2(@FullMessage, 'proto',3),
case when dbo.ParseLog2(@FullMessage, 'length',10) > '' then
cast( dbo.ParseLog2(@FullMessage, 'length',10) as int)
else null end,
dbo.ParseLog2(@FullMessage, 'tos',4),
dbo.ParseLog2(@FullMessage, 'prec',4),
case when dbo.ParseLog2(@FullMessage, 'ttl',10) > '' then
cast( dbo.ParseLog2(@FullMessage, 'ttl',10) as int) else null end,
case when dbo.ParseLog2(@FullMessage, 'type',10) > '' then
cast( dbo.ParseLog2(@FullMessage, 'type',10) as int) else null end,
case when dbo.ParseLog2(@FullMessage, 'code',10) > '' then
cast( dbo.ParseLog2(@FullMessage, 'code',10) as int) else null end,
dbo.ParseLog2(@FullMessage, 'srcport',6),
dbo.ParseLog2(@FullMessage, 'dstport',6),
dbo.ParseLog2(@FullMessage, 'tcpflags',40),
@Comment,
dbo.ParseLog2(@FullMessage, 'mark',10),
dbo.ParseLog2(@FullMessage, 'app',10),
dbo.ParseLog2(@FullMessage, 'outitf',6),
@FileDate, @FileSeq,
dbo.ParseLog2(@FullMessage, 'threatname',75),
dbo.ParseLog2(@FullMessage, 'status',1),
dbo.ParseLog2(@FullMessage, 'host',255),
dbo.ParseLog2(@FullMessage, 'url',255)
);
FETCH next from Firewallxml INTO @EventTm, @ProxyName,
@LogFunction, @FullMessage, @FileSeq;
END
CLOSE Firewallxml;
DEALLOCATE Firewallxml;
END
Specific Issues in the Files
Most log files have data of the form sys=”value1” sub=”value2”. My inspection concluded that within a file, these values were fixed, so I chose to discard them.
The APT log and the Application Control Log use essentially the same log format as the Firewall log, so I have implemented the extra data fields in both the procedure and the related table so that all three logs can be populated using a single procedure and a single data table.
The web proxy log file has a small number of entries with function code starting “URID”, which contain “Comment” data about early DNS timeouts. I only found them because of this parsing process. I have been advised that they can probably be ignored.
For the web proxy log, I actually do a three-step process. For many analyses of web data, I wanted to have the Protocol, FQDN, Path, and Querystring as separate fields. Step one loads the log file into the temporary table. Step two parses the Message string in the temporary table to produce distinct values in an intermediate table. Step three parses the intermediate table to split each URL into four separate fields. Some other optimizations are applied during stage 3, to reduce space requirements.
First the raw data is loaded into a temporary table as indicated above, then the temporary data is parsed into an intermediate table.
The web proxy log file paired code-description information: AppID and Application, Category and CategoryName, ID and Name.
The WAF log file has multiple complications which I mentioned in a separate forum post:
- Some records are continued because they are too long to fit in one record
- Security records follow a different pattern, using tokens delimited by brackets and some prose between the bracketed tokens.
Because of these differences, the WAF log parsing required additional custom logic. I loaded the normal data and the security data into two different tables, then created multiple views for inspecting the different versions of security records. Since this document is introductory, I am not including anything that adds complexity.
Notes on File Relationships
When IPS blocks a web response, the block appears in the IPS log immediately, but the Web Proxy log captures the problem as a timeout condition which can be as much as two minutes later than the IPS event timestamp.
The Firewall log only captures IP addresses, and Reverse DNS on the IP address does not always provide a meaningful result. The Web Proxy log is a treasure of Forward DNS results, and can be used to provide meaningful DNS names for some of the entries in the Firewall log.
If using Country Blocking, it is recommended to deploy Transparent Web Proxy globally. This ensures that the blocks will appear in the Web Proxy log with a URL, rather than in the Firewall log, without any DNS information.
Supporting Procedures
This is the function to search for a keyword=”value” phrase, and return value if found.
CREATE Function [dbo].[ParseLog2](
@msg1 varchar(7999),
@itm varchar(999),
@ItmSize int = 8000
) returns varchar(8000)
as
BEGIN
declare @msg varchar(8000);
declare @tmpstr varchar(8000);
declare @tmpstr1 varchar(8000);
declare @tmppos int;
if @ItmSize <= 0 or @ItmSize > 8000
BEGIN
SET @ItmSize = 8000
END
set @msg = ' ' + @msg1
Set @tmpstr = '% ' + @itm + '="%'
Set @tmppos = patindex(@tmpstr,@msg)
If @tmppos = 0
Begin
Set @tmpstr1 = NULL
End
Else
Begin
Set @tmpstr1 = Substring(@msg, @tmppos + Len(@tmpstr) - 2 ,7999)
Set @tmppos = Patindex('%"%',@tmpstr1)
if @tmppos > 0
begin
set @tmpstr1 = Left(@tmpstr1, @tmppos - 1)
end
if len(@tmpstr1) > @ItmSize
BEGIN
SET @tmpstr1 = LEFT(@tmpstr1,@ItmSize)
END
End
Return @tmpstr1
End
GO
This is the procedure to discard a keyword=”value” phrase, in search of anything left after the known fields have been parsed.
ALTER Function [dbo].[ParseLogRev](
@msg1 varchar(7999),
@itm varchar(999)
) returns varchar(7999)
as
BEGIN
declare @msg varchar(8000);
declare @tmpstr varchar(1001);
declare @tmpstr1 varchar(7999);
declare @tmppos int;
Declare @MsgStub varchar(8000);
set @msg = ' ' + @msg1
Set @tmpstr = '% ' + @itm + '="%'
Set @tmppos = patindex(@tmpstr,@msg)
If @tmppos = 0
Begin
set @MsgStub = @msg1
End
Else
Begin
Set @MsgStub = Substring(@msg, 1, @tmppos - 1)
Set @tmpstr1 = Substring(@msg, @tmppos + Len(@tmpstr) - 2 ,7999)
Set @tmppos = charindex('"',@tmpstr1)
if @tmppos = 0
begin
set @tmppos = len(@tmpstr1) + 1
end
set @MsgStub = Rtrim(@MsgStub) + ' ' + LTrim(Substring(@tmpstr1, @tmppos + 1,7999))
End
Return @MsgStub
End
Finally, this is the procedure to parse the components of a URL: PROTOCOL=Protocol, PATH=FilePath, QUERY=QueryString, Default=FQDN / DNS name. The URLPart is case-insensitive and only the first four characters matter.
CREATE FUNCTION [dbo].[ParseURL] ( @paramURL varchar(5000)='', @param_urlpart varchar(5)) RETURNS varchar(255)
AS
BEGIN
-- Declare the return variable here
DECLARE @tmpURL varchar(5000), @urlprot varchar(50), @urlpath varchar(5000), @tmppos1 int
DECLARE @rslt varchar(5000), @urlquery VARCHAR(5000)
DECLARE @urlpart varchar(5)
set @urlpart = left(upper(@param_urlpart),4)
set @tmpURL = coalesce(@paramURL,'')
set @tmppos1 = charindex('://',@tmpURL,2)
if @tmppos1 > 1
begin
set @urlprot = left(@tmpURL,@tmppos1-1);
set @tmpURL = substring(@tmpURL,@tmppos1+3,5000);
end
else
begin
set @urlprot = ''
end
set @tmppos1 = charindex('?',@tmpURL,2)
if @tmppos1 > 1
begin
set @urlquery = substring(@tmpURL,@tmppos1+1,5000);
set @tmpURL = left(@tmpURL,@tmppos1-1);
end
else
begin
set @urlquery = ''
end
set @tmppos1 = charindex('/',@tmpURL,2)
if @tmppos1 > 1
begin
set @urlpath = substring(@tmpURL,@tmppos1+1,5000);
set @tmpURL = left(@tmpURL,@tmppos1-1);
end
else
begin
set @urlpath = ''
end
IF @urlpart='PROT'
set @rslt = @urlprot
ELSE IF @urlpart='PATH'
set @rslt = @urlpath
ELSE IF @urlpart='QUER'
set @rslt = @urlquery
ELSE
set @rslt = @tmpURL
RETURN LEFT(@rslt,255)
END
This thread was automatically locked due to age.