This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

HOW TO: Using a SQL Database to Interpret UTM Log Files

After launching UTM, I quickly concluded that I needed tools to parse the UTM log files into a format that could be effectively and efficiently analyzed, and that a SQL database was the obvious choice.   All of my experience since then has only confirmed that decision.  

Nearly anyone who manages a security device in a professional setting is expected, or more likely required, to collect and use the available log data to ensure that all available information is being used effectively to accomplish security goals.   Without a database to analyze that data, the system administrator is flying blind.   The raw logs from UTM are too complex to effectively interpret using only manual inspection, and IView lacks the granularity that I think is necessary.   After refining my toolkit over several years, it seemed only right to share my work with the community.

I am currently using this toolkit to parse these logs:

  • Firewall
  • Application Control
  • Advanced Persistent Threat
  • Intrusion Protection System
  • User Authentication
  • Webserver Protection
  • Web Filter

This solution is not automated -- a real person has to download the logs, extract the zipped file to disk, and then run the SQL Stored Procedures.   But I have scripted the 7-zip extract and the SQL load process, and the processes are pretty fast, so it is not burdensome.

The solution bellow is based on Microsoft SQL Server.   I used a fully licensed version, but I believe all of the described features are available in the free SQL Express package.   A 64-bit implementation is recommended to ensure that you can parse log files larger than 2GB.

I hope it helps some of you as much as it has helped me.

The SQL Option for Text Parsing:  OpenRowSet

The most important discovery was this SQL Feature:

OPENROWSET (BULK ‘textfile’ FORMATFILE=’xmlfile’)

This command causes a text file to behave like a read-only SQL Table.   It also has very high processing speeds, even with the extensive string parsing needed for this application.   The format file defines the layout of the text file.   Because all of the important UTM log files follow a similar format, only one XML definition is needed.

<?xml version="1.0"?>

<BCPFORMAT

xmlns="schemas.microsoft.com/.../format"

xmlns:xsi="www.w3.org/.../XMLSchema-instance" >

 <RECORD>

<FIELD ID="A1" xsi:type="CharTerm" TERMINATOR=" " />

<FIELD ID="A2" xsi:type="CharTerm" TERMINATOR=" " />

<FIELD ID="A3" xsi:type="CharTerm" TERMINATOR=" " />

<FIELD ID="A4" xsi:type="CharTerm" MAX_LENGTH="7998" TERMINATOR="\n" />

 </RECORD>

 

 <ROW>

<COLUMN SOURCE="A1" NAME="EventTime" xsi:type="SQLVARYCHAR" />

<COLUMN SOURCE="A2" NAME="ProxyName" xsi:type="SQLVARYCHAR" />

<COLUMN SOURCE="A3" NAME="LogFunction" xsi:type="SQLVARYCHAR" />

<COLUMN SOURCE="A4" NAME="Message1" xsi:type="SQLVARYCHAR" LENGTH="7998" />

 </ROW>

</BCPFORMAT>

 

The first section of the file lays out the boundaries between text fields, and the second section of the file gives those text fields a SQL name and datatype.  This SELECT statement reads the file.

SELECT A.EventTime, A.ProxyName, A.LogFunction, A.Message1
FROM OPENROWSET (BULK ‘textfile’ FORMATFILE=’xmlfile’) as A

Adding Flexibility and Keys

The downside to the command is that OPENROWSET requires string literals for the file names.  Consequently, the files must either be placed in a hard-coded location, or a dynamic SQL command must be constructed to convert a file variable into a text constant before the command can be executed.

Additionally, we need to preserve original order, which SQL does not inherently provide.  After some false starts, I chose to use a primary key consisting of the file date and a sequence number.   To apply the sequence number, I read the text file into a temporary table (indicated by a name starting with “#”).  The temporary table contains an IDENTITY field to assign the sequence number.  After the text data is loaded into the temporary table, the file date can be extracted from the timestamp of the first data record.   I also chose to convert the timestamp into a SQL DATETIME field rather than preserving it as text.  

The code fragment below combines all of those features.   The stored procedure takes a parameter of the filename to be used.  The temporary table is declared, the dynamic SQL statement is used to load the text file into the temporary table, and the file date is extracted in preparation for subsequent processing into a permanent table.

CREATE PROCEDURE [dbo].[Load_Logfile]

       @datafile varchar(250)

AS

BEGIN

CREATE TABLE [#TmpSysLog](

       [FileSeq] [int] IDENTITY(1,1) NOT NULL,

       [EventTime] [varchar](50) NULL,

       [ProxyName] [varchar](50) NULL,

       [LogFunction] [varchar](50) NULL,

       [Message1] [varchar](7999) NULL,

 CONSTRAINT [PK_TmpSysLog1] PRIMARY KEY CLUSTERED ( [FileSeq] ASC )

)

 

set @sqltext3 =' insert into #TmpSysLog select a.* FROM OPENROWSET( BULK ''' + @datafile + ''', ' + ' FORMATFILE = ''\\storage1.bayviewphysicians.com\software\Sophos UTM\weblog1.xml'' ) AS a '

exec(@sqltext3)

 

 

select top 1 @sqltext1=eventtime from #TmpSysLog

set @FileDate =   cast( substring(@sqltext1,1,4) + '-' + substring(@sqltext1,6,2) + '-' + substring(@sqltext1,9,2) + ' ' + substring(@sqltext1,12,8) as datetime) 

 

… remainder of the stored procedure is not shown at this time.

Parsing the Message field

Nearly all of the interesting log files have message content as a list of space-delimited tokens of the form

keyword1=”value1” keyword2=”value2”

To make this format useful, we need a function to extract the value for a keyword.    I created a “Scalar-valued” function for this purpose:

dbo.ParseLog2(a.Message1,keyword,maxlength)

 

(I will show the code for this function later.)  The function looks for the character sequence keyword=”value” and returns a string result containing the value.   If it is not found, it returns a null.

SQL is unforgiving if a data field assignment is too long, so I also use the maxlength parameter ensure that the returned string will fit in the permanent data table.

The next step is to identify all of the expected keywords, create a table with those fields declared, and create an insert statement that uses the ParseLog2 function to supply the desired values.   A full example procedure appears later in this document.

For most of the log files, I used visual inspection to identify the keywords to be extracted.   For the Web Proxy log file, Sophos has provided a complete list in this KB article:

https://community.sophos.com/kb/en-us/126660

The next section explains an approach for finding overlooked keywords.

Anticipating Data surprises

The log file may contain data that you do not expect.   For this purpose, I found it necessary to create a Comment field to contain any text that was unused after all of the expected fields had been parsed.   The solution to this was a “reverse parse” function, which returns everything except the indicated keyword=value phrase.   The logic looks something like this

       Set @Comment = @Message1;

       set @Comment = dbo.ParseLogRev(@Comment, 'id');

       set @Comment = dbo.ParseLogRev(@Comment, 'severity');

       set @Comment = dbo.ParseLogRev(@Comment, 'sys');

       … and so forth…

 

To compute the value for @Comment, a SQL Cursor must be used to examine each record of the temporary table one at a time.   The @Comment variable is computed first, then included in the field list when the current record is inserted in the permanent table.

 

Some of the logs have unformatted status messages interspersed in the log data.  For these entries, only the header fields and the Comment field will be non-null.  To review the parsed data for omitted data fields or other special conditions, you can simply query for records where Comment > ‘’

 

Finished Procedure

Below is the complete function for parsing a single Firewall log file and loading it into a SQL table.  I have chosen the Firewall log because the field list is relatively short and the file is free of other complications.  For brevity, I do not show the table definition, but it should be easy to infer from the code below.

CREATE PROCEDURE [dbo].[LoadFirewallLog3]

       @datafile varchar(250)

AS

BEGIN

       -- SET NOCOUNT ON added to prevent extra result sets from

       -- interfering with SELECT statements.

       SET NOCOUNT ON;

       Declare @EventTm varchar(30);

       Declare @ProxyName varchar(15);

       DECLARE @LogFunction varchar(18);

       Declare @Message varchar(1000);

       declare @Comment varchar(8000);

 

       Declare @FullMessage varchar(2000);

 

       Declare @EventTime DateTime;

 

       Declare @sqltext1 varchar(999)

       Declare @sqltext3 varchar(999);

       Declare @FileDate datetime;

       Declare @FileSeq int;

       declare @PriorSeq int;

 

       DECLARE @SeqNum int;

       SET @SeqNum = 0;

       DECLARE @MainCnt int, @DetlCnt int, @ErrorCnt int, @MoreCnt int;

       Set @MainCnt = 0;

       Set @DetlCnt = 0;

       Set @ErrorCnt = 0;

       Set @MoreCnt = 0;

       set @PriorSeq = 0;

 

CREATE TABLE [#TmpSysLog](

       [FileSeq] [int] IDENTITY(1,1) NOT NULL,

       [EventTime] [varchar](50) NULL,

       [ProxyName] [varchar](50) NULL,

       [LogFunction] [varchar](50) NULL,

       [Message1] [varchar](7999) NULL,

 CONSTRAINT [PK_TmpSysLog1] PRIMARY KEY CLUSTERED ( [FileSeq] ASC )

)

 

set @sqltext3 =' insert into #TmpSysLog select a.* FROM OPENROWSET( BULK ''' + @datafile + ''', ' +

       ' FORMATFILE = ''\\storage1.bayviewphysicians.com\software\Sophos UTM\weblog1.xml'' ) AS a '

exec(@sqltext3)

 

select top 1 @sqltext1=eventtime from #TmpSysLog

set @FileDate =   cast( substring(@sqltext1,1,4) + '-' + substring(@sqltext1,6,2) + '-' + substring(@sqltext1,9,2) + ' ' + substring(@sqltext1,12,8) as datetime) 

 

DECLARE Firewallxml CURSOR LOCAL FOR

select EventTime, ProxyName, LogFunction, Message1, FileSeq

       FROM [#TmpSysLog] AS a

 

OPEN Firewallxml;

FETCH next from Firewallxml INTO @EventTm, @ProxyName, @LogFunction, @FullMessage, @FileSeq;

WHILE @@FETCH_STATUS =  0

BEGIN

       set @EventTime = cast( substring(@EventTm,1,4) + '-' +
              substring(@EventTm,6,2) + '-' + substring(@EventTm,9,2) + ' ' +
              substring(@EventTm,12,8) as datetime);

       Set @MainCnt = @MainCnt + 1;

                                 

       Set @Comment = @FullMessage;

       set @Comment = dbo.ParseLogRev(@Comment, 'id');

       set @Comment = dbo.ParseLogRev(@Comment, 'severity');

       set @Comment = dbo.ParseLogRev(@Comment, 'sys');

       set @Comment = dbo.ParseLogRev(@Comment, 'sub');

       set @Comment = dbo.ParseLogRev(@Comment, 'name');

       set @Comment = dbo.ParseLogRev(@Comment, 'action');

 

       set @Comment = dbo.ParseLogRev(@Comment, 'fwrule');

       set @Comment = dbo.ParseLogRev(@Comment, 'initf');

       set @Comment = dbo.ParseLogRev(@Comment, 'srcmac');

       set @Comment = dbo.ParseLogRev(@Comment, 'dstmac');

 

       set @Comment = dbo.ParseLogRev(@Comment, 'srcip');

       set @Comment = dbo.ParseLogRev(@Comment, 'dstip');

       set @Comment = dbo.ParseLogRev(@Comment, 'proto');

 

       set @Comment = dbo.ParseLogRev(@Comment, 'length');

       set @Comment = dbo.ParseLogRev(@Comment, 'tos');

       set @Comment = dbo.ParseLogRev(@Comment, 'prec');

       set @Comment = dbo.ParseLogRev(@Comment, 'ttl');

       set @Comment = dbo.ParseLogRev(@Comment, 'type');

       set @Comment = dbo.ParseLogRev(@Comment, 'code');

       set @Comment = dbo.ParseLogRev(@Comment, 'srcmac');

       set @Comment = dbo.ParseLogRev(@Comment, 'dstmac');

 

       set @Comment = dbo.ParseLogRev(@Comment, 'srcport');

       set @Comment = dbo.ParseLogRev(@Comment, 'dstport');

       set @Comment = dbo.ParseLogRev(@Comment, 'tcpflags');

 

       set @Comment = dbo.ParseLogRev(@Comment, 'mark');

       set @Comment = dbo.ParseLogRev(@Comment, 'app');

       set @Comment = dbo.ParseLogRev(@Comment, 'outitf');

 

       set @Comment = dbo.ParseLogRev(@Comment, 'threatname');

       set @Comment = dbo.ParseLogRev(@Comment, 'status');

       set @Comment = dbo.ParseLogRev(@Comment, 'host');

       set @Comment = dbo.ParseLogRev(@Comment, 'url');

 

       INSERT INTO dbo.FirewallLog (EventTime, ProxyName, LogFunction,

                     ItmId, ItmSeverity, ItmName, ItmAction,

                     ItmFwrule, ItmInitf, ItmSrcmac, ItmDstmac,

                     ItmSrcIP, ItmDstIP, ItmProto,

                     ItmLength, ItmTos, ItmPrec, ItmTtl, ItmType, ItmCode,

                     ItmSrcPort, ItmDstPort, ItmTcpflags,

                     Comment, ItmMark, ItmAppId, ItmOutItf, FileDate, FileSeq,

       ItmThreatName, ItmStatus, ItmHost, ItmUrl)
 values

                     (@EventTime, @ProxyName, @LogFunction,

                     dbo.ParseLog2(@FullMessage, 'id',4),

                     dbo.ParseLog2(@FullMessage, 'severity',5),

                     dbo.ParseLog2(@FullMessage, 'name',75),

                     dbo.ParseLog2(@FullMessage, 'action',20),

                     dbo.ParseLog2(@FullMessage, 'fwrule',5),

                     dbo.ParseLog2(@FullMessage, 'initf',6),

                     dbo.ParseLog2(@FullMessage, 'srcmac',17),

                     dbo.ParseLog2(@FullMessage, 'dstmac',17),

                     dbo.ParseLog2(@FullMessage, 'srcip',40),

                     dbo.ParseLog2(@FullMessage, 'dstip',40),

                     dbo.ParseLog2(@FullMessage, 'proto',3),

                     case when dbo.ParseLog2(@FullMessage, 'length',10) > '' then
                     cast( dbo.ParseLog2(@FullMessage, 'length',10) as int)
                     else null end,

                    

dbo.ParseLog2(@FullMessage, 'tos',4),

                     dbo.ParseLog2(@FullMessage, 'prec',4),

                     case when dbo.ParseLog2(@FullMessage, 'ttl',10) > '' then
                     cast( dbo.ParseLog2(@FullMessage, 'ttl',10) as int) else null end,

                     case when dbo.ParseLog2(@FullMessage, 'type',10) > '' then
                     cast( dbo.ParseLog2(@FullMessage, 'type',10) as int) else null end,

                     case when dbo.ParseLog2(@FullMessage, 'code',10) > '' then
                     cast( dbo.ParseLog2(@FullMessage, 'code',10) as int) else null end,

 

                     dbo.ParseLog2(@FullMessage, 'srcport',6),

                     dbo.ParseLog2(@FullMessage, 'dstport',6),

                     dbo.ParseLog2(@FullMessage, 'tcpflags',40),

                     @Comment,

                     dbo.ParseLog2(@FullMessage, 'mark',10),

                     dbo.ParseLog2(@FullMessage, 'app',10),

                     dbo.ParseLog2(@FullMessage, 'outitf',6),

                     @FileDate, @FileSeq,

                     dbo.ParseLog2(@FullMessage, 'threatname',75),

                     dbo.ParseLog2(@FullMessage, 'status',1),

                     dbo.ParseLog2(@FullMessage, 'host',255),

                     dbo.ParseLog2(@FullMessage, 'url',255)

               );

 

       FETCH next from Firewallxml INTO @EventTm, @ProxyName,
       @LogFunction, @FullMessage, @FileSeq;

END

CLOSE Firewallxml;

DEALLOCATE Firewallxml;

 

END

Specific Issues in the Files

Most log files have data of the form sys=”value1” sub=”value2”.   My inspection concluded that within a file, these values were fixed, so I chose to discard them.

The APT log and the Application Control Log use essentially the same log format as the Firewall log, so I have implemented the extra data fields in both the procedure and the related table so that all three logs can be populated using a single procedure and a single data table.

The web proxy log file has a small number of entries with function code starting “URID”, which contain “Comment” data about early DNS timeouts.   I only found them because of this parsing process.   I have been advised that they can probably be ignored.

For the web proxy log, I actually do a three-step process.   For many analyses of web data, I wanted to have the Protocol, FQDN, Path, and Querystring as separate fields.  Step one loads the log file into the temporary table.   Step two parses the Message string in the temporary table to produce distinct values in an intermediate table.  Step three parses the intermediate table to split each URL into four separate fields.   Some other optimizations are applied during stage 3, to reduce space requirements.

First the raw data is loaded into a temporary table as indicated above, then the temporary data is parsed into an intermediate table.

The web proxy log file paired code-description information:   AppID and Application, Category and CategoryName, ID and Name.

The WAF log file has multiple complications which I mentioned in a separate forum post:

  • Some records are continued because they are too long to fit in one record
  • Security records follow a different pattern, using tokens delimited by brackets and some prose between the bracketed tokens.

Because of these differences, the WAF log parsing required additional custom logic.   I loaded the normal data and the security data into two different tables, then created multiple views for inspecting the different versions of security records.   Since this document is introductory, I am not including anything that adds complexity.

Notes on File Relationships

When IPS blocks a web response, the block appears in the IPS log immediately, but the Web Proxy log captures the problem as a timeout condition which can be as much as two minutes later than the IPS event timestamp.

The Firewall log only captures IP addresses, and Reverse DNS on the IP address does not always provide a meaningful result.   The Web Proxy log is a treasure of Forward DNS results, and can be used to provide meaningful DNS names for some of the entries in the Firewall log.

If using Country Blocking, it is recommended to deploy Transparent Web Proxy globally.   This ensures that the blocks will appear in the Web Proxy log with a URL, rather than in the Firewall log, without any DNS information.

Supporting Procedures

This is the function to search for a keyword=”value” phrase, and return value if found.

CREATE Function [dbo].[ParseLog2](

       @msg1 varchar(7999),

       @itm varchar(999),

       @ItmSize int = 8000

) returns varchar(8000)

as

BEGIN

       declare @msg varchar(8000);

       declare @tmpstr varchar(8000);

       declare @tmpstr1 varchar(8000);

       declare @tmppos int;

 

       if @ItmSize <= 0 or @ItmSize > 8000

       BEGIN

              SET @ItmSize = 8000

       END

 

       set @msg = ' ' + @msg1

    Set @tmpstr = '% ' + @itm + '="%'

    Set @tmppos = patindex(@tmpstr,@msg)

    If @tmppos = 0

    Begin

        Set @tmpstr1 = NULL

    End

    Else

    Begin

              Set @tmpstr1 = Substring(@msg, @tmppos + Len(@tmpstr) - 2 ,7999)

              Set @tmppos = Patindex('%"%',@tmpstr1)

              if @tmppos > 0

              begin

                  set @tmpstr1 = Left(@tmpstr1, @tmppos - 1)

              end

              if len(@tmpstr1) > @ItmSize

              BEGIN

                     SET @tmpstr1 = LEFT(@tmpstr1,@ItmSize)

              END

    End

       Return @tmpstr1

End

 

GO

 

 

 

This is the procedure to discard a keyword=”value” phrase, in search of anything left after the known fields have been parsed.

ALTER Function [dbo].[ParseLogRev](

       @msg1 varchar(7999),

       @itm varchar(999)

) returns varchar(7999)

as

BEGIN

       declare @msg varchar(8000);

       declare @tmpstr varchar(1001);

       declare @tmpstr1 varchar(7999);

       declare @tmppos int;

       Declare @MsgStub varchar(8000);

 

       set @msg = ' ' + @msg1

    Set @tmpstr = '% ' + @itm + '="%'

    Set @tmppos = patindex(@tmpstr,@msg)

    If @tmppos = 0

    Begin

        set @MsgStub = @msg1

    End

    Else

    Begin

              Set @MsgStub = Substring(@msg, 1, @tmppos - 1)

              Set @tmpstr1 = Substring(@msg, @tmppos + Len(@tmpstr) - 2 ,7999)

              Set @tmppos =  charindex('"',@tmpstr1)

              if @tmppos = 0

              begin

                     set @tmppos = len(@tmpstr1) + 1

              end

           set @MsgStub = Rtrim(@MsgStub) + ' ' + LTrim(Substring(@tmpstr1, @tmppos + 1,7999))

    End

    Return @MsgStub

End

 

Finally, this is the procedure to parse the components of a URL:   PROTOCOL=Protocol, PATH=FilePath, QUERY=QueryString, Default=FQDN / DNS name.    The URLPart is case-insensitive and only the first four characters matter. 

 

CREATE FUNCTION [dbo].[ParseURL] ( @paramURL varchar(5000)='', @param_urlpart varchar(5)) RETURNS varchar(255)

AS

BEGIN

       -- Declare the return variable here

       DECLARE @tmpURL varchar(5000), @urlprot varchar(50), @urlpath varchar(5000), @tmppos1 int

       DECLARE @rslt varchar(5000), @urlquery VARCHAR(5000)

       DECLARE @urlpart varchar(5)

       set @urlpart = left(upper(@param_urlpart),4)

 

       set @tmpURL = coalesce(@paramURL,'')

 

       set @tmppos1 = charindex('://',@tmpURL,2)

       if @tmppos1 > 1

       begin

              set @urlprot = left(@tmpURL,@tmppos1-1);

              set @tmpURL = substring(@tmpURL,@tmppos1+3,5000);

       end

    else

       begin

              set @urlprot = ''

       end

 

       set @tmppos1 = charindex('?',@tmpURL,2)

       if @tmppos1 > 1

       begin

              set @urlquery = substring(@tmpURL,@tmppos1+1,5000);

              set @tmpURL = left(@tmpURL,@tmppos1-1);

       end

    else

       begin

              set @urlquery = ''

       end

 

       set @tmppos1 = charindex('/',@tmpURL,2)

       if @tmppos1 > 1

       begin

              set @urlpath = substring(@tmpURL,@tmppos1+1,5000);

              set @tmpURL = left(@tmpURL,@tmppos1-1);

       end

    else

       begin

              set @urlpath = ''

       end

 

       IF @urlpart='PROT'

              set @rslt = @urlprot

       ELSE IF @urlpart='PATH'

              set @rslt = @urlpath

       ELSE IF @urlpart='QUER'

              set @rslt = @urlquery

       ELSE

              set @rslt = @tmpURL

 

       RETURN LEFT(@rslt,255)

 

END

 



This thread was automatically locked due to age.
Parents
  • Genius!

    Cheers - Bob
    PS I'll prioritize this permanently at the top of this forum.

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
Reply
  • Genius!

    Cheers - Bob
    PS I'll prioritize this permanently at the top of this forum.

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
Children
No Data