Microsoft Excel, already installed on most corporate desktops, is commonly underappreciated by IT security practitioners....
Data analysis is a common security task and Excel can often be the quickest option to analyze firewall logs, antivirus data, proxy logs, OS logs and a file listing from a compromised server. Data is everywhere and is often more useful than we expect, if we know how to look at it.
There are many ways to solve each of the challenges I am about to describe. Excel may not be the best answer, but it's often the quickest and simplest. Prior to Excel 2007, there was a hard limit of 65,000 rows of data. That limitation is gone, so you can import and manipulate large data sets.
One straight-forward use of Excel is to quickly create a list of repetitive commands.
Let's say you have a list of 1,000 specific files you want to delete from a system, but they are in various locations and cannot be selected by criteria such as name, size or last accessed. You can paste the list of files into the second column (B) in Excel. In the first column enter del ". You can select that cell then drag it down to the bottom of the list to create 1,000 duplicate copies of del ". In the third column add a double quote (") and repeat the select and drag process. Copy and paste the Excel sheet into notepad to create a batch file with 1,000 file-specific delete statements. You'll need to do a find and replace to eliminate the TAB that Excel used to delimit the columns. This technique also works well when you need to create SQL statements to add or alter data (see Figure below)
Here's a list of some specific features that may be new to you:
- Text to Columns
- Auto Filter
- Filter Unique
- Pivot Tables
TEXT TO COLUMNS
The Text To Columns option, in the Data menu may be well known, but it's so useful that it must be mentioned. If data is in one column you can have Excel break it into multiple columns via an arbitrary delimiter (comma, tab, colon, [, Q, etc.) or based on field size. Then you can sort and examine based on that value. If your logs happen to contain text such as "transaction time (23) ms" you could break the value between the parenthesis into its own column that can then be examined. This is often helpful when the data is automatically separated by one delimiter but then must be further broken down by a second delimiter. Active Directory logs are a good example -- there are comma-delimited fields but the last contains several tab-delimited values.
USING AUTO FILTER
The Auto Filter, found under Data > Filter, will add a drop-down to the top of each column with the unique values within that column. This allows you to quickly display a sub-selection of data. Using Active Directory logs as an example, we can use Auto Filter to show only the messages of a given Event ID, such as 540 (successful log on) (see figure below).
Filter Unique, found under Data > Filter > Advanced, shows only one row for each data value in a given row. This is a quick way of eliminating repeating messages that may clutter your data set. You could easily generate a list of all users from an authentication log. Have you ever had a collection of files that included duplicates? Use an MD5 hashing program, such as MD5Deep, to create a file listing with the MD5 hash. Load that file in Excel, use Filter Unique on the MD5 value, and you now have a list of all unique files. You can use the technique mentioned earlier to create the commands to either copy each of those files to a new location or to just delete all the leftover files.
Pivot tables, under Data > PivotTables > PivotTable Report, allows you to chart and examine data. Let's say you have logs from an antivirus product, with each entry showing an infection alert. A PivotTable would allow you to quickly determine which viruses are on the most systems and which systems have the most viruses. From an authentication log you could easily see which users logged on to each server. This reporting function allows for very quick drag and drop analysis of the data. Normal Excel reports allow for charting based on a raw value count but PivotTables allow you to aggregate and examine the data much more. Pivot tables will also coalesce identical entries and thus will show you the number of unique hosts and unique viruses. Pivot tables also allow you to double-click cells in order to drill-down to more detail (see figure below).
Hopefully I've shown that Excel can be helpful in ways that you did not expect. It's a tool you probably have installed already, and can be quickly used to examine that random log file or other data set that you encounter frequently.
Tom Chmielarski is a senior consultant with GlassHouse Technologies, Inc.
Send comments on this technical tip to firstname.lastname@example.org