Last year a former colleague and I did a study of a log analysis tool called Splunk.
This article presents results and excerpts from that study.
Some of the questions we asked were:
- Why use a log analysis tool?
- What do most shops use?
- What does a tool such as Splunk buy us (as an IT shop)?
- What are its benefits and pitfalls?
- What is the cost of ownership?
Why use a log-analysis tool?
The biggest reason to use such a tool is to move from a reactive to a proactive systems management paradigm.
With more than 900 *nix servers in that shop, and with downtime on many of them costing millions of dollars, it is imperative to find a tool that can analyze valuable log information quickly and with little effort.
If such a tool can look at the various layers of a “delivered stack” (hardware, OS, application, network, SAN, etc.), it would be a gold mine: it links the stack end to end and speeds up the analysis process.
What do most shops use?
Most shops I’ve been in do log analysis in one of these ways:
a) Don’t do any log analysis unless absolutely required. When it is required, admins log into the individual servers and page through the logs in vi (or use a combination of grep/awk/sed if they are script-savvy).
b) Have a centralized ssh (or, god forbid, rsh) trusted admin host from which they launch a log parser script that filters on specific keywords and emails the results to a shared mailbox or to the individual admins.
c) Have a centralized log host where they run a script akin to the one mentioned above.
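To make the "log parser script" in (b) and (c) concrete, here is a minimal sketch of the kind of keyword filter I mean. The keyword list, the sample log lines and the function names are all made up for illustration; they are not taken from any shop's actual script.

```python
import re
from collections import Counter

# Hypothetical keywords; a real shop would tune this list to its own systems.
KEYWORDS = ("error", "fail", "panic", "warning")
PATTERN = re.compile("|".join(KEYWORDS), re.IGNORECASE)


def filter_lines(lines):
    """Return only the lines that mention at least one keyword."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


def summarize(lines):
    """Count keyword occurrences (lower-cased) for a short report."""
    counts = Counter()
    for line in lines:
        for match in PATTERN.finditer(line):
            counts[match.group(0).lower()] += 1
    return counts


# Made-up sample lines standing in for a real log file.
sample = [
    "Jan 12 03:14:07 web01 sshd[411]: Accepted publickey for admin",
    "Jan 12 03:15:22 db02 kernel: WARNING: disk c0t1d0 nearly full",
    "Jan 12 03:16:40 app07 myapp[982]: ERROR: connection to db02 failed",
]

hits = filter_lines(sample)
report = summarize(hits)

for line in hits:
    print(line)
print(dict(report))
```

In practice the filtered lines would be mailed out rather than printed, and the keyword list tends to grow every time something slips through, which is a big part of why this approach gets hard to maintain across hundreds of hosts.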
I’ve worked in shops of varying sizes, from an ISP/telecom giant that ran 4000+ Sun servers to a tiny 50-server sweatshop. Most of the shops I’ve been in fall somewhere in between (with 200 to 1000 hosts). That’s a lot of hosts to manage and a lot of logging to parse.
What does a log-analysis tool buy an IT shop?
You’ve all probably thought about this: a centralized, easy-to-use log analysis tool buys an IT shop valuable time!
So what does Splunk claim to do?
In their own words:
“The Splunk Server indexes IT data from ANY source. No need to configure it for specific formats, write regular expressions or change your logging output. Search mountains of data by time, keywords, type of event, source, host or relationships to other events.”
Some key features of Splunk:
- Universal Indexing
- Can index terabytes of data all from one place
- Capable of indexing approx. 22,000 events/second at a density of 150 bytes/event.
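To put that last number in perspective, here is a quick back-of-the-envelope calculation. It assumes the quoted rate is sustained around the clock and uses decimal units (1 MB = 10^6 bytes); both are my assumptions, not part of the vendor's claim.

```python
# Figures quoted in the feature list above.
EVENTS_PER_SECOND = 22_000
BYTES_PER_EVENT = 150

# Assumption: the rate is sustained for a full 24 hours.
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_second = EVENTS_PER_SECOND * BYTES_PER_EVENT
bytes_per_day = bytes_per_second * SECONDS_PER_DAY

mb_per_second = bytes_per_second / 1e6
gb_per_day = bytes_per_day / 1e9

print(f"{mb_per_second:.1f} MB/s, about {gb_per_day:.0f} GB/day")
```

So the claimed rate works out to roughly 3.3 MB of raw log data per second, or about 285 GB per day if it never let up.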
How does Splunk acquire data?
Access data from any live source:
- Mounted files: NFS/SMB, CIFS/AFP, NAS/SAN, FIFO
- Remote files: rsync, scp/ftp/rcp
- Network ports: UDP & TCP, syslog/syslog-ng, log4j/log4php, JMX/JMS, SNMP
- Databases: SQL/ODBC
- Splunk Servers: Access data locally on production hosts and forward it to another Splunk Server over SSL/TCP
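As an illustration of the "network ports" option, the sketch below sends one syslog-style event over UDP. It is not Splunk's API: a loopback socket stands in for whatever listener would receive the data, and the `syslog_frame` helper, the tag and the message are hypothetical.

```python
import socket


def syslog_frame(facility, severity, tag, message):
    """Build a BSD-syslog style line: <PRI>tag: message."""
    pri = facility * 8 + severity  # priority value = facility * 8 + severity
    return f"<{pri}>{tag}: {message}".encode("utf-8")


# Stand-in listener on loopback; port 0 lets the OS pick a free port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)
host, port = receiver.getsockname()

# Sender side: facility 1 (user-level), severity 3 (error).
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
frame = syslog_frame(1, 3, "myapp", "connection to db02 failed")
sender.sendto(frame, (host, port))

# The listener receives the event exactly as it was sent.
received, _ = receiver.recvfrom(4096)

sender.close()
receiver.close()
print(received.decode("utf-8"))
```

The point is simply that a host can push events to a central collector with no agent-specific formatting on the sending side; how the collector is configured to listen is a separate matter.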
The actual evaluation results will be in the next article.