Machine data makes up ~90% of the data collected by organisations
What is Splunk
Collects data from any source
Label the data with a source type
Timestamps are identified and extracted
Data is added to Splunk indexes so it can be searched
Search & Investigate
Search via SPL (Splunk's Search Processing Language) queries
Add Knowledge Objects
Give data classifications
Enrich the data
Monitor and Alert
Monitor and alert, respond with actions
Dashboards can be used to visualize
Indexers process incoming data and store it in indexes
Stores data in directories with time frame components (buckets)
Search Heads allow users to write SPL, distribute searches to the various indexers, then return the results
Tools such as dashboards and reports
Search Requests are processed by the Indexers…
Usually on the machine where data originates – not possible in CGI
Lightweight; don't require a lot of performance
Single Instance contains all components on one node
Input, parsing, indexing and searching of data ( the 4 main processes )
Ok for PoC, personal use, learning , and small departments/organizations
You would distribute the Input, Parsing, Indexing and Searching across multiple nodes, e.g. more than 1 Search Head to allow concurrency
More forwarders to ingest more data more quickly
All can be clustered to ensure all are available – no single points of failure
Installing Splunk Enterprise
I installed on my AWS Linux machine
Essentially a Linux untar, then /opt/splunk/bin/splunk start
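A sketch of those steps, assuming the current Linux tarball from splunk.com (the filename below is illustrative and differs per release):

```shell
# Download the tarball from splunk.com/download first; filename is illustrative
sudo tar -xzf splunk-9.x-Linux-x86_64.tgz -C /opt
# Start Splunk, accepting the license non-interactively
sudo /opt/splunk/bin/splunk start --accept-license
# Optionally have Splunk start at boot
sudo /opt/splunk/bin/splunk enable boot-start
```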
Apps and Roles
Preconfigured environments that sit on top of Splunk Enterprise
Think of them as workspaces
Roles – what a user can do
Admin – install apps, ingest data, create knowledge objects for all users
Power – create and share knowledge objects, and run searches
User – use apps and knowledge objects shared with them
You can create and deploy your own Apps..
Apps that ship with Splunk are the "Home" app and "Search & Reporting"
You can launch and manage apps from the home app
Get Data In
Add Data –
Upload – upload csv type data, one off, good for testing
Good for classifying data: if Splunk recognizes the data, it labels it dynamically
Source types: we can add custom types; the source type decides how to determine fields/delimiters for the data, e.g. csv
Source types can be amended, e.g. to define how files are split into events
App Context: source types can be made available system wide
Hostname: can be set from the content of the file, or from the host it originates from
Should split data into multiple indexes – a bit like tables in any database
You can control who has access to indexes(data) by Role management
You can also set retention periods for each index; ageing out data is essentially like dropping partitions, so it makes management faster and easier.
Monitor – monitor ports, locations etc
Files & Directories
You can continuously monitor
You can whitelist/blacklist files
Can dynamically pick source type
App Context can be selected
Click submit, which starts indexing the data
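The same monitor input can also be defined in inputs.conf; a minimal sketch, assuming a hypothetical /var/log directory, index and sourcetype:

```
[monitor:///var/log]
index = security
sourcetype = linux_secure
whitelist = \.log$
blacklist = \.gz$
```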
HTTP Event Collector
Forward – forwarders installed on remote machines send data to the indexers
Setting up forwarders.. outside scope
Windows – would allow to monitor local and remote event logs etc..
Search > conduct searches > enter query
Datasets > see what data sets are available
What to Search Panel > data summary > summary of data available
Contains host name
Search history menu
Search, e.g. find failed authentications by searching for: failed
Failed : ( last 30 days > make sure you set a time range )
Can save results as knowledge objects
Visualizations: transforming commands are used to create tables, from which visualizations can be built
Stop a job..
We can share a job
Jobs remain active for 10 minutes after completion
A shared search job, remains active for 7 days… ( snapshotted )
Export results in JSON,RAW,CSV etc..
Fast – high-level overview; field discovery disabled
Verbose – discovers all fields
Smart – toggles behavior..based on the search being run
Timeline, shows you events during time range..
Zoom IN > uses original job output , zooming OUT it will run a new job/report
Use returned events… to dig further
Timestamps > displayed as per your user account's timezone
Fields can be added removed from the search.. you can click on the field in the event, allows you to edit the search criteria
i.e. fail * > use wildcards
failed NOT password ( Booleans )
failed OR password
failed AND password
Order of evaluation
Escape character i.e. info=”user \”chrisv4\” not in database”
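Putting the Booleans together: NOT binds first, then AND (implied between adjacent terms), then OR, with parentheses making the intent explicit. A sketch against the hypothetical linux_secure data:

```
sourcetype=linux_secure (failed OR invalid) NOT password
```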
Fields > extracted fields from search > host, source, sourcetype are selected by default
Field names are case sensitive, e.g. sourcetype=
Field values are NOT case sensitive
Wildcards can be used with search fields
# = numeral field
a = string field
e.g. "a dest 4" means dest is a string field containing 4 values
You can add fields to the query; this also adds transforming commands, which create visualizations.
You can filter fields, see statistics on fields.
sourcetype=linux_secure >> field name shown in bold; case sensitive
!= for string values
>, <, >=, <= for numerical values
e.g. NOT host=mail* (wildcards can also be used here)
Lab 6: 1,301 events did not end in HTTP 200 (i.e. success), out of 19,235 in total
Last 7 days – limit time frame
The more you tell the search the better
Inclusion is better than exclusion: prefer stating what you want over NOT clauses
Always use sourcetype= as a first step
Date & time ranges
Real-time searches, e.g. from 10 minutes ago until now: the search runs over a rolling 10-minute window.
Advanced tab – i.e. -30m – last 30 minutes…
-30d ( d = days ) ( w = weeks ) ( mon = month ) ( y = year )
@ rounds (snaps) to a unit, e.g. -30m@h returns events from the start of the hour.
Can be used in searches i.e.
sourcetype=access_combined earliest=-2h latest=-1h (or use absolute values)
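Relative modifiers and @ snapping combine in the same way, e.g. to return events from the previous full hour:

```
sourcetype=access_combined earliest=-1h@h latest=@h
```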
Splitting data across indexes (e.g. web data vs security data) limits search run time: searches only scan the relevant indexes/partitions
Can also limit access to data
To search a specific index : index=web OR index=security
index=ma* ( wildcards can also be used )
Splunk Search language
sourcetype=acc* status=200 | stats list(product_name) as "Games Sold"
Commands – what to do with the results, e.g. chart/visualize
Functions – how to do it, e.g. how to chart
Can also pass result to another via Pipe |
index=web (sourcetype=acc* OR sourcetype=ven*)
| timechart span=1h sum(price) by sourcetype
Search results are piped into other commands
They are in memory, left to right
index=web sourcetype=access_combined | fields status clientip // include only status and clientip
index=web sourcetype=access_combined | fields - status clientip // exclude status and clientip
index=web sourcetype=access_combined | fields - _raw // exclude _raw (hidden fields start with _)
Field extraction really slows searches down! Including only the fields you want increases performance; excluding fields only happens after the query has run, so it does not improve performance, only what is displayed
table returns data in tabulated format, so we can easily see what products were purchased, e.g.
index=web sourcetype=access* status=200 product_name=* | table jsessionid, product_name, price
Rename fields in a table
index=web sourcetype=access* status=200 product_name=*
| table jsessionid, product_name, price
| rename jsessionid as "User Session", product_name as "Purchased Game", price as "Purchase Price"
Be careful when renaming: commands later in the pipeline must reference the new name, not the original.
Make sure you quote the new name i.e .”User Session”
We can use dedup to remove dups
index=security sourcetype=history* address_description="San Francisco"
| dedup firstname lastname
| table username firstname lastname
We can sort desc/asc
| table vendor product_name sales_price
| sort vendor product_name
| sort - sale_price Vendor ( sort by sale price: - descending, + ascending )
Where you place the + or - is key: "sort - sale_price Vendor" (with a space after the -) sorts all listed fields descending, whereas "sort -sale_price Vendor" applies descending only to sale_price. The space makes a big difference to the output!!
limit=20 > restricts output to the first 20 results, like most SQL LIMIT commands
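Pulling the sort notes together into one query (field names assumed from the earlier examples):

```
index=sales sourcetype=vendor_sales
| table Vendor product_name sale_price
| sort limit=20 - sale_price Vendor
```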
Order search results into a data table
Transform into visualizations
TOP by default is 10 !! top 10
TOP – top n commands i.e.
index=sales sourcetype=vendor_sales | top Vendor
Can add limit = 0 for all rows
Can add limit = 20
Clauses can be used
| top Vendor limit=5 showperc=False countfield="Number of Sales" useother=True
Top 3 products sold by each vendor – GROUP BY equivalent:
| top product_name limit=3 by Vendor showperc=False countfield="Number of Sales" useother=True
Bottom 3 products sold by each vendor – GROUP BY equivalent:
| rare product_name limit=3 by Vendor showperc=False countfield="Number of Sales" useother=True
To produce stats i.e.
Total number of sales in last week
| stats count as "Total Sales by Vendor"
Count of a number of fields were present i.e. an action vs Total events
| stats count(action) as "Action Events", count as "Total Events"
dc ( distinct count )
| stats distinct_count(product_name) as "Number of games by vendor" by sales_price ( or dc )
Sum function: be sure to include all aggregations within the same stats command, e.g.
| stats sum(price) as "Gross Sales", count as "Units Sold" by product_name
| stats avg(sale_price) as "Average Sales Price"
List() list all values of a given field
Lists all values for a field, e.g. all assets an employee has
| stats list(Asset) as “company assets” by Employee
Values() list distinct values of a given field , i.e. list of sites users have visited.
Lists all UNIQUE values for a field, e.g. all sites a user has visited
| stats values(s_hostname) by cs_username
We can save reports for use in dashboards etc.. saved content can be shared
We can save a report with Time Picker or not
You can edit the report, schedule it etc..
You can set permissions
Run-As >> run as the owner, or as the user running the report
Run-As USER >> will only allow a user to see data they are allowed to see
Accelerated > runs on an aggregated data set rather than ALL data i.e summary table
Can save report as Table, Chart etc and time picker etc..
Any report with Stat values, can be charted..
Allows drills from visualizations
Can be saved as a report or a dashboard panel
You can add panels which can be new searches, based on new pages
Time range pickers can be included, but panel searches must be INLINE (with a time frame defined within the search) otherwise data will not dynamically refresh
Charts can be based on numbers,time or location
A time range picker can be included in a report
Pivots and Datasets
Data models > knowledge objects > which will drive pivots
Can be saved as report panels.. > can be added to a dashboard
Pivots is an interface to a data model/data
Datasets represented as tables.. slices of data
Pivots is kind of like a MART joined in a schema/star schema
Data Models > Pivot models…
Child datasets are subsets of a dataset, i.e. an AND condition
Pivots runs for ALL time by default, can change this to last 7 days etc
You can Limit rows..
Split columns by Product names.. you can update the table…
Pivots can be saved as a report, or added to a dashboard
Instant Pivot is displayed in the statistics/visualization tabs when a non-transforming search is run, i.e. one that doesn't include aggregations/stats
Quick way of creating a pivot, even if a datamodel doesn’t exist
Write a query with no transform functions
Select all fields you want to use in a data model, all fields, selected fields etc.. or % of columns
We can give users access to the datasets that make up a data model
Fields are columns, events are rows
Visualize in pivot or explore in search.
There is an add-on that allows you to create datasets without having to generate the queries.
Lookups
Allow you to add external data to indexed data
i.e. csv files we can refer to
first row in a csv lookup represents field names you will see
they are defined as datasets
You must define a lookup table
Define a lookup
Configure it to run automatically
Lookup field values are case sensitive
i.e. lookup table for HTTP request return codes
a CSV file with many rows like 200,OK >> input field is the code (200)
output field is the description (OK)
Only available in “search” application
Place the lookup CSV file in a location on the file system
i.e. refer to it via a pipe i.e. | inputlookup http_status.csv, should see “code,description” as values
Define a lookup > settings > lookup
A csv lookup file as the source
The INPUT field is not returned
OUTPUTNEW can be used if field names already exist, otherwise they will be overwritten
You can create time based lookup, as long as time is in the raw data
Can set min/max matched values
Can be case/case insensitive
Batch index query can improve performance
Match-type i.e. fuzzy matching…
index=web sourcetype=access_combined NOT status=200 | lookup http_status code as status
OUTPUT code as "HTTP Code", description as "HTTP Description"
>> matches the event's status against code in the lookup and returns the descriptive name, e.g. "Not Found" rather than 404
Lookups can be configured to Automatically lookup values, so people don’t need to manually write lookup commands in the query
The mapping is defined within the Automatic Lookup
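The same mapping can also be expressed in configuration files; a sketch assuming the http_status lookup above (stanza names are illustrative):

```
# transforms.conf: register the lookup file
[http_status]
filename = http_status.csv

# props.conf: apply it automatically to a sourcetype
[access_combined]
LOOKUP-http_status = http_status code AS status OUTPUTNEW description AS "HTTP Description"
```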
You can also
Populate lookup tables with search results
Define lookup based on an external script or command
Use the Splunk DB Connect application to run lookups against external databases within the organization.
Auto sending reports via email
Schedule reports to run on a schedule
Run a search/query and name it; don't usually use the time picker, as a scheduled report is not interactive
Click schedule link from advanced options
Pick the time frame i.e. over last 7 days
Concurrent reports can put a high load.. keep an eye on things
Schedule priority can be set to ensure they run at times when things are quiet
Schedule window > splunk can decide when best to run a report
Only available to admin users
High-priority reports will run ahead of other scheduled reports
Reports can be delayed
Can output reports – trigger actions
Run a script
Send an email
Custom alert actions
Output to telemetry output
Tokens can be used to add variable content i.e. $name$
Can send in html or plain text format
Managing scheduled reports
Searches can be edited
Move,delete,clone all possible
Can edit via messages > searches,reports and alerts..
Reports tab > reports > edit schedule
You can embed reports into output, emails and web pages via an iFrame, meaning users without Splunk access can see the output. Be careful!!
Alerts
Triggered by a saved search
Based on alerts when result of a search meet a condition
Alerts can > appear in the triggered alerts interface, log events, output to a lookup, send emails, run scripts
Create a search.. save as > alert
Set alert permissions > set private or shared in app – all users will see alerts
Scheduled or real-time: scheduled alerts can use a CRON-type expression; real-time alerts run in the background at all times.
Realtime alerts > be careful, they could cause system load
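Cron schedules use the standard five fields (minute hour day-of-month month day-of-week), e.g.:

```
0 6 * * 1      # 06:00 every Monday
*/15 * * * *   # every 15 minutes
```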
Triggers can alert after a number of conditions i.e. number of results, number of hosts, or something you define.
You can require a condition before an alert is sent, so it only fires when there's a real issue, e.g. only alert on HTTP 500s if the count is > 100 within a timeframe; you don't want to SPAM alerts!!
"Once" will only fire once within the timeframe; "For each result" will trigger an alert per matching result, i.e. lots of alerts
Throttle > can also suppress output
You can add actions > dependent on user roles
You could log events
You could update a lookup CSV file
Run a script on the local instance – deprecated now
Send an email…
Tokens to add custom comment
Webhook > popups in a webchat for example
Prebuilt alert actions can be used; admin users can create their own
Activity > triggered Alerts
Anything can then be viewed and monitored within this page.