
NiFi Demo 1 – Ingest Logs, Display in Hive Tables

Objectives

  • The NiFi log-generator canvas generates fictional firewall logs.
  • The NiFi flow places the files into an HDFS directory, e.g. /tmp/server-logs.
  • Place a Hive external table over the log files within HDFS.
  • Create an ORC table within Hive from the log-file data in HDFS.

The following canvas was imported; it generates log file data in HDFS.

[Image: imported NiFi canvas]

We can see files being placed into HDFS.

[Image: log files landing in HDFS]

The log data is very simple for this test case: it is pipe-delimited and contains just four fields.

[root@sandbox nifi_local]# hadoop fs -cat /tmp/server-logs/170571687714398

Sat May 12 16:49:18 UTC 2018|173.1.3.138|ITA|0

Within Hive, or via the Ambari Hive View, we can then create a schema/external table over the log data in that location using the following HiveQL. You could run this directly on the Hive command line or via a Hive SQL tool.

## Create external Hive table

DROP TABLE IF EXISTS firewall_logs;

CREATE EXTERNAL TABLE firewall_logs (
  `time`  STRING,
  ip      STRING,
  country STRING,
  status  INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION '/tmp/server-logs'
TBLPROPERTIES ("transactional"="false");
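Once the external table exists, a quick sanity-check query confirms that Hive is parsing the delimited files correctly. This is an illustrative sketch, not part of the original demo; it assumes the column names from the table definition above.

```sql
-- Illustrative sanity check: count rows per country to confirm
-- the delimiter and column mapping line up with the raw files.
SELECT country, COUNT(*) AS events
FROM firewall_logs
GROUP BY country
LIMIT 10;
```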

## Now create the Hive ORC table (FIREWALL)

DROP TABLE IF EXISTS firewall;

CREATE TABLE firewall STORED AS ORC AS SELECT * FROM firewall_logs;
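The ORC copy can then be queried like any other Hive table. The example below is illustrative; the meaning of the status column is an assumption (the sample record simply shows a value of 0), not something stated in the original demo.

```sql
-- Illustrative query against the ORC table; the interpretation of
-- status as a numeric code is an assumption.
SELECT country, status, COUNT(*) AS hits
FROM firewall
GROUP BY country, status
ORDER BY hits DESC;
```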

[Image: running the HiveQL to create the tables]

You can now expand the “default” database within Hive and see that the “FIREWALL” table has been created from the external table.

[Image: FIREWALL table under the “default” database]

You can then query the ORC Hive table:

[Image: query results from the FIREWALL ORC table]

You could also query it within a Zeppelin notebook:

[Image: querying the table in a Zeppelin notebook]
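In Zeppelin, the same query can be run through the JDBC interpreter. This is a sketch only: the interpreter name %jdbc(hive) matches the HDP sandbox defaults, but your environment may use a different interpreter binding.

```sql
%jdbc(hive)
-- Illustrative Zeppelin paragraph querying the ORC table.
SELECT country, COUNT(*) AS hits
FROM firewall
GROUP BY country
ORDER BY hits DESC;
```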
