Uncategorized

Tableau

Training
https://onlinehelp.tableau.com/current/pro/desktop/en-us/default.htm
https://www.tableau.com/learn/tutorials/on-demand/getting-started
https://www.tableau.com/learn/starter-kits
https://www.tableau.com/learn/webinars/introduction-tableau-server

Desktop — downloads
https://www.tableau.com/support/releases/desktop/2018.3.2
https://onlinehelp.tableau.com/current/desktopdeploy/en-us/desktop_deploy_download_and_install.htm

Server — downloads and top topics
There is a trial version of Tableau Server for both Windows and Unix: https://www.tableau.com/products/server/options
Documentation for the Tableau Server Unix install:
https://onlinehelp.tableau.com/current/server-linux/en-us/install.htm
https://www.tableau.com/support/server
https://www.tableau.com/support/releases/server/2018.3.2

Tech specs
https://www.tableau.com/products/techspecs#server

Notes
Tableau Desktop is used to initially create data sources and connections. Tableau Desktop is used to…


SparkSQL Example

Download and Import Into HDFS

%sh
if [ -e /tmp/au.dataset.downloaded ]
then
  echo 'Files already downloaded so skipping the download ...'
  exit 0;
fi
# remove existing copies of dataset from HDFS
hdfs dfs -rm /tmp/expenses.csv
# fetch the dataset
curl https://data.gov.au/dataset/f84b9baf-c1c1-437c-8c1e-654b2829848c/resource/88399d53-d55c-466c-8f4a-6cb965d24d6d/download/healthexpenditurebyareaandsource.csv -o /tmp/expenses.csv
# remove header
sed -i '1d' /tmp/expenses.csv
# remove empty fields
sed -i "s/,,,,,//g" /tmp/expenses.csv…
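The two sed cleanup steps in the excerpt above can be exercised on a small sample file before pointing them at the real download. A minimal sketch, assuming a tiny stand-in dataset (the file path and sample rows are illustrative, not from the real data.gov.au CSV):

```shell
# Build a small sample resembling the dataset: a header row, a clean data
# row, and a row padded with a run of empty trailing fields.
cat > /tmp/expenses_sample.csv <<'EOF'
state,category,source,year,amount
NSW,Hospitals,Government,2016,1234.5
VIC,Hospitals,Government,2016,987.6,,,,,
EOF

# Remove the header row (same idea as: sed -i '1d' /tmp/expenses.csv)
sed -i '1d' /tmp/expenses_sample.csv

# Strip the run of empty fields (same idea as: sed -i "s/,,,,,//g")
sed -i 's/,,,,,//g' /tmp/expenses_sample.csv

cat /tmp/expenses_sample.csv
```

After the two edits the sample contains only the two data rows, with the trailing empty fields gone, which is the shape the later SparkSQL step expects.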


HDFS – Compressing Content

Need to look at the other types of compression: https://www.cloudera.com/documentation/enterprise/5-3-x/topics/admin_data_compression_performance.html
Although Bzip2 is splittable and gives the best compression, it is also the slowest at compressing and decompressing.

More testing: test both Spark and Hadoop Streaming, compressing a 182 MB and a 14 GB file with these codecs:
Snappy : org.apache.hadoop.io.compress.SnappyCodec
LZO : com.hadoop.compression.lzo.LzoCodec
Scala Script to Iterate over Files…
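The size-versus-speed methodology described above can be sanity-checked locally before involving a cluster. The Hadoop codecs themselves (Snappy, LZO) need the cluster libraries, so this sketch only uses standard gzip to demonstrate the measure-compress-verify loop; the sample file and sizes are illustrative:

```shell
# Generate a compressible sample file (~1 MB of repetitive CSV-like text)
yes 'healthexpenditure,NSW,Hospitals,Government,2016' | head -n 20000 > /tmp/codec_sample.txt

ORIG_SIZE=$(wc -c < /tmp/codec_sample.txt)

# Compress with gzip, keeping the original for comparison
gzip -c /tmp/codec_sample.txt > /tmp/codec_sample.txt.gz
GZ_SIZE=$(wc -c < /tmp/codec_sample.txt.gz)

echo "original: ${ORIG_SIZE} bytes, gzip: ${GZ_SIZE} bytes"

# Verify a lossless round trip before trusting the codec with real data
gunzip -c /tmp/codec_sample.txt.gz | cmp -s - /tmp/codec_sample.txt && echo "round trip OK"
```

On the cluster, the same loop would be repeated per codec (Snappy, LZO, Bzip2) against the 182 MB and 14 GB files, timing each run.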


Big Compute – Container Shell Script

The following Bash shell script looks for files in an Object Store and copies them locally, testing whether the files exist.

#!/bin/bash
# This script will create the LiveApps data tables from the files in the StorageContainer sent via Scott/CRC
CONTAINER=sc-oc-vm-dev-ace-01
DIRECTORY=
PRICINGREPORT="PriceReport_$(date +"%Y-%m-%d").csv"
LIVEAPPSDATA="RMSLiveAppsData_$(date +"%Y-%m-%d").csv"
FILELINECOUNT=0
echo "Object Storage Container Name :" $CONTAINER
echo "LiveApps Pricing…
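The "test if files exist" guard the excerpt describes can be sketched in isolation. This is a hedged illustration of the pattern only, with a locally created file standing in for the object-store download (the real script's container/download logic is truncated above):

```shell
#!/bin/bash
# Sketch of the existence/line-count guard; the file name mirrors the
# date-stamped naming in the script above, but the content is simulated.
PRICINGREPORT="/tmp/PriceReport_$(date +"%Y-%m-%d").csv"

# Simulate a downloaded file (stand-in for the object-store copy step)
printf 'sku,price\nA100,9.99\n' > "$PRICINGREPORT"

FILELINECOUNT=0
if [ -s "$PRICINGREPORT" ]; then      # -s: file exists and is non-empty
  FILELINECOUNT=$(wc -l < "$PRICINGREPORT")
  echo "LiveApps Pricing file present with ${FILELINECOUNT} lines"
else
  echo "LiveApps Pricing file missing or empty" >&2
  exit 1
fi
```

Checking `-s` (exists and non-empty) rather than just `-e` catches the common failure mode of a zero-byte download before any downstream table load runs.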


Nifi – Stream to HIVE

Objectives: the NiFi log-generator canvas generates fictional firewall logs; use the NiFi processor PutHiveStreaming to load them directly into Hive. The issue here is that PutHiveStreaming requires the Avro file format. So we could:
Ingest the files
Process incoming files, changing delimiters from, say, "|" to ","
Infer the Avro schema using the appropriate processor
Then use the ConvertCSVToAvro processor
Then…
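The delimiter-rewrite step in the flow above (equivalent to a NiFi ReplaceText step) is trivially checkable at the command line. A minimal sketch, with a fictional firewall-log line as input:

```shell
# Turn a "|"-delimited firewall-log line into CSV, the shape that the
# later CSV-to-Avro conversion expects. Sample line is made up.
echo '2018-11-02 10:15:01|DENY|10.0.0.5|443' > /tmp/fw_sample.log

tr '|' ',' < /tmp/fw_sample.log > /tmp/fw_sample.csv

cat /tmp/fw_sample.csv
```

Note that `tr` rewrites every `|` unconditionally, which is fine for these generated logs; fields that may legitimately contain the delimiter would need proper quoting instead.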


Unix SUDO

You can elevate privileges via sudo as a named user; this is configured by the admin/root user. You can view sudo's list of functions via:
Type: more /etc/sudoers
Edit the sudo list
Type: visudo — this will allow you to edit /etc/sudoers
You may see content such as:
## Read drop-in files from /etc/sudoers.d (the…
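The drop-in directory mentioned in that comment is the usual place for local rules, keeping them out of the main /etc/sudoers file. A hedged example of such a drop-in (the user name, service, and file name are illustrative, not from the source); it would be edited safely with `visudo -f /etc/sudoers.d/deploy`, which syntax-checks before saving:

```
# /etc/sudoers.d/deploy  -- illustrative example
# Allow the (hypothetical) user "deploy" to restart one service, no password prompt
deploy ALL=(root) NOPASSWD: /bin/systemctl restart myapp.service
```

A syntax error in /etc/sudoers can lock everyone out of sudo, which is why editing through visudo rather than a plain editor matters.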