Overview

This instruction manual provides guidelines on how to perform tasks on the KitWai platform, ranging from starting Spark or Elasticsearch clusters, through using a cluster to process and visualize data, to terminating clusters. Although we mainly use Python and Jupyter Notebook to give illustrative examples, other tools and programming languages can be used if needed.
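
As a preview of the kind of task covered later in this manual, the sketch below reads a CSV file into a Spark DataFrame from a Jupyter notebook cell and runs a simple aggregation. It assumes a PySpark-enabled notebook kernel on a running Spark cluster; the HDFS path, column name, and application name are hypothetical placeholders.

    # Minimal sketch: process a CSV file with Spark from a notebook cell.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("kitwai-example")   # hypothetical application name
             .getOrCreate())

    # Read a CSV file from the cluster's HDFS (the path is an assumption).
    df = spark.read.csv("hdfs:///user/demo/sales.csv",
                        header=True, inferSchema=True)

    # Basic exploration: schema, row count, and a simple group-by aggregation.
    df.printSchema()
    print(df.count())
    df.groupBy("region").count().show()   # "region" is a hypothetical column

    spark.stop()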

KitWai consists of the following components.

Layer                                                   Component
Data exploration and visualization                      Kibana 6.2.4, Grafana 5.1.3, Superset 0.25
Connection and tools [1]                                Ganglia Monitoring, Thrift JDBC connection, Jupyter/Zeppelin Notebook, RStudio Server
Distributed computing platform for big data analytics   Spark 2.3.0, Flink 1.4.0, TensorFlow on Spark
Distributed data streaming and storage platform         Hadoop HDFS 2.7.1, Elasticsearch 6.2.4, Kafka 1.0.0
Cloud-based resource management                         CentOS 7.4, OpenStack Pike (Nova, Cinder, Swift, Sahara, Glance, Neutron), NVIDIA GPU support, KVM hypervisor and LXC Linux container support

KitWai is not limited to installation on bare-metal machines; thanks to LXC container support, it can also be deployed in a virtualized environment.

[1] Several Python packages are pre-installed on the cluster, including NumPy, pandas, and matplotlib. Java and Scala compilers are also pre-installed.
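
For example, the following sketch, assuming it is run in a Jupyter notebook cell on the cluster, confirms that these packages import correctly and prints the installed versions:

    # Quick sanity check of the pre-installed Python packages.
    import numpy
    import pandas
    import matplotlib

    for pkg in (numpy, pandas, matplotlib):
        print(pkg.__name__, pkg.__version__)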

Scalability

A cluster can be scaled up or down to handle a changing workload according to user demand. The ability to scale depends on the processes running on the node groups. See the Capabilities Matrix for the complete list of processes.