Hadoop is a framework for processing large data sets across clusters of computers using simple programming models. A new Hadoop project, Hadoop Submarine, now supports running Deep Learning frameworks such as TensorFlow on Apache Hadoop. Submarine integrates with Zeppelin and Azkaban for running jobs. The project is intended to improve support for applying Deep Learning to data stored in Hadoop.
Hadoop Submarine aims to make it easier to launch, manage, and monitor distributed Deep Learning or Machine Learning applications built in frameworks like TensorFlow. Other improvements alongside Submarine include better GPU support, container-DNS support, Docker container support, and scheduling improvements. The developers said these improvements make it as easy to run distributed Deep Learning/Machine Learning applications on Apache Hadoop YARN as it would be to run them locally. They also let users run Deep Learning workloads alongside other ETL or streaming jobs on the same cluster.
The Submarine project has two parts: the Submarine computation engine, and a set of Submarine ecosystem integration plug-ins and tools. The computation engine submits customized Deep Learning applications (such as TensorFlow and PyTorch) to YARN from the command line. These applications run alongside other applications on YARN, for instance Apache Spark and Hadoop MapReduce. The ecosystem integrations sit on top of the computation engine. The current list includes integration between Submarine and Zeppelin, and between Submarine and Azkaban.
The Zeppelin integration means data scientists can code inside Zeppelin notebooks and submit and manage training jobs directly from the notebook. Zeppelin is a web-based notebook that supports interactive data analysis through SQL, Scala, and Python to create data-driven, interactive, collaborative documents. Azkaban, on the other hand, is a batch workflow scheduling service that was built at LinkedIn to run Hadoop jobs. Azkaban resolves ordering through job dependencies and offers an easy-to-use web user interface to maintain and track workflows. The Azkaban integration with Submarine means a data scientist can submit a set of tasks with dependencies directly to Azkaban from notebooks.
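To make the idea of ordering jobs by their dependencies more concrete, here is a minimal Python sketch of a topological sort over a small workflow. It is not Azkaban's implementation or API; the function `execution_order` and the job names (`prepare_data`, `train_model`, and so on) are hypothetical and only illustrate the general technique of running a job once everything it depends on has finished.

```python
from collections import deque


def execution_order(dependencies):
    """Return job names in an order that respects their dependencies.

    `dependencies` maps each job name to the list of jobs it depends on.
    This is an illustrative sketch, not Azkaban's actual scheduler.
    """
    # Collect every job mentioned, either as a key or as a dependency.
    jobs = set(dependencies)
    for deps in dependencies.values():
        jobs.update(deps)

    # Count unmet dependencies per job and record which jobs wait on which.
    remaining = {job: 0 for job in jobs}
    dependents = {job: [] for job in jobs}
    for job, deps in dependencies.items():
        for dep in set(deps):
            remaining[job] += 1
            dependents[dep].append(job)

    # Jobs with no unmet dependencies are ready to run (sorted for determinism).
    ready = deque(sorted(job for job in jobs if remaining[job] == 0))
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        # Finishing this job may unblock the jobs that were waiting on it.
        for waiting in sorted(dependents[job]):
            remaining[waiting] -= 1
            if remaining[waiting] == 0:
                ready.append(waiting)

    # Any job left unscheduled is part of a dependency cycle.
    if len(order) != len(jobs):
        raise ValueError("cyclic dependency between jobs")
    return order


# Hypothetical workflow: a training pipeline with four dependent tasks.
workflow = {
    "prepare_data": [],
    "train_model": ["prepare_data"],
    "evaluate_model": ["train_model"],
    "publish_report": ["evaluate_model", "prepare_data"],
}

print(execution_order(workflow))
```

In this sketch, `publish_report` is scheduled last because it waits on both `prepare_data` and `evaluate_model`; a workflow whose jobs depend on each other in a cycle is rejected with an error instead of being ordered.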