Originally posted by: SherryXue
A few months ago, we featured a blog post on integrating the IPython-Jupyter notebook with Platform Conductor for Spark. Since then, we've enabled an additional layer of security for the IPython-Jupyter notebook, wherein the notebook user's credentials are required to log in to the IPython notebook GUI. This updated blog post details the IPython-Jupyter integration and includes additional steps for secure authentication.
NOTE: To enable this authentication check in your environment, you must install interim fix pcs-1.1-build398394 to your Platform Conductor for Spark v1.1.0 cluster. Refer to the readme bundled with the interim fix for instructions.
The IPython notebook is an interactive computational environment, in which you can combine code execution, rich text, mathematics, plots, and rich media. Its uses include data cleaning and transformation, numerical simulation, statistical modeling, machine learning, and more. Since IPython 4.0, the IPython Notebook has been migrated to Jupyter. For information on the IPython notebook, refer to the official documentation.
To readily integrate the IPython-Jupyter notebook (with secure authentication) for IPython 3.2.1, use the master branch of the IPython-Jupyter notebook sample from ibmcws on IBM Bluemix DevOps Services. This sample enables the user to enter a password to log in to the IPython notebook GUI. While instructions on how to integrate the sample are included in the sample readme, this blog post provides detailed information on each of the steps.
Create the IPython notebook package
1. Download the IPython 3.2.1 package (ipython-3.2.1.tar.gz) and the Anaconda installer (Anaconda2-2.4.1-Linux-x86_64.sh); these go into the package folder in step 3.
2. Download the scripts and the deployment.xml file from the master branch of the notebook sample:
git clone -b master https://hub.jazz.net/git/ibmcws/Conductor-IPython-Jupyter
cd Conductor-IPython-Jupyter/
You should see deployment.xml as follows:

cd Conductor-IPython-Jupyter/scripts/
You should see the notebook scripts, including prestart_ipython.sh, start_ipython.sh, stop_ipython.sh, and jobMonitor.sh.
3. Create the notebook package using the files from steps 1 and 2. Place the files in folders using this structure:
packagename
    scripts
        <script_files>
    package
        ipython-3.2.1.tar.gz
        Anaconda2-2.4.1-Linux-x86_64.sh
    deployment.xml
Make sure all the script files and the Anaconda2-2.4.1-Linux-x86_64.sh file have execution permission for all users and user groups.
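For example, assuming you assemble the files under a packagename directory as shown above, you could set the permissions from inside that directory with a command such as:
chmod a+x scripts/*.sh package/Anaconda2-2.4.1-Linux-x86_64.sh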
From inside the packagename directory, create the notebook package (for example):
tar czvf ipython.tar.gz deployment.xml scripts package
The ipython.tar.gz notebook package is now ready.
Add the IPython notebook package
- Log in to the Platform Management Console.
- Click Workload > Spark > Notebook Management.

- In the Spark Notebook Management page, click Add.

- Enter the required fields in the Deployment Settings tab as follows:
- Name: Ipython
- Version: 3.2.1
- Package: Click Browse and select the ipython.tar.gz package that you created previously.
- Prestart command: ./scripts/prestart_ipython.sh
- Start command: ./scripts/start_ipython.sh
- Stop command: ./scripts/stop_ipython.sh
- Job Monitor command: ./scripts/jobMonitor.sh
NOTE: The IPython notebook sample does not support web monitoring; ensure that you do not select the Enable monitoring for the notebook option.
- This IPython notebook supports Spark 1.4.1, Spark 1.5.2, and Spark 1.6.1. In the Environment Variables tab, set the supported_spark_version variable to one of the following values (see the example after this list):
- 1.4.1
- 1.5.2
- 1.6.1
- 1.4.1, 1.5.2
- 1.4.1, 1.6.1
- 1.5.2, 1.6.1
- 1.4.1, 1.5.2, 1.6.1
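For example, to allow all three Spark versions, you would set the variable to the combined value; the name/value layout shown here is illustrative, and you enter it in the fields that the Environment Variables tab provides:
supported_spark_version = 1.4.1, 1.5.2, 1.6.1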

Register and Deploy a Spark Instance Group with the IPython notebook
- From the PMC, click Workload > Spark > Spark Instance Groups.

- In the Spark Instance Groups page, click New and select the Spark version and the IPython notebook that you added. If required, define other settings for Spark and the IPython notebook by editing the default configuration. For more information, see Creating Spark instance groups in the IBM Knowledge Center.

- Once you define the settings, select Automatically start deployment once the instance group is created and click Create. After a short wait, the Spark instance group with the IPython-3.2.1 notebook is created and deployed.

Start the registered Spark Instance Group and assign users to the IPython notebook
- Once the Spark instance group is created, click Continue to Instance Group. Then, select the registered Spark instance group (sparkipython01 in this example) and click Start instance group.

- In the Notebooks tab, click Assign users, select the IPython notebook and the user you want to assign to the notebook. Then, click Assign.

The IPython notebook is started with the cluster administrator (Admin) as its user. The Admin user can assign other users to this IPython notebook. Notebooks assigned to different users run separately for each user. Remember, however, that within one Spark instance group, each user can have only one notebook of the same version.
Launch the IPython notebook
- From the PMC, click Workload > Spark > Applications & Notebooks.
- Under Launch a notebook, click the IPython notebook associated with the Spark instance group. For more information, see Launching notebooks in the IBM Knowledge Center.

The IPython notebook opens in a new tab.
- Once the IPython notebook launches, enter the password assigned to this notebook user.

- Click Log in.
Explore the IPython notebook
The IPython notebook supports over 40 programming languages, including languages popular in data science such as Python, R, Julia, and Scala. Here are some usage examples:
# Python
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()
df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index,
                  columns=['A', 'B', 'C', 'D'])
df = df.cumsum()
plt.figure(); df.plot(); plt.legend(loc='best')
Here's a screenshot of the input and output when running the preceding code. NOTE: To run this code, your environment requires the matplotlib library.

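matplotlib is normally bundled with the Anaconda distribution used by this sample, but if it is missing from the Python environment that serves the notebook, you can typically add it with conda (an illustrative command; the environment the notebook actually uses depends on your deployment):
conda install matplotlib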
# PySpark
from pyspark import SparkContext
sc = SparkContext()
raw_events = sc.textFile('/opt/ipython').map(lambda x: x.split('\t'))
print (raw_events.first())
print (raw_events.take(5))
Here's a screenshot of input and output results when running the preceding code:

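To confirm which Spark version the notebook is actually running against, you can also print the version reported by the Spark context (a quick check, assuming sc is the SparkContext created in the preceding example):
print(sc.version)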
View the status of the IPython notebook service
You can monitor the status of a notebook service in two ways: from the PMC and from the command line.
From the PMC
From the Notebooks tab of a Spark instance group, view the list of notebooks associated with that instance group. The State column shows the status of each notebook. For example:

From the command line
Each started notebook runs as a separate service in the cluster. You can use egosh service commands to check the status of each notebook service. For example:
egosh service list -ll | grep spark

We hope this blog helped you get started with the IPython notebook sample in Platform Conductor. If you'd like to try out Platform Conductor, download an evaluation version from our Service Management Connect page. If you have any questions or require other notebook samples, post them in our forum!
#SpectrumComputingGroup