Python packages simplify many important tasks, such as analyzing and visualizing data, retrieving unstructured data from the web, processing images, building machine learning models, and working with textual information. Here are some of the most important and popular libraries and packages in Python:
1- Pandas
Pandas is a fast, powerful, flexible, and easy-to-use open source data analysis and manipulation tool built on the Python programming language. It is used for data wrangling and analysis and provides simple ways to clean, manipulate, and transform data. If you are dealing with a large amount of tabular data, Pandas makes it easier to work with. The top features of Pandas can be categorized as:
- Explore and analyze data quickly
- Read various file formats
- Clean the data
- Manipulate the data
Pandas works with DataFrame objects. A Pandas DataFrame is a two-dimensional data structure in which data is stored in tabular form, as rows and columns.
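As a minimal sketch of the features listed above, the example below builds a small DataFrame, cleans it, adds a derived column, and summarizes it. The column names and values are made up for illustration; in practice the data would usually be loaded from a file with a reader such as pd.read_csv.

```python
import pandas as pd

# Hypothetical sales records; names and numbers are illustrative only.
df = pd.DataFrame(
    {
        "region": ["North", "South", "North", "East", "South"],
        "units": [10, 4, None, 7, 6],
        "price": [2.5, 3.0, 2.5, 4.0, 3.0],
    }
)

# Cleaning: drop the row whose "units" value is missing.
clean = df.dropna(subset=["units"]).copy()

# Manipulating: add a derived column.
clean["revenue"] = clean["units"] * clean["price"]

# Analyzing: total revenue per region.
revenue_by_region = clean.groupby("region")["revenue"].sum()
print(revenue_by_region)
```

Each step returns a new DataFrame or Series, so the original df is left untouched and the intermediate results can be inspected one at a time.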
Data analysis of this kind underpins recommendation systems at many companies. Netflix, for example, uses its large collection of data about customers’ preferences to make suggestions to its users, Amazon performs extensive data analysis to build powerful recommendation systems, and YouTube uses data analysis to recommend videos to its users. Pandas is also used in many other domains: in healthcare, to assess the risk of chronic diseases and cancer; in the energy sector, to improve performance and reduce maintenance costs by predicting device failures; in e-commerce, for customer segmentation and for personalized ads based on customer data; by airline operators, to analyze customer behavior and cut costs; and in stock markets, to understand market activity.
2- NumPy
NumPy is an open source library for working with multidimensional arrays. The NumPy ndarray stores data in a homogeneous n-dimensional array object. NumPy is used in industry for array computation. For example, the data of a color image is stored in a 3D array (height x width x color channels), for instance one containing 1000 pixels. To manipulate such images, we need to operate on those pixels, and NumPy is very useful in this scenario. NumPy is also used by other Python libraries such as Pandas and SciPy.
NumPy arrays are more efficient than Python lists, notably in terms of memory use and speed. NumPy also provides many built-in functions for mathematics, linear algebra, random sampling, and more. Indexing and slicing are used to access a subset of the data. Basic indexing in NumPy follows Python’s zero-based indexing scheme and extends it to multiple dimensions.
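The image scenario and the indexing rules described above can be sketched as follows. The 4 x 5 image is synthetic and the brightening offset of 40 is an arbitrary value chosen for illustration.

```python
import numpy as np

# A hypothetical 4x5 RGB image: height x width x 3 color channels.
image = np.zeros((4, 5, 3), dtype=np.uint8)

# Slicing: set the first channel of the top two rows to full intensity.
image[:2, :, 0] = 255

# Indexing: read one pixel (row 0, column 0) as an array of 3 channel values.
pixel = image[0, 0]

# Vectorized math: brighten every pixel without an explicit Python loop.
# The cast to a wider integer type avoids uint8 overflow before clipping.
brighter = np.clip(image.astype(np.int32) + 40, 0, 255).astype(np.uint8)

# Built-in functions: mean intensity per channel, and a small linear solve.
channel_means = image.mean(axis=(0, 1))
solution = np.linalg.solve(
    np.array([[2.0, 0.0], [0.0, 4.0]]), np.array([2.0, 8.0])
)
```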
3- Scikit-learn
Scikit-learn is a machine learning library for the Python programming language. After cleaning and manipulating your data with Pandas or NumPy, you can use scikit-learn to build machine learning models, as it has many tools for modeling and predictive analysis. With scikit-learn you can build supervised and unsupervised learning models, cross-validate the accuracy of models, and assess feature importance. It features various classification, regression, and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means, and DBSCAN, and it is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
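A minimal sketch of that workflow, assuming the iris dataset bundled with scikit-learn and a random forest as the model (both chosen here only for illustration): train a supervised classifier, cross-validate its accuracy, and read the feature importances.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Small bundled dataset: 150 flowers, 4 numeric features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Supervised learning: fit a random forest classifier.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

# Cross-validation: accuracy on 5 different train/validation splits.
cv_scores = cross_val_score(model, X, y, cv=5)

# Feature importance: one non-negative weight per input feature.
importances = model.feature_importances_

print(f"test accuracy: {test_accuracy:.2f}")
print(f"mean CV accuracy: {cv_scores.mean():.2f}")
```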
4- Matplotlib
Matplotlib is one of the most popular libraries for exploring and visualizing data. You can use it to create basic charts such as line charts, scatter plots, histograms, bar charts, and pie charts. Matplotlib is the foundation of several other visualization libraries, such as Seaborn. It is a plotting library for the Python programming language and its numerical extension NumPy. Visualization enables decision makers to see patterns, trends, and correlations that might go undetected in text-based data. Common plot types include:
- A Histogram shows the distribution of data. It is used to visualize continuous data, such as the distribution of a company's monthly sales figures
- A Box Plot gives a summary of one or more numerical variables
- A Bar Plot or Bar Chart is a plot that shows the relationship between a numerical variable and a categorical variable
- A Scatter Plot is used to analyze the relationship between two numerical variables graphically. It is also handy in detecting the outliers in the dataset
Source: https://matplotlib.org/
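The four plot types above can be drawn side by side with a few lines of Matplotlib. This is only a sketch on synthetic data; the Agg backend is selected as an assumption so that the script also runs on machines without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
values = rng.normal(loc=50, scale=10, size=200)
x = rng.uniform(0, 10, size=50)
y = 2 * x + rng.normal(scale=2, size=50)
categories = ["A", "B", "C"]
totals = [12, 30, 21]

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].hist(values, bins=20)      # distribution of one variable
axes[0, 0].set_title("Histogram")

axes[0, 1].boxplot(values)            # summary of one numerical variable
axes[0, 1].set_title("Box plot")

axes[1, 0].bar(categories, totals)    # numerical vs. categorical variable
axes[1, 0].set_title("Bar chart")

axes[1, 1].scatter(x, y)              # relationship between two numerical variables
axes[1, 1].set_title("Scatter plot")

fig.tight_layout()
```

To see the result, call fig.savefig with a file name of your choice, or remove the matplotlib.use line and call plt.show() in an interactive session.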
5- Plotly
The Python plotly package is an open source library built on plotly.js, a JavaScript library which in turn is built on d3.js. Plotly is an essential tool for visualization because it is a powerful, easy-to-use library that lets users interact with their visualizations.
There are basically two ways to create figures with plotly.py:
- Figures as dictionaries
- Figures as graph objects
The plotly.express module, usually imported as px, has functions that can create whole figures at once. Plotly Express is a built-in part of the plotly library and is the recommended starting point for creating most figures. Each Plotly Express function uses graph objects internally and returns a plotly.graph_objects.Figure instance.
Dash goes hand in hand with Plotly. Plotly develops Dash and also provides a platform for writing and deploying Dash applications in an enterprise environment. Dash is a tool that allows you to create dynamic dashboards using Plotly visualizations.
6- Seaborn
Seaborn is built on top of Matplotlib and is used for drawing attractive and informative statistical graphics. It integrates closely with the Pandas DataFrame and offers specialized support for categorical variables when showing observations. Seaborn also has tools for choosing color palettes that reveal hidden patterns in the data. Top reasons to use Seaborn:
- Functionality: Requires less syntax and has attractive default themes
- Flexibility: Provides the most commonly used default themes
- Handling multiple figures: Automates the creation of multiple figures, which may sometimes lead to out-of-memory issues
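As a small sketch of the DataFrame integration and categorical support mentioned above, the example below draws a box plot per category straight from a DataFrame. The column names and values are hypothetical, and the Agg backend is an assumption so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical observations: a categorical column and a numerical column.
df = pd.DataFrame(
    {
        "group": ["a", "a", "a", "b", "b", "b"],
        "score": [1.0, 2.0, 3.0, 4.0, 6.0, 5.0],
    }
)

sns.set_theme()  # apply Seaborn's default theme to Matplotlib

# Pick three colors from a named palette.
palette = sns.color_palette("viridis", 3)

# Columns are referenced by name; axis labels are filled in from them.
fig, ax = plt.subplots()
sns.boxplot(data=df, x="group", y="score", ax=ax)
ax.set_title("Score by group")
```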
Matplotlib vs Seaborn vs Plotly
Source: Edureka
7- TensorFlow
TensorFlow is a free, open source library for machine learning in Python. It can be used for a wide range of tasks but has a particular focus on training and inference of deep neural networks. It uses multidimensional arrays, also known as tensors, which allow it to perform multiple operations on a particular input. TensorBoard is a tool that comes with TensorFlow and helps you visualize the graph and the learning progress of a model. This helps in understanding the nodes of the model and debugging it to make it better. TensorBoard's Graphs dashboard is a powerful tool for examining a TensorFlow model and gives a quick view of the model’s structure and design.
TensorFlow APIs are arranged hierarchically, with the high-level APIs built on the low-level APIs. Machine learning researchers use the low-level APIs to create and explore new machine learning algorithms. For everyday work, a high-level API called tf.keras is used to define and train machine learning and forecasting models. tf.keras is TensorFlow's implementation of the open source Keras API.
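A minimal sketch of defining and training a model with tf.keras is shown below. The data is synthetic (points on the line y = 3x + 2), and the learning rate and number of epochs are arbitrary choices for this toy problem rather than recommended settings.

```python
import numpy as np
import tensorflow as tf

# Synthetic training data for y = 3x + 2 (illustrative only).
x = np.linspace(-1.0, 1.0, 64, dtype="float32").reshape(-1, 1)
y = 3.0 * x + 2.0

# Define: a single dense layer, i.e. a linear model with one weight and one bias.
model = tf.keras.Sequential(
    [
        tf.keras.Input(shape=(1,)),
        tf.keras.layers.Dense(1),
    ]
)

# Configure training: stochastic gradient descent on mean squared error.
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1), loss="mse")

# Train, then predict for a new input.
history = model.fit(x, y, epochs=100, verbose=0)
prediction = model.predict(np.array([[0.5]], dtype="float32"), verbose=0)
print(prediction)
```

The history object records the loss after each epoch, which is a convenient first check that training is making progress.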
The following figure shows the TensorFlow toolkit:
Source: Google’s Intro to TensorFlow
8- Keras
Keras is a deep learning API written in Python that runs on top of the TensorFlow machine learning platform. It was developed with a focus on enabling fast experimentation. Keras is mainly used to create deep learning models, especially neural networks.
The Keras documentation suggests different starting points depending on your background. Engineers and data scientists who ship reliable and performant applied machine learning solutions can start with "Introduction to Keras for engineers". Machine learning researchers who publish at NeurIPS and push the state of the art in CV and NLP can start with "Introduction to Keras for researchers". Beginners looking for an introduction to both machine learning and to Keras and TensorFlow will need more than a one-pager, and the Keras documentation recommends a book for them.
9- Streamlit
Streamlit is an open source Python library that makes it easy to build beautiful custom web apps for data science and machine learning. Streamlit’s officially supported environment manager on Windows is Anaconda Navigator. A Streamlit app provides an interactive graphical display of data that is used to understand analytical results and extract useful insights. It is one of the fastest ways to build data apps (interactive dashboards). It supports many visualization libraries, such as Matplotlib, Seaborn, Plotly Express, Bokeh, and many others. Many widgets, such as select boxes and sliders, are available in Streamlit.
Top features of Streamlit:
- Improves Decision Making
- Better Customer Experience
- Faster Analysis
- Increased organizational efficiency
10- Bokeh
Bokeh is a simple, interactive, and powerful open source library for Python. Bokeh provides basic grid and row/column layouts that make it quick to get started. When you need responsive dashboards, you can embed Bokeh plots and widgets in popular formats.
Bokeh offers a variety of methods to embed its content in web pages: server_document for deployed Bokeh server applications, or json_item and components for standalone Bokeh output.
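The two standalone embedding options can be sketched as follows. The plot data and the target id "myplot" are hypothetical; server_document is left out because it needs a running Bokeh server.

```python
from bokeh.embed import components, json_item
from bokeh.plotting import figure

# A small line plot built from made-up data.
plot = figure(title="Example line plot")
plot.line([1, 2, 3, 4], [4, 6, 5, 7], line_width=2)

# components: returns a <script> block and a <div> to paste into an HTML page.
script, div = components(plot)

# json_item: returns a JSON-serializable dict that BokehJS can render
# into the page element whose id is "myplot" (a hypothetical id).
item = json_item(plot, "myplot")
```

In both cases the page that hosts the plot still has to load the BokehJS scripts for the plot to render.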
11- SciPy
SciPy is an open source Python library used to solve mathematical, scientific, engineering, and technical problems. It allows users to manipulate and visualize data using a wide range of Python commands. SciPy is built on the NumPy extension of Python. It extends NumPy with additional tools for array computing and provides specialized data structures such as sparse matrices and k-dimensional trees. SciPy provides algorithms for optimization, integration, interpolation, eigenvalue problems, algebraic equations, differential equations, statistics, and many other classes of problems.
Some of the applications which make SciPy important are:
- Multi-dimensional image processing
- Ability to compute Fourier transforms and solve differential equations
- Due to its optimized algorithms, it can do linear algebra computations very robustly and efficiently
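A few of these applications can be sketched in a handful of lines. The signal, the differential equation, and the linear system below are toy problems chosen only for illustration.

```python
import numpy as np
from scipy import fft, integrate, linalg

# Fourier transform: a sine wave with 5 cycles over 64 samples
# should produce a spectrum that peaks at frequency bin 5.
t = np.arange(64) / 64
signal = np.sin(2 * np.pi * 5 * t)
spectrum = fft.rfft(signal)
peak_bin = int(np.argmax(np.abs(spectrum)))

# Differential equation: dy/dt = -y with y(0) = 1, integrated from t=0 to t=1.
# The exact solution is y(t) = exp(-t).
ode = integrate.solve_ivp(lambda t, y: -y, (0.0, 1.0), [1.0])
y_at_1 = ode.y[0, -1]

# Linear algebra: solve A @ v = b for v.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
v = linalg.solve(A, b)

print(peak_bin, y_at_1, v)
```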
12- OpenCV
OpenCV is an open source library, with Python bindings, that focuses primarily on real-time computer vision. OpenCV has a modular structure, meaning that the package consists of several shared or static libraries. The following modules are available:
Core functionality (core), Image Processing (imgproc), Video Analysis (video), Camera Calibration and 3D Reconstruction (calib3d), 2D Features Framework (features2d), Object Detection (objdetect), High-level GUI (highgui), and Video I/O (videoio).
In the C++ API, all the OpenCV classes and functions are placed into the cv namespace. Therefore, to access this functionality from C++ code, use the cv:: specifier or the using namespace cv; directive. In Python, the same functionality is accessed through the cv2 module.
.............................................................................................
Samira Gholizadeh
PhD in Mechanical Engineering
Master of Artificial Intelligence and Machine Learning