python libraries for data science

python libraries for data science : In this tutorial, we are going to explain the use of python libraries for data science and application. The python programming language is used and behaves as the best tool for flexible and open-source language. The data science is flexible and used for data manipulation. It is very easy to learn for the beginners called as the data analyst. The data is stored to get the logical solution and various sectors like health care, finance, and other sectors. There are many to9ols for the data analytic such as R programming, SAS, SQL ad many more. The easy way to use the tool for data analytics is the python language which is popular. In the survey of stack overflow 2018 python language is the best language and known as a suitable language for the application of data science.

How python is suitable for data science:-

The python language has a unique attribute and easy to use which comes with quantitative and analytical computing.
Python language is widely used in various fields as signal processing, oil, and gas, etc.

The features of python over data science tools:-

1. Visualization and graphics:-

There are many visualization options present in the python language and graphics include ggplot, pandas, etc.

2. Choice of libraries:-

It will provide a database of libraries as AI and ML.

3. It is easy and powerful to use:-

As any beginner students can start to learn python language.

4. Scalability:-

Comparing with other languages python is considered as scalable and faster language.

Python libraries for data science task will include the following libraries:-

1. Python library for data cleaning and manipulation:-

PyOD
Spacy
Numpy

2. Python library for the data collection:-

Scrapy
Selenium
Beautiful soup

3. Python library for the data visualization:-

Seaborn
Matplotlib
Bokeh

4. Python library for modeling:-

Pytorch
TensorFlow
Scikit-learn

5. Python library for model Interpretability:-

H2O
Lime

6. Python library for Audio processing:-

Madmom
Librosa
pyAudioAnalysis

7. Python library for Image processing:-

Pillow
Scikit-Image
Open CV-python

8. Python library for deployment:-

Flask
PyBrain
Ploty
NLTK
Gensim
Scrapy
Statsmodels
Kivy
pyQt
openCV

9. Python library for database:-

SQLAlchemy
Psycopg

1. Python library for data cleaning and manipulation:-

The process after the collection of data is divines the data. Cleaning the messy data is done.
The four python libraries help us to do the process.

A) Pandas:-

The name pandas are derived from “panel data” and this will include the observation over multiple periods for the same individual.
The pandas feature is as follows:-

Data filtration is done.
Data set merging and joining is carried out.
The data size is reshaped.

The pandas are preinstalled in the libraries as follows,
pip install pandas
Example:-
data= {‘apples’: [3, 2, 0, 1],’oranges’: [0.3.7, 2]}
Purchase=pd.DataFrame (data) purchases
Output:-

	Apples	Oranges
0	3	0
1	2	3
2	0	7
3	1	2

B) PyOD:-

The PyOD library is used for the data compression and scalable process.
The python toolkit is used for detecting the outlying objects.
The detection and identification of the objects are carried out in the PyOD.
Installation,

pip install pyod

C) Spacy:-

The spacy is super and useful for natural language processing (NLP).
The spacy is a fast library than any other libraries in python programming.
They are also used for similar task in the language.
Features:-

It will allow the computation and spacy will provide the spate module to build and test the statistical mode.
It is used to apply on the complex, and contains multiple punctuations.
Spacy is fast and robust which supports many languages.

Installation is,
pip install –U spacy
python –m spacy download en

D) Numpy:-
They are similar to the python library and Numpy will bring the function which supports large, multidimensional arrays and matrices.
The high-level mathematical function us carried out and we work on these matrices and arrays.
This is the open-source library used and comes with preinstalled anaconda to work on.
pip install numpy
Example:-
import numpy as np
x=np.array ([1, 2, 3])
print(x)
y=np.arange (10)
print(y)
Output:-
[123]
[0 1 2 3 4 5 6 7 8 9]

2. Python library for the data collection:-

A) Scrapy:-
The useful library for web scraping and open-source framework.
It is used for extracting data required from the websites and is fast, simple to use.
Pip install scrapy.
It is used on a large scale for web scraping process also gives the tool which is needed,
The data extraction is carried out from websites and stores them in the structure of the format.
Example:-
Import scrapy
Class spider (scrappy.spider):
Name=’NAME’
Start_urls= [‘LINK’]
Def parse (self, response):
For title in response.css (‘.post-header>h2’):
Yield{‘title’:title.css (‘a: text’).get ()}
For next_page in response.css (‘a.next-posts-link’):
Yield response.follow (next_page, self.parse)

B) Selenium:-

The selenium is the tool used for automating the browsers and used for testing in It industries.
It is becoming popular nowadays after manual testing because it made testing easy.
The python script is automated using a web browser using the selenium tool.
It will give efficiency to extract the data and store in the proper format for future use.

C) Beautiful soup:-

The best way of collecting the data is by scraping the website and it will take effort and time.
So the Beautiful soup is XML and HTML parser used to create trees for pages which we use for extracting webpage’s.
The process of extracting data from web pages is called as web scraping.
Installation as,

pip install beautifulsoup4

Example:-
from bs4 import beautifulSoup
from urllib.request import urlopen
with urlopen (‘LINK’) as response:
soup=BeautifulSoup (response,’html.parser’)
for anchor in soup.find_all (‘a’):
print(anchor. get (‘href’,’/’))

3. Python library for data visualization:-

In the data visualization process hypothesis are checked and patterns are found.
There are three libraries for the data visualization as follows:-
A) Seaborn:-
It is the plotting library which is based on the python library that will provide high-level interface for drawing the graphs.
Features:-
The tool is used for choosing the color palettes and pattern in your data.
The data set is oriented from API and examines the relationship between multiple variables.
The views are convenient and the structure is a complex dataset.
Installation as,
pip install seaborn
Example:-
Import seaborn as sns
Sns.set()
Tips=sns.load_dataset (“tips”)
Sns.relplot(x=”total_bill”, y=”tip”, col=”time”, hue=”smoker”, style=”smoker”, size=”size”, data=tips);

python libraries for data science

B) Matplotlib:-

Most popularly used for the data visualization library in python programming.
The process of the library will allow generating and building the plots of all types.
The go-to library is used for exploring the data along with seaborn.
Installation as,
pip install matplotlib
Features:-
The matplotlib is easy to plot graphs and provide functions to choose the appropriate styles, font style and all.
It is created to help clear understanding of the trends and make the corrections.
The matplotlib contains the Pyplot module and provide interface the same as the MATLAB.
This is called the best feature of the matplotlib package.
Also provides the API module for the integration of graphs into applications using tools like the wxPython, Qt, etc.
Example:-
import matplotlib.pyplot as plt
from numpy.random import normal
x=normal (size=100)
plt.hist(x, bins=20)
plt.show ()

c.Bokeh:-

It is the visualization library that targets the modern browsers for the presentation. Also provide elegant construction for a large number of datasets. Installation as, pip install bokeh

4. Python libraries for modeling:-

A) Pytorch:-
The pytorch is based on the scientific package which is used as the replacement Numpy to use the power of GPUs.
The deep learning is used for research and provides speed and flexiblility.
Features:-

Cloud support which is supported on cloud platforms, and easy scaling through images.
The active community research and developers have built and libraries for extending.

B) TensorFlow:-
The tensorflow is popular in deep learning which helps to build and train models.
It will provide an easy model for building for machine learning and powerful and libraries.
Tensorflow is used for building and training the models using the high-level keras.
Feature:-

It is better for the computational graph visualization.
Also acts as computing and executing the complex models.
This will reduce error by 50 to 60 percentages in ML

C) Scikit-learn:-

The pandas are used for data manipulation and visualization and used for building the blocks.
The sckit-learn is used to build the scipy and matplotlib and open source, reusable in various contexts.
Scikit-learn will support the ML and regression, clustering, model selection.
It will support different operations and perform the ML.
Installation as,
pip install scikit-learn

5. Python libraries for model data interpretability:-

A) H2O:-
The H2O is driverless and the AI will offer simple data visualization and used for representing the behavior.
MLI is the Machine Learning Interpretability and clarify and effect of the model.\
B) LIME:-
The lime is algorithms that explain predictions and regresses.
Installation as,
pip install lime

6. Python libraries for Audio processing:-

The audio analysis and audio processing used to refer to extraction and meaning from the audio signals.
Also popular function in deep learning for an out for that.
A) Madmom:-
The name is funny but good audio data analysis python library.
Also called the audio signal processing library which is written in python that focuses on music information retrieval (MIR).
The installation of the Madmom needs the following lib:-

Numpy
Mido
Scipy
Cython

The packages needed for installation are:-

PyFftw
PyTest
PyAudio

Installation as,
pip install madmom

B) LibROSA:-
The python library we used for audio and music the analysis.
Also provides the blocks that create music information and the retrieval system.
C) PyAudio analysis:-
This is python library used for the extraction and segmentation of the python language.
It will cover the wide range of the audio analysis task as,

Detection of the events and unsupervised segmentation.
Classification of the unknown sounds.
Extraction of the audio thumbnails and much more.

The installation will include as,
pip install pyAudioAnalysis

7. Python library for image processing:-

The knowledge of learning to work with the image data is important. The image processing is growing faster with the collection of data. The image processing will contain the 3 libraries as,
A) Pillow:-
The pillow is also called as PIL (Python Image library).
The is Pillow derived from the PIL and is replaced by the original PIL in linux.
The process for manipulation is as follows:-

Adding the text to the images and other.
The transparency handling and masking techniques.
The image enhancement as brightness, contrast of the color.
The process of image filtering means smoothing or edge finding.

Installation as,
pip install Pillow
B) Scikit-image:-
The python library for image processing is the scikit image.
This library is the collection of algorithms used for performing the multiple and diverse image processing tasks.
The use of image segmentation, transformation, analysis, feature detection is done here.
The python packages used for image processing are as follows:-

Joblib(.=0.11)
Python(>=3.5)
Scipy(>=0.17)
Numpy(>=0.11)

Installation is as,
pip install –U scikit-learn

C) OpenCV-Python:-
In the image processing techniques, the OpenCV is the library in python language for image processing and combines the best quality of openCV API and python language.
This library is designed to solve the computer vision problems and uses the Numpy arrays.
The library will make the integration easy and use Numpy as Matplotlib and Scipy.
Installation is as,
pip3 install opencv-python

8. Python libraries for deployment:-

A) Flask:-
It is the web framework written in python which is popularly used for deploying of data science models.
It has the composition as follows:-

Jinja is the template engine for the python language.
Werkzeug is also called as utility library for python programming language.

Example:-
From flask import Flask
App=Flask(_name_)
@app.route (“/”)
Def hello:
Return”Hello World!”
If _name_==”_main_”:
App.run()
B) PyBrain:-
It is the powerful modular ML library that is available in python language.
The PyBrain means the Python Based Reinforcement Learning, Neutral Network, and ML.
The pyBrain is used for the entry-level data scientist who offers algorithms for the research.
The tool is used for the development across neutral networks in the kernel.
C) Plotly:-

The famous web-based framework for the data scientist is the plotly library.
The toolbox will be used for designing and visualization and supports the programming languages.
The use of ployly in the model is done by setting the API keys.
Then the graphics are processed on server side and execution will appear on the server side.
Features:-
The ploty API can create the public and private boards that are created using the text and web images.
The visualization us created by using the ploty and becomes easy to access the platforms like MATLAB, Julia, etc.

D) NTLK:-

It is called as Natural Language Toolkit library which is helpful for Natural language processing tasks.
Using the NLTK we can perform many operations as the steaming, text tagging, and semantic reasoning and AI tasks.
The large work will need the analysis and automation that will make the task easy with NLTK.
Features:-
It will provide data and the text processing methods for the tokenization, stemming and semantic reasoning for text analysis.
The compression guide is present that will describe the computational and complete API documentation.
E) Gensim:-
The Gensim is a python based library that is open source and allow the modeling and apace vector computation.
It is very compatible with texts and in memory processing tasks.
Gensium use the Scipy and Numpy modules for provide efficient way to handle the enviourement.
F) Scrapy:-
Meaning of the scrappy is spider bots and this library is important for the programs and retrieving the data from web applications.
It is open source library in python programming and designed for the scraping purpose.
The framework with collection of data through API.
Also created across the Spider class and contains the instruction for a crawler.

G) Statsmodels:-

Statsmodel library is used for providing the data exploration modules and multiple methods to perform the statistical analysis.
It consists of the plotting function used for the analysis and achieves high performance of the datasets.
Features:-
It is the best library for the statistical tests and the tests which are in Scipy and Numpy.
This will provide formulas for better statistical analysis.
Used to implement GLM means Generalized Linear Model and OLM as ordinary least-square linear regression.
The statistical testing will include hypothesis testing done by the statsModel library.
H) Kivy:-
It is open source library that will provide the natural interface which can be accessed over the ios, Linux, and windows.
The library is used for building the mobile apps and multitouch applications.
The kivy is used for creation of the custom gadgets.
I) PyQt:-
The PyQt is also called as the toolkit for GUI platform and implemented as python plug-in.
It is the free application under the GNU public license.
J) OpenCV:-
This library is designed for deriving the growth of real-time application development.
Basically, it is used for creating the Intel and is free for anyone.
The openCV includes the 2D and 3D features for applications like motion tracking, mobile robotics, SFM, AR, etc.
It is written in C++ and provides binding in python and octave.

Pandas
Numpy
Scipy
TensorFlow
Matplotlib

8. Python libraries for Database:-

The database is used to store and retrieve the data also have skill for any data scientist.
Following are the two libraries for the database is given below.
A) Psycopg:-
This library is popular for the PostgreSQL means the advanced open-source relational database for the python programming language.
Psycopg has specifications and supports,

Python 3 version from 3.4 to 3.7
Python 2.7
PostgresSQL client library from 9.1
PostgresSQL server library from 7.4 to 11

Installation as,
pip install psycopg2
B) SQLAlchemy:-
It is also called as the database language.
SQL is the toolkit from which the object is given to the application developer with full power and flexibility.
This library is designed for high-performance data access and collection of tables.
Installation is,
pip install SQLAlchemy