Python libraries for data science:
In this tutorial, we explain the Python libraries used for data science and their applications. Python is a flexible, open-source programming language and one of the most widely used tools for data science work such as data manipulation. It is easy to learn, which makes it a good starting point for beginner data analysts. Data is collected and analyzed to reach logical solutions in sectors such as health care and finance. There are many tools for data analytics, such as R, SAS, SQL, and many more. Python is a popular and easy way to get started with data analytics. In the Stack Overflow Developer Survey 2018, Python ranked among the most popular languages, and it is widely regarded as a suitable language for data science applications.
Python is easy to use and comes with strong support for quantitative and analytical computing.
Python is widely used in various fields, such as signal processing, oil and gas, etc.
The features of Python over other data science tools:-
1. Visualization and graphics:-
Python offers many visualization and graphics options, including ggplot, pandas plotting, etc.
2. Choice of libraries:-
Python provides a large collection of libraries, including libraries for AI and ML.
3. It is easy and powerful to use:-
Any beginner can start learning the Python language.
4. Scalability:-
Compared with other data science tools, Python is generally considered a scalable and fast language.
1. Python library for data cleaning and manipulation:-
2. Python library for the data collection:-
3. Python library for the data visualization:-
4. Python library for modeling:-
5. Python library for model Interpretability:-
6. Python library for Audio processing:-
7. Python library for Image processing:-
8. Python library for deployment:-
9. Python library for database:-
After the data is collected, the next step is to clean it. Messy data is cleaned and manipulated before any analysis is done.
The following four Python libraries help us with this process.
A) Pandas:-
The name pandas is derived from “panel data”, a term for data sets that include observations over multiple time periods for the same individuals.
The features of pandas are as follows:-
pandas can be installed as follows,
pip install pandas
Example:-
import pandas as pd
data = {'apples': [3, 2, 0, 1], 'oranges': [0, 3, 7, 2]}
purchases = pd.DataFrame(data)
purchases
Output:-
   apples  oranges
0       3        0
1       2        3
2       0        7
3       1        2
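Since this section is about cleaning messy data, a minimal sketch of a typical cleaning step may help. The `customer` column, the missing value, and the duplicated row are assumptions made up for illustration; only the apples and oranges figures come from the example above.

```python
import numpy as np
import pandas as pd

# Hypothetical messy purchase records: one missing value and one duplicated row.
raw = pd.DataFrame({
    "customer": ["June", "Robert", "Robert", "Lily", "David"],
    "apples": [3, 2, 2, np.nan, 1],
    "oranges": [0, 3, 3, 7, 2],
})

cleaned = (
    raw.drop_duplicates()            # remove the repeated "Robert" row
       .fillna({"apples": 0})        # treat a missing apple count as zero
       .astype({"apples": "int64"})  # restore an integer dtype after filling
       .reset_index(drop=True)       # renumber the rows 0..n-1
)

print(cleaned)
```

Whether a missing value should become zero, be dropped, or be estimated depends on the data set; filling with zero is only one possible choice.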
B) PyOD:-
PyOD is a comprehensive and scalable Python toolkit for detecting outlying objects in data.
Outlier detection means identifying rare items or observations that differ significantly from the majority of the data.
Installation,
pip install pyod
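To illustrate what detecting an outlying object means, here is a minimal sketch that uses plain NumPy and the modified z-score rule (median and median absolute deviation). This is not the PyOD API. It is a swapped-in, simpler technique, and the readings, the function name `modified_z_scores`, and the 3.5 threshold are assumptions chosen for illustration.

```python
import numpy as np

def modified_z_scores(values):
    """Return the modified z-score of each value, based on the median and MAD."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))  # median absolute deviation
    # Note: this sketch assumes mad is not zero.
    return 0.6745 * (values - median) / mad

# Hypothetical sensor readings with one obviously unusual value.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 25.0, 10.2]
scores = modified_z_scores(readings)

# Flag readings whose score exceeds a commonly used threshold of 3.5.
outliers = [r for r, s in zip(readings, scores) if abs(s) > 3.5]
print(outliers)
```

A toolkit such as PyOD offers many detectors behind one interface, which matters once the data has several columns and a simple per-column rule like this one is no longer enough.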
C) spaCy:-
spaCy is a very useful library for natural language processing (NLP).
spaCy is fast compared with other Python libraries that are used for similar tasks.
Features:-
Installation is,
pip install -U spacy
python -m spacy download en_core_web_sm
D) NumPy:-
Like pandas, NumPy is a very popular Python library. NumPy brings in functions that support large, multidimensional arrays and matrices.
It also provides high-level mathematical functions to work on these arrays and matrices.
NumPy is an open-source library and comes preinstalled with Anaconda.
pip install numpy
Example:-
import numpy as np
x = np.array([1, 2, 3])
print(x)
y = np.arange(10)
print(y)
Output:-
[1 2 3]
[0 1 2 3 4 5 6 7 8 9]
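The example above uses one-dimensional arrays only. As a small sketch of the multidimensional arrays and mathematical functions mentioned earlier, the following builds two matrices and applies a few common operations; the values are arbitrary.

```python
import numpy as np

# A 2 x 3 matrix and a 3 x 2 matrix.
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.arange(6).reshape(3, 2)   # [[0, 1], [2, 3], [4, 5]]

product = a @ b                  # matrix multiplication, result has shape (2, 2)
column_means = a.mean(axis=0)    # mean of each column of a
roots = np.sqrt(a)               # element-wise square root

print(product)
print(column_means)
```

Here `product` is `[[16 22] [34 49]]` and `column_means` is `[2.5 3.5 4.5]`.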
A) Scrapy:-
Scrapy is a useful open-source framework for web scraping.
It is used for extracting the required data from websites, and it is fast and simple to use.
Installation as,
pip install scrapy
It is suited to large-scale web scraping and gives you the tools that are needed.
It extracts data from websites and stores it in a structured format.
Example:-
import scrapy

class BlogSpider(scrapy.Spider):
    name = 'NAME'
    start_urls = ['LINK']

    def parse(self, response):
        # Yield the title text of every post on the page.
        for title in response.css('.post-header>h2'):
            yield {'title': title.css('a ::text').get()}

        # Follow the link to the next page and parse it the same way.
        for next_page in response.css('a.next-posts-link'):
            yield response.follow(next_page, self.parse)
B) Selenium:-
Selenium is a tool for automating browsers and is widely used for testing in the IT industry.
It has become popular because it makes testing easier than manual testing.
A Python script can automate a web browser using Selenium.
It lets us efficiently extract the data and store it in the proper format for future use.
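C) Beautiful Soup:-
Beautiful Soup is a Python library for pulling data out of HTML and XML files.
Installation as,
pip install beautifulsoup4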
Example:-
from bs4 import BeautifulSoup
from urllib.request import urlopen

with urlopen('LINK') as response:
    soup = BeautifulSoup(response, 'html.parser')
    # Print the target of every link on the page.
    for anchor in soup.find_all('a'):
        print(anchor.get('href', '/'))
In the data visualization step, hypotheses are checked and patterns in the data are found.
There are three libraries for data visualization, as follows:-
A) Seaborn:-
Seaborn is a plotting library based on matplotlib that provides a high-level interface for drawing statistical graphics.
Features:-
A dataset-oriented API for examining relationships between multiple variables.
Tools for choosing color palettes that reveal patterns in your data.
Convenient views onto the overall structure of complex datasets.
Installation as,
pip install seaborn
Example:-
import seaborn as sns
sns.set()
tips = sns.load_dataset("tips")
sns.relplot(x="total_bill", y="tip", col="time", hue="smoker", style="smoker", size="size", data=tips)
B) Matplotlib:-
Matplotlib is the most popular data visualization library in Python.
C) Bokeh:-
Bokeh is an interactive visualization library that targets modern web browsers for presentation.
It provides elegant construction of graphics for a large number of datasets.
Installation as,
pip install bokeh
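For Matplotlib, which is described above without an example, a minimal sketch of a line chart might look like the following. The sales figures and labels are made up for illustration, and the `Agg` backend is selected only so that the script runs without opening a window.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical monthly sales figures.
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 128, 150]

fig, ax = plt.subplots()
ax.plot(months, sales, marker="o", label="sales")
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")
ax.set_title("Monthly sales")
ax.legend()

# Save the chart to an in-memory PNG; a file path would work as well.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

In an interactive session, `plt.show()` would display the figure instead of saving it.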
A) PyTorch:-
PyTorch is a Python-based scientific computing package that can be used as a replacement for NumPy to use the power of GPUs.
It is also a deep learning research platform that provides speed and flexibility.
Features:-
B) TensorFlow:-
TensorFlow is a popular deep learning library that helps you build and train models.
It provides easy model building for machine learning along with powerful supporting libraries.
TensorFlow models can be built and trained using the high-level Keras API.
Feature:-
C) Scikit-learn:-
Just as pandas is used for data manipulation and analysis, scikit-learn is used for building machine learning models.
Scikit-learn is built on NumPy, SciPy, and matplotlib, and it is open source and reusable in various contexts.
Scikit-learn supports ML tasks such as regression, clustering, and model selection.
It supports many different operations for performing ML.
Installation as,
pip install scikit-learn
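As a rough illustration of the regression and model selection support mentioned above, the following sketch fits a linear regression on synthetic data and scores it on a held-out split. The data (y = 2x + 1 plus small noise) and the random seeds are assumptions chosen only to make the example reproducible.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y = 2x + 1 plus small noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2 * X[:, 0] + 1 + rng.normal(0, 0.1, size=100)

# Hold out 20% of the rows to evaluate the fitted model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)  # coefficient of determination on held-out data

print(model.coef_, model.intercept_, r2)
```

The fitted slope and intercept should come out close to 2 and 1. Other scikit-learn estimators follow the same `fit`, `predict`, and `score` pattern.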
A) H2O:-
H2O Driverless AI offers simple data visualization techniques for representing model behavior.
Its Machine Learning Interpretability (MLI) module helps clarify and explain the results of a model.
B) LIME:-
LIME is an algorithm that explains the predictions of classifiers and regressors.
Installation as,
pip install lime
Audio analysis, or audio processing, refers to the extraction of information and meaning from audio signals.
It is also becoming a popular application of deep learning, so keep an ear out for it.
A) Madmom:-
The name is funny, but Madmom is a good Python library for audio data analysis.
It is an audio signal processing library written in Python that focuses on music information retrieval (MIR).
Installing Madmom requires the following libraries:-
The packages needed for installation are:-
Installation as,
pip install madmom
B) LibROSA:-
LibROSA is a Python library for audio and music analysis.
It provides the building blocks needed to create music information retrieval systems.
C) pyAudioAnalysis:-
pyAudioAnalysis is a Python library for audio feature extraction and segmentation.
It covers a wide range of audio analysis tasks.
Installation as,
pip install pyAudioAnalysis
Knowing how to work with image data is important.
Image processing is growing fast as more and more image data is collected.
The following three libraries are used for image processing,
A) Pillow:-
Pillow is the maintained fork of PIL (Python Imaging Library).
It has replaced the original PIL in several Linux distributions.
The image manipulation steps are as follows:-
Installation as,
pip install Pillow
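A minimal sketch of some common image manipulations with Pillow is shown here. The image is created in memory, so the size and color are arbitrary assumptions; with a real file, `Image.open` with a path would replace `Image.new`.

```python
from PIL import Image

# Create a small red RGB image in memory so no file on disk is required.
image = Image.new("RGB", (200, 100), color=(255, 0, 0))

resized = image.resize((100, 50))        # scale to half the width and height
rotated = image.rotate(90, expand=True)  # rotate; expand so nothing is cropped
gray = image.convert("L")                # convert to 8-bit grayscale

print(resized.size, rotated.size, gray.mode)
```

Each call returns a new image and leaves `image` unchanged, so the steps can be chained or applied independently.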
B) Scikit-image:-
Scikit-image is a Python library for image processing.
It is a collection of algorithms for performing multiple, diverse image processing tasks.
It can be used for image segmentation, transformation, analysis, and feature detection.
The Python packages used for image processing are as follows:-
Installation is as,
pip install -U scikit-image
C) OpenCV-Python:-
OpenCV-Python is the Python API for OpenCV, combining the best qualities of the OpenCV API and the Python language.
It is designed to solve computer vision problems and makes use of NumPy arrays.
This makes it easy to integrate with other libraries that use NumPy, such as SciPy and Matplotlib.
Installation is as,
pip3 install opencv-python
A) Flask:-
Example:-
from flask import Flask
app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World!"

if __name__ == "__main__":
    app.run()
B) PyBrain:-
It is a powerful modular ML library available in Python.
PyBrain stands for Python-Based Reinforcement Learning, Artificial Intelligence, and Neural Network Library.
PyBrain offers algorithms for research and is suitable for entry-level data scientists.
The tool is developed around neural networks at the kernel.
C) Plotly:-
Plotly is a popular web-based framework for data scientists.
The toolbox is used for designing visualizations and supports several programming languages.
To use Plotly in a model, the API keys need to be set up.
The graphics are then processed on the server side before the result is displayed.
Features:-
The Plotly API can create public and private dashboards that combine charts, text, and web images.
Visualizations created with Plotly are easy to access from platforms such as MATLAB, Julia, etc.
D) NLTK:-
NLTK is the Natural Language Toolkit, a library that is helpful for natural language processing tasks.
Using NLTK, we can perform many operations, such as stemming, text tagging, semantic reasoning, and other AI tasks.
Large bodies of text need analysis and automation, and NLTK makes this task easier.
Features:-
It provides data sets and text processing methods for tokenization, stemming, and semantic reasoning for text analysis.
It comes with a comprehensive guide to computational linguistics and complete API documentation.
E) Gensim:-
Gensim is an open-source, Python-based library for topic modeling and vector space computation.
It is well suited to text data and in-memory processing tasks.
Gensim uses the SciPy and NumPy modules to provide an efficient and easy way to handle the environment.
F) Scrapy:-
Scrapy is used to build spider bots, and this library is important for programs that retrieve data from web applications.
It is an open-source Python library designed for scraping.
The framework can also collect data through APIs.
It is built around a Spider class, which contains the instructions for a crawler.
G) Statsmodels:-
The Statsmodels library provides data exploration modules and multiple methods for performing statistical analysis.
It includes plotting functions for analysis and achieves high performance on large datasets.
Features:-
It is well suited to statistical tests and complements the statistical functionality of SciPy and NumPy.
It supports R-style formulas for better statistical analysis.
It is used to implement generalized linear models (GLM) and ordinary least-squares (OLS) linear regression.
Statistical testing, including hypothesis testing, is done with the Statsmodels library.
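To show what ordinary least-squares regression computes, here is a minimal sketch in plain NumPy. It is not the Statsmodels API; it only illustrates the underlying fit, and the data points are hypothetical values that lie exactly on the line y = 3x + 2.

```python
import numpy as np

# Hypothetical observations that lie exactly on y = 3x + 2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 * x + 2.0

# Design matrix with a column of ones for the intercept term.
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution: the beta that minimizes the squared error of X @ beta - y.
beta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta

print(intercept, slope)
```

A statistics library adds what this sketch leaves out, such as standard errors, confidence intervals, and hypothesis tests on the coefficients.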
H) Kivy:-
It is an open-source library that provides a natural user interface and runs on iOS, Linux, and Windows.
The library is used for building mobile apps and multitouch applications.
Kivy can also be used to create custom widgets.
I) PyQt:-
PyQt is a set of Python bindings for the Qt GUI toolkit, implemented as a Python plug-in.
It is free software available under the GNU General Public License.
J) OpenCV:-
This library is designed to drive the growth of real-time application development.
It was originally created by Intel and is free for anyone to use.
OpenCV includes 2D and 3D features for applications like motion tracking, mobile robotics, SFM, AR, etc.
It is written in C++ and provides bindings for Python and Octave.
Knowing how to store and retrieve data from a database is an essential skill for any data scientist.
The following two libraries are used for working with databases.
A) Psycopg:-
Psycopg is a popular PostgreSQL adapter for the Python programming language. PostgreSQL is an advanced open-source relational database.
Psycopg implements the Python DB API 2.0 specification.
Installation as,
pip install psycopg2
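Psycopg follows the standard Python DB API 2.0 pattern of connect, cursor, execute, and fetch. The sketch below shows that pattern with the built-in `sqlite3` module, used here only because it needs no database server. The `purchases` table and its rows are hypothetical. With Psycopg, the connection would come from `psycopg2.connect(...)` and the placeholders would be written as `%s` instead of `?`.

```python
import sqlite3

# sqlite3 stands in for a PostgreSQL connection so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a hypothetical table and insert a few rows with parameterized queries.
cur.execute("CREATE TABLE purchases (customer TEXT, apples INTEGER)")
cur.executemany(
    "INSERT INTO purchases (customer, apples) VALUES (?, ?)",
    [("June", 3), ("Robert", 2), ("Lily", 0)],
)
conn.commit()

# Retrieve the customers who bought more than one apple.
cur.execute(
    "SELECT customer, apples FROM purchases WHERE apples > ? ORDER BY apples",
    (1,),
)
rows = cur.fetchall()
print(rows)

conn.close()
```

Passing values as parameters, rather than formatting them into the SQL string, lets the driver handle quoting and helps avoid SQL injection.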
B) SQLAlchemy:-
SQL is the most widely used database language.
SQLAlchemy is a Python SQL toolkit and object-relational mapper that gives application developers the full power and flexibility of SQL.
It is designed for efficient, high-performance database access.
Installation is,
pip install SQLAlchemy