Pandas is an open-source library built for the Python programming language. It offers high-performance, easy-to-use data structures and data analysis tools.

Data analysis refers to the process of evaluating large data sets using analytical and statistical tools in order to discover useful information and draw conclusions that support decision-making.
Data structure: A data structure is a specialized way of storing data so that specific kinds of operations can be applied to it. The two basic and most widely used data structures in Pandas are Series and DataFrame.

For working with Pandas, we generally import both the pandas and numpy libraries.

 import pandas as pd
 import numpy as np

Before importing pandas, numpy, or any other third-party module in Python, you need to install it on your computer from the Internet. Both libraries are free of cost. The command to install pandas is: pip install pandas
To install numpy on your computer, give the following command in the command prompt: pip install numpy

Why Pandas -

Pandas is the most popular library in the Python ecosystem for data analysis.
Pandas is capable of many tasks, including:
It can read and write data of many different types (integer, float, string, etc.).
It can perform calculations along rows or columns, whichever way the data is organized.
It can easily select subsets of data from bulky data sets and even combine multiple data sets.
It has functionality to find and fill missing data.
It allows us to apply operations to independent groups within data.
It supports advanced time-series functionality (Time series forecasting is the use of a model to predict future values based on previously observed values.)

In other words, Pandas is best at handling huge tabular data sets comprising different data formats.

Both Series and DataFrame can store heterogeneous values/data.
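As a small illustration of heterogeneous storage, the sketch below puts values of different Python types into one Series. The name mixed and the values are made up for this example; pandas falls back to the generic object data type in such a case.

```python
import pandas as pd

# Hypothetical values of four different Python types in one Series.
mixed = pd.Series([10, 2.5, "Delhi", True])

# With mixed types, pandas stores the values using the generic object dtype.
print(mixed.dtype)
```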

Creating Series Object-

A Series object can be created using the pandas library's Series() function.

1. Creating an empty Series object by using just Series() with no parameters-
To create an empty object (i.e., one having no values)-
    obj3 = pd.Series()

This creates an empty Series object (obj3) with no values. Its default data type depends on the pandas version: float64 in older releases (which warn that the default will change) and object from pandas 2.0 onwards.
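To avoid relying on the default data type of an empty Series, a dtype can be passed explicitly. The following is a minimal sketch; the variable name empty is chosen only for this example.

```python
import pandas as pd

# Passing dtype explicitly keeps the result the same across pandas versions.
empty = pd.Series(dtype="float64")

print(len(empty))    # number of values stored
print(empty.dtype)   # the data type we asked for
```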

2. Creating non-empty Series Object-
To create a non-empty Series object, we need to specify an argument for the data and, optionally, one for the index:
Syntax: <Series object> = pd.Series(data, index)
Here data is the data part of the Series. It can be any of the following: a Python sequence, a NumPy array, a Python dictionary, or a scalar value.
    obj2 = pd.Series([3.5, 5.0, 4.5, 8.])
    obj1 = pd.Series(range(5))
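The other kinds of data can be passed in the same way. The sketch below, with made-up names and values, builds a Series from a list, from a NumPy array, and from a dictionary; in the dictionary case the keys become the index.

```python
import numpy as np
import pandas as pd

# From a Python sequence: a default index 0, 1, 2, ... is generated.
from_list = pd.Series([3.5, 5.0, 4.5, 8.0])

# From a NumPy array: again with a default integer index.
from_array = pd.Series(np.array([1, 2, 3]))

# From a dictionary: keys become the index, values become the data.
from_dict = pd.Series({"Jan": 31, "Feb": 28, "Mar": 31})

print(from_dict)
```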


Specifying data as a scalar value-

The data can be a single (scalar) value. If the data is a scalar value, an index should be provided so that pandas knows how many entries to create; without one, the result is a single-element Series. There can be one or more entries in the index sequence. The scalar value (given as data) is repeated to match the length of the index.
The index has to be a sequence of numbers or labels of any type.
>>> medalsWon = pd.Series(10, index=range(0, 1))
>>> medals2 = pd.Series(15, index=range(1, 6, 2))
>>> ser3 = pd.Series("Hello India", index=['Indore', 'Delhi', 'Shimla'])
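The repetition of the scalar can be checked with a short sketch. It reuses the medals2 example; range(1, 6, 2) produces the index labels 1, 3 and 5, so the value 15 appears three times.

```python
import pandas as pd

# The scalar 15 is repeated once for every label in the index.
medals2 = pd.Series(15, index=range(1, 6, 2))

print(medals2)
```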

Additional Functionality

Specifying/Adding NaN values in a Series object-

We can use np.nan (NaN stands for Not a Number) to specify a missing or empty value. The legal empty value NaN is defined in the NumPy module, so you can use np.nan to specify a missing value. The alias np.NaN found in older code was removed in NumPy 2.0.
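A minimal sketch of a Series with a missing value is given below. The name readings and the numbers are illustrative only; the lowercase spelling np.nan is used because it works across NumPy versions.

```python
import numpy as np
import pandas as pd

# The second entry is deliberately left missing.
readings = pd.Series([6.5, np.nan, 2.34])

# isna() marks which entries are missing.
print(readings.isna())
```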

Specifying index(es) as well as data with Series()-

While creating a Series object, we can also provide indexes along with the values. Both the values and the indexes are sequences, and they must be of the same length.

We can also skip the keyword data and pass the values as the first positional argument; the output is the same.
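The two ways of passing the values can be compared directly. In this sketch the lists days and months are hypothetical sample data.

```python
import pandas as pd

days = [31, 28, 31, 30]
months = ["Jan", "Feb", "Mar", "Apr"]

# With the keyword data ...
with_keyword = pd.Series(data=days, index=months)

# ... and without it: the first positional argument is taken as the data.
positional = pd.Series(days, index=months)

print(with_keyword.equals(positional))
```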


Using a mathematical function/expression to create the data array in Series()-

Series() allows us to pass a function call or expression that computes the data sequence, for example an arithmetic expression on a NumPy array.
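One possible illustration, assuming a NumPy array named base (a name made up here): the data is computed by an expression on the array, and the array itself serves as the index.

```python
import numpy as np
import pandas as pd

base = np.arange(9, 13)          # array holding 9, 10, 11, 12

# The data is calculated by the expression base ** 2.
squares = pd.Series(data=base ** 2, index=base)

print(squares)
```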

Accessing a Series Object and its Elements-

To access an individual element of a Series object, we write the Series object's name followed by the index of the required element in square brackets.
<Series object name>[<valid index>]
Example: obj7[2]
obj3["Jan"]
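The sketch below shows element access on two small Series. The objects marks and days, and their contents, are assumptions made for this example.

```python
import pandas as pd

marks = pd.Series([10, 20, 30, 40])                        # default index 0..3
days = pd.Series([31, 28, 31], index=["Jan", "Feb", "Mar"])

third_mark = marks[2]      # element whose index label is 2
jan_days = days["Jan"]     # element whose index label is "Jan"

print(third_mark, jan_days)
```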

We can even change the indexes of a Series object by assigning a new index array to its index attribute. The new array must have the same length as the Series.

Syntax: <object name>.index = <new index array>

E.g., ob1.index = ['a', 'b', 'c', 'd', 'e']
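A short sketch of re-labelling follows. It assumes ob1 holds five values, which the original does not state.

```python
import pandas as pd

ob1 = pd.Series(range(5))                 # assumed contents: 0, 1, 2, 3, 4
ob1.index = ['a', 'b', 'c', 'd', 'e']     # same length as the Series

# The values are unchanged; only their labels are new.
print(ob1['c'])
```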

A Series object's values can be modified, but its size cannot. So we can say that Series objects are value-mutable but size-immutable objects.
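Value mutability can be seen by assigning to an existing index. The Series scores below is a made-up example.

```python
import pandas as pd

scores = pd.Series([12, 23, 45], index=["a", "b", "c"])

# Overwrite the value stored at an existing label.
scores["b"] = 99

print(scores)
```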

The head( ) and tail( ) functions-

The head() function fetches the first n rows of a pandas object, and the tail() function returns the last n rows.
Syntax: <pandas object>.head([n])
<pandas object>.tail([n])

E.g.,
ob1.head(10)

If we do not provide a value for n, head() and tail() return the first 5 and last 5 rows of the pandas object, respectively.
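Both functions can be tried on a small Series. The contents of ob1 here (twelve consecutive integers) are assumed for illustration.

```python
import pandas as pd

ob1 = pd.Series(range(100, 112))   # twelve values: 100 .. 111

first_five = ob1.head()     # n omitted, so 5 rows are returned
last_three = ob1.tail(3)    # n given explicitly

print(first_five)
print(last_three)
```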

When you perform arithmetic operations on two Series objects, the data is first aligned on the basis of matching indexes (this is called data alignment in pandas objects) and then the arithmetic is performed. For non-overlapping indexes, the result of the operation is NaN (Not a Number).

NaN represents missing data.
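Data alignment can be sketched with two Series whose indexes overlap only partly. The names a, b and total and the values are hypothetical.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Only "y" and "z" exist in both; "x" and "w" have no partner and give NaN.
total = a + b

print(total)
```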

Filtering Entries-

We can filter entries of a Series object using expressions of Boolean type.

Syntax: <Series object>[<Boolean expression on Series object>]

e.g., ob1 > 34

When we apply a comparison operator directly to a pandas Series object, it works as a vectorized operation: the check is applied to each individual element, and the result is a Series of True/False values.
BUT when we place this check inside [ ] after the Series object, it returns a filtered result containing only the values for which the check is True.
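The difference between the two forms is shown in the sketch below, using assumed values for ob1.

```python
import pandas as pd

ob1 = pd.Series([12, 40, 34, 77, 5])

# Comparison alone: one True/False result per element.
mask = ob1 > 34

# Comparison inside [ ]: only the elements where the check is True.
filtered = ob1[ob1 > 34]

print(mask)
print(filtered)
```

Note that the filtered result keeps the original index labels of the selected elements.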

Difference between NumPy Arrays and Series Objects-

1. In the case of ndarrays, we can perform vectorized operations only if the shapes of the two ndarrays match (or can be broadcast to a common shape); otherwise an error is raised. In Series, by contrast, the operation succeeds and NaN is returned for non-matching indexes.

2. In ndarrays, the indexes are always numeric, starting from 0, BUT Series objects can have any type of index, including numbers (not necessarily starting from 0), letters, labels, strings, etc.
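The first difference can be sketched by adding two objects of unequal length. All names and values here are illustrative.

```python
import numpy as np
import pandas as pd

arr1 = np.array([1, 2, 3])
arr2 = np.array([10, 20])

# ndarrays of shapes (3,) and (2,) cannot be combined: NumPy raises ValueError.
try:
    arr1 + arr2
    ndarray_error = None
except ValueError as exc:
    ndarray_error = exc

ser1 = pd.Series([1, 2, 3])
ser2 = pd.Series([10, 20])

# Series align on index labels 0 and 1; label 2 has no match and becomes NaN.
ser_sum = ser1 + ser2

print(ndarray_error)
print(ser_sum)
```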

Aggregate functions:
>>> obj21= pd.Series ([12,23,45,10,120])
>>> obj21
0 12
1 23
2 45
3 10
4 120

>>> sum(obj21)
>>> min(obj21)
>>> max(obj21)
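Besides Python's built-in sum(), min() and max(), a Series also has methods of its own for the same aggregates. A minimal sketch using the same obj21 data:

```python
import pandas as pd

obj21 = pd.Series([12, 23, 45, 10, 120])

total = obj21.sum()       # 210
smallest = obj21.min()    # 10
largest = obj21.max()     # 120
average = obj21.mean()    # 42.0

print(total, smallest, largest, average)
```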

Problem 1: The Series object population stores the population of four metro cities, and the Series object AvgIncome stores the total average income reported in the previous year in each of these cities. Calculate the income per capita for each of these cities.
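One possible approach is to divide the two Series, since the division is carried out label by label. The sketch below uses placeholder city names and placeholder figures, not real statistics, because the problem does not supply any data.

```python
import pandas as pd

# Placeholder labels and numbers, invented only to make the sketch runnable.
cities = ["CityA", "CityB", "CityC", "CityD"]
population = pd.Series([1000, 2000, 4000, 5000], index=cities)
avg_income = pd.Series([50000, 80000, 100000, 250000], index=cities)

# Element-wise division, aligned on the city labels.
per_capita = avg_income / population

print(per_capita)
```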

Problem 2: Given a Series that stores the area of some states in km². Write code to find the three biggest and three smallest areas in the given Series. The Series has been created like this:

Ser1 = pd.Series([34567, 890, 450, 67892, 34677, 78902, 256711, 678291, 637632, 25723, 2367, 11769, 345, 25671])
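One way to approach this problem, offered as a sketch rather than the only solution, is to sort the values and then take rows from each end with head() and tail(). The names ordered, smallest_three and biggest_three are chosen for this example.

```python
import pandas as pd

Ser1 = pd.Series([34567, 890, 450, 67892, 34677, 78902, 256711,
                  678291, 637632, 25723, 2367, 11769, 345, 25671])

# Sort in ascending order, then pick three rows from each end.
ordered = Ser1.sort_values()
smallest_three = ordered.head(3)
biggest_three = ordered.tail(3)

print(smallest_three)
print(biggest_three)
```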
