
57. [Hindi]Machine Learning : .between() Method in Pandas| 2018 |Python 3


.between() Method in Pandas






All right, in this lesson I'll introduce the .between() method, which, like many of the methods so far, is called directly on a Series and generates a brand new Boolean Series. As you might guess, the between method is helpful when we want to find values that fall within a specific range, whether they're numbers, times, or dates. So let's take a look at how this works on several different columns within our DataFrame. Let's execute our code to import our employees dataset, and let's say I want to pull out all of the employees who have a salary between 60000 and 70000, inclusive.

Of course, we could do this by writing two separate conditions. We could write one Boolean Series where the salary is greater than or equal to 60000 and another Boolean Series where it is less than or equal to 70000, then use the ampersand (&), the AND symbol, within our square brackets to combine both conditions. But the between method makes it a lot simpler. So once again, let's begin by extracting a Series like Salary. As I mentioned, I want to pull out the employees who have a salary between 60000 and 70000. Now I can call the between method directly on my Series. It is a method, so it does take parentheses, and this is going to be slightly different from our previous methods because it takes two arguments. The first is the lower bound, and the second is the upper bound.
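The equivalence between the two-condition approach and the between method can be sketched as follows. The small DataFrame is made-up sample data standing in for employees.csv, and the variable names two_conditions and with_between are only illustrative.

```python
import pandas as pd

# Hypothetical sample data; the lesson's employees.csv is not reproduced here.
df = pd.DataFrame({
    "First Name": ["Ana", "Ben", "Cy", "Dee", "Eli"],
    "Salary": [55000, 60000, 64500, 70000, 98000],
})

# Two separate Boolean Series combined with & (element-wise AND)
two_conditions = (df["Salary"] >= 60000) & (df["Salary"] <= 70000)

# The same filter expressed with a single .between() call
with_between = df["Salary"].between(60000, 70000)

# Both approaches should produce identical Boolean Series
print(two_conditions.equals(with_between))
print(df[with_between])
```

The parentheses around each comparison in the two-condition version are required, because & binds more tightly than >= and <= in Python.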

So I want the value to be between 60000, which is my first argument, comma, 70000, which is my second argument. Both of these values are inclusive. What that means is that if there is a salary that's exactly 60000 or exactly 70000, it will be included. So between 60000 and 70000 inclusive is basically the equivalent of greater than or equal to combined with less than or equal to. When I execute this, I get a brand new Boolean Series of Trues and Falses. Let's save time by not assigning this to a variable; let's put it directly within our square brackets after our DataFrame. And now we've extracted the rows from the DataFrame whose salaries fall between 60000 and 70000. As I scroll down, you can see that in fact all of these numbers fall within that range. Pretty simple. Let's collapse this and repeat the process with our Bonus % column, which consists of floating points.
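A minimal sketch of what "inclusive" means at the boundaries, using a hypothetical Series of salaries rather than the lesson's data. The inclusive="neither" option is not used in the lesson; it is shown only for contrast and assumes pandas 1.3 or later, where the parameter accepts strings.

```python
import pandas as pd

# Hypothetical salaries sitting just outside, exactly on, and inside the bounds
salary = pd.Series([59999, 60000, 65000, 70000, 70001], name="Salary")

# Default behaviour: both endpoints count as "between"
both = salary.between(60000, 70000)
print(both.tolist())     # [False, True, True, True, False]

# Assumes pandas >= 1.3: exclude both endpoints instead
neither = salary.between(60000, 70000, inclusive="neither")
print(neither.tolist())  # [False, False, True, False, False]
```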

Let's say I want to pull out the rows for those employees that have a bonus between 2 and 5 percent. Once again, I begin by extracting my Series, and there I have my Series of floating point values representing the bonus percentages. Once again, I'm going to call the between method, open my parentheses, and provide a lower bound, in this case 2.0, and then an upper bound, which is where the range ends: 5.0. When I execute this, I get a Boolean Series: True if the value falls within that range, False if it does not. And once again, if I pass this within square brackets after my DataFrame, which is stored in df, there I have all of the rows from my DataFrame where the bonus falls within that three-point range.

As I scroll down, you can see this is true. What's really great about this is that it also works on dates and times, like we have in the Start Date and Last Login Time columns. There are two things I want to mention here. First, this is one of the big advantages of converting our column from a string to a datetime: operations like this become possible. If the values were stored as strings, pandas would compare them as text and would have no way of knowing which dates fall between which. Now, because it's a datetime object, pandas knows how to interpret it and understands the first argument as the start of the date range and the second as the end. So once again, the process is exactly the same. Let's say I want to pull out all of the employees who started in my year of birth, which is 1991. I can extract the Start Date column and call the between method directly on it. My first argument is my lower bound, which is my starting point, so I'm going to put the first of January 1991. My second argument is my end date: I want to go up to the first day of 1992, which is January 1st, 1992. Keep in mind that both bounds are inclusive, so a start date of exactly January 1st, 1992 would also match. So now I have a Boolean Series: True if the date falls within that range, False if it does not.
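The date filter can be sketched on a few made-up rows (the names and dates below are hypothetical, not taken from employees.csv). Because both bounds are inclusive, an upper bound of "1992-01-01" also matches someone who started on that exact day; ending the range on "1991-12-31" is one possible way to keep only 1991, assuming the column holds dates without a time-of-day component.

```python
import pandas as pd

# Hypothetical sample data with dates around the edges of 1991
df = pd.DataFrame({
    "First Name": ["Ana", "Ben", "Cy", "Dee"],
    "Start Date": ["1990-12-31", "1991-01-01", "1991-07-15", "1992-01-01"],
})

# Convert the text column to datetimes so the comparison is chronological
df["Start Date"] = pd.to_datetime(df["Start Date"])

# The lesson's bounds: the upper bound is inclusive, so 1992-01-01 matches too
lesson_mask = df["Start Date"].between("1991-01-01", "1992-01-01")

# Alternative bounds that stop on the last day of 1991
only_1991 = df["Start Date"].between("1991-01-01", "1991-12-31")

print(df[lesson_mask]["First Name"].tolist())
print(df[only_1991]["First Name"].tolist())
```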

I'm going to pass that Boolean Series within square brackets, and now I have all of the employees that have a start date within 1991. You can see all of these years are 1991 as I scroll down. I'm going to collapse this and move on to one last example, which repeats a point that I mentioned a little bit earlier: the dates in the Last Login Time column will differ for you, because you're going to be watching this video on a different day, a day in the future, hopefully, unless you've time traveled. The date here is always going to be the present day. So my dates here may fluctuate between lessons, and they will certainly be different from yours. But the most important thing to keep track of is the time on the right, and that will be consistent for all of us, including on your end if you're following along. So once again, let's begin by extracting the Series that we want to do a comparison on. It's going to be the Last Login Time Series. There, I just used tab completion to fill it out.

There is my Series outputted below. Let's say I want to find the people who logged into the system between 8:30 a.m. and noon, which is 12:00 p.m. I once again call the between method and open my parentheses. The very first argument is the start point, so I want to start at 8:30 in the morning, and I can write it out really in any way I want. It's kind of amazing how flexible pandas is: you can certainly write it out in military (24-hour) form, as you see below, but I can also write something like "8:30AM", literally as a string, and pandas will figure out what I'm talking about. Then I place a comma, and the second argument is the end point; let's say I place "12:00PM", or noon. Now I have a Boolean Series that I can pass within square brackets after my DataFrame name, which is df. And now you can see all the people who logged in between 8:30 in the morning and noon; as I scroll down, these values all fall within those three and a half hours. So that's the between method: it accepts two arguments, the start point and the end point, and it can operate on a Series of many different data types. In this lesson we looked at integers, floats, dates, and times. It returns True if the value in the Series falls between those points and False if it does not. Use the resulting Boolean Series as a filter, just like we have with every other Boolean Series that we've introduced. So that's the between method on a pandas Series.
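The lesson's time filter relies on the parsed column and the string bounds all carrying the same (current) date. As a hedged alternative, not the approach used in the video, the sketch below compares only the time of day by calling between on the values from the .dt.time accessor. The timestamps are hypothetical and deliberately fall on different dates.

```python
import datetime as dt
import pandas as pd

# Hypothetical login timestamps spread over several different dates
logins = pd.Series(pd.to_datetime([
    "2018-03-01 08:15:00",
    "2018-03-02 08:30:00",
    "2018-03-03 11:59:00",
    "2018-03-04 12:00:00",
    "2018-03-05 13:30:00",
]), name="Last Login Time")

# Keep only the time-of-day part of each timestamp (datetime.time objects)
time_of_day = logins.dt.time

# True where the login time is between 8:30 a.m. and noon, both inclusive
mask = time_of_day.between(dt.time(8, 30), dt.time(12, 0))

print(logins[mask])
```

Comparing on the time of day alone means the result does not change depending on which day the code is run.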

Code Download Link : ML57

Code : 

#!/usr/bin/env python
# coding: utf-8

# In[3]:



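import pandas as pd

# Load the dataset and parse the two date/time columns as datetimes
df = pd.read_csv("employees.csv", parse_dates=["Start Date","Last Login Time"])

# Convert columns to more appropriate dtypes
df["Senior Management"] = df["Senior Management"].astype("bool")
df["Gender"] = df["Gender"].astype("category")
df.head()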


# In[5]:



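# Boolean Series: True where Salary is between 60000 and 70000 (both inclusive)
df["Salary"].between(60000, 70000)

# Use the Boolean Series to filter the DataFrame rows
df[df["Salary"].between(60000, 70000)]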



# In[7]:



# Works the same way on a float column
df["Bonus %"].between(2.0, 5.0)

# Rows whose bonus is between 2% and 5% (both inclusive)
df[df["Bonus %"].between(2.0, 5.0)]



# In[9]:



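# Date bounds given as strings; note that 1992-01-01 itself is included
df["Start Date"].between("1991-01-01","1992-01-01")

# Rows for employees who started within that date range
df[df["Start Date"].between("1991-01-01","1992-01-01")]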



# In[12]:



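# Time bounds given as strings; the date part is the day the code is run
df["Last Login Time"].between("8:30AM", "12:00PM")

# Rows for employees who logged in between 8:30 a.m. and noon
df[df["Last Login Time"].between("8:30AM", "12:00PM")]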



