
61. [Hindi]Machine Learning : Retrieve Row Values Using loc in Pandas| 2018 |Python 3

   
Retrieve Row Values Using loc in Pandas




All right, in this lesson we'll explore how to retrieve rows from our data frame using the index labels, and we'll be using something called the loc method to do that. I'll begin by executing our code to import our James Bond data set, and I'm going to add two tweaks here. The first is adding the index_col parameter, which stands for index column, and specifying that I want to use the Film column on the left as my index labels. So that's what my data frame looks like currently. There's one additional operation I'd like to do, which is sorting the index alphabetically. To do that I can call the sort_index method and make sure I add the inplace=True parameter to make the change permanent. That gives me a data frame where the index labels, which represent the film names, are sorted in order.
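
The loading and sorting steps just described can be sketched as follows. This is a minimal illustration, not the lesson's notebook: the CSV text is a small hand-built stand-in for jamesbond.csv (four films, two columns), read from memory so the snippet runs without the file.

```python
import io
import pandas as pd

# Hypothetical stand-in for jamesbond.csv: a few rows and only two columns.
csv_text = """Film,Year,Actor
Goldfinger,1964,Sean Connery
Dr. No,1962,Sean Connery
Moonraker,1979,Roger Moore
GoldenEye,1995,Pierce Brosnan
"""

# index_col="Film" turns the film titles into the row labels.
bond = pd.read_csv(io.StringIO(csv_text), index_col="Film")

# inplace=True sorts the rows by label and modifies bond itself.
bond.sort_index(inplace=True)

print(bond)
```

Without inplace=True, sort_index returns a sorted copy and leaves bond unchanged, so the result would have to be assigned back to a variable.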
     
Now the impact of this isn't really going to be felt when we're dealing with a data frame of this size, and by that I mean 25 rows. But whenever you're dealing with something much larger, for example 25 million rows, sorting an index can greatly accelerate the extraction process. The reason is that pandas can search more efficiently for the specific label it's looking for. If the index labels are unsorted, it basically has to go through all of them and check them one by one to see if they match your query. In comparison, when an index is sorted, pandas knows that everything is structured in a specific sequence, so it can jump around within that index and find the exact value it's looking for much quicker. There is much more complicated technical stuff going on behind the scenes, but from our user perspective, for the sake of speed and efficiency, I strongly recommend sorting the index of your data frame if you're using something like strings as we are here. That's just the best practice to stick to.
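
One way to check whether an index is already in sorted order is the index's is_monotonic_increasing attribute. The sketch below uses a tiny hand-built frame (an assumption, not the lesson's data) and only shows the check itself; it does not measure speed.

```python
import pandas as pd

# Small hand-built frame whose labels start out unsorted.
bond = pd.DataFrame(
    {"Year": [1964, 1962, 1979, 1995]},
    index=pd.Index(["Goldfinger", "Dr. No", "Moonraker", "GoldenEye"], name="Film"),
)

sorted_before = bond.index.is_monotonic_increasing  # labels are out of order here

bond.sort_index(inplace=True)

sorted_after = bond.index.is_monotonic_increasing   # labels are now in order

print(sorted_before, sorted_after)
```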

So let's go right ahead and talk about the loc method. The loc method is called directly on our data frame, so it's going to be bond.loc. And this is one of those rare examples where a method does not take parentheses; it takes square brackets instead. I think the reason they did this is to maintain consistency. We know that square brackets are usually used for some kind of extraction process: when we use them on a series we can extract a row, and when we use them on a data frame we can extract columns. But since the regular bracket syntax is already used for columns on a data frame, we have to use the .loc method to extract rows. So let's say we want to pull information for the film called Goldfinger. I'm just going to enter it in double quotes here. Pandas looks for that index label, and when it finds it, it returns that row's information. If it's a single row like this, pandas returns a series. If we take a look at the series, we can see that the column headers from our data frame have become the index labels of our brand new series. We can see them on the left, and on the right we have the values for that row. So for the row where the label is Goldfinger, the year value is 1964, the actor value is Sean Connery, and so on.
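
A minimal sketch of the single-label lookup described above, using a hand-built three-film frame with only Year and Actor columns as a stand-in for the lesson's data set:

```python
import pandas as pd

# Hand-built stand-in for the bond data frame (illustrative subset only).
bond = pd.DataFrame(
    {
        "Year": [1962, 1995, 1964],
        "Actor": ["Sean Connery", "Pierce Brosnan", "Sean Connery"],
    },
    index=pd.Index(["Dr. No", "GoldenEye", "Goldfinger"], name="Film"),
)

# A single label that matches one row comes back as a Series.
row = bond.loc["Goldfinger"]

print(type(row).__name__)          # the row is a Series
print(list(row.index))             # the column headers become the Series index
print(row["Year"], row["Actor"])   # the values for that film
```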

So that's a single operation. Similarly, if we want to extract another film, let's say GoldenEye, it's just the loc method, a pair of square brackets, and the name of the index label in double quotes. Now a few notes here. If the label that we cite does not exist, for example Sacred Bond, which is not a real Bond film, there's going to be an error. In that case pandas is unable to find that key among the index labels, and it gives you this ugly blob of text. So I'm just going to comment this out to ensure that we don't have any errors, and move on. In addition, if we have an index label that appears more than once, for example Casino Royale, loc returns all rows that fit that condition. So unlike the keys of a Python dictionary, the index labels of a pandas data frame do not have to be unique. As I mentioned way back when, one of the great features of pandas is its flexibility. It's a lot more open to these kinds of things. So whenever you pass the loc method a label that is duplicated, it just returns all of the rows with that label. In this case, because we have more than one row, the end product is a brand new data frame rather than a series. Let's also talk a little bit about extracting sequential values, or consecutive rows. Let's say I want to extract with the loc method all films between Diamonds Are Forever, which I think was in 1971, and Moonraker. I'm going to put a colon in the middle, which is the same syntax Python lists use for extracting sequential values. And you can see that we get all these films, with Diamonds Are Forever at the beginning and Moonraker at the end.
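
The missing-label and duplicate-label cases from this part of the lesson can be reproduced on a small hand-built frame (an illustrative stand-in, not the full data set):

```python
import pandas as pd

# "Casino Royale" appears twice, as it does in the lesson's data.
bond = pd.DataFrame(
    {
        "Year": [1967, 2006, 1964],
        "Actor": ["David Niven", "Daniel Craig", "Sean Connery"],
    },
    index=pd.Index(["Casino Royale", "Casino Royale", "Goldfinger"], name="Film"),
)

# A label that is not in the index raises KeyError.
try:
    bond.loc["Sacred Bond"]
    found = True
except KeyError:
    found = False

# A duplicated label returns every matching row, as a DataFrame.
casino = bond.loc["Casino Royale"]

print(found)
print(casino)
```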

Now what's really interesting here is that the Moonraker part is included. It's inclusive.

So that last value is part of our data frame. Many times when we're working with a Python list, for example, that last value is exclusive: the slice goes up to that value but does not include it. Whenever we slice by index label with loc, as we do here with strings, the end value is included. Similarly, we can do something like extract every movie after GoldenEye. We can do that by writing GoldenEye and then a colon, which indicates go to the end of the data set. So here we have GoldenEye at the very beginning and we proceed all the way to the end of our data set. The reason you see all these different movies is that we have the index sorted alphabetically, so it starts at that G and proceeds all the way downwards. Similarly, if I want to extract every movie from the beginning of my data frame to, let's say, On Her Majesty's Secret Service, let's see if that works. That extracts all the movies starting from A up until we get to On Her Majesty's Secret Service, which is the last movie here.
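
The inclusive endpoint can be seen next to an ordinary list slice. This is a sketch on a hand-built six-film frame, assumed here only for illustration:

```python
import pandas as pd

# Hand-built, alphabetically ordered stand-in for the sorted bond frame.
films = ["Diamonds Are Forever", "Dr. No", "GoldenEye",
         "Goldfinger", "Moonraker", "Octopussy"]
years = [1971, 1962, 1995, 1964, 1979, 1983]
bond = pd.DataFrame({"Year": years}, index=pd.Index(films, name="Film"))
bond.sort_index(inplace=True)

# Label slices with loc include BOTH endpoints.
middle = bond.loc["Dr. No":"Goldfinger"]

# An open-ended slice runs to the end (or from the start) of the frame.
tail = bond.loc["GoldenEye":]
head = bond.loc[:"GoldenEye"]

# A plain Python list slice, by contrast, excludes the stop position.
list_slice = films[1:3]

print(list(middle.index))
print(list(tail.index))
print(list(head.index))
print(list_slice)
```
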
In addition, we can also extract multiple non-sequential values. For example, if I want to extract two movies that are not next to each other, I can once again use my loc method with square brackets, not parentheses. Then within the square brackets I have to give another pair of square brackets, which represents the list of movies that I want to pull out. So let's say I want to do Moonraker and Octopussy. Everybody loves Octopussy. If I do that, we can see that we pull those two index labels out of the data frame, and in this case they are in the order in which they appear in the data frame, because it is sorted alphabetically. But it doesn't have to be this way. In all seriousness, Octopussy, as far as I recall, was actually a pretty terrible movie. But let's say I want to put Octopussy first and then Moonraker after; we can do it this way. Even though Moonraker comes first in our original data frame, the rows come back in the specific order in which the names are passed in our inner list. There's one big caveat here that you should watch out for.
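
A short sketch of the list-of-labels form, again on a hand-built frame assumed for illustration, showing that the result follows the order of the list rather than the order of the index:

```python
import pandas as pd

# Hand-built stand-in; the index is already in alphabetical order.
bond = pd.DataFrame(
    {"Year": [1995, 1979, 1983]},
    index=pd.Index(["GoldenEye", "Moonraker", "Octopussy"], name="Film"),
)

# The inner square brackets are a Python list of labels.
in_index_order = bond.loc[["Moonraker", "Octopussy"]]

# Reversing the list reverses the rows in the result.
in_list_order = bond.loc[["Octopussy", "Moonraker"]]

print(list(in_index_order.index))
print(list(in_list_order.index))
```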

I'm going to use bond.loc again, with my square brackets and another pair of square brackets for a list, and I'm going to begin with two Bond movies that actually exist, which are For Your Eyes Only and Live and Let Die. Then I'm going to add a Bond movie that does not exist in our original data frame. Let's call it Gold Bond. When I do this, we'll see that the first two index labels are extracted, and the last one, which does not exist, is included as well. You're going to see these null values to the right here, all of these NaNs, and that might give the impression that it's simply an existing row whose values are all blank. But it is not. Gold Bond does not exist in our original bond data; it has been added to our new data frame right here. That's something you have to be very careful with whenever you're extracting multiple index labels with the loc method: in the pandas version used in this 2018 recording, if at least one of them exists you're not going to get an error. (From pandas 1.0 onward, a list containing any missing label raises a KeyError instead of producing a row of NaNs.) So be careful whenever you're passing multiple values, because just because a label shows up in the result does not mean it's actually an index label in the original data. A classic way to check whether a label exists is, of course, the Python in keyword.

So we can write "Gold Bond" in bond.index, where index is the attribute that gives us the object representing the index of our data frame, and there we'll see False. It does not exist. So just be careful. That's how we can retrieve rows by index label using the loc method. We can use one label at a time or multiple labels in a list, and we can go from the beginning of the data frame to a specific label or from a specific label to the end of the data frame, whichever way you want. We can extract rows in different orders, but that's how we extract them by label name. The next lesson will introduce the iloc method, which can be used to extract rows by index position.
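
The membership check can also be used to split a list of labels into those that exist and those that do not before calling loc, which avoids depending on how a given pandas version treats missing labels. This is one possible approach on a hand-built frame, not something shown in the lesson; the names wanted, present and missing are illustrative.

```python
import pandas as pd

# Hand-built stand-in for the bond frame.
bond = pd.DataFrame(
    {"Year": [1981, 1973, 1979]},
    index=pd.Index(
        ["For Your Eyes Only", "Live and Let Die", "Moonraker"], name="Film"
    ),
)

# The in keyword tests membership against the index labels.
exists = "Gold Bond" in bond.index

# Separate the requested labels into present and missing ones.
wanted = ["For Your Eyes Only", "Live and Let Die", "Gold Bond"]
present = [label for label in wanted if label in bond.index]
missing = [label for label in wanted if label not in bond.index]

# Only labels known to exist are passed to loc.
subset = bond.loc[present]

print(exists)
print(missing)
print(subset)
```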

Code Link : ML_61

Code : 
#!/usr/bin/env python
# coding: utf-8

# In[5]:


import pandas as pd

# Use the Film column as the row labels
bond = pd.read_csv("jamesbond.csv", index_col="Film")

# Sort the labels alphabetically, modifying bond itself
bond.sort_index(inplace=True)

bond.head()


# In[7]:


bond.loc["Goldfinger"]


# In[10]:


bond.loc["GoldenEye"]

#bond.loc["Sacred Bond"]


# In[11]:


bond.loc["Casino Royale"]


# In[14]:


# Label slices include both endpoints
bond.loc["Diamonds Are Forever" : "Moonraker"]

bond.loc["GoldenEye" : ]

bond.loc[: "On Her Majesty's Secret Service"]


# In[16]:


bond.loc[["For Your Eyes Only", "Live and Let Die"]]

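bond.loc[["Moonraker", "Octopussy"]]

# "Gold Bond" is not in the index. Older pandas returned an all-NaN row for it;
# pandas 1.0+ raises KeyError, so this call is commented out.
#bond.loc[["Moonraker", "Octopussy", "Gold Bond"]]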


# In[17]:


"Gold Bond" in bond.index


YouTube Link : 

