1. Import pandas under the name pd.
In [1]:
import pandas as pd
import numpy as np
2. Print the version of pandas that has been imported.
In [2]:
pd.__version__
3. Print out all the version information of the libraries that are required by the pandas library.
In [3]:
pd.show_versions()
4. Create a DataFrame df from the dictionary data below, using labels as the index.
In [2]:
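# The 'animal' line is cut off in the source after the ninth entry; the tenth
# value ('dog') is assumed so that every column has 10 entries.
data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}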
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)
5. Display a summary of the basic information about this DataFrame and its data.
In [5]:
df.info()
# ...or...
df.describe()
6. Return the first 3 rows of the DataFrame df.
In [6]:
df.iloc[:3]
# or equivalently
df.head(3)
7. Select just the 'animal' and 'age' columns from the DataFrame df.
In [7]:
df.loc[:, ['animal', 'age']]
# or
df[['animal', 'age']]
8. Select the data in rows [3, 4, 8] and in columns ['animal', 'age'].
In [3]:
df.loc[df.index[[3, 4, 8]], ['animal', 'age']]
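The answer above mixes row positions with column labels by first translating positions into labels. A minimal sketch of the mirror-image approach, translating column labels into positions and using iloc, is shown here on a small stand-in frame (the name demo and its values are illustrative, not the puzzle's data):

```python
import numpy as np
import pandas as pd

# Small stand-in frame (illustrative values only).
demo = pd.DataFrame(
    {
        "animal": ["cat", "dog", "snake", "dog", "cat"],
        "age": [2.5, 3.0, 0.5, np.nan, 5.0],
        "visits": [1, 3, 2, 3, 2],
    },
    index=list("abcde"),
)

rows = [1, 3, 4]          # integer positions of the wanted rows
cols = ["animal", "age"]  # labels of the wanted columns

# Label-based: turn row positions into index labels, then use loc.
by_label = demo.loc[demo.index[rows], cols]

# Position-based: turn column labels into positions, then use iloc.
by_position = demo.iloc[rows, demo.columns.get_indexer(cols)]

print(by_label.equals(by_position))
```

Both forms select the same cells; which one reads better depends on whether the rest of the code thinks in labels or in positions.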
9. Select only the rows where the number of visits is greater than 3.
In [4]:
df[df['visits'] > 3]
10. Select the rows where the age is missing, i.e. is NaN.
In [5]:
df[df['age'].isnull()]
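A short sketch of why a dedicated method is needed here: NaN never compares equal to anything, including itself, so an equality test finds no rows. The frame demo below is a hypothetical stand-in, and isna() is used as the alias of isnull():

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data with two missing ages.
demo = pd.DataFrame({"age": [2.5, np.nan, 0.5, np.nan]}, index=list("abcd"))

missing = demo[demo["age"].isna()]    # rows where age is NaN
present = demo[demo["age"].notna()]   # the complement

# Comparing with == does not work: NaN != NaN, so the mask is all False.
naive = demo[demo["age"] == np.nan]

print(list(missing.index), list(present.index), len(naive))
```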
11. Select the rows where the animal is a cat and the age is less than 3.
In [6]:
df[(df['animal'] == 'cat') & (df['age'] < 3)]
12. Select the rows where the age is between 2 and 4 (inclusive).
In [7]:
df[df['age'].between(2, 4)]
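As a sanity check, between(2, 4) can be compared with the explicit pair of comparisons it stands for. The sketch below uses the age values from the puzzle as a standalone Series (the name ages is illustrative); rows with a missing age fail both comparisons and are excluded either way:

```python
import numpy as np
import pandas as pd

# The 'age' values from the puzzle, as a standalone Series.
ages = pd.Series([2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3])

mask_between = ages.between(2, 4)           # both endpoints included
mask_explicit = (ages >= 2) & (ages <= 4)   # the same test written out

print(mask_between.equals(mask_explicit))
print(list(ages[mask_between].index))
```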
13. Change the age in row 'f' to 1.5.
In [ ]:
df.loc['f', 'age'] = 1.5
14. Calculate the sum of all visits (the total number of visits).
In [ ]:
df['visits'].sum()
15. Calculate the mean age for each different animal in df.
In [8]:
df.groupby('animal')['age'].mean()
16. Append a new row 'k' to df with your choice of values for each column. Then delete that row to return the
original DataFrame.
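In [ ]:
# values follow the column order: animal, age, visits, priority
df.loc['k'] = ['dog', 5.5, 2, 'no']
# and then deleting the new row to return the original DataFrame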
df = df.drop('k')
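Assigning a list through loc fills the columns in their existing order, so the values must be listed in that order. A minimal sketch on a two-row stand-in frame (demo and its values are hypothetical), assuming the column order animal, age, visits, priority:

```python
import pandas as pd

# Hypothetical two-row frame with the same column order as the puzzle.
demo = pd.DataFrame(
    {
        "animal": ["cat", "dog"],
        "age": [2.5, 5.0],
        "visits": [1, 2],
        "priority": ["yes", "no"],
    },
    index=["a", "b"],
)

# Setting a label that does not exist yet enlarges the frame by one row.
demo.loc["k"] = ["dog", 5.5, 2, "no"]
with_row = demo.copy()

# drop returns a new frame without the row; reassign to keep the result.
demo = demo.drop("k")

print(list(with_row.index), list(demo.index))
```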
17. Count the number of each type of animal in df.
In [9]:
df['animal'].value_counts()
18. Sort df first by the values in the 'age' column in descending order, then by the values in the 'visits' column in
ascending order.
In [10]:
df.sort_values(by=['age', 'visits'], ascending=[False, True])
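A small sketch of how the two sort keys interact, using a hypothetical four-row frame named demo. Ties in 'age' are broken by 'visits' in ascending order, and rows with a missing age end up last, which is the default placement of NaN in sort_values:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two rows tie on age, one row has no age.
demo = pd.DataFrame(
    {"age": [3.0, np.nan, 3.0, 7.0], "visits": [3, 1, 1, 2]},
    index=list("abcd"),
)

# age descending first; visits ascending breaks the tie between 'a' and 'c'.
result = demo.sort_values(by=["age", "visits"], ascending=[False, True])

print(list(result.index))
```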
19. The 'priority' column contains the values 'yes' and 'no'. Replace this column with a column of boolean
values: 'yes' should be True and 'no' should be False.
In [ ]:
df['priority'] = df['priority'].map({'yes': True, 'no': False})
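One property of map worth knowing: any value that is not a key of the dictionary becomes NaN rather than raising an error. The sketch below shows this on a hypothetical Series containing an unexpected 'maybe', next to a direct comparison that always yields booleans:

```python
import pandas as pd

# Hypothetical values, including one that the mapping does not cover.
priority = pd.Series(["yes", "no", "maybe"])

# Values missing from the dictionary are mapped to NaN.
mapped = priority.map({"yes": True, "no": False})

# A direct comparison treats everything other than 'yes' as False.
direct = priority.eq("yes")

print(mapped.tolist())
print(direct.tolist())
```

With only 'yes' and 'no' present, as in the puzzle, both approaches give the same column.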
20. In the 'animal' column, change the 'snake' entries to 'python'.
In [14]:
df['animal'] = df['animal'].replace('snake', 'python')
print(df)
21. For each animal type and each number of visits, find the mean age. In other words, each row is an animal,
each column is a number of visits and the values are the mean ages (hint: use a pivot table).
In [15]:
df.pivot_table(index='animal', columns='visits', values='age', aggfunc='mean')
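The pivot table can be read as a group-by on both keys followed by moving one key into the columns. A minimal sketch of that equivalence on hypothetical data (the frame demo is not the puzzle's df):

```python
import numpy as np
import pandas as pd

# Hypothetical data: cats have visits 1 and 3, dogs only have visits 2.
demo = pd.DataFrame(
    {
        "animal": ["cat", "cat", "dog", "dog", "cat"],
        "visits": [1, 3, 2, 2, 1],
        "age": [2.0, 3.0, 5.0, 7.0, 4.0],
    }
)

pivoted = demo.pivot_table(index="animal", columns="visits", values="age", aggfunc="mean")

# Same numbers: group by both keys, take the mean, spread 'visits' into columns.
grouped = demo.groupby(["animal", "visits"])["age"].mean().unstack()

print(pivoted)
```

Combinations that never occur, such as a dog with 1 visit here, appear as NaN in both results.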
22. You have a DataFrame df with a column 'A' of integers. For example:
df = pd.DataFrame({'A': [1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7]})
How do you filter out rows which contain the same integer as the row immediately above?
In [16]:
df = pd.DataFrame({'A': [1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7]})
df.loc[df['A'].shift() != df['A']]
# Alternatively, we could use drop_duplicates() here. Note
# that this removes *all* duplicates though, so it won't
# keep a value that reappears later, e.g. when A is [1, 1, 2, 2, 1, 1].
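The difference between the two approaches shows up as soon as a value comes back after a gap. A short sketch with a hypothetical column where 1 reappears after 2:

```python
import pandas as pd

# Hypothetical column in which the value 1 returns after a run of 2s.
demo = pd.DataFrame({"A": [1, 1, 2, 2, 1, 1]})

# Keep a row only if it differs from the row immediately above it.
consecutive = demo.loc[demo["A"].shift() != demo["A"]]

# drop_duplicates keeps just the first occurrence of each value overall.
first_only = demo.drop_duplicates(subset="A")

print(consecutive["A"].tolist())
print(first_only["A"].tolist())
```

The shift comparison keeps the second run of 1s as a new row, while drop_duplicates discards it.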
23. Given a DataFrame of numeric values, say
df = pd.DataFrame(np.random.random(size=(5, 3))) # a 5x3 frame of float values
how do you subtract the row mean from each element in the row?
In [ ]:
df.sub(df.mean(axis=1), axis=0)
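A minimal sketch of subtracting row means with sub and axis=0, including a check that every row of the result averages to zero. The seeded generator and the name demo are assumptions made so the example is reproducible. The plain minus operator is shown for contrast: it aligns the Series of row means with the column labels instead of the row labels.

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is reproducible (an illustrative choice).
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random(size=(5, 3)))

row_means = demo.mean(axis=1)            # one mean per row, indexed by row label

# axis=0 tells sub to match row_means against the row index.
centered = demo.sub(row_means, axis=0)

# Plain subtraction matches row_means against the columns instead,
# which is not the intended operation and changes the shape.
misaligned = demo - row_means

print(np.allclose(centered.mean(axis=1), 0.0))
print(centered.shape, misaligned.shape)
```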
24. Suppose you have a DataFrame with 10 columns of real numbers, for example:
df = pd.DataFrame(np.random.random(size=(5, 10)), columns=list('abcdefghij'))
Which column of numbers has the smallest sum? (Find that column's label.)
In [17]:
df.sum().idxmin()
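The one-liner works in two steps: sum() collapses each column to a single number, and idxmin() returns the label of the smallest one. A sketch with fixed, hypothetical values so the result can be checked by eye:

```python
import pandas as pd

# Hypothetical fixed values; column 'b' clearly has the smallest total.
demo = pd.DataFrame({"a": [1.0, 2.0], "b": [0.1, 0.2], "c": [3.0, 0.0]})

sums = demo.sum()        # Series of column totals, indexed by column label
label = sums.idxmin()    # label of the smallest total

print(label)
```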
25. How do you count how many unique rows a DataFrame has (i.e. ignore all rows that are duplicates)?
In [ ]:
len(df) - df.duplicated(keep=False).sum()
# or perhaps
len(df.drop_duplicates(keep=False))
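Both answers above count rows that occur exactly once. If "unique" is instead read as the number of distinct rows, drop_duplicates() with its default arguments gives that count. A sketch on hypothetical data that separates the two readings:

```python
import pandas as pd

# Hypothetical frame: (1, 0) and (3, 7) each appear twice, (2, 5) and (3, 8) once.
demo = pd.DataFrame({"x": [1, 1, 2, 3, 3, 3], "y": [0, 0, 5, 7, 7, 8]})

# Rows that have no duplicate anywhere in the frame.
appear_once = len(demo) - demo.duplicated(keep=False).sum()
appear_once_alt = len(demo.drop_duplicates(keep=False))

# Distinct rows, counting each repeated row a single time.
distinct = len(demo.drop_duplicates())

print(appear_once, appear_once_alt, distinct)
```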
26. You have a DataFrame that consists of 10 columns of floating-point numbers. Suppose that exactly 5
