What to do when pandas does not recognize the strings NA / ND as missing values
Missing values in pandas
In experimental datasets, blank cells are often filled with NA (Not analyzed) or ND (Not detected). When these values end up in a DataFrame, pandas treats NA and ND as ordinary character strings (object dtype), so the missing-value functions dropna(), fillna(), and isnull() cannot handle them as-is.
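If the data comes from a CSV file, the conversion can also happen at load time. The sketch below assumes a small, hypothetical CSV held in memory through io.StringIO. The na_values parameter of pd.read_csv adds extra strings to treat as missing; note that "NA" is already in read_csv's default list of missing-value markers, while "ND" is not.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for an experimental results file.
csv_text = """sample,A,B
s1,1,4
s2,ND,NA
s3,2,5
"""

# Default parsing: "NA" is a built-in missing-value marker, "ND" is not,
# so column A is read as strings (object dtype) and nothing in it is missing.
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.isnull().sum())

# Listing "ND" in na_values turns it into NaN while the file is parsed,
# which also lets column A be read as a numeric (float) column.
parsed = pd.read_csv(io.StringIO(csv_text), na_values=["ND"])
print(parsed.isnull().sum())
print(parsed.dtypes)
```

This only helps for data read through read_csv; a DataFrame built directly in code, as in the sample below, still needs the replace step.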
The following are recognized as missing values in pandas:
- NaN
- None
- np.nan
- math.nan
The strings NA and ND need to be converted to one of the above.
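A quick way to check which values pandas considers missing is pd.isna(), which also accepts single scalars. The sketch below is only a check, not a conversion: it shows that the values listed above are detected, while the strings "NA" and "ND" are not.

```python
import math

import numpy as np
import pandas as pd

# Values pandas treats as missing, followed by the two experiment codes.
candidates = [np.nan, math.nan, None, "NA", "ND"]

# pd.isna works on scalars as well as on Series and DataFrames.
flags = [pd.isna(value) for value in candidates]

for value, flag in zip(candidates, flags):
    print(repr(value), flag)
```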
Handling NA and ND
Create sample data that contains NA, ND, None, and NaN.
In [1]:
```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 'ND', 2, 3],
                   'B': [4, 'NA', 5, 6],
                   'C': [7, 'ND', None, np.nan]})
df
```
Out [1]:
```
    A   B     C
0   1   4     7
1  ND  NA    ND
2   2   5  None
3   3   6   NaN
```
isnull() recognizes None and np.nan as missing values, but returns False for ND and NA.
- isnull(): returns True where a value is missing
- isnull().sum(): counts the missing values in each column
In [2]:
```python
df.isnull()
```
Out [2]:
```
       A      B      C
0  False  False  False
1  False  False  False
2  False  False   True
3  False  False   True
```
In [3]:
```python
df.isnull().sum()
```
Out [3]:
```
A    0
B    0
C    2
dtype: int64
```
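Because isnull() ignores the string codes, it can help to count them separately before cleaning. A minimal sketch using DataFrame.isin() on the same sample data; the list codes is an assumption and should hold whatever markers your dataset actually uses.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 'ND', 2, 3],
                   'B': [4, 'NA', 5, 6],
                   'C': [7, 'ND', None, np.nan]})

# isnull() only sees None / NaN ...
null_counts = df.isnull().sum()

# ... so count the string codes separately; isin() does an exact match per cell.
codes = ['NA', 'ND']
code_counts = df.isin(codes).sum()

print(null_counts)
print(code_counts)
```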
Replace the string ND with the replace() function
In [4]:
```python
df = df.replace('ND', np.nan)
df
```
Out [4]:
```
     A   B    C
0    1   4    7
1  NaN  NA  NaN
2    2   5  NaN
3    3   6  NaN
```
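The cell above replaces only ND, so NA stays a string. A sketch, assuming both codes should become missing values: pass a list to replace() and then convert the columns with pd.to_numeric, because after the replacement they may still be object dtype. Depending on the pandas version, replace() may emit a FutureWarning about downcasting here; after to_numeric the resulting dtypes are float64 either way.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 'ND', 2, 3],
                   'B': [4, 'NA', 5, 6],
                   'C': [7, 'ND', None, np.nan]})

# A list as to_replace maps every listed string to the same value.
cleaned = df.replace(['NA', 'ND'], np.nan)

# Convert each column to a numeric dtype so sums, means, etc. work;
# pd.to_numeric also turns a leftover None into NaN.
cleaned = cleaned.apply(pd.to_numeric)

print(cleaned.dtypes)
print(cleaned.isnull().sum())
```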
isnull() and fillna() now treat the former ND cells as missing values. NA was not replaced, so it is still counted as an ordinary value.
In [5]:
```python
df.isnull()
```
Out [5]:
```
       A      B      C
0  False  False  False
1   True  False   True
2  False  False   True
3  False  False   True
```
In [6]:
```python
df.isnull().sum()
```
Out [6]:
```
A    1
B    0
C    3
dtype: int64
```
In [7]:
```python
df.fillna(10)
```
Out [7]:
```
    A   B   C
0   1   4   7
1  10  NA  10
2   2   5  10
3   3   6  10
```
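Once the codes are NaN, the rest of the missing-value toolkit applies. A short sketch on the same sample data (with both NA and ND replaced) showing dropna() with its how parameter and fillna() with a per-column dict; filling column C with its mean is only an illustrative choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 'ND', 2, 3],
                   'B': [4, 'NA', 5, 6],
                   'C': [7, 'ND', None, np.nan]})
df = df.replace(['NA', 'ND'], np.nan).apply(pd.to_numeric)

# dropna() removes every row that has at least one missing value.
complete_rows = df.dropna()

# how="all" drops a row only when all of its values are missing.
no_empty_rows = df.dropna(how="all")

# fillna() accepts a dict to use a different fill value per column.
filled = df.fillna({'A': 0, 'B': 0, 'C': df['C'].mean()})

# None of these calls modify df itself; each returns a new DataFrame.
print(complete_rows)
print(no_empty_rows)
print(filled)
```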