What to do when pandas does not recognize the strings NA / ND as missing values

2018/12/5

Missing values in pandas

In experimental datasets, blank cells are often filled with NA (Not analyzed) or ND (Not detected). When these markers are stored in a DataFrame as strings, pandas treats NA and ND as ordinary text (object dtype), not as missing values, so the dropna(), fillna(), and isnull() methods cannot handle them as they are.
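Whether a marker string ends up as a missing value depends on how the data are loaded. pd.read_csv already treats "NA" and empty fields as missing by default, but "ND" is not on its default list; it can be added through the na_values parameter. A minimal sketch, using a hypothetical csv_text string in place of a real data file:

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for an experimental data file
csv_text = "A,B,C\n1,4,7\nND,NA,ND\n2,5,\n3,6,\n"

# Default read: "NA" and empty fields become NaN, but "ND" stays a string
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.isnull().sum())

# Listing "ND" in na_values (added to the defaults) makes it missing too
df = pd.read_csv(io.StringIO(csv_text), na_values=["ND"])
print(df.isnull().sum())
```

When the DataFrame is built directly from Python lists, as in the example below, no such conversion happens and both markers remain strings.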

The following are recognized as missing values in pandas:

  • NaN
  • None
  • np.nan
  • math.nan

The strings NA and ND need to be converted to one of the above.
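A quick way to check what pandas counts as missing is pd.isna(), an alias of pd.isnull() and the function form of the isnull() method used below. This sketch, with a hypothetical candidates list, tests each value from the list above next to the two marker strings:

```python
import math

import numpy as np
import pandas as pd

# Values from the list above, plus the two marker strings for comparison
candidates = [float("nan"), None, np.nan, math.nan, "NA", "ND"]

results = [pd.isna(value) for value in candidates]
for value, missing in zip(candidates, results):
    # True for the four missing-value forms, False for the two strings
    print(repr(value), missing)
```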

Handling NA and ND

Create sample data that contains NA, ND, None, and NaN.

In [1]:
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'A': [1, 'ND', 2, 3],
                       'B': [4, 'NA', 5, 6],
                       'C': [7, 'ND', None, np.nan]})
    df

Out[1]:
        A   B     C
    0   1   4     7
    1  ND  NA    ND
    2   2   5  None
    3   3   6   NaN

None and np.nan are recognized and counted as missing values, but ND and NA are reported as False.
isnull(): returns True where a value is missing
isnull().sum(): counts the missing values in each column

In [2]:
    df.isnull()

Out[2]:
           A      B      C
    0  False  False  False
    1  False  False  False
    2  False  False   True
    3  False  False   True

In [3]:
    df.isnull().sum()

Out[3]:
    A    0
    B    0
    C    2
    dtype: int64

Replace the string ND with the replace() method

In [4]:
    df = df.replace('ND', np.nan)
    df

Out[4]:
         A   B    C
    0    1   4    7
    1  NaN  NA  NaN
    2    2   5  NaN
    3    3   6  NaN

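isnull() and fillna() now treat the former ND cells as missing values. The string NA in column B has not been replaced, so it is still not counted.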

In [5]:
    df.isnull()

Out[5]:
           A      B      C
    0  False  False  False
    1   True  False   True
    2  False  False   True
    3  False  False   True

In [6]:
    df.isnull().sum()

Out[6]:
    A    1
    B    0
    C    3
    dtype: int64

In [7]:
    df.fillna(10)

Out[7]:
        A   B   C
    0   1   4   7
    1  10  NA  10
    2   2   5  10
    3   3   6  10
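The cells above convert only ND, so column B keeps the string NA. A minimal sketch that converts both markers in one call passes a list to replace(); the name cleaned is only for illustration. Once the markers are real missing values, dropna() also works as expected:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 'ND', 2, 3],
                   'B': [4, 'NA', 5, 6],
                   'C': [7, 'ND', None, np.nan]})

# Passing a list to replace() converts every listed string in one call
cleaned = df.replace(['NA', 'ND'], np.nan)
print(cleaned.isnull().sum())

# dropna() drops every row that contains at least one missing value,
# which here leaves only row 0
print(cleaned.dropna())
```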