How to convert a float64 column into an Int64 or string column in pandas (short)
Working around the “safe” casting rule error with apply and pd.Int64Dtype()
Suppose we have a pandas DataFrame df_a with two columns: date (a datetime64 column) and colf (a float64 column), and colf contains NaNs.
Now we want to make an integer column from colf, so we type
df_a['coli']=df_a['colf'].astype(pd.Int64Dtype())
But this can raise an error such as TypeError: Cannot cast array from dtype('float64') to dtype('int64') according to the rule 'safe'.
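To see when this error is triggered, here is a minimal sketch. The sample values and the names whole and fractional are made up for illustration and are not the data used in this post. As far as I know, the cast fails only when the column holds values with a fractional part; NaNs alone do not cause it.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the data from this post.
whole = pd.Series([1.0, 2.0, np.nan])       # whole numbers plus a NaN
fractional = pd.Series([1.5, 2.0, np.nan])  # 1.5 has no exact integer form

# Whole-number floats convert directly; the NaN becomes a missing value.
converted = whole.astype("Int64")

# A fractional value makes the same call raise TypeError.
try:
    fractional.astype("Int64")
    raised = False
except TypeError as err:
    raised = True
    message = str(err)  # exact wording differs between pandas versions
```

So if colf is known to hold only whole numbers and NaNs, the direct astype call may already be enough.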
So we now resort to the apply method (although it can be slow). Note that np.int64(w0) drops any fractional part, truncating toward zero:
df_a['coli'] = df_a['colf'].apply(lambda w0: None if pd.isnull(w0) else np.int64(w0)).astype(pd.Int64Dtype())
Indeed, the newly created column has the nullable integer dtype Int64:
>>> df_a.dtypes
date    datetime64[ns]
colf           float64
coli             Int64
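Since apply can be slow on large frames, a vectorized variant may be worth trying. The sketch below assumes a small made-up frame (the dates and values are hypothetical) and uses np.trunc, which is my substitution rather than part of the method above: it truncates toward zero like np.int64(w0) and leaves NaN untouched, so the result should match the apply-based column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame; the values are made up for illustration.
df_a = pd.DataFrame({
    "date": pd.to_datetime(
        ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"]
    ),
    "colf": [1.7, 2.0, np.nan, -3.2],
})

# The apply-based conversion from the text, kept for comparison.
coli_apply = df_a["colf"].apply(
    lambda w0: None if pd.isnull(w0) else np.int64(w0)
).astype(pd.Int64Dtype())

# Vectorized variant: drop the fractional part first, then cast.
# np.trunc rounds toward zero and keeps NaN as NaN.
coli_vec = np.trunc(df_a["colf"]).astype("Int64")

df_a["coli"] = coli_vec
```

If rounding to the nearest integer is wanted instead of truncation, df_a["colf"].round() could take the place of np.trunc in the same pattern.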
Next we create a string column from colf. It is well known that astype(str) converts NaN into the string ‘nan’. Suppose we try
df_a['cols']=df_a['colf'].astype('str')
As expected, the NaNs in colf are converted into the string ‘nan’ in column cols.
This form has the disadvantage that the missing values can no longer be found in the usual ways (for example with isnull()), so we need to keep the NaNs. We therefore use the apply method again:
df_a['cols_better']=df_a['colf'].apply(lambda flt0: np.nan if pd.isnull(flt0) else str(flt0))
As a result, column cols_better holds the values of colf as strings (its dtype is object), and the NaNs from colf are preserved.
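The same result can probably be reached without apply. The sketch below uses made-up sample values and Series.where, which is my substitution for the lambda: it converts everything to str and then restores NaN wherever the original value was missing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the data from this post.
colf = pd.Series([1.5, np.nan, 3.0], name="colf")

# Plain conversion; in the pandas versions this post targets,
# the NaN ends up as the literal string 'nan'.
cols = colf.astype(str)

# Convert to str, then keep the result only where colf is not null.
# Positions where the condition is False are filled with NaN.
cols_better = colf.astype(str).where(colf.notna())

# The missing value is detectable again with the usual tools.
n_missing = int(cols_better.isna().sum())
```

Recent pandas versions also offer a dedicated nullable string dtype, which may be an alternative if an object column is not required.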