Python Spark DataFrame: replace null with SparseVector
In Spark, I have the following DataFrame, called `df`, with some null entries:
`df.features1` and `df.features2` are of type vector (nullable). I then tried the following code to fill the null entries with SparseVectors:
This code led to the following error:
Then I found the following paragraph in the Spark documentation:
Does this explain why my attempt to replace the null entries with SparseVectors failed? Or does it mean there is no way to do this within a DataFrame?
I can achieve my goal by converting the DataFrame to an RDD and replacing the `None` values with SparseVectors, but it would be much more convenient to do this directly in the DataFrame.
Is there any way to do this directly on the DataFrame?
You can use a `udf` together with `coalesce` to substitute an empty `SparseVector` for the null entries, since `fillna`/`na.fill` only supports primitive column types and cannot fill vector columns.