I'm working with large GeoTiff files and have loaded each GeoTiff as a numpy array, and then created an array-stack of the GeoTiffs:
GeoTiff_list = []
for i in range(1, n + 1):
    # load the i-th GeoTiff as a numpy array (GeoTiff_i stands in for that load)
    GeoTiff_list.append(GeoTiff_i)
GeoTiff_array_stack = np.array(GeoTiff_list)
Then I just want to compute the average by pixel across the stack of images. In numpy this can be done as follows:
average_GeoTiff = GeoTiff_array_stack.mean(axis=0)
I'd like to repeat this computation in Spark. I know there are projects in the pipeline, such as GeoTrellis, that will make this trivial, but I'm not sure how to do it with whatever is currently available.
I presume I can parallelize the array as follows:
sc.parallelize(GeoTiff_array_stack)
But then I'm not sure how to compute the mean of this array. This is probably trivial in MLlib, but it seems like pyspark isn't up to speed with the other Spark APIs.