Abstract
In the Big Data era, the passive acoustic monitoring (PAM) community faces major challenges, including the need for more standardized processing tools across its different applications in oceanography, and for more scalable, high-performance computing systems to process its ever-growing datasets more efficiently. In this work we address both issues jointly. First, we propose a detailed theory-plus-code document of a classical analysis workflow for describing the content of PAM data, which we hope will be reviewed and adopted by as many PAM experts as possible so that it becomes a standard. Second, we transpose this workflow into the Scala language within the Spark/Hadoop frameworks so that it can be scaled out directly on a multi-node cluster.
URL
http://arxiv.org/abs/1902.06659