The greatest challenge of the proposed work will be to create an overall design that truly simplifies and improves the processes of pipeline development and data analysis, and does so in the face of a very wide variety of available approaches and tools.
An additional challenge arises from the very large size of population genomic data sets. The public instance of PPP that we will host at Temple University will have enough resources to handle fairly large VCF files and to run most summary statistic-based methods. However, we cannot host many computationally intensive model-based methods that require very long runs or large numbers of processors. Investigators will still be able to run small versions of these analyses, and we will provide tools for sampling data and for doing trial runs of model-based analyses on small data sets.