Virtual StatsPD@Waite meeting

Every month, the professional development meetings of statisticians and data scientists at Waite, known as StatsPD@Waite, bring together specialists in various aspects of agricultural data science from Waite, Roseworthy and Adelaide.

The next StatsPD@Waite meeting will take place on 16 June, when Julian Taylor will present his latest work on high dimensional whole genome analysis using a one-step penalized linear mixed model.

Email Beata Sznajder for details of the Zoom meeting.

 

High dimensional whole genome analysis using a one-step penalized linear mixed model

Julian Taylor & Suman Rakshit

In comparative plant breeding experiments, the advent of cost-efficient high-throughput DNA sequencing has allowed the detailed genetic dissection of industry-driven phenotypic traits through various whole genome analysis approaches. These approaches usually involve a highly structured linear mixed model (LMM) that can simultaneously account for genetic and non-genetic sources of variation. In simpler piecemeal approaches, each genetic marker is analysed in a separate LMM and significant genomic regions are determined through thresholding techniques. More efficient approaches are possible and usually incorporate a whole genome marker-based additive relationship in the LMM. However, only a few of these approaches provide an algorithm to accurately identify significant genetic markers linked to the trait of interest.

In this research, we focus on an efficient one-step LMM approach that applies a non-concave penalty to each of the marker effects. With an appropriate approximation, the penalty can be subsumed in the LMM estimation process through a modified relationship matrix. As the algorithm progresses, the variable selection nature of the penalization simultaneously highlights the effects of important markers and shrinks the effects of unimportant markers, which are removed through numerical thresholding. The method has been implemented using the flexible linear mixed modelling package ASReml-R V4 and will be illustrated with a simulated high-dimensional quantitative trait loci (QTL) analysis problem.
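To give a rough feel for how a penalty can be folded into a modified relationship matrix, here is a minimal Python sketch. It is not the authors' algorithm and does not use ASReml-R. It assumes a log penalty on each marker effect, a local quadratic approximation that turns the penalty into per-marker weights, a fixed tuning value lam in place of estimated variance components, and a marker-only model with no other random or fixed terms. The function name, parameters and simulated data are all hypothetical.

```python
import numpy as np

def penalised_marker_effects(y, M, lam=5.0, a=0.01, threshold=1e-6,
                             tol=1e-8, max_iter=100):
    """Iteratively reweighted BLUP of marker effects (illustrative sketch).

    Assumed penalty: lam * log(|u_j| + a) on each marker effect u_j.
    Its local quadratic approximation gives marker weights
    d_j = |u_j| * (|u_j| + a), which enter through K = M diag(d) M'.
    """
    n, p = M.shape
    Mc = M - M.mean(axis=0)      # centre markers so the intercept is the mean
    mu = y.mean()
    r = y - mu
    d = np.ones(p)               # equal weights: first pass is a ridge/GBLUP fit
    u = np.zeros(p)
    for _ in range(max_iter):
        K = (Mc * d) @ Mc.T      # weighted marker relationship matrix (n x n)
        alpha = np.linalg.solve(K + lam * np.eye(n), r)
        u_new = d * (Mc.T @ alpha)               # predicted marker effects
        u_new[np.abs(u_new) < threshold] = 0.0   # numerical thresholding
        converged = np.max(np.abs(u_new - u)) < tol
        u = u_new
        if converged:
            break
        d = np.abs(u) * (np.abs(u) + a)          # update weights from effects
    return mu, u

# Simulated example (hypothetical data): 200 lines, 300 markers, 3 true QTL.
rng = np.random.default_rng(1)
n, p = 200, 300
M = rng.choice([-1.0, 1.0], size=(n, p))
true_u = np.zeros(p)
true_u[[10, 150, 290]] = [2.0, -2.0, 2.0]
y = 5.0 + M @ true_u + rng.normal(scale=0.5, size=n)

mu_hat, u_hat = penalised_marker_effects(y, M)
print("markers retained:", np.flatnonzero(u_hat))
```

Once a marker's effect is thresholded to zero its weight is zero as well, so it drops out of the relationship matrix in all later iterations, while markers with large effects are shrunk less and less.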
