Virtual StatsPD@Waite meeting

Every month, the professional development meetings of statisticians and data scientists at Waite, known as StatsPD@Waite, bring together specialists in various aspects of data sciences in agriculture from Waite, Roseworthy and Adelaide.

Please join us for the next Virtual StatsPD@Waite seminar, where Michael Mumford, from Agri-Food and Data Science, Queensland Department of Agriculture and Fisheries, will present on incorporating environmental covariates in linear mixed models to account for genotype × environment × management interactions.

Please note that the StatsPD@Waite meetings are recorded. If you have a question for the speaker but would rather not be recorded, please send me your question via chat during the meeting and it will be asked on your behalf.

Please email Beata Sznajder for details of the Zoom meeting.

Improving genomic prediction performance and efficiency using machine learning

Mario Fruzangohar - Biometry Hub, University of Adelaide

Private and public plant breeding companies are keen to undertake genomic prediction with unprecedentedly large numbers of lines and environments. However, legacy modelling methods such as linear mixed models (LMMs) are computationally hampered by the need to handle large, dense matrices of genomic information during the prediction optimisation process. One potential way to circumvent this is to use highly parallelised machine learning (ML) methods. In this talk I will discuss a collaboration with the Australian Institute for Machine Learning (AIML) and Australian Grains Technologies (AGT) that involved developing multiple machine learning architectures to improve the genomic prediction accuracy of yield in various large wheat breeding data sets. For single-environment genomic prediction involving up to 10K lines and 18K markers, we developed a multi-layer perceptron (MLP) neural network that used the learned weights between connected layers to determine the important non-additive interactions between markers. For multi-environment genomic prediction involving 20K+ lines and 18K markers, we extended the ML architecture to include interactions of weather, pathogen, and soil covariates with the complete set of genetic markers. In both cases we compared the new ML genomic prediction approaches with legacy LMM methods and found that the ML approaches achieved higher prediction accuracy.

Tagged in Stats PD, Waite, Statistics, Biometry Hub