PT - JOURNAL ARTICLE
AU - Akdemir, Deniz
TI - STPGA: Selection of training populations with a genetic algorithm
AID - 10.1101/111989
DP - 2017 Jan 01
TA - bioRxiv
PG - 111989
4099 - http://biorxiv.org/content/early/2017/02/27/111989.short
4100 - http://biorxiv.org/content/early/2017/02/27/111989.full
AB - Optimal subset selection is an important task with many application areas, and numerous algorithms have been designed for it. STPGA contains a special genetic algorithm for generic optimal subset selection problems. The algorithm is supplemented with a tabu memory property (which keeps track of previously tried solutions and their fitness for a number of iterations) and with a look-ahead property (a regression of the fitness of the solutions on their coding, which is used to form the ideal estimated solution). I initially developed the programs for the specific problem of selecting training populations for genomic prediction or association problems; I therefore discuss the theory behind optimal design of experiments to explain the default optimization criteria in STPGA, and illustrate the use of the programs in this endeavor. I have also picked a few other areas of application: supervised and unsupervised variable selection based on kernel alignment, supervised variable selection with design criteria, identification of influential observations in regression, solving mixed integer quadratic optimization problems, and balancing gains and inbreeding in a breeding population. Some of these illustrations pertain to new statistical approaches.