Developmental studies often assess how treating the pregnant mother affects her offspring. Using multiparous species such as rats and mice in these studies creates a particular set of design and analysis problems, which arise for two reasons. First, the availability of many offspring per litter tempts the experimenter to inflate sample size by treating scores from several pups per litter as independent observations. Second, large litters make it seldom practical to measure exposure effects in every offspring of an exposed dam. Such studies therefore commonly involve two-stage sampling: drawing a random sample of dams for treatment, then drawing a second sample of pups from each dam for neurobehavioral measurement. In this article, such sampling was modeled by two different simulations. The first, a standard Monte Carlo approach, drew litter means and within-litter (pup-to-pup) deviations from normal distributions. The second sampled without replacement from actual body weights of all pups in a series of 39 untreated rat litters. These mutually supportive approaches demonstrate that litter effects are generally large and statistically meaningful, even across as few as three litters. Consequently, statistical significance tests are sensitive to litter effects. Inflating sample size by treating as few as two pups per litter as independent observations can nearly triple the actual Type I error rate relative to the nominal 0.05 alpha level. Furthermore, two-stage sampling increases the within-treatment error term and correspondingly reduces statistical power relative to one-stage sampling.
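The Type I error inflation described above can be illustrated with a small Monte Carlo sketch loosely in the spirit of the first simulation. It is not the authors' code: the design (10 dams per group, 2 pups per dam), the equal between-litter and within-litter standard deviations, and the helper names `t_critical`, `pooled_t`, and `simulate` are hypothetical choices made for illustration. Both groups are drawn from the same population, so every rejection is a false positive; the sketch compares a pup-level t test, which wrongly treats pups as independent, with a litter-level t test on litter means. Only the Python standard library is used, so the two-sided Student t critical value is computed numerically rather than taken from a statistics package.

```python
import math
import random
import statistics


def t_critical(df: int, alpha: float = 0.05, steps: int = 2000) -> float:
    """Two-sided Student t critical value, found by bisection on a
    Simpson's-rule integral of the t density (stdlib only)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def density(x: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    def central_mass(c: float) -> float:
        # P(-c < T < c) = 2 * integral of the density from 0 to c (Simpson's rule)
        h = c / steps
        total = density(0.0) + density(c)
        for i in range(1, steps):
            total += (4 if i % 2 else 2) * density(i * h)
        return 2 * total * h / 3

    lo, hi = 0.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_mass(mid) < 1 - alpha:
            lo = mid  # too little central mass: critical value lies above mid
        else:
            hi = mid
    return (lo + hi) / 2


def pooled_t(a: list[float], b: list[float]) -> float:
    """Pooled-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def simulate(n_dams: int = 10, pups_per_dam: int = 2, sd_litter: float = 1.0,
             sd_pup: float = 1.0, reps: int = 2000, seed: int = 1) -> tuple[float, float]:
    """Return (pup-level, litter-level) false-positive rates under a true null.
    All parameter defaults are hypothetical illustration values."""
    rng = random.Random(seed)
    crit_pup = t_critical(2 * n_dams * pups_per_dam - 2)  # pups treated as independent
    crit_litter = t_critical(2 * n_dams - 2)               # one score (mean) per litter
    false_pup = false_litter = 0
    for _ in range(reps):
        groups = []
        for _group in range(2):  # both groups come from the same population
            litters = []
            for _dam in range(n_dams):
                litter_mean = rng.gauss(0.0, sd_litter)  # between-litter effect
                litters.append([rng.gauss(litter_mean, sd_pup)  # within-litter deviation
                                for _ in range(pups_per_dam)])
            groups.append(litters)
        pups = [[x for litter in g for x in litter] for g in groups]
        means = [[statistics.fmean(litter) for litter in g] for g in groups]
        false_pup += int(abs(pooled_t(*pups)) > crit_pup)
        false_litter += int(abs(pooled_t(*means)) > crit_litter)
    return false_pup / reps, false_litter / reps


if __name__ == "__main__":
    pup_rate, litter_rate = simulate()
    print(f"pup-level false-positive rate:    {pup_rate:.3f}")
    print(f"litter-level false-positive rate: {litter_rate:.3f}")
```

Under these assumed settings the litter-level rate should stay near the nominal 0.05, while the pup-level rate should be noticeably higher. How large the inflation becomes depends on the ratio of between-litter to within-litter variance and on the number of pups taken per litter, so the exact figure reported in the abstract should not be expected from these particular parameters.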