A two-stage association study identifies methyl-CpG-binding domain protein 2 gene polymorphisms as candidates for breast cancer susceptibility

Journal: Eur J Hum Genet | Pages: 682-689 | Date: June 2012 | Authors: Sapkota Y, Robson P, Lai R, Cass CE, Mackey JR, Damaraju S.
Genome-wide association studies for breast cancer have identified over 40 single-nucleotide polymorphisms (SNPs), a subset of which remains statistically significant after genome-wide correction. Improved strategies for mining genome-wide association data have been suggested to address the heritable component of genetic risk in breast cancer. In this study, we attempted a two-stage association design using markers from a genome-wide study (stage 1, Affymetrix Human SNP 6.0 array, cases=302, controls=321). We restricted our analysis to polymorphisms in DNA repair/modification/metabolism pathway genes because of their established role in carcinogenesis in general and their known protein-protein interactions and, by extension, potential epistatic effects. We selected 22 SNPs based on linkage disequilibrium patterns and high statistical significance. Genotyping assays in an independent replication study of 1178 cases and 1314 controls were attempted using the Sequenom iPLEX Gold platform (stage 2). Six SNPs (rs8094493, rs4041245, rs7614, rs13250873, rs1556459 and rs2297381) showed consistent and statistically significant associations with breast cancer risk in both stages, with allelic odds ratios (and P-values) of 0.85 (0.0021), 0.86 (0.0026), 0.86 (0.0041), 1.17 (0.0043), 1.20 (0.0103) and 1.13 (0.0154), respectively, in combined analysis (N=3115). Of these, three polymorphisms were located in methyl-CpG-binding domain protein 2 gene regions and were in strong linkage disequilibrium. The remaining three SNPs were in proximity to RAD21 homolog (S. pombe), O-6-methylguanine-DNA methyltransferase and RNA polymerase II-associated protein 1. The identified markers may be relevant to breast cancer susceptibility in populations if these findings are confirmed in independent cohorts.
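As a rough illustration of the allelic odds ratios reported in this abstract, the sketch below computes an odds ratio and a Wald 95% confidence interval from a 2x2 table of allele counts. The function name and the allele counts are hypothetical and are not taken from the study; the abstract does not state how its estimates or P-values were derived.

```python
import math


def allelic_odds_ratio(case_minor, case_major, ctrl_minor, ctrl_major):
    """Return (odds ratio, lower, upper) with a Wald 95% confidence interval.

    Inputs are allele counts (two alleles per genotyped subject).
    """
    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(
        1 / case_minor + 1 / case_major + 1 / ctrl_minor + 1 / ctrl_major
    )
    z = 1.96  # normal quantile for a two-sided 95% interval
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts, for illustration only.
or_, lo, hi = allelic_odds_ratio(300, 700, 350, 650)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

An odds ratio below 1 with an interval that excludes 1, as in this toy table, would suggest that the minor allele is less common among cases than among controls.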

Cohorts and consortia conference: a summary report (Banff, Canada, June 17-19, 2009)

Journal: Cancer Causes Control | Pages: 463-468 | Date: March 2011 | Authors: Boffetta P, Colditz GA, Potter JD, Kolonel L, Robson PJ, Malekzadeh R, Seminara D, Goode EL, Yoo KY, Demers P, Gallagher R, Prentice R, Yasui Y, O’Doherty K, Petersen GM, Ulrich CM, Csizmadi I, Amankwah EK, Brockton NT, Kopciuk K, McGregor SE, Kelemen LE.
Epidemiologic studies have adapted to the genomics era by forming large international consortia to overcome issues of large data volume and small sample size. Whereas both cohort and well-conducted case-control studies can inform disease risk from genetic susceptibility, cohort studies offer the additional advantages of assessing lifestyle and environmental exposure-disease time sequences often over a life course. Consortium involvement poses several logistical and ethical issues to investigators, some of which are unique to cohort studies, including the challenge to harmonize prospectively collected lifestyle and environmental exposures validly across individual studies. An open forum to discuss the opportunities and challenges of large-scale cohorts and their consortia was held in June 2009 in Banff, Canada, and is summarized in this report. http://www.ncbi.nlm.nih.gov/pubmed/21203821

Exploring statistical approaches to diminish subjectivity of cluster analysis to derive dietary patterns: The Tomorrow Project.

Journal: Am J Epidemiol | Pages: 956-967 | Date: April 2011 | Authors: Lo Siou G, Yasui Y, Csizmadi I, McGregor SE, Robson PJ.
Dietary patterns derived by cluster analysis are commonly reported with little information describing how decisions are made at each step of the analytical process. Using food frequency questionnaire data obtained in 2001-2007 on Albertan men (n = 6,445) and women (n = 10,299) aged 35-69 years, the authors explored the use of statistical approaches to diminish the subjectivity inherent in cluster analysis. Reproducibility of cluster solutions, defined as agreement between 2 cluster assignments, by 3 clustering methods (Ward’s minimum variance, flexible beta, K means) was evaluated. Ratios of between- versus within-cluster variances were examined, and health-related variables across clusters in the final solution were described. K means produced cluster solutions with the highest reproducibility. For men, 4 clusters were chosen on the basis of ratios of between- versus within-cluster variances, but for women, 3 clusters were chosen on the basis of interpretability of cluster labels and descriptive statistics. Compared with those in other clusters, greater proportions of men and women in the “healthy” clusters reported normal body mass index, smaller waist circumference, and lower energy intakes. The authors’ approach appeared helpful when choosing the clustering method for both sexes and the optimal number of clusters for men, but additional analyses are required to understand why it performed differently for women. http://www.ncbi.nlm.nih.gov/pubmed/21421742
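The two quantities named in this abstract, a between- versus within-cluster variance ratio and agreement between 2 cluster assignments, can be sketched as follows. This is a minimal illustration under stated assumptions: the ratio is computed on sums of squares, and agreement is measured with the Rand index as a stand-in, because the abstract does not say which agreement statistic the authors used. The function names and toy data are hypothetical.

```python
from itertools import combinations


def variance_ratio(points, labels):
    """Between-cluster sum of squares divided by within-cluster sum of squares."""
    dims = len(points[0])
    grand = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    between = 0.0
    within = 0.0
    for members in clusters.values():
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        between += len(members) * sum((c - g) ** 2 for c, g in zip(centroid, grand))
        within += sum((x - c) ** 2 for p in members for x, c in zip(p, centroid))
    return between / within


def rand_index(labels_a, labels_b):
    """Share of subject pairs on which two cluster assignments agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical two-dimensional data with two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
good = [0, 0, 1, 1]
poor = [0, 1, 0, 1]
print(variance_ratio(points, good), variance_ratio(points, poor))
print(rand_index(good, [1, 1, 0, 0]))  # label names differ, grouping is identical
```

The Rand index ignores how clusters are numbered, which matters when comparing solutions from different runs or methods.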

Hours spent and energy expended in physical activity domains: results from the Tomorrow Project cohort in Alberta, Canada

Journal: Int J Behav Nutr Phys Act | Date: October 2011 | Authors: Csizmadi I, Lo Siou G, Friedenreich CM, Owen N, Robson PJ.
BACKGROUND: Knowledge of adult activity patterns across domains of physical activity is essential for the planning of population-based strategies that will increase overall energy expenditure and reduce the risk of obesity and related chronic diseases. We describe domain-specific hours of activity and energy expended among participants in a prospective cohort in Alberta, Canada. METHODS: The Past Year Total Physical Activity Questionnaire was completed by 15,591 Tomorrow Project® participants between 2001 and 2005, detailing physical activity type, duration, frequency and intensity. Domain-specific hours of activity and activity-related energy expenditure, expressed as a percent of total energy expenditure (TEE) (Mean (SD); Median (IQR)), are reported across inactive (<1.4), low active (1.4 to 1.59), active (1.6 to 1.89) and very active (≥ 1.9) Physical Activity Level (PAL = TEE:REE) categories. RESULTS: In very active women and amongst all men except those classified as inactive, activity-related energy expenditure comprised primarily occupational activity. Amongst inactive men and women in active, low active and inactive groups, activity-related energy expenditure from household activity was comparable to, or exceeded, that for occupational activity. Leisure-time activity-related energy expenditure decreased with decreasing PAL categories; however, even amongst the most active men and women it accounted for less than 10 percent of TEE. When stratified by employment status, leisure-time activity-related energy expenditure was greatest for retired men [mean (SD): 10.8 (8.5) percent of TEE], compared with those who were fully employed, employed part-time or not employed. Transportation-related activity was negligible across all categories of PAL and employment status. CONCLUSIONS: For the inactive portion of this population, active non-leisure activities, specifically in the transportation and occupational domains, need to be considered for inclusion in daily routines as a means of increasing population-wide activity levels. Environmental and policy changes to promote active transport and workplace initiatives could increase overall daily energy expenditure through reducing prolonged sitting time. http://www.ncbi.nlm.nih.gov/pubmed/21985559
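The PAL categories in this abstract can be sketched as a small classifier. The cut-points (1.4, 1.6, 1.9) follow the category bounds given in the abstract; treating them as continuous thresholds, so that a value such as 1.595 falls in "low active", is an assumption, as are the function name and the example energy values.

```python
def pal_category(tee_kcal, ree_kcal):
    """Return (PAL, category), where PAL = TEE / REE.

    Thresholds are treated as continuous cut-points (an assumption).
    """
    pal = tee_kcal / ree_kcal
    if pal < 1.4:
        return pal, "inactive"
    if pal < 1.6:
        return pal, "low active"
    if pal < 1.9:
        return pal, "active"
    return pal, "very active"


# Hypothetical daily energy values in kcal.
for tee, ree in [(2000, 1600), (2250, 1500), (2400, 1500), (3000, 1500)]:
    pal, label = pal_category(tee, ree)
    print(f"TEE={tee}, REE={ree}: PAL={pal:.2f} ({label})")
```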

Potential novel candidate polymorphisms identified in genome-wide association study for breast cancer susceptibility.

Journal: Hum Genet | Pages: 529-537 | Date: October 2011 | Authors: Sehrawat B, Sridharan M, Ghosh S, Robson P, Cass CE, Mackey JR, Greiner R, Damaraju S.
Previous genome-wide association studies (GWAS) have shown several risk alleles to be associated with breast cancer. However, the variants identified so far contribute to only a small proportion of disease risk. The objective of our GWAS was to identify additional novel breast cancer susceptibility variants and to replicate these findings in an independent cohort. We performed a two-stage association study in a cohort of 3,064 women from Alberta, Canada. In Stage I, we interrogated 906,600 single nucleotide polymorphisms (SNPs) on Affymetrix SNP 6.0 arrays using 348 breast cancer cases and 348 controls. We used single-locus association tests to determine statistical significance for the observed differences in allele frequencies between cases and controls. In Stage II, we attempted to replicate 35 significant markers identified in Stage I in an independent study of 1,153 cases and 1,215 controls. Genotyping of Stage II samples was done using the Sequenom Mass-ARRAY iPlex platform. Six loci from four different gene regions (chromosomes 4, 5, 16 and 19) showed statistically significant differences between cases and controls in both Stage I and Stage II testing, and also in joint analysis. The identified variants were from the EDNRA, ROPN1L, C16orf61 and ZNF577 gene regions. The presented joint analyses from the two-stage study design were not significant after genome-wide correction. The SNPs identified in this study may serve as potential candidate loci for breast cancer risk in a further replication study (Stage III) in the Alberta population or in independent validation in Caucasian cohorts elsewhere. http://www.ncbi.nlm.nih.gov/pubmed/21424380
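A single-locus test of allele frequency differences between cases and controls can be sketched as a 1-degree-of-freedom chi-square test on a 2x2 allele-count table. This is one common form of such a test, not necessarily the one the authors used. The Bonferroni threshold shown is likewise an assumed example of a genome-wide correction, and the allele counts are hypothetical.

```python
import math


def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 df) and P-value for a 2x2 allele-count table."""
    n = case_a + case_b + ctrl_a + ctrl_b
    numerator = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
    denominator = (
        (case_a + case_b) * (ctrl_a + ctrl_b) * (case_a + ctrl_a) * (case_b + ctrl_b)
    )
    chi2 = numerator / denominator
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 348 cases and 348 controls give 696 alleles per group.
chi2, p_value = allelic_chi_square(300, 396, 250, 446)

# Assumed Bonferroni threshold for the 906,600 SNPs tested in Stage I.
bonferroni_threshold = 0.05 / 906600
print(f"chi2 = {chi2:.2f}, P = {p_value:.2e}, threshold = {bonferroni_threshold:.2e}")
```

A threshold this strict helps explain why modest effects in a few hundred subjects rarely survive genome-wide correction, which is the motivation for a replication stage.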

Quality, quantity and harmony: the DataSHaPER approach to integrating data across bioclinical studies.

Journal: Int J Epidemiol | Pages: 1383-1393 | Date: September 2012 | Authors: Fortier I, Burton PR, Robson PJ, Ferretti V, Little J, L’Heureux F, Deschênes M, Knoppers BM, Doiron D, Keers JC, Linksted P, Harris JR, Lachance G, Boileau C, Pedersen NL, Hamilton CM, Hveem K, Borugian MJ, Gallagher RP, McLaughlin J, Parker L, Potter JD, Gallacher J, Kaaks R, Liu B, Sprosen T, Vilain A, Atkinson SA, Rengifo A, Morton R, Metspalu A, Wichmann HE, Tremblay M, Chisholm RL, Garcia-Montero A, Hillege H, Litton JE, Palmer LJ, Perola M, Wolffenbuttel BH, Peltonen L, Hudson TJ.
BACKGROUND: Vast sample sizes are often essential in the quest to disentangle the complex interplay of the genetic, lifestyle, environmental and social factors that determine the aetiology and progression of chronic diseases. The pooling of information between studies is therefore of central importance to contemporary bioscience. However, there are many technical, ethico-legal and scientific challenges to be overcome if an effective, valid, pooled analysis is to be achieved. Perhaps most critically, any data that are to be analysed in this way must be adequately ‘harmonized’. This implies that the collection and recording of information and data must be done in a manner that is sufficiently similar in the different studies to allow valid synthesis to take place. METHODS: This conceptual article describes the origins, purpose and scientific foundations of the DataSHaPER (DataSchema and Harmonization Platform for Epidemiological Research; http://www.datashaper.org), which has been created by a multidisciplinary consortium of experts that was pulled together and coordinated by three international organizations: P³G (Public Population Project in Genomics), PHOEBE (Promoting Harmonization of Epidemiological Biobanks in Europe) and CPT (Canadian Partnership for Tomorrow Project). RESULTS: The DataSHaPER provides a flexible, structured approach to the harmonization and pooling of information between studies. Its two primary components, the ‘DataSchema’ and ‘Harmonization Platforms’, together support the preparation of effective data-collection protocols and provide a central reference to facilitate harmonization. The DataSHaPER supports both ‘prospective’ and ‘retrospective’ harmonization. CONCLUSIONS: It is hoped that this article will encourage readers to investigate the project further: the more the research groups and studies are actively involved, the more effective the DataSHaPER programme will ultimately be. http://www.ncbi.nlm.nih.gov/pubmed/20813861

The Canadian Partnership for Tomorrow Project: building a pan-Canadian research platform for disease prevention.

Journal: CMAJ | Pages: 1197-1201 | Date: August 2010 | Authors: Borugian MJ, Robson P, Fortier I, Parker L, McLaughlin J, Knoppers BM, Bédard K, Gallagher RP, Sinclair S, Ferretti V, Whelan H, Hoskin D, Potter JD.
As the proportion of the population over age 65 increases in Western countries, the burden of cancer [1] and other chronic diseases is also increasing. If advances in preventing these diseases are to be realized, better information is needed about their causes and the antecedents of the causes. For example, although it is known that many sporadic cancers are caused by a combination of lifestyle factors, exposure to environmental carcinogens and individual genetic makeup [2,3], detailed knowledge about the interplay among these factors is lacking. Much of our current knowledge about the causes of cancer and most relatively rare chronic diseases has come from retrospective case–control studies, in which the characteristics of patients (cases) are compared with those of age- and sex-matched people who do not have the disease (controls). This design has strengths but also a number of weaknesses, including potential recall bias and selection bias [4] (Table 1). To address some of these weaknesses, in particular recall bias and the temporal relation between risk factors and outcomes, prospective cohorts are helpful because participants are enrolled before the onset of disease. In studies with a prospective cohort design, large numbers of participants, who generally have not had cancer or any other significant diagnosis, are recruited and followed over a long time, periodically providing updated health and lifestyle information and biologic samples. Layers of data and samples accumulate over time, allowing an exploration of why cancer develops in some people within the cohort but not others [6]. The disadvantages of such a design (Table 1) are cost and time, as it may be a decade or more before major results are obtained. Fortunately, many shorter-term results are also available, such as information on screening attendance and information on the frequency of major risk factors and health states, as well as environmental and individual determinants of these risk factors, all of which are useful for planning various health services. Furthermore, because many diseases can be studied simultaneously, the cost over time per health outcome studied is substantially lower than the cost of case–control studies for a comparable number of participants. http://www.ncbi.nlm.nih.gov/pubmed/20421354

Are physical activity levels linked to nutrient adequacy? Implications for cancer risk

Journal: Nutr Cancer | Volume: 66 (2) | Pages: 214-224 | Date: 2014 | Authors: Csizmadi I, Kelemen LE, Speidel T, Yuan Y, Dale LC, Friedenreich CM, Robson PJ.
Cancer prevention guidelines recommend a healthy body mass index, physical activity, and nutrient intake from food rather than supplements. Sedentary individuals may restrict energy intake to prevent weight gain and in so doing may compromise nutritional intake. We conducted a cross-sectional analysis to determine if adequacy of micronutrients is linked to physical activity levels (PALs) in healthy-weight adults. Tomorrow Project participants in Alberta, Canada (n = 5333), completed past-year diet and physical activity questionnaires. The percent meeting Dietary Reference Intakes (DRIs) was reported across low and high PAL groups, and the relation between PAL and percent achieved DRI was determined using multiple linear regression analyses. Overall, <50% of healthy-weight participants met DRIs for folate, calcium, and vitamin D. Percent achieved DRI increased linearly with increasing PAL in both genders (P < 0.01). A hypothetical increase in PAL from 1.4 to 1.9 was associated with a DRI that was 8%-13% higher for folate and vitamin C (men) and 5%-15% higher for calcium and iron (women). Healthy-weight adults at higher PALs appear more likely to meet DRIs for potential cancer-preventing nutrients. The benefits of higher PALs may extend beyond the usual benefits attributed to physical activity to include having a more favorable impact on nutrient adequacy. http://www.ncbi.nlm.nih.gov/pubmed/24564401

Assessing SNP-SNP interactions among DNA repair, modification and metabolism related pathway genes in breast cancer susceptibility

Journal: PLoS One | Date: June 2013 | Authors: Sapkota Y, Mackey JR, Lai R, Franco-Villalobos C, Lupichuk S, Robson PJ, Kopciuk K, Cass CE, Yasui Y, Damaraju S.
Genome-wide association studies (GWASs) have identified low-penetrance common variants (i.e., single nucleotide polymorphisms, SNPs) associated with breast cancer susceptibility. Although GWASs are primarily focused on single-locus effects, gene-gene interactions (i.e., epistasis) are also assumed to contribute to the genetic risks for complex diseases including breast cancer. While it has been hypothesized that moderately ranked (P value based) weak single-locus effects in GWASs could potentially harbor valuable information for evaluating epistasis, we lack systematic efforts to investigate SNPs showing consistent associations with weak statistical significance across independent discovery and replication stages. The objectives of this study were i) to select SNPs showing single-locus effects with weak statistical significance for breast cancer in a GWAS and/or candidate-gene studies; ii) to replicate these SNPs in an independent set of breast cancer cases and controls; and iii) to explore their potential SNP-SNP interactions contributing to breast cancer susceptibility. A total of 17 SNPs related to DNA repair, modification and metabolism pathway genes were selected since these pathways offer a priori knowledge for potential epistatic interactions and an overall role in breast carcinogenesis. The study design included predominantly Caucasian women (2,795 cases and 4,505 controls) from Alberta, Canada. We observed two two-way SNP-SNP interactions (APEX1-rs1130409 and RPAP1-rs2297381; MLH1-rs1799977 and MDM2-rs769412) in logistic regression that conferred elevated risks for breast cancer (P(interaction) http://www.ncbi.nlm.nih.gov/pubmed/23755158

Breast cancer prediction using genome wide single nucleotide polymorphism data

Journal: BMC Bioinformatics | Date: 2013 | Authors: Mohsen Hajiloo, Babak Damavandi, Metanat HooshSadat, Farzad Sangi, John R Mackey, Carol E Cass, Russell Greiner and Sambasivarao Damaraju.
BACKGROUND: This paper introduces and applies a genome wide predictive study to learn a model that predicts whether a new subject will develop breast cancer or not, based on her SNP profile. RESULTS: We first genotyped 696 female subjects (348 breast cancer cases and 348 apparently healthy controls), predominantly of Caucasian origin from Alberta, Canada, using Affymetrix Human SNP 6.0 arrays. Then, we applied the EIGENSTRAT population stratification correction method to remove 73 subjects not belonging to the Caucasian population. Then, we filtered out any SNP that had any missing calls, whose genotype frequency deviated from Hardy-Weinberg equilibrium, or whose minor allele frequency was less than 5%. Finally, we applied a combination of the MeanDiff feature selection method and the KNN learning method to this filtered dataset to produce a breast cancer prediction model. The LOOCV accuracy of this classifier is 59.55%. Random permutation tests show that this result is significantly better than the baseline accuracy of 51.52%. Sensitivity analysis shows that the classifier is fairly robust to the number of MeanDiff-selected SNPs. External validation on the CGEMS breast cancer dataset, the only other publicly available breast cancer dataset, shows that this combination of MeanDiff and KNN leads to a LOOCV accuracy of 60.25%, which is significantly better than its baseline of 50.06%. We then considered a dozen different combinations of feature selection and learning method, but found that none of these combinations produces a better predictive model than our model. We also considered various biological feature selection methods, such as selecting SNPs reported in recent genome wide association studies to be associated with breast cancer, selecting SNPs in genes associated with KEGG cancer pathways, or selecting SNPs associated with breast cancer in the F-SNP database to produce predictive models, but again found that none of these models achieved accuracy better than baseline. CONCLUSIONS: We anticipate producing more accurate breast cancer prediction models by recruiting more study subjects, providing more accurate labelling of phenotypes (to accommodate the heterogeneity of breast cancer), measuring other genomic alterations such as point mutations and copy number variations, and incorporating non-genetic information about subjects such as environmental and lifestyle factors. http://www.biomedcentral.com/1471-2105/14/S13/S3
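The pipeline described in this abstract (a minor allele frequency filter, MeanDiff feature selection, a KNN classifier and leave-one-out cross-validation) can be sketched on toy data as follows. Several points are assumptions rather than details from the paper: genotypes are coded 0/1/2, MeanDiff is interpreted as ranking SNPs by the absolute difference in mean genotype between cases and controls, distance is squared Euclidean, and feature selection is repeated inside each fold. All function names and data are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """MAF for one SNP, with genotypes coded as 0, 1 or 2 copies of an allele."""
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1 - freq)


def mean_diff_select(X, y, n_snps):
    """Indices of the SNPs with the largest |mean(cases) - mean(controls)|."""
    cases = [row for row, label in zip(X, y) if label == 1]
    controls = [row for row, label in zip(X, y) if label == 0]

    def score(j):
        case_mean = sum(row[j] for row in cases) / len(cases)
        control_mean = sum(row[j] for row in controls) / len(controls)
        return abs(case_mean - control_mean)

    return sorted(range(len(X[0])), key=score, reverse=True)[:n_snps]


def knn_predict(train_X, train_y, query, snps, k):
    """Majority vote among the k nearest training subjects (ties predict control)."""
    neighbours = sorted(
        (sum((row[j] - query[j]) ** 2 for j in snps), label)
        for row, label in zip(train_X, train_y)
    )
    votes = sum(label for _, label in neighbours[:k])
    return 1 if votes * 2 > k else 0


def loocv_accuracy(X, y, n_snps, k):
    """Leave-one-out accuracy, selecting SNPs on the training fold only."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        snps = mean_diff_select(train_X, train_y, n_snps)
        correct += knn_predict(train_X, train_y, X[i], snps, k) == y[i]
    return correct / len(X)


# Hypothetical toy data: 6 subjects x 3 SNPs; only the first SNP is informative.
X = [[2, 0, 1], [2, 1, 0], [2, 2, 1], [0, 0, 1], [0, 1, 0], [0, 2, 1]]
y = [1, 1, 1, 0, 0, 0]  # 1 = case, 0 = control
kept = [j for j in range(3) if minor_allele_frequency([row[j] for row in X]) >= 0.05]
print(kept, loocv_accuracy(X, y, n_snps=1, k=1))
```

Selecting SNPs inside each fold keeps the held-out subject from influencing which features are chosen, which would otherwise inflate the accuracy estimate.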