Potential novel candidate polymorphisms identified in genome-wide association study for breast cancer susceptibility.

Journal: Hum Genet. | Pages: 529-537 | Date: October 2011 | Authors: Sehrawat B, Sridharan M, Ghosh S, Robson P, Cass CE, Mackey JR, Greiner R, Damaraju S. Previous genome-wide association studies (GWAS) have shown several risk alleles to be associated with breast cancer. However, the variants identified so far contribute to only a small proportion of disease risk. The objective of our GWAS was to identify additional novel breast cancer susceptibility variants and to replicate these findings in an independent cohort. We performed a two-stage association study in a cohort of 3,064 women from Alberta, Canada. In Stage I, we interrogated 906,600 single nucleotide polymorphisms (SNPs) on Affymetrix SNP 6.0 arrays using 348 breast cancer cases and 348 controls. We used single-locus association tests to determine the statistical significance of observed differences in allele frequencies between cases and controls. In Stage II, we attempted to replicate 35 significant markers identified in Stage I in an independent set of 1,153 cases and 1,215 controls. Stage II samples were genotyped on the Sequenom MassARRAY iPLEX platform. Six loci from four gene regions (chromosomes 4, 5, 16 and 19) showed statistically significant differences between cases and controls in both Stage I and Stage II testing, and also in joint analysis. The identified variants were from the EDNRA, ROPN1L, C16orf61 and ZNF577 gene regions. The joint analyses from the two-stage design were not significant after genome-wide correction. The SNPs identified in this study may serve as candidate loci for breast cancer risk in a further Stage III replication study in the Alberta population or in independent validation in Caucasian cohorts elsewhere. http://www.ncbi.nlm.nih.gov/pubmed/21424380
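
The Stage I screen compared allele frequencies between cases and controls one SNP at a time. The abstract does not name the exact test, so the following is only a minimal sketch assuming a Pearson chi-square test (1 degree of freedom) on a 2×2 table of allele counts; the function names and the example counts are hypothetical and are not data from the study.

```python
import math


def allele_counts(genotypes):
    """Return (minor, major) allele counts from genotypes coded 0/1/2,
    where the code is the number of copies of the minor allele."""
    minor = sum(genotypes)
    major = 2 * len(genotypes) - minor
    return minor, major


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square test on a 2x2 table of allele counts.

    case_counts, control_counts: (minor allele count, major allele count).
    Returns (chi2, p_value) for 1 degree of freedom.
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: 348 cases and 348 controls give 696 alleles per group.
    chi2, p = allelic_chi_square((250, 446), (200, 496))
    print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Repeating a test like this across roughly 906,600 SNPs is what makes a stringent genome-wide significance threshold necessary.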

Quality, quantity and harmony: the DataSHaPER approach to integrating data across bioclinical studies.

Journal: Int J Epidemiol | Pages: 1383-1393 | Date: September 2010 | Authors: Fortier I, Burton PR, Robson PJ, Ferretti V, Little J, L’Heureux F, Deschênes M, Knoppers BM, Doiron D, Keers JC, Linksted P, Harris JR, Lachance G, Boileau C, Pedersen NL, Hamilton CM, Hveem K, Borugian MJ, Gallagher RP, McLaughlin J, Parker L, Potter JD, Gallacher J, Kaaks R, Liu B, Sprosen T, Vilain A, Atkinson SA, Rengifo A, Morton R, Metspalu A, Wichmann HE, Tremblay M, Chisholm RL, Garcia-Montero A, Hillege H, Litton JE, Palmer LJ, Perola M, Wolffenbuttel BH, Peltonen L, Hudson TJ. BACKGROUND: Vast sample sizes are often essential in the quest to disentangle the complex interplay of the genetic, lifestyle, environmental and social factors that determine the aetiology and progression of chronic diseases. The pooling of information between studies is therefore of central importance to contemporary bioscience. However, there are many technical, ethico-legal and scientific challenges to be overcome if an effective, valid, pooled analysis is to be achieved. Perhaps most critically, any data that are to be analysed in this way must be adequately ‘harmonized’. This implies that the collection and recording of information and data must be done in a manner that is sufficiently similar in the different studies to allow valid synthesis to take place. METHODS: This conceptual article describes the origins, purpose and scientific foundations of the DataSHaPER (DataSchema and Harmonization Platform for Epidemiological Research; http://www.datashaper.org), which has been created by a multidisciplinary consortium of experts brought together and coordinated by three international organizations: P³G (Public Population Project in Genomics), PHOEBE (Promoting Harmonization of Epidemiological Biobanks in Europe) and CPT (Canadian Partnership for Tomorrow Project).
RESULTS: The DataSHaPER provides a flexible, structured approach to the harmonization and pooling of information between studies. Its two primary components, the ‘DataSchema’ and ‘Harmonization Platforms’, together support the preparation of effective data-collection protocols and provide a central reference to facilitate harmonization. The DataSHaPER supports both ‘prospective’ and ‘retrospective’ harmonization. CONCLUSIONS: It is hoped that this article will encourage readers to investigate the project further: the more research groups and studies are actively involved, the more effective the DataSHaPER programme will ultimately be. http://www.ncbi.nlm.nih.gov/pubmed/20813861

The Canadian Partnership for Tomorrow Project: building a pan-Canadian research platform for disease prevention.

Journal: CMAJ | Pages: 1197-1201 | Date: August 2010 | Authors: Borugian MJ, Robson P, Fortier I, Parker L, McLaughlin J, Knoppers BM, Bédard K, Gallagher RP, Sinclair S, Ferretti V, Whelan H, Hoskin D, Potter JD. As the proportion of the population over age 65 increases in Western countries, the burden of cancer[1] and other chronic diseases is also increasing. If advances in preventing these diseases are to be realized, better information is needed about their causes and the antecedents of those causes. For example, although it is known that many sporadic cancers are caused by a combination of lifestyle factors, exposure to environmental carcinogens and individual genetic makeup,[2,3] detailed knowledge about the interplay among these factors is lacking. Much of our current knowledge about the causes of cancer and of most relatively rare chronic diseases has come from retrospective case–control studies, in which the characteristics of patients (cases) are compared with those of age- and sex-matched people who do not have the disease (controls). This design has strengths but also a number of weaknesses, including potential recall bias and selection bias[4] (Table 1). To address some of these weaknesses, in particular recall bias and uncertainty about the temporal relation between risk factors and outcomes, prospective cohorts are helpful because participants are enrolled before the onset of disease. In studies with a prospective cohort design, large numbers of participants, who generally have not had cancer or any other significant diagnosis, are recruited and followed over a long time, periodically providing updated health and lifestyle information and biologic samples. Layers of data and samples accumulate over time, allowing an exploration of why cancer develops in some people within the cohort but not in others.[6]
The disadvantages of such a design (Table 1) are cost and time, as it may be a decade or more before major results are obtained. Fortunately, many shorter-term results are also available, such as information on screening attendance, the frequency of major risk factors and health states, and the environmental and individual determinants of these risk factors, all of which are useful for planning various health services. Furthermore, because many diseases can be studied simultaneously, the cost over time per health outcome studied is substantially lower than that of case–control studies for a comparable number of participants. http://www.ncbi.nlm.nih.gov/pubmed/20421354

Are physical activity levels linked to nutrient adequacy? Implications for cancer risk

Journal: Nutr Cancer | Volume: 66 (2) | Pages: 214-224 | Date: 2014 | Authors: Csizmadi I, Kelemen LE, Speidel T, Yuan Y, Dale LC, Friedenreich CM, Robson PJ. Cancer prevention guidelines recommend a healthy body mass index, physical activity, and nutrient intake from food rather than supplements. Sedentary individuals may restrict energy intake to prevent weight gain and in so doing may compromise nutritional intake. We conducted a cross-sectional analysis to determine whether adequacy of micronutrients is linked to physical activity levels (PALs) in healthy-weight adults. Tomorrow Project participants in Alberta, Canada (n = 5333), completed past-year diet and physical activity questionnaires. The percentage meeting Dietary Reference Intakes (DRIs) was reported across low and high PAL groups, and the relation between PAL and percent achieved DRI was determined using multiple linear regression analyses. Overall, <50% of healthy-weight participants met DRIs for folate, calcium, and vitamin D. Percent achieved DRI increased linearly with increasing PAL in both genders (P < 0.01). A hypothetical increase in PAL from 1.4 to 1.9 was associated with percent achieved DRI values 8%-13% higher for folate and vitamin C (men) and 5%-15% higher for calcium and iron (women). Healthy-weight adults at higher PALs appear more likely to meet DRIs for potential cancer-preventing nutrients. The benefits of higher PALs may extend beyond those usually attributed to physical activity to include a more favorable impact on nutrient adequacy. http://www.ncbi.nlm.nih.gov/pubmed/24564401
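
The 1.4 to 1.9 contrast reported above can be read as a regression slope multiplied by the change in PAL. The study used multiple linear regression with covariates; as a simplified sketch only, the following fits a single-predictor ordinary least-squares line to hypothetical numbers (not study data) and converts the slope into the implied difference in percent achieved DRI. The function names are hypothetical.

```python
def ols_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predicted_change(slope, pal_from=1.4, pal_to=1.9):
    """Change in the outcome implied by moving PAL from pal_from to pal_to."""
    return slope * (pal_to - pal_from)


if __name__ == "__main__":
    # Hypothetical PAL values and percent achieved DRI for one nutrient.
    pal = [1.4, 1.6, 1.8, 2.0]
    pct_dri = [60.0, 64.0, 68.0, 72.0]
    slope, intercept = ols_fit(pal, pct_dri)
    print(f"slope = {slope:.1f} points per PAL unit")
    print(f"PAL 1.4 -> 1.9: {predicted_change(slope):+.1f} points")
```

In a multiple regression the slope for PAL would additionally be adjusted for the other covariates in the model, but the conversion from slope to a 0.5-unit PAL contrast works the same way.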

Assessing SNP-SNP interactions among DNA repair, modification and metabolism related pathway genes in breast cancer susceptibility

Journal: PLoS One | Date: June 2013 | Authors: Sapkota Y, Mackey JR, Lai R, Franco-Villalobos C, Lupichuk S, Robson PJ, Kopciuk K, Cass CE, Yasui Y, Damaraju S. Genome-wide association studies (GWASs) have identified low-penetrance common variants (i.e., single nucleotide polymorphisms, SNPs) associated with breast cancer susceptibility. Although GWASs focus primarily on single-locus effects, gene-gene interactions (i.e., epistasis) are also assumed to contribute to the genetic risk of complex diseases, including breast cancer. While it has been hypothesized that moderately ranked (P value based) weak single-locus effects in GWASs could harbor valuable information for evaluating epistasis, systematic efforts to investigate SNPs showing consistent but weakly significant associations across independent discovery and replication stages are lacking. The objectives of this study were i) to select SNPs showing single-locus effects with weak statistical significance for breast cancer in a GWAS and/or candidate-gene studies; ii) to replicate these SNPs in an independent set of breast cancer cases and controls; and iii) to explore their potential SNP-SNP interactions contributing to breast cancer susceptibility. A total of 17 SNPs related to DNA repair, modification and metabolism pathway genes were selected, since these pathways offer a priori knowledge of potential epistatic interactions and an overall role in breast carcinogenesis. The study included predominantly Caucasian women (2,795 cases and 4,505 controls) from Alberta, Canada. We observed two two-way SNP-SNP interactions (APEX1-rs1130409 and RPAP1-rs2297381; MLH1-rs1799977 and MDM2-rs769412) in logistic regression that conferred elevated risks for breast cancer (P(interaction) … http://www.ncbi.nlm.nih.gov/pubmed/23755158

Breast cancer prediction using genome wide single nucleotide polymorphism data

Journal: BMC Bioinformatics | Date: 2013 | Authors: Mohsen Hajiloo, Babak Damavandi, Metanat HooshSadat, Farzad Sangi, John R Mackey, Carol E Cass, Russell Greiner and Sambasivarao Damaraju. Background: This paper introduces and applies a genome-wide predictive study to learn a model that predicts whether a new subject will develop breast cancer, based on her SNP profile. Results: We first genotyped 696 female subjects (348 breast cancer cases and 348 apparently healthy controls), predominantly of Caucasian origin, from Alberta, Canada, using Affymetrix Human SNP 6.0 arrays. We then applied the EIGENSTRAT population stratification correction method to remove 73 subjects not belonging to the Caucasian population. Next, we filtered out any SNP that had missing calls, whose genotype frequencies deviated from Hardy-Weinberg equilibrium, or whose minor allele frequency was less than 5%. Finally, we applied a combination of the MeanDiff feature selection method and the k-nearest neighbours (KNN) learning method to this filtered dataset to produce a breast cancer prediction model. The leave-one-out cross-validation (LOOCV) accuracy of this classifier is 59.55%. Random permutation tests show that this result is significantly better than the baseline accuracy of 51.52%. Sensitivity analysis shows that the classifier is fairly robust to the number of MeanDiff-selected SNPs. External validation on the CGEMS breast cancer dataset, the only other publicly available breast cancer dataset, shows that this combination of MeanDiff and KNN achieves a LOOCV accuracy of 60.25%, which is significantly better than its baseline of 50.06%. We then considered a dozen different combinations of feature selection and learning methods, but found that none of them produced a better predictive model than ours.
We also considered biologically motivated feature selection methods, such as selecting SNPs reported in recent genome-wide association studies to be associated with breast cancer, SNPs in genes associated with KEGG cancer pathways, or SNPs associated with breast cancer in the F-SNP database, to produce predictive models, but again found that none of these models achieved accuracy better than baseline. Conclusions: We anticipate producing more accurate breast cancer prediction models by recruiting more study subjects, providing more accurate labelling of phenotypes (to accommodate the heterogeneity of breast cancer), measuring other genomic alterations such as point mutations and copy number variations, and incorporating non-genetic information about subjects such as environmental and lifestyle factors. http://www.biomedcentral.com/1471-2105/14/S13/S3
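
The prediction pipeline described above pairs a filter-style SNP selector with a nearest-neighbour classifier evaluated by LOOCV. The following is a minimal sketch under stated assumptions: MeanDiff is assumed to rank SNPs by the absolute difference in mean genotype (0/1/2) between cases and controls, and KNN is assumed to use Manhattan distance with an odd k; the abstract specifies neither. The toy genotype matrix and all function names are hypothetical. SNP selection is repeated inside every fold so the held-out subject cannot influence which SNPs are chosen.

```python
from collections import Counter

# Hypothetical toy data: rows are subjects, columns are SNPs coded 0/1/2.
TOY_X = [
    [2, 0, 1], [2, 1, 0], [2, 0, 0], [2, 1, 1], [2, 2, 1], [2, 0, 2],  # cases
    [0, 0, 1], [0, 1, 0], [0, 2, 2], [0, 1, 1], [0, 0, 0], [0, 2, 1],  # controls
]
TOY_Y = [1] * 6 + [0] * 6  # 1 = case, 0 = control


def mean_diff_select(X, y, n_features):
    """Return indices of the n_features SNPs with the largest
    |mean genotype in cases - mean genotype in controls| (assumed MeanDiff)."""
    cases = [row for row, label in zip(X, y) if label == 1]
    controls = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        mean_case = sum(row[j] for row in cases) / len(cases)
        mean_ctrl = sum(row[j] for row in controls) / len(controls)
        scores.append((abs(mean_case - mean_ctrl), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:n_features]]


def knn_predict(train_X, train_y, x, k):
    """Majority label among the k training rows closest to x (Manhattan distance)."""
    neighbours = sorted(
        (sum(abs(a - b) for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def loocv_accuracy(X, y, n_features, k):
    """Leave-one-out accuracy; SNP selection is redone inside every fold."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        snps = mean_diff_select(train_X, train_y, n_features)
        projected = [[row[j] for j in snps] for row in train_X]
        held_out = [X[i][j] for j in snps]
        correct += knn_predict(projected, train_y, held_out, k) == y[i]
    return correct / len(X)


if __name__ == "__main__":
    print(loocv_accuracy(TOY_X, TOY_Y, n_features=1, k=3))
```

Selecting SNPs once on the full dataset and only then running LOOCV would leak information from each held-out subject into the model and inflate the reported accuracy, which is why the selection step sits inside the loop here.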

Cognitive testing of the STAR-Q: insights in activity and sedentary time reporting.

Journal: J Phys Act Health | Pages: 379-389 | Date: March 2013 | Authors: Neilson HK, Ullman R, Robson PJ, Friedenreich CM, Csizmadi I. PURPOSE: The qualitative attributes and quantitative measurement properties of physical activity questionnaires are equally important considerations in questionnaire appraisal, yet fundamental aspects such as question comprehension are rarely described in the literature. Here we describe the use of cognitive interviewing to evaluate the Sedentary Time and Activity Reporting Questionnaire (STAR-Q), a self-administered questionnaire designed to assess overall activity energy expenditure and sedentary behavior. METHODS: Several rounds of one-on-one interviews were conducted by an interviewer trained in qualitative research methods. Interviewees included a convenience sample of volunteers and participants in the Tomorrow Project, a large cohort study in Alberta, Canada. Following each round of interviews, the STAR-Q was revised and cognitively tested until saturation was achieved. RESULTS: Six rounds of cognitive interviewing in 22 adults (5 males, 17 females) aged 23-74 years led to revisions involving 1) use of recall aids; 2) ambiguous terms; and 3) specific tasks, such as averaging across multiple routines, reporting time asleep and self-care, and reporting by activity domain. CONCLUSIONS: Cognitive interviewing is a critical step in questionnaire development. Knowledge gained in this study led to revisions that improved respondent acceptability and comprehension of the STAR-Q and will complement ongoing validity testing. http://www.ncbi.nlm.nih.gov/pubmed/22820674

Conditions associated with circulating tumor-associated folate receptor 1 protein in healthy men and women

Journal: PLoS One | Date: May 2014 | Authors: Kelemen LE, Brenton JD, Parkinson C, Whitaker HC, Piskorz AM, Csizmadi I, Robson PJ. Serum concentrations of the tumor-associated folate receptor 1 (FOLR1) protein may be a marker for early cancer detection, yet FOLR1 has also been detected in the serum of cancer-free women. We investigated the conditions associated with circulating FOLR1 protein in healthy individuals and sought to clarify the range of normal serum values. METHODS: Sera of cancer-free men and women (N = 60) enrolled in a population-based cohort study in Alberta, Canada were analyzed for FOLR1 protein using an electrochemical luminescence immunoassay. Dietary, lifestyle, medical and reproductive history information was collected by questionnaires. Differences in serum FOLR1 concentrations between groups were assessed by non-parametric tests, and predictors of serum FOLR1 concentrations were estimated using multivariable linear regression. RESULTS: Median serum FOLR1 concentration was higher in women (491 pg/ml, range = 327-693 pg/ml) than in men (404 pg/ml, range = 340-682 pg/ml), P = 0.001. FOLR1 concentration was also positively associated with vitamin A intake (P = 0.02), and showed positive trends with age and with oral contraceptive hormone use among women and an inverse trend with body mass index. All variables examined together explained almost half of the variation in serum FOLR1 (model R² = 0.44, P = 0.04); however, gender (P = 0.003) and vitamin A intake (P = 0.03) alone together explained 20% (P = 0.001) of serum FOLR1 variation. No other predictor was significant at P < 0.05.
CONCLUSIONS: The positive association between serum FOLR1 concentration and female gender, independent of an age effect, suggests caution about proposals to exploit serum FOLR1 for early cancer detection without a better understanding of the biological underpinnings of these observations. Serum FOLR1 concentrations may be influenced by the steroid retinoic acid (vitamin A) but do not appear to be associated with folate nutritional status. These findings require confirmation in larger independent studies. http://www.ncbi.nlm.nih.gov/pubmed/24810481

ETHNOPRED: a novel machine learning method for accurate continental and sub-continental ancestry identification and population stratification correction.

Journal: BMC Bioinformatics | Date: 2013 | Authors: Mohsen Hajiloo, Yadav Sapkota, John R Mackey, Paula Robson, Russell Greiner and Sambasivarao Damaraju. Background: Population stratification is a systematic difference in allele frequencies between subpopulations. It can lead to spurious association findings in the case–control genome-wide association studies (GWASs) used to identify single nucleotide polymorphisms (SNPs) associated with disease-linked phenotypes. Methods such as self-declared ancestry, ancestry informative markers, genomic control, structured association, and principal component analysis are used to assess and correct population stratification, but each has limitations. We provide an alternative technique to address population stratification. Results: We propose a novel machine learning method, ETHNOPRED, which uses the genotype and ethnicity data from the HapMap project to learn ensembles of disjoint decision trees capable of accurately predicting an individual’s continental and sub-continental ancestry. To predict an individual’s continental ancestry, ETHNOPRED produced an ensemble of 3 decision trees involving a total of 10 SNPs, with 10-fold cross-validation accuracy of 100% on the HapMap II dataset. We extended this model to 29 disjoint decision trees over 149 SNPs and showed that this ensemble has an accuracy of ≥ 99.9%, even if some of those 149 SNP values were missing. On an independent dataset, predominantly of Caucasian origin, our continental classifier showed 96.8% accuracy and reduced the genomic control inflation factor λ from 1.22 to 1.11. We next used the HapMap III dataset to learn classifiers to distinguish European subpopulations (North-Western vs. Southern), East Asian subpopulations (Chinese vs. Japanese), African subpopulations (Eastern vs. Western), North American subpopulations (European vs. Chinese vs. African vs. Mexican vs. Indian), and Kenyan subpopulations (Luhya vs. Maasai). In these cases, ETHNOPRED produced ensembles of 3, 39, 21, 11, and 25 disjoint decision trees, involving 31, 502, 526, 242 and 271 SNPs respectively, with 10-fold cross-validation accuracies of 86.5% ± 2.4%, 95.6% ± 3.9%, 95.6% ± 2.1%, 98.3% ± 2.0%, and 95.9% ± 1.5%. However, ETHNOPRED was unable to produce a classifier that could accurately distinguish Chinese in Beijing from Chinese in Denver.
Conclusions: ETHNOPRED is a novel technique for producing classifiers that can identify an individual’s continental and sub-continental heritage based on a small number of SNPs. We show that its learned classifiers are simple, cost-efficient, accurate, transparent, flexible, fast, applicable to large-scale GWASs, and robust to missing values. http://www.biomedcentral.com/1471-2105/14/61
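
The effect of ETHNOPRED on stratification is summarized above by the genomic control inflation factor λ. As a sketch of how λ is conventionally estimated (the median-based genomic control of Devlin and Roeder, which is an evaluation metric here and not part of ETHNOPRED itself), λ is the median of the per-SNP 1-df association chi-square statistics divided by the median of the 1-df chi-square distribution, about 0.455. The function names and example statistics are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 df equals (z_0.75)^2, about 0.4549.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control_lambda(chi2_stats):
    """Inflation factor: observed median chi-square / expected null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def gc_adjust(chi2_stats, lam):
    """Deflate statistics by lambda; by a common convention, lambda < 1 is not applied."""
    lam = max(lam, 1.0)
    return [s / lam for s in chi2_stats]


if __name__ == "__main__":
    # Hypothetical statistics whose median is inflated by a factor of 1.22.
    stats = [0.05, 0.3, CHI2_1DF_MEDIAN * 1.22, 1.2, 4.0]
    lam = genomic_control_lambda(stats)
    print(f"lambda = {lam:.2f}")
    print([round(s, 3) for s in gc_adjust(stats, lam)])
```

A λ close to 1 indicates little genome-wide inflation of test statistics, so a drop from 1.22 to 1.11 is consistent with reduced, though not eliminated, stratification.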

Identification of a breast cancer susceptibility locus at 4q31.22 using a genome-wide association study paradigm

Journal: PLoS One | Date: May 2013 | Authors: Sapkota Y, Yasui Y, Lai R, Sridharan M, Robson PJ, Cass CE, Mackey JR, Damaraju S. More than 40 single nucleotide polymorphisms (SNPs) for breast cancer susceptibility have been identified by genome-wide association studies (GWASs). However, additional SNPs likely contribute to breast cancer susceptibility and overall genetic risk, prompting this search for additional variants. Six putative breast cancer susceptibility SNPs identified in a two-stage GWAS that we reported earlier were replicated in a follow-up Stage 3 study using an independent set of breast cancer cases and controls from Canada, with a cumulative sample size of 7,219 subjects across all three stages. The study design also encompassed 11 variants from GWASs previously reported by various consortia between 2007 and 2009 to (i) enable comparisons of effect sizes, and (ii) identify putative prognostic variants across studies. All SNP associations reported with breast cancer were also adjusted for body mass index (BMI). We report a strong association with 4q31.22-rs1429142 (combined per-allele odds ratio and 95% confidence interval = 1.28 [1.17-1.41]; combined P = 1.5 × 10⁻⁷) when adjusted for BMI. Ten of the 11 breast cancer susceptibility loci reported by consortia also showed associations in our predominantly Caucasian study population, and these associations were independent of BMI; four FGFR2 SNPs and TNRC9-rs3803662 were among the most notable. Since the original report by Garcia-Closas et al. in 2008, this is the second study to confirm the association of 8q24.21-rs13281615 with breast cancer outcomes. http://www.ncbi.nlm.nih.gov/pubmed/23717390
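
The headline result above is a per-allele odds ratio with a 95% confidence interval. The study's estimate came from BMI-adjusted models, which this does not reproduce; as a simpler, unadjusted stand-in, the following computes an allelic odds ratio with a Woolf (log-based) confidence interval from a 2×2 table of allele counts. The function name and the counts are hypothetical.

```python
import math


def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other, z=1.959964):
    """Unadjusted allelic odds ratio with a Woolf (log) confidence interval.

    Arguments are allele counts; z = 1.959964 gives a 95% interval.
    All four counts must be positive.
    """
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    se_log = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log)
    upper = math.exp(log_or + z * se_log)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical risk-allele / other-allele counts in cases and controls.
    or_, lo, hi = allelic_odds_ratio(900, 1900, 800, 2100)
    print(f"OR = {or_:.2f} [{lo:.2f}-{hi:.2f}]")
```

Adjusting for BMI, as the study did, would instead require a logistic regression model with genotype and BMI as predictors, where the per-allele odds ratio is the exponentiated genotype coefficient.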