  • Supplementary Information:

    Frequent mutations of genes encoding ubiquitin-mediated proteolysis

    pathway components in clear cell renal cell carcinoma

    Guangwu Guo1,10, Yaoting Gui2,10, Shengjie Gao1,10, Aifa Tang2,3,10, Xueda Hu1,10, Yi

    Huang2,3,10, Wenlong Jia1, Zesong Li2,3, Minghui He1, Liang Sun2, Pengfei Song1,

    Xiaojuan Sun3, Xiaokun Zhao4, Sangming Yang1, Chaozhao Liang5, Shengqing Wan1,

    Fangjian Zhou6, Chao Chen1, Jialou Zhu1,7, Xianxin Li2, Minghan Jian1, Liang Zhou2,

    Rui Ye1, Peide Huang1, Jing Chen2, Xiao Liu1, Yong Wang2, Jing Zou1, Zhimao Jiang2,

    RenHua Wu1, Song Wu2, Fan Fan1, Zhongfu Zhang2, Lin Liu1, Ruilin Yang2, Xingwang

    Liu1, Haibo Wu1, Weihua Yin2, Xia Zhao1, Yuchen Liu2, Huanhuan Peng1, Binghua Jiang2,

    Qingxin Feng2, Cailing Li2, Jun Xie2, Jingxiao Lu2, Karsten Kristiansen1,8, Yingrui Li1,

    Xiuqing Zhang1, Songgang Li1, Jian Wang1, Huanming Yang1, Zhiming Cai2,3 & Jun


    1Shenzhen Key Laboratory of Transomics Biotechnologies, BGI-Shenzhen, Shenzhen 518083, China.

    2Guangdong and Shenzhen Key Laboratory of Male Reproductive Medicine and Genetics, Institute of Urology, Peking University Shenzhen Hospital, Shenzhen PKU-HKUST Medical Center, Shenzhen 518036, China.

    3Shenzhen Second People's Hospital, the First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China.

    4Department of Urology, the Second Xiangya Hospital of Central-Southern University, Changsha 410011, China.

    5Department of Urology, the First Affiliated Hospital of Anhui Medical University, Hefei 230022, China.

    6Department of Urology, Sun Yat-Sen University Cancer Center, Guangzhou 510060, China.

    7College of Life Science, Wuhan University, Wuhan, 430072, China.

    8Department of Biology, University of Copenhagen, DK-1165 Copenhagen, Denmark.

    9The Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, 2200 Copenhagen, Denmark.

    Nature Genetics: doi:10.1038/ng.1014

  • 10These authors contributed equally to this work.

    Correspondence should be addressed to Ju.W., Z.C. and H.Y.

  • Supplementary Methods

    Sample description and preparation

    Tumors with matched normal controls (morphologically adjacent normal kidney tissues

    cut at least 5 cm away from the boundaries of the primary tumors) were obtained from

    patients with clear cell renal cell carcinoma (ccRCC) newly diagnosed at member

    institutions of the Urinogenital Cancer Genomics Consortium (UCGC) in China. Signed

    written consent was obtained from each patient before recruitment into the study

    according to the regulations of the institutional ethics review boards. Detailed clinical

    information on the patients is summarized in Supplementary Table 1. All the specimens

    were snap-frozen in liquid nitrogen upon collection and immediately stored at -80℃ for

    further study. The hematoxylin-eosin (HE)-stained sections prepared using the cancerous

    or apparently normal tissues were microscopically evaluated by two independent

    pathologists. Typically, the tumor cells with clear transparent cytoplasm and well-defined

    cell membrane were interspersed within the highly vascularized stroma. The normal

    control samples were characterized by the presence of normal renal tubules and

    glomeruli under each microscopic field, without any notable traces of tumor cell

    contamination. In this study, only ccRCCs with malignant cell purities over 85% were

    selected for DNA extraction and subsequent sequencing.

    Genomic DNA extraction and whole-exome sequencing

    In the Discovery Screen, genomic DNAs of tumor and matched normal samples from 10

    ccRCC patients were isolated using QIAamp DNA Mini Kits (QIAGEN, Hilden,

    Germany) according to the protocol provided by the manufacturer. Genomic DNAs were

    then fragmented and hybridized to NimbleGen 2.1M Human Exome Arrays (Roche

    NimbleGen, Inc, USA), which were capable of enriching the exonic sequences of more

    than 18,000 protein-coding genes deposited in the highly curated database of Consensus

    Coding Sequence Region.

    In brief, the extracted genomic DNA was randomly sonicated to a smear of 300 bp to

    800 bp and polished with T4 DNA polymerase and T4 polynucleotide kinase.

    NimbleGen linkers were added to the polished DNA fragments by T4 DNA ligase. The

    ligated products were then hybridized to the capture array according to the manufacturer’s

    protocol, and the enriched DNA fragments were eluted and amplified by ligation-mediated

    PCR through the linkers added to exonic DNA fragments. Before the second run of

    library construction, qPCR reactions were performed to estimate the enrichment rate of

    exonic sequences. A total of four targeted exons were selected for this evaluation. The

    minimum requirement of 80-fold enrichment was achieved for all libraries prepared for

    the next procedure. The enriched exonic DNA fragments were then randomly blunt-end ligated

    with DNA ligase into fragments ranging from 2 kb to 5 kb in size. The resulting DNA products

    were sheared to 200 bp on average and were subjected to standard Illumina Genome

    Analyzer (GA) library preparation according to Illumina’s protocol. The exome-enriched

    shotgun libraries were sequenced on the Illumina GA II platform, generating single-end reads

    with an average length of 80 bp. Image analysis and base calling were performed

    by the Genome Analyzer Pipeline version 1.3 with default parameters.
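The 80-fold enrichment requirement described above can be illustrated with a short calculation. The text does not state how the enrichment rate was derived from the qPCR data, so the sketch below assumes the common delta-Ct approach (equal template input before and after capture, and a constant amplification efficiency of 2 per cycle). The function names, the Ct values, and the rule that every one of the four test exons must pass are all hypothetical choices for illustration only.

```python
MIN_FOLD_ENRICHMENT = 80.0  # minimum requirement stated in the text


def fold_enrichment(ct_pre, ct_post, efficiency=2.0):
    """Estimate fold enrichment of one exon from qPCR Ct values.

    ct_pre  : Ct measured on the library before capture
    ct_post : Ct measured on the library after capture
    Assumption (not from the source): equal template input and a constant
    per-cycle amplification efficiency, so fold = efficiency ** delta_Ct.
    """
    return efficiency ** (ct_pre - ct_post)


def library_passes(ct_pairs, min_fold=MIN_FOLD_ENRICHMENT):
    """Return True if every tested exon reaches the minimum fold enrichment.

    Requiring all exons to pass is an assumption; the source only says the
    requirement was achieved for all libraries.
    """
    return all(fold_enrichment(pre, post) >= min_fold for pre, post in ct_pairs)


# Hypothetical Ct values (before, after capture) for four test exons.
example_cts = [(25.0, 18.0), (26.0, 19.0), (24.5, 17.5), (25.0, 18.5)]
print([round(fold_enrichment(pre, post), 1) for pre, post in example_cts])
print(library_passes(example_cts))
```

A seven-cycle shift corresponds to 2^7 = 128-fold enrichment under these assumptions, while a six-cycle shift (64-fold) would fall below the threshold.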

    Illumina-based exon resequencing of selected genes

    In the Prevalence Screen, we determined the mutation frequencies of selected genes

    (Supplementary Table 4) in 88 additional ccRCC patients by Illumina-based exonic

    resequencing. These selected genes included: 1) 234 genes that had at least one non-silent

    somatic mutation in the discovery stage; 2) 413 genes that have been causally implicated

    in human cancers (Cancer Gene Census); 3)

    367 genes previously reported to harbor mutations in ccRCC (COSMIC database1);

    and 4) 113 genes in the ubiquitin-mediated proteolysis pathway (135 genes have been

    annotated in this pathway, of which 22 showed somatic mutations in the above three

    categories). All the exonic regions of these 1127 genes were submitted to NimbleGen for

    the manufacturing of the targeted capture arrays. Genomic DNA from tumors and

    matched normal samples was then sonicated and hybridized to the arrays followed by the

    standard Illumina-based resequencing procedures as described above.
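As a quick consistency check on the gene counts given above, the four categories can be tallied directly. The sketch below only restates the numbers from the text; the dictionary keys are descriptive labels chosen for illustration. That the four counts sum exactly to the stated panel size suggests they are reported as non-overlapping, although the text does not say so explicitly.

```python
# Gene counts per selection category, as given in the text.
pathway_annotated = 135      # genes annotated in the ubiquitin-mediated proteolysis pathway
pathway_already_listed = 22  # of those, genes already covered by categories 1 to 3

panel_categories = {
    "non-silent somatic mutation in discovery stage": 234,
    "Cancer Gene Census": 413,
    "previously reported in ccRCC (COSMIC)": 367,
    "ubiquitin-mediated proteolysis pathway": pathway_annotated - pathway_already_listed,
}

panel_size = sum(panel_categories.values())
print(panel_categories["ubiquitin-mediated proteolysis pathway"])  # 113
print(panel_size)  # 1127
```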

    Read mapping and detection of somatic mutations

    After removing reads containing sequencing adaptors and low-quality reads with more

    than five unknown bases, the high-quality reads were aligned to the NCBI human

    reference genome (hg18) using MAQ2 with the default options. To identify indels, the

    high-quality reads were gap-aligned to the reference sequence using BWA3. Then, we

    performed local realignment of the BWA aligned reads using the Genome Analysis

    Toolkit (GATK)4.
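The read pre-filtering step that precedes alignment can be sketched as follows. This is a minimal illustration and not the pipeline actually used: the text does not say how adaptor contamination was detected, so exact substring matching is assumed here, and the adaptor sequence and reads are hypothetical placeholders.

```python
MAX_UNKNOWN_BASES = 5  # reads with more than five unknown bases are discarded


def is_high_quality(read, adaptors, max_unknown=MAX_UNKNOWN_BASES):
    """Return True if a read passes both pre-alignment filters.

    A read is rejected if it has more than `max_unknown` unknown bases (N)
    or if it contains any adaptor sequence. Exact substring matching for
    adaptors is an assumption made for this sketch.
    """
    seq = read.upper()
    if seq.count("N") > max_unknown:
        return False
    return not any(adaptor.upper() in seq for adaptor in adaptors)


def filter_reads(reads, adaptors):
    """Keep only the reads that pass the filters, preserving input order."""
    return [read for read in reads if is_high_quality(read, adaptors)]


# Hypothetical adaptor and reads, for illustration only.
example_adaptors = ["TTGACCAGT"]
example_reads = [
    "ACGTNNACGT",         # two unknown bases, no adaptor: kept
    "NNNNNACGT",          # exactly five unknown bases: kept
    "NNNNNNACGT",         # six unknown bases: removed
    "ACGTTTGACCAGTACGT",  # contains the adaptor: removed
]
print(filter_reads(example_reads, example_adaptors))
```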

    The raw lists of potential somatic substitutions were called by VarScan5 (v2.2) based

    on the MAQ alignments. In this process, several heuristic rules were applied: (i) both the

    tumors and matched normal samples should be covered sufficiently (≥ 10×) at the

    genomic position being compared; (ii) the average base quality for a given genomic

    position should be at least 15 in both the tumors and normal samples; (iii) the variants

    should be supported by at least 10% of the total reads in the tumors while no high quality

    variant-supporting reads are allowed in normal controls; (iv) the variants should be

    supported by at least five reads in the tumors. Using the same criteria, the preliminary

    list of somatic indels was called by GATK based on the local realignment results.

    After these two steps, germline variants could be effectively removed.
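The four heuristic rules above can be expressed as a single predicate over per-site read counts. The sketch below is an illustration of the stated thresholds only; it is not VarScan's or GATK's implementation, and the `SiteCounts` structure and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Per-sample summary at one genomic position (hypothetical structure)."""
    depth: int                # total reads covering the position
    mean_base_quality: float  # average base quality at the position
    variant_reads: int        # high-quality reads supporting the variant


def is_candidate_somatic(tumor, normal, min_depth=10, min_base_quality=15.0,
                         min_tumor_fraction=0.10, min_tumor_variant_reads=5):
    """Apply rules (i) to (iv) from the text to one tumor/normal site pair."""
    # (i) both samples must be covered at least 10x
    if tumor.depth < min_depth or normal.depth < min_depth:
        return False
    # (ii) average base quality of at least 15 in both samples
    if (tumor.mean_base_quality < min_base_quality
            or normal.mean_base_quality < min_base_quality):
        return False
    # (iii) at least 10% of tumor reads support the variant,
    #       and no high-quality supporting read in the normal control
    if normal.variant_reads > 0:
        return False
    if tumor.variant_reads < min_tumor_fraction * tumor.depth:
        return False
    # (iv) at least five supporting reads in the tumor
    return tumor.variant_reads >= min_tumor_variant_reads


# Hypothetical example sites.
normal_site = SiteCounts(depth=35, mean_base_quality=28.0, variant_reads=0)
print(is_candidate_somatic(SiteCounts(40, 30.0, 6), normal_site))   # passes all rules
print(is_candidate_somatic(SiteCounts(40, 30.0, 4), normal_site))   # fails rule (iv)
print(is_candidate_somatic(SiteCounts(100, 30.0, 8), normal_site))  # fails rule (iii)
```

Rules (iii) and (iv) are complementary: the fraction threshold dominates at high coverage, while the absolute read count dominates at low coverage.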

    To further reduce false-positive calls, variants including single nucleotide

    variants (SNVs) and indels were called with the SAMtools software package in the

    tumors. We eliminated all somatic variants that fulfilled any one of the following filtering

    criteria: (i) variants with Phred-like scaled consensus scores or SNP qualities < 20; (ii)

    variants with mapping qualities < 30; (iii) indels represented by only one DNA strand; (iv)

    substitutions located within 30 bp of predicted indels. To deal with false positives associated

    with pseudogenes or repeat sequences, simulated reads (80 bp in length) containing

    the potential mutations were generated and aligned to the reference genome. For a given

    variant, if more than 1

