HumMeth27QCReport

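HumMeth27QCReport is a quality-control tool for Illumina Infinium BeadChip methylation arrays. It is an R package developed jointly by the CRG Genotyping Unit and the CRG Bioinformatics Core.

CRAN download page for the HumMeth27QCReport package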

Installing HumMeth27QCReport

Package download page: HumMeth27QCReport

Several dependencies must be installed before the package itself:

# Check whether the BiocManager package is installed
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

# Install the required R packages
install.packages(c("amap", "tcltk2"))
BiocManager::install(c("IlluminaHumanMethylation27k.db", "FDb.InfiniumMethylation.hg18", "FDb.InfiniumMethylation.hg19"))

# Install HumMeth27QCReport from the downloaded source archive
install.packages("path/to/HumMeth27QCReport_1.2.15.tar.gz", repos = NULL, type = "source")
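Before running the installation commands, it can help to see which of the dependencies are still missing. The following is a minimal sketch in base R; the helper name `missing_packages` is hypothetical and is not part of HumMeth27QCReport.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Dependencies listed in the installation commands (CRAN and Bioconductor).
deps <- c("amap", "tcltk2",
          "IlluminaHumanMethylation27k.db",
          "FDb.InfiniumMethylation.hg18",
          "FDb.InfiniumMethylation.hg19")

still_needed <- missing_packages(deps)
print(still_needed)
```

An empty result means every listed dependency is already available.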

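Functions and parameters of the HumMeth27QCReport package

ImportData()

ImportData(Dir)

Parameters

Dir  Folder containing the input files; it also serves as the output folder.

Return value

A list of three or four data frames. One PDF file is produced per sample, named after the sample.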
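Since ImportData() reads everything from a single folder, a quick look at the folder contents beforehand can save a failed run. The sketch below uses only base R. The helper `check_input_dir` is hypothetical, and the file names are the example names used on this page; whether ImportData() requires exactly these names is an assumption.

```r
# Hypothetical helper: report which expected input files are present in `dir`.
# The default file names are the example names from this page (an assumption).
check_input_dir <- function(dir,
                            expected = c("SampleTable.txt",
                                         "ControlProbeProfile.txt",
                                         "AvgBeta.txt",
                                         "Discard.txt")) {
  present <- file.exists(file.path(dir, expected))
  setNames(present, expected)
}

# Demonstration in a temporary folder that contains two of the files.
demo_dir <- file.path(tempdir(), "hm27_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("SampleTable.txt", "AvgBeta.txt")))

status <- check_input_dir(demo_dir)
print(status)
```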

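QCCheck()

QCCheck(ImportDataR, pval)

Parameters

ImportDataR  Result of the ImportData() function.
pval  P-value threshold used to select the samples that go on to normalization and downstream analysis.

Return value

Three QC plots:

1. An intensity plot drawn with the "plotSampleIntensities" function of the methylumi package.
2. A histogram of the percentage of undetected CpGs (CpGs with a detection p-value greater than 0.05 or 0.01).
3. A histogram of the average p-value of each sample.

All plots are written to a file named QualityCheck.pdf.

A list of three data frames:

1. A summary of the analysis results.
2. The CpGs with a detection p-value greater than 0.05.
3. The CpGs with a detection p-value greater than 0.01.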
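The quantities behind these plots can be sketched in base R. The example below computes, for a made-up matrix of detection p-values, the percentage of undetected CpGs at both thresholds and the mean p-value per sample. The last step, which keeps samples whose mean p-value is below `pval`, is only one plausible way to apply such a threshold; it is an assumption, not the documented behaviour of QCCheck().

```r
# Toy detection p-value matrix: rows are CpGs, columns are samples.
# The values are made up for illustration.
detection_p <- matrix(
  c(0.001, 0.200, 0.030, 0.000,
    0.060, 0.008, 0.020, 0.000),
  nrow = 4,
  dimnames = list(paste0("cpg", 1:4), c("S1", "S2"))
)

# Percentage of CpGs per sample whose detection p-value exceeds a threshold.
pct_undetected <- function(pmat, threshold) {
  100 * colMeans(pmat > threshold)
}

pct_05 <- pct_undetected(detection_p, 0.05)
pct_01 <- pct_undetected(detection_p, 0.01)

# Mean detection p-value per sample.
mean_p <- colMeans(detection_p)

# Assumed selection rule: keep samples whose mean p-value is below `pval`.
pval <- 0.05
keep <- names(mean_p)[mean_p < pval]
```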

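NormCheck()

NormCheck(ImportDataR, platform, pval, ChrX, ClustMethod, normMethod)

Parameters

ImportDataR  Result of the ImportData() function.
platform  Platform type: "Hum27" (Infinium HumanMethylation27 BeadChip) or "Hum450" (Infinium HumanMethylation450 BeadChip).
pval  P-value threshold.
ChrX  Whether CpGs on chromosome X are included in the analysis. The default is FALSE (excluded); TRUE includes them.
ClustMethod  Distance measure used for clustering: "euclidean", "maximum", "manhattan", "canberra", "binary", "pearson", "correlation", "spearman" or "kendall".
normMethod  Normalization method, "quantile" or "ssn"; see the documentation of lumiMethyN() in the lumi package. The default is "quantile".

Return value

Plots  A PCA plot of the normalized Beta values;

a clustering plot of the normalized Beta values.

All plots are written to a PDF file named ExplorativeAnalysis.pdf.

Data frame  A data.frame of the normalized M-values.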
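To make the relation between Beta values and M-values concrete, the sketch below applies the usual logit transform, M = log2(Beta / (1 - Beta)), to a made-up Beta matrix and then clusters and projects the samples with base R. It performs no normalization and is not the code NormCheck() runs; the function name `beta_to_m` is hypothetical. Base `dist()` covers "euclidean", "maximum", "manhattan", "canberra" and "binary"; the correlation-based measures listed for ClustMethod are not available there.

```r
# Convert Beta values to M-values: M = log2(Beta / (1 - Beta)).
beta_to_m <- function(beta) {
  log2(beta / (1 - beta))
}

# Toy Beta matrix (rows are CpGs, columns are samples); values are made up.
beta <- matrix(
  c(0.03, 0.82, 0.55, 0.94,
    0.04, 0.80, 0.50, 0.95,
    0.60, 0.20, 0.10, 0.40),
  nrow = 4,
  dimnames = list(paste0("cpg", 1:4), c("A_1", "A_2", "B_1"))
)

m_values <- beta_to_m(beta)

# Hierarchical clustering of samples on Euclidean distances between Beta profiles.
sample_dist <- dist(t(beta), method = "euclidean")
tree <- hclust(sample_dist)
groups <- cutree(tree, k = 2)

# PCA of the samples, with CpGs as variables.
pca <- prcomp(t(beta), center = TRUE, scale. = FALSE)
```

In this toy data the two replicates A_1 and A_2 fall into one cluster and B_1 into the other, which is the pattern one hopes to see for replicate samples.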

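HumMeth27QCReport()

HumMeth27QCReport(ImportDataR, platform, pval, ChrX, ClustMethod, quoteOutput, normMethod)

Parameters

ImportDataR  Result of the ImportData() function.
platform  Platform type: "Hum27" (Infinium HumanMethylation27 BeadChip) or "Hum450" (Infinium HumanMethylation450 BeadChip).
pval  P-value threshold.
ChrX  Whether CpGs on chromosome X are included in the analysis. The default is FALSE (excluded); TRUE includes them.
ClustMethod  Distance measure used for clustering: "euclidean", "maximum", "manhattan", "canberra", "binary", "pearson", "correlation", "spearman" or "kendall".
quoteOutput  Whether non-numeric entries in the normalized data are written in quotes. The default is TRUE (quoted); FALSE leaves them unquoted.
normMethod  Normalization method, "quantile" or "ssn"; see the documentation of lumiMethyN() in the lumi package. The default is "quantile".

Return value

The QC plots and the matrix of normalized Beta values.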
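A call to HumMeth27QCReport() might be wrapped as follows. This is only a sketch built from the signature documented here: the wrapper name `run_full_report` and its defaults are hypothetical, and the call itself has not been verified against a particular package version.

```r
# Hypothetical wrapper: import a folder and run the full QC report on it.
# Argument names follow the signature documented on this page.
run_full_report <- function(dir, platform = "Hum27", pval = 0.05) {
  if (!requireNamespace("HumMeth27QCReport", quietly = TRUE)) {
    stop("HumMeth27QCReport is not installed")
  }
  imported <- HumMeth27QCReport::ImportData(dir)
  HumMeth27QCReport::HumMeth27QCReport(
    imported,
    platform = platform,
    pval = pval,
    ChrX = FALSE,
    ClustMethod = "euclidean",
    quoteOutput = FALSE,
    normMethod = "quantile"
  )
}

# Example call on the demo data shipped with the package (not run here):
# run_full_report(system.file("extdata/", package = "HumMeth27QCReport"))
```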

getAssayControls()

getAssayControls(ImportDataR, platform)

Parameters

ImportDataR  Result of the ImportData() function.
platform  Platform type: "Hum27" (Infinium HumanMethylation27 BeadChip) or "Hum450" (Infinium HumanMethylation450 BeadChip).

Return value

Eight QC plots.

getFileSepChar()

getFileSepChar(File)

Parameters

File  Name of a readable text file.

Return value

The field separator character used in the file.
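How a separator can be detected is easy to illustrate, although the approach below is not necessarily the one getFileSepChar() takes. The sketch counts candidate characters in the first line of a file and returns the most frequent one; the function name `guess_sep` and the candidate set are assumptions.

```r
# Hypothetical helper: guess the field separator from the first line of a file
# by counting how often each candidate character occurs.
# If no candidate occurs at all, the first candidate is returned.
guess_sep <- function(file, candidates = c("\t", ",", ";")) {
  first_line <- readLines(file, n = 1)
  chars <- strsplit(first_line, "", fixed = TRUE)[[1]]
  counts <- vapply(candidates, function(ch) sum(chars == ch), integer(1))
  candidates[which.max(counts)]
}

# Demonstration with a small tab-delimited file.
demo_file <- tempfile(fileext = ".txt")
writeLines(c("Index\tTargetID\tProbeID", "1\tNEGATIVE\t50110"), demo_file)
sep <- guess_sep(demo_file)
```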

Preparing the input files

Sample table file example: SampleTable.txt

Index Sample ID Sample Group Sentrix Barcode Sample Section Detected Genes (0.01) Detected Genes (0.05) Signal Average GRN Signal Average RED Signal P05 GRN Signal P05 RED Signal P25 GRN Signal P25 RED Signal P50 GRN Signal P50 RED Signal P75 GRN Signal P75 RED Signal P95 GRN Signal P95 RED Sample_Plate Sample_Well
1 Hela_1 Hela Hela 1 27571 27572 6032.731 3968.032 0 0 343 330 941 617 8394 2075 27229 23051 DemoData A01
2 Hela_2 Hela Hela 2 27573 27574 6732.99 3863.289 0 0 359 327 1002 586 9288 1899 30716 22657 DemoData B01
3 Raji_1 Raji Raji 1 27528 27548 6082.553 3778.055 0 0 614 569 1228 911 7691 2150 27401 21009 DemoData C01
4 Raji_2 Raji Raji 2 27568 27577 6632.926 3756.684 0 0 510 463 1179 750 8460 1944 30422 21667 DemoData D01
5 Jurkat_1 Jurkat Jurkat 1 27534 27545 6828.716 3911.746 0 0 519 458 1227 766 8717 2081 30918 22437 DemoData E01
6 Jurkat_2 Jurkat Jurkat 2 27514 27529 7014.263 3903.508 0 0 449 411 1175 695 9040 2026 31989 22589 DemoData F01
7 A431_1 A431 A431 1 27575 27576 6763.988 3758.975 0 0 426 357 1088 616 10578 2192 29410 20948 DemoData G01
8 A431_2 A431 A431 2 27575 27576 6633.289 3840.406 0 0 409 382 1063 645 10428 2284 28832 21356 DemoData H01
9 K562_1 K562 K562 1 27530 27538 6425.563 3720.365 0 0 400 379 995 639 9058 1775 29100 21597 DemoData A02
10 K562_2 K562 K562 2 27535 27543 6547.497 3736.386 0 0 405 413 1015 675 9282 1819 29531 21463 DemoData B02
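Several headers in the sample table contain spaces and parentheses, which R rewrites by default when reading a file. The sketch below reads two rows of the table, reduced to a few columns, and keeps the headers unchanged. It assumes the file is tab-delimited, which this page does not state.

```r
# Two rows of the sample table, reduced to a few columns and written
# tab-delimited (the delimiter is an assumption).
sample_txt <- paste(
  paste("Index", "Sample ID", "Sample Group", "Detected Genes (0.01)", sep = "\t"),
  paste("1", "Hela_1", "Hela", "27571", sep = "\t"),
  paste("3", "Raji_1", "Raji", "27528", sep = "\t"),
  sep = "\n"
)

# check.names = FALSE keeps the original headers, including spaces.
samples <- read.delim(text = sample_txt, check.names = FALSE,
                      stringsAsFactors = FALSE)

# Detected genes at the 0.01 level, indexed by sample ID.
detected <- setNames(samples[["Detected Genes (0.01)"]], samples[["Sample ID"]])
```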

Control table file example: ControlProbeProfile.txt

Index TargetID ProbeID <Sn>.Signal_Grn <Sn>.Signal_Red <Sn>.Detection Pval ... ... ...

<Sn> = sample name

Index TargetID ProbeID Hela_1.Signal_Grn Hela_1.Signal_Red Hela_1.Detection Pval Hela_2.Signal_Grn Hela_2.Signal_Red Hela_2.Detection Pval
1 BISULFITE CONVERSION 4670278 13997 661 3.68E-38 14496 738 3.68E-38
2 BISULFITE CONVERSION 4670484 506 513 1.98E-05 613 502 3.56E-09
3 BISULFITE CONVERSION 5270706 13007 578 3.68E-38 14583 575 3.68E-38
4 BISULFITE CONVERSION 5290048 346 467 0.012058 337 353 0.050596
5 EXTENSION 360446 1262 48739 3.68E-38 1226 49133 3.68E-38
6 EXTENSION 520537 1588 65535 3.68E-38 1498 65535 3.68E-38
7 EXTENSION 1190050 39316 2572 3.68E-38 46292 2545 3.68E-38
8 EXTENSION 2630184 65500 1656 3.68E-38 65535 1355 3.68E-38
9 HYBRIDIZATION 2450040 2637 371 3.68E-38 2821 321 3.68E-38
10 HYBRIDIZATION 5690072 23628 527 3.68E-38 28406 635 3.68E-38
11 HYBRIDIZATION 5690110 10321 247 3.68E-38 11464 286 3.68E-38
12 NEGATIVE 50110 230 295 0.63193 289 427 0.029181
13 NEGATIVE 360079 258 304 0.501571 135 293 0.820958
14 NEGATIVE 430114 161 353 0.668574 169 245 0.854451
15 NEGATIVE 460494 145 359 0.700552 281 191 0.687725
16 NEGATIVE 540577 163 322 0.7571 121 281 0.879578
17 NEGATIVE 610692 110 356 0.807308 159 360 0.512171
18 NEGATIVE 610706 152 383 0.597531 141 336 0.670246
19 NEGATIVE 670750 184 260 0.856797 132 290 0.835865
20 NEGATIVE 1190458 162 389 0.540998 173 368 0.426892
21 NEGATIVE 1500059 299 434 0.062366 226 440 0.080044
22 NEGATIVE 1500167 182 545 0.069276 287 390 0.065245
23 NEGATIVE 1500398 165 258 0.895271 194 341 0.449981
24 NEGATIVE 1660097 163 336 0.715998 158 241 0.885352
25 NEGATIVE 1770019 264 362 0.283621 250 265 0.527731
26 NEGATIVE 1940364 152 439 0.398556 168 388 0.370417
27 NEGATIVE 1990692 220 594 0.011778 208 407 0.182252
28 NON-POLYMORPHIC 110184 21912 773 3.68E-38 23847 525 3.68E-38
29 NON-POLYMORPHIC 1740025 1242 12619 3.68E-38 1267 11792 3.68E-38
30 NON-POLYMORPHIC 2480348 1690 22906 3.68E-38 2112 24975 3.68E-38
31 NON-POLYMORPHIC 2810035 16531 451 3.68E-38 18677 395 3.68E-38
32 SPECIFICITY 3800086 249 463 0.089121 330 389 0.027287
33 SPECIFICITY 3800154 13051 1019 3.68E-38 12701 937 3.68E-38
34 SPECIFICITY 4610400 356 461 0.010974 299 523 0.001706
35 SPECIFICITY 4610725 1207 23986 3.68E-38 1347 23608 3.68E-38
36 STAINING 4200736 1167 33466 3.68E-38 1294 27325 3.68E-38
37 STAINING 4570020 22107 429 3.68E-38 23868 413 3.68E-38
38 STAINING 5050601 778 732 7.48E-18 856 1077 3.68E-38
39 STAINING 5340168 971 469 1.42E-15 983 527 2.62E-22
40 TARGET REMOVAL 580035 331 317 0.22061 337 402 0.017107
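Because every sample contributes its own `<Sn>.Signal_Grn`, `<Sn>.Signal_Red` and `<Sn>.Detection Pval` columns, the sample names can be recovered from the headers. The sketch below does this for four rows copied from the table above (sample Hela_1 only) and summarizes the green-channel signal per control type; it is an illustration in base R, not what getAssayControls() does internally.

```r
# Four rows of the control table, for sample Hela_1 only.
controls <- data.frame(
  TargetID = c("NEGATIVE", "NEGATIVE", "HYBRIDIZATION", "HYBRIDIZATION"),
  ProbeID = c(50110, 360079, 2450040, 5690072),
  Hela_1.Signal_Grn = c(230, 258, 2637, 23628),
  Hela_1.Signal_Red = c(295, 304, 371, 527),
  check.names = FALSE
)

# Recover sample names from the "<Sn>.Signal_Grn" column headers.
grn_cols <- grep("\\.Signal_Grn$", names(controls), value = TRUE)
sample_names <- sub("\\.Signal_Grn$", "", grn_cols)

# Mean green-channel signal per control type.
mean_grn <- tapply(controls[["Hela_1.Signal_Grn"]], controls$TargetID, mean)
```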

BetaAverage table file example: AvgBeta.txt

Index TargetID <Sn>.AVG_Beta <Sn>.Intensity <Sn>.Signal_A <Sn>.Signal_B <Sn>.BEAD_STDERR_A <Sn>.BEA

<Sn> = sample name

Index TargetID Hela_1.AVG_Beta Hela_1.Intensity Hela_1.Signal_A Hela_1.Signal_B Hela_1.BEAD_STDERR_A Hela_1.BEAD_STDERR_B Hela_1.Avg_NBEADS_A Hela_1.Avg_NBEADS_B Hela_1.Detection Pval SYMBOL
3 cg00003994 0.02954 12967 12581 386 511.197 38.65517 28 18 3.68E-38 MEOX2
4 cg00005847 0.823062 10966 1858 9108 69.19808 533.973 24 20 3.68E-38 HOXD3
6 cg00007981 0.027668 19345 18807 538 1055.73 58.04218 24 19 3.68E-38 PANX1
7 cg00008493 0.962965 18450 587 17863 36.4771 746.25 19 16 3.68E-38 COX8C
8 cg00008713 0.033844 30245 29218 1027 1039.59 90.22325 21 17 3.68E-38 IMPA2
10 cg00010193 0.553988 55690 24783 30907 1014.98 1716.425 15 17 3.68E-38 FLJ35816
11 cg00011459 0.938229 10018 525 9493 56.12486 454.7543 14 17 3.68E-38 PMM2
13 cg00012386 0.020579 33284 32597 687 2174.17 58.45416 14 18 3.68E-38 C1orf142
14 cg00012792 0.040012 36889 35409 1480 1097.253 136.5769 32 23 3.68E-38 TXNDC5
17 cg00014837 0.884958 8097 843 7254 77.45966 507.9264 15 19 3.68E-38 ACRBP
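The columns of this table are related: the Beta value is the methylated signal divided by the total signal plus a constant offset, Beta = Signal_B / (Signal_A + Signal_B + 100), where 100 is the offset conventionally used for Illumina arrays. The sketch below checks three rows copied from the table above against that formula; the function name `beta_from_signals` is hypothetical.

```r
# Three rows of the AvgBeta table (sample Hela_1).
avg_beta <- data.frame(
  TargetID = c("cg00003994", "cg00005847", "cg00008493"),
  AVG_Beta = c(0.02954, 0.823062, 0.962965),
  Signal_A = c(12581, 1858, 587),
  Signal_B = c(386, 9108, 17863)
)

# Beta = B / (A + B + offset), with the conventional offset of 100.
beta_from_signals <- function(a, b, offset = 100) {
  b / (a + b + offset)
}

recomputed <- beta_from_signals(avg_beta$Signal_A, avg_beta$Signal_B)

# Largest absolute difference between the tabulated and recomputed values.
max_diff <- max(abs(recomputed - avg_beta$AVG_Beta))
```

For these rows the recomputed values agree with the AVG_Beta column up to rounding.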

Discard.txt

Lists the samples to discard, one sample name per line.
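Such a file can be written directly from R. In the sketch below a temporary folder stands in for the input folder, and the choice of samples to discard is arbitrary; the names come from the example sample table.

```r
# Samples to exclude (names taken from the example sample table).
discard <- c("Raji_1", "K562_2")

# Write one sample name per line. A temporary folder stands in for the
# input folder here.
discard_file <- file.path(tempdir(), "Discard.txt")
writeLines(discard, discard_file)

# Read the file back and drop the listed samples from a vector of sample names.
all_samples <- c("Hela_1", "Hela_2", "Raji_1", "Raji_2", "K562_1", "K562_2")
kept <- setdiff(all_samples, readLines(discard_file))
```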

Running HumMeth27QCReport

Loading the R package

library(HumMeth27QCReport)
Loading required package: methylumi
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: BiocGenerics

The following objects are masked from package:parallel:

    clusterApply, clusterApplyLB, clusterCall,
    clusterEvalQ, clusterExport, clusterMap,
    parApply, parCapply, parLapply, parLapplyLB,
    parRapply, parSapply, parSapplyLB

The following objects are masked from package:stats:

    IQR, mad, sd, var, xtabs

The following objects are masked from package:base:

    anyDuplicated, append, as.data.frame, basename,
    cbind, colnames, dirname, do.call, duplicated,
    eval, evalq, Filter, Find, get, grep, grepl,
    intersect, is.unsorted, lapply, Map, mapply,
    match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, Position, rank, rbind, Reduce,
    rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min

Welcome to Bioconductor

    Vignettes contain introductory material; view
    with 'browseVignettes()'. To cite Bioconductor,
    see 'citation("Biobase")', and for packages
    'citation("pkgname")'.

Loading required package: scales
Loading required package: reshape2
Loading required package: ggplot2
Loading required package: matrixStats

Attaching package: matrixStats

The following objects are masked from package:Biobase:

    anyMissing, rowMedians

Loading required package: FDb.InfiniumMethylation.hg19
Loading required package: GenomicFeatures
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: S4Vectors

The following object is masked from package:base:

    expand.grid

Loading required package: IRanges

Attaching package: IRanges

The following object is masked from package:grDevices:

    windows

Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: AnnotationDbi
Loading required package: TxDb.Hsapiens.UCSC.hg19.knownGene
Loading required package: org.Hs.eg.db

Loading required package: minfi
Loading required package: SummarizedExperiment
Loading required package: MatrixGenerics

Attaching package: MatrixGenerics

The following objects are masked from package:matrixStats:

    colAlls, colAnyNAs, colAnys, colAvgsPerRowSet,
    colCollapse, colCounts, colCummaxs, colCummins,
    colCumprods, colCumsums, colDiffs, colIQRDiffs,
    colIQRs, colLogSumExps, colMadDiffs, colMads,
    colMaxs, colMeans2, colMedians, colMins,
    colOrderStats, colProds, colQuantiles,
    colRanges, colRanks, colSdDiffs, colSds,
    colSums2, colTabulates, colVarDiffs, colVars,
    colWeightedMads, colWeightedMeans,
    colWeightedMedians, colWeightedSds,
    colWeightedVars, rowAlls, rowAnyNAs, rowAnys,
    rowAvgsPerColSet, rowCollapse, rowCounts,
    rowCummaxs, rowCummins, rowCumprods,
    rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs,
    rowLogSumExps, rowMadDiffs, rowMads, rowMaxs,
    rowMeans2, rowMedians, rowMins, rowOrderStats,
    rowProds, rowQuantiles, rowRanges, rowRanks,
    rowSdDiffs, rowSds, rowSums2, rowTabulates,
    rowVarDiffs, rowVars, rowWeightedMads,
    rowWeightedMeans, rowWeightedMedians,
    rowWeightedSds, rowWeightedVars

The following object is masked from package:Biobase:

    rowMedians

Loading required package: Biostrings
Loading required package: XVector

Attaching package: Biostrings

The following object is masked from package:base:

    strsplit

Loading required package: bumphunter
Loading required package: foreach
Loading required package: iterators
Loading required package: locfit
locfit 1.5-9.4 	 2020-03-24
Setting options('download.file.method.GEOquery'='auto')
Setting options('GEOquery.inmemory.gpl'=FALSE)
Loading required package: lumi
No methods found in package RSQLite for request: dbListFields when loading lumi

Attaching package: lumi

The following objects are masked from package:methylumi:

    estimateM, getHistory

Loading required package: IlluminaHumanMethylation27k.db

Loading required package: amap
Loading required package: Hmisc
Loading required package: lattice
Loading required package: survival
Loading required package: Formula

Attaching package: Hmisc

The following objects are masked from package:Biostrings:

    mask, translate

The following object is masked from package:AnnotationDbi:

    contents

The following object is masked from package:Biobase:

    contents

The following objects are masked from package:base:

    format.pval, units

Loading required package: gplots

Attaching package: gplots

The following object is masked from package:IRanges:

    space

The following object is masked from package:S4Vectors:

    space

The following object is masked from package:stats:

    lowess

Loading required package: plotrix

Attaching package: plotrix

The following object is masked from package:gplots:

    plotCI

The following object is masked from package:scales:

    rescale

Loading required package: WriteXLS
Loading required package: tcltk2
Loading required package: tcltk

Attaching package: tcltk2

The following objects are masked from package:Hmisc:

    label, label<-

The following objects are masked from package:SummarizedExperiment:

    values, values<-

The following objects are masked from package:GenomicRanges:

    values, values<-

The following objects are masked from package:IRanges:

    values, values<-

The following objects are masked from package:S4Vectors:

    values, values<-

Warning messages:
1: replacing previous import FDb.InfiniumMethylation.hg18::get27k by FDb.InfiniumMethylation.hg19::get27k when loading HumMeth27QCReport 
2: replacing previous import FDb.InfiniumMethylation.hg18::get450k by FDb.InfiniumMethylation.hg19::get450k when loading HumMeth27QCReport 
3: replacing previous import FDb.InfiniumMethylation.hg18::getNearestTSS by FDb.InfiniumMethylation.hg19::getNearestTSS when loading HumMeth27QCReport 
4: replacing previous import FDb.InfiniumMethylation.hg18::getNearest by FDb.InfiniumMethylation.hg19::getNearest when loading HumMeth27QCReport 
5: replacing previous import FDb.InfiniumMethylation.hg18::getNearestGene by FDb.InfiniumMethylation.hg19::getNearestGene when loading HumMeth27QCReport 
6: replacing previous import FDb.InfiniumMethylation.hg18::getNearestTranscript by FDb.InfiniumMethylation.hg19::getNearestTranscript when loading HumMeth27QCReport 
7: replacing previous import FDb.InfiniumMethylation.hg18::getPlatform by FDb.InfiniumMethylation.hg19::getPlatform when loading HumMeth27QCReport 
8: replacing previous import Hmisc::label<- by tcltk2::label<- when loading HumMeth27QCReport 
9: replacing previous import Hmisc::label by tcltk2::label when loading HumMeth27QCReport

NormCheck() example run

Dir <- system.file("extdata/", package = "HumMeth27QCReport")
ImportDataR <- ImportData(Dir)
normMvalues <- NormCheck(ImportDataR, platform = "Hum27", pval = 0.05, ChrX = FALSE, ClustMethod = "euclidean")
Perform quantile color balance adjustment ...
Processing sample Hela_1 ...
Processing sample Hela_2 ...
Processing sample Raji_1 ...
Processing sample Raji_2 ...
Processing sample Jurkat_1 ...
Processing sample Jurkat_2 ...
Processing sample A431_1 ...
Processing sample A431_2 ...
Processing sample K562_1 ...
Processing sample K562_2 ...
Perform quantile normalization ...
Warning message:
In prcomp.default(t(data.nona), tol = 0.1, na.action = na.omit, 
    center = T, scale = T) :
 extra argument na.action will be disregarded

QCCheck() example run

ControlResults <- getAssayControls(ImportDataR, platform = "Hum27")
QCresults <- QCCheck(ImportDataR, pval = 0.05)
normMvalues <- NormCheck(ImportDataR, platform = "Hum27", pval = 0.05, ChrX = FALSE, ClustMethod = "euclidean")  # output same as above
The purpose of this method is better served by diagnostics()

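Interpreting the results

Interpreting the output

Interpreting the QC plots

Downstream analysis

Downstream analysis is the same as for the 450k array.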