Bioinformatics/HumMeth27QCReport
HumMeth27QCReport
HumMeth27QCReport is a quality-control tool for Illumina Infinium BeadChip methylation arrays. It is an R package developed jointly by the CRG Genotyping Unit and the CRG Bioinformatics Core.
HumMeth27QCReport on CRAN (download page)
Installing HumMeth27QCReport
Package download: HumMeth27QCReport
Several dependency packages must be installed first:
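# Install BiocManager if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
# Install the CRAN dependencies (a source install with repos = NULL does not fetch them)
install.packages(c("amap", "tcltk2", "Hmisc", "gplots", "plotrix", "WriteXLS",
                   "scales", "reshape2", "ggplot2", "matrixStats"))
# Install the Bioconductor dependencies
BiocManager::install(c("methylumi", "lumi", "minfi",
                       "IlluminaHumanMethylation27k.db",
                       "FDb.InfiniumMethylation.hg18",
                       "FDb.InfiniumMethylation.hg19"))
# Install HumMeth27QCReport from the downloaded source tarball
# (replace path/to/ with the folder where the file was saved)
install.packages("path/to/HumMeth27QCReport_1.2.15.tar.gz", repos = NULL, type = "source")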
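Because a source install with `repos = NULL` does not resolve dependencies, it can help to check beforehand which required packages are still missing. The sketch below uses only base R; the package list is assembled from the install commands and the start-up messages of `library(HumMeth27QCReport)` shown later on this page, and may not be exhaustive.

```r
# Dependencies taken from the install commands and the start-up messages on
# this page; this list is an assumption and may not be exhaustive
required_pkgs <- c("methylumi", "lumi", "minfi", "amap", "tcltk2", "Hmisc",
                   "gplots", "plotrix", "WriteXLS",
                   "IlluminaHumanMethylation27k.db",
                   "FDb.InfiniumMethylation.hg18",
                   "FDb.InfiniumMethylation.hg19")

# system.file(package = p) returns "" when package p is not installed,
# so this check does not load any of the packages
is_installed <- vapply(required_pkgs,
                       function(p) nzchar(system.file(package = p)),
                       logical(1))

missing_pkgs <- required_pkgs[!is_installed]
if (length(missing_pkgs) > 0) {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All listed dependencies are installed.")
}
```

Checking with `system.file()` instead of `requireNamespace()` avoids loading the large annotation packages just to test whether they exist.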
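Functions and arguments of HumMeth27QCReport
ImportData()
ImportData(Dir)
Arguments
Dir | Folder containing the input files; output files are also written to this folder |
Value
A list of three or four data frames, one for each input file, plus PDF files named after the samples.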
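QCCheck()
QCCheck(ImportDataR, pval)
Arguments
ImportDataR | Result of ImportData() |
pval | p-value threshold used to select the samples that are kept for normalization and downstream analysis |
Value
Three QC plots | 1. an intensity plot drawn with the plotSampleIntensities function of the methylumi package; 2. a histogram of the percentage of undetected CpGs (CpGs whose detection p-value is greater than 0.05 or 0.01); 3. a histogram of the mean detection p-value of each sample. All plots are saved in a file named QualityCheck.pdf. |
A list of three data frames | 1. a summary of the analysis; 2. the CpGs with a detection p-value greater than 0.05; 3. the CpGs with a detection p-value greater than 0.01. |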
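The detection-based summaries in QCCheck()'s output (share of CpGs above a detection p-value threshold, and mean detection p-value per sample) can be illustrated in base R. This is a sketch of the idea only, not the package's own code; the numbers are copied from the BISULFITE CONVERSION rows of the ControlProbeProfile.txt example in the input-file section and serve purely as example input.

```r
# Detection p-values: 4 probes (rows) x 2 samples (columns), used only as example input
det_p <- matrix(c(3.68e-38, 1.98e-05, 3.68e-38, 0.012058,
                  3.68e-38, 3.56e-09, 3.68e-38, 0.050596),
                ncol = 2,
                dimnames = list(NULL, c("Hela_1", "Hela_2")))

# Percentage of probes per sample that count as "undetected" at each threshold
pct_undetected_05 <- 100 * colMeans(det_p > 0.05)
pct_undetected_01 <- 100 * colMeans(det_p > 0.01)

# Mean detection p-value per sample
mean_p <- colMeans(det_p)

qc_summary <- data.frame(pct_p_gt_0.05 = pct_undetected_05,
                         pct_p_gt_0.01 = pct_undetected_01,
                         mean_p = mean_p)
print(qc_summary)
```

`colMeans()` of a logical matrix gives the fraction of TRUE values per column, which is why no explicit counting loop is needed.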
NormCheck()
NormCheck(ImportDataR, platform, pval, ChrX, ClustMethod, normMethod)
Arguments
ImportDataR | Result of ImportData() |
platform | Platform type: "Hum27" (Infinium HumanMethylation27 BeadChip) or "Hum450" (Infinium HumanMethylation450 BeadChip) |
pval | p-value threshold |
ClustMethod | Distance measure used for clustering: one of "euclidean", "maximum", "manhattan", "canberra", "binary", "pearson", "correlation", "spearman", "kendall" |
ChrX | Whether CpGs on the X chromosome are used in the analysis; FALSE (default) excludes them, TRUE includes them |
normMethod | Normalization method, "quantile" (default) or "ssn"; see the documentation of lumiMethyN() in the lumi package |
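The ClustMethod names correspond to the distance measures of the amap package, which HumMeth27QCReport loads. As a rough base-R sketch of what a sample clustering does (assumptions: random toy data, base `dist()`/`hclust()` instead of amap, and "average" linkage, none of which come from the package), distances are computed between samples rather than between CpGs:

```r
# Toy Beta-value matrix with random numbers (illustrative only, not real data):
# 100 CpGs in rows, 4 samples in columns
set.seed(1)
beta <- matrix(runif(400), nrow = 100,
               dimnames = list(NULL, c("S1", "S2", "S3", "S4")))

# Distances between samples, so the matrix is transposed (samples become rows)
d_euclid <- dist(t(beta), method = "euclidean")
d_manhattan <- dist(t(beta), method = "manhattan")

# Centered Pearson distance 1 - r, comparable to amap's "correlation" option;
# cor(beta, method = "spearman") or "kendall" gives rank-based variants
d_corr <- as.dist(1 - cor(beta, method = "pearson"))

# Hierarchical clustering; the linkage method is an assumption
hc <- hclust(d_corr, method = "average")
print(hc$labels[hc$order])
```

`plot(hc)` draws the dendrogram. Note that amap's "pearson" option is the uncentered variant, so it is not identical to `1 - cor()`.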
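Value
Plots | PCA plot of the normalized Beta values; cluster plot of the normalized Beta values. All plots are saved in a PDF file named ExplorativeAnalysis.pdf. |
Data frame | data.frame of normalized M values |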
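NormCheck() returns M values, whereas the plots are based on Beta values. The two scales are linked by the logit transform M = log2(Beta / (1 - Beta)). The lumi package provides `beta2m()` and `m2beta()` for this; the sketch below writes the conversion out in base R, using three Hela_1 AVG_Beta values from the AvgBeta.txt example on this page.

```r
# Beta (0..1) to M value and back: M = log2(beta / (1 - beta))
beta_to_m <- function(beta) log2(beta / (1 - beta))
m_to_beta <- function(m) 2^m / (1 + 2^m)

# AVG_Beta values of three CpGs in sample Hela_1 (from AvgBeta.txt)
beta <- c(cg00003994 = 0.02954, cg00005847 = 0.823062, cg00010193 = 0.553988)
m <- beta_to_m(beta)
print(round(m, 3))

# The round trip recovers the original Beta values
print(all.equal(m_to_beta(m), beta))
```

Beta values near 0 or 1 map to large negative or positive M values, which is why M values are usually preferred for statistical testing.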
HumMeth27QCReport()
HumMeth27QCReport(ImportDataR, platform, pval, ChrX, ClustMethod, quoteOutput, normMethod)
Arguments
ImportDataR | Result of ImportData() |
platform | Platform type: "Hum27" (Infinium HumanMethylation27 BeadChip) or "Hum450" (Infinium HumanMethylation450 BeadChip) |
pval | p-value threshold |
ClustMethod | Distance measure used for clustering: one of "euclidean", "maximum", "manhattan", "canberra", "binary", "pearson", "correlation", "spearman", "kendall" |
ChrX | Whether CpGs on the X chromosome are used in the analysis; FALSE (default) excludes them, TRUE includes them |
quoteOutput | Whether non-numeric entries in the normalized output data are quoted; TRUE (default) quotes them, FALSE does not |
normMethod | Normalization method, "quantile" (default) or "ssn"; see the documentation of lumiMethyN() in the lumi package |
Value
QC plots and the matrix of normalized Beta values
getAssayControls()
getAssayControls(ImportDataR, platform)
Arguments
ImportDataR | Result of ImportData() |
platform | Platform type: "Hum27" (Infinium HumanMethylation27 BeadChip) or "Hum450" (Infinium HumanMethylation450 BeadChip) |
Value
Eight QC plots
getFileSepChar()
getFileSepChar(File)
Arguments
File | Name of a readable text file |
Value
The field separator character used in the file
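How such separator detection can work is sketched below. `guess_sep()` is a hypothetical helper written for illustration, not part of HumMeth27QCReport, and the candidate separators (tab, comma, semicolon) are an assumption: it counts each candidate in the first line and returns the most frequent one.

```r
# Hypothetical helper (NOT part of HumMeth27QCReport): guess the field separator
# by counting candidate characters in the first line of the file
guess_sep <- function(file, candidates = c("\t", ",", ";")) {
  first_line <- readLines(file, n = 1, warn = FALSE)
  counts <- vapply(candidates, function(s) {
    # gregexpr() returns -1 for "no match", which regmatches() turns into character(0)
    lengths(regmatches(first_line, gregexpr(s, first_line, fixed = TRUE)))
  }, integer(1))
  if (all(counts == 0)) stop("no candidate separator found in the first line")
  candidates[which.max(counts)]
}

# Example: a small tab-separated file modelled on SampleTable.txt
tmp <- tempfile(fileext = ".txt")
writeLines(c("Index\tSample ID\tSample Group", "1\tHela_1\tHela"), tmp)
sep <- guess_sep(tmp)
```

Looking only at the header line keeps the check fast, but it assumes the header uses the same separator as the data rows.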
Preparing the input files
Sample table file example: SampleTable.txt
Index | Sample ID | Sample Group | Sentrix Barcode | Sample Section | Detected Genes (0.01) | Detected Genes (0.05) | Signal Average GRN | Signal Average RED | Signal P05 GRN | Signal P05 RED | Signal P25 GRN | Signal P25 RED | Signal P50 GRN | Signal P50 RED | Signal P75 GRN | Signal P75 RED | Signal P95 GRN | Signal P95 RED | Sample_Plate | Sample_Well |
1 | Hela_1 | Hela | Hela | 1 | 27571 | 27572 | 6032.731 | 3968.032 | 0 | 0 | 343 | 330 | 941 | 617 | 8394 | 2075 | 27229 | 23051 | DemoData | A01 |
2 | Hela_2 | Hela | Hela | 2 | 27573 | 27574 | 6732.99 | 3863.289 | 0 | 0 | 359 | 327 | 1002 | 586 | 9288 | 1899 | 30716 | 22657 | DemoData | B01 |
3 | Raji_1 | Raji | Raji | 1 | 27528 | 27548 | 6082.553 | 3778.055 | 0 | 0 | 614 | 569 | 1228 | 911 | 7691 | 2150 | 27401 | 21009 | DemoData | C01 |
4 | Raji_2 | Raji | Raji | 2 | 27568 | 27577 | 6632.926 | 3756.684 | 0 | 0 | 510 | 463 | 1179 | 750 | 8460 | 1944 | 30422 | 21667 | DemoData | D01 |
5 | Jurkat_1 | Jurkat | Jurkat | 1 | 27534 | 27545 | 6828.716 | 3911.746 | 0 | 0 | 519 | 458 | 1227 | 766 | 8717 | 2081 | 30918 | 22437 | DemoData | E01 |
6 | Jurkat_2 | Jurkat | Jurkat | 2 | 27514 | 27529 | 7014.263 | 3903.508 | 0 | 0 | 449 | 411 | 1175 | 695 | 9040 | 2026 | 31989 | 22589 | DemoData | F01 |
7 | A431_1 | A431 | A431 | 1 | 27575 | 27576 | 6763.988 | 3758.975 | 0 | 0 | 426 | 357 | 1088 | 616 | 10578 | 2192 | 29410 | 20948 | DemoData | G01 |
8 | A431_2 | A431 | A431 | 2 | 27575 | 27576 | 6633.289 | 3840.406 | 0 | 0 | 409 | 382 | 1063 | 645 | 10428 | 2284 | 28832 | 21356 | DemoData | H01 |
9 | K562_1 | K562 | K562 | 1 | 27530 | 27538 | 6425.563 | 3720.365 | 0 | 0 | 400 | 379 | 995 | 639 | 9058 | 1775 | 29100 | 21597 | DemoData | A02 |
10 | K562_2 | K562 | K562 | 2 | 27535 | 27543 | 6547.497 | 3736.386 | 0 | 0 | 405 | 413 | 1015 | 675 | 9282 | 1819 | 29531 | 21463 | DemoData | B02 |
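To inspect a sample table in R before running the QC, it can be read with `read.delim()`. The sketch below assumes the file is tab-delimited and reproduces only a few columns and rows of the example above in a temporary file.

```r
# Assumption: SampleTable.txt is tab-delimited; a subset of the example is written here
sample_file <- tempfile(fileext = ".txt")
writeLines(c(
  "Index\tSample ID\tSample Group\tDetected Genes (0.01)\tDetected Genes (0.05)",
  "1\tHela_1\tHela\t27571\t27572",
  "3\tRaji_1\tRaji\t27528\t27548",
  "6\tJurkat_2\tJurkat\t27514\t27529"
), sample_file)

# check.names = FALSE keeps headers such as "Sample ID" exactly as written in the file
samples <- read.delim(sample_file, check.names = FALSE, stringsAsFactors = FALSE)

# Number of samples per group
group_sizes <- table(samples[["Sample Group"]])

# Sample with the fewest probes detected at the 0.01 level
lowest <- samples[["Sample ID"]][which.min(samples[["Detected Genes (0.01)"]])]
print(group_sizes)
print(lowest)
```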
Control table file example: ControlProbeProfile.txt
Index | TargetID | ProbeID | <Sn>.Signal_Grn | <Sn>.Signal_Red | <Sn>.Detection Pval | ... | ... | ... |
<Sn> = sample name
Index | TargetID | ProbeID | Hela_1.Signal_Grn | Hela_1.Signal_Red | Hela_1.Detection Pval | Hela_2.Signal_Grn | Hela_2.Signal_Red | Hela_2.Detection Pval |
1 | BISULFITE CONVERSION | 4670278 | 13997 | 661 | 3.68E-38 | 14496 | 738 | 3.68E-38 |
2 | BISULFITE CONVERSION | 4670484 | 506 | 513 | 1.98E-05 | 613 | 502 | 3.56E-09 |
3 | BISULFITE CONVERSION | 5270706 | 13007 | 578 | 3.68E-38 | 14583 | 575 | 3.68E-38 |
4 | BISULFITE CONVERSION | 5290048 | 346 | 467 | 0.012058 | 337 | 353 | 0.050596 |
5 | EXTENSION | 360446 | 1262 | 48739 | 3.68E-38 | 1226 | 49133 | 3.68E-38 |
6 | EXTENSION | 520537 | 1588 | 65535 | 3.68E-38 | 1498 | 65535 | 3.68E-38 |
7 | EXTENSION | 1190050 | 39316 | 2572 | 3.68E-38 | 46292 | 2545 | 3.68E-38 |
8 | EXTENSION | 2630184 | 65500 | 1656 | 3.68E-38 | 65535 | 1355 | 3.68E-38 |
9 | HYBRIDIZATION | 2450040 | 2637 | 371 | 3.68E-38 | 2821 | 321 | 3.68E-38 |
10 | HYBRIDIZATION | 5690072 | 23628 | 527 | 3.68E-38 | 28406 | 635 | 3.68E-38 |
11 | HYBRIDIZATION | 5690110 | 10321 | 247 | 3.68E-38 | 11464 | 286 | 3.68E-38 |
12 | NEGATIVE | 50110 | 230 | 295 | 0.63193 | 289 | 427 | 0.029181 |
13 | NEGATIVE | 360079 | 258 | 304 | 0.501571 | 135 | 293 | 0.820958 |
14 | NEGATIVE | 430114 | 161 | 353 | 0.668574 | 169 | 245 | 0.854451 |
15 | NEGATIVE | 460494 | 145 | 359 | 0.700552 | 281 | 191 | 0.687725 |
16 | NEGATIVE | 540577 | 163 | 322 | 0.7571 | 121 | 281 | 0.879578 |
17 | NEGATIVE | 610692 | 110 | 356 | 0.807308 | 159 | 360 | 0.512171 |
18 | NEGATIVE | 610706 | 152 | 383 | 0.597531 | 141 | 336 | 0.670246 |
19 | NEGATIVE | 670750 | 184 | 260 | 0.856797 | 132 | 290 | 0.835865 |
20 | NEGATIVE | 1190458 | 162 | 389 | 0.540998 | 173 | 368 | 0.426892 |
21 | NEGATIVE | 1500059 | 299 | 434 | 0.062366 | 226 | 440 | 0.080044 |
22 | NEGATIVE | 1500167 | 182 | 545 | 0.069276 | 287 | 390 | 0.065245 |
23 | NEGATIVE | 1500398 | 165 | 258 | 0.895271 | 194 | 341 | 0.449981 |
24 | NEGATIVE | 1660097 | 163 | 336 | 0.715998 | 158 | 241 | 0.885352 |
25 | NEGATIVE | 1770019 | 264 | 362 | 0.283621 | 250 | 265 | 0.527731 |
26 | NEGATIVE | 1940364 | 152 | 439 | 0.398556 | 168 | 388 | 0.370417 |
27 | NEGATIVE | 1990692 | 220 | 594 | 0.011778 | 208 | 407 | 0.182252 |
28 | NON-POLYMORPHIC | 110184 | 21912 | 773 | 3.68E-38 | 23847 | 525 | 3.68E-38 |
29 | NON-POLYMORPHIC | 1740025 | 1242 | 12619 | 3.68E-38 | 1267 | 11792 | 3.68E-38 |
30 | NON-POLYMORPHIC | 2480348 | 1690 | 22906 | 3.68E-38 | 2112 | 24975 | 3.68E-38 |
31 | NON-POLYMORPHIC | 2810035 | 16531 | 451 | 3.68E-38 | 18677 | 395 | 3.68E-38 |
32 | SPECIFICITY | 3800086 | 249 | 463 | 0.089121 | 330 | 389 | 0.027287 |
33 | SPECIFICITY | 3800154 | 13051 | 1019 | 3.68E-38 | 12701 | 937 | 3.68E-38 |
34 | SPECIFICITY | 4610400 | 356 | 461 | 0.010974 | 299 | 523 | 0.001706 |
35 | SPECIFICITY | 4610725 | 1207 | 23986 | 3.68E-38 | 1347 | 23608 | 3.68E-38 |
36 | STAINING | 4200736 | 1167 | 33466 | 3.68E-38 | 1294 | 27325 | 3.68E-38 |
37 | STAINING | 4570020 | 22107 | 429 | 3.68E-38 | 23868 | 413 | 3.68E-38 |
38 | STAINING | 5050601 | 778 | 732 | 7.48E-18 | 856 | 1077 | 3.68E-38 |
39 | STAINING | 5340168 | 971 | 469 | 1.42E-15 | 983 | 527 | 2.62E-22 |
40 | TARGET REMOVAL | 580035 | 331 | 317 | 0.22061 | 337 | 402 | 0.017107 |
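Because each sample contributes three columns named `<Sn>.Signal_Grn`, `<Sn>.Signal_Red` and `<Sn>.Detection Pval`, the sample names can be recovered from the header, and control signals can be summarized per control type (TargetID). A base-R sketch using values copied from the table above; the summary itself is an illustration, not what getAssayControls() computes internally.

```r
# Header of the control table (two samples)
ctrl_cols <- c("Index", "TargetID", "ProbeID",
               "Hela_1.Signal_Grn", "Hela_1.Signal_Red", "Hela_1.Detection Pval",
               "Hela_2.Signal_Grn", "Hela_2.Signal_Red", "Hela_2.Detection Pval")

# Strip the per-sample suffix to get the sample names
suffix <- "\\.(Signal_Grn|Signal_Red|Detection Pval)$"
sample_names <- unique(sub(suffix, "", grep(suffix, ctrl_cols, value = TRUE)))

# Note: read.delim() renames "Hela_1.Detection Pval" to "Hela_1.Detection.Pval"
# unless check.names = FALSE is used
print(make.names("Hela_1.Detection Pval"))

# Mean green/red signal per control type for Hela_1 (rows 5, 6, 12, 13 above)
ctrl <- data.frame(
  TargetID = c("EXTENSION", "EXTENSION", "NEGATIVE", "NEGATIVE"),
  Hela_1.Signal_Grn = c(1262, 1588, 230, 258),
  Hela_1.Signal_Red = c(48739, 65535, 295, 304)
)
agg <- aggregate(cbind(Hela_1.Signal_Grn, Hela_1.Signal_Red) ~ TargetID,
                 data = ctrl, FUN = mean)
print(agg)
```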
BetaAverage table file example: AvgBeta.txt
Index | TargetID | <Sn>.AVG_Beta | <Sn>.Intensity | <Sn>.Signal_A | <Sn>.Signal_B | <Sn>.BEAD_STDERR_A | <Sn>.BEAD_STDERR_B | <Sn>.Avg_NBEADS_A | <Sn>.Avg_NBEADS_B | <Sn>.Detection Pval | SYMBOL |
<Sn> = sample name
Index | TargetID | Hela_1.AVG_Beta | Hela_1.Intensity | Hela_1.Signal_A | Hela_1.Signal_B | Hela_1.BEAD_STDERR_A | Hela_1.BEAD_STDERR_B | Hela_1.Avg_NBEADS_A | Hela_1.Avg_NBEADS_B | Hela_1.Detection Pval | SYMBOL |
3 | cg00003994 | 0.02954 | 12967 | 12581 | 386 | 511.197 | 38.65517 | 28 | 18 | 3.68E-38 | MEOX2 |
4 | cg00005847 | 0.823062 | 10966 | 1858 | 9108 | 69.19808 | 533.973 | 24 | 20 | 3.68E-38 | HOXD3 |
6 | cg00007981 | 0.027668 | 19345 | 18807 | 538 | 1055.73 | 58.04218 | 24 | 19 | 3.68E-38 | PANX1 |
7 | cg00008493 | 0.962965 | 18450 | 587 | 17863 | 36.4771 | 746.25 | 19 | 16 | 3.68E-38 | COX8C |
8 | cg00008713 | 0.033844 | 30245 | 29218 | 1027 | 1039.59 | 90.22325 | 21 | 17 | 3.68E-38 | IMPA2 |
10 | cg00010193 | 0.553988 | 55690 | 24783 | 30907 | 1014.98 | 1716.425 | 15 | 17 | 3.68E-38 | FLJ35816 |
11 | cg00011459 | 0.938229 | 10018 | 525 | 9493 | 56.12486 | 454.7543 | 14 | 17 | 3.68E-38 | PMM2 |
13 | cg00012386 | 0.020579 | 33284 | 32597 | 687 | 2174.17 | 58.45416 | 14 | 18 | 3.68E-38 | C1orf142 |
14 | cg00012792 | 0.040012 | 36889 | 35409 | 1480 | 1097.253 | 136.5769 | 32 | 23 | 3.68E-38 | TXNDC5 |
17 | cg00014837 | 0.884958 | 8097 | 843 | 7254 | 77.45966 | 507.9264 | 15 | 19 | 3.68E-38 | ACRBP |
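The AVG_Beta and Intensity columns follow from the two probe signals: Illumina's Beta value is Signal_B / (Signal_A + Signal_B + 100), where Signal_B is the methylated and Signal_A the unmethylated signal and 100 is an offset that stabilizes Beta for low-intensity probes; Intensity is Signal_A + Signal_B. The sketch below recomputes both for the first three Hela_1 rows of the table above.

```r
# Hela_1 signals copied from AvgBeta.txt
signal_a <- c(cg00003994 = 12581, cg00005847 = 1858, cg00007981 = 18807)  # unmethylated
signal_b <- c(cg00003994 = 386,   cg00005847 = 9108, cg00007981 = 538)    # methylated
avg_beta_file <- c(cg00003994 = 0.02954, cg00005847 = 0.823062, cg00007981 = 0.027668)

# Beta with the offset of 100 in the denominator
beta <- signal_b / (signal_a + signal_b + 100)
# Total intensity, matching the Intensity column
intensity <- signal_a + signal_b

print(round(beta, 6))
print(intensity)
```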
Discard.txt
Samples to be discarded, one sample name per line.
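A Discard.txt file can be created directly from R with `writeLines()`, which writes one element per line. In this sketch a temporary folder stands in for the input folder passed to ImportData(Dir), and "Raji_2" and "K562_1" are only example names taken from SampleTable.txt.

```r
# In practice input_dir is the folder that is later passed to ImportData(Dir)
input_dir <- tempdir()
discard <- c("Raji_2", "K562_1")   # example sample names
writeLines(discard, file.path(input_dir, "Discard.txt"))

# Read the file back to check the one-name-per-line format
print(readLines(file.path(input_dir, "Discard.txt")))
```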
Running HumMeth27QCReport
Load the R package
library(HumMeth27QCReport)
Loading required package: methylumi
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall,
clusterEvalQ, clusterExport, clusterMap,
parApply, parCapply, parLapply, parLapplyLB,
parRapply, parSapply, parSapplyLB
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, basename,
cbind, colnames, dirname, do.call, duplicated,
eval, evalq, Filter, Find, get, grep, grepl,
intersect, is.unsorted, lapply, Map, mapply,
match, mget, order, paste, pmax, pmax.int,
pmin, pmin.int, Position, rank, rbind, Reduce,
rownames, sapply, setdiff, sort, table, tapply,
union, unique, unsplit, which.max, which.min
Welcome to Bioconductor
Vignettes contain introductory material; view
with 'browseVignettes()'. To cite Bioconductor,
see 'citation("Biobase")', and for packages
'citation("pkgname")'.
Loading required package: scales
Loading required package: reshape2
Loading required package: ggplot2
Loading required package: matrixStats
Attaching package: ‘matrixStats’
The following objects are masked from ‘package:Biobase’:
anyMissing, rowMedians
Loading required package: FDb.InfiniumMethylation.hg19
Loading required package: GenomicFeatures
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: ‘S4Vectors’
The following object is masked from ‘package:base’:
expand.grid
Loading required package: IRanges
Attaching package: ‘IRanges’
The following object is masked from ‘package:grDevices’:
windows
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: AnnotationDbi
Loading required package: TxDb.Hsapiens.UCSC.hg19.knownGene
Loading required package: org.Hs.eg.db
Loading required package: minfi
Loading required package: SummarizedExperiment
Loading required package: MatrixGenerics
Attaching package: ‘MatrixGenerics’
The following objects are masked from ‘package:matrixStats’:
colAlls, colAnyNAs, colAnys, colAvgsPerRowSet,
colCollapse, colCounts, colCummaxs, colCummins,
colCumprods, colCumsums, colDiffs, colIQRDiffs,
colIQRs, colLogSumExps, colMadDiffs, colMads,
colMaxs, colMeans2, colMedians, colMins,
colOrderStats, colProds, colQuantiles,
colRanges, colRanks, colSdDiffs, colSds,
colSums2, colTabulates, colVarDiffs, colVars,
colWeightedMads, colWeightedMeans,
colWeightedMedians, colWeightedSds,
colWeightedVars, rowAlls, rowAnyNAs, rowAnys,
rowAvgsPerColSet, rowCollapse, rowCounts,
rowCummaxs, rowCummins, rowCumprods,
rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs,
rowLogSumExps, rowMadDiffs, rowMads, rowMaxs,
rowMeans2, rowMedians, rowMins, rowOrderStats,
rowProds, rowQuantiles, rowRanges, rowRanks,
rowSdDiffs, rowSds, rowSums2, rowTabulates,
rowVarDiffs, rowVars, rowWeightedMads,
rowWeightedMeans, rowWeightedMedians,
rowWeightedSds, rowWeightedVars
The following object is masked from ‘package:Biobase’:
rowMedians
Loading required package: Biostrings
Loading required package: XVector
Attaching package: ‘Biostrings’
The following object is masked from ‘package:base’:
strsplit
Loading required package: bumphunter
Loading required package: foreach
Loading required package: iterators
Loading required package: locfit
locfit 1.5-9.4 2020-03-24
Setting options('download.file.method.GEOquery'='auto')
Setting options('GEOquery.inmemory.gpl'=FALSE)
Loading required package: lumi
No methods found in package ‘RSQLite’ for request: ‘dbListFields’ when loading ‘lumi’
Attaching package: ‘lumi’
The following objects are masked from ‘package:methylumi’:
estimateM, getHistory
Loading required package: IlluminaHumanMethylation27k.db
Loading required package: amap
Loading required package: Hmisc
Loading required package: lattice
Loading required package: survival
Loading required package: Formula
Attaching package: ‘Hmisc’
The following objects are masked from ‘package:Biostrings’:
mask, translate
The following object is masked from ‘package:AnnotationDbi’:
contents
The following object is masked from ‘package:Biobase’:
contents
The following objects are masked from ‘package:base’:
format.pval, units
Loading required package: gplots
Attaching package: ‘gplots’
The following object is masked from ‘package:IRanges’:
space
The following object is masked from ‘package:S4Vectors’:
space
The following object is masked from ‘package:stats’:
lowess
Loading required package: plotrix
Attaching package: ‘plotrix’
The following object is masked from ‘package:gplots’:
plotCI
The following object is masked from ‘package:scales’:
rescale
Loading required package: WriteXLS
Loading required package: tcltk2
Loading required package: tcltk
Attaching package: ‘tcltk2’
The following objects are masked from ‘package:Hmisc’:
label, label<-
The following objects are masked from ‘package:SummarizedExperiment’:
values, values<-
The following objects are masked from ‘package:GenomicRanges’:
values, values<-
The following objects are masked from ‘package:IRanges’:
values, values<-
The following objects are masked from ‘package:S4Vectors’:
values, values<-
Warning messages:
1: replacing previous import ‘FDb.InfiniumMethylation.hg18::get27k’ by ‘FDb.InfiniumMethylation.hg19::get27k’ when loading ‘HumMeth27QCReport’
2: replacing previous import ‘FDb.InfiniumMethylation.hg18::get450k’ by ‘FDb.InfiniumMethylation.hg19::get450k’ when loading ‘HumMeth27QCReport’
3: replacing previous import ‘FDb.InfiniumMethylation.hg18::getNearestTSS’ by ‘FDb.InfiniumMethylation.hg19::getNearestTSS’ when loading ‘HumMeth27QCReport’
4: replacing previous import ‘FDb.InfiniumMethylation.hg18::getNearest’ by ‘FDb.InfiniumMethylation.hg19::getNearest’ when loading ‘HumMeth27QCReport’
5: replacing previous import ‘FDb.InfiniumMethylation.hg18::getNearestGene’ by ‘FDb.InfiniumMethylation.hg19::getNearestGene’ when loading ‘HumMeth27QCReport’
6: replacing previous import ‘FDb.InfiniumMethylation.hg18::getNearestTranscript’ by ‘FDb.InfiniumMethylation.hg19::getNearestTranscript’ when loading ‘HumMeth27QCReport’
7: replacing previous import ‘FDb.InfiniumMethylation.hg18::getPlatform’ by ‘FDb.InfiniumMethylation.hg19::getPlatform’ when loading ‘HumMeth27QCReport’
8: replacing previous import ‘Hmisc::label<-’ by ‘tcltk2::label<-’ when loading ‘HumMeth27QCReport’
9: replacing previous import ‘Hmisc::label’ by ‘tcltk2::label’ when loading ‘HumMeth27QCReport’
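The output above appears to be informational (functions masked by later packages, and hg18 imports replaced by their hg19 counterparts) rather than a sign of failure. If it is unwanted, base R's `suppressPackageStartupMessages()` and `suppressWarnings()` can silence it; a sketch:

```r
# Load HumMeth27QCReport without start-up messages or "replacing previous import"
# warnings; 'loaded' is FALSE if the package is not installed
loaded <- suppressWarnings(suppressPackageStartupMessages(
  require("HumMeth27QCReport", character.only = TRUE, quietly = TRUE)
))
if (!loaded) message("HumMeth27QCReport is not installed; see the installation section.")
```

Suppressing warnings wholesale also hides warnings that might matter, so this is best used only once the package is known to load correctly.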
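Example: running NormCheck()
# Path to the example data shipped with the package
Dir <- system.file("extdata/", package = "HumMeth27QCReport")
# Import the input files found in Dir
ImportDataR <- ImportData(Dir)
# Normalize and draw the PCA and cluster plots (saved to ExplorativeAnalysis.pdf)
normMvalues <- NormCheck(ImportDataR, platform = "Hum27", pval = 0.05, ChrX = FALSE, ClustMethod = "euclidean")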
Perform quantile color balance adjustment ...
Processing sample Hela_1 ...
Processing sample Hela_2 ...
Processing sample Raji_1 ...
Processing sample Raji_2 ...
Processing sample Jurkat_1 ...
Processing sample Jurkat_2 ...
Processing sample A431_1 ...
Processing sample A431_2 ...
Processing sample K562_1 ...
Processing sample K562_2 ...
Perform quantile normalization ...
Warning message:
In prcomp.default(t(data.nona), tol = 0.1, na.action = na.omit,
center = T, scale = T) :
extra argument ‘na.action’ will be disregarded
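The warning arises because `na.action` is only used by the formula interface of `prcomp()`; `prcomp.default()` ignores it. Judging by the name `data.nona`, missing values were already removed before the call, so the warning looks harmless here. The sketch below, on random toy data rather than the package's data, shows the equivalent pattern of dropping incomplete rows first and then running a sample-level PCA.

```r
# Toy M-value matrix (random numbers, illustrative only): 50 CpGs x 4 samples
set.seed(42)
m_values <- matrix(rnorm(200), nrow = 50,
                   dimnames = list(NULL, paste0("S", 1:4)))
m_values[3, 2] <- NA   # one missing value

# Remove CpGs with any missing value instead of relying on na.action
m_nona <- m_values[complete.cases(m_values), ]

# Samples are the observations, so transpose before the PCA
pca <- prcomp(t(m_nona), center = TRUE, scale. = TRUE)
print(summary(pca))
```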
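Example: running getAssayControls() and QCCheck()
# Control-probe QC plots for the HumanMethylation27 platform
ControlResults <- getAssayControls(ImportDataR, platform = "Hum27")
# Detection p-value QC (plots saved to QualityCheck.pdf)
QCresults <- QCCheck(ImportDataR, pval = 0.05)
# Same call and output as in the previous example
normMvalues <- NormCheck(ImportDataR, platform = "Hum27", pval = 0.05, ChrX = FALSE, ClustMethod = "euclidean")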
The purpose of this method is better served by diagnostics()
Interpreting the results
Downstream analysis
The same as the downstream analysis for 450k arrays.