R/createArbalistMAE.R
createArbalistMAE.RdImport scATAC-seq or multiome results into a MultiAssayExperiment.
createArbalistMAE(
sample.names,
fragment.files,
filtered.feature.matrix.files,
barcode.annotation.files = NULL,
sample.annotation.files = NULL,
output.dir = tempdir(),
multiome = NULL,
min.frags = 1000,
max.frags = Inf,
tile.size = 500,
seq.lengths = NULL,
gene.grs = NULL,
use.alt.exp = FALSE,
main.exp.name = "TileMatrix500",
filter.rna.features.without.intervals = TRUE,
BPPARAM = bpparam()
)String vector containing the sample names. Must be the same length as fragment.files & filtered.feature.matrix.files.
String vector containing the fragment file names and paths. ex. 10x Cell Ranger result output atac_fragments.tsv.gz or fragments.tsv.gz
String vector containing the filtered feature matrix file names and paths. ex. 10x Cell Ranger result output filtered_feature_bc_matrix.h5 or filtered_tf_bc_matrix.h5
String vector containing barcode annotation to put in the SingleCellExperiment colData. ex. 10x Cell Ranger result output per_barcode_metrics.csv or singlecell.csv
String vector containing sample annotation to put in the MultiAssayExperiment colData. ex. 10x Cell Ranger result output summary.csv
String containing the directory where files should be output while creating the MultiAssayExperiment.
Logical whether to use createRNASCE on the filtered.feature.matrix.files to extract the RNA features and create a SingleCellExperiment. If NULL, then will become TRUE if filtered.feature.matrix.files contain "filtered_feature_bc_matrix.h5" otherwise FALSE.
Number specifying the minimum number of mapped ATAC-seq fragments required per cell to pass filtering for use in downstream analyses. Cells containing greater than or equal to min.frags total fragments will be retained.
Number specifying the maximum number of mapped ATAC-seq fragments required per cell to pass filtering for use in downstream analyses. Cells containing less than or equal to max.frags total fragments will be retained.
Integer scalar specifying the size of the tiles in base pairs.
Named integer vector containing the lengths of the
reference sequences used for alignment. Vector names should correspond to
the names of the sequences, in the same order of occurrence as in the
fragment file. If NULL, this is obtained from the reference genome
used by Cellranger (itself located by scanning the header of the fragment
file).
Genomic Ranges specifying gene coordinates for creating the gene score matrix. If NULL, then the geneset will be selected based on the genome version.
Logical for selecting the MultiAssayExperiment structure. TRUE means that there will only be one experiment in the MultiAssayExperiment and all other experiments will be in alternative experiments. This option is only available if the columns are the same for all Matrices. FALSE means that each Matrix will be a separate experiment in the MAE.
String containing the name of the experiment that will be the main experiment when use.alt.exp is TRUE.
Logical whether to remove GeneExpression Matrix features from the h5.files that do not have interval specified. Often these are mitochondria genes.
A BiocParallelParam object indicating how matrix creation should be parallelized.
A MultiAssayExperiment
# Create mock data
f <- tempfile(fileext=".tsv.gz")
seq_lengths <- c(chr1=1000)
mockFragmentFile(f, seq_lengths, 50, LETTERS[1:5])
# Create a valid mock feature matrix (H5)
h5 <- tempfile(fileext=".h5")
rhdf5::h5createFile(h5)
rhdf5::h5createGroup(h5, "matrix")
# 5 barcodes
rhdf5::h5write(LETTERS[1:5], h5, "matrix/barcodes", createnewfile = FALSE)
# 5 non-zero entries (1 per cell)
rhdf5::h5write(as.integer(c(1,1,1,1,1)), h5, "matrix/data", createnewfile = FALSE)
# All entries in row 0 (the only feature)
rhdf5::h5write(as.integer(c(0,0,0,0,0)), h5, "matrix/indices", createnewfile = FALSE)
# Column pointers: 0, 1, 2, 3, 4, 5 (defines 5 columns)
rhdf5::h5write(as.integer(c(0,1,2,3,4,5)), h5, "matrix/indptr", createnewfile = FALSE)
# Shape: 1 feature (row), 5 cells (columns)
rhdf5::h5write(as.integer(c(1,5)), h5, "matrix/shape", createnewfile = FALSE)
rhdf5::h5createGroup(h5, "matrix/features")
rhdf5::h5write("GENE1", h5, "matrix/features/id", createnewfile = FALSE)
rhdf5::h5write("Gene1", h5, "matrix/features/name", createnewfile = FALSE)
rhdf5::h5write("Gene Expression", h5, "matrix/features/feature_type", createnewfile = FALSE)
rhdf5::h5write("chr1:1-100", h5, "matrix/features/interval", createnewfile = FALSE)
rhdf5::h5write("Genome1", h5, "matrix/features/genome", createnewfile = FALSE)
# Use a unique temp directory for this example
example_dir <- tempfile()
dir.create(example_dir)
# Force SerialParam to avoid HDF5 threading issues during examples/checks
mae <- createArbalistMAE(
sample.names = "Sample1",
fragment.files = f,
filtered.feature.matrix.files = h5,
seq.lengths = seq_lengths,
output.dir = example_dir,
BPPARAM = BiocParallel::SerialParam()
)