Import scATAC-seq or multiome results into a MultiAssayExperiment.

createArbalistMAE(
  sample.names,
  fragment.files,
  filtered.feature.matrix.files,
  barcode.annotation.files = NULL,
  sample.annotation.files = NULL,
  output.dir = tempdir(),
  multiome = NULL,
  min.frags = 1000,
  max.frags = Inf,
  tile.size = 500,
  seq.lengths = NULL,
  gene.grs = NULL,
  use.alt.exp = FALSE,
  main.exp.name = "TileMatrix500",
  filter.rna.features.without.intervals = TRUE,
  BPPARAM = bpparam()
)

Arguments

sample.names

String vector containing the sample names. Must be the same length as fragment.files & filtered.feature.matrix.files.

fragment.files

String vector containing the fragment file names and paths. ex. 10x Cell Ranger result output atac_fragments.tsv.gz or fragments.tsv.gz

filtered.feature.matrix.files

String vector containing the filtered feature matrix file names and paths. ex. 10x Cell Ranger result output filtered_feature_bc_matrix.h5 or filtered_tf_bc_matrix.h5

barcode.annotation.files

String vector containing barcode annotation to put in the SingleCellExperiment colData. ex. 10x Cell Ranger result output per_barcode_metrics.csv or singlecell.csv

sample.annotation.files

String vector containing sample annotation to put in the MultiAssayExperiment colData. ex. 10x Cell Ranger result output summary.csv

output.dir

String containing the directory where files should be output while creating the MultiAssayExperiment.

multiome

Logical whether to use createRNASCE on the filtered.feature.matrix.files to extract the RNA features and create a SingleCellExperiment. If NULL, then will become TRUE if filtered.feature.matrix.files contain "filtered_feature_bc_matrix.h5" otherwise FALSE.

min.frags

Number specifying the minimum number of mapped ATAC-seq fragments required per cell to pass filtering for use in downstream analyses. Cells containing greater than or equal to min.frags total fragments will be retained.

max.frags

Number specifying the maximum number of mapped ATAC-seq fragments required per cell to pass filtering for use in downstream analyses. Cells containing less than or equal to max.frags total fragments will be retained.

tile.size

Integer scalar specifying the size of the tiles in base pairs.

seq.lengths

Named integer vector containing the lengths of the reference sequences used for alignment. Vector names should correspond to the names of the sequences, in the same order of occurrence as in the fragment file. If NULL, this is obtained from the reference genome used by Cellranger (itself located by scanning the header of the fragment file).

gene.grs

Genomic Ranges specifying gene coordinates for creating the gene score matrix. If NULL, then the geneset will be selected based on the genome version.

use.alt.exp

Logical for selecting the MultiAssayExperiment structure. TRUE means that there will only be one experiment in the MultiAssayExperiment and all other experiments will be in alternative experiments. This option is only available if the columns are the same for all Matrices. FALSE means that each Matrix will be a separate experiment in the MAE.

main.exp.name

String containing the name of the experiment that will be the main experiment when use.alt.exp is TRUE.

filter.rna.features.without.intervals

Logical whether to remove GeneExpression Matrix features from the h5.files that do not have interval specified. Often these are mitochondria genes.

BPPARAM

A BiocParallelParam object indicating how matrix creation should be parallelized.

Value

A MultiAssayExperiment

Author

Natalie Fox

Examples

# Create mock data
f <- tempfile(fileext=".tsv.gz")
seq_lengths <- c(chr1=1000)
mockFragmentFile(f, seq_lengths, 50, LETTERS[1:5])

# Create a valid mock feature matrix (H5)
h5 <- tempfile(fileext=".h5")
rhdf5::h5createFile(h5)
rhdf5::h5createGroup(h5, "matrix")
# 5 barcodes
rhdf5::h5write(LETTERS[1:5], h5, "matrix/barcodes", createnewfile = FALSE)
# 5 non-zero entries (1 per cell)
rhdf5::h5write(as.integer(c(1,1,1,1,1)), h5, "matrix/data", createnewfile = FALSE)
# All entries in row 0 (the only feature)
rhdf5::h5write(as.integer(c(0,0,0,0,0)), h5, "matrix/indices", createnewfile = FALSE)
# Column pointers: 0, 1, 2, 3, 4, 5 (defines 5 columns)
rhdf5::h5write(as.integer(c(0,1,2,3,4,5)), h5, "matrix/indptr", createnewfile = FALSE)
# Shape: 1 feature (row), 5 cells (columns)
rhdf5::h5write(as.integer(c(1,5)), h5, "matrix/shape", createnewfile = FALSE)

rhdf5::h5createGroup(h5, "matrix/features")
rhdf5::h5write("GENE1", h5, "matrix/features/id", createnewfile = FALSE)
rhdf5::h5write("Gene1", h5, "matrix/features/name", createnewfile = FALSE)
rhdf5::h5write("Gene Expression", h5, "matrix/features/feature_type", createnewfile = FALSE)
rhdf5::h5write("chr1:1-100", h5, "matrix/features/interval", createnewfile = FALSE)
rhdf5::h5write("Genome1", h5, "matrix/features/genome", createnewfile = FALSE)

# Use a unique temp directory for this example
example_dir <- tempfile()
dir.create(example_dir)

# Force SerialParam to avoid HDF5 threading issues during examples/checks
mae <- createArbalistMAE(
    sample.names = "Sample1",
    fragment.files = f,
    filtered.feature.matrix.files = h5,
    seq.lengths = seq_lengths,
    output.dir = example_dir,
    BPPARAM = BiocParallel::SerialParam()
)