Count the number of fragment start/end positions within user-defined genomic regions, using the fragment file generated by Cellranger. Then, save the resulting matrix into a HDF5 file.

saveRegionMatrix(
  fragment.file,
  output.file,
  output.name,
  regions,
  barcodes = NULL,
  compress.level = 6,
  chunk.dim = 20000
)

Arguments

fragment.file

String containing a path to a fragment file.

output.file

String containing a path to an output HDF5 file. If none exists, one will be created.

output.name

String containing the name of the group inside output.file, in which to save the matrix contents.

regions

A GRanges or GRangesList of the regions in which to count fragments. If regions is a GRangesList, the count for each entry of regions will include all fragment starts/ends falling in any of the genomic intervals in its GRanges.

barcodes

Character vector of cell barcodes to extract, e.g., based on the filtered cells reported by Cellranger. If NULL, all barcodes are extracted, though this is usually undesirable as not all barcodes correspond to cell-containing droplets.

compress.level

Integer scalar specifying the Zlib compression level to use.

chunk.dim

Integer scalar specifying the size of the chunks (in terms of the number of elements) inside the HDF5 file.

Value

A sparse matrix is saved to output.file using the 10X HDF5 format. A H5SparseMatrix referencing this file is returned where the rows correspond to entries of regions. Column names are set to the cell barcodes - if barcodes is supplied, this is directly used as the column names.

Details

We count the overlap with the start/end positions of each fragment, not the overlap with the fragment interval itself. This is because the fragment start/ends represent the transposase cleavage events, while the fragment interval has no real biological significance.

If a fragment start/end overlaps with multiple entries of regions, it is not counted as the assignment is ambiguous. This should not be confused with overlaps to multiple intervals of the same entry when regions is a GRangesList; this yields a mapping to a single feature, even if the exact interval isnot ambiguous.

If the start and end for the same fragment overlap different entries of regions, the counts for both entries are incremented by 1. This reflects the fact that these positions represent distinct transposase cleavage events for different features. However, if the start and end for the same fragment overlap the same entry of regions, the entry's count is only incremented by 1. This ensures that the count for each entry of regions still follows Poisson noise and avoids an artificial enrichment of even counts.

Author

Aaron Lun

Examples

# Mocking up the fragment file.
seq.lengths <- c(chrA = 2000, chrB = 10000)
temp <- tempfile(fileext = ".gz")
mockFragmentFile(temp, seq.lengths, 1e3, cell.names = LETTERS)

# Mocking up some regions of interest.
library(GenomicRanges)
regions <- GRanges(c("chrA:500-1000", "chrB:1000-2000"))

# Running the counter
out <- tempfile(fileext=".h5")
counts <- saveRegionMatrix(temp, output.file=out, output.name="WHEE", regions=regions)
counts
#> <2 x 26> sparse DelayedMatrix object of type "raw":
#>       H  Q  I ...  R  M
#> [1,] cf d2 c0   . c4 b7
#> [2,] 4f 41 49   . 51 4a