Count the number of fragment start/end positions within user-defined genomic regions, using the fragment file generated by Cellranger. Then, save the resulting matrix into a HDF5 file.
saveRegionMatrix(
fragment.file,
output.file,
output.name,
regions,
barcodes = NULL,
compress.level = 6,
chunk.dim = 20000
)String containing a path to a fragment file.
String containing a path to an output HDF5 file. If none exists, one will be created.
String containing the name of the group inside
output.file, in which to save the matrix contents.
A GRanges or GRangesList of the
regions in which to count fragments. If regions is a GRangesList,
the count for each entry of regions will include all fragment
starts/ends falling in any of the genomic intervals in its GRanges.
Character vector of cell barcodes to extract, e.g., based on
the filtered cells reported by Cellranger. If NULL, all barcodes are
extracted, though this is usually undesirable as not all barcodes
correspond to cell-containing droplets.
Integer scalar specifying the Zlib compression level to use.
Integer scalar specifying the size of the chunks (in terms of the number of elements) inside the HDF5 file.
A sparse matrix is saved to output.file using the 10X HDF5
format. A H5SparseMatrix referencing this file is returned
where the rows correspond to entries of regions. Column names are
set to the cell barcodes - if barcodes is supplied, this is directly
used as the column names.
We count the overlap with the start/end positions of each fragment, not the overlap with the fragment interval itself. This is because the fragment start/ends represent the transposase cleavage events, while the fragment interval has no real biological significance.
If a fragment start/end overlaps with multiple entries of regions, it
is not counted as the assignment is ambiguous. This should not be confused
with overlaps to multiple intervals of the same entry when regions is
a GRangesList; this yields a mapping to a single feature, even if the exact
interval isnot ambiguous.
If the start and end for the same fragment overlap different entries of
regions, the counts for both entries are incremented by 1. This
reflects the fact that these positions represent distinct transposase
cleavage events for different features. However, if the start and end for the
same fragment overlap the same entry of regions, the entry's count is
only incremented by 1. This ensures that the count for each entry of
regions still follows Poisson noise and avoids an artificial
enrichment of even counts.
# Mocking up the fragment file.
seq.lengths <- c(chrA = 2000, chrB = 10000)
temp <- tempfile(fileext = ".gz")
mockFragmentFile(temp, seq.lengths, 1e3, cell.names = LETTERS)
# Mocking up some regions of interest.
library(GenomicRanges)
regions <- GRanges(c("chrA:500-1000", "chrB:1000-2000"))
# Running the counter
out <- tempfile(fileext=".h5")
counts <- saveRegionMatrix(temp, output.file=out, output.name="WHEE", regions=regions)
counts
#> <2 x 26> sparse DelayedMatrix object of type "raw":
#> H Q I ... R M
#> [1,] cf d2 c0 . c4 b7
#> [2,] 4f 41 49 . 51 4a