Count the number of fragment start/end positions within genomic tiles for each cell, using the fragment file generated by Cellranger. Then, save the resulting matrix into an HDF5 file.
saveTileMatrix(
fragment.file,
output.file,
output.name,
seq.lengths = NULL,
barcodes = NULL,
tile.size = 500,
compress.level = 6,
chunk.dim = 20000
)
fragment.file: String containing a path to a fragment file.
output.file: String containing a path to an output HDF5 file. If none exists, one will be created.
output.name: String containing the name of the group inside output.file, in which to save the matrix contents.
seq.lengths: Named integer vector containing the lengths of the reference sequences used for alignment. Vector names should correspond to the names of the sequences, in the same order of occurrence as in the fragment file. If NULL, this is obtained from the reference genome used by Cellranger (itself located by scanning the header of the fragment file).
barcodes: Character vector of cell barcodes to extract, e.g., based on the filtered cells reported by Cellranger. If NULL, all barcodes are extracted, though this is usually undesirable as not all barcodes correspond to cell-containing droplets.
tile.size: Integer scalar specifying the size of the tiles in base pairs.
compress.level: Integer scalar specifying the Zlib compression level to use.
chunk.dim: Integer scalar specifying the size of the chunks (in terms of the number of elements) inside the HDF5 file.
A sparse matrix is saved to output.file using the 10X HDF5 format. A list is returned containing:
tiles, a GRanges object containing the tile coordinates.
counts, an H5SparseMatrix referencing output.file, where the rows correspond to entries of tiles.
Column names of counts are set to the cell barcodes; if barcodes is supplied, it is used directly as the column names.
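As a rough illustration of how tile coordinates follow from seq.lengths and tile.size, the base R sketch below builds a plain data frame of tile boundaries. The helper make_tiles is hypothetical and not part of the package; truncating the last tile at the end of the sequence is an assumption, and the real function returns a GRanges object instead.

```r
# Hypothetical helper (not part of the package): 1-based, closed tile
# boundaries for each reference sequence.
make_tiles <- function(seq.lengths, tile.size = 500) {
    per.seq <- lapply(names(seq.lengths), function(nm) {
        len <- seq.lengths[[nm]]
        starts <- seq(1, len, by = tile.size)
        # Assumption: the last tile is cut short at the end of the sequence.
        ends <- pmin(starts + tile.size - 1, len)
        data.frame(seqnames = nm, start = starts, end = ends)
    })
    do.call(rbind, per.seq)
}

tiles <- make_tiles(c(chrA = 2000, chrB = 10000), tile.size = 500)
nrow(tiles) # 4 tiles for chrA plus 20 for chrB
```

With the default tile.size of 500, these sequence lengths give 24 tiles in total, one per row of counts.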
We count the overlap with the start/end positions of each fragment, not the overlap with the fragment interval itself. This is because the fragment start/ends represent the transposase cleavage events, while the fragment interval has no real biological significance.
If the start and end for the same fragment overlap different tiles, the
counts for both tiles are incremented by 1. This reflects the fact that
these positions represent distinct transposase cleavage events for
different features. However, if the start and end for the same fragment
overlap the same tile, the tile's count is only incremented by 1. This
ensures that the count for each entry of tiles still follows
Poisson noise and avoids an artificial enrichment of even counts.
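The counting rule above can be sketched in base R for a single cell and a single sequence. The helper count_tile_hits is hypothetical and only illustrates the rule; it assumes 1-based positions for the two fragment ends and is not the package's actual implementation.

```r
# Hypothetical helper: per-tile counts for one cell on one sequence.
# 'start' and 'end' are assumed to be 1-based positions of the fragment ends.
count_tile_hits <- function(start, end, n.tiles, tile.size = 500) {
    start.tile <- (start - 1) %/% tile.size + 1
    end.tile <- (end - 1) %/% tile.size + 1
    # The end contributes only when it falls in a different tile to the start,
    # so a fragment lying within one tile adds 1 to that tile, not 2.
    hits <- c(start.tile, end.tile[end.tile != start.tile])
    tabulate(hits, nbins = n.tiles)
}

# Fragment 1 spans tiles 1 and 2; fragment 2 lies entirely within tile 1.
count_tile_hits(start = c(450, 10), end = c(620, 200), n.tiles = 4)
#> [1] 2 1 0 0
```

Tile 1 receives one count from each fragment, while tile 2 receives a count only from the end of the first fragment.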
According to 10x Genomics, start positions are inclusive and end positions are exclusive in fragment files.
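To make the coordinate convention concrete, the sketch below converts fragment coordinates into 1-based, closed positions of the kind used by GRanges. It assumes that start positions are 0-based (BED-like), which is not stated above, and the fragment records are made-up values for illustration only.

```r
# Made-up fragment records, assuming BED-like 0-based starts.
frags <- data.frame(
    seqname = c("chrA", "chrA"),
    start = c(0L, 499L),  # inclusive
    end = c(500L, 1001L)  # exclusive
)

# Half-open [start, end) to 1-based closed [first.pos, last.pos].
first.pos <- frags$start + 1L # shift the inclusive 0-based start by one
last.pos <- frags$end         # an exclusive 0-based end needs no shift

# The fragment width is unchanged by the conversion.
stopifnot(all(last.pos - first.pos + 1L == frags$end - frags$start))
```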
The ReadSupport column is ignored because we only consider unique fragments.
Refer to ATAC duplicate marking.
# Mocking up the fragment file.
seq.lengths <- c(chrA = 2000, chrB = 10000)
temp <- tempfile(fileext = ".gz")
mockFragmentFile(temp, seq.lengths, 1e3, cell.names = LETTERS)
# Running the counter
out <- tempfile(fileext=".h5")
counted <- saveTileMatrix(temp, output.file=out, output.name="WHEE", seq.lengths=seq.lengths)
counted
#> $tiles
#> GRanges object with 24 ranges and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chrA 1-500 *
#> [2] chrA 501-1000 *
#> [3] chrA 1001-1500 *
#> [4] chrA 1501-2000 *
#> [5] chrB 1-500 *
#> ... ... ... ...
#> [20] chrB 7501-8000 *
#> [21] chrB 8001-8500 *
#> [22] chrB 8501-9000 *
#> [23] chrB 9001-9500 *
#> [24] chrB 9501-10000 *
#> -------
#> seqinfo: 2 sequences from an unspecified genome
#>
#> $counts
#> <24 x 26> sparse DelayedMatrix object of type "integer":
#> P K B J ... N E F G
#> [1,] 127 127 128 116 . 136 122 113 118
#> [2,] 181 198 203 192 . 198 166 185 163
#> [3,] 196 230 215 237 . 231 214 217 215
#> [4,] 253 239 248 247 . 265 282 242 224
#> [5,] 26 21 34 29 . 17 25 26 25
#> ... . . . . . . . . .
#> [20,] 32 38 34 46 . 43 42 52 36
#> [21,] 44 48 58 40 . 50 50 39 36
#> [22,] 44 45 33 37 . 39 52 40 42
#> [23,] 50 49 52 37 . 35 49 42 43
#> [24,] 53 46 59 49 . 44 47 49 44
#>