Count the number of fragment start/end positions within genomic tiles for each cell, using the fragment file generated by Cellranger. Then, save the resulting matrix into an HDF5 file.

saveTileMatrix(
  fragment.file,
  output.file,
  output.name,
  seq.lengths = NULL,
  barcodes = NULL,
  tile.size = 500,
  compress.level = 6,
  chunk.dim = 20000
)

Arguments

fragment.file

String containing a path to a fragment file.

output.file

String containing a path to an output HDF5 file. If none exists, one will be created.

output.name

String containing the name of the group inside output.file, in which to save the matrix contents.

seq.lengths

Named integer vector containing the lengths of the reference sequences used for alignment. Vector names should correspond to the names of the sequences, in the same order of occurrence as in the fragment file. If NULL, this is obtained from the reference genome used by Cellranger (itself located by scanning the header of the fragment file).

barcodes

Character vector of cell barcodes to extract, e.g., based on the filtered cells reported by Cellranger. If NULL, all barcodes are extracted, though this is usually undesirable as not all barcodes correspond to cell-containing droplets.

tile.size

Integer scalar specifying the size of the tiles in base pairs.

compress.level

Integer scalar specifying the Zlib compression level to use.

chunk.dim

Integer scalar specifying the size of the chunks (in terms of the number of elements) inside the HDF5 file.

Value

A sparse matrix is saved to output.file using the 10X HDF5 format. A list is returned containing:

  • tiles, a GRanges object containing the tile coordinates.

  • counts, an H5SparseMatrix referencing output.file, where the rows correspond to entries of tiles. Column names are set to the cell barcodes; if barcodes is supplied, it is used directly as the column names.
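
As a rough illustration of how the rows of the matrix relate to seq.lengths and tile.size, the tile layout can be sketched in base R. This is not the package's own code: the variable names are hypothetical, and truncating the last tile at the end of the sequence is an assumption that matters only when a sequence length is not a multiple of tile.size.

```r
# Sketch only: lay out consecutive tiles along each reference sequence.
seq.lengths <- c(chrA = 2000, chrB = 10000)
tile.size <- 500

# Number of tiles per sequence, rounding up to cover any partial final tile.
n.tiles <- ceiling(seq.lengths / tile.size)

# 1-based inclusive start and end of each tile.
tile.starts <- lapply(n.tiles, function(n) (seq_len(n) - 1) * tile.size + 1)
tile.ends <- mapply(
    function(s, len) pmin(s + tile.size - 1, len), # assumed: clip to sequence end
    tile.starts, seq.lengths,
    SIMPLIFY = FALSE
)

# Total number of tiles, i.e., the expected number of rows in the matrix.
sum(n.tiles)
```

With these lengths, the sketch gives 4 tiles on chrA and 20 on chrB, for 24 rows in total.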

Details

We count the overlap with the start/end positions of each fragment, not the overlap with the fragment interval itself. This is because the fragment start/ends represent the transposase cleavage events, while the fragment interval has no real biological significance.

If the start and end for the same fragment overlap different tiles, the counts for both tiles are incremented by 1. This reflects the fact that these positions represent distinct transposase cleavage events for different features. However, if the start and end for the same fragment overlap the same tile, the tile's count is only incremented by 1. This ensures that the count for each tile still follows Poisson noise and avoids an artificial enrichment of even counts.
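
The counting rule above can be sketched for a single cell and a single sequence in base R. The helper count_tile_ends is hypothetical and is not part of the package; it assumes that starts and ends have already been converted to 1-based positions of the two fragment ends.

```r
# Hypothetical helper illustrating the counting rule; not the package's code.
count_tile_ends <- function(starts, ends, seq.length, tile.size = 500) {
    n.tiles <- ceiling(seq.length / tile.size)

    # Index of the tile containing each fragment end.
    start.tile <- (starts - 1) %/% tile.size + 1
    end.tile <- (ends - 1) %/% tile.size + 1

    # Every fragment adds one count to the tile holding its start.
    counts <- tabulate(start.tile, nbins = n.tiles)

    # The end adds a second count only if it lies in a different tile.
    different <- end.tile != start.tile
    counts + tabulate(end.tile[different], nbins = n.tiles)
}

# Three fragments on a 2000 bp sequence:
#   100-300:   both ends in tile 1, so tile 1 is incremented once.
#   400-700:   start in tile 1 and end in tile 2, so each is incremented once.
#   1200-1900: start in tile 3 and end in tile 4, so each is incremented once.
tile.counts <- count_tile_ends(
    starts = c(100, 400, 1200),
    ends = c(300, 700, 1900),
    seq.length = 2000
)
tile.counts
```

In this toy case, the first fragment contributes a single count to tile 1 rather than two, which is the behaviour that avoids the enrichment of even counts.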

Start positions are inclusive and end positions are exclusive in fragment files, according to 10x Genomics.
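
To make the coordinate convention concrete, the sketch below converts a few made-up fragment records to 1-based positions before assigning them to tiles. It assumes that fragment files use 0-based, BED-like coordinates and that the last covered base is taken as the end position; both are assumptions for illustration and may not match what the function does internally.

```r
# Made-up fragment records, assuming 0-based BED-like coordinates.
frags <- data.frame(start = c(0, 499, 500), end = c(500, 501, 1000))

# Inclusive 0-based start -> 1-based position of the first covered base.
first.base <- frags$start + 1
# Exclusive 0-based end -> 1-based position of the last covered base.
last.base <- frags$end

# Tile index for each fragment end, using 500 bp tiles.
tile.size <- 500
tiles.hit <- data.frame(
    start.tile = (first.base - 1) %/% tile.size + 1,
    end.tile = (last.base - 1) %/% tile.size + 1
)
tiles.hit
```

Under these assumptions, a fragment recorded as 0-500 lies entirely within the first tile, as its exclusive end refers to base 500 rather than base 501.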

The ReadSupport column is ignored because we only consider unique fragments. Refer to ATAC duplicate marking.

Author

Aaron Lun

Examples

# Mocking up the fragment file.
seq.lengths <- c(chrA = 2000, chrB = 10000)
temp <- tempfile(fileext = ".gz")
mockFragmentFile(temp, seq.lengths, 1e3, cell.names = LETTERS)

# Running the counter
out <- tempfile(fileext=".h5")
counted <- saveTileMatrix(temp, output.file=out, output.name="WHEE", seq.lengths=seq.lengths)
counted
#> $tiles
#> GRanges object with 24 ranges and 0 metadata columns:
#>        seqnames     ranges strand
#>           <Rle>  <IRanges>  <Rle>
#>    [1]     chrA      1-500      *
#>    [2]     chrA   501-1000      *
#>    [3]     chrA  1001-1500      *
#>    [4]     chrA  1501-2000      *
#>    [5]     chrB      1-500      *
#>    ...      ...        ...    ...
#>   [20]     chrB  7501-8000      *
#>   [21]     chrB  8001-8500      *
#>   [22]     chrB  8501-9000      *
#>   [23]     chrB  9001-9500      *
#>   [24]     chrB 9501-10000      *
#>   -------
#>   seqinfo: 2 sequences from an unspecified genome
#> 
#> $counts
#> <24 x 26> sparse DelayedMatrix object of type "integer":
#>         P   K   B   J ...   N   E   F   G
#>  [1,] 127 127 128 116   . 136 122 113 118
#>  [2,] 181 198 203 192   . 198 166 185 163
#>  [3,] 196 230 215 237   . 231 214 217 215
#>  [4,] 253 239 248 247   . 265 282 242 224
#>  [5,]  26  21  34  29   .  17  25  26  25
#>   ...   .   .   .   .   .   .   .   .   .
#> [20,]  32  38  34  46   .  43  42  52  36
#> [21,]  44  48  58  40   .  50  50  39  36
#> [22,]  44  45  33  37   .  39  52  40  42
#> [23,]  50  49  52  37   .  35  49  42  43
#> [24,]  53  46  59  49   .  44  47  49  44
#>