xenomapper2

xenomapper2.xenomapper2 module

class xenomapper2.xenomapper2.AlignbatchFileReader(*args, **kwargs)

Bases: pylazybam.bam.FileReader

A BAM file reader that iterates batches of reads with the same name

Parameters:ubam (BinaryIO) – An binary (bytes) file or stream containing a valid uncompressed bam file conforming to the specification.
Yields:alignbatch (List[bytes]) – A byte string of a bam alignment entry in raw binary format
raw_header

The raw bytestring representing the bam header

Type:bytes
raw_refs

The raw bytestring representing the reference sequences

Type:bytes
header

The ASCII representation of the header

Type:str
refs

A dictionary of reference_name keys with reference_length

Type:Dict[str:int]
ref_to_index

A dictionary mapping reference names to the bam numeric identifier

Type:Dict[str:int]
index_to_ref

A dictionary mapping bam reference numeric identifiers to names

Type:Dict[int:str]

Notes

It is advisable not to call the private functions or operate directly on the underlying file object.

The detailed specification for the BAM format can be found at https://samtools.github.io/hts-specs/SAMv1.pdf

Example

This class requires an uncompressed bam file as input. For this example we will use gzip to decompress the test file included as a resource in the package

>>> import gzip
>>> from pkg_resources import resource_stream
>>> ubam = gzip.open('/tests/data/paired_end_testdata_human.bam'))

A bam filereader object can then be created and headers inspected

>>> mybam = bam.FileReader(ubam)
>>> print(mybam.header)

The filereader object is an iterator and yields batches of alignments in raw BAM binary format

>>> align_batch = next(mybam)
>>> for align in align_batch:
>>>     print(mybam.index_to_ref[get_ref_index(align)],
>>>           get_pos(align),
>>>           get_AS(align))
close()

Close input file

get_full_raw_header()

Get a complete BAM header with all elements suitable for writing to a bam.FileWriter

Returns:The complete raw BAM header including the BAM magic and reference block
Return type:bytes
get_updated_header(id: str, program: str, version: str, command: str = None, description: str = None, raw_header=None)

Get a modified version of the header with additional program information in a @PG BAM header suitable for writing to a new output file.

Does not include the BAM magic or the BAM reference block.

Note that the header should be written after self.magic and before self.raw_refs

Parameters:
  • id (str) – a unique identifier of this action of the program on the BAM
  • program (str) – the name of the program used to process the BAM
  • version (str) – version number of the program used to process the BAM
  • command (str) – the command and arguments to the program used to process the BAM
  • description (str) – optional description
  • raw_header (bytes) – a raw BAM formatted header without BAM magic or REF block used for recursive modification to add multiple

Notes

Parameters will be utf-8 encoded If there are existing @PG records the last record will be used as PP

reset_alignments()

Reset the file pointer to the beginning of the alignment block

update_header(*args, **kwargs)

Use get_updated_header to update self.raw_header inplace

See get_updated_header for Parameters and documentation

update_header_length(raw_header=None)

Update the length of the SAM format text component of the header

Parameters:raw_header (bytes) – optional raw SAM text header as bytes to process Default self.raw_header
Returns:returns the length corrected raw header if a raw header is provided
Return type:bytes
class xenomapper2.xenomapper2.DummyFile(*args, **kwargs)

Bases: object

A dummy io class - looks like pylazybam.bam.FileWriter but does nothing

close()
get_updated_header(*args, **kwargs)
seekable()
update_header(*args, **kwargs)
update_header_length(*args, **kwargs)
writable()
write(data)
write_header(*args, **kwargs)
class xenomapper2.xenomapper2.XenomapperOutputWriter(primary_raw_header: bytes, primary_raw_refs: bytes, secondary_raw_header: bytes, secondary_raw_refs: bytes, primary_specific: Union[str, pathlib.Path, BinaryIO, None] = None, primary_multi: Union[str, pathlib.Path, BinaryIO, None] = None, secondary_specific: Union[str, pathlib.Path, BinaryIO, None] = None, secondary_multi: Union[str, pathlib.Path, BinaryIO, None] = None, unresolved: Union[str, pathlib.Path, BinaryIO, None] = None, unassigned: Union[str, pathlib.Path, BinaryIO, None] = None, basename: str = None, cmdline: str = '', compresslevel: int = 6)

Bases: object

Container class grouping xenomapper output files & manipulating headers

Updated headers are written to these files on initialization. The primary BAM header is used for all files except secondary_specific and secondary_multi. A @PO line is added to the file indicating the id and program as xenomapper with the version, a description field indicating the type of xenomapper output in the file (eg primary_specific) and the command used to run it.

Parameters:
  • primary_raw_header (bytes) – the raw header from the primary species BAM file
  • primary_raw_refs (bytes) – the raw reference sequence info from the primary species BAM file
  • secondary_raw_header (bytes) – the raw header from the secondary species BAM file
  • secondary_raw_refs (bytes) – the raw reference sequence info from the secondary species BAM file
  • primary_specific (str or Path or BinaryIO, optional) – output file for uniquely mapping primary specific alignments [ Default : None ]
  • primary_multi (str or Path or BinaryIO, optional) – output file for multimapping primary specific alignments [ Default : None ]
  • secondary_specific (str or Path or BinaryIO, optional) – output file for uniquely mapping secondary specific alignments [ Default : None ]
  • secondary_multi (str or Path or BinaryIO, optional) – output file for multimapping secondary specific alignments [ Default : None ]
  • unresolved (str or Path or BinaryIO, optional) – output file for alignments where primary or secondary is ambiguous [ Default : None ]
  • unassigned (str or Path or BinaryIO, optional) – output file for reads that do not map sufficiently well in either BAM [ Default : None ]
  • basename (str, optional) – filename stem to derive all output filenames <basename>_<outfilename> (eg basename=’Foo’ -> Foo_primary_specific, … Foo_unassigned) [ Default : None ]
  • cmdline (str, optional) – The commandline to include in the output BAM header
  • compresslevel (int, optional) – gzip compression level for output file [ Default : 6 ]
Returns:

Return type:

A container class with __get_item__ , keys, close

Examples

>>> xow = XenomapperOutputWriter(phead,pref,shead,sref,basename='/foo')
>>> for key in xow.keys():
>>>     xow[key].write(data)
>>> xow.close()
close()
keys()

return the keys for the file output objects :returns: :rtype: Iterable[str]

xenomapper2.xenomapper2.always_very_negative(*args, **kwargs) → int

A function that returns -2**31 for any combination of parameters :returns: always -2**31 == MIN32INT == -2147483648 :rtype: int

See also

always_zero()

xenomapper2.xenomapper2.always_zero(*args, **kwargs) → int

A function that returns zero for any combination of parameters :returns: always 0 :rtype: int

xenomapper2.xenomapper2.calc_cigar_based_score(cigar_string: bytes, NM: int = None, mismatch: int = -6, gap_open: int = -5, gap_extend: int = -3, softclip: int = -2) → int

Calculate values for a score equivalent to a rescaled AS tag using the SAM formatted cigar line and NM tags from an alignment. Score is scaled such that a perfect match for the full length of the read achieves a score of 0. Standard penalties are: mismatch -6, gap open -5, gap extend -3, softclipped -2 (equiv to +2 match)

Parameters:
  • align (bytes) – the raw alignment entry from a BAM file
  • mismatch (int) – score penalty for a mismatch [ Default : -6 ]
  • gap_open (int) – score penalty for opening/initiating a gap [ Default : -5 ]
  • gap_extend (int) – score penalty for extending an existing gap [ Default : -3 ]
  • softclip (int) – score penalty for removing bases at the start or end of a read [ Default : -2 ]
Returns:

The AS like score calculated from the cigar string 0 maximum and mismatch*seq_length theoretical minimum

Return type:

int

xenomapper2.xenomapper2.conservative_state_map(forward_state, reverse_state)

Conservative tate map for inferring fragment state from forward & reverse Reads where there is ambiguous evidence will be assigned the least specific state.

Parameters:
  • forward_state (str) – a xenomapper state for the forward read (primary_specific,primary_multi,secondary_specific,secondary_multi, unassigned, unresolved)
  • reverse_state (str or None) – a xenomapper state for the reverse read or None (primary_specific,primary_multi,secondary_specific,secondary_multi, unassigned, unresolved) or None
Returns:

a xenomapper state for the fragment (primary_specific,primary_multi,secondary_specific,secondary_multi, unassigned, unresolved)

Return type:

str

Notes

In future these will be of type PEP586 Literal

xenomapper2.xenomapper2.get_bamprimary_AS_XS(alignments: List[bytes], AS_function=<function get_AS>, XS_function=<function get_XS>) → Tuple[int, int]

Get the AS and XS of the primary alignment

Parameters:
  • alignments (List[bytes]) – a list of binary format BAM alignments from the same read. Should contain only forward or reverse reads and only one primary flag
  • AS_function – a function that accepts a BAM alignment bytestring and returns the AS score (alignment score) as an integer eg pylazybam.bam.get_AS or
  • XS_function – a function that accepts a BAM alignment bytestring and returns the XS score as an integer
Returns:

a tuple of the alignment score AS and suboptimal alignment score XS

Return type:

Tuple[int,int]

Raises:

ValueError – raises value error if there is no primary or more than one primary flag

xenomapper2.xenomapper2.get_cigar_based_score(align: bytes, no_tag: int = -2147483648) → int

Returns a score equivalent to a rescaled AS tag from a raw BAM alignment Extracts and decodes the cigar string and NM tags and calls calc_cigar_based_score to calculate

Parameters:
  • align (bytes) – A raw alignment from a BAM file
  • no_tag (int) – The value to return if no NM tag or cigar string is present
Returns:

the AS like score with 0 maximum and -6*seq_length theoretical minimum

Return type:

int

xenomapper2.xenomapper2.get_mapping_state(AS1: int, XS1: int, AS2: int, XS2: int, min_score: int = -2147483648)

Determine the mapping state based on scores in each species. Scores can be negative but better matches must have higher scores

Parameters:
  • AS1 (int) – score of best match in primary species
  • XS1 (int) – score of suboptimal match in primary species
  • AS2 (int) – score of best match in secondary species
  • XS2 (int) – score of suboptimal match in secondary species
  • min_score (int [ Default : -2**31 ]) – the score that matches must exceed in order to be considered valid matches. Note scores equalling this value will also be considered not to match.
Returns:

‘primary_multi’, ‘secondary_multi’, ‘unresolved’, or ‘unassigned’ indicating match state.

Return type:

state - a string of ‘primary_specific’, ‘secondary_specific’,

xenomapper2.xenomapper2.get_max_AS_XS(alignments: List[bytes], AS_function=<function get_AS>, XS_function=<function get_XS>, sort_key=None, default=-2147483648) → Tuple[int, int]

Get the maximum AS and XS from a group of alignments

Parameters:
  • alignments (List[bytes]) – a list of binary format BAM alignments.
  • AS_function (Callable[[bytes], int]) – a function that accepts a BAM alignment bytestring and returns the AS score (alignment score) as an integer eg pylazybam.bam.get_AS or
  • XS_function (Callable[[bytes], int]) – a function that accepts a BAM alignment bytestring and returns the XS score as an integer
  • sort_key (Callable[[Tuple[int,int]], Any]) – a key function that converts tuples to items to be used by sorted()
Returns:

a tuple of the alignment score AS and suboptimal alignment score XS

Return type:

Tuple[int,int]

xenomapper2.xenomapper2.output_summary(category_counts: collections.Counter, title='Read Count Category Summary\n', outfile=<_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>)

Print a summary table based on a Counter

Parameters:
  • category_counts (Counter) –
  • title (str) –
  • outfile (TextIO) –
xenomapper2.xenomapper2.split_forward_reverse(alignments: List[bytes]) → Tuple[List[bytes], List[bytes]]

Split a list of BAM alignments into forward and reverse

Parameters:alignments (List[bytes]) – a list of BAM alignments in raw binary format containing forward and optionally reverse alignments
Returns:a tuple containing a list of forward and reverse BAM reads
Return type:(List[bytes], List[bytes])
Raises:ValueError – if there are any reads that do not have the forward or reverse flag
xenomapper2.xenomapper2.state_map(forward_state: str, reverse_state: str) → str

State map for inferring the fragment state from forward & reverse Reads where there is stronger but not contradictory evidence will be assigned the more specific state.

Parameters:
  • forward_state (str) – a xenomapper state for the forward read (primary_specific,primary_multi,secondary_specific,secondary_multi, unassigned, unresolved)
  • reverse_state (str or None) – a xenomapper state for the reverse read or None (primary_specific,primary_multi,secondary_specific,secondary_multi, unassigned, unresolved) or None
Returns:

a xenomapper state for the fragment (primary_specific,primary_multi,secondary_specific,secondary_multi, unassigned, unresolved)

Return type:

str

Notes

In future these will be of type PEP586 Literal

xenomapper2.xenomapper2.xenomap(primary_bam: xenomapper2.xenomapper2.AlignbatchFileReader, secondary_bam: xenomapper2.xenomapper2.AlignbatchFileReader, output_writer: xenomapper2.xenomapper2.XenomapperOutputWriter, score_function: Callable = <function get_bamprimary_AS_XS>, AS_function: Callable = <function get_AS>, XS_function: Callable = <function get_XS>, min_score: int = -2147483648, conservative: bool = False)

core method to coordinate the xenomapping of BAMS

Parameters:
  • primary_bam (AlignbatchFileReader) – An interable that yields iterables of primary BAM alignments
  • secondary_bam (AlignbatchFileReader) – An interable that yields iterables of secondary BAM alignments
  • output_writer (XenomapperOutputWriter) – output writer that holds output files of type bam.FileWriter
  • score_function (Callable) – a xenomapper alignment batch calling function get_bamprimary_AS_XS or get_max_AS_XS
  • AS_function (Callable[[bytes], int]) – a function that accepts a BAM alignment bytestring and returns the AS score (alignment score) as an integer get_AS or get_cigar_based_score
  • XS_function (Callable[[bytes], int]) – a function that accepts a BAM alignment bytestring and returns the XS score as an integer get_XS, get_ZS, always_zero, or always_very_negative
  • min_score (int) – the score that matches must exceed in order to be considered valid matches. Note scores equalling this value will also be considered not valid matches. [ Default : -2**31 ]
  • conservative
Returns:

Return type:

Tuple[Counter, Counter, XenomapperOutputWriter]

xenomapper2.xenomapper2.xenomap_states(primary_aligns: Iterable[bytes], secondary_aligns: Iterable[bytes], score_function: Callable = <function get_bamprimary_AS_XS>, AS_function: Callable = <function get_AS>, XS_function: Callable = <function get_XS>, min_score: int = -2147483648) → Tuple[int, int]

Get the xenomapping state for the forward and reverse reads

primary_aligns : List[bytes]
a list of binary format BAM alignments from the primary BAM.
secondary_aligns : List[bytes]
a list of binary format BAM alignments from the primary BAM.
score_function : Callable
a xenomapper alignment batch calling function get_bamprimary_AS_XS or get_max_AS_XS
AS_function : Callable[[bytes], int]
a function that accepts a BAM alignment bytestring and returns the AS score (alignment score) as an integer get_AS or get_cigar_based_score
XS_function : Callable[[bytes], int]
a function that accepts a BAM alignment bytestring and returns the XS score as an integer get_XS, get_ZS, always_zero, or always_very_negative
min_score : int [ Default : -2**31 ]
the score that matches must exceed in order to be considered valid matches. Note scores equalling this value will also be considered not valid matches.
Returns:the xenomapper state of the forward and reverse read
Return type:Tuple[str,str]

xenomapper2.cli module

Format decoders for BAM alignment data types

xenomapper2.cli.main(arguments: str = None, output=<_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>) → Tuple[collections.Counter, collections.Counter]

main function for orchestrating a xenomapper2 run

Parameters:
  • arguments (str) – a string of unix command line arguments - used for testing
  • output (TextIO) – destination to print progress and results to - used for testing
Returns:

  • Counter – a counter of pair states
  • Counter – a counter of read states

pylazybam.tags module

Functions for extracting and decoding SAM tag data from BAM alignments

pylazybam.tags.get_AS(tag_bytes: bytes, no_tag: Any = -2147483648) → int

Extract the high scoring alignment score from an AS tag in a raw BAM alignment bytestring

Parameters:
  • tag_bytes (bytes) – a bytestring containing bam formatted tag elements
  • no_tag (Any) – return value for when tag not found (default: MIN32INT)
Returns:

AS Tag Value – the integer value of the AS tag returns the value of no_tag if tag absent (default:MIN32INT)

Return type:

int

Raises:

ValueError – raises a ValueError if more than one tag match

Notes

Recommended try accept for use on raw alignment with fall back to calling on only the tag byte string.

Please test carefully on your BAM output as in complicated output the regular expression based extraction of the tag can be error prone

pylazybam.tags.get_MD(tag_bytes: bytes, no_tag: Any = None) → str

Extract the MD tag from a raw BAM alignment bytestring

Parameters:
  • tag_bytes (bytes) – a bytestring containing bam formatted tag elements
  • no_tag (Any) – return value for when tag not found (default: None)
Returns:

MD Tag Value – an ASCII string representing the SAM format value of the MD tag returns the value of no_tag if tag absent (default: None)

Return type:

str

Raises:

ValueError – raises a ValueError if more than one tag match

Notes

The MD field aims to achieve SNP/indel calling independent of the reference. Numbers represent matches, letters bases that differ from the reference and bases preceeded by ^ are deletions. A ^T0A indicates a base change to an A immediately following a deleted T.

Recommended try accept for use on raw alignment with fall back to calling on only the tag byte string.

Please test carefully on your BAM output as in complicated output the regular expression based extraction of the tag can be error prone

pylazybam.tags.get_XS(tag_bytes: bytes, no_tag: Any = -2147483648) → int

Extract the suboptimal alignment score from an XS tag in a raw BAM alignment bytestring

Parameters:
  • tag_bytes (bytes) – a bytestring containing bam formatted tag elements
  • no_tag (Any) – return value for when tag not found (default: MIN32INT)
Returns:

XS Tag Value – the integer value of the XS tag returns the value of no_tag if tag absent (default:MIN32INT)

Return type:

int

Raises:

ValueError – raises a ValueError if more than one tag match

Notes

This function is for the genome aligner definition of XS where XS:i:<int> is the alignment score of the suboptimal alignment. This is not the same as the spliced aligner XS tag XS:C:<str> that represents the strand on which the intron occurs (equiv to TS:C:<str>)

Recommended try accept for use on raw alignment with fall back to calling on only the tag byte string.

Please test carefully on your BAM output as in complicated output the regular expression based extraction of the tag can be error prone

pylazybam.tags.get_ZS(tag_bytes: bytes, no_tag: Any = -2147483648) → int

Extract the suboptimal alignment score from the ZS tag in a raw BAM alignment bytestring

Parameters:
  • tag_bytes (bytes) – a bytestring containing bam formatted tag elements
  • no_tag (Any) – return value for when tag not found (default: MIN32INT)
Returns:

ZS Tag Value – the integer value of the ZS tag returns the value of no_tag if tag absent (default:MIN32INT)

Return type:

int

Raises:

ValueError – raises a ValueError if more than one tag match

Notes

ZS is the equivalent to XS:i:<int> tag in some spliced aligners including HISAT2.

Recommended try accept for use on raw alignment with fall back to calling on only the tag byte string.

Please test carefully on your BAM output as in complicated output the regular expression based extraction of the tag can be error prone

pylazybam.tags.get_int_tag(tag_bytes: bytes, tag: bytes, no_tag: Any = -2147483648) → int

Extract an integer format tag from a raw BAM alignment bytestring

Parameters:
  • tag_bytes (bytes) – a bytestring containing bam formatted tag elements
  • tag (bytes) – the two byte tag to be returned eg b’AS’ The tag must represent an integer format tag
  • no_tag (Any) – return value for when tag not found (default: MIN32INT)
Returns:

the integer value of the tag returns the value of no_tag if tag absent (default:MIN32INT)

Return type:

int

Raises:

ValueError – raises a ValueError if more than one tag match

Potential values for the tag parameter include:

AM:i:score The smallest template-independent mapping quality of any
segment in the same template as this read. (See also SM.)

AS:i:score Alignment score generated by aligner.

CP:i:pos Leftmost coordinate of the next hit.

FI:i:int The index of segment in the template.

H0:i:count Number of perfect hits.

H1:i:count Number of 1-difference hits (see also NM).

H2:i:count Number of 2-difference hits.

HI:i:i Query hit index, indicating the alignment record is the
i-th one stored in SAM.
IH:i:count Number of alignments stored in the file that contain the
query in the current record.

MQ:i:score Mapping quality of the mate/next segment.

NH:i:count Number of reported alignments that contain the query in
the current record.
NM:i:count Number of differences (mismatches plus inserted and deleted
bases) between the sequence and reference, counting only (case-insensitive) A, C, G and T bases in sequence and reference as potential matches, with everything else being a mismatch.
PQ:i:score Phred likelihood of the template, conditional on the mapping
locations of both/all segments being correct.
SM:i:score Template-independent mapping quality, i.e., the mapping
quality if the read were mapped as a single read rather than as part of a read pair or template.

TC:i: The number of segments in the template.

UQ:i: Phred likelihood of the segment,
conditional on the mapping being correct.

see https://samtools.github.io/hts-specs/SAMtags.pdf

pylazybam.tags.get_str_tag(tag_bytes: bytes, tag: bytes, no_tag: Any = None) → str

Extract an integer format tag from a raw BAM alignment bytestring

Parameters:
  • tag_bytes (bytes) – a bytestring containing bam formatted tag elements
  • tag (bytes) – the two byte tag to be returned eg b’AS’ The tag must represent an integer format tag
  • no_tag (Any) – return value for when tag not found (default: None)
Returns:

MD Tag Value – an ASCII string representing the SAM format value of the MD tag returns the value of no_tag if tag absent (default: None)

Return type:

str

Raises:

ValueError – raises a ValueError if more than one tag match

Notes

Potential values for the tag parameter include:

BQ:Z:qualities Offset to base alignment quality (BAQ), of the same
length as the read sequence. At the i-th read base, BAQi = Qi − (BQi − 64) where Qi is the i-th base quality

CC:Z:rname Reference name of the next hit; ‘=’ for same chromosome.

E2:Z:bases The 2nd most likely base calls.
Same encoding and same length as SEQ.

FS:Z:str Segment suffix.

MC:Z:cigar CIGAR string for mate/next segment.

MD:Z: String for mismatching positions.

Q2:Z:qualities Phred quality of the mate/next segment sequence in the
R2 tag. Same encoding as QUAL.

R2:Z:bases Sequence of the mate/next segment in the template.

SA:Z: (rname ,pos ,strand ,CIGAR ,mapQ ,NM ;)+
Other canonical alignments in a chimeric alignment, formatted as a semicolon-delimited list.
U2:Z: Phred probability of the 2nd call being wrong
conditional on the best being wrong.

RG:Z:readgroup The read group to which the read belongs.

LB:Z:library The library from which the read has been sequenced.

PG:Z:program id Program. Value matches the header PG-ID tag

PU:Z:platformunit The platform unit in which the read was sequenced.

CO:Z:text Free-text comments.

BC:Z:sequence Barcode sequence (Identifying the sample/library),
with any quality scores (optionally) stored in QT tag. The BC tag should match the QT tag in length.
QT:Z:qualities Phred quality of the sample barcode sequence in BC tag.
Same encoding as QUAL, i.e., Phred score + 33.
CB:Z:str Cell identifier, consisting of the optionally-corrected
cellular barcode sequence and an optional suffix. The sequence part is similar to the CR tag
CR:Z:sequence+ Cellular barcode. The uncorrected sequence bases of the
cellular barcode as reported by the sequencing machine, with the corresponding base quality scores (optionally) stored in CY.
CY:Z:qualities+ Phred quality of the cellular barcode sequence in CR tag
Same encoding as QUAL, i.e., Phred score + 33.
MI:Z:str Molecular Identifier. A unique ID within the SAM file
for the source molecule from which this read is derived.
OX:Z:sequence+ Raw (uncorrected) unique molecular identifier bases,
with quality scores (optionally) stored in the BZ tag.
BZ:Z:qualities+ Phred quality of the (uncorrected) unique molecular
identifier sequence in the OX tag. Same encoding as QUAL, i.e., Phred score + 33.
RX:Z:sequence+ Sequence bases from the unique molecular identifier.
These could be either corrected or uncorrected. Unlike MI, the value may be non-unique in the file.
QX:Z:qualities+ Phred quality of the unique molecular identifier
sequence in the RX tag. Same encoding as QUAL, i.e., Phred score + 33
OA:Z:(RNAME,POS,strand,CIGAR,MAPQ,NM;)+ The original alignment
information of the record prior to realignment or unalignment by a subsequent tool.

OC:Z:cigar Original CIGAR, usually before realignment.

CT:Z:strand ;type (;key (=value )?)* Complete read annotation tag
used for consensus annotation dummy features.
PT:Z:annotag(|annotag)* where each annotag matches
start;end;strand;type(;key(=value)?)* Read annotations for parts of the padded read sequence.

see https://samtools.github.io/hts-specs/SAMtags.pdf

Recommended try accept for use on raw alignment with fall back to calling on only the tag byte string.

Please test carefully on your BAM output as in complicated output the regular expression based extraction of the tag can be error prone