For an example of an input file with the various sections completed see here. For a detailed explanation of each section in the input files, see below.
Note: This page lists the most common inputs only. SAPP also allows more advanced users to tailor more complex functions – see the Advanced Inputs page for more details.
This is the only required section in the input file and, as its name implies, is a list of kit numbers and the associated STR results. STRs should be listed in Family Tree DNA order with only four values for DYS464 (i.e. no 464e/f/g). Kit labels may be numeric or contain letters but should NOT include the characters “-” or “.”.
Beyond these restrictions the program is very forgiving; you can separate the STR results onto different lines or include multi-copy marker designations like “8-10” or “15-16-16-17” and SAPP will correctly interpret the line.
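As a rough illustration of the rules above, the following Python sketch checks a kit label for the two disallowed characters and splits multi-copy designations into single values. The function names are hypothetical and this is not SAPP's own parser; it only mirrors the behavior described in the text.

```python
def is_valid_kit_label(label):
    """Return True if the label avoids the characters "-" and "."."""
    return bool(label) and "-" not in label and "." not in label


def expand_multicopy(tokens):
    """Split multi-copy designations such as "15-16-16-17" into single values."""
    values = []
    for token in tokens:
        # A token without "-" passes through unchanged as a one-item list.
        values.extend(part for part in token.split("-") if part)
    return values


print(is_valid_kit_label("f0001"))    # True
print(is_valid_kit_label("kit-12"))   # False
print(expand_multicopy(["13", "8-10", "15-16-16-17"]))
# ['13', '8', '10', '15', '16', '16', '17']
```

Checking labels before a run is a cheap way to catch a stray “-” or “.” that might otherwise be read as part of an STR value.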
This optional section gives the SNP results for each kit. There should be one line per kit of the format “Kit (SNP1+ SNP2- SNP3?…)” where the special characters indicate whether that kit is positive (+), negative (-), or unknown (?) for that SNP. The parentheses are optional if all SNP labels fit on the same line after the kit name. SNPs without a special character mark are assumed to be positive. Any SNPs NOT included in the input file are assumed to be unknown.
Besides marking a SNP as positive (+), negative (-) or unknown (?), you can also mark it with an asterisk (*). This indicates that the kit was positive for that SNP but negative for ALL known SNPs below it (meaning that SNP is currently that kit’s “terminal SNP”).
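The marking scheme can be sketched in Python as follows. This is a minimal illustration of the line format described above, not SAPP's actual parser; the function name, the status words, and the sample kit and SNP labels are all assumptions made for the example.

```python
STATUS = {"+": "positive", "-": "negative", "?": "unknown", "*": "terminal"}


def parse_snp_line(line):
    """Parse 'Kit (SNP1+ SNP2- SNP3?)' into (kit, {snp_label: status})."""
    parts = line.split(None, 1)
    kit = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    # Parentheses are optional, so they are simply treated as whitespace.
    tokens = rest.replace("(", " ").replace(")", " ").split()
    results = {}
    for token in tokens:
        if len(token) > 1 and token[-1] in STATUS:
            results[token[:-1]] = STATUS[token[-1]]
        else:
            # A label with no special character is assumed to be positive.
            results[token] = "positive"
    return kit, results


kit, snps = parse_snp_line("f0001 (SNP1+ SNP2- SNP3? SNP4* SNP5)")
print(kit)   # f0001
print(snps)
```

SNPs that do not appear on the line are absent from the dictionary, which matches the rule that unlisted SNPs are treated as unknown.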
If the SNP has no official label you can create your own with the position reference like “UN16345607” or any other label that makes sense to you. Do NOT use the format “16345607C-G” as a label since that contains the “-” special character and SAPP will interpret it incorrectly.
You do not need to list ALL SNPs tested for particular kits or SNPs for EVERY kit. SAPP will try to map every SNP you list but the only ones that really matter here are SNPs shared by some in the group and not others since those help define which kits are more closely related than others. Private SNPs (only found in one kit) or SNPs shared by ALL in the group are not particularly useful in re-creating the connections between group members.
Note that SAPP has an internal representation of certain major haplogroups for the known Y-SNP tree (as of this writing, mainly R1b-U106, R1b-L2, and R1b-L21) and will use it to fill in any un-indicated references – for instance a kit marked “+” for DF21 would by definition be L513-, and so on. It will however only recognize these shared SNPs if you use their traditional SNP label (some regular synonyms are also recognized).
This optional section indicates shared ancestors between kits in the group. There should be one line per ancestor with the format “AncestorLabel (kit1+ kit2- kit3?)”, where “AncestorLabel” is any meaningful label for that ancestor and the “+”, “-”, and “?” characters indicate whether those kits are descendants, NOT descendants, or unknown for that ancestor. Kits without a special character are assumed to be positive. Any kits not included are assumed to be NOT descendants.
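The defaults for this section differ from the SNP section: an unlisted kit counts as NOT a descendant rather than unknown. The Python sketch below makes that explicit. It is an illustration only, with a hypothetical function name and sample labels, and it is not taken from SAPP itself.

```python
MARKS = {"+": "descendant", "-": "not descendant", "?": "unknown"}


def parse_ancestor_line(line, all_kits):
    """Parse 'AncestorLabel (kit1+ kit2- kit3?)' into (label, {kit: status})."""
    label, _, body = line.partition("(")
    # Kits that are not listed at all default to "not descendant".
    status = {kit: "not descendant" for kit in all_kits}
    for token in body.replace(")", " ").split():
        if len(token) > 1 and token[-1] in MARKS:
            status[token[:-1]] = MARKS[token[-1]]
        else:
            # A kit with no special character is assumed to be a descendant.
            status[token] = "descendant"
    return label.strip(), status


label, status = parse_ancestor_line(
    "Smiths (kit1 kit2+ kit3?)", ["kit1", "kit2", "kit3", "kit4"]
)
print(label)    # Smiths
print(status)
```

Here kit4 ends up as “not descendant” even though it never appears on the line, so any kit whose descent is genuinely uncertain needs an explicit “?”.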
You can also use this section to check what happens to the tree if certain groups are more closely related than others. For example, you could put all the kits for a certain surname under one “ancestor” (like: “Smiths (kit1 kit2 kit3…)” ). If you did that without any kits marked as ?, however, remember that you’re telling SAPP that no other kits could be NPEs off the Smith families.
Use this optional section to designate a starting Modal haplotype different from the Group Modal that SAPP calculates from the most frequent values in the STR input. Since the Modal is the starting point at the highest point of the tree, selecting the right values can significantly change the evolution of STRs down the various branches.
This optional section adjusts the TMRCAs for particular nodes in the tree. Use it after the tree is complete to correct the TMRCAs for known ancestors. SAPP will also adjust earlier and later TMRCAs in the tree to compensate for the correction. Adjusted TMRCAs will show with a 0 error range.
The format is “NODE nn yyyy”, where “nn” is the node number and “yyyy” is the birth year (or any close year) for that common ancestor. Use a different line for each node to calibrate. Note that SAPP will display an approximate number of generations for calibrated years also.
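To make the calibration format concrete, here is a small Python sketch that reads a “NODE nn yyyy” line and estimates a generation count from the year. The reference year and the rounding are assumptions made for this example (SAPP's own calculation is not documented here), and 28 years per generation is the default mentioned for the YEARSPERGEN option.

```python
def parse_calibration(line):
    """Parse 'NODE nn yyyy' into (node_number, year)."""
    keyword, node, year = line.split()
    if keyword.upper() != "NODE":
        raise ValueError("calibration lines must start with NODE")
    return int(node), int(year)


def approx_generations(birth_year, reference_year, years_per_gen=28):
    """Estimate generations between birth_year and reference_year.

    reference_year is a hypothetical parameter for this sketch; which year
    SAPP counts back from is not stated in the text.
    """
    return round((reference_year - birth_year) / years_per_gen)


node, year = parse_calibration("NODE 12 1650")
print(node, year)                        # 12 1650
print(approx_generations(year, 1950))    # 11
```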
Use this optional section to indicate STRs for SAPP to skip when building the tree and calculating genetic distances. You might typically want to do this for fast-mutating STRs like 464 or CDY where their multiple variations may be confusing. Specify any STRs to skip on one line with spaces in between; you can use either their full names (“DYS464a”) or short-hand designations (“464a”, etc). However if you want to skip multi-copy markers, please indicate ALL the associated STR marker names (to skip 464 for instance put “464a 464b 464c 464d”).
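Because every copy of a multi-copy marker has to be named, it can help to generate the skip line rather than type it. The Python sketch below does that with hypothetical helper names; it assumes the copies are lettered a, b, c, and so on, as in the 464 example above.

```python
from string import ascii_lowercase


def multicopy_names(marker, copies):
    """Return names like ['464a', '464b', ...] for a multi-copy marker."""
    return [marker + ascii_lowercase[i] for i in range(copies)]


def skip_line(*groups):
    """Join several lists of marker names into one space-separated line."""
    return " ".join(name for group in groups for name in group)


print(skip_line(multicopy_names("464", 4), multicopy_names("CDY", 2)))
# 464a 464b 464c 464d CDYa CDYb
```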
Use this optional section to put additional information within each kit’s box. For instance if your kits are labeled by their reference id, you could list surnames here or project group information. The field will wrap in the box but too much information may become unreadable or overlap with other text. The format is simply “kit text” each on one line where kit is the kit id used in the other sections and “text” can be any text including spaces. You can also put text in node (branching) boxes by using a node’s number in place of “kit”.
This is another optional section for showing information against various kits and nodes (branches); it attaches common labels to groups of kits or nodes. These labels are not used in building the tree but can be useful reference information. The format here is “Group Text (kit1 kit2 kit3)” where “Group Text” is the label (with spaces as desired) and “kit1 kit2 kit3” are any number of kits (or node numbers) to label. The labels will appear above the boxes in red text (or just in red text in the text output).
This cosmetic optional section is used AFTER the tree is complete to remove unwanted nodes. Please note this is a cosmetic change only and the mutations and TMRCAs on the tree will not change. Removing a node may also eliminate the SNPs or ancestor information marked on that node! The format here is “NODE n”, one on each line of the section, where “n” is the number of the node to remove (note: the Group MRCA node and kit nodes cannot be removed).
Another cosmetic optional section, this changes the colors for certain kits or nodes. SAPP has 10 predefined colors you can use or you can supply the RGB values for any color you like. The format is “nn (kit1 kit2 kit3…)” where “nn” is a number from 1-10 and the kits are listed within optional parentheses. You can also supply colors as “r g b (kit1 kit2 kit3…)” where “r”, “g”, and “b” are the three numbers between 1-255 designating a color by RGB value. To include a node in the list of kits, enter its node number (e.g. “1 (f0001 f0002 44)” would change the color of node 44 as well as kits f0001 and f0002).
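The two color forms can be told apart by how many numbers precede the list of kits. The Python sketch below illustrates this for the parenthesized form only, since without parentheses a leading node number could be confused with a color value. The function name and return shape are assumptions for the example, and the accepted ranges are the ones stated above.

```python
def parse_color_line(line):
    """Parse 'nn (kits...)' or 'r g b (kits...)' into (color, targets)."""
    head, sep, body = line.partition("(")
    if not sep:
        raise ValueError("this sketch only handles the parenthesized form")
    numbers = [int(n) for n in head.split()]
    targets = body.replace(")", " ").split()
    if len(numbers) == 1 and 1 <= numbers[0] <= 10:
        color = ("predefined", numbers[0])
    elif len(numbers) == 3 and all(1 <= n <= 255 for n in numbers):
        color = ("rgb", tuple(numbers))
    else:
        raise ValueError("expected one number 1-10 or three numbers 1-255")
    return color, targets


print(parse_color_line("1 (f0001 f0002 44)"))
# (('predefined', 1), ['f0001', 'f0002', '44'])
print(parse_color_line("200 30 30 (f0003)"))
# (('rgb', (200, 30, 30)), ['f0003'])
```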
This section can be used to override certain normal behaviors in SAPP. At this time the available settings are listed below. Each of these one-word options should be specified on its own line, with no other parameters on that line.
TEXTOUT This parameter causes SAPP to produce the phylogenetic tree in a text-based (HTML) format rather than as a PNG image.
CSVOUT This parameter causes SAPP to produce the phylogenetic tree as a CSV file with data on each kit or node in columns rather than as a PNG image.
NEWICK This parameter causes SAPP to produce the phylogenetic tree in Newick format (TXT file) with branch lengths in generations rather than as a PNG image.
NOTMRCAS This parameter suppresses the reporting of estimated TMRCAs on the output image or text tree.
NODBTREE This parameter suppresses the use of SAPP’s internal SNP tree. Use only as an advanced option if you want SAPP to rely ONLY on the /SNPTREE section you entered.
SHOWMODALS This parameter causes SAPP to additionally report (in the table of kit data) the calculated haplotypes for all nodes with SNP or Genealogy labels.
SHOWMUTATIONS This parameter adds a new table in the HTML output that calculates the effective STR mutation rates in the output tree and compares them to the general rates used. The total span in years of the tree is also displayed using the TMRCA that was calculated for the Group MRCA down to present day.
SHOWREASONS This parameter displays text under each box giving a Grade (“A” highest to “C” lowest) as a confidence assessment for that box’s placement on the tree.
REPORTNULLS This parameter turns on reporting of null values (‘N’) as mutations. Normally SAPP triangulates a value for these positions and considers Nulls as no change in mutation.
NOWARN This parameter suppresses the display of Warning messages in the output.
NOSHRINK This parameter expands the output tree to show ALL branches between any two kits or nodes.
PFACTOR n This advanced parameter adjusts the sensitivity of the signature recognition algorithm. The default setting is 80. Enter smaller whole numbers for “n” to relax the sensitivity so that single faster STRs are recognized as signatures. A Pfactor of less than 4 will recognize even CDYa/b as a signature. Enter numbers higher than 80 to reduce the sensitivity.
YEARSPERGEN n This advanced parameter adjusts the years-per-generation when calculating TMRCAs, normally set at 28 years per generation.
Options affecting display of node text and STRs, SNPs and Genealogy labels:
NOCUTOFF This parameter stops the cut-off of the STR mutation history and SNP/Genealogy Labels. All text past the (…) will be displayed. Warning: this may run over or under other parts of the tree!
OFFRIGHT This parameter moves the reported STR mutations to the right of each node, so they don’t cover the kit or node information if they run long.
OFFLABELS This parameter moves the reported SNP/Genealogy labels to the right of each node, so they don’t cover the kit or node information if they run long.
NOLABELS This parameter suppresses the SNP/Genealogy labels displayed on the output PNG tree branches (but is ignored for the text tree).
NOSTRS This parameter suppresses the STR mutations displayed on the output PNG tree branches (but is ignored for the text tree).
NOINFO This parameter suppresses the use of text from the /INFO section if this section exists. This provides a simple way to exclude unwanted text for instance from the NEWICK or other formats.
Any line that starts with an asterisk (*) is interpreted by the program as a comment line and ignored. Apart from allowing you to annotate your input file, this also allows you to “turn off and on” certain sections between runs by “commenting them out” – i.e. if you add an asterisk at the start of each line of that section you can run SAPP without that section, then add it back in for the next run by taking out the asterisks. This allows you to quickly change input without losing the sections.
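If you keep your input file under script control, the commenting trick can be automated. The Python sketch below shows one way to do it with hypothetical helper names; it works on a list of lines that you have already selected as belonging to one section, and it is not part of SAPP.

```python
def comment_out(lines):
    """Prefix each line with "*" unless it is already a comment."""
    return [line if line.startswith("*") else "*" + line for line in lines]


def uncomment(lines):
    """Remove one leading "*" from each commented line."""
    return [line[1:] if line.startswith("*") else line for line in lines]


def active_lines(lines):
    """Return only the lines that would not be ignored as comments."""
    return [line for line in lines if not line.startswith("*")]


section = ["NODE 12 1650", "NODE 15 1720"]
disabled = comment_out(section)
print(disabled)                # ['*NODE 12 1650', '*NODE 15 1720']
print(active_lines(disabled))  # []
print(uncomment(disabled))     # ['NODE 12 1650', 'NODE 15 1720']
```

Note that uncomment also strips the asterisk from lines that were ordinary annotations to begin with, so keep your own notes out of the lines you toggle.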