Commands for Preassembly and Assembly

Columns: Name, Description, Parameter Name, Parameter Description, Examples

assemble

Same as using the Assemble button from the Unassembled Sequences window. Instructs SeqMan to begin your assembly.

If you have not yet added sequences to your project using the loadSeq command, then including one of the three file parameters below is required. Using one of the file parameters alone adds the specified file(s) to the project and assembles without trimming, following the current program values for contaminant screening and optimizing assembly order.

Example 1: The following will assemble the segment of Sequence 1 from 50 to 250 bp and the segment of Sequence 2 from 180 bp through the right end of that sequence:

assemble
          file:"datafolder\sequence1.abi" endpnt:(50>250)
          file:"datafolder\sequence2.abi" endpnt:(180>rend)

Example 2: The following will assemble all sequences contained in the folder “SampleSeqs” and its subfolders:

assemble
          file:"DNASTAR\SampleSeqs" expandDir:true

Example 3: The following will 1) create a new project, 2) assemble all files from the folder “Demo SeqMan Pro” using the preassembly conditions of vector scanning, end trimming and assembly order optimization, and 3) save the project as “ScriptProject.sqd”:

newProject
assemble
          file:"C:\Program Files\DNASTAR\Demo SeqMan Pro" expandDir:false
          optimizeOrder:true
          contamScan:false
          vectScan:true
          trimEnds:true
          doAssemble:true
saveProject
          file:"C:\Finished Projects\ScriptProject.sqd"

file:"pathname\filename.ext"

Adds a single sequence. The pathname\filename must exist, must have a valid sequence file extension in place of “.ext”, and must be enclosed in quotes.

The following optional values can be placed after the file parameter to specify that only a subset of the sequence be added:

      endpnt:(49>538) – Adds only the segment of the sequence between the specified coordinates; in this example, the subset of the sequence from coordinates 49 to 538. The paired parentheses must be enclosed in quotes. Because DNASTAR’s scripting language uses the exact values you enter, you may want to be liberal with the numbers. In this case, you might expand the range to 30>600.

      endpnt:(538<49) – Adds the complement of the sequence between the specified coordinates. Because SeqMan considers both orientations of all input reads when computing assemblies, there is no need to specify complementary strands.

      endpnt:(lend>rend) – Adds the entire sequence, from the left end to the right end. This works the same as omitting the endpnt parameter entirely. The paired parentheses must be enclosed in quotes.
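
The three endpnt forms can be read as "left coordinate, right coordinate, complement or not". The following Python sketch only illustrates that reading; it is not SeqMan's own parser. The function name parse_endpnt and the returned tuple layout are hypothetical, and the sketch assumes that surrounding quotes are optional.

```python
import re

# Hypothetical helper (not part of SeqMan): interprets an endpnt value.
# Returns (left, right, complement); left/right are ints or the keywords
# "lend"/"rend" used in the table above.
ENDPNT = re.compile(r'^"?\((\w+)([<>])(\w+)\)"?$')


def _coord(token):
    """Convert one endpoint token to an int or a recognized keyword."""
    if token.isdigit():
        return int(token)
    if token in ("lend", "rend"):
        return token
    raise ValueError(f"unrecognized endpoint: {token!r}")


def parse_endpnt(value):
    match = ENDPNT.match(value.strip())
    if match is None:
        raise ValueError(f"not an endpnt value: {value!r}")
    first, arrow, second = match.groups()
    if arrow == ">":
        # (49>538): the segment as given
        return _coord(first), _coord(second), False
    # (538<49): the complement of the segment between 49 and 538
    return _coord(second), _coord(first), True


print(parse_endpnt("(49>538)"))
print(parse_endpnt("(538<49)"))
print(parse_endpnt("(lend>rend)"))
```
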

file:"pathname\filename.fof"

Adds a file of filenames. The pathname\filename must exist, must have a valid extension (usually “.fof”), and must be enclosed in quotes. When using a file of filenames, SeqMan assumes that vector trimming has already been completed. Therefore, using a file of filenames for an assembly will prevent some other options (e.g., extend ends) from functioning as expected.

file:"pathname\folder"

Adds an entire folder. The pathname\folder must exist and must be enclosed in quotes.

Use the following optional expandDir values to specify whether or not to use sequences located in subfolders:

expandDir:true

Includes sequences located in subfolders within the specified data folder.

expandDir:false

Excludes sequences located in subfolders within the specified data folder.

To change preassembly options, include one or more of the following optional parameters for the assemble command:

optimizeOrder:true

Optimizes assembly order.

optimizeOrder:false

Does not optimize assembly order.

contamScan:true

Executes contaminant screening, provided that one or more valid contaminant sequences are specified.

contamScan:false

Does not execute contaminant screening.

vectScan:true

Executes vector scanning and trimming, provided that a default vector is specified.

vectScan:false

Does not execute vector scanning and trimming.

trimEnds:true

Executes quality trimming.

trimEnds:false

Does not execute quality trimming.

doAssemble:true or noAssemble:false

Either parameter causes assembly to be executed.

doAssemble:false or noAssemble:true

Either parameter causes assembly not to be executed. This is useful if you want to perform trimming and perhaps save a file of filenames and a trim report, but do not wish to assemble data yet.
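
When many runs need the same preassembly settings, the text of an assemble block can be generated rather than typed. The Python sketch below shows one way to do that using only the parameter names listed above. The helper assemble_block and the folder path are hypothetical, and the sketch assumes that the line layout of the earlier examples is acceptable to SeqMan.

```python
# Parameter names are taken from the table above; the helper is hypothetical.
PREASSEMBLY_FLAGS = ("optimizeOrder", "contamScan", "vectScan", "trimEnds", "doAssemble")


def assemble_block(folder, expand_dir=False, **flags):
    """Return the text of an assemble command for one data folder."""
    unknown = set(flags) - set(PREASSEMBLY_FLAGS)
    if unknown:
        raise ValueError(f"unsupported parameter(s): {sorted(unknown)}")
    lines = ["assemble", f'    file:"{folder}" expandDir:{str(expand_dir).lower()}']
    for name in PREASSEMBLY_FLAGS:  # keep a stable, readable order
        if name in flags:
            lines.append(f"    {name}:{str(flags[name]).lower()}")
    return "\n".join(lines)


# Trim only, without assembling yet (illustrative folder path).
script = assemble_block(r"C:\Data\Run1", vectScan=True, trimEnds=True, doAssemble=False)
print(script)
```

The doAssemble=False call corresponds to the trimming-only case described above.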

assembleInGroups

Same as the Assemble in Groups feature, accessed from the Unassembled Sequences window.

If the loadReference command appears in your script before the assembleInGroups command, the specified reference sequence is added to each contig in the assembly.

file:[quoted file name]

Specifies the directory and file name of the sequence file(s) to be loaded. A folder may also be specified, in which case all of the sequence files within that folder will be loaded. The directory and file/folder name must be enclosed in quotes. If a folder is specified, a final backslash “\” is necessary. This parameter is not necessary if loadSeq is used.

If your sequences have names like 1A_forward.abi, 1B_forward.abi, 1A_reverse.abi, 1B_reverse.abi, etc., the following command specifies that the forward and reverse reads for the 1A sequence be assembled together, the forward and reverse reads for the 1B sequence be assembled together, and so on.

assembleInGroups
          file: "Data\"
          match: "(\d[abc]).*.abi"

match: [quoted grep expression]

Specifies the portion of the file name that SeqMan uses to identify a sequence as part of a group. The expression is written in a subset of regular expressions that uses elements of the grep language.
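
To preview which files a match expression would group together, the expression can be tried outside SeqMan. The Python sketch below applies the expression from the example to the sample file names and groups them by the text captured in the parentheses. It rests on two assumptions: that Python's re module behaves like SeqMan's grep subset for this expression, and that matching is case-insensitive (the sample names use "1A" and "1B" while the expression lists [abc]). The function name group_by_match is hypothetical.

```python
import re
from collections import defaultdict

# Expression copied from the example. re.IGNORECASE is an assumption,
# made because the sample names use uppercase letters.
MATCH = re.compile(r"(\d[abc]).*.abi", re.IGNORECASE)


def group_by_match(filenames):
    """Group file names by the text captured in the first parentheses."""
    groups = defaultdict(list)
    for name in filenames:
        found = MATCH.search(name)
        if found:
            groups[found.group(1).upper()].append(name)
    return dict(groups)


names = ["1A_forward.abi", "1B_forward.abi", "1A_reverse.abi", "1B_reverse.abi"]
groups = group_by_match(names)
for key in sorted(groups):
    print(key, groups[key])
```
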

include

References an existing script within the current script. Place the include command at the point in the current script where you would normally enter the parameters contained in the referenced script. For example, if the referenced script contains a new set of assembly parameters, enter the include command in place of the setParam command.

file: "pathname\scriptname.script" (required)

Specifies the directory and file name of the referenced script. The directory and file name must be enclosed in quotes.

include
          file: "C:\abc_Project\new parameters script.script"

runScript

This optional runScript command allows you to run a table script within the current script. A table script references variable values for specified parameters and other elements in a script. This enables you to run multiple projects from the same script, substituting new parameter values and other variables each time. SeqMan will run the table script repeatedly, using the variable values from one row of the table for each iteration of the script until all of the rows have been used. For more information, see the Using Table Scripts in SeqMan section.

script (required)

Specifies the directory and file name of the table script to be run.

runScript
          script: "C:\abc_Project\abc_script.script"
          table: "C:\abc_Project\table.txt"

table (required)

Specifies the delimited text file containing the variable values.

setPairSpecifier

Defines up to 6 pair specifiers for the sequences in your assembly. Pair specifiers define the naming convention for sequence pairs, as well as your requirement for a minimum and maximum distance between the opposite ends of your inserts.

pairs: {forward:"f1" reverse:"r1" min: 701 max: 1001}

Expressions for forward and reverse naming conventions should be created using the Paired End Specification Language.

The following script sets up 2 pair specifiers with different size ranges:

setPairSpecifier
          pairs:{
          {forward:"(.*)(2kb)(.*)-FP.*$" reverse:"(.*)(2kb)(.*)-RP.*$" min: 1500 max: 2500}
          {forward:"(.*)(8kb)(.*)-FP.*$" reverse:"(.*)(8kb)(.*)-RP.*$" min: 7000 max: 9000}
          }
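
The forward and reverse expressions can be checked against real read names before they go into a script. The Python sketch below tries the two specifiers from the example on a few made-up read names. It assumes that two reads are mates when one matches the forward expression, the other matches the reverse expression, and their captured groups are identical. The actual Paired End Specification Language may differ, and find_pairs and the read names are hypothetical.

```python
import re

# Expressions and size ranges copied from the example above.
SPECIFIERS = [
    {"forward": r"(.*)(2kb)(.*)-FP.*$", "reverse": r"(.*)(2kb)(.*)-RP.*$", "min": 1500, "max": 2500},
    {"forward": r"(.*)(8kb)(.*)-FP.*$", "reverse": r"(.*)(8kb)(.*)-RP.*$", "min": 7000, "max": 9000},
]


def find_pairs(names):
    """Return (forward, reverse, min, max) for reads whose captures agree."""
    pairs = []
    for spec in SPECIFIERS:
        forward_reads = {}
        for name in names:
            found = re.match(spec["forward"], name)
            if found:
                forward_reads[found.groups()] = name
        for name in names:
            found = re.match(spec["reverse"], name)
            if found and found.groups() in forward_reads:
                pairs.append((forward_reads[found.groups()], name, spec["min"], spec["max"]))
    return pairs


# Made-up read names for illustration only.
reads = ["clone12_2kb_a-FP.abi", "clone12_2kb_a-RP.abi",
         "clone7_8kb_b-FP.abi", "clone7_8kb_b-RP.abi", "orphan_2kb_c-FP.abi"]
for pair in find_pairs(reads):
    print(pair)
```
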

setParam

Defines the assembly parameters to use for your assembly.

assemblyMethod

Specifies which assembly method to use. Value may be pro or classic. Default Classic.

 

minMatchPercent

Specifies the Minimum Match Percentage. Default 80.

matchSize

Specifies the Match Size parameter. Default Pro: 25. Default Classic: 12.

gapPenalty

Specifies the Gap Penalty parameter. Default 0.25.

gapLengthPenalty

Specifies the Gap Length Penalty parameter. Default 0.70.

snpThresh

Specifies the Variant Discovery parameter of Heterozygous Peak Threshold. Default 50.

coverageThresh:4

Specifies the Strategy Viewing parameter of Coverage Threshold. Default 4.

matchSpacing

(Pro assembler only.) Specifies the Match Spacing parameter. Default 150.

maxMismatchEnd

(Pro assembler only.) Specifies the Maximum Mismatch End Bases parameter. Default 15.

UseRepeatHandling

(Pro assembler only.) Value may be true or false. When true, the Pro assembler uses the Repeat Handling parameters. Default False.

coverageType

(Pro assembler only.) Specifies the type of coverage to be used for Repeat Handling. Value may be Frag, which uses the length of the fragment being assembled to calculate the expected coverage, or Fixed, which uses a fixed value as the expected coverage. Default Fixed.

fixedCoverage

(Pro assembler only.) Specifies the Fixed Value for Repeat Handling Coverage. Default 6.

fragLength

(Pro assembler only.) Specifies the Fragment Length for Repeat Handling Coverage. Default 0.

MatchRepeatPercent

(Pro assembler only.) Specifies the Match Repeat Percentage for Repeat Handling Coverage. Default 150.

maxPadContig

(Classic assembler only.) Specifies the Maximum Added Gaps per kb in Contig parameter. Default 70.

maxPadSequence

(Classic assembler only.) Specifies the Maximum Added Gaps per kb in Sequence parameter. Default 70.

maxRegShift:70

(Classic assembler only.) Specifies the Maximum Register Shift Difference. Default 70.

minSeqLength:40

(Classic assembler only.) Specifies the Minimum Sequence Length parameter. Default 40.