biosed
biosed is a simple sequence editing utility that searches for a target subsequence in one or more input sequences and replaces it with an insert subsequence, or optionally, just deletes the target subsequence where found. If the target subsequence occurs more than once, then each instance of the target is replaced or deleted.
The -position option allows a sequence position to be specified as an additional constraint for the match: a replacement or deletion only occurs if the match starts at the specified position.
The target subsequence is just a short, literal sequence of characters. biosed cannot interpret any sort of ambiguity pattern, such as a regular expression. A simple string match is done between the target and the input sequences. If there is an exact match, the replacement or deletion is done. The matching is case-insensitive: it is independent of the case of both the input sequences and the target.
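The matching rules described above can be modelled in a few lines of Python. This is only an illustrative sketch, not biosed's actual implementation: the function name `edit_sequence` is hypothetical, and it assumes a 1-based -position (with 0 meaning "no constraint"), a non-overlapping left-to-right scan, and uppercase output.

```python
def edit_sequence(seq, target, replace="", position=0, delete=False):
    """Illustrative model of literal, case-insensitive replace/delete.

    Assumptions (not confirmed by the biosed documentation):
    - `position` is 1-based; 0 means matches are accepted anywhere.
    - Matches are found left to right and do not overlap.
    - The edited sequence is returned in uppercase.
    """
    if not target:
        raise ValueError("target must be a non-empty string")

    seq = seq.upper()
    target = target.upper()
    insert = "" if delete else replace

    pieces = []
    i = 0
    while i < len(seq):
        is_match = seq.startswith(target, i)
        position_ok = position == 0 or i + 1 == position
        if is_match and position_ok:
            # Literal match at an allowed start position: substitute or drop it.
            pieces.append(insert)
            i += len(target)
        else:
            # No edit here: copy one residue through unchanged.
            pieces.append(seq[i])
            i += 1
    return "".join(pieces)


if __name__ == "__main__":
    print(edit_sequence("ttcctc", "T", "U"))
    print(edit_sequence("ACGTACGT", "ACG", "N", position=5))
    print(edit_sequence("ACGTACGT", "T", delete=True))
```

With `position` left at 0 every occurrence is edited; with a non-zero value only a match starting exactly at that position is edited, and all other occurrences are copied through.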
Replace all 'T's with 'U's to create an RNA sequence
% biosed tembl:x65923 x65923.rna -target T -replace U
Replace or delete sequence sections
Example 2
Replace all 'PPP' protein motifs with 'XXPPPXX'
% biosed tsw:amir_pseae amir_pseae.pep -target PPP -replace XXPPPXX
Replace or delete sequence sections
Standard (Mandatory) qualifiers (* if not always prompted):
[-sequence] seqall (Gapped) sequence(s) filename and optional
format, or reference (input USA)
-target string [N] Sequence section to match (Any string is
accepted)
* -replace string [A] Replacement sequence section (Any string
is accepted)
[-outseq] seqout [
| Qualifier | Description | Allowed values | Default |
|---|---|---|---|
| Standard (Mandatory) qualifiers | | | |
| [-sequence] (Parameter 1) | (Gapped) sequence(s) filename and optional format, or reference (input USA) | Readable sequence(s) | Required |
| -target | Sequence section to match | Any string is accepted | N |
| -replace | Replacement sequence section | Any string is accepted | A |
| [-outseq] (Parameter 2) | Sequence filename and optional format (output USA) | Writeable sequence | <*>.format |
| Additional (Optional) qualifiers | | | |
| -position | Sequence position to match | Integer 0 or more | 0 |
| Advanced (Unprompted) qualifiers | | | |
| -delete | Delete the target sequence sections | Toggle value Yes/No | No |
ID X65923; SV 1; linear; mRNA; STD; HUM; 518 BP.
XX
AC X65923;
XX
DT 13-MAY-1992 (Rel. 31, Created)
DT 18-APR-2005 (Rel. 83, Last updated, Version 11)
XX
DE H.sapiens fau mRNA
XX
KW fau gene.
XX
OS Homo sapiens (human)
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae;
OC Homo.
XX
RN [1]
RP 1-518
RA Michiels L.M.R.;
RT ;
RL Submitted (29-APR-1992) to the EMBL/GenBank/DDBJ databases.
RL L.M.R. Michiels, University of Antwerp, Dept of Biochemistry,
RL Universiteisplein 1, 2610 Wilrijk, BELGIUM
XX
RN [2]
RP 1-518
RX PUBMED; 8395683.
RA Michiels L., Van der Rauwelaert E., Van Hasselt F., Kas K., Merregaert J.;
RT " fau cDNA encodes a ubiquitin-like-S30 fusion protein and is expressed as
RT an antisense sequences in the Finkel-Biskis-Reilly murine sarcoma virus";
RL Oncogene 8(9):2537-2546(1993).
XX
DR H-InvDB; HIT000322806.
XX
FH Key Location/Qualifiers
FH
FT source 1..518
FT /organism="Homo sapiens"
FT /chromosome="11q"
FT /map="13"
FT /mol_type="mRNA"
FT /clone_lib="cDNA"
FT /clone="pUIA 631"
FT /tissue_type="placenta"
FT /db_xref="taxon:9606"
FT misc_feature 57..278
FT /note="ubiquitin like part"
FT CDS 57..458
FT /gene="fau"
FT /db_xref="GDB:135476"
FT /db_xref="GOA:P35544"
FT /db_xref="GOA:P62861"
FT /db_xref="HGNC:3597"
FT /db_xref="UniProtKB/Swiss-Prot:P35544"
FT /db_xref="UniProtKB/Swiss-Prot:P62861"
FT /protein_id="CAA46716.1"
FT /translation="MQLFVRAQELHTFEVTGQETVAQIKAHVASLEGIAPEDQVVLLAG
FT APLEDEATLGQCGVEALTTLEVAGRMLGGKVHGSLARAGKVRGQTPKVAKQEKKKKKTG
FT RAKRRMQYNRRFVNVVPTFGKKKGPNANS"
FT misc_feature 98..102
FT /note="nucleolar localization signal"
FT misc_feature 279..458
FT /note="S30 part"
FT polyA_signal 484..489
FT polyA_site 509
XX
SQ Sequence 518 BP; 125 A; 139 C; 148 G; 106 T; 0 other;
ttcctctttc tcgactccat cttcgcggta gctgggaccg ccgttcagtc gccaatatgc 60
agctctttgt ccgcgcccag gagctacaca ccttcgaggt gaccggccag gaaacggtcg 120
cccagatcaa ggctcatgta gcctcactgg agggcattgc cccggaagat caagtcgtgc 180
tcctggcagg cgcgcccctg gaggatgagg ccactctggg ccagtgcggg gtggaggccc 240
tgactaccct ggaagtagca ggccgcatgc ttggaggtaa agttcatggt tccctggccc 300
gtgctggaaa agtgagaggt cagactccta aggtggccaa acaggagaag aagaagaaga 360
agacaggtcg ggctaagcgg cggatgcagt acaaccggcg ctttgtcaac gttgtgccca 420
cctttggcaa gaagaagggc cccaatgcca actcttaagt cttttgtaat tctggctttc 480
tctaataaaa aagccactta gttcagtcaa aaaaaaaa 518
//
ID AMIR_PSEAE Reviewed; 196 AA.
AC P10932;
DT 01-JUL-1989, integrated into UniProtKB/Swiss-Prot.
DT 08-DEC-2000, sequence version 2.
DT 20-MAR-2007, entry version 55.
DE Aliphatic amidase regulator.
GN Name=amiR; OrderedLocusNames=PA3363;
OS Pseudomonas aeruginosa.
OC Bacteria; Proteobacteria; Gammaproteobacteria; Pseudomonadales;
OC Pseudomonadaceae; Pseudomonas.
OX NCBI_TaxID=287;
RN [1]
RP NUCLEOTIDE SEQUENCE [GENOMIC DNA].
RC STRAIN=PAC433;
RX MEDLINE=89211409; PubMed=2495988; DOI=10.1016/0014-5793(89)80249-2;
RA Lowe N., Rice P.M., Drew R.E.;
RT "Nucleotide sequence of the aliphatic amidase regulator gene (amiR) of
RT Pseudomonas aeruginosa.";
RL FEBS Lett. 246:39-43(1989).
RN [2]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RC STRAIN=ATCC 15692 / PAO1 / 1C / PRS 101 / LMG 12228;
RX MEDLINE=20437337; PubMed=10984043; DOI=10.1038/35023079;
RA Stover C.K., Pham X.-Q.T., Erwin A.L., Mizoguchi S.D., Warrener P.,
RA Hickey M.J., Brinkman F.S.L., Hufnagle W.O., Kowalik D.J., Lagrou M.,
RA Garber R.L., Goltry L., Tolentino E., Westbrock-Wadman S., Yuan Y.,
RA Brody L.L., Coulter S.N., Folger K.R., Kas A., Larbig K., Lim R.M.,
RA Smith K.A., Spencer D.H., Wong G.K.-S., Wu Z., Paulsen I.T.,
RA Reizer J., Saier M.H. Jr., Hancock R.E.W., Lory S., Olson M.V.;
RT "Complete genome sequence of Pseudomonas aeruginosa PAO1, an
RT opportunistic pathogen.";
RL Nature 406:959-964(2000).
RN [3]
RP CHARACTERIZATION.
RX MEDLINE=95286483; PubMed=7539417;
RA Wilson S.A., Drew R.E.;
RT "Transcriptional analysis of the amidase operon from Pseudomonas
RT aeruginosa.";
RL J. Bacteriol. 177:3052-3057(1995).
RN [4]
RP X-RAY CRYSTALLOGRAPHY (2.25 ANGSTROMS) OF COMPLEX WITH AMIC.
RC STRAIN=PAC1;
RX MEDLINE=99437995; PubMed=10508151; DOI=10.1093/emboj/18.19.5175;
RA O'Hara B.P., Norman R.A., Wan P.T., Roe S.M., Barrett T.E., Drew R.E.,
RA Pearl L.H.;
RT "Crystal structure and induction mechanism of AmiC-AmiR: a ligand-
RT regulated transcription antitermination complex.";
RL EMBO J. 18:5175-5186(1999).
CC -!- FUNCTION: Positive controlling element of amiE, the gene for
CC aliphatic amidase. Acts as a transcriptional antitermination
[Part of this file has been deleted for brevity]
CC -!- SIMILARITY: Contains 1 ANTAR domain.
CC -----------------------------------------------------------------------
CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
CC Distributed under the Creative Commons Attribution-NoDerivs License
CC -----------------------------------------------------------------------
DR EMBL; X13776; CAA32023.1; -; Genomic_DNA.
DR EMBL; AE004091; AAG06751.1; -; Genomic_DNA.
DR PIR; B83226; B83226.
DR PIR; S03884; S03884.
DR PDB; 1QO0; X-ray; D/E=1-196.
DR IntAct; P10932; -.
DR GenomeReviews; AE004091_GR; PA3363.
DR KEGG; pae:PA3363; -.
DR BioCyc; PAER287:PA3363-MONOMER; -.
DR InterPro; IPR005561; AmiR_NasR_reg.
DR InterPro; IPR011006; CheY_like.
DR InterPro; IPR008327; Res_reg_antiterm.
DR Pfam; PF03861; ANTAR; 1.
DR PIRSF; PIRSF036382; RR_antiterm; 1.
DR PROSITE; PS50921; ANTAR; 1.
KW 3D-structure; Complete proteome; Transcription;
KW Transcription antitermination; Transcription regulation.
FT CHAIN 1 196 Aliphatic amidase regulator.
FT /FTId=PRO_0000064582.
FT DOMAIN 129 190 ANTAR.
FT CONFLICT 48 48 S -> A (in Ref. 1).
FT CONFLICT 64 64 R -> G (in Ref. 1).
FT CONFLICT 141 141 E -> D (in Ref. 1).
FT CONFLICT 154 154 A -> V (in Ref. 1).
FT CONFLICT 170 170 Y -> H (in Ref. 1).
FT HELIX 3 8
FT HELIX 9 12
FT STRAND 14 19
FT HELIX 23 35
FT STRAND 38 42
FT STRAND 54 59
FT HELIX 65 75
FT STRAND 81 86
FT HELIX 91 100
FT STRAND 103 109
FT HELIX 112 114
FT HELIX 115 160
FT HELIX 164 175
FT TURN 176 179
FT HELIX 182 189
SQ SEQUENCE 196 AA; 21903 MW; 306A4F30E8E4C6C0 CRC64;
MSANSLLGSL RELQVLVLNP PGEVSDALVL QLIRIGCSVR QCWPPPESFD VPVDVVFTSI
FQNRHHDEIA ALLAAGTPRT TLVALVEYES PAVLSQIIEL ECHGVITQPL DAHRVLPVLV
SARRISEEMA KLKQKTEQLQ ERIAGQARIN QAKALLMQRH GWDEREAHQY LSREAMKRRE
PILKIAQELL GNEPSA
//
The sequence will be in uppercase.
>X65923 X65923.1 H.sapiens fau mRNA
UUCCUCUUUCUCGACUCCAUCUUCGCGGUAGCUGGGACCGCCGUUCAGUCGCCAAUAUGC
AGCUCUUUGUCCGCGCCCAGGAGCUACACACCUUCGAGGUGACCGGCCAGGAAACGGUCG
CCCAGAUCAAGGCUCAUGUAGCCUCACUGGAGGGCAUUGCCCCGGAAGAUCAAGUCGUGC
UCCUGGCAGGCGCGCCCCUGGAGGAUGAGGCCACUCUGGGCCAGUGCGGGGUGGAGGCCC
UGACUACCCUGGAAGUAGCAGGCCGCAUGCUUGGAGGUAAAGUUCAUGGUUCCCUGGCCC
GUGCUGGAAAAGUGAGAGGUCAGACUCCUAAGGUGGCCAAACAGGAGAAGAAGAAGAAGA
AGACAGGUCGGGCUAAGCGGCGGAUGCAGUACAACCGGCGCUUUGUCAACGUUGUGCCCA
CCUUUGGCAAGAAGAAGGGCCCCAAUGCCAACUCUUAAGUCUUUUGUAAUUCUGGCUUUC
UCUAAUAAAAAAGCCACUUAGUUCAGUCAAAAAAAAAA

>AMIR_PSEAE P10932 Aliphatic amidase regulator.
MSANSLLGSLRELQVLVLNPPGEVSDALVLQLIRIGCSVRQCWXXPPPXXESFDVPVDVV
FTSIFQNRHHDEIAALLAAGTPRTTLVALVEYESPAVLSQIIELECHGVITQPLDAHRVL
PVLVSARRISEEMAKLKQKTEQLQERIAGQARINQAKALLMQRHGWDEREAHQYLSREAM
KRREPILKIAQELLGNEPSA
biosed was inspired by the useful UNIX utility sed, which searches for a pattern in text and can replace or delete the matched text.
No check for the correct type (protein, nucleic, gapped, etc.) is made on the replacement sequence, so you must ensure that it is of the required type. Any text can be used, including characters only used in proteins (e.g. D, E, F, etc.), characters rarely used in proteins (e.g. U, J, O, etc.), digits and punctuation characters.
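Because the replacement string is not type-checked, it can be worth validating it yourself before running biosed. The following is a minimal sketch of such a pre-check, not part of biosed or EMBOSS: the names `NUCLEOTIDE_CHARS` and `unexpected_chars` are hypothetical, and the strict nucleotide alphabet (A, C, G, T, U, N and the gap character) is an assumption that should be widened if your data uses ambiguity codes.

```python
# Assumed strict nucleotide alphabet for this sketch (hypothetical choice);
# extend it if ambiguity codes or other gap symbols are expected.
NUCLEOTIDE_CHARS = frozenset("ACGTUN-")


def unexpected_chars(replacement, allowed=NUCLEOTIDE_CHARS):
    """Return the sorted characters of `replacement` that are not in `allowed`.

    The comparison is case-insensitive; an empty list means the string
    contains only allowed characters.
    """
    return sorted(set(replacement.upper()) - allowed)


if __name__ == "__main__":
    for candidate in ("acgu", "XXPPPXX", "AC1!"):
        bad = unexpected_chars(candidate)
        status = "ok" if not bad else "unexpected: " + "".join(bad)
        print(candidate, "->", status)
```

A protein alphabet can be checked the same way by passing a different `allowed` set.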