Gforge Advanced Server - RNA-MATE - Open Discussion
http://solidsoftwaretools.com/gf/

Re: bug in rna2map 0.5: invalid filenames generated
Ilya Goldin <goldinim@upmc.edu>

There's another bug related to directory names for the split genome, in RNA_matching_analysis_pipeline.pl around line 199. The original line is:

    $chr_name =~ s/.fa//g;

The '.' is a special character in regular expressions (it matches any single character), so it should be escaped like so:

    $chr_name =~ s/\.fa//g;

Best,
Ilya

Re: bug in rna2map 0.5: invalid filenames generated
Mike Kubal <mikekubal@yahoo.com>

Thanks for the post, Ilya. I'll share your recommendation with the tool developers.

Regards,
Mike Kubal
Sr. FAS, SOLiD Bioinformatics

Re: bug in rna2map 0.5: invalid filenames generated
Ilya Goldin <goldinim@upmc.edu>

The same bug described in the parent message also occurs in post_genome_matching.pl around line 55:

    $temp_name = substr($_ , 1);

It can be worked around using the same approach (md5_hex comes from Digest::MD5):

    use Digest::MD5 qw(md5_hex);
    ...
    $temp_name = md5_hex(substr($_ , 1));

bug in rna2map 0.5: invalid filenames generated
Ilya Goldin <goldinim@upmc.edu>

The file RNA_pipeline_0.5.0/lib/RNA_matching_analysis_pipeline_subroutines.pm contains this code in split_genome_reference():

    my $temp_line = <GENOME_FILE>;
    chomp $temp_line;
    $temp_name = substr($temp_line,1);
    $reference[$index_refrence] = "$output_directory\/$temp_name.fa";

The code assumes that substr($temp_line,1) will produce a valid filename. This is the FASTA header line with its leading '>' removed, and it is not necessarily a valid filename. Indeed, it failed for me with the Mus musculus NCBI v37 reference genome. One workaround is to derive the filename from an MD5 hash of the header instead, e.g.:

    use Digest::MD5 qw(md5_hex);
    ...
    $temp_name = md5_hex(substr($temp_line,1));

Note that split_genome_reference() builds filenames this way in two different places in the code, so both need the fix.

Re: RNA2MAP
teshome bizuayehu <teshome.bizuayehu@hibo.no>

I found a solution in another thread, 'rna2map v0.4'.
The program ran without errors, but there are no files in any of the output directories (.../filter, .../genome and .../miRBase). So I don't understand what went wrong.

Re: RNA2MAP
teshome bizuayehu <teshome.bizuayehu@hibo.no>

Dear Christian,

Thank you very much! Now it finds the files, but it shows me this error:

    Created /home/teshome/sample/filter
    Created /home/teshome/sample/miRBase
    Created /home/teshome/sample/genome
    Created /home/teshome/sample/miRBase/wig
    Total number of reads = 500000
    SUBMIT: Filter
    SUBMIT: miRBase
    Failed to create script text:  at /home/teshome/sample/RNA_pipeline_0.5.0/bin/../lib/perl/PBSJob.pm line 358.

I tried to locate the problem at line 358 of PBSJob.pm. Lines 354 to 360 read:

    # Process template to produce script
    my $script;
    #my $tt = Template->new(); # FIXME - init once?
    $tt->process( \$template, $stash, \$script );
    die "Failed to create script text: " . $tt->error() unless $script;
    #$self->{'script'} = $script; # don't include script in object
    return $script;

What went wrong?

Thanks,

Re: RNA2MAP
Christian Tellgren-Roth <chtellgren@gmail.com>

Hi Teshome,

the parameter -corona has to point to the installation of RNA2MAP, not to the Corona installation! Then check the spelling of all the paths in the config file (capital letters at the beginning of the folder names).

/Christian

RNA2MAP
teshome bizuayehu <teshome.bizuayehu@hibo.no>

I have a problem. Can anyone help me? I tried to run RNA2MAP v0.5.0. I downloaded the program and the example file.
I tried to run the example file. Here is everything I did:

    teshome@bok-desktop:~$ perl /home/teshome/Sample_run/RNA_pipeline_0.5.0/bin/RNA_matching_analysis_pipeline.pl -r /home/teshome/Sample_run/example_Barcode_01_0032.csfasta -c /home/teshome/Sample_run/config_file_example.txt -corona /home/teshome/Sample_run/Corona_Lite_4.2.2 -o /home/teshome/Sample_run
    Name "Template::Filters::BASEARGS" used only once: possible typo at /home/teshome/Sample_run/RNA_pipeline_0.5.0/bin/../lib/perl/Template/Base.pm line 49.
    Name "Template::Context::BASEARGS" used only once: possible typo at /home/teshome/Sample_run/RNA_pipeline_0.5.0/bin/../lib/perl/Template/Base.pm line 49.
    Name "Template::BASEARGS" used only once: possible typo at /home/teshome/Sample_run/RNA_pipeline_0.5.0/bin/../lib/perl/Template/Base.pm line 49.
    Name "Template::Service::BASEARGS" used only once: possible typo at /home/teshome/Sample_run/RNA_pipeline_0.5.0/bin/../lib/perl/Template/Base.pm line 49.
    Name "Template::Provider::BASEARGS" used only once: possible typo at /home/teshome/Sample_run/RNA_pipeline_0.5.0/bin/../lib/perl/Template/Base.pm line 49.
    Name "Template::Plugins::BASEARGS" used only once: possible typo at /home/teshome/Sample_run/RNA_pipeline_0.5.0/bin/../lib/perl/Template/Base.pm line 49.
    Settings:
        reads file = /home/teshome/Sample_run/example_Barcode_01_0032.csfasta
        configuration file = /home/teshome/Sample_run/config_file_example.txt
        package directory = /home/teshome/Sample_run/Corona_Lite_4.2.2
        output_directory = /home/teshome/Sample_run
    File home/teshome/sample_run/human_filter_reference.fasta DOES NOT EXIST. Please check the path...

But the human_filter_reference.fasta file and the miRBase gff file are in the same folder, Sample_run. Is there anything wrong?

teshome

Re: Re: Consistency between stats, counts and ma files
Michael Muratet <mmuratet@hudsonalpha.org>

Thanks for the response.
When you say "alignment of the first colors", does that mean using only one color along the tag, or using the first two bases?

Re: Broken link URL /srna/data.tgz
SOLiD™ System Developer <soliddev@appliedbiosystems.com>

Dear Irina,

The link is fixed. Please try it again and let us know if you still have any questions. We will be happy to assist you.

Kind Regards,
Solid Software Community Staff

Broken link URL /srna/data.tgz
Irina Rakova <irr4@pitt.edu>

No data file available.

Re: Consistency between stats, counts and ma files
SOLiD™ System Developer <soliddev@appliedbiosystems.com>

Dear Mike,

The .ma and .stats files contain "partial" results of the alignment, i.e. results for the alignment of the first colors of the reads. The extend.ma, extend.stats, extend.counts and extend.gff files contain the "final" results of the alignment (all colors aligned). The number of reads that align to a unique location is reported in the extend.stats file, if that was one of your questions.

The sum of reads in extend.counts depends on the criterion used to generate the file, which is set by the miRBase_step_output_read_type parameter in the configuration file. There are 3 options for this parameter:

1) "unique": only uniquely aligned reads (from the extend.ma file) are used to generate counts
2) "random": for each read, one alignment location is chosen at random, and the read is counted only at that location
3) "all": the read is counted at every matching location

This means that the sum of reads in the extend.counts file is, respectively:

1) equal to the number of reads aligned to a unique location in the extend.stats file (under "Uniquely Placed Beads")
2) equal to the total number of aligned reads in the extend.stats file (under "Total Beads")
3) greater than or equal to the total number of aligned reads in the extend.stats file (under "Total Beads"), and strictly greater as soon as any read aligns to more than one location. This number is not reported in any other file.
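The three counting modes can be illustrated with a minimal Perl sketch. It assumes a hypothetical %hits hash that maps each read ID to its matching locations, standing in for what the pipeline reads from extend.ma. The count_reads() and total_count() helpers are also hypothetical. This is not the pipeline's own code.

```perl
use strict;
use warnings;

# Hypothetical alignment results: read ID => matching locations.
# In the pipeline these would come from extend.ma; this hash is an
# illustrative stand-in, not the pipeline's own data structure.
my %hits = (
    read1 => ['mir-21'],
    read2 => ['mir-21', 'mir-155'],
    read3 => ['let-7a'],
    read4 => ['mir-155', 'let-7a', 'mir-21'],
);

# Per-location counts for one miRBase_step_output_read_type setting.
# count_reads() is a hypothetical helper, not a pipeline function.
sub count_reads {
    my ($hits, $mode) = @_;
    my %counts;
    for my $read (sort keys %$hits) {
        my @locs = @{ $hits->{$read} };
        next unless @locs;                          # unaligned reads are never counted
        if ($mode eq 'unique') {
            $counts{ $locs[0] }++ if @locs == 1;    # uniquely placed reads only
        } elsif ($mode eq 'random') {
            $counts{ $locs[ int rand @locs ] }++;   # one randomly chosen location
        } elsif ($mode eq 'all') {
            $counts{$_}++ for @locs;                # every matching location
        } else {
            die "Unknown mode: $mode\n";
        }
    }
    return \%counts;
}

# Sum over all locations, i.e. the "sum of reads in extend.counts".
sub total_count {
    my ($counts) = @_;
    my $total = 0;
    $total += $_ for values %$counts;
    return $total;
}

for my $mode (qw(unique random all)) {
    printf "%-6s sum of counts = %d\n", $mode, total_count(count_reads(\%hits, $mode));
}
```

With these four example reads, two are uniquely placed and all four are aligned. The sums are therefore 2 for "unique", 4 for "random" (which location is picked varies, but the sum does not) and 7 for "all". These match cases 1, 2 and 3 above.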
Please let us know if you still have any questions. We will be happy to assist you.

Kind Regards,
Solid Software Community Staff

Consistency between stats, counts and ma files
Michael Muratet <mmuratet@hudsonalpha.org>

Hello again,

I've been comparing the number of entries in the .counts, .gff and .ma files and trying to reconcile those numbers with what's in the .stats files. The number of reads in .counts sums to the number of rows in .gff (e.g., 580678), but the closest number in .stats is 581467. The sum of total reads in _extend.stats is equal to the number of records in _extend.ma (e.g. 803586). Is there a way to know which subset of the total reads is uniquely placed? Are these values defined somewhere?

Thanks,
Mike

Definition of the extend suffix?
Michael Muratet <mmuratet@hudsonalpha.org>

Greetings,

Can anyone tell me the formal definition of the 'extend' suffix, e.g., path/file.csfasta.extend.counts.35.3? What's the difference between csfasta and csfasta.extend?

Thanks,
Mike

Attn: What is extension step?
Marcel Willemsen <a.m.willemsen@amc.uva.nl>

Hi Richard,

Maybe this is helpful (see the attachment, page 9). The latest version of RNA2Map is 0.5.0.
Cheers,
Marcel

Error running RNA_pipeline_0.5.0
Yu Sun <ysun25@its.jnj.com>

Hi,

I got an error while testing RNA_pipeline_0.5.0:

    > perl  bin/RNA_matching_analysis_pipeline.pl -r data/data_release_01092009/example_Barcode_01_0032.csfasta -c config_file_test.txt -corona /polyserve/corona_lite_v4.0r2.0/RNA_pipeline_0.5.0/ -o Test_out/
    Settings:
            reads file = data/data_release_01092009/example_Barcode_01_0032.csfasta
            configuration file = config_file_test.txt
            package directory = /polyserve/corona_lite_v4.0r2.0/RNA_pipeline_0.5.0/
            output_directory = Test_out/
    Created Test_out//filter
    Created Test_out//miRBase
    Created Test_out//genome
    Created Test_out//miRBase/wig
    Total number of reads = 500000
    SUBMIT: Filter
    SUBMIT: miRBase
    Failed to create script text:  at /polyserve/corona_lite_v4.0r2.0/RNA_pipeline_0.5.0/bin/../lib/perl/PBSJob.pm line 358.

I checked the output directory and found that the first script, "output_Filter_1.sh", had been generated, but the second one, "output_miRNA_1.sh", had not. Has anyone run into this error before? Any suggestion on how to fix it is appreciated.

Thanks.
Yu

What is extension step?
Richard Casey <richard.casey@colostate.edu>

Hi,

Does anyone know what the extension step does in the smRNA pipeline? The smRNA v.0.3 documentation states:

-------------------
Extension: For each seed we estimate adaptor starting position as follows: if the adaptor starts at position n then the read (35 colors long) is compared with the “hypothetical” sequence composed of n colors from the reference (with same start point as the seed) followed by the first 35 – n colors from the adaptor. The actual read is compared (full 35 bases long) to the “hypothetical” one and the number of mismatches is recorded. The location n0 giving the smallest number of mismatches is considered adaptor start point, while the number of mismatches from full length of the read is associated with the starting seed.
Seeds of the same read producing the same smallest number of mismatches are reported as hit locations of the read. The .bc files contain a column with explicit fragment length (read with trimmed adaptor). .ma and .gff files contain the color/base reads with adaptor trimmed.
-------------------

I've read this several times, but it is still not clear exactly what the extension step accomplishes.

Richard

Re: Another bug RNA2MAP v.0.5.0
Arthur Liu <a.liu@victorchang.edu.au>

The miRBase hsa.gff contains references to the human genome. You must use the human genome to generate the miRNA precursor sequences. After that, you can use the human miRNA precursors together with the viral genome by skipping the precursor extraction.

Another bug RNA2MAP v.0.5.0
Richard Casey <richard.casey@colostate.edu>

Hi,

There is a bug in RNA2MAP v.0.5.0. We are testing a viral reference genome (attached; M20558) with the miRBase human microRNA annotation (attached; hsa.gff). If we skip the miRBase step, the app runs to completion. If we include the miRBase step, the app hangs and generates these error messages:

    Use of uninitialized value in scalar chomp at /apps/RNA_pipeline_0.5.0/bin/extract_precursor_sequences_from_genome.pl line 73, <GENOME_FILE> line 2.
    Use of uninitialized value in pattern match (m//) at /apps/RNA_pipeline_0.5.0/bin/extract_precursor_sequences_from_genome.pl line 74, <GENOME_FILE> line 2.
    Use of uninitialized value in length at /apps/RNA_pipeline_0.5.0/bin/extract_precursor_sequences_from_genome.pl line 83, <GENOME_FILE> line 2.
    Use of uninitialized value in concatenation (.) or string at /apps/RNA_pipeline_0.5.0/bin/extract_precursor_sequences_from_genome.pl line 84, <GENOME_FILE> line 2.

I've tracked down the offending code to extract_precursor_sequences_from_genome.pl, specifically the while loop in lines 71-99. It looks like the indexing logic is wrong when reading the reference genome file. What is the process for fixing this bug?
Is there a development team that does this? Or do we fix it ourselves and submit the code to the RNA2MAP project?

Richard

bug in RNA2MAPv0.5.0
Christian Tellgren-Roth <chtellgren@gmail.com>

Hi,

The genome matching step produces error messages if no matches are found:

    Total number of reads = 2783365
    number of matched reads = 0
    Use of uninitialized value in scalar chomp at /share/apps/RNA_pipeline_0.5.0//bin/output_wig_from_single_sequence_reference.pl line 152.
    Use of uninitialized value in split at /share/apps/RNA_pipeline_0.5.0//bin/output_wig_from_single_sequence_reference.pl line 153.
    Use of uninitialized value in concatenation (.) or string at /share/apps/RNA_pipeline_0.5.0//bin/output_wig_from_single_sequence_reference.pl line 173.
    + RETVAL=0

Regards,
Christian

--
Dr. Christian Tellgren-Roth
Department of Genetics and Pathology
Rudbecklaboratoriet
Uppsala University
751 85 UPPSALA
Sweden
Tel: +46 18 471 49 69
Fax: +46 18 471 48 08
e-mail: chtellgren@gmail.com
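Both uninitialized-value reports in this thread concern input that is missing or empty. In Richard's case, the warnings from extract_precursor_sequences_from_genome.pl refer to <GENOME_FILE> line 2. In Christian's case, the warnings from output_wig_from_single_sequence_reference.pl appear when zero reads match. Arthur's reply explains the first case: hsa.gff refers to human chromosomes, so features cannot be found in a viral genome.

The pipeline's scripts are not shown here, so the following is only a sketch of a defensive pattern. It is not a patch to those files, and read_fasta() and extract_features() are hypothetical helpers. The sketch stops reading at end of file, skips GFF features whose sequence ID is absent from the reference, and reports when nothing matched instead of passing undef onward.

```perl
use strict;
use warnings;

# Read FASTA into a hash: sequence ID (first word of the header) => sequence.
# Looping on defined(...) stops cleanly at end of file, instead of handing
# undef to chomp/split (the source of "Use of uninitialized value" warnings).
sub read_fasta {
    my ($fh) = @_;
    my (%seq, $id);
    while (defined(my $line = <$fh>)) {
        chomp $line;
        next unless $line =~ /\S/;
        if ($line =~ /^>(\S+)/) {
            $id = $1;
            $seq{$id} = '';
        } elsif (defined $id) {
            $seq{$id} .= $line;    # supports multi-line FASTA records
        }
    }
    return \%seq;
}

# Extract feature sequences for GFF lines whose seqid exists in the genome.
sub extract_features {
    my ($genome, $gff_fh) = @_;
    my @out;
    while (defined(my $line = <$gff_fh>)) {
        next if $line =~ /^#/ or $line !~ /\S/;
        chomp $line;
        my ($seqid, undef, undef, $start, $end, undef, $strand, undef, $attr)
            = split /\t/, $line;
        next unless defined $attr;    # malformed line: fewer than 9 columns
        unless (exists $genome->{$seqid}) {
            warn "Skipping feature on '$seqid': not in the reference FASTA\n";
            next;
        }
        if ($start < 1 || $end < $start || $end > length $genome->{$seqid}) {
            warn "Skipping feature on '$seqid': coordinates out of range\n";
            next;
        }
        my $s = substr($genome->{$seqid}, $start - 1, $end - $start + 1);
        if ($strand eq '-') {
            ($s = reverse $s) =~ tr/ACGTacgt/TGCAtgca/;    # reverse complement
        }
        push @out, [$attr, $s];
    }
    warn "No GFF features matched the reference; nothing to extract\n" unless @out;
    return \@out;
}
```

With this structure, a GFF that refers only to chromosomes absent from the FASTA produces one explicit warning per feature and an empty result. It does not produce a cascade of undef warnings further down.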