OT: help needed to debug Perl script
Ken D'Ambrosio
ken at jots.org
Wed Oct 17 05:18:12 UTC 2018
I'm sorry, but lacking both source data and the full script, this is
much akin to finding the black cat in the coal cellar at midnight that
isn't there. (And that's not even factoring in the fact that I left
Perl behind some eight years ago when I realized that Ruby did all the
cool things Perl did, but didn't look like line noise when you were
done. Arguments, of course, can be made for Python, but I really liked
Perl's regex handling, and Ruby pretty much maintains that.)
Since you're unable to share the data, I suggest instead taking the
top ~160 lines of a working dataset and of a non-working dataset, and
trying to see what's going wrong, and where. Perl may not be cool, but I
promise you, it's not suddenly changing its mind about how to handle
stuff. Something is inconsistent between the datasets. Pay special
attention to possible Unicode intrusion, which can be tricky to detect.
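One way to act on that last point is to scan each input file byte by byte and report anything outside printable ASCII. The Perl below is a minimal sketch, not part of the original script. It assumes that plain ASCII is what a "good" dataset looks like, and the sub name suspicious_bytes is made up for this example. It reads files in raw mode, so a UTF-8 character such as a non-breaking space shows up as its individual bytes (0xC2 0xA0), and a stray carriage return shows up as 0x0D.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return one "column:0xNN" string for each byte in
# $line that is not printable ASCII. Tab is allowed; columns are 1-based.
sub suspicious_bytes {
    my ($line) = @_;
    my @found;
    while ($line =~ /([^\t\x20-\x7E])/g) {
        # pos() points just past the matched byte, i.e. its 1-based column.
        push @found, sprintf("%d:0x%02X", pos($line), ord($1));
    }
    return @found;
}

# Scan every file named on the command line, reading raw bytes.
for my $file (@ARGV) {
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    while (my $line = <$fh>) {
        $line =~ s/\n\z//;    # drop only the newline so a \r is still reported
        my @bad = suspicious_bytes($line);
        print "$file line $.: @bad\n" if @bad;
    }
    close $fh;
}
```

Running it on the ~160-line excerpts of a working and a non-working file (for example `perl oddbytes.pl good.csv bad.csv`, where the script and file names are hypothetical) should make any byte that only appears in the bad dataset stand out.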
-Ken
On 2018-10-17 01:05, M. Fioretti wrote:
> Greetings,
>
> A few weeks ago I quickly put together a Perl script to parse big CSV
> files for a project I am working on (I need to do this several times
> a day, always with new data). All was fine until yesterday, when the
> script started behaving in a consistent but totally wrong way.
>
> The script runs with "use strict" and the -w switch, but I only get a
> few warnings about using uninitialized values in certain statements.
>
> The relevant part of the code is this:
>
> 147 my $keycounter = 1;
> 148
> 149 foreach my $qtq (sort keys %all) {
> 150
> 151     printf "\nALLCHECK: %6.6s >> %s;\n", $keycounter, $qtq;
> 152     $keycounter++;
> 153 }
> 154
> 155 foreach my $qq (sort keys %all) {
> 156     $url = $qq;
> 157     print "\nADDINGURX: $url;\n";
> 158     print "\nADDINGURQ: $qq;\n";
>
> Lines 157 and 158, as well as lines 147 to 153, were added only for
> diagnostics. What happens is that when I redirect the script output
> to a file, like this:
>
> ./myscript.pl > logfile
>
> then:
>
> a) logfile contains 26k+ lines starting with "ALLCHECK", i.e. the %all
> hash contains 26k+ keys;
>
> b) the *same* logfile contains:
>
> ~4700 lines starting with ADDINGURX
> ZERO lines starting with ADDINGURQ
>
> in other words:
>
> the script worked perfectly for weeks. Starting yesterday, the same
> script says in line 151 that the hash has 26k keys, and a few lines
> later (line 157) that the keys of the same hash are only 4700???
>
> I honestly have no idea what is happening, or why it only started
> happening now. The input CSV files (which I cannot share, sorry, not
> my data...) are different every time, so I initially thought that the
> latest ones contained some weird character that confuses my code. But
> if that were the case, even the first print statement would only
> print ~4700 lines.
>
> So, any help is appreciated,
>
> Thanks,
> Marco
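A sturdier version of the diagnostics in the quoted message is to ask Perl for the key count directly and to print each key with non-printable characters escaped, so that something like a trailing "\r" becomes visible instead of silently mangling the output line. The sketch below is an illustration under assumptions: %all is a stand-in filled with made-up URL keys (the real hash is built by CSV-parsing code that was not posted), and it relies only on the core Data::Dumper module, whose Useqq setting emits double-quoted strings with escapes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for the script's %all hash; the second key ends in
# a carriage return, the kind of invisible character a new dataset might carry.
my %all = (
    'http://example.com/a'   => 1,
    "http://example.com/b\r" => 1,
);

# Useqq: escape non-printables; Terse: no "$VAR1 ="; Indent 0: no newlines.
$Data::Dumper::Useqq  = 1;
$Data::Dumper::Terse  = 1;
$Data::Dumper::Indent = 0;

# Count the keys directly instead of counting lines in the log file.
my $key_count = scalar keys %all;
print "KEYCOUNT: $key_count\n";

my $iterations = 0;
for my $qq (sort keys %all) {
    $iterations++;
    print 'KEY: ', Dumper($qq), "\n";
}
print "ITERATIONS: $iterations\n";
warn "Mismatch: $key_count keys but $iterations iterations\n"
    if $iterations != $key_count;
```

If KEYCOUNT and ITERATIONS agree but the log still seems to lose lines, the escaped KEY output is the place to look for characters that a pager or grep would not show.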
More information about the ubuntu-users mailing list