Statistics::Descriptive - Basic Calculations of Descriptive Statistics

Published by Thomas Fahle on 16 August 2009

Statistics::Descriptive by Shlomi Fish provides easy access to the basic calculations of descriptive statistics: median, means (arithmetic, harmonic, geometric), sum, standard deviation and variance, frequency distribution, percentiles and quantiles, and linear regression analysis.

Example:

#!/usr/bin/perl
use strict;
use warnings;

use Statistics::Descriptive;

my @messwerte = qw( 17 19 30 22 24 16 12 15 18 20 25 13 29 28 27 26 );

my $stat = Statistics::Descriptive::Full->new();

$stat->add_data( @messwerte ) ;

display_results();

sub display_results {

        # Returns the number of data items.
    print 'Anzahl Elemente: ',  $stat->count(), "\n";
    
        # Returns the mean of the data.
    print 'Arithmetischer Mittelwert: ',  $stat->mean(), "\n";

        # Sorts the data and returns the median value of the data.
    print 'Median: ',  $stat->median(), "\n";

        # Returns the harmonic mean of the data. 
    print 'Harmonischer Mittelwert: ', $stat->harmonic_mean(), "\n";

        # Returns the geometric mean of the data.
    print 'Geometrischer Mittelwert: ', $stat->geometric_mean(), "\n";
    
        # Returns the sum of the data.
    print 'Summe: ', $stat->sum(), "\n";
    
        # Returns the variance of the data. Division by n-1 is used.
    print 'Varianz: ', $stat->variance(), "\n";
    
        # Returns the standard deviation of the data. Division by n-1 is used.
    print 'Standardabweichung: ', $stat->standard_deviation(), "\n";
    
        # Returns the minimum value of the data set.
    print 'Minimum: ' , $stat->min(), "\n";
    
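        # Returns the index of the minimum value of the data set.
        # Note: median() above has already sorted the data, so this index
        # refers to the sorted order, not to the order in @messwerte.
    print 'Index Minimum: ', $stat->mindex(), "\n";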
    
        # Returns the maximum value of the data set.
    print 'Maximum: ', $stat->max(), "\n";
    
        # Returns the index of the maximum value of the data set.
        # As with mindex(), the index refers to the already sorted data.
    print 'Index Maximum: ', $stat->maxdex(), "\n";
    
        # Returns the sample range (max - min) of the data set.
    print 'Stichprobenbereich: ',  $stat->sample_range(), "\n";

    print "\n";
    print "Percentile/Quantile\n";

        # Percentile
        # Force scalar context, because print imposes list context;
        # in list context percentile() returns (value, index).
    print 'Wert 25% Percentil: ', scalar $stat->percentile(25), "\n";
    my ($value,$index) =  $stat->percentile(25);
    print "Wert 25% Percentil: $value, Index: $index \n";

        # Quantile
    print 'Q1: ', $stat->quantile(1) , "\n";
    print 'Q2: ', $stat->quantile(2) , "\n";
    print 'Q3: ', $stat->quantile(3) , "\n";
    print 'Q4: ', $stat->quantile(4) , "\n";

    print "\n";

    my $partitions = 4;
    print "Haeufigkeitsverteilung fuer $partitions Partitionen: \n";
    my $f = $stat->frequency_distribution_ref($partitions); 
    foreach my $partition ( sort {$a <=> $b} keys %$f ) {
        print "Partition = $partition, Anzahl = $f->{$partition}\n";
    }

    print "\n";
    print "Regressionsanalyse: Methode der kleinsten Quadrate\n";

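        # Returns intercept ($q), slope ($m), Pearson's r and the RMS error.
        # No x values are passed here; judging by the output, the (by now
        # sorted) data is fitted against x = 1 .. n.
    my ($q,$m,$r,$rms) = $stat->least_squares_fit();

    print "Geradengleichung: y = $m * x + $q\n";
    print "Linearer Korrelationskoeffizient nach Pearson: $r\n";
        # Square root of the mean squared error
    print "Root Mean Square Error (RMSE): $rms\n";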

}

The program produces the following output:

Anzahl Elemente: 16
Arithmetischer Mittelwert: 21.3125
Median: 21
Harmonischer Mittelwert: 19.6902742958359
Geometrischer Mittelwert: 20.5128887373903
Summe: 341
Varianz: 34.3625
Standardabweichung: 5.86195359927047
Minimum: 12
Index Minimum: 0
Maximum: 30
Index Maximum: 15
Stichprobenbereich: 18

Percentile/Quantile
Wert 25% Percentil: 16
Wert 25% Percentil: 16, Index: 3 
Q1: 16.75
Q2: 21
Q3: 26.25
Q4: 30

Haeufigkeitsverteilung fuer 4 Partitionen: 
Partition = 16.5, Anzahl = 4
Partition = 21, Anzahl = 4
Partition = 25.5, Anzahl = 3
Partition = 30, Anzahl = 5

Regressionsanalyse: Methode der kleinsten Quadrate
Geradengleichung: y = 1.22794117647059 * x + 10.875
Linearer Korrelationskoeffizient nach Pearson: 0.997307339918983
Root Mean Square Error (RMSE): 0.41623752410203
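
The comments in the listing note that variance() and standard_deviation() divide by n-1. As a cross-check, the following minimal sketch recomputes both values in plain Perl without the module. The helper name sample_variance is hypothetical and not part of Statistics::Descriptive; it only illustrates the n-1 division.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Descriptive):
# sample variance, i.e. sum of squared deviations divided by n-1.
sub sample_variance {
    my @values = @_;
    my $n      = @values;
    return if $n < 2;    # undefined for fewer than two values

    my $sum = 0;
    $sum += $_ for @values;
    my $mean = $sum / $n;

    my $sum_sq_dev = 0;
    $sum_sq_dev += ( $_ - $mean )**2 for @values;

    return $sum_sq_dev / ( $n - 1 );
}

my @messwerte = qw( 17 19 30 22 24 16 12 15 18 20 25 13 29 28 27 26 );

my $variance = sample_variance(@messwerte);
print 'Varianz: ',            $variance,       "\n";
print 'Standardabweichung: ', sqrt($variance), "\n";
```

For the sample data this should agree with the Varianz and Standardabweichung lines in the output above, up to floating-point rounding.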

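The regression figures can be reproduced by hand under one assumption: that least_squares_fit(), called without x values, fits the data (already sorted by the earlier median() call) against x = 1 .. n. This is inferred from the output above, not taken from the module documentation. The helper fit_line below is hypothetical and only sketches an ordinary least-squares fit in plain Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Descriptive):
# ordinary least-squares line through the points (x_i, y_i).
# Takes two array references of equal length and returns
# (intercept, slope, Pearson's r, root-mean-square error).
sub fit_line {
    my ( $x, $y ) = @_;
    my $n = @$x;
    my ( $sx, $sy, $sxx, $syy, $sxy ) = (0) x 5;

    for my $i ( 0 .. $n - 1 ) {
        $sx  += $x->[$i];
        $sy  += $y->[$i];
        $sxx += $x->[$i]**2;
        $syy += $y->[$i]**2;
        $sxy += $x->[$i] * $y->[$i];
    }

    my $slope     = ( $n * $sxy - $sx * $sy ) / ( $n * $sxx - $sx**2 );
    my $intercept = ( $sy - $slope * $sx ) / $n;
    my $r         = ( $n * $sxy - $sx * $sy )
        / sqrt( ( $n * $sxx - $sx**2 ) * ( $n * $syy - $sy**2 ) );

    # Root of the mean squared residual (division by n).
    my $sum_sq_res = 0;
    for my $i ( 0 .. $n - 1 ) {
        $sum_sq_res += ( $y->[$i] - ( $slope * $x->[$i] + $intercept ) )**2;
    }
    my $rms = sqrt( $sum_sq_res / $n );

    return ( $intercept, $slope, $r, $rms );
}

# Assumption: sorted data as y values, x = 1 .. n.
my @sorted = sort { $a <=> $b }
    qw( 17 19 30 22 24 16 12 15 18 20 25 13 29 28 27 26 );
my @x = ( 1 .. scalar @sorted );

my ( $q, $m, $r, $rms ) = fit_line( \@x, \@sorted );
print "Geradengleichung: y = $m * x + $q\n";
print "Linearer Korrelationskoeffizient nach Pearson: $r\n";
print "Root Mean Square Error (RMSE): $rms\n";
```

Fitting sorted values against their rank explains why the correlation coefficient is so close to 1: the sample is nearly evenly spaced once sorted. To regress against real x values, pass them to least_squares_fit() explicitly.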