## DESCRIPTION
## Statistics 305 
## ENDDESCRIPTION

## KEYWORDS('Statistical inference; Hypothesis testing, Poisson
## distribution, Neyman--Pearson test; finding a likelihood ratio for a Poisson
## mean given a simple alternative hypothesis, identifying the behaviour of the
## likelihood ratio as the sample total increases, finding the criterion for
## rejection in a one-sided hypothesis test for the mean of a Poisson
## distribution, finding the P-value of the test, finding the power of the test
## for a given alternative value of the mean.')

## DBsubject('Statistics')
## DBchapter('Hypothesis testing')
## DBsection('Neyman-Pearson tests')
## level(5)
## Date('2014/06/12')
## Author('Andy Leung')
## Institution('University of British Columbia')
## TitleText1('No Text')
## EditionText1('?')
## AuthorText1('?')
## Section1('?')
## Problem1('?')


########################################################################

DOCUMENT();      

loadMacros(
  "PGstandard.pl",
  "PGchoicemacros.pl",
  "parserRadioButtons.pl",
  "MathObjects.pl",
  "parserMultiAnswer.pl",
  "unionTables.pl",
  "RserveClient.pl",
  "PGcourse.pl"
);

# Print problem number and point value (weight) for the problem
TEXT(beginproblem());


##############################################################
#  Question and R Setup
##############################################################
Context->("Numeric");

SRAND($psvn);
$setseed = "set.seed($psvn); 1";

# Do this in R
rserve_start();

# Randomly generate parameters: n, y, ysum, and mu
rserve_eval($setseed);
@n = rserve_eval('n <- round(runif(1, min=30, max=41),0); n');
$n = join ", ", @n;

@ysum = rserve_eval('y <- rpois(n, lambda=4); ysum <- sum(y); ysum');
$ysum = join ", ", @ysum;

@mu1 = rserve_eval('mu1 <- round(runif(1, min=3.9, max=4.9),1); mu1');
$mu1 = join ", ", @mu1;

# Answers
@ans_a = rserve_eval('ans_a <- round( log(exp(-n*(3 - 4))*(3/4)^ysum), 2); ans_a');
$ans_a = join ", ", @ans_a;

@ans_c_num = rserve_eval('ans_c <- qpois(0.05, lambda=5*n)- 1; ans_c');
$ans_c_num = join ", ", @ans_c_num;

@ans_d = rserve_eval('ans_d <- round( ppois(ysum, lambda = 5*n),3); ans_d');
$ans_d = join ", ", @ans_d;

@ans_e = rserve_eval('ans_e <- round( ppois(ans_c, lambda = n*mu1), 2); ans_e');
$ans_e = join ", ", @ans_e;

rserve_finish();


# Multiple choice question for part b
$qu_b = "Consider in general the likelihood ratio here when \(\mu _{1}<\mu _{0}\).
As the sum of the observations increases, the likelihood ratio";
$ans_b = "decreases.";
$fake1_b = "increases.";
$fake2_b = "can both increase and decrease.";

$mc_b = new_multiple_choice();
$mc_b->qa(
  $qu_b,
  $ans_b
);
$mc_b->extra(
$fake1_b, $fake2_b
);


# Multiple choice question for part c
$qu_c = "It has been hypothesised by the instructor of the course that there
are an average of five typing errors in each edition of $BITALIC The Guard $EITALIC
The editor of the newspaper has objected, claiming that the mean is less
than five per edition. To perform the appropriate hypothesis test here at a
significance level no greater than 5%, you would (complete one of the
following statements):";
$ans_c = "reject the null hypothesis when the total number of observed errors is
no greater than \(k\),";
$fake_c = "reject the null hypothesis when the total number of observed errors is
greater than \(k\),";

$mc_c = new_multiple_choice();
$mc_c->qa(
  $qu_c,
  $ans_c
);
$mc_c->extra(
$fake_c
);

##############################################################




##############################################################
#  Question in Text
##############################################################
Context()->texStrings;
BEGIN_TEXT


The number of typing errors in an edition of a newspaper could reasonably be
modelled by a Poisson distribution with some mean \(\mu\). Each of the \($n\)
students on a journalism course are allocated at random a past edition of
the $BITALIC The Guard $EITALIC newspaper and asked to find the number of typing errors it contains. 
The total number of errors found by the students is \($ysum\).
$BR
$BR

$BBOLD Part a) $EBOLD
In testing the null hypothesis \(H_{0}:\mu =\mu _{0}\) against the
alternative \(H_{a}:\mu =\mu _{1}\), find the log-likelihood ratio. Evaluate the
natural logarithm of this ratio for the case \(\mu _{0}=4\) and \(\mu _{1}=3\), giving your answer to two
decimal places.
$BR
\{ ans_rule(12) \} 
$BR
$BR

$BBOLD Part b) $EBOLD
\{ $mc_b->print_q() \}
$BR
\{ $mc_b->print_a() \}
$BR
$BR

$BBOLD Part c) $EBOLD
\{ $mc_c->print_q() \}
$BR
\{ $mc_c->print_a() \}
$BR
where \(k\) is
$BR
\{ ans_rule(12) \} 
$BR
$BR

$BBOLD Part d) $EBOLD
For the same test, what is the P-value? Give your answer to three decimal places. 
$BR
\{ ans_rule(12) \} 
$BR
$BR

$BBOLD Part e) $EBOLD
Suppose the true mean number of errors per edition of $BITALIC The Guard $EITALIC
is \($mu1\). Find the power of the test above, giving your answer to
two decimal places.
$BR
\{ ans_rule(12) \} 
$BR
$BR

END_TEXT
##############################################################



##############################################################
## Show answer
##############################################################
$showPartialCorrectAnswers = 1;
ANS( num_cmp($ans_a, tol=> 0.01) );
ANS( radio_cmp($mc_b->correct_ans()) );
ANS( radio_cmp($mc_c->correct_ans()) );
ANS( num_cmp($ans_c_num, tol=> 0.01) );
ANS( num_cmp($ans_d, tol=> 0.001) );
ANS( num_cmp($ans_e, tol=> 0.01) );
##############################################################



##############################################################
#  Solution
##############################################################
Context()->texStrings;
Context()->{format}{number} = "%.2f";
BEGIN_SOLUTION
$BR
$BBOLD Part a) $EBOLD
Neyman-Pearson theory suggests we look for a test based
on the likelihood ratio, 
\begin{align*}
\frac{L\left( \mu _{1};y_{1},\dots ,y_{n}\right) }{L\left( \mu
_{0};y_{1},\dots ,y_{n}\right) },
\end{align*}
and reject the null if the above is too large, i.e., greater than some
value \(c\) say. In this case the likelihood function is 
\begin{align*}
L\left( \mu ;y_{1},\dots ,y_{n}\right) =\frac{e^{-n\mu }\mu ^{\sum_{i}y_{i}}%
}{\prod_{i}y_{i}!},
\end{align*}
and so 
\begin{align*}
\frac{L\left( \mu _{1};y_{1},\dots ,y_{n}\right) }{L\left( \mu
_{0};y_{1},\dots ,y_{n}\right) } &=e^{-n\left( \mu _{1}-\mu _{0}\right)
}\left( \frac{\mu _{1}}{\mu _{0}}\right)^{\sum_{i}y_{i}} \\
\log\left( \frac{L\left( \mu _{1};y_{1},\dots ,y_{n}\right) }{L\left( \mu
_{0};y_{1},\dots ,y_{n}\right) } \right) &=  -n\left( \mu _{1}-\mu _{0}\right) + \sum_{i}y_{i} \log \left( \frac{\mu _{1}}{\mu _{0}}\right)
\end{align*}
In the case that \(\mu _{0}=4\), \(\mu _{1}=3\), and \(\sum_{i}y_{i}=$ysum\), the natural logarithm of this is \($ans_a\). 
$BR
$BR

$BBOLD Part b) $EBOLD
In the case that \(\mu _{1}<\mu _{0}\), the likelihood ratio
decreases as the sum of the observations increases.
$BR
$BR

$BBOLD Part c) $EBOLD
The likelihood ratio above is greater than a constant \(c\)
when
\begin{align*}
-n\left( \mu _{1}-\mu _{0}\right) +\sum_{i}y_{i}\log \left( \frac{\mu _{1}}{
\mu _{0}}\right) >\log \left( c\right) ,
\end{align*}
which on re-arranging (and note that \(log( \mu_{1}/\mu_{0}) <0\) gives the rejection region as being 
\begin{align*}
\sum_{i}y_{i}<\frac{\log \left( c\right) +n\left( \mu _{1}-\mu _{0}\right) }{
\log \left( \mu _{1}\right) -\log \left( \mu _{0}\right) }.
\end{align*}
This is not hard to implement, since using the additive property of
the Poisson distribution we know that \(\sum_{i}y_{i}\sim Po\left( n\mu
_{0}\right)\) under \(H_{0}\) and there is no necessity to
find \(c\) explicitly. We seek the value \(k\) such that
if \(Y\sim Po\left( 5n\right)\), \(P\left( Y\leq k\right) \leq 0.05.\) 
Using R, the required constant is found via
$BR
$BITALIC
k <- qpois(0.05, lambda=5*$n) - 1
$EITALIC
$BR
$BR


$BBOLD Part d) $EBOLD
We require the probability that the total number of errors
is no greater than \($ysum\), given that the mean number of
errors per issue is \(5\). Via R
$BR
$BITALIC
ppois($ysum, lambda = 5*$n)
$EITALIC
$BR
$BR

$BBOLD Part e) $EBOLD
The power of the test is the probability we reject the null
hypothesis given the true value of the mean as being \($mu1\). From R, this is
$BR
$BITALIC
ppois(k, lambda = $mu1*$n)
$EITALIC



END_SOLUTION
##############################################################



ENDDOCUMENT(); 

Perl Online Compiler

Write, Run & Share Perl code online using OneCompiler's Perl online compiler for free. It's one of the robust, feature-rich online compilers for Perl language, running on the latest version 5.22.1. Getting started with the OneCompiler's Perl compiler is simple and pretty fast. The editor shows sample boilerplate code when you choose language as Perl and start coding.

Taking inputs (stdin)

OneCompiler's Perl online editor supports stdin and users can give inputs to programs using the STDIN textbox under the I/O tab. Following is a sample Perl program which takes name as input and prints hello message with your name.

my $name = <STDIN>;             
print "Hello $name.\n";          

About Perl

Perl(Practical Extraction and Report Language) is especially desined for text processing by Larry Wall.

Key features

  • Cross-platform
  • Efficient for mission critical applications.
  • Open-source
  • Supports both procedural and object-oriented programming.
  • Perl interpreter is embeddable with other systems.
  • Loosely typed language

Syntax help

Data types

There is no need to specify the type of the data in Perl as it is loosely typed language.

TypeDescriptionUsage
ScalarScalar is either a number or a string or an address of a variable(reference)$var
ArraysArray is an ordered list of scalars, you can access arrays with indexes which starts from 0@arr = (1,2,3)
HashHash is an unordered set of key/value pairs%ul = (1,'foo', 2, 'bar)

Variables

In Perl, there is no need to explicitly declare variables to reserve memory space. When you assign a value to a variable, declaration happens automatically.

$var-name =value; #scalar-variable
@arr-name = (values); #Array-variables
%hashes = (key-value pairs); # Hash-variables 

Loops

1. If family:

If, If-else, Nested-Ifs are used when you want to perform a certain set of operations based on conditional expressions.

If

if(conditional-expression){    
//code    
} 

If-else

if(conditional-expression){  
//code if condition is true  
}else{  
//code if condition is false  
} 

Nested-If-else

if(condition-expression1){  
//code if above condition is true  
}else if(condition-expression2){  
//code if above condition is true  
}  
else if(condition-expression3){  
//code if above condition is true  
}  
...  
else{  
//code if all the conditions are false  
}  

2. Switch:

There is no case or switch in perl, instead we use given and when to check the code for multiple conditions.

given(expr){    
when (value1)  
{//code if above value is matched;}    
when (value2)  
{//code if above value is matched;}   
when (value3)  
{//code if above value is matched;}  
default  
{//code if all the above cases are not matched.}     
} 

3. For:

For loop is used to iterate a set of statements based on a condition.

for(Initialization; Condition; Increment/decrement){  
  // code  
} 

4. While:

While is also used to iterate a set of statements based on a condition. Usually while is preferred when number of iterations are not known in advance.

while(condition) {  
 // code 
}  

5. Do-While:

Do-while is also used to iterate a set of statements based on a condition. It is mostly used when you need to execute the statements atleast once.

do {
  // code 
} while (condition); 

Sub-routines

Sub-routines are similar to functions which contains set of statements. Usually sub-routines are written when multiple calls are required to same set of statements which increases re-usuability and modularity.

How to define a sub-routine

sub subroutine_name 
{
	# set of Statements
}

How to call a sub-routine

subroutine_name();
subroutine_name(arguments-list); // if arguments are present