# COMS 4771 SU19 HW1

Due: Sat Jun 22, 2019 at 11:59pm

This homework is to be done individually. No late homeworks are allowed. To receive credit, a typeset copy of the homework pdf must be uploaded to Gradescope by the due date. You must show your work to receive full credit. Discussing possible solutions for homework questions is encouraged on piazza and with your peers, but the write-up must be your own. You should cite all resources (including online material, books, articles, help taken from specific individuals, etc.) you used to complete your work.

## 1 [Maximum Likelihood Estimation]

Here we shall examine some properties of Maximum Likelihood Estimation (MLE).

(i) Consider the density

$$p(x \mid \theta) := \begin{cases} \theta e^{-\theta x} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

for some $\theta > 0$. Suppose that $n$ samples $x_1, \ldots, x_n$ are drawn i.i.d. from $p(x \mid \theta)$. What is the MLE of $\theta$ given the samples?

(ii) Consider the density

$$p(x \mid \theta) := \begin{cases} 1/\theta & \text{if } 0 \le x \le \theta \\ 0 & \text{otherwise} \end{cases}$$

for some $\theta > 0$. Suppose that $n$ samples $x_1, \ldots, x_n$ are drawn i.i.d. from $p(x \mid \theta)$. What is the MLE of $\theta$ given the samples?

(iii) Recall the Gaussian density:

$$p(x \mid \mu, \sigma^2) := \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

for some mean parameter $\mu \in \mathbb{R}$ and variance parameter $\sigma^2 > 0$. Suppose that $n$ samples $x_1, \ldots, x_n$ are drawn i.i.d. from $p(x \mid \mu, \sigma^2)$. Show that if $\mu$ is unknown, then the MLE $\sigma^2_{ML}$ is *not* an unbiased estimator of the variance $\sigma^2$ for all sample sizes $n$. What simple modification can we make to the estimate to make it unbiased?

(iv) Show that for the MLE $\theta_{ML}$ of a parameter $\theta \in \mathbb{R}^d$ and any known injective function $g : \mathbb{R}^d \to \mathbb{R}^k$, the MLE of $g(\theta)$ is $g(\theta_{ML})$. From this result infer the MLE for the standard deviation ($\sigma$) in the same setting as in Part (iii).

## 2 [Evaluating Classifiers]

Consider the following decision rule $f_t$ for a two-category problem in $\mathbb{R}$. Given an input $x \in \mathbb{R}$, decide category $y_1$ if $x > t$; otherwise decide category $y_2$.

(i) What is the error rate for this rule, that is, what is $P[f_t(x) \neq y]$?

(ii) Show that for the optimally selected threshold value $t$ (i.e., the one which gives minimum error rate), it must be the case that

$$P(X = t \mid Y = y_1)\,P(Y = y_1) = P(X = t \mid Y = y_2)\,P(Y = y_2).$$

(iii) Assume that the underlying population distribution has equal class priors (i.e., $P[Y = y_1] = P[Y = y_2]$), and the individual class conditionals (i.e., $P[X \mid Y = y_1]$ and $P[X \mid Y = y_2]$) are distributed as Gaussians. Give an example setting of the class conditionals (i.e., give example parameter settings for the Gaussians) such that for some threshold value $t$, the rule $f_t$ achieves the Bayes error rate; and similarly, give an example setting of the class conditionals such that for no threshold value $t$ does the rule $f_t$ achieve the Bayes error rate.

## 3 [Finding (local) minima of generic functions]

Finding extreme values of functions in a closed form is often not possible. Here we will develop a generic algorithm to find the extremal values of a function. Consider a smooth function $f : \mathbb{R} \to \mathbb{R}$.

(i) Recall that Taylor's Remainder Theorem states:

For any $a, b \in \mathbb{R}$, there exists $z \in [a, b]$ such that $f(b) = f(a) + f'(a)(b - a) + \frac{1}{2} f''(z)(b - a)^2$.

Assuming that there exists $L \ge 0$ such that for all $a, b \in \mathbb{R}$, $|f'(a) - f'(b)| \le L|a - b|$, prove the following statement:

For any $x \in \mathbb{R}$, there exists some $\eta > 0$ such that if $\bar{x} := x - \eta f'(x)$, then $f(\bar{x}) \le f(x)$, with equality if and only if $f'(x) = 0$.

(Hint: first show that the assumption implies that $f$ has bounded second derivative, i.e., $f''(z) \le L$ for all $z$; then apply the remainder theorem and analyze the difference $f(x) - f(\bar{x})$.)

(ii) Part (i) gives us a generic recipe to find a new value $\bar{x}$ from an old value $x$ such that $f(\bar{x}) \le f(x)$. Using this result, develop an iterative algorithm to find a local minimum starting from an initial value $x_0$.

(iii) Use your algorithm to find the minimum of the function $f(x) := (x - 3)^2 + e^x$. You should code your algorithm in a scientific programming language like Matlab to find the solution. (No code submission required, only include the result.)

## 4 [A comparative study of classification performance of handwritten digits]

Download the datafile hw1data.mat. This datafile contains 10,000 images (each of size 28×28 pixels = 784 dimensions) of handwritten digits along with the associated labels. Each handwritten digit belongs to one of the 10 possible categories $\{0, 1, \ldots, 9\}$. There are two variables in this datafile: (i) Variable `X` is a 10,000×784 data matrix, where each row is a sample image of a handwritten digit. (ii) Variable `Y` is the 10,000×1 label vector where the $i$-th entry indicates the label of the $i$-th sample image in `X`.

**Special note for those who are not using Matlab:** Python users can use `scipy` to read in the mat file, R users can use the `R.matlab` package to read in the mat file, and Julia users can use `JuliaIO/MAT.jl`. Octave users should be able to load the file directly.
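For Python users, the note about `scipy` can be made concrete with a small sketch. It assumes the file sits in the working directory as `hw1data.mat` and that the variables are stored under the keys `X` and `Y` as described above; the helper name `load_digits` is hypothetical and not part of the assignment.

```python
import numpy as np
from scipy.io import loadmat


def load_digits(path="hw1data.mat"):
    """Read the data matrix X and label vector Y from a MATLAB .mat file.

    `load_digits` is an illustrative helper name, not part of the handout.
    """
    data = loadmat(path)  # dict mapping variable names to NumPy arrays
    # Cast to float in case the pixels are stored as an unsigned integer
    # type, where subtraction would wrap around instead of going negative.
    X = np.asarray(data["X"], dtype=np.float64)
    # MATLAB column vectors arrive as (n, 1) arrays; flatten to shape (n,).
    Y = np.asarray(data["Y"]).ravel()
    return X, Y
```

Calling `X, Y = load_digits()` should then give arrays of shape `(10000, 784)` and `(10000,)` if the file matches the description in Problem 4.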
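For Problem 3 (ii) and (iii), the update $\bar{x} := x - \eta f'(x)$ from Part (i) can be iterated until the derivative is close to zero. The following is one possible sketch, not a reference solution. It assumes a fixed step size `eta = 0.1`, a starting point `x0 = 0.0`, and a stopping tolerance on $|f'(x)|$; all three values and the function names are illustrative choices. The step size must be small relative to the curvature of $f$ in the region visited, which is an assumption here since $e^x$ has no global Lipschitz constant for its derivative.

```python
import math


def f(x):
    """Objective from Problem 3(iii): f(x) = (x - 3)^2 + e^x."""
    return (x - 3.0) ** 2 + math.exp(x)


def f_prime(x):
    """Derivative of f: f'(x) = 2(x - 3) + e^x."""
    return 2.0 * (x - 3.0) + math.exp(x)


def gradient_descent(grad, x0, eta=0.1, tol=1e-10, max_iter=10_000):
    """Repeat x <- x - eta * grad(x) until |grad(x)| < tol.

    eta, tol and max_iter are assumed values, not given in the handout.
    """
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:  # derivative is (numerically) zero: stop
            break
        x = x - eta * g  # the update from Part (i)
    return x


x_min = gradient_descent(f_prime, x0=0.0)
print("x* =", x_min, " f(x*) =", f(x_min), " f'(x*) =", f_prime(x_min))
```

A fixed step keeps the loop short. A variant that halves `eta` whenever $f$ fails to decrease would need less knowledge of the curvature, at the cost of extra function evaluations.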
