ANC/DTC SEMINAR: Peter Richtarik, School of Maths
Semi-stochastic gradient methods
When  Nov 04, 2014 from 11:00 AM to 12:00 PM
Where  Informatics Forum Room 4.31/4.33 
We study the problem of minimizing the average of a large number (n) of smooth convex loss functions. We propose a new method, S2GD (Semi-Stochastic Gradient Descent), which runs for one or several epochs, in each of which a single full gradient and a random number of stochastic gradients are computed, the latter following a geometric law. The total work needed for the method to output an ε-accurate solution in expectation, measured in the number of passes over data, or equivalently, in units equivalent to the computation of a single gradient of the loss, is O((κ/n)log(1/ε)), where κ is the condition number. This is achieved by running the method for O(log(1/ε)) epochs, with a single gradient evaluation and O(κ) stochastic gradient evaluations in each. The SVRG method of Johnson and Zhang arises as a special case. If our method is limited to a single epoch only, it needs to evaluate at most O((κ/ε)log(1/ε)) stochastic gradients. In contrast, SVRG requires O(κ/ε²) stochastic gradients. To illustrate our theoretical results, S2GD needs a workload equivalent to only about 2.1 full gradient evaluations to find a 10⁻⁶-accurate solution for a problem with n = 10⁹ and κ = 10³.
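The epoch structure described in the abstract (one full gradient, then a random, geometrically distributed number of variance-reduced stochastic steps) can be sketched roughly as follows. This is a minimal illustration, not the paper's exact specification: the names `s2gd` and `grad_i`, the parameter choices, and the particular truncated geometric law used here are our own assumptions.

```python
import numpy as np

def s2gd(grad_i, n, x0, stepsize, epochs, m, beta=0.95, seed=0):
    """Sketch of S2GD: each epoch computes one full gradient, then a
    random number of variance-reduced stochastic gradient steps.

    grad_i(i, x) returns the gradient of the i-th loss function at x.
    The inner-loop length t is drawn from a truncated geometric law on
    {1, ..., m}; the exact probabilities below are an illustrative choice.
    """
    rng = np.random.default_rng(seed)
    # geometric law favouring longer inner loops: P(t = j) ∝ beta**(m - j)
    probs = beta ** (m - np.arange(1, m + 1))
    probs /= probs.sum()
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(epochs):
        # single full gradient evaluation for this epoch
        g = np.mean([grad_i(i, x) for i in range(n)], axis=0)
        t = rng.choice(np.arange(1, m + 1), p=probs)  # random inner-loop length
        y = x.copy()
        for _ in range(t):
            i = rng.integers(n)
            # variance-reduced stochastic gradient: unbiased for ∇f(y),
            # with variance vanishing as y and x approach the optimum
            y -= stepsize * (grad_i(i, y) - grad_i(i, x) + g)
        x = y
    return x
```

For instance, on a ridge-regression problem one would pass `grad_i(i, x) = a_i(a_iᵀx − b_i) + λx`. Fixing the inner-loop length to a constant t = m instead of drawing it from the geometric law gives an SVRG-style scheme, which is how SVRG arises as a special case.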
Peter Richtarik, http://www.maths.ed.ac.uk/~richtarik/