
Shrinking and Regularizing Finite Mixture Models

Project staff

Leader: Sylvia Frühwirth-Schnatter

Scientific staff: Gertraud Malsiner-Walli

Description

Data often appear to contain groups of observations with different characteristics, but the group memberships are either unavailable or unobservable. Such a situation calls for a statistical method that explicitly accounts for these latent groups and aims to determine both the group sizes and the group characteristics. The standard model-based tool for this problem in statistical analysis is the finite mixture model.

Finite mixture models have been used for more than 100 years and are a flexible, generally applicable statistical tool for which many extensions and variations have already been proposed. However, some problems remain unresolved, such as correctly selecting the variables that drive the group structure and choosing a suitable model that avoids overfitting heterogeneity, so that the results are easy to interpret and the parameters are estimated precisely.

In this research project we aim to improve the application of finite mixture models by providing tools based on shrinkage and regularization. These tools allow selecting a suitable model in which relevant and irrelevant variables are automatically distinguished and the parameters are chosen so as to avoid overfitting heterogeneity. Theoretical results will be complemented by applications and by software implementations as an add-on package for the open-source software R, an environment for statistical computing and graphics (http://www.R-project.org).

The availability of improved statistical methods in combination with software implementations allows for better analysis and an increased understanding of data in empirical quantitative research. Because finite mixture models are widely applicable, for example in astronomy, biology, economics, marketing, medicine and psychology, the results of this research project are expected to have an impact on other areas of research as well, by allowing improved insights into latent group structures present in the data.

Publications

  • Malsiner-Walli G., Frühwirth-Schnatter S., Grün B. Identifying mixtures of mixtures using Bayesian estimation, in: Journal of Computational and Graphical Statistics, 2016.

  • Malsiner-Walli G., Frühwirth-Schnatter S., Grün B. Model-based clustering based on sparse finite Gaussian mixtures, in: Statistics and Computing, Volume 26, Page(s) 303-324, 2016.

  • Grün B., Malsiner-Walli G. Discussion of "How to Find an Appropriate Clustering for Mixed-Type Variables with Application to Socio-Economic Stratification" by Hennig and Liao, in: Journal of the Royal Statistical Society: Series C (Applied Statistics), Volume 62, Number 3, Page(s) 350-351, 2013.

Funded by:

FWF (Austrian Science Fund)

FWF project number: P28740

Duration: 01.11.2016 - 31.10.2019

http://www.fwf.ac.at