Підтримка
www.wikidata.uk-ua.nina.az
U teoriyi jmovirnostej ta mashinnomu navchanni zadacha bagatorukogo bandita yaku inodi nazivayut zadacheyu K abo N rukogo bandita ce zadacha rozpodilu obmezhenoyi mnozhini resursiv mizh konkuruyuchimi alternativami takim chinom shob maksimizuvati ochikuvanij vigrash koli vlastivosti kozhnogo variantu vidomi lishe chastkovo na moment uhvalennya rishennya i mozhut stati krashe zrozumilimi z plinom chasu abo shlyahom rozpodilu resursiv dlya realizaciyi variantu Ce klasichna zadacha navchannya z pidkriplennyam yaka ye prikladom dilemi balansu mizh ekspluataciyeyu ta rozvidkoyu Nazva pohodit vid uyavnogo gravcya na nizci igrovih avtomativ yih chasto nazivayut odnorukimi banditami yakij maye virishiti na yakih avtomatah varto grati skilki raziv varto grati na kozhnomu avtomati ta v yakomu poryadku slid grati i chi prodovzhuvati z potochnim avtomatom abo sprobuvati inshij Problema bagatorukih banditiv takozh pidpadaye pid shiroku kategoriyu en Ryad igrovih avtomativ u Las Vegasi U cij zadachi kozhen avtomat zabezpechuye vipadkovu vinagorodu vidpovidno do rozpodilu jmovirnostej yakij vlastivij dlya cogo avtomatu Meta gravcya maksimizuvati sumu vinagorod yaku vin otrimuye vidpovidno u vipadku zadiyannya obranoyi poslidovnosti vazheliv Principova problema z yakoyu stikayetsya gravec na kozhnomu kroci polyagaye v tomu sho vin maye zrobiti vibir mizh ekspluataciyeyu avtomata yakij maye najvishij ochikuvanij pributok i rozvidkoyu shob otrimati bilshe informaciyi pro ochikuvani vigrashi inshih avtomativ Pitannya kompromisu mizh rozvidkoyu ta ekspluataciyeyu takozh vinikaye u mashinnomu navchanni Na praktici bagatoruki banditi vikoristovuvalis dlya modelyuvannya takih zadach yak upravlinnya doslidnickimi proektami u velikij organizaciyi yak naukovij fond abo farmacevtichna kompaniya U rannih formulyuvannyah zadachi gravec pochinaye vzayemodiyu bez pochatkovih znan pro avtomati Gerbert Robbins u 1952 roci usvidomlyuyuchi vazhlivist ciyeyi zadachi pobuduvav zbizhni strategiyi vidboru sukupnosti v deyakih aspektah poslidovnogo planuvannya eksperimentiv Teorema pro indeks Gittinsa vpershe opublikovana en daye optimalnu strategiyu maksimizaciyi ochikuvanoyi vinagorodi z urahuvannyam koeficiyenta znecinyuvannya Empirichna motivaciyaYak potribno rozpodiliti nayavnij byudzhet mizh cimi doslidnickimi institutami shob maksimizuvati rezultati Problema bagatorukogo bandita modelyuye agenta yakij odnochasno namagayetsya zdobuti novi znannya sho nazivayetsya rozvidkoyu ta optimizuvati svoyi rishennya na osnovi isnuyuchih znan nazivayetsya ekspluataciyeyu Agent namagayetsya zbalansuvati ci konkuruyuchi zavdannya shob maksimizuvati zagalnu prinesenu nimi cinnist za rozglyanutij promizhok chasu Isnuye bagato praktichnih zastosuvan modeli bandita napriklad klinichni viprobuvannya yaki vivchayut efekti riznih eksperimentalnih metodiv likuvannya dlya minimizaciyi vtrat sered paciyentiv zusillya po en dlya minimizaciyi zatrimok u merezhi stvorennya finansovogo portfelya U cih praktichnih prikladah zadacha polyagaye u zbalansuvanni maksimizaciyi vinagorodi na osnovi vzhe nabutih znan iz sprobami novih dij dlya podalshogo zbilshennya znan Cya dilema vidoma yak kompromis mizh ekspluataciyeyu ta doslidzhennyam v mashinnomu navchanni Model takozh vikoristovuvalasya dlya upravlinnya dinamichnim rozpodilom resursiv dlya riznih proektiv vidpovidayuchi na pitannya nad yakim proektom varto pracyuvati zvazhayuchi na neviznachenist shodo skladnosti ta vigidnosti kozhnoyi mozhlivosti Spochatku zadacha doslidzhuvalas naukovcyami soyuznikiv u Drugij svitovij vijni ta viyavilasya nastilki skladnoyu sho za zhartivlivimi slovami en proponuvalosya skinuti yiyi nad Nimechchinoyu shob nimecki vcheni takozh mogli vitrachati na neyi svij chas Variant zadachi yakij nabuv shirokogo viznannya u suchasnomu formulyuvanni buv zaproponovanij Gerbertom Robbinsom u 1952 roci Model bagatorukogo banditaBagatorukogo bandita korotko bandit abo BRB mozhna rozglyadati yak sukupnist rozpodiliv dijsnih chisel B R 1 R K displaystyle B R 1 dots R K kozhen rozpodil pov yazanij z vinagorodami yaki otrimuyutsya natiskannyam odnogo iz K N displaystyle K in mathbb N vazheliv Nehaj m 1 m K displaystyle mu 1 dots mu K budut serednimi znachennyami yaki pov yazani z cimi rozpodilami vinagorod Gravec iterativno graye po odnomu vazhelyu za raund i otrimuye vidpovidnu vinagorodu Metoyu ye maksimizaciya sumi zibranih vinagorod Gorizontom H displaystyle H nazivayetsya kilkist raundiv yaki zalishilos zigrati Zadacha pro bagatorukih banditiv formalno ekvivalentna procesu Markova z odnim stanom Smutok r displaystyle rho pislya T displaystyle T raundiv viznachayetsya yak ochikuvana riznicya mizh sumoyu vinagorodi pov yazanoyu z optimalnoyu strategiyeyu ta sumoyu otrimanih vinagorod r T m t 1 T r t displaystyle rho T mu sum t 1 T widehat r t de m max k m k displaystyle mu max k mu k maksimalne serednye znachennya vinagorodi i r t displaystyle widehat r t vinagoroda u raundi t Strategiya nulovogo smutku ce strategiya serednij smutok yakoyi za raund r T displaystyle rho T pryamuye do nulya z imovirnistyu 1 koli kilkist zigranih raundiv pryamuye do neskinchennosti Intuyitivno zrozumilo sho strategiyi nulovogo smutku garantovano zbigayutsya do ne obov yazkovo yedinoyi optimalnoyi strategiyi yaksho zigrano dostatno raundiv Variaciyi zadachiZagalnoprijnyatim formulyuvannyam ye dvijkovij bagatorukij bandit abo bagatorukij bandit Bernulli yakij vidaye vinagorodu odinicya z jmovirnistyu p displaystyle p a v protilezhnomu vipadku vinagoroda dorivnyuye nulyu V inshomu formulyuvanni zadachi pro bagatorukogo bandita kozhna ruka predstavlyaye nezalezhnu mashinu Markova Kozhnogo razu koli grayetsya pevna ruka stan ciyeyi mashini perehodit u novij stan obranij vidpovidno do rozpodilu jmovirnostej staniv mashini Markova Tobto kozhnogo razu vinagoroda zalezhit vid potochnogo stanu mashini Dlya uzagalnennya yake nazivayut zadacheyu nespokijnogo bandita stani ruk yaki ne obirayutsya gravcem takozh mozhut evolyucionuvati z chasom Takozh rozglyadayutsya sistemi v yakih kilkist variantiv shodo togo yaku ruku zigrati z chasom zbilshuyetsya Doslidniki informatiki vivchali bagatorukih banditiv pri najgirshih pripushennyah otrimuyuchi algoritmi sho dozvolyayut minimizuvati smutok yak v kincevomu tak i v neskinchennomu asimptotichnomu chasovih gorizontah yak u vipadku stohastichnogo tak i u vipadku nestohastichnogo vigrashu StrategiyiOsnovnim prorivom bula pobudova optimalnoyi generalnoyi sukupnosti strategij abo politik yaki mayut rivnomirno maksimalnij koeficiyent zbizhnosti do sukupnosti z najvishim serednim znachennyam u roboti yaka opisana dali Optimalni rishennya U statti Asimptotichno efektivni pravila adaptivnogo rozpodilu Laj i Robbins nastupni statti Robbinsa ta jogo koleg vidsilayut do stati Robbinsa 1952 roku pobuduvali zbizhnij vidbir strategij sho mayut najshvidshij riven zbizhnosti do generalnoyi sukupnosti z najvishim serednim znachennyam u vipadku koli rozpodili vinagorod ye odnoparametrichnim eksponencialnim simejstvom Potim u roboti en ta Robbinsa za sproshenoyi strategiyi bulo nadano dovedennya u vipadku normalnih generalnih sukupnostej z vidomimi dispersiyami Podalshe vagome prosuvannya zdijsnili Burnetas i Katehakis u roboti Optimalni adaptivni politiki dlya zadachi poslidovnogo rozpodilu v yakij buli pobudovani strategiyi na osnovi indeksu z rivnomirno maksimalnim koeficiyentom zbizhnosti za bilsh zagalnih umov sho vklyuchayut vipadok koli rozpodili rezultativ vid kozhnoyi sukupnosti zalezhat vid vektoru nevidomih parametriv Burnetas i Katehakis 1996 takozh nadali yavne rishennya dlya vazhlivogo vipadku koli rozpodili rezultativ ye dovilnimi tobto neparametrichnimi diskretnimi odnovimirnimi rozpodilami Piznishe v Optimalnij adaptivnij politici dlya Markovskih procesiv uhvalyuvannya rishen Burnetas i Katehakis doslidili nabagato bilshu model procesiv uhvalyuvannya rishen Markova z chastkovoyu informaciyeyu koli zakon perehodu ta abo ochikuvana vinagoroda za odin period mozhut zalezhati vid nevidomih parametriv U cij roboti bula pobudovana yavna forma dlya klasu adaptivnih strategij yaki mayut rivnomirno maksimalnu zbizhnist dlya zagalnoyi ochikuvanoyi vinagorodi z kincevim gorizontom za neobhidnih pripushen shodo skinchennosti prostoru dij ta staniv j nezvidnosti pravila perehodu Golovnoyu osoblivistyu cih strategij ye te sho vibir dij u kozhnomu stani ta periodi chasu bazuyetsya na indeksah yaki ye zapisom pravoyi chastini ocinki serednoyi vinagorodi u rivnyannyah optimalnosti Yih nazvali optimistichnim pidhodom u robotah Tevari ta Bartletta Ortnera Filippi Kappe ta Gariv ye a takozh Hondi ta Takemuri Koli optimalni rishennya dlya zadach pro odnorukih banditiv vikoristovuyutsya dlya ocinki viboru yakij roblyat tvarini cherez aktivnist nejroniv u migdalini ta smugastomu tili koduyetsya znachennya yake vivoditsya z cih pravil i mozhe vikoristovuvatisya dlya dekoduvannya koli tvarini roblyat vibir mizh doslidzhennyam ta ekspluataciyeyu Bilshe togo optimalni strategiyi krashe prognozuyut povedinku tvarin pri vibori nizh alternativni strategiyi opisani nizhche Ce svidchit pro te sho optimalni rishennya dlya zadach bagatorukih banditiv ye biologichno pravdopodibnimi nezvazhayuchi na vimogi do obchislen Nablizheni rozv yazki Isnuye bagato strategij yaki zabezpechuyut nablizhene rozv yazannya ZBB i yih mozhna vidnesti do chotiroh shirokih kategorij yaki dokladno opisani nizhche Napivrivnomirni strategiyi Napivrivnomirni strategiyi buli najdavnishimi i najprostishimi strategiyami yaki dayut nablizhenij rozv yazok ZBB Vsi ci strategiyi mayut spilnu zhadibnu povedinku koli najkrashij vazhil na osnovi poperednih sposterezhen zavzhdi vikoristovuyetsya za vinyatkom vipadkiv koli vzhivayetsya rivnomirno vipadkova diya Epsilon zhadibna strategiya Najkrashij vazhil vibirayetsya z jmovirnistyu 1 e displaystyle 1 varepsilon i vazhil vibirayetsya vipadkovo z rivnoyu jmovirnistyu ϵ displaystyle epsilon Tipovim znachennyam parametra mozhe buti e 0 1 displaystyle varepsilon 0 1 ale vin mozhe silno vidriznyatisya v zalezhnosti vid obstavin ta cilej Epsilon persha strategiya Pislya chistoyi fazi rozvidki jde faza chistoyi ekspluataciyi Yaksho N displaystyle N zagalna kilkist viprobuvan to faza rozvidki skladayetsya z e N displaystyle varepsilon N viprobuvan i faza ekspluataciyi 1 e N displaystyle 1 varepsilon N viprobuvan Pid chas fazi rozvidki vipadkovim chinom vibirayetsya vazhil z rivnoyu jmovirnistyu na etapi ekspluataciyi zavzhdi vibirayetsya najkrashij vazhil Strategiya epsilon zmenshennya Podibna do epsilon zhadibnoyi strategiyi za vinyatkom togo sho znachennya e displaystyle varepsilon zmenshuyetsya z prohodzhennya eksperimentu sho prizvodit do nadzvichajno doslidnickoyi povedinki na pochatku ta nadzvichajno ekspluatatorskoyi povedinki na finishi Adaptivna epsilon zhadibna strategiya zasnovana na cinnisnih vidminnostyah VDBE Podibna do strategiyi zmenshennya epsilonu za vinyatkom togo sho epsilon zmenshuyetsya vidpovidno do progresu navchannya zamist ruchnogo regulyuvannya Tokich 2010 Veliki kolivannya ocinok vartosti prizvodyat do visokogo epsilonu velika rozvidka nizka ekspluataciya nizki kolivannya do nizkogo epsilonu nizka rozvidka visoka ekspluataciya Podalshi polipshennya mozhut buti dosyagnuti shlyahom viboru dij zvazhenih na softmaksi u vipadku doslidnickih dij Tokic amp Palm 2011 Adaptivna epsilon zhadibna strategiya na osnovi bayesovih ansambliv Epsilon BMC Adaptivna strategiya adaptaciyi epsilonu dlya posilennya navchannya podibnogo do VBDE z monotonnimi garantiyami zbizhnosti U comu pidhodi parametr epsilon rozglyadayetsya yak ochikuvannya posteriornogo rozpodilu yakij ocinyuye zhadibnogo agenta yakij povnistyu doviryaye otrimanij vinagorodi ta agenta yakij navchayetsya odnomanitno yakij ne doviryaye otrimanij vinagorodi Dokladnishe div Gimelfarb ta in 2019 Kontekstualna epsilon zhadibna strategiya Podibna do epsilon zhadibnoyi strategiyi za vinyatkom togo sho znachennya e displaystyle varepsilon obchislyuyetsya z urahuvannyam situaciyi pri prohodzhenni eksperimentu sho dozvolyaye algoritmu brati do uvagi kontekst Vin bazuyetsya na dinamichnomu balansuvanni rozviki ekspluataciyi i mozhe adaptivno zbalansuvati dva aspekti virishuyuchi yaka situaciya ye najbilsh vidpovidnoyu dlya rozvidki chi ekspluataciyi sho prizvodit do nadzvichajno doslidnickoyi povedinki koli situaciya ne ye kritichnoyu i duzhe ekspluatatorskoyi povedinki u kritichnij situaciyi Div takozhIndeks Gittinsa potuzhna zagalna strategiya analizu zadach pro BB Zhadibnij algoritm en en PrimitkiAuer P Cesa Bianchi N Fischer P 2002 Finite time Analysis of the Multiarmed Bandit Problem Machine Learning 47 2 3 235 256 doi 10 1023 A 1013689704352 Katehakis M N Veinott A F 1987 The Multi Armed Bandit Problem Decomposition and Computation Mathematics of Operations Research 12 2 262 268 doi 10 1287 moor 12 2 262 1989 Multi armed bandit allocation indices Wiley Interscience Series in Systems and Optimization Chichester John Wiley amp Sons Ltd ISBN 978 0 471 92059 5 Fristedt Bert 1985 Bandit problems Sequential allocation of experiments Monographs on Statistics and Applied Probability London Chapman amp Hall ISBN 978 0 412 24810 8 Weber Richard 1992 On the Gittins index for multiarmed bandits 2 4 1024 1033 doi 10 1214 aoap 1177005588 JSTOR 2959678 Robbins H 1952 Some aspects of the sequential design of experiments Bulletin of the American Mathematical Society 58 5 527 535 doi 10 1090 S0002 9904 1952 09620 8 1979 Bandit Processes and Dynamic Allocation Indices Journal of the Royal Statistical Society Series B Methodological 41 2 148 177 doi 10 1111 j 2517 6161 1979 tb01068 x JSTOR 2985029 Press William H 2009 Bandit solutions provide unified ethical models for randomized clinical trials and comparative effectiveness research Proceedings of the National Academy of Sciences 106 52 22387 22392 Bibcode 2009PNAS 10622387P doi 10 1073 pnas 0912378106 PMC 2793317 PMID 20018711 Press 1986 Brochu Eric Hoffman Matthew W de Freitas Nando September 2010 Portfolio Allocation for Bayesian Optimization arXiv 1009 5419 Bibcode 2010arXiv1009 5419B Shen Weiwei Wang Jun Jiang Yu Gang Zha Hongyuan 2015 Proceedings of International Joint Conferences on Artificial Intelligence IJCAI2015 arhiv originalu za 4 grudnya 2021 procitovano 4 grudnya 2021 Farias Vivek F Ritesh Madan 2011 The irrevocable multiarmed bandit problem 59 2 383 399 doi 10 1287 opre 1100 0891 1979 Discussion of Dr Gittins paper Series B 41 2 148 177 doi 10 1111 j 2517 6161 1979 tb01069 x Vermorel Joannes Mohri Mehryar 2005 PDF In European Conference on Machine Learning Springer s 437 448 arhiv originalu PDF za 21 listopada 2008 procitovano 4 grudnya 2021 1988 Restless bandits Activity allocation in a changing world Journal of Applied Probability 25A 287 298 doi 10 2307 3214163 JSTOR 3214163 MR 0974588 1981 Arm acquiring bandits Annals of Probability 9 2 284 292 doi 10 1214 aop 1176994469 Auer P Cesa Bianchi N Freund Y Schapire R E 2002 The Nonstochastic Multiarmed Bandit Problem 32 1 48 77 CiteSeerX 10 1 1 130 158 doi 10 1137 S0097539701398375 Lai T L Robbins H 1985 Asymptotically efficient adaptive allocation rules Advances in Applied Mathematics 6 1 4 22 doi 10 1016 0196 8858 85 90002 8 Katehakis M N Robbins H 1995 Sequential choice from several populations Proceedings of the National Academy of Sciences of the United States of America 92 19 8584 5 Bibcode 1995PNAS 92 8584K doi 10 1073 pnas 92 19 8584 PMC 41010 PMID 11607577 Burnetas A N Katehakis M N 1996 Optimal adaptive policies for sequential allocation problems Advances in Applied Mathematics 17 2 122 142 doi 10 1006 aama 1996 0007 Burnetas A N Katehakis M N 1997 Optimal adaptive policies for Markov decision processes Math Oper Res 22 1 222 255 doi 10 1287 moor 22 1 222 Tewari A Bartlett P L 2008 PDF Advances in Neural Information Processing Systems 20 CiteSeerX 10 1 1 69 5482 Arhiv originalu PDF za 25 travnya 2012 Procitovano 4 grudnya 2021 Ortner R 2010 Online regret bounds for Markov decision processes with deterministic transitions Theoretical Computer Science 411 29 2684 2695 doi 10 1016 j tcs 2010 04 005 Filippi S and Cappe O and Garivier A 2010 Online regret bounds for Markov decision processes with deterministic transitions Communication Control and Computing Allerton 2010 48th Annual Allerton Conference on pp 115 122 Honda J Takemura A 2011 An asymptotically optimal policy for finite support models in the multi armed bandit problem Machine Learning 85 3 361 391 arXiv 0905 2776 doi 10 1007 s10994 011 5257 4 Averbeck B B 2015 Theory of choice in bandit information sampling and foraging tasks PLOS Computational Biology 11 3 e1004164 Bibcode 2015PLSCB 11E4164A doi 10 1371 journal pcbi 1004164 PMC 4376795 PMID 25815510 a href wiki D0 A8 D0 B0 D0 B1 D0 BB D0 BE D0 BD Cite journal title Shablon Cite journal cite journal a Obslugovuvannya CS1 Storinki iz nepoznachenim DOI z bezkoshtovnim dostupom posilannya Costa V D Averbeck B B 2019 Subcortical Substrates of Explore Exploit Decisions in Primates Neuron 103 3 533 535 doi 10 1016 j neuron 2019 05 017 PMC 6687547 PMID 31196672 Sutton R S amp Barto A G 1998 Reinforcement learning an introduction Cambridge MA MIT Press Tokic Michel 2010 PDF KI 2010 Advances in Artificial Intelligence Lecture Notes in Computer Science t 6359 Springer Verlag s 203 210 doi 10 1007 978 3 642 16111 7 23 ISBN 978 3 642 16110 0 arhiv originalu PDF za 4 grudnya 2021 procitovano 4 grudnya 2021 Tokic Michel Palm Gunther 2011 PDF KI 2011 Advances in Artificial Intelligence Lecture Notes in Computer Science t 7006 Springer Verlag s 335 346 ISBN 978 3 642 24455 1 arhiv originalu PDF za 23 listopada 2018 procitovano 4 grudnya 2021 Gimelfarb Michel Sanner Scott Lee Chi Guhn 2019 PDF Proceedings of the Thirty Fifth Conference on Uncertainty in Artificial Intelligence AUAI Press s 162 arhiv originalu PDF za 1 bereznya 2021 procitovano 4 grudnya 2021 Bouneffouf D Bouzeghoub A Gancarski A L 2012 A Contextual Bandit Algorithm for Mobile Context Aware Recommender System Neural Information Processing Lecture Notes in Computer Science t 7665 s 324 doi 10 1007 978 3 642 34487 9 40 ISBN 978 3 642 34486 2 V inshomu movnomu rozdili ye povnisha stattya Multi armed bandit angl Vi mozhete dopomogti rozshirivshi potochnu stattyu za dopomogoyu perekladu z anglijskoyi Divitis avtoperekladenu versiyu statti z movi anglijska Perekladach povinen rozumiti sho vidpovidalnist za kincevij vmist statti u Vikipediyi nese same avtor redaguvan Onlajn pereklad nadayetsya lishe yak korisnij instrument pereglyadu vmistu zrozumiloyu movoyu Ne vikoristovujte nevichitanij i nevidkorigovanij mashinnij pereklad u stattyah ukrayinskoyi Vikipediyi Mashinnij pereklad Google ye korisnoyu vidpravnoyu tochkoyu dlya perekladu ale perekladacham neobhidno vipravlyati pomilki ta pidtverdzhuvati tochnist perekladu a ne prosto skopiyuvati mashinnij pereklad do ukrayinskoyi Vikipediyi Ne perekladajte tekst yakij vidayetsya nedostovirnim abo neyakisnim Yaksho mozhlivo perevirte tekst za posilannyami podanimi v inshomovnij statti Dokladni rekomendaciyi div Vikipediya Pereklad
Топ