Markov decision processes (MDPs) provide a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of the decision maker. MDPs are useful for studying a wide range of optimization problems solved via dynamic programming and reinforcement learning. MDPs were known at least as early as the 1950s (cf. Bellman 1957). A core body of research on Markov decision processes resulted from Ronald A. Howard's book published in 1960, Dynamic Programming and Markov Processes. They are used in a wide range of disciplines, including robotics, automatic control, economics, and manufacturing.

More precisely, a Markov decision process is a discrete-time stochastic control process. At each time step, the process is in some state $s$, and the decision maker may choose any action $a$ that is available in state $s$. The process responds at the next time step by randomly moving into a new state $s'$ and giving the decision maker a corresponding reward $R_a(s, s')$.

The probability that the process moves into its new state $s'$ is influenced by the chosen action. Specifically, it is given by the state transition function $P_a(s, s')$. Thus, the next state $s'$ depends on the current state $s$ and the decision maker's action $a$. But for given $s$ and $a$, it is conditionally independent of all previous states and actions; in other words, the state transitions of an MDP satisfy the Markov property.

Markov decision processes are an extension of Markov chains; the difference is the addition of actions (allowing choice) and rewards (giving motivation). Conversely, if only one action exists for each state (e.g., "wait") and all rewards are the same (e.g., zero), a Markov decision process reduces to a Markov chain.

Definition

(Figure: example of a simple MDP with three states and two actions.)

A Markov decision process is a 5-tuple $(S, A, P_\cdot(\cdot,\cdot), R_\cdot(\cdot,\cdot), \gamma)$, where
- $S$ is a finite set of states,
- $A$ is a finite set of actions (alternatively, $A_s$ is the finite set of actions available from state $s$),
- $P_a(s, s') = \Pr(s_{t+1} = s' \mid s_t = s,\ a_t = a)$ is the probability that action $a$ in state $s$ at time $t$ will lead to state $s'$ at time $t+1$,
- $R_a(s, s')$ is the immediate reward (or expected immediate reward) received after transitioning to state $s'$ from state $s$,
- $\gamma \in [0, 1]$ is the discount factor, which represents the difference in importance between future and present rewards.

Remark: the theory of Markov decision processes does not require $S$ or $A$ to be finite, but the basic algorithms below assume that they are.
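As a minimal sketch (not part of the formal definition above), the finite 5-tuple can be represented in code roughly as follows; the class and field names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = int
Action = int

@dataclass
class FiniteMDP:
    """A finite MDP (S, A, P, R, gamma) with tabular transition and reward models."""
    states: List[State]                          # S
    actions: Dict[State, List[Action]]           # A_s: actions available in each state
    P: Dict[Tuple[Action, State, State], float]  # P_a(s, s') transition probabilities
    R: Dict[Tuple[Action, State, State], float]  # R_a(s, s') immediate rewards
    gamma: float                                 # discount factor in [0, 1]

    def transition_prob(self, a: Action, s: State, s_next: State) -> float:
        # Unlisted (a, s, s') triples are treated as zero-probability transitions.
        return self.P.get((a, s, s_next), 0.0)
```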
Problem

The core problem of MDPs is to find a policy for the decision maker: a function $\pi$ that specifies the action $\pi(s)$ the decision maker will choose when in state $s$. Note that once a Markov decision process is combined with a policy in this way, the action is fixed for each state, and the resulting combination behaves like a Markov chain.

The goal is to choose a policy $\pi$ that maximizes some cumulative function of the random rewards, typically the expected discounted sum over a potentially infinite horizon:

$\sum_{t=0}^{\infty} \gamma^t R_{a_t}(s_t, s_{t+1})$, where we choose $a_t = \pi(s_t)$,

where $\gamma$ is the discount factor and satisfies $0 \le \gamma < 1$ (for example, $\gamma = 1/(1+r)$ when the discount rate is $r$). $\gamma$ is typically close to 1.

Because of the Markov property, the optimal policy for this particular problem can indeed be written as a function of $s$ alone, as assumed above.

Algorithms

MDPs can be solved by linear programming or by dynamic programming. Below we present the latter approach.

Suppose we know the state transition function $P$ and the reward function $R$, and we wish to compute the policy that maximizes the expected discounted reward.

The standard family of algorithms for computing this optimal policy requires storage of two arrays indexed by state: the value array $V$, which contains real values, and the policy array $\pi$, which contains actions. At the end of the algorithm, $\pi$ will contain the solution, and $V(s)$ will contain the discounted sum of the rewards to be earned, on average, by following that solution from state $s$.

The algorithm has the following two kinds of steps, which are repeated in some order for all states until no further changes take place. They are defined recursively as follows:

$\pi(s) := \arg\max_a \left\{ \sum_{s'} P_a(s, s') \left( R_a(s, s') + \gamma V(s') \right) \right\}$

$V(s) := \sum_{s'} P_{\pi(s)}(s, s') \left( R_{\pi(s)}(s, s') + \gamma V(s') \right)$

Their order depends on the variant of the algorithm; one can do them simultaneously for all states, or state by state, and more often for some states than for others. As long as no state is permanently excluded from either of the steps, the algorithm will eventually arrive at the correct solution.

Notable variants

Value iteration

In value iteration (Bellman 1957), also called backward induction, the function $\pi$ is not used; instead, the value of $\pi(s)$ is computed within $V(s)$ whenever it is needed. The value iteration method for MDPs appeared as a special case in Lloyd Shapley's 1953 paper on stochastic games, but this was recognized only later.

Substituting the computation of $\pi(s)$ into the computation of $V(s)$ gives the combined step

$V_{i+1}(s) := \max_a \left\{ \sum_{s'} P_a(s, s') \left( R_a(s, s') + \gamma V_i(s') \right) \right\}$,

where $i$ is the iteration number. Value iteration starts at $i = 0$ with $V_0$ as a guess of the value function. It then iterates, repeatedly recomputing $V_{i+1}$ for all states $s$ until $V$ converges, with the left-hand side equal to the right-hand side (which is the Bellman equation for this problem).
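To make the combined step concrete, here is a short, illustrative Python sketch of tabular value iteration over the representation assumed earlier; the function name, convergence threshold, and in-place update order are assumptions, not prescribed by the text.

```python
def value_iteration(states, actions, P, R, gamma, theta=1e-8):
    """Tabular value iteration: returns (V, pi) for a finite MDP.

    P[(a, s, s2)] and R[(a, s, s2)] are dictionaries as in the sketch above;
    missing transitions are treated as probability zero.
    """
    V = {s: 0.0 for s in states}          # V_0: initial guess of the value function
    while True:
        delta = 0.0
        for s in states:
            # Combined step: V_{i+1}(s) = max_a sum_{s'} P_a(s,s') (R_a(s,s') + gamma V_i(s'))
            best = max(
                sum(P.get((a, s, s2), 0.0) * (R.get((a, s, s2), 0.0) + gamma * V[s2])
                    for s2 in states)
                for a in actions[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:                 # stop once V has (numerically) converged
            break
    # Greedy policy extraction from the converged value function
    pi = {
        s: max(actions[s],
               key=lambda a: sum(P.get((a, s, s2), 0.0) * (R.get((a, s, s2), 0.0) + gamma * V[s2])
                                 for s2 in states))
        for s in states
    }
    return V, pi
```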
Policy iteration

In policy iteration (Howard 1960), step one is performed once, and then step two is repeated until it converges. Then step one is performed again, and so on.

Instead of repeating step two to convergence, it may be formulated and solved as a set of linear equations.

This variant has the advantage of a definite stopping condition: when the array $\pi$ does not change in the course of applying step 1 to all states, the algorithm is complete.

Modified policy iteration

In modified policy iteration (van Nunen 1976; Puterman and Shin 1978), step one is performed once, and then step two is repeated several times. Then step one is performed again, and so on.

Prioritized sweeping

In this variant, the steps are preferentially applied to states that are in some way important: either based on the algorithm (there were recently large changes in $V$ or $\pi$ around those states) or based on use (those states are near the starting state, or otherwise of interest to the person or program using the algorithm).

Extensions and generalizations

A Markov decision process is a stochastic game with only one player.

Partial observability

Main article: partially observable Markov decision process.

The solution above assumes that the state $s$ is known when an action is to be taken; otherwise $\pi(s)$ cannot be computed. When this assumption does not hold, the problem is called a partially observable Markov decision process (POMDP).

A major advance in this area was provided by Burnetas and Katehakis in "Optimal adaptive policies for Markov decision processes". In this work, a class of adaptive policies was constructed that possess uniformly maximum convergence rate properties for the total expected finite-horizon reward, under the assumptions of finite state-action spaces and irreducibility of the transition law. These policies prescribe that the choice of actions, at each state and time period, should be based on indices that are inflations of the right-hand side of the expected average reward optimality equations.

Reinforcement learning

Main article: Reinforcement learning.

If the probabilities or rewards are unknown, the problem is one of reinforcement learning (Sutton and Barto 1998).

For this purpose it is useful to define a further function, which corresponds to taking action $a$ and then continuing optimally (or according to whatever policy one currently has):

$Q(s, a) = \sum_{s'} P_a(s, s') \left( R_a(s, s') + \gamma V(s') \right)$

While this function is also unknown, experience during learning is based on $(s, a)$ pairs together with the outcome $s'$; that is, "I was in state $s$, I tried doing $a$, and $s'$ happened". Thus, one has an array $Q$ and uses experience to update it directly. This is known as Q-learning.
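As a hedged illustration of the direct update just described, the following sketch shows a standard tabular Q-learning loop; the learning rate alpha, epsilon-greedy exploration, and the simulator interface (env_reset, env_step) are common additions assumed here, not spelled out in the text above.

```python
import random

def q_learning_episode(env_step, env_reset, actions, Q,
                       gamma=0.95, alpha=0.1, epsilon=0.1, max_steps=1000):
    """Run one episode of tabular Q-learning and update Q in place.

    Q must be initialized with an entry for every (state, action) pair.
    env_reset() -> initial state; env_step(s, a) -> (s_next, reward, done)
    is an assumed simulator interface, not something defined in the article.
    """
    s = env_reset()
    for _ in range(max_steps):
        # Epsilon-greedy action selection over the actions available in s.
        if random.random() < epsilon:
            a = random.choice(actions[s])
        else:
            a = max(actions[s], key=lambda act: Q[(s, act)])
        s_next, r, done = env_step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s', a').
        target = r if done else r + gamma * max(Q[(s_next, act)] for act in actions[s_next])
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        if done:
            break
        s = s_next
    return Q
```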
Reinforcement learning can solve Markov decision processes without explicit specification of the transition probabilities; the values of the transition probabilities are needed in value iteration and policy iteration. In reinforcement learning, instead of explicit specification of the transition probabilities, the transition probabilities are accessed through a simulator, which is typically restarted many times from a uniformly random initial state. Reinforcement learning can also be combined with function approximation to address problems with a very large number of states.

Learning automata

Main article: learning automata.

Another application of MDPs in machine learning theory is called learning automata. This is also one type of reinforcement learning when the environment is stochastic. The first detailed study of learning automata was carried out by Narendra and Thathachar (1974), in which they were originally described explicitly as finite state automata. Similarly to reinforcement learning, the learning automata algorithm also has the advantage of solving problems in which the probabilities or rewards are unknown. The difference between learning automata and Q-learning is that learning automata do not keep a memory of Q-values, but update the action probabilities directly to find the learning result. Learning automata is a learning scheme with a rigorous proof of convergence.

In learning automata theory, a stochastic automaton consists of:
- a set x of possible inputs,
- a set Φ = { Φ1, ..., Φs } of possible internal states,
- a set α = { α1, ..., αr } of possible outputs, or actions, with r ≤ s,
- an initial state probability vector p(0) = ⟨ p1(0), ..., ps(0) ⟩,
- a computable function A which, after each time step t, generates p(t+1) from p(t), the current input, and the current state, and
- a function G: Φ → α which generates the output at each time step.

The states of such an automaton correspond to the states of a discrete-state, discrete-parameter Markov process. At each time step t = 0, 1, 2, 3, ..., the automaton reads an input from its environment, updates P(t) to P(t+1) by A, randomly chooses a successor state according to the probabilities P(t+1), and outputs the corresponding action. The automaton's environment, in turn, reads the action and sends the next input to the automaton.
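As one concrete, simplified instance of the update function A above, the sketch below implements a linear reward-inaction automaton, a classical scheme from this literature (with one internal state per action); the specific update rule and learning rate are illustrative choices, not taken from the text.

```python
import random

class LinearRewardInactionAutomaton:
    """Simplified learning automaton: one internal state per action (r = s),
    with action probabilities updated directly from a binary environment signal."""

    def __init__(self, n_actions, learning_rate=0.1):
        self.p = [1.0 / n_actions] * n_actions   # action probability vector p(t)
        self.learning_rate = learning_rate

    def choose_action(self):
        # Randomly choose an action according to the current probabilities p(t).
        return random.choices(range(len(self.p)), weights=self.p)[0]

    def update(self, action, reward):
        """Reward-inaction update: on a favorable response (reward == 1), shift
        probability mass toward the chosen action; on an unfavorable one, do nothing."""
        if reward == 1:
            for i in range(len(self.p)):
                if i == action:
                    self.p[i] += self.learning_rate * (1.0 - self.p[i])
                else:
                    self.p[i] -= self.learning_rate * self.p[i]
```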
Category-theoretic interpretation

Apart from the rewards, a Markov decision process $(S, A, P)$ can also be understood in terms of category theory. Namely, let $\mathcal{A}$ denote the free monoid with generating set $A$, and let $\mathbf{Dist}$ denote the Kleisli category of the Giry monad. Then a functor $\mathcal{A} \to \mathbf{Dist}$ encodes both the set of states $S$ and the probability function $P$.

In this way, Markov decision processes can be generalized from monoids (categories with one object) to arbitrary categories. The result $(\mathcal{C}, F\colon \mathcal{C} \to \mathbf{Dist})$ can be called a context-dependent Markov decision process, because moving from one object to another in $\mathcal{C}$ changes the set of available actions and the set of possible states.

Continuous-time Markov decision process

In discrete-time Markov decision processes, decisions are made at discrete time intervals. In continuous-time Markov decision processes, however, decisions can be made at any time the decision maker chooses. In comparison to discrete-time Markov decision processes, continuous-time Markov decision processes can better model the decision-making process for a system that has continuous dynamics, i.e., system dynamics defined by partial differential equations.

Definition

To discuss continuous-time Markov decision processes, we introduce two sets of notation.

If the state space and action space are finite:
- $\mathcal{S}$: the state space;
- $\mathcal{A}$: the action space;
- $q(i, j, a)\colon \mathcal{S} \times \mathcal{A} \to \triangle\mathcal{S}$: the transition rate function;
- $R(i, a)\colon \mathcal{S} \times \mathcal{A} \to \mathbb{R}$: the reward function.

If the state space and action space are continuous:
- $\mathcal{X}$: the state space;
- $\mathcal{U}$: the space of possible control;
- $f(x, u)\colon \mathcal{X} \times \mathcal{U} \to \triangle\mathcal{X}$: the transition rate function;
- $r(x, u)\colon \mathcal{X} \times \mathcal{U} \to \mathbb{R}$: the reward rate function, such that $r(x(t), u(t))\,dt = dR(x(t), u(t))$, where $R(x, u)$ is the reward function discussed in the previous case.

Problem

As in discrete-time Markov decision processes, in a continuous-time Markov decision process we want to find the optimal policy or control that gives us the optimal expected integrated reward:

$\max \; \mathbb{E}_u\!\left[ \int_0^\infty \gamma^t r(x(t), u(t))\,dt \;\middle|\; x_0 \right]$,

where $0 \le \gamma < 1$.

Linear programming formulation

If the state space and action space are finite, we can use linear programming to find the optimal policy; this was one of the earliest approaches applied. Here we consider only the ergodic model, which means that our continuous-time MDP becomes an ergodic continuous-time Markov chain under a stationary policy. Under this assumption, although the decision maker can make a decision at any time in the current state, there is no benefit in taking more than one action. It is better to take an action only at the moment when the system transitions from the current state to another state. Under some conditions (for details see Corollary 3.14 of Continuous-Time Markov Decision Processes), if our optimal value function $V^*$ is independent of the state $i$, we will have the following inequality:

$g \ge R(i, a) + \sum_{j \in S} q(j \mid i, a)\, h(j) \quad \forall i \in S \text{ and } a \in A(i)$

If there exists a function $h$, then $\bar V^*$ will be the smallest $g$ satisfying the above equation. In order to find $\bar V^*$, we can use the following linear programming model.

Primal linear program (P-LP):

$\begin{aligned} \text{Minimize} \quad & g \\ \text{s.t.} \quad & g - \sum_{j \in S} q(j \mid i, a)\, h(j) \ge R(i, a) \quad \forall i \in S,\ a \in A(i) \end{aligned}$

Dual linear program (D-LP):

$\begin{aligned} \text{Maximize} \quad & \sum_{i \in S} \sum_{a \in A(i)} R(i, a)\, y(i, a) \\ \text{s.t.} \quad & \sum_{i \in S} \sum_{a \in A(i)} q(j \mid i, a)\, y(i, a) = 0 \quad \forall j \in S, \\ & \sum_{i \in S} \sum_{a \in A(i)} y(i, a) = 1, \\ & y(i, a) \ge 0 \quad \forall a \in A(i) \text{ and } \forall i \in S \end{aligned}$

$y(i, a)$ is a feasible solution to the D-LP if it is nonnegative and satisfies the constraints of the D-LP problem. A feasible solution $y^*(i, a)$ to the D-LP is called an optimal solution if

$\sum_{i \in S} \sum_{a \in A(i)} R(i, a)\, y^*(i, a) \;\ge\; \sum_{i \in S} \sum_{a \in A(i)} R(i, a)\, y(i, a)$

for every feasible solution $y(i, a)$ to the D-LP. Once we have found the optimal solution $y^*(i, a)$, we can use it to establish the optimal policies.
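As a rough illustration (not part of the original formulation), the primal LP above can be handed to an off-the-shelf solver. The sketch below uses scipy.optimize.linprog, taking the decision vector as (g, h(0), ..., h(n-1)); the array names and the assumption that every action is available in every state are simplifications.

```python
import numpy as np
from scipy.optimize import linprog

def solve_primal_lp(R, q):
    """Solve the P-LP: minimize g  s.t.  g - sum_j q[j, i, a] * h[j] >= R[i, a].

    R is an (n_states, n_actions) array of rewards; q is an
    (n_states, n_states, n_actions) array of transition rates q(j | i, a).
    Returns (g, h). Sketch only; assumes every action is available in every state.
    """
    n_states, n_actions = R.shape
    n_vars = 1 + n_states                      # decision vector x = (g, h_0, ..., h_{n-1})
    c = np.zeros(n_vars)
    c[0] = 1.0                                 # objective: minimize g
    # Rewrite each constraint as  -g + sum_j q[j, i, a] * h[j] <= -R[i, a]  for A_ub @ x <= b_ub.
    A_ub, b_ub = [], []
    for i in range(n_states):
        for a in range(n_actions):
            row = np.zeros(n_vars)
            row[0] = -1.0
            row[1:] = q[:, i, a]
            A_ub.append(row)
            b_ub.append(-R[i, a])
    bounds = [(None, None)] * n_vars           # g and h(j) are free variables
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds, method="highs")
    return res.x[0], res.x[1:]
```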
Hamilton–Jacobi–Bellman equation

If the state space and action space of the continuous-time MDP are continuous, the optimal criterion can be found by solving the Hamilton–Jacobi–Bellman (HJB) partial differential equation. To discuss the HJB equation, we need to reformulate the problem:

$\begin{aligned} V(x(0), 0) = {} & \max_u \int_0^T r(x(t), u(t))\,dt + D(x(T)) \\ \text{s.t.} \quad & \frac{dx(t)}{dt} = f(t, x(t), u(t)) \end{aligned}$

Here $D(\cdot)$ is the terminal reward function, $x(t)$ is the system state vector, $u(t)$ is the system control vector we are trying to find, and $f(\cdot)$ shows how the state vector changes over time. The Hamilton–Jacobi–Bellman equation is

$0 = \max_u \left( r(t, x, u) + \frac{\partial V(t, x)}{\partial x}\, f(t, x, u) \right)$

We can solve this equation to find the optimal control $u(t)$, which gives the optimal value $V^*$.

Applications

Continuous-time Markov decision processes have applications in queueing systems, epidemic processes, and population processes.

Alternative notations

The terminology and notation for MDPs are not entirely settled. There are two main streams: one focuses on maximization problems from contexts such as economics, using the terms action, reward, and value, and calling the discount factor $\beta$ or $\gamma$; the other focuses on minimization problems from engineering and navigation, using the terms control, cost, and cost-to-go, and calling the discount factor $\alpha$. In addition, the notation for the transition probability varies.

in this article | alternative | comment
action $a$ | control $u$ |
reward $R$ | cost $g$ | $g$ is the negative of $R$
value $V$ | cost-to-go $J$ | $J$ is the negative of $V$
policy $\pi$ | policy $\mu$ |
discount factor $\gamma$ | discount factor $\alpha$ |
transition probability $P_a(s, s')$ | transition probability $p_{ss'}(a)$ |

In addition, the transition probability is sometimes written as $\Pr(s, a, s')$, $\Pr(s' \mid s, a)$, or, more rarely, $p_{s's}(a)$.
Constrained Markov decision processes

Constrained Markov decision processes (CMDPs) are extensions of Markov decision processes (MDPs). There are three fundamental differences between MDPs and CMDPs:
- After an action is applied, multiple costs are incurred instead of a single one.
- CMDPs are solved with linear programs only; dynamic programming does not work.
- The final policy depends on the starting state.

There are a number of applications of CMDPs. Recently they have been used in motion planning scenarios in robotics.

See also
- Dynamic programming
- Bellman equation (for applications in economics)
- Hamilton–Jacobi–Bellman equation
- Optimal control
- Stochastic games
- Q-learning

Notes
1. Howard 1960.
2. Shapley 1953.
3. Kallenberg 2002.
4. Burnetas and Katehakis 1997.
5. Narendra and Thathachar 1974.
6. Narendra and Thathachar 1989.
7. Narendra and Thathachar 1974, p. 325, left column.
8. Altman 1999.
9. Feyzabadi and Carpin 2014.

References
- Bellman, R. (1957). "A Markovian Decision Process". Journal of Mathematics and Mechanics 6.
- Bellman, R. E. (2003) [1957]. Dynamic Programming (Dover paperback ed.). Princeton, NJ: Princeton University Press. ISBN 0-486-42809-5.
- Howard, Ronald A. (1960). Dynamic Programming and Markov Processes (PDF). The M.I.T. Press.
- Shapley, Lloyd (1953). "Stochastic Games". Proceedings of the National Academy of Sciences 39: 1095–1100.
- Kallenberg, Lodewijk (2002). "Finite state and action MDPs". In Feinberg, Eugene A.; Shwartz, Adam (eds.), Handbook of Markov Decision Processes: Methods and Applications. Springer. ISBN 0-7923-7459-2.
- Bertsekas, D. (1995). Dynamic Programming and Optimal Control. Vol. 2. MA: Athena.
- Burnetas, A. N.; Katehakis, M. N. (1997). "Optimal Adaptive Policies for Markov Decision Processes". Mathematics of Operations Research 22 (1): 222. doi:10.1287/moor.22.1.222.
- Feinberg, E. A.; Shwartz, A. (eds.) (2002). Handbook of Markov Decision Processes. Boston, MA: Kluwer.
- Derman, C. (1970). Finite State Markovian Decision Processes. Academic Press.
- Puterman, M. L. (1994). Markov Decision Processes. Wiley.
- Tijms, H. C. (2003). A First Course in Stochastic Models. Wiley.
- Sutton, R. S.; Barto, A. G. (1998). Reinforcement Learning: An Introduction. Cambridge, MA: The MIT Press.
- van Nunen, J. A. E. E. (1976). "A set of successive approximation methods for discounted Markovian decision problems". Zeitschrift für Operations Research 20: 203–208.
- Narendra, K. S.; Thathachar, M. A. L. (1 July 1974). "Learning Automata – A Survey". IEEE Transactions on Systems, Man, and Cybernetics SMC-4 (4): 323–334. doi:10.1109/TSMC.1974.5408453. ISSN 0018-9472.
- Narendra, Kumpati S.; Thathachar, Mandayam A. L. (1989). Learning Automata: An Introduction. Prentice Hall. ISBN 9780134855585.
- Meyn, S. P. (2007). Control Techniques for Complex Networks. Cambridge University Press. ISBN 978-0-521-88441-9.
- Ross, S. M. (1983). Introduction to Stochastic Dynamic Programming. Academic Press.
- Guo, X.; Hernández-Lerma, O. (2009). Continuous-Time Markov Decision Processes. Springer.
- Puterman, M. L.; Shin, M. C. (1978). "Modified Policy Iteration Algorithms for Discounted Markov Decision Problems". Management Science 24.
- Altman, Eitan (1999). Constrained Markov Decision Processes. Vol. 7. CRC Press.
- Feyzabadi, S.; Carpin, S. (18–22 Aug 2014). "Risk-aware path planning using hierarchical constrained Markov Decision Processes". Automation Science and Engineering (CASE), IEEE International Conference, pp. 297–303.
External links
- MDP Toolbox for MATLAB, GNU Octave, Scilab and R.
- Markov Decision Processes (MDP) Toolbox: a tutorial and Matlab toolbox for working with MDPs.
- MDP Toolbox for Python: a package for solving MDPs.
- Reinforcement Learning: an introduction by Richard Sutton and Andrew Barto.
- SPUDD: a structured MDP solver for download, by Jesse Hoey.
- Learning to Solve Markovian Decision Processes, by Satinder P. Singh.
- Optimal Adaptive Policies for Markov Decision Processes, by Burnetas and Katehakis (1997).