[4Yin2-30] Optimality and division of labor through little information sharing among multiple satisficing agents
Keywords: bandit problem, satisficing, social learning
Human groups make efficient decisions even in unknown environments by harnessing the division of labor. For an individual, it may be reasonable to be optimistic about the uncertainty of the environment, as in the UCB strategy. A group, on the other hand, may perform better by dividing this labor with a certain pessimism, with its members underestimating or utterly disbelieving the information shared by the other members. RS (Risk-sensitive Satisficing) is an algorithm that implements satisficing, a human decision-making tendency, and can quickly search for actions that satisfy some desired level. In the bandit problem, we have successfully modeled emulation, the imitation of only the results of others, by sharing a reference value among multiple RS agents. In an environment with stochastic fluctuations, however, we found that the conventional way of sharing reference values, namely sharing the highest record, degrades performance as the number of RS agents increases. We propose a method of sharing reference values that improves performance even in environments with stochastic fluctuations by pessimistically estimating the records of others, and we demonstrate the usefulness of the algorithm.
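The sharing scheme described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything specific here is an assumption: the RS value is taken to be (n_i / N) * (E_i - aspiration), a form commonly used for RS, and the "pessimistic estimate" of others' records is represented by a hypothetical multiplicative `discount`, which is not necessarily the authors' method. The names `RSAgent`, `record`, and `shared_reference` are illustrative only.

```python
import random


class RSAgent:
    """Illustrative satisficing bandit agent (value form is an assumption)."""

    def __init__(self, n_arms, rng):
        self.counts = [0] * n_arms   # n_i: number of pulls per arm
        self.means = [0.0] * n_arms  # E_i: empirical mean reward per arm
        self.rng = rng

    def select(self, aspiration):
        """Pick the arm with the highest assumed RS value."""
        total = sum(self.counts)
        if total == 0:
            return self.rng.randrange(len(self.counts))
        # Assumed RS value: reliability (n_i / N) times the gap to the
        # aspiration (reference) level.
        values = [(n / total) * (m - aspiration)
                  for n, m in zip(self.counts, self.means)]
        best = max(values)
        return self.rng.choice([i for i, v in enumerate(values) if v == best])

    def update(self, arm, reward):
        """Incrementally update the empirical mean of the pulled arm."""
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

    def record(self):
        """Best empirical mean among the arms this agent has tried."""
        tried = [m for n, m in zip(self.counts, self.means) if n > 0]
        return max(tried) if tried else 0.0


def shared_reference(own_record, others_records, discount=1.0):
    """Reference value an agent adopts from its own and others' records.

    discount=1.0 takes the highest record at face value (the conventional
    scheme in the abstract); discount<1.0 is a hypothetical stand-in for
    estimating others' records pessimistically.
    """
    discounted = [discount * r for r in others_records]
    return max([own_record] + discounted)
```

With `discount=1.0`, a single lucky record among many agents raises everyone's reference value, which is one way to read the degradation the abstract reports in noisy environments. A smaller `discount` dampens that effect in this sketch.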