http://dbpedia.org/ontology/abstract
|
Proximal Policy Optimization (PPO) is a family of model-free reinforcement learning algorithms developed at OpenAI in 2017. PPO algorithms are policy gradient methods, which means that they search the space of policies rather than assigning values to state-action pairs. PPO algorithms have some of the benefits of trust region policy optimization (TRPO) algorithms, but they are simpler to implement, more general, and have better sample complexity. They achieve this by using a different objective function.
|
http://dbpedia.org/ontology/wikiPageExternalLink
|
https://openai.com/blog/openai-baselines-ppo/ +
, https://github.com/openai/baselines/tree/master/baselines/ +
|
http://dbpedia.org/ontology/wikiPageID
|
70774614
|
http://dbpedia.org/ontology/wikiPageLength
|
1786
|
http://dbpedia.org/ontology/wikiPageRevisionID
|
1113497752
|
http://dbpedia.org/ontology/wikiPageWikiLink
|
http://dbpedia.org/resource/Game_theory +
, http://dbpedia.org/resource/OpenAI +
, http://dbpedia.org/resource/Category:Reinforcement_learning +
, http://dbpedia.org/resource/Model-free_%28reinforcement_learning%29 +
, http://dbpedia.org/resource/Policy_gradient_method +
, http://dbpedia.org/resource/Temporal_difference_learning +
, http://dbpedia.org/resource/Reinforcement_learning +
, http://dbpedia.org/resource/Category:Machine_learning_algorithms +
|
http://dbpedia.org/property/date
|
October 2022
|
http://dbpedia.org/property/reason
|
Both sources currently in the article are from OpenAI. The first is a paper by researchers at OpenAI; the second is a link to OpenAI's website. What developments have been published since 2017?
|
http://dbpedia.org/property/wikiPageUsesTemplate
|
http://dbpedia.org/resource/Template:Short_description +
, http://dbpedia.org/resource/Template:Compu-AI-stub +
, http://dbpedia.org/resource/Template:More_citations_needed +
, http://dbpedia.org/resource/Template:Machine_learning +
, http://dbpedia.org/resource/Template:Reflist +
|
http://purl.org/dc/terms/subject
|
http://dbpedia.org/resource/Category:Reinforcement_learning +
, http://dbpedia.org/resource/Category:Machine_learning_algorithms +
|
http://www.w3.org/ns/prov#wasDerivedFrom
|
http://en.wikipedia.org/wiki/Proximal_Policy_Optimization?oldid=1113497752&ns=0 +
|
http://xmlns.com/foaf/0.1/isPrimaryTopicOf
|
http://en.wikipedia.org/wiki/Proximal_Policy_Optimization +
|
owl:sameAs |
http://www.wikidata.org/entity/Q112150238 +
, http://dbpedia.org/resource/Proximal_Policy_Optimization +
, https://global.dbpedia.org/id/GXCj7 +
|
rdfs:comment |
Proximal Policy Optimization (PPO) is a family of model-free reinforcement learning algorithms developed at OpenAI in 2017. PPO algorithms are policy gradient methods, which means that they search the space of policies rather than assigning values to state-action pairs. PPO algorithms have some of the benefits of trust region policy optimization (TRPO) algorithms, but they are simpler to implement, more general, and have better sample complexity. They achieve this by using a different objective function.
|
rdfs:label |
Proximal Policy Optimization
|