https://dblp.org/rdf/schema#authoredBy
|
https://dblp.org/pid/137/8096
, https://dblp.org/pid/340/3992
, https://dblp.org/pid/33/4054
, https://dblp.org/pid/97/4708-7
, https://dblp.org/pid/74/184
, https://dblp.org/pid/09/851-2
, https://dblp.org/pid/47/1798-1
|
https://dblp.org/rdf/schema#bibtexType
|
http://purl.org/net/nknouf/ns/bibtex#Article
|
https://dblp.org/rdf/schema#createdBy
|
https://dblp.org/pid/137/8096
, https://dblp.org/pid/340/3992
, https://dblp.org/pid/33/4054
, https://dblp.org/pid/97/4708-7
, https://dblp.org/pid/74/184
, https://dblp.org/pid/09/851-2
, https://dblp.org/pid/47/1798-1
|
https://dblp.org/rdf/schema#documentPage
|
https://doi.org/10.48550/ARXIV.2404.18922
|
https://dblp.org/rdf/schema#doi
|
https://doi.org/10.48550/ARXIV.2404.18922
|
https://dblp.org/rdf/schema#listedOnTocPage
|
https://dblp.org/db/journals/corr/corr2404
|
https://dblp.org/rdf/schema#numberOfCreators
|
7
|
https://dblp.org/rdf/schema#primaryDocumentPage
|
https://doi.org/10.48550/ARXIV.2404.18922
|
https://dblp.org/rdf/schema#publishedIn
|
CoRR
|
https://dblp.org/rdf/schema#publishedInJournal
|
CoRR
|
https://dblp.org/rdf/schema#publishedInJournalVolume
|
abs/2404.18922
|
https://dblp.org/rdf/schema#publishedInStream
|
https://dblp.org/streams/journals/corr
|
https://dblp.org/rdf/schema#title
|
DPO Meets PPO: Reinforced Token Optimization for RLHF.
|
https://dblp.org/rdf/schema#yearOfPublication
|
2024
|
owl:sameAs |
https://doi.org/10.48550/ARXIV.2404.18922
, http://dx.doi.org/10.48550/ARXIV.2404.18922
|
rdf:type |
https://dblp.org/rdf/schema#Publication
, https://dblp.org/rdf/schema#Informal
|
rdfs:label |
Han Zhong et al.: DPO Meets PPO: Reinforced Token Optimization for RLHF. (2024)
|