
Hi, I'm trying to learn how DPOP works, but I got stuck on the derivation of DPO (the proof behind DPOP's motivation).
Specifically, I'm stuck on Equation 4 in Appendix B.1, "Derivation for DPO".
Could you elaborate a little on how Equation 4 is derived, or point me to some references?
Really appreciate it!