Zzong's Notes

RLVR

2 items

  • June 14, 2026

    One Token to Fool LLM-as-a-Judge

    • language_model
    • RLVR
    • nlp
    • paper_review
  • June 14, 2026

    Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs

    • LLM
    • RLVR
    • paper_review
    • reinforcement_learning