Study Notes on Generative AI
Generative AI Lifecycle
- Generative AI on AWS: Building Context-Aware, Multimodal Reasoning Applications – This O’Reilly book dives deep into all phases of the generative AI lifecycle, including model selection, fine-tuning and adaptation, evaluation, deployment, and runtime optimization.
Reinforcement Learning from Human Feedback (RLHF)
- Training language models to follow instructions with human feedback – Paper by OpenAI introducing a human-in-the-loop RLHF process to create a model that is better at following instructions (InstructGPT).
- Learning to summarize from human feedback – This paper improves language-model-generated summaries with a reward model trained on human preference comparisons; the resulting summaries are preferred even to the human reference summaries. The shared reward-model objective is sketched after this list.
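Both papers fit a reward model r_θ on pairwise human comparisons before the RL step. A compact form of that pairwise loss, where y_w is the preferred and y_l the rejected completion for prompt x:

```latex
\mathcal{L}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim D}
  \left[ \log \sigma\!\left( r_\theta(x, y_w) - r_\theta(x, y_l) \right) \right]
```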
Proximal Policy Optimization (PPO)
- Proximal Policy Optimization Algorithms – Paper by researchers at OpenAI that first proposed the PPO algorithm, evaluating it on benchmark tasks including simulated robotic locomotion and Atari game playing.
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model – This paper presents a simpler and effective method for steering large unsupervised language models toward human preferences, optimizing the policy directly on preference pairs instead of training a separate reward model and running RL. Both objectives are sketched below.
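For reference, PPO's clipped surrogate objective (with probability ratio r_t(θ) and advantage estimate Â_t), and the DPO loss, which replaces the reward model and RL loop with a single classification-style objective against a frozen reference policy π_ref:

```latex
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[
  \min\!\left( r_t(\theta)\,\hat{A}_t,\;
  \operatorname{clip}\!\left( r_t(\theta),\, 1-\epsilon,\, 1+\epsilon \right) \hat{A}_t \right) \right],
\qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}

\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\, \pi_{\mathrm{ref}}) =
  -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim D}\left[ \log \sigma\!\left(
  \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
  - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]
```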
Scaling human feedback
- Constitutional AI: Harmlessness from AI Feedback – This paper introduces a method for training a harmless AI assistant without human harmfulness labels: the model critiques and revises its own outputs against a short list of principles, and AI-generated preferences replace human ones in the RL stage (RLAIF), allowing better control of AI behavior with minimal human input. A sketch of the critique-and-revision loop follows.
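A minimal sketch of the paper's supervised critique-and-revision phase, assuming a hypothetical `llm(prompt) -> str` completion function; a single principle stands in for the full constitution:

```python
# Sketch of Constitutional AI's supervised phase: generate, critique against
# a principle, then revise. Revised responses become fine-tuning data.
# llm() is a hypothetical completion function, not a real API.
PRINCIPLE = "Choose the response that is least harmful or toxic."

def critique_and_revise(llm, prompt, n_rounds=2):
    response = llm(prompt)
    for _ in range(n_rounds):
        critique = llm(
            f"Prompt: {prompt}\nResponse: {response}\n"
            f"Critique the response against this principle: {PRINCIPLE}"
        )
        response = llm(
            f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}\n"
            "Rewrite the response so it addresses the critique."
        )
    return response
```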
Advanced Prompting Techniques
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models – Paper by researchers at Google showing that prompting with a few exemplars whose answers spell out intermediate reasoning steps substantially improves the ability of LLMs to perform complex reasoning.
- PAL: Program-aided Language Models – This paper proposes having the LLM read a natural-language problem and generate a Python program as its intermediate reasoning steps, offloading the final computation to the interpreter (sketched after this list).
- ReAct: Synergizing Reasoning and Acting in Language Models – This paper presents a prompting technique that interleaves reasoning traces with actions, letting an LLM decide how to interact with external applications; a minimal loop is also sketched below.
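A minimal PAL-style sketch under stated assumptions: `llm(prompt) -> str` is a hypothetical completion function, and the single exemplar (adapted from the paper's running example) writes its chain-of-thought reasoning as executable Python so the interpreter, not the model, computes the answer:

```python
# PAL-style solving: the LLM writes Python as its intermediate reasoning,
# and the Python interpreter does the final arithmetic.
# llm(prompt) -> str is a hypothetical completion function.
FEW_SHOT = """Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many balls does he have now?
# solution in Python:
initial_balls = 5
bought_balls = 2 * 3
answer = initial_balls + bought_balls

"""

def pal_solve(llm, question):
    program = llm(FEW_SHOT + f"Q: {question}\n# solution in Python:\n")
    scope = {}
    exec(program, scope)      # run the generated reasoning program
    return scope["answer"]    # the program's result is the final answer
```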
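And a minimal ReAct-style loop, again with a hypothetical `llm` and a tool registry standing in for real external applications:

```python
import re

# Minimal ReAct loop: the model alternates Thought / Action lines; each
# Action is run against an external tool and its result is appended as an
# Observation. llm() and the tools dict are hypothetical stand-ins.
def react(llm, question, tools, max_steps=5):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # e.g. "Thought: ...\nAction: Search[PPO]"
        transcript += step + "\n"
        match = re.search(r"Action:\s*(\w+)\[(.*?)\]", step)
        if match is None or match.group(1) == "Finish":
            return match.group(2) if match else step  # final answer
        observation = tools[match.group(1)](match.group(2))
        transcript += f"Observation: {observation}\n"
    return transcript
```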
LLM powered application architectures
- LangChain Library (GitHub) – This library assists in developing LLM-powered applications such as question answering, chatbots, and other agents; documentation is available in the repository. A minimal usage sketch follows this list.
- Who Owns the Generative AI Platform? – This article examines the market dynamics and emerging business models of generative AI.
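To make the LangChain entry above concrete, a minimal question-answering chain in the classic `LLMChain` style; class names and import paths have shifted across LangChain versions (newer releases favor the runnable/LCEL composition style), so treat this as a sketch to check against the current docs:

```python
# Classic LangChain pattern: prompt template -> LLM -> chain.
# Import paths and class names vary by LangChain version; verify against the docs.
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

prompt = PromptTemplate(
    input_variables=["question"],
    template="Answer the question concisely.\nQuestion: {question}\nAnswer:",
)
chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)  # needs OPENAI_API_KEY
print(chain.run(question="What is retrieval-augmented generation?"))
```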