28.5K Stars: Visual ChatGPT, Microsoft's Open-Source Tool for Image Interaction

Published: 2023-06-13 09:18:26

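Beyond its heavy investment in OpenAI, Microsoft is also building AI in-house. It has open-sourced Visual ChatGPT, a system that connects ChatGPT to a series of visual models so that images can be sent and received during a ChatGPT conversation.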

ChatGPT is powerful enough to write novels and papers, but for now it is limited to text.

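Visual ChatGPT is like adding stickers to a text-only chat app for the first time, except that these "custom stickers" are generated automatically from the text the user types. This makes ChatGPT considerably more fun and widens the range of things it can be used for.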

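On one side, ChatGPT (or another LLM) acts as a general-purpose interface that provides image understanding and handles interaction with the user. On the other, the visual foundation models act as the technical experts behind the scenes, contributing deep knowledge of their specific domains.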

The repository includes a diagram of the technical architecture and how the system works.

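The demo contains three kinds of dialogue: Visual ChatGPT receives an image from the user; Visual ChatGPT edits an image according to the user's text and sends it back; and Visual ChatGPT recognizes what is in a picture and answers the user's questions about it. For each input, Visual ChatGPT decides whether a VFM (Visual Foundation Model) is needed to handle the request.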
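
The routing decision described above can be illustrated with a toy sketch. It is not the project's implementation: in Visual ChatGPT the choice is made by ChatGPT itself, whereas the keyword matching below is only a stand-in for that judgment. The `Tool` class, the keyword lists, and the placeholder functions are all hypothetical; only the model names (`ImageCaptioning`, `Text2Image`, `VisualQuestionAnswering`) come from the article.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Tool:
    """A hypothetical wrapper around one visual foundation model."""
    name: str
    keywords: Tuple[str, ...]   # assumption: stand-in for the LLM's judgment
    run: Callable[[str], str]   # placeholder; a real VFM would process images


# Placeholder tools: they return fixed strings instead of running a model.
TOOLS: List[Tool] = [
    Tool("ImageCaptioning", ("describe", "caption"),
         lambda request: "[caption placeholder]"),
    Tool("Text2Image", ("draw", "generate"),
         lambda request: "[generated image placeholder]"),
    Tool("VisualQuestionAnswering", ("what color", "how many"),
         lambda request: "[answer placeholder]"),
]


def choose_tool(user_input: str, tools: List[Tool]) -> Optional[Tool]:
    """Return the first tool whose keyword appears in the input, else None."""
    text = user_input.lower()
    for tool in tools:
        if any(keyword in text for keyword in tool.keywords):
            return tool
    return None


def respond(user_input: str, tools: List[Tool]) -> str:
    """Route to a visual tool when one matches; otherwise answer as text."""
    tool = choose_tool(user_input, tools)
    if tool is None:
        return "text-only reply (no VFM needed)"
    return f"{tool.name}: {tool.run(user_input)}"


if __name__ == "__main__":
    print(respond("Please draw a cat on a sofa", TOOLS))
    print(respond("Tell me a joke", TOOLS))
```

The point of the sketch is the shape of the control flow: plain text requests never touch a visual model, and each visual request is handed to exactly one specialist.
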
The repository also lists the image models that Visual ChatGPT uses and the GPU memory each one requires.

For more detail, see the Visual ChatGPT paper on arXiv: https://arxiv.org/abs/2303.04671

Usage

Note: the models call for a well-equipped machine with a GPU. If yours qualifies, try it locally; otherwise, set it up on Google Colab.

Environment setup:

conda create -n visgpt python=3.8   # create the environment
conda activate visgpt               # activate the environment
pip install -r requirements.txt     # install the dependencies
bash download.sh                    # download the models

Quick start

# clone the repo
git clone https://github.com/microsoft/visual-chatgpt.git
# Go to directory
cd visual-chatgpt
# create a new environment
conda create -n visgpt python=3.8
# activate the new environment
conda activate visgpt
# prepare the basic environments
pip install -r requirements.txt
# prepare your private OpenAI key (for Linux)
export OPENAI_API_KEY={Your_Private_Openai_Key}
# prepare your private OpenAI key (for Windows)
set OPENAI_API_KEY={Your_Private_Openai_Key}

# Start Visual ChatGPT !
# You can specify the GPU/CPU assignment with "--load"; the parameter indicates which
# Visual Foundation Models to use and where each one will be loaded
# The model and device are separated by an underscore '_'; different models are separated by a comma ','
# The available Visual Foundation Models can be found in the following table
# For example, if you want to load ImageCaptioning to cpu and Text2Image to cuda:0
# You can use: "ImageCaptioning_cpu,Text2Image_cuda:0"

# Advice for CPU Users
python visual_chatgpt.py --load ImageCaptioning_cpu,Text2Image_cpu
# Advice for 1 Tesla T4 15GB (Google Colab)
python visual_chatgpt.py --load "ImageCaptioning_cuda:0,Text2Image_cuda:0"
# Advice for 4 Tesla V100 32GB
python visual_chatgpt.py --load "ImageCaptioning_cuda:0,ImageEditing_cuda:0,
    Text2Image_cuda:1,Image2Canny_cpu,CannyText2Image_cuda:1,
    Image2Depth_cpu,DepthText2Image_cuda:1,VisualQuestionAnswering_cuda:2,
    InstructPix2Pix_cuda:2,Image2Scribble_cpu,ScribbleText2Image_cuda:2,
    Image2Seg_cpu,SegText2Image_cuda:2,Image2Pose_cpu,PoseText2Image_cuda:2,
    Image2Hed_cpu,HedText2Image_cuda:3,Image2Normal_cpu,
    NormalText2Image_cuda:3,Image2Line_cpu,LineText2Image_cuda:3"
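
To make the `--load` format concrete, here is a minimal sketch of how such a string could be turned into a model-to-device mapping. It follows only the rules stated in the comments above (underscore between model and device, comma between entries). The function name `parse_load_spec` is hypothetical, and this is not necessarily how `visual_chatgpt.py` parses the argument.

```python
from typing import Dict


def parse_load_spec(spec: str) -> Dict[str, str]:
    """Parse 'Model_device,Model_device' into a {model: device} mapping.

    Whitespace and line breaks around entries are ignored, so a value
    wrapped over several lines inside quotes is accepted as well.
    """
    assignments: Dict[str, str] = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma or an empty segment
        # Split on the first underscore: model name left, device right.
        model, separator, device = entry.partition("_")
        if not separator or not model or not device:
            raise ValueError(f"expected 'Model_device', got {entry!r}")
        assignments[model.strip()] = device.strip()
    return assignments


if __name__ == "__main__":
    print(parse_load_spec("ImageCaptioning_cpu,Text2Image_cuda:0"))
```

Splitting on the first underscore keeps device strings such as `cuda:0` intact, and it relies on the model names listed above containing no underscore of their own.
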

Readers can explore the remaining features on their own.
