weekly

GitHub All Languages Trending

The latest build: 2024-06-14. Source of data: GitHubTrendingRSS

18 Lessons, Get Started Building with Generative AI https://microsoft.github.io/generative-ai-for-beginners/


Generative AI For Beginners

18 Lessons teaching everything you need to know to start building Generative AI applications


Generative AI for Beginners (Version 2) - A Course

Learn the fundamentals of building Generative AI applications with our 18-lesson comprehensive course by Microsoft Cloud Advocates.

Getting Started

This course has 18 lessons. Each lesson covers its own topic so start wherever you like!

Lessons are labeled as either "Learn" lessons, which explain a Generative AI concept, or "Build" lessons, which explain a concept and include code examples in both Python and TypeScript when possible.

Each lesson also includes a "Keep Learning" section with additional learning tools.

What You Need

We have created a Course Setup lesson to help you with setting up your development environment.

Don't forget to star (⭐) this repo to find it easier later.

Ready to Deploy?

If you are looking for more advanced code samples, check out our collection of Generative AI Code Samples in both Python and TypeScript.

Meet Other Learners, Get Support

Join our official AI Discord server to meet and network with other learners taking this course and get support.

Building a Startup?

Sign up for Microsoft for Startups Founders Hub to receive free OpenAI credits and up to $150k towards Azure credits to access OpenAI models through Azure OpenAI Services.

Want to help?

Do you have suggestions, or did you find spelling or code errors? Raise an issue or create a pull request.

Each lesson includes:

  • A short video introduction to the topic
  • A written lesson located in the README
  • Python and TypeScript code samples supporting Azure OpenAI and OpenAI API (see the minimal sketch after this list)
  • Links to extra resources to continue your learning
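
For orientation, here is a minimal sketch of the kind of OpenAI API call the Python samples build on. It is not taken from the course; it assumes the openai Python package (v1+) with an OPENAI_API_KEY set in the environment, and the model name is illustrative.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model name; the course also covers Azure OpenAI deployments
    messages=[{"role": "user", "content": "Explain what a large language model is."}],
)
print(response.choices[0].message.content)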

Lessons

| Lesson Link | Description | Additional Learning |
| --- | --- | --- |
| 00 Course Setup | Learn: How to Set Up Your Development Environment | Learn More |
| 01 Introduction to Generative AI and LLMs | Learn: Understanding what Generative AI is and how Large Language Models (LLMs) work | Learn More |
| 02 Exploring and Comparing Different LLMs | Learn: How to select the right model for your use case | Learn More |
| 03 Using Generative AI Responsibly | Learn: How to build Generative AI applications responsibly | Learn More |
| 04 Understanding Prompt Engineering Fundamentals | Learn: Hands-on prompt engineering best practices | Learn More |
| 05 Creating Advanced Prompts | Learn: How to apply prompt engineering techniques that improve the outcome of your prompts | Learn More |
| 06 Building Text Generation Applications | Build: A text generation app using Azure OpenAI | Learn More |
| 07 Building Chat Applications | Build: Techniques for efficiently building and integrating chat applications | Learn More |
| 08 Building Search Apps with Vector Databases | Build: A search application that uses embeddings to search for data | Learn More |
| 09 Building Image Generation Applications | Build: An image generation application | Learn More |
| 10 Building Low Code AI Applications | Build: A Generative AI application using low-code tools | Learn More |
| 11 Integrating External Applications with Function Calling | Build: What function calling is and its use cases for applications | Learn More |
| 12 Designing UX for AI Applications | Learn: How to apply UX design principles when developing Generative AI applications | Learn More |
| 13 Securing Your Generative AI Applications | Learn: The threats and risks to AI systems and methods to secure these systems | Learn More |
| 14 The Generative AI Application Lifecycle | Learn: The tools and metrics to manage the LLM lifecycle and LLMOps | Learn More |
| 15 Retrieval Augmented Generation (RAG) and Vector Databases | Build: An application using a RAG framework to retrieve embeddings from a vector database | Learn More |
| 16 Open Source Models and Hugging Face | Build: An application using open-source models available on Hugging Face | Learn More |
| 17 AI Agents | Build: An application using an AI agent framework | Learn More |
| 18 Fine-Tuning LLMs | Learn: The what, why, and how of fine-tuning LLMs | Learn More |

Special thanks

Special thanks to John Aziz for creating all of the GitHub Actions and workflows

Other Courses

Our team produces other courses! Check out:

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone


A GPT-4V Level Multimodal LLM on Your Phone


Join our WeChat

MiniCPM-Llama3-V 2.5 | MiniCPM-V 2.0 | Technical Blog

MiniCPM-V is a series of end-side multimodal LLMs (MLLMs) designed for vision-language understanding. The models take image and text as inputs and provide high-quality text outputs. Since February 2024, we have released 4 versions of the model, aiming to achieve strong performance and efficient deployment. The most notable models in this series currently include:

  • MiniCPM-Llama3-V 2.5: The latest and most capable model in the MiniCPM-V series. With a total of 8B parameters, the model surpasses proprietary models such as GPT-4V-1106, Gemini Pro, Qwen-VL-Max and Claude 3 in overall performance. Equipped with enhanced OCR and instruction-following capabilities, the model also supports multimodal conversation in over 30 languages including English, Chinese, French, Spanish, German, etc. With the help of quantization, compilation optimizations, and several efficient inference techniques on CPUs and NPUs, MiniCPM-Llama3-V 2.5 can be efficiently deployed on end-side devices.

  • MiniCPM-V 2.0: The lightest model in the MiniCPM-V series. With 2B parameters, it surpasses larger models such as Yi-VL 34B, CogVLM-Chat 17B, and Qwen-VL-Chat 10B in overall performance. It can accept image inputs of any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344), achieves performance comparable to Gemini Pro in understanding scene text, and matches GPT-4V in low hallucination rates.

News

Pinned

  • [2024.05.28] MiniCPM-Llama3-V 2.5 is now fully supported in llama.cpp and ollama! Please pull the latest code of our provided forks (llama.cpp, ollama). GGUF models in various sizes are available here. MiniCPM-Llama3-V 2.5 is not supported by the official repositories yet, and we are working hard to get our PRs merged. Please stay tuned!
  • [2024.05.28] We now support LoRA fine-tuning for MiniCPM-Llama3-V 2.5, using only 2 V100 GPUs! See more statistics here.
  • [2024.05.23] We've released a comprehensive comparison between Phi-3-vision-128k-instruct and MiniCPM-Llama3-V 2.5, including benchmark evaluations, multilingual capabilities, and inference efficiency. Click here to view more details.
  • [2024.05.23] MiniCPM-V tops GitHub Trending and Hugging Face Trending! Our demo, recommended by Hugging Face Gradio's official account, is available here. Come and try it out!

  • [2024.06.03] You can now run MiniCPM-Llama3-V 2.5 on multiple low-VRAM GPUs (12 GB or 16 GB) by distributing the model's layers across them. For more details, check this link.
  • [2024.05.25] MiniCPM-Llama3-V 2.5 now supports streaming outputs and customized system prompts. Try it here!
  • [2024.05.24] We released the MiniCPM-Llama3-V 2.5 GGUF models, which support llama.cpp inference and provide smooth decoding at 6~8 tokens/s on mobile phones. Try it now!
  • [2024.05.20] We open-sourced MiniCPM-Llama3-V 2.5. It has improved OCR capability and supports 30+ languages, making it the first end-side MLLM to achieve GPT-4V-level performance! We provide efficient inference and simple fine-tuning. Try it now!
  • [2024.04.23] MiniCPM-V 2.0 now supports vLLM! Click here to view more details.
  • [2024.04.18] We created a HuggingFace Space to host the MiniCPM-V 2.0 demo. Try it here!
  • [2024.04.17] MiniCPM-V 2.0 now supports WebUI demo deployment!
  • [2024.04.15] MiniCPM-V 2.0 now also supports fine-tuning with the SWIFT framework!
  • [2024.04.12] We open-source MiniCPM-V 2.0, which achieves comparable performance with Gemini Pro in understanding scene text and outperforms strong Qwen-VL-Chat 9.6B and Yi-VL 34B on OpenCompass, a comprehensive evaluation over 11 popular benchmarks. Click here to view the MiniCPM-V 2.0 technical blog.
  • [2024.03.14] MiniCPM-V now supports fine-tuning with the SWIFT framework. Thanks to Jintao for the contribution!
  • [2024.03.01] MiniCPM-V can now be deployed on Mac!
  • [2024.02.01] We open-sourced MiniCPM-V and OmniLMM-12B, which support efficient end-side deployment and powerful multimodal capabilities, respectively.


MiniCPM-Llama3-V 2.5

MiniCPM-Llama3-V 2.5 is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-V 2.0. Notable features of MiniCPM-Llama3-V 2.5 include:

  • Leading Performance. MiniCPM-Llama3-V 2.5 has achieved an average score of 65.1 on OpenCompass, a comprehensive evaluation over 11 popular benchmarks. With only 8B parameters, it surpasses widely used proprietary models like GPT-4V-1106, Gemini Pro, Claude 3 and Qwen-VL-Max and greatly outperforms other Llama 3-based MLLMs.

  • Strong OCR Capabilities. MiniCPM-Llama3-V 2.5 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344), achieving a 700+ score on OCRBench, surpassing proprietary models such as GPT-4o, GPT-4V-0409, Qwen-VL-Max and Gemini Pro. Based on recent user feedback, MiniCPM-Llama3-V 2.5 has now enhanced full-text OCR extraction, table-to-markdown conversion, and other high-utility capabilities, and has further strengthened its instruction-following and complex reasoning abilities, enhancing multimodal interaction experiences.

  • Trustworthy Behavior. Leveraging the latest RLAIF-V method (the newest technique in the RLHF-V [CVPR'24] series), MiniCPM-Llama3-V 2.5 exhibits more trustworthy behavior. It achieves a 10.3% hallucination rate on Object HalBench, lower than GPT-4V-1106 (13.6%), the best performance within the open-source community. Data released.

  • Multilingual Support. Thanks to the strong multilingual capabilities of Llama 3 and the cross-lingual generalization technique from VisCPM, MiniCPM-Llama3-V 2.5 extends its bilingual (Chinese-English) multimodal capabilities to over 30 languages including German, French, Spanish, Italian, Korean, etc. See All Supported Languages.

  • Efficient Deployment. MiniCPM-Llama3-V 2.5 systematically employs model quantization, CPU optimizations, NPU optimizations and compilation optimizations, achieving high-efficiency deployment on end-side devices. For mobile phones with Qualcomm chips, we have integrated the NPU acceleration framework QNN into llama.cpp for the first time. After systematic optimization, MiniCPM-Llama3-V 2.5 has realized a 150x acceleration in end-side MLLM image encoding and a 3x speedup in language decoding.

  • Easy Usage. MiniCPM-Llama3-V 2.5 can be easily used in various ways: (1) llama.cpp and ollama support for efficient CPU inference on local devices, (2) GGUF format quantized models in 16 sizes, (3) efficient LoRA fine-tuning with only 2 V100 GPUs, (4) streaming output, (5) quick local WebUI demo setup with Gradio and Streamlit, and (6) interactive demos on HuggingFace Spaces.

Evaluation

Click to view results on TextVQA, DocVQA, OCRBench, OpenCompass, MME, MMBench, MMMU, MathVista, LLaVA Bench, RealWorld QA, Object HalBench.
| Model | Size | OCRBench | TextVQA val | DocVQA test | OpenCompass | MME | MMB test (en) | MMB test (cn) | MMMU val | MathVista | LLaVA Bench | RealWorld QA | Object HalBench |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Proprietary | | | | | | | | | | | | | |
| Gemini Pro | - | 680 | 74.6 | 88.1 | 62.9 | 2148.9 | 73.6 | 74.3 | 48.9 | 45.8 | 79.9 | 60.4 | - |
| GPT-4V (2023.11.06) | - | 645 | 78.0 | 88.4 | 63.5 | 1771.5 | 77.0 | 74.4 | 53.8 | 47.8 | 93.1 | 63.0 | 86.4 |
| Open-source | | | | | | | | | | | | | |
| Mini-Gemini | 2.2B | - | 56.2 | 34.2* | - | 1653.0 | - | - | 31.7 | - | - | - | - |
| Qwen-VL-Chat | 9.6B | 488 | 61.5 | 62.6 | 51.6 | 1860.0 | 61.8 | 56.3 | 37.0 | 33.8 | 67.7 | 49.3 | 56.2 |
| DeepSeek-VL-7B | 7.3B | 435 | 64.7* | 47.0* | 54.6 | 1765.4 | 73.8 | 71.4 | 38.3 | 36.8 | 77.8 | 54.2 | - |
| Yi-VL-34B | 34B | 290 | 43.4* | 16.9* | 52.2 | 2050.2 | 72.4 | 70.7 | 45.1 | 30.7 | 62.3 | 54.8 | 79.3 |
| CogVLM-Chat | 17.4B | 590 | 70.4 | 33.3* | 54.2 | 1736.6 | 65.8 | 55.9 | 37.3 | 34.7 | 73.9 | 60.3 | 73.6 |
| TextMonkey | 9.7B | 558 | 64.3 | 66.7 | - | - | - | - | - | - | - | - | - |
| Idefics2 | 8.0B | - | 73.0 | 74.0 | 57.2 | 1847.6 | 75.7 | 68.6 | 45.2 | 52.2 | 49.1 | 60.7 | - |
| Bunny-LLama-3-8B | 8.4B | - | - | - | 54.3 | 1920.3 | 77.0 | 73.9 | 41.3 | 31.5 | 61.2 | 58.8 | - |
| LLaVA-NeXT Llama-3-8B | 8.4B | - | - | 78.2 | - | 1971.5 | - | - | 41.7 | 37.5 | 80.1 | 60.0 | - |
| Phi-3-vision-128k-instruct | 4.2B | 639* | 70.9 | - | - | 1537.5* | - | - | 40.4 | 44.5 | 64.2* | 58.8* | - |
| MiniCPM-V 1.0 | 2.8B | 366 | 60.6 | 38.2 | 47.5 | 1650.2 | 64.1 | 62.6 | 38.3 | 28.9 | 51.3 | 51.2 | 78.4 |
| MiniCPM-V 2.0 | 2.8B | 605 | 74.1 | 71.9 | 54.5 | 1808.6 | 69.1 | 66.5 | 38.2 | 38.7 | 69.2 | 55.8 | 85.5 |
| MiniCPM-Llama3-V 2.5 | 8.5B | 725 | 76.6 | 84.8 | 65.1 | 2024.6 | 77.2 | 74.2 | 45.8 | 54.3 | 86.7 | 63.5 | 89.7 |
* We evaluate the officially released checkpoint by ourselves.

Evaluation results of multilingual LLaVA Bench

Examples

We deploy MiniCPM-Llama3-V 2.5 on end devices. The demo video is a raw screen recording on a Xiaomi 14 Pro without editing.

MiniCPM-V 2.0

Click to view more details of MiniCPM-V 2.0

MiniCPM-V 2.0 is an efficient version with promising performance for deployment. The model is built on SigLip-400M and MiniCPM-2.4B, connected by a perceiver resampler. Our latest version, MiniCPM-V 2.0, has several notable features.

  • State-of-the-art Performance.

    MiniCPM-V 2.0 achieves state-of-the-art performance on multiple benchmarks (including OCRBench, TextVQA, MME, MMB, MathVista, etc) among models under 7B parameters. It even outperforms strong Qwen-VL-Chat 9.6B, CogVLM-Chat 17.4B, and Yi-VL 34B on OpenCompass, a comprehensive evaluation over 11 popular benchmarks. Notably, MiniCPM-V 2.0 shows strong OCR capability, achieving comparable performance to Gemini Pro in scene-text understanding, and state-of-the-art performance on OCRBench among open-source models.

  • Trustworthy Behavior.

    LMMs are known for suffering from hallucination, often generating text not factually grounded in images. MiniCPM-V 2.0 is the first end-side LMM aligned via multimodal RLHF for trustworthy behavior (using the recent RLHF-V [CVPR'24] series technique). This allows the model to match GPT-4V in preventing hallucinations on Object HalBench.

  • High-Resolution Images at Any Aspect Ratio.

    MiniCPM-V 2.0 can accept images of up to 1.8 million pixels (e.g., 1344x1344) at any aspect ratio. This enables better perception of fine-grained visual information such as small objects and optical characters, achieved via a recent technique from LLaVA-UHD.

  • High Efficiency.

    MiniCPM-V 2.0 can be efficiently deployed on most GPU cards and personal computers, and even on end devices such as mobile phones. For visual encoding, we compress the image representations into far fewer tokens via a perceiver resampler (see the toy sketch after this list). This allows MiniCPM-V 2.0 to operate with favorable memory cost and speed during inference even when dealing with high-resolution images.

  • Bilingual Support.

    MiniCPM-V 2.0 supports strong bilingual multimodal capabilities in both English and Chinese. This is enabled by generalizing multimodal capabilities across languages, a technique from VisCPM [ICLR'24].
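
For a concrete sense of the High Efficiency point above, here is a toy PyTorch sketch of the perceiver-resampler idea: a small set of learned queries cross-attends to the image-patch features, so the language model only sees a fixed, much smaller number of visual tokens. The dimensions and query count are illustrative assumptions, not MiniCPM-V 2.0's actual configuration.

import torch
import torch.nn as nn

class PerceiverResampler(nn.Module):
    """Toy sketch: compress a variable number of image-patch features into a
    fixed, small set of visual tokens via cross-attention. Sizes are illustrative."""
    def __init__(self, dim=1152, num_queries=64, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, patch_features):                    # (B, N_patches, dim)
        b = patch_features.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)   # (B, num_queries, dim)
        tokens, _ = self.attn(q, patch_features, patch_features)
        return self.proj(tokens)                          # (B, num_queries, dim)

feats = torch.randn(1, 1024, 1152)        # e.g. many patches from a high-resolution image
print(PerceiverResampler()(feats).shape)  # torch.Size([1, 64, 1152]) -- far fewer visual tokens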

Examples

We deploy MiniCPM-V 2.0 on end devices. The demo video is a raw screen recording on a Xiaomi 14 Pro without editing.

Legacy Models

| Model | Introduction and Guidance |
| --- | --- |
| MiniCPM-V 1.0 | Document |
| OmniLMM-12B | Document |

Chat with Our Demo on Gradio

We provide online and local demos powered by HuggingFace Gradio, the most popular model deployment framework nowadays. It supports streaming outputs, progress bars, queuing, alerts, and other useful features.

Online Demo

Click here to try out the online demos of MiniCPM-Llama3-V 2.5 and MiniCPM-V 2.0 on HuggingFace Spaces.

Local WebUI Demo

You can easily build your own local WebUI demo with Gradio using the following commands.

pip install -r requirements.txt
# For NVIDIA GPUs, run:
python web_demo_2.5.py --device cuda

# For Mac with MPS (Apple silicon or AMD GPUs), run:
PYTORCH_ENABLE_MPS_FALLBACK=1 python web_demo_2.5.py --device mps
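
For readers curious what web_demo_2.5.py does at a high level, the general Gradio pattern looks roughly like the sketch below. The handler is hypothetical and greatly simplified; the repository's actual script wires in the MiniCPM-Llama3-V 2.5 chat API and adds streaming outputs, system prompts, and other options.

import gradio as gr

def answer(image, question):
    # Hypothetical handler: the real demo calls the MiniCPM-Llama3-V 2.5 chat API here.
    return f"(model answer for: {question})"

demo = gr.Interface(
    fn=answer,
    inputs=[gr.Image(type="pil", label="Image"), gr.Textbox(label="Question")],
    outputs=gr.Textbox(label="Answer"),
    title="MiniCPM-Llama3-V 2.5 demo (sketch)",
)
demo.launch()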

Install

  1. Clone this repository and navigate to the source folder

git clone https://github.com/OpenBMB/MiniCPM-V.git
cd MiniCPM-V

  2. Create conda environment

conda create -n MiniCPM-V python=3.10 -y
conda activate MiniCPM-V

  3. Install dependencies

pip install -r requirements.txt

Inference

Model Zoo

| Model | Device | Memory | Description |
| --- | --- | --- | --- |
| MiniCPM-Llama3-V 2.5 | GPU | 19 GB | The latest version, achieving state-of-the-art end-side multimodal performance. |
| MiniCPM-Llama3-V 2.5 gguf | CPU | 5 GB | The gguf version, with lower memory usage and faster inference. |
| MiniCPM-Llama3-V 2.5 int4 | GPU | 8 GB | The int4 quantized version, with lower GPU memory usage. |
| MiniCPM-V 2.0 | GPU | 8 GB | Light version, balancing performance and computation cost. |
| MiniCPM-V 1.0 | GPU | 7 GB | Lightest version, achieving the fastest inference. |

Multi-turn Conversation

Please refer to the following code to run a multi-turn conversation.

from chat import MiniCPMVChat, img2base64
import torch
import json

torch.manual_seed(0)

chat_model = MiniCPMVChat('openbmb/MiniCPM-Llama3-V-2_5')

im_64 = img2base64('./assets/airplane.jpeg')

# First round chat
msgs = [{"role": "user", "content": "Tell me the model of this aircraft."}]

inputs = {"image": im_64, "question": json.dumps(msgs)}
answer = chat_model.chat(inputs)
print(answer)

# Second round chat
# pass history context of multi-turn conversation
msgs.append({"role": "assistant", "content": answer})
msgs.append({"role": "user", "content": "Introduce something about Airbus A380."})

inputs = {"image": im_64, "question": json.dumps(msgs)}
answer = chat_model.chat(inputs)
print(answer)

You will get the following output:

"The aircraft in the image is an Airbus A380, which can be identified by its large size, double-deck structure, and the distinctive shape of its wings and engines. The A380 is a wide-body aircraft known for being the world's largest passenger airliner, designed for long-haul flights. It has four engines, which are characteristic of large commercial aircraft. The registration number on the aircraft can also provide specific information about the model if looked up in an aviation database.""The Airbus A380 is a double-deck, wide-body, four-engine jet airliner made by Airbus. It is the world's largest passenger airliner and is known for its long-haul capabilities. The aircraft was developed to improve efficiency and comfort for passengers traveling over long distances. It has two full-length passenger decks, which can accommodate more passengers than a typical single-aisle airplane. The A380 has been operated by airlines such as Lufthansa, Singapore Airlines, and Emirates, among others. It is widely recognized for its unique design and significant impact on the aviation industry."

Inference on Mac

Click to view an example of running MiniCPM-Llama3-V 2.5 on a Mac with MPS (Apple silicon or AMD GPUs).
# test.py    (needs more than 16 GB of memory)
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('openbmb/MiniCPM-Llama3-V-2_5', trust_remote_code=True, low_cpu_mem_usage=True)
model = model.to(device='mps')
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-Llama3-V-2_5', trust_remote_code=True)
model.eval()

image = Image.open('./assets/hk_OCR.jpg').convert('RGB')
question = 'Where is this photo taken?'
msgs = [{'role': 'user', 'content': question}]

answer, context, _ = model.chat(
    image=image,
    msgs=msgs,
    context=None,
    tokenizer=tokenizer,
    sampling=True
)
print(answer)

Run with command:

PYTORCH_ENABLE_MPS_FALLBACK=1 python test.py

Deployment on Mobile Phone

MiniCPM-Llama3-V 2.5 and MiniCPM-V 2.0 can be deployed on mobile phones with Android operating systems. Click MiniCPM-Llama3-V 2.5 / MiniCPM-V 2.0 to install the APK.

Inference with llama.cpp

MiniCPM-Llama3-V 2.5 can now run with llama.cpp! See our fork of llama.cpp for more details. This implementation supports smooth inference at 6~8 tokens/s on mobile phones (test environment: Xiaomi 14 Pro + Snapdragon 8 Gen 3).

Inference with vLLM

Click to see how to run inference on MiniCPM-V 2.0 with vLLM (MiniCPM-Llama3-V 2.5 support coming soon). Because our pull request to vLLM is still under review, we forked the repository to build and test our vLLM demo. Here are the steps:
  1. Clone our version of vLLM:
git clone https://github.com/OpenBMB/vllm.git
  2. Install vLLM:

cd vllm
pip install -e .

  3. Install timm:

pip install timm==0.9.10

  4. Run our demo:

python examples/minicpmv_example.py

Fine-tuning

Simple Fine-tuning

We support simple fine-tuning with Hugging Face for MiniCPM-V 2.0 and MiniCPM-Llama3-V 2.5.

Reference Document

With the SWIFT Framework

We now support fine-tuning the MiniCPM-V series with the SWIFT framework. SWIFT supports the training, inference, evaluation and deployment of nearly 200 LLMs and MLLMs. It supports the lightweight training solutions provided by PEFT and a complete adapter library including techniques such as NEFTune, LoRA+ and LLaMA-PRO.
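
As a rough illustration of what such LoRA-style lightweight training looks like in code, here is a generic PEFT sketch. The target modules, rank, and other settings are assumptions for illustration only, not the values used by SWIFT or by this repository's fine-tuning recipes.

import torch
from transformers import AutoModel
from peft import LoraConfig, get_peft_model

# Load the base model (illustrative; real fine-tuning also needs the tokenizer,
# an image-text dataset, and a training loop or Trainer).
model = AutoModel.from_pretrained('openbmb/MiniCPM-Llama3-V-2_5',
                                  trust_remote_code=True,
                                  torch_dtype=torch.bfloat16)

# Wrap the attention projections with low-rank adapters; only the adapters are trained.
lora_cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"])  # assumed module names
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()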

Best Practices: MiniCPM-V 1.0, MiniCPM-V 2.0

TODO

  • MiniCPM-V fine-tuning support
  • Code release for real-time interactive assistant

Model License

  • This repository is released under the Apache-2.0 License.

  • The usage of MiniCPM-V model weights must strictly follow MiniCPM Model License.md.

  • The models and weights of MiniCPM are completely free for academic research. After filling out a "questionnaire" for registration, they are also available for free commercial use.

Statement

As LMMs, MiniCPM-V models (including OmniLMM) generate content by learning from a large amount of multimodal corpora, but they cannot comprehend, express personal opinions, or make value judgements. Anything generated by MiniCPM-V models does not represent the views and positions of the model developers.

We will not be liable for any problems arising from the use of MiniCPM-V models, including but not limited to data security issues, risks of public opinion, or any risks and problems arising from the misdirection, misuse, or dissemination of the model.

Institutions

This project is developed by the following institutions:

Other Multimodal Projects from Our Team

Welcome to explore other multimodal projects of our team:

VisCPM | RLHF-V | LLaVA-UHD | RLAIF-V

Star History

Citation

If you find our model/code/paper helpful, please consider citing our papers and giving us a star!

@article{yu2023rlhf,
  title={Rlhf-v: Towards trustworthy mllms via behavior alignment from fine-grained correctional human feedback},
  author={Yu, Tianyu and Yao, Yuan and Zhang, Haoye and He, Taiwen and Han, Yifeng and Cui, Ganqu and Hu, Jinyi and Liu, Zhiyuan and Zheng, Hai-Tao and Sun, Maosong and others},
  journal={arXiv preprint arXiv:2312.00849},
  year={2023}
}

@article{viscpm,
  title={Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages},
  author={Jinyi Hu and Yuan Yao and Chongyi Wang and Shan Wang and Yinxu Pan and Qianyu Chen and Tianyu Yu and Hanghao Wu and Yue Zhao and Haoye Zhang and Xu Han and Yankai Lin and Jiao Xue and Dahai Li and Zhiyuan Liu and Maosong Sun},
  journal={arXiv preprint arXiv:2308.12038},
  year={2023}
}

@article{xu2024llava-uhd,
  title={{LLaVA-UHD}: an LMM Perceiving Any Aspect Ratio and High-Resolution Images},
  author={Xu, Ruyi and Yao, Yuan and Guo, Zonghao and Cui, Junbo and Ni, Zanlin and Ge, Chunjiang and Chua, Tat-Seng and Liu, Zhiyuan and Huang, Gao},
  journal={arXiv preprint arXiv:2403.11703},
  year={2024}
}

@article{yu2024rlaifv,
  title={RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness},
  author={Yu, Tianyu and Zhang, Haoye and Yao, Yuan and Dang, Yunkai and Chen, Da and Lu, Xiaoman and Cui, Ganqu and He, Taiwen and Liu, Zhiyuan and Chua, Tat-Seng and Sun, Maosong},
  journal={arXiv preprint arXiv:2405.17220},
  year={2024}
}

The Web OS! Free, Open-Source, and Self-Hostable.


Puter.com, The Personal Cloud Computer: All your files, apps, and games in one place, accessible from anywhere at any time.

The Internet OS! Free, Open-Source, and Self-Hostable.

« LIVE DEMO »

Puter.com · SDK · Discord · Reddit · X (Twitter) · Bug Bounty



Puter

Puter is an advanced, open-source internet operating system designed to be feature-rich, exceptionally fast, and highly extensible. It can be used to build remote desktop environments or serve as an interface for cloud storage services, remote servers, web hosting platforms, and more.


Getting Started

After reading this section, please proceed to Self-Hosting and Configuration below. Read these instructions carefully or you may see errors due to an invalid setup.

Local Development

git clone https://github.com/HeyPuter/puter
cd puter
cp .env.example .env
npm install
npm start

This will launch Puter at http://localhost:4000 (or the next available port).


Using Docker

note: it is not necessary to run this within a clone of this repository. For contributors, it is recommended to use the Local Development instructions.

mkdir puter && cd puter && mkdir -p puter/config puter/data && sudo chown -R 1000:1000 puter && docker run --rm -p 4100:4100 -v `pwd`/puter/config:/etc/puter -v `pwd`/puter/data:/var/puter ghcr.io/heyputer/puter

Using Docker Compose

note: it is not necessary to run this within a clone of this repository. For contributors, it is recommended to use the Local Development instructions.

mkdir -p puter/config puter/data
sudo chown -R 1000:1000 puter
wget https://raw.githubusercontent.com/HeyPuter/puter/main/docker-compose.yml
docker compose up

See Configuration for next steps.


[!WARNING] The self-hosted version of Puter is currently in alpha stage and should not be used in production yet. It is under active development and may contain bugs and other issues. Please exercise caution and use it for testing and evaluation purposes only.

Self-Hosting Differences

Currently, the self-hosted version of Puter is different in a few ways from Puter.com:

  • There is no built-in way to install or create other apps (see below)
  • Several "core" apps are missing, such as Code or Draw, because we can't include them in this repository
  • Some icons are different

Work is ongoing to improve the App Center and make it available on self-hosted. Until then, it's possible to add other apps by manually editing the database file. This process is not recommended unless you know what you are doing. The file will appear after you first launch Puter, and should be found in puter/data/puter-database.sqlite for Docker, or volatile/runtime/puter-database.sqlite otherwise. You will need a database tool that can understand SQLite databases.


Configuration

Running the server will generate a configuration file in one of these locations:

  • config/config.json when Using Docker
  • volatile/config/config.json in Local Development
  • /etc/puter/config.json on a server (or within a Docker container)

Domain Name

To access Puter on your device, you can simply go to the address printed in the server console (usually puter.localhost:4100).

To access Puter from another device, a domain name must be configured, as well as an api subdomain. For example, example.local might be the domain name pointing to the IP address of the server running Puter, and api.example.local must point to this address as well. This domain must also be specified in the configuration file (usually volatile/config/config.json).

See domain configuration for more information.

Configure the Port

  • You can specify a custom port by setting http_port to a desired value
  • If you're using a reverse-proxy such as nginx or cloudflare, you should also set pub_port to the public (external) port (usually 443)
  • If you have HTTPS enabled on your reverse-proxy, ensure that protocol in config.json is set accordingly (an illustrative snippet follows this list)
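
For illustration only, a config.json fragment combining these options for a reverse-proxy setup might look like the following; the values are assumptions, and the file generated by your server will contain additional keys.

{
  "http_port": 4100,
  "pub_port": 443,
  "protocol": "https"
}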

Default User

By default, Puter will create a user called default_user. This user will have a randomly generated password, which will be printed in the development console. A warning will persist in the dev console until this user's password is changed. Please log in as this user and change the password as your first step.


License

This repository is licensed under AGPL-3.0; however, our SDK (puter.js) is also available under Apache 2.0, as indicated by the license file in that section (packages/puter-js) of this repository.


FAQ

What's the use case for Puter?

Puter can be used as:

  • An alternative to Dropbox, Google Drive, OneDrive, etc. with a fresh interface and powerful features.
  • A remote desktop environment for servers and workstations.
  • A platform for building and hosting websites, web apps, and games.
  • A friendly, open-source project and community to learn about web development, cloud computing, distributed systems, and much more!

Why isn't Puter built with React, Angular, Vue, etc.?

For performance reasons, Puter is built with vanilla JavaScript and jQuery. Additionally, we'd like to avoid complex abstractions and to remain in control of the entire stack, as much as possible.

Puter is also partly inspired by some of our favorite projects that are not built with frameworks: VSCode, Photopea, and OnlyOffice.


Why jQuery?

Puter interacts directly with the DOM and jQuery provides an elegant yet powerful API to manipulate the DOM, handle events, and much more. It's also fast, mature, and battle-tested.


#DoesItRunPuter


Credits

The default wallpaper is created by Milad Fakurian and published on Unsplash.

Icons by Papirus under GPL-3.0 license.

Icons by Iconoir under MIT license.

Icons by Elementary Icons under GPL-3.0 license.

Icons by Tabler Icons under MIT license.

Icons by bootstrap-icons under MIT license.