monthly

GitHub All Languages Trending

The latest build: 2024-05-30. Source of data: GitHubTrendingRSS

Perplexica is an AI-powered search engine. It is an open-source alternative to Perplexity AI.


Perplexica - An AI-powered search engine



Overview

Perplexica is an open-source, AI-powered search engine that digs deep into the internet to find answers. Inspired by Perplexity AI, it not only searches the web but also understands your questions. It uses advanced machine-learning techniques such as similarity search and embeddings to refine results, and it provides clear answers with cited sources.

Using SearxNG to stay current and fully open source, Perplexica ensures you always get the most up-to-date information without compromising your privacy.

Want to know more about its architecture and how it works? You can read it here.


Features

  • Local LLMs: You can make use of local LLMs such as Llama3 and Mixtral using Ollama.
  • Two Main Modes:
    • Copilot Mode: (In development) Boosts search by generating different queries to find more relevant internet sources. Instead of relying only on the context returned by SearxNG, as normal search does, it visits the top matches and tries to find sources relevant to the user's query directly from those pages.
    • Normal Mode: Processes your query and performs a web search.
  • Focus Modes: Special modes to better answer specific types of questions. Perplexica currently has 6 focus modes:
    • All Mode: Searches the entire web to find the best results.
    • Writing Assistant Mode: Helpful for writing tasks that do not require searching the web.
    • Academic Search Mode: Finds articles and papers, ideal for academic research.
    • YouTube Search Mode: Finds YouTube videos based on the search query.
    • Wolfram Alpha Search Mode: Answers queries that need calculations or data analysis using Wolfram Alpha.
    • Reddit Search Mode: Searches Reddit for discussions and opinions related to the query.
  • Current Information: Some search tools may return outdated information because they rely on data from crawling bots, converted into embeddings and stored in an index. Perplexica instead uses SearxNG, a metasearch engine, to fetch results and then reranks them to surface the most relevant sources, ensuring you always get the latest information without the overhead of daily data updates.

It has many more features, such as image and video search. Some of the planned features are listed in the Upcoming Features section.

Installation

There are two main ways of installing Perplexica: with Docker and without Docker. Using Docker is highly recommended.

Getting Started with Docker (Recommended)

  1. Ensure Docker is installed and running on your system.

  2. Clone the Perplexica repository:

    git clone https://github.com/ItzCrazyKns/Perplexica.git
  3. After cloning, navigate to the directory containing the project files.

  4. Rename the sample.config.toml file to config.toml. For Docker setups, you need only fill in the following fields (a sample sketch follows this list):

    • OPENAI: Your OpenAI API key. You only need to fill this if you wish to use OpenAI's models.

    • OLLAMA: Your Ollama API URL. You should enter it as http://host.docker.internal:PORT_NUMBER. If you installed Ollama on port 11434, use http://host.docker.internal:11434. For other ports, adjust accordingly. You need to fill this if you wish to use Ollama's models instead of OpenAI's.

    • GROQ: Your Groq API key. You only need to fill this if you wish to use Groq's hosted models.

      Note: You can change these after starting Perplexica from the settings dialog.

    • SIMILARITY_MEASURE: The similarity measure to use (This is filled by default; you can leave it as is if you are unsure about it.)

  5. Ensure you are in the directory containing the docker-compose.yaml file and execute:

    docker compose up -d
  6. Wait a few minutes for the setup to complete. You can access Perplexica at http://localhost:3000 in your web browser.
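
For reference, a filled-in config.toml might look like the sketch below. This is an illustrative layout based only on the fields named above; the authoritative structure and section names are in the repository's sample.config.toml.

    # Illustrative sketch; consult sample.config.toml for the actual layout.
    OPENAI = "sk-..."                             # only if you use OpenAI's models
    GROQ = "gsk_..."                              # only if you use Groq's hosted models
    OLLAMA = "http://host.docker.internal:11434"  # only if you use Ollama's models
    SIMILARITY_MEASURE = "cosine"                 # pre-filled; leave as is if unsure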

Note: After the containers are built, you can start Perplexica directly from Docker without having to open a terminal.

Non-Docker Installation

  1. Clone the repository and rename the sample.config.toml file to config.toml in the root directory. Ensure you complete all required fields in this file.
  2. Rename the .env.example file to .env in the ui folder and fill in all necessary fields.
  3. After populating the configuration and environment files, run npm i in both the ui folder and the root directory.
  4. Once the dependencies are installed, execute npm run build in both the ui folder and the root directory.
  5. Finally, start both the frontend and the backend by running npm run start in both the ui folder and the root directory.

Note: Using Docker is recommended as it simplifies the setup process, especially for managing environment variables and dependencies.

See the installation documentation for more information, such as exposing it to your network.

Ollama connection errors

If you're facing an Ollama connection error, it is usually because the backend cannot reach Ollama's API. You can fix it by updating your Ollama API URL in the settings menu to the following:

On Windows: http://host.docker.internal:11434
On Mac: http://host.docker.internal:11434
On Linux: http://private_ip_of_computer_hosting_ollama:11434

Adjust the port number if Ollama is running on a different port.
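
Before changing the URL, it can help to confirm that Ollama itself is listening; its root endpoint replies with a short status message when the server is up. A minimal sketch in Python, assuming the requests package and the default port 11434:

    import requests

    # Ollama's root endpoint returns a short status string when the server is up.
    resp = requests.get("http://localhost:11434", timeout=5)
    print(resp.status_code, resp.text)  # expected: 200 Ollama is running

If this check fails, the problem lies with the Ollama installation rather than with Perplexica's configuration.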

Using as a Search Engine

If you wish to use Perplexica as an alternative to traditional search engines like Google or Bing, or if you want to add a shortcut for quick access from your browser's search bar, follow these steps:

  1. Open your browser's settings.
  2. Navigate to the 'Search Engines' section.
  3. Add a new site search with the following URL: http://localhost:3000/?q=%s. Replace localhost with your IP address or domain name, and 3000 with the port number if Perplexica is not hosted locally.
  4. Click the add button. Now, you can use Perplexica directly from your browser's search bar.

One-Click Deployment

Deploy to RepoCloud

Upcoming Features

  • Finalizing Copilot Mode
  • Adding a settings page
  • Adding support for local LLMs
  • Adding Discover and History Saving features
  • Introducing various Focus Modes

Support Us

If you find Perplexica useful, consider giving us a star on GitHub. This helps more people discover Perplexica and supports the development of new features. Your support is greatly appreciated.

Donations

We also accept donations to help sustain our project. If you would like to contribute, you can use the following button to make a donation in cryptocurrency. Thank you for your support!

Crypto donation button by NOWPayments

Contribution

Perplexica is built on the idea that AI and large language models should be easy for everyone to use. If you find bugs or have ideas, please share them via GitHub Issues. For more information on contributing to Perplexica, read the CONTRIBUTING.md file to learn how you can help.

Help and Support

If you have any questions or feedback, please feel free to reach out to us. You can create an issue on GitHub or join our Discord server. There, you can connect with other users, share your experiences and reviews, and receive more personalized help. Click here to join the Discord server. To discuss matters outside of regular support, feel free to contact me on Discord at itzcrazykns.

Thank you for exploring Perplexica, the AI-powered search engine designed to enhance your search experience. We are constantly working to improve Perplexica and expand its capabilities. We value your feedback and contributions which help us make Perplexica even better. Don't forget to check back for updates and new features!

A browser-based Pokémon fangame heavily inspired by the roguelite genre.


PokéRogue

PokéRogue is a browser-based Pokémon fangame heavily inspired by the roguelite genre. Battle endlessly while gathering stacking items, exploring many different biomes, fighting trainers, bosses, and more!

Contributing

Development

If you have the motivation and experience with TypeScript/JavaScript (or are willing to learn), please feel free to fork the repository and make pull requests with contributions. If you don't know what to work on but want to help, reference the To-Do section below or the #feature-vote channel in the Discord.

Environment Setup

Prerequisites

Running Locally

  1. Clone the repo and, in the root directory, run npm install
    • If you run into any errors, reach out in the #dev-corner channel in the Discord
  2. Run npm run start:dev to run the project locally at localhost:8000

Linting

We're using ESLint as our common linter and formatter. It runs automatically during the pre-commit hook, but if you would like to run it manually, use the npm run eslint script.

FAQ

How do I test a new _______?

  • In the src/overrides.ts file there are overrides for most values you'll need to change for testing

To Do

Check out GitHub Issues to see how you can help us!

Credits

If this project contains assets you have produced and you do not see your name here, please reach out.

BGM

  • Pokémon Mystery Dungeon: Explorers of Sky
    • Arata Iiyoshi
    • Hideki Sakamoto
    • Keisuke Ito
    • Ken-ichi Saito
    • Yoshihiro Maeda
  • Pokémon Black/White
    • Go Ichinose
    • Hitomi Sato
    • Shota Kageyama
  • Pokémon Mystery Dungeon: Rescue Team DX
    • Keisuke Ito
    • Arata Iiyoshi
    • Atsuhiro Ishizuna
  • Pokémon Black/White 2
  • Firel (Custom Metropolis and Laboratory biome music)
  • Lmz (Custom Jungle biome music)

Sound Effects

  • Pokémon Emerald
  • Pokémon Black/White

Backgrounds

  • Squip (Paid Commissions)
  • Contributions by Someonealive-QN

UI

  • GAMEFREAK
  • LJ Birdman

Pagefault Games Intro

  • Spectremint

Game Logo

  • Gonstar (Paid Commission)

Trainer Sprites

  • GAMEFREAK (Pokémon Black/White 2, Pokémon Diamond/Pearl)
  • kyledove
  • Brumirage
  • pkmn_realidea (Paid Commissions)

Trainer Portraits

  • pkmn_realidea (Paid Commissions)

Pokémon Sprites and Animation

  • GAMEFREAK (Pokémon Black/White 2)
  • Smogon Sprite Project (Various Artists)
  • Skyflyer
  • Nolo33
  • Ebaru
  • EricLostie
  • KingOfThe-X-Roads
  • kiriaura
  • Caruban
  • Sopita_Yorita
  • Azrita
  • AshnixsLaw
  • Hellfire0raptor
  • RetroNC
  • Franark122k
  • OldSoulja
  • PKMarioG
  • ItsYugen
  • lucasomi
  • Pkm Sinfonia
  • Poki Papillon
  • Fleimer_
  • bizcoeindoloro
  • mangalos810
  • Involuntary-Twitch
  • selstar

Move Animations

  • Pokémon Reborn

Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.


Firecrawl

Crawl and convert any website into LLM-ready markdown. Built by Mendable.ai and the firecrawl community.

This repository is in its early development stages. We are still merging custom modules into the mono repo. It is not yet completely ready for full self-hosted deployment, but you can already run it locally.

What is Firecrawl?

Firecrawl is an API service that takes a URL, crawls it, and converts it into clean markdown or structured data. We crawl all accessible subpages and give you clean data for each. No sitemap required.

Pst. hey, you, join our stargazers :)

How to use it?

We provide an easy-to-use API with our hosted version. You can find the playground and documentation here. You can also self-host the backend if you'd like.

To run it locally, refer to the guide here.

API Key

To use the API, you need to sign up on Firecrawl and get an API key.

Crawling

Used to crawl a URL and all accessible subpages. This submits a crawl job and returns a job ID to check the status of the crawl.

curl -X POST https://api.firecrawl.dev/v0/crawl \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -d '{
    "url": "https://mendable.ai"
  }'

Returns a jobId

{ "jobId": "1234-5678-9101" }

Check Crawl Job

Used to check the status of a crawl job and get its result.

curl -X GET https://api.firecrawl.dev/v0/crawl/status/1234-5678-9101 \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY'

Response:

{
  "status": "completed",
  "current": 22,
  "total": 22,
  "data": [
    {
      "content": "Raw Content",
      "markdown": "# Markdown Content",
      "provider": "web-scraper",
      "metadata": {
        "title": "Mendable | AI for CX and Sales",
        "description": "AI for CX and Sales",
        "language": null,
        "sourceURL": "https://www.mendable.ai/"
      }
    }
  ]
}

Scraping

Used to scrape a URL and get its content.

curl -X POST https://api.firecrawl.dev/v0/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -d '{
    "url": "https://mendable.ai"
  }'

Response:

{ "success": true, "data": { "content": "Raw Content ", "markdown": "# Markdown Content", "provider": "web-scraper", "metadata": { "title": "Mendable | AI for CX and Sales", "description": "AI for CX and Sales", "language": null, "sourceURL": "https://www.mendable.ai/" } }}

Search (Beta)

Used to search the web, get the most relevant results, scrape each page and return the markdown.

curl -X POST https://api.firecrawl.dev/v0/search \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -d '{
    "query": "firecrawl",
    "pageOptions": {
      "fetchPageContent": true
    }
  }'

Set "fetchPageContent" to false for a fast SERP-style response without scraping each result.

Response:

{
  "success": true,
  "data": [
    {
      "url": "https://mendable.ai",
      "markdown": "# Markdown Content",
      "provider": "web-scraper",
      "metadata": {
        "title": "Mendable | AI for CX and Sales",
        "description": "AI for CX and Sales",
        "language": null,
        "sourceURL": "https://www.mendable.ai/"
      }
    }
  ]
}

Intelligent Extraction (Beta)

Used to extract structured data from scraped pages.

curl -X POST https://api.firecrawl.dev/v0/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -d '{
    "url": "https://www.mendable.ai/",
    "extractorOptions": {
      "mode": "llm-extraction",
      "extractionPrompt": "Based on the information on the page, extract the information from the schema.",
      "extractionSchema": {
        "type": "object",
        "properties": {
          "company_mission": { "type": "string" },
          "supports_sso": { "type": "boolean" },
          "is_open_source": { "type": "boolean" },
          "is_in_yc": { "type": "boolean" }
        },
        "required": ["company_mission", "supports_sso", "is_open_source", "is_in_yc"]
      }
    }
  }'

Response:

{
  "success": true,
  "data": {
    "content": "Raw Content",
    "metadata": {
      "title": "Mendable",
      "description": "Mendable allows you to easily build AI chat applications. Ingest, customize, then deploy with one line of code anywhere you want. Brought to you by SideGuide",
      "robots": "follow, index",
      "ogTitle": "Mendable",
      "ogDescription": "Mendable allows you to easily build AI chat applications. Ingest, customize, then deploy with one line of code anywhere you want. Brought to you by SideGuide",
      "ogUrl": "https://mendable.ai/",
      "ogImage": "https://mendable.ai/mendable_new_og1.png",
      "ogLocaleAlternate": [],
      "ogSiteName": "Mendable",
      "sourceURL": "https://mendable.ai/"
    },
    "llm_extraction": {
      "company_mission": "Train a secure AI on your technical resources that answers customer and employee questions so your team doesn't have to",
      "supports_sso": true,
      "is_open_source": false,
      "is_in_yc": true
    }
  }
}

Using Python SDK

Installing Python SDK

pip install firecrawl-py

Crawl a website

from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="YOUR_API_KEY")

crawl_result = app.crawl_url('mendable.ai', {'crawlerOptions': {'excludes': ['blog/*']}})

# Get the markdown
for result in crawl_result:
    print(result['markdown'])

Scraping a URL

To scrape a single URL, use the scrape_url method. It takes the URL as a parameter and returns the scraped data as a dictionary.

url = 'https://example.com'
scraped_data = app.scrape_url(url)

Extracting structured data from a URL

With LLM extraction, you can easily extract structured data from any URL. We support pydantic schemas to make it even easier. Here is how to use it:

from typing import List

from pydantic import BaseModel, Field


class ArticleSchema(BaseModel):
    title: str
    points: int
    by: str
    commentsURL: str


class TopArticlesSchema(BaseModel):
    top: List[ArticleSchema] = Field(..., max_items=5, description="Top 5 stories")


data = app.scrape_url('https://news.ycombinator.com', {
    'extractorOptions': {
        'extractionSchema': TopArticlesSchema.model_json_schema(),
        'mode': 'llm-extraction'
    },
    'pageOptions': {
        'onlyMainContent': True
    }
})

print(data["llm_extraction"])

Search for a query

Performs a web search, retrieves the top results, scrapes each page, and returns the markdown.

query = 'What is Mendable?'
search_result = app.search(query)

Using the Node SDK

Installation

To install the Firecrawl Node SDK, you can use npm:

npm install @mendable/firecrawl-js

Usage

  1. Get an API key from firecrawl.dev
  2. Set the API key as an environment variable named FIRECRAWL_API_KEY or pass it as a parameter to the FirecrawlApp class.

Scraping a URL

To scrape a single URL with error handling, use the scrapeUrl method. It takes the URL as a parameter and returns the scraped data as an object.

import FirecrawlApp from '@mendable/firecrawl-js';

const app = new FirecrawlApp({ apiKey: 'fc-YOUR_API_KEY' });

try {
  const url = 'https://example.com';
  const scrapedData = await app.scrapeUrl(url);
  console.log(scrapedData);
} catch (error) {
  console.error('Error occurred while scraping:', error.message);
}

Crawling a Website

To crawl a website with error handling, use the crawlUrl method. It takes the starting URL and optional parameters as arguments. The params argument allows you to specify additional options for the crawl job, such as the maximum number of pages to crawl, allowed domains, and the output format.

const crawlUrl = 'https://example.com';
const params = {
  crawlerOptions: {
    excludes: ['blog/'],
    includes: [], // leave empty for all pages
    limit: 1000,
  },
  pageOptions: {
    onlyMainContent: true
  }
};
const waitUntilDone = true;
const timeout = 5;
const crawlResult = await app.crawlUrl(
  crawlUrl,
  params,
  waitUntilDone,
  timeout
);

Checking Crawl Status

To check the status of a crawl job with error handling, use the checkCrawlStatus method. It takes the job ID as a parameter and returns the current status of the crawl job.

const status = await app.checkCrawlStatus(jobId);
console.log(status);

Extracting structured data from a URL

With LLM extraction, you can easily extract structured data from any URL. We support zod schemas to make it even easier. Here is how to use it:

import FirecrawlApp from "@mendable/firecrawl-js";import { z } from "zod";const app = new FirecrawlApp({ apiKey: "fc-YOUR_API_KEY",});// Define schema to extract contents intoconst schema = z.object({ top: z .array( z.object({ title: z.string(), points: z.number(), by: z.string(), commentsURL: z.string(), }) ) .length(5) .describe("Top 5 stories on Hacker News"),});const scrapeResult = await app.scrapeUrl("https://news.ycombinator.com", { extractorOptions: { extractionSchema: schema },});console.log(scrapeResult.data["llm_extraction"]);

Search for a query

With the search method, you can search for a query in a search engine and get the top results along with the page content for each result. The method takes the query as a parameter and returns the search results.

const query = 'what is mendable?';
const searchResults = await app.search(query, {
  pageOptions: {
    fetchPageContent: true // Fetch the page content for each search result
  }
});

Contributing

We love contributions! Please read our contributing guide before submitting a pull request.

It is the sole responsibility of the end users to respect websites' policies when scraping, searching and crawling with Firecrawl. Users are advised to adhere to the applicable privacy policies and terms of use of the websites prior to initiating any scraping activities. By default, Firecrawl respects the directives specified in the websites' robots.txt files when crawling. By utilizing Firecrawl, you expressly agree to comply with these conditions.