Which open-source GitHub repositories match “Product analytics, scraping and quality”?

firecrawl/firecrawl is the closest match.

Product analytics, scraping and quality

Tools for tracking user behavior, extracting web data, and monitoring software quality and performance metrics.


firecrawl/firecrawl
133,479
Firecrawl is a web data extraction platform designed to convert unstructured web content into clean, LLM-ready formats like markdown or JSON. It functions as an autonomous web crawler and scraper, capable of mapping entire domains, performing recursive navigation, and executing complex data gathering tasks. By leveraging headless browser orchestration, the system handles dynamic, JavaScript-heavy pages to ensure comprehensive data capture. The platform distinguishes itself through its focus on agentic workflows, providing a programmatic interface that allows autonomous agents to perform live
TypeScript, Autonomous Web Agents, Autonomous Web Crawlers, Autonomous Web Researchers

public-apis/public-apis
441,986
This project is a community-curated directory of REST and GraphQL service endpoints designed to assist developers in discovering and integrating third-party data sources. It functions as a centralized registry where external services are organized by domain to facilitate rapid software prototyping and application development. The registry relies on a peer-reviewed contribution model, utilizing distributed version control to manage updates and ensure the accuracy of listed endpoints. To maintain high data quality, the project employs schema-based validation for all incoming submissions and com
Python, API Directories, API Discovery Directories

gleitz/howdoi
10,840
howdoi is a command-line coding answer engine that retrieves programming solutions and code snippets from the web for display directly in the terminal. It functions as a web-based code search tool that uses natural language queries to find technical answers without requiring a web browser. The tool provides a JSON-exportable query system, allowing search results to be output as structured data for integration with other software and text editors. It features terminal-based knowledge retrieval that includes local caching and stashing of answers to reduce network latency and avoid search engine
Python, Natural Language Interfaces, CLI Coding Answer Engines, Coding Assistants

camel-ai/camel
17,253
This project is a comprehensive framework for building and managing autonomous agent systems. It provides a unified architecture for orchestrating multi-agent societies, where specialized agents collaborate through roleplay to decompose and solve complex tasks. The system integrates language models with external environments, enabling agents to perform real-world actions through a standardized tool-calling abstraction layer. The framework distinguishes itself through its focus on iterative reasoning and data reliability. It employs automated feedback loops to refine agent outputs and self-eva
Python, Agent State Persistence, Agent Tool Integrations, Agentic LLM Frameworks

scrapy/scrapy
62,274
Scrapy is a comprehensive framework designed for automated web data extraction and large-scale crawling. It operates on an asynchronous, event-driven engine that manages non-blocking network requests and data processing tasks, allowing for the efficient retrieval of structured information from web documents using path-based selectors. The system distinguishes itself through a highly modular architecture that supports complex data collection workflows. Users can implement custom middleware and signal handlers to intercept and modify request flows, while a priority-based scheduler manages concu
Python, Web Scrapers, Web Scraping, Distributed Crawling Engines

GoogleChrome/lighthouse
30,355
Lighthouse is an automated diagnostic tool that evaluates web pages against industry standards for performance, accessibility, and search engine optimization. It functions as a programmatic analysis engine and a command-line utility, allowing developers to integrate comprehensive web quality checks directly into continuous integration pipelines and local development workflows. The project distinguishes itself through a modular architecture that utilizes artifact-based data collection to ensure consistent analysis across different environments. It supports a headless execution mode for automat
JavaScript, Browser Automation, Static Analysis, Accessibility Auditing Tools

mixmark-io/turndown
11,278
Turndown is a JavaScript library designed to transform HTML documents into structured Markdown. It functions as a flexible engine that parses web content by traversing the document object model and applying rule-based transformations to convert elements into their corresponding text-based syntax. The tool distinguishes itself through a modular architecture that allows for extensive customization of the conversion process. Users can define custom conversion rules to handle specific elements, implement content filtering to discard unwanted nodes, and configure character escaping to ensure outpu
HTML, HTML to Markdown Reversion Tools, Markdown Parsers, Content Format Transformers

dgtlmoon/changedetection.io
32,027
Changedetection.io is a self-hosted monitoring service designed to track web pages for content updates and notify users of changes. It functions as a centralized platform where users can manage tracking tasks, observe specific website elements, and receive automated alerts through various communication channels whenever modifications are detected. The service distinguishes itself through an integrated headless browser engine that executes interaction sequences, such as logins or form submissions, to access dynamic or restricted content. It maintains a historical record of page snapshots, util
Python, Web Monitoring, Inventory Monitoring, Inventory Tracking

ImranR98/Obtainium
17,651
Obtainium is an Android application manager designed to track, download, and install software updates directly from developer websites and third-party repositories. By bypassing centralized app stores, it enables users to maintain and update sideloaded applications through automated monitoring of external release sources. The application distinguishes itself through flexible source integration, allowing users to track software via direct URLs or by applying custom regex-based web scraping patterns to arbitrary web pages. It supports private repository access through configurable authenticatio
Dart, Application Lifecycle Management, Application Managers, Lifecycle Managers

unclecode/crawl4ai
68,644
Crawl4AI is an AI-powered web crawling and data extraction engine designed to transform complex web content into structured formats. It functions as a headless browser orchestrator, enabling the navigation of dynamic websites, the execution of custom scripts, and the capture of visual assets like screenshots and PDFs. By integrating language models directly into the extraction workflow, the system converts raw HTML into clean, structured data or Markdown files optimized for downstream ingestion. The platform distinguishes itself through a distributed, self-hosted infrastructure that manages l
Python, Automated Web Scraping, AI-Powered Web Crawlers, Headless

jhy/jsoup
11,340
Jsoup is a Java library designed for parsing, extracting, and manipulating HTML and XML content. It provides a document object model that represents web content as a hierarchical tree, allowing for programmatic navigation and modification of elements, attributes, and text. The library functions as a toolkit for web scraping, enabling the retrieval of remote content via standard web protocols and the management of HTTP sessions for automated form interaction. The library distinguishes itself through its fault-tolerant tokenization, which reconstructs valid document structures from malformed or
Java, HTML Allowlists, HTML Document Transformation, HTML Parsers

gocolly/colly
25,101
Colly is a high-performance web scraping framework designed for the automated extraction of structured data from websites. It provides a programmable toolkit that manages the complexities of large-scale data collection, including concurrent request orchestration, automatic cookie handling, and robots.txt compliance. By utilizing an asynchronous execution model, the engine maintains high throughput while preventing resource exhaustion during recursive or distributed crawling tasks. The framework is distinguished by its modular, event-driven architecture, which allows developers to hook into sp
Go, Web Scraping Engines, Web Scraping Frameworks, Concurrent Crawling Engines

nickscamara/open-deep-research
6,173
Open Deep Research is an AI-powered web research agent that combines a reasoning model with live web search and data extraction to perform deep, multi-source investigations on any topic. It operates through a dual interface, offering both a command-line tool and a Model Context Protocol server, allowing developers to integrate web capabilities directly into AI agents and coding assistants. The project distinguishes itself by orchestrating an iterative research loop where a reasoning model plans steps, interprets search results, and guides subsequent web interactions. It uses Firecrawl for scr
TypeScript, Web Research Agents, Agentic Web Interaction, AI Agent Capabilities

plausible/analytics
24,245
This project is an open-source, privacy-focused web analytics platform designed for high-throughput data ingestion and multi-tenant data management. It provides a cookie-less tracking engine that captures visitor interactions using ephemeral request metadata, ensuring comprehensive traffic visibility while maintaining strict privacy standards. The architecture utilizes an event-driven ingestion pipeline and aggregated metric storage to decouple data collection from processing, enabling efficient long-term retrieval and responsive dashboard performance. What distinguishes this platform is its
Elixir, Privacy-Preserving Analytics, Analytics Proxying, First-Party Collection

omnivore-app/omnivore
15,882
Omnivore is an open-source, self-hostable read-it-later application designed to centralize web articles, newsletters, and digital documents into a personal library. It functions as a comprehensive content archiver that captures web pages and stores them locally, ensuring permanent access and readability regardless of internet connectivity. The platform distinguishes itself through an event-sourced synchronization engine that maintains a consistent state across multiple devices by replaying user actions. It utilizes a headless web scraping service to extract clean text and metadata from raw we
TypeScript, Read-It-Later Applications, Read-It-Later Platforms, Self-Hosted Applications

NanmiCoder/MediaCrawler
51,294
MediaCrawler is an automated web scraping framework designed to extract public posts, comments, and creator metadata from various social media platforms. It functions as a headless browser automator, utilizing real browser instances to render dynamic content and execute the client-side scripts necessary for interacting with modern web interfaces. The system distinguishes itself through a focus on session persistence and network flexibility. It supports remote debugging to reuse active browser sessions and cookies, which helps minimize the risk of triggering platform security challenges. To ma
Python, Web Scrapers, Web Scraping Frameworks, Browser Automation

SimplifyJobs/New-Grad-Positions
17,201
New-Grad-Positions is a centralized job aggregation platform designed to track and filter entry-level career opportunities for recent graduates across technical industries. The system functions as an automated career search tool, utilizing a relational database schema to organize job listings and user profiles for efficient querying. The platform distinguishes itself through integrated browser-based automation that populates online job application fields to reduce manual data entry. It further supports career search automation by monitoring new listings and triggering email alerts based on sp
Job Aggregators, Job Application Automation, Career Discovery Platforms

umami-software/umami
37,285
Umami is a self-hosted, privacy-focused web analytics platform designed to provide full control over infrastructure and user data. It captures website traffic and visitor behavior through anonymous tracking methods that avoid cookies, browser fingerprinting, and the storage of personally identifiable information. The platform distinguishes itself through a comprehensive suite of behavioral analysis tools, including session replays, heatmaps, and cohort-based retention reporting. It features a multi-tenant architecture that allows teams to manage multiple websites within a single, collaborativ
TypeScript, Privacy-Focused Analytics, Privacy-Preserving Analytics, Analytics Tracking

simstudioai/sim
28,796
This project is an AI agent orchestration platform that provides a visual environment for building, testing, and deploying complex automation workflows. It functions as a low-code development interface where users can chain discrete functional blocks into dependency-aware pipelines to integrate artificial intelligence with external data and services. The platform supports the creation of intelligent conversational agents, automated business processes, and multi-service API orchestrations within a unified workspace. The platform distinguishes itself through its event-driven integration engine,
TypeScript, Automation Platforms, Agent Configuration, Agent Orchestration Platforms

NaiboWang/EasySpider
44,092
EasySpider is a no-code automation platform designed to orchestrate repetitive web interactions and data collection processes. It functions as a browser task orchestrator, providing a visual environment where users can build and execute complex workflows through point-and-click configuration rather than manual programming. The platform distinguishes itself by enabling visual web scraping design, allowing users to create data extraction tasks by interacting directly with web elements. It utilizes a headless browser engine to simulate human navigation and event-driven interactions, mapping thes
JavaScript, Browser Task Orchestrators, No-Code Automation, Visual Web Scraping Tools

dontriskit/awesome-ai-system-prompts
5,206
This project is a comprehensive library of structured system prompts and configuration templates designed to define the behavior, persona, and operational boundaries of autonomous artificial intelligence agents. It serves as a framework for prompt engineering, providing modular instructions that help models parse complex tasks, maintain consistent interaction tones, and adhere to specific domain constraints. The repository distinguishes itself by offering specialized configurations for agent safety and security, including protocols to prevent prompt injection and unauthorized data access. It
TypeScript, System Prompts, Awesome List, Agent Persona Definitions

jivoi/awesome-osint
26,831
This project is a comprehensive, community-curated directory of resources and methodologies for open-source intelligence gathering. It serves as a centralized reference framework for researchers, providing a structured index of specialized tools, databases, and search techniques used to collect and analyze publicly available information from across the global internet. The directory distinguishes itself through a hierarchical taxonomy that organizes complex investigative domains, ranging from cyber threat intelligence and digital forensic investigation to geospatial analysis and operational s
Awesome List, Digital Forensics Resources, Threat Intelligence Platforms

apurvsinghgautam/robin
4,238
Robin is an AI-powered open source intelligence framework and dark web investigation tool. It functions as a multi-model AI orchestrator that integrates search engines and web scrapers with language models to automate information gathering and data synthesis. The system utilizes a crawl-and-filter architecture to isolate high-value data from raw web content and employs a query-refinement pipeline to optimize search terms. It specifically supports dark web investigations by routing requests through proxies to access hidden services and using language models to analyze and summarize findings fr
Python, Dark Web Search Engines, Multi-Model AI Orchestrators, AI Query Optimizers

puppeteer/puppeteer
94,811
Puppeteer is a browser automation library that provides a programmatic interface for controlling web browsers to execute tasks, simulate user interactions, and perform end-to-end testing. It functions as a headless browser controller, managing browser lifecycles, isolated session contexts, and remote connections to facilitate stable, automated web-based workflows. The library distinguishes itself through its deep integration with the Chrome DevTools Protocol, utilizing a bidirectional message bus to execute commands and receive real-time event notifications. It supports advanced automation pa
TypeScript, Automated End-to-End Testing, Browser Lifecycle Managers, Chrome DevTools Protocols

microsoft/playwright-python
14,279
Playwright for Python is a browser automation framework designed for end-to-end testing, web scraping, and user interaction simulation. It functions as a headless browser controller that enables programmatic navigation, data extraction, and the execution of complex workflows across multiple rendering engines. The framework distinguishes itself through an actionability-aware interaction engine that automatically verifies element readiness before performing actions, significantly reducing test flakiness. It utilizes isolated browser contexts to maintain separate storage and cookies for parallel
Python, Browser Automation Frameworks, End-to-End Testing, Browser Automation

browser-use/browser-use
100,229
Browser-use is a framework for building autonomous agents that navigate, interact with, and extract data from web interfaces using natural language instructions. By acting as an orchestration layer between large language models and browser automation protocols, it enables the execution of complex, multi-step workflows without relying on brittle selectors. The system functions as a headless browser controller, providing a programmatic interface to manage browser instances and execute granular interactions. The project distinguishes itself through its ability to translate high-level intent into
Python, Autonomous Browser Agents, Autonomous Web Agents, Chrome DevTools Protocols

firecrawl/firecrawl-mcp-server
5,542
Firecrawl MCP Server is a Model Context Protocol tool server that exposes the full suite of Firecrawl’s web scraping, crawling, and automation capabilities as tools that large language models can invoke directly. It acts as a proxy to the Firecrawl cloud platform, which manages headless browser orchestration, async job queues, and rate limiting behind the scenes. The server distinguishes itself by packaging autonomous web agents — both a research agent that browses and collects structured data from multiple pages, and a general web agent that performs multi-step browsing and extraction tasks
JavaScript, MCP Servers, API Proxies, Asynchronous Extraction Job Management

anuraghazra/github-readme-stats
79,661
This project is a serverless service that generates dynamic, themeable visual summaries of software development activity. It functions as an automated metadata visualizer, transforming raw platform logs and repository metrics into resolution-independent vector graphics that can be embedded directly into markdown environments. The service distinguishes itself by offering highly configurable, query-parameter-driven rendering that allows users to customize the visual presentation of their coding patterns, language proficiency, and repository details. It supports both real-time generation via ser
JavaScript, GitHub Stats Cards, Language Distribution Cards, Profile Personalization Suites

Doriandarko/claude-engineer
11,199
Claude-engineer is an autonomous software engineering agent and command-line interface for interacting with the Claude 3.5 Sonnet model. It functions as an AI code editor that writes code, manages local files, and executes terminal commands to automate technical workflows. The system features a self-evolving tool framework that allows the agent to design and implement its own functional scripts to expand its capabilities during a session. It utilizes a sandboxed Python executor to run scripts for data analysis and complex computations in a secure remote environment. The project covers a broa
Python, Autonomous AI Workflows, Dynamic Tool Generation, AI Code Editors

meilisearch/meilisearch
58,118
Meilisearch is a Rust-based search engine providing typo-tolerant full-text and vector-based semantic search with real-time conversational capabilities.
Rust, Developer-Focused Search Tools, Document Indexing Engines, Finite State Transducers

lorien/web-scraping
7,931
This project is a comprehensive resource directory for web data extraction, providing a curated collection of tools and libraries for parsing data, automating browsers, and managing network operations. It serves as a guide for extracting structured information from HTML, XML, JSON, and PDF formats. The toolkit focuses on advanced data collection strategies, including headless browser automation to interact with JavaScript and a suite of network utilities for DNS resolution and WebSocket connections. It specifically covers methods for bypassing bot protections through proxy pool management, us
Makefile, Web Crawling, Web Data Extraction

PostHog/posthog
35,060
PostHog is a comprehensive product analytics and feature management platform designed to capture, process, and visualize user behavior data. It provides a unified suite for tracking application events, managing feature rollouts, and monitoring system health through session recordings and error tracking. By leveraging a columnar-storage-optimized architecture, the platform enables high-performance aggregation and filtering across massive event datasets. What distinguishes PostHog is its integrated approach to data pipelines and application control. It features a robust event ingestion system t
Python, Feature Flag Management, Feature Flagging, Product Analytics

FreeTubeApp/FreeTube
21,247
FreeTube is a privacy-focused desktop application for watching YouTube videos without ads, tracking cookies, or the requirement of a Google account. It functions as a local-first subscription manager that tracks channels and playlists in local files instead of a centralized cloud account. The application avoids tracking-heavy official APIs by using a content extractor that parses web pages directly. To further protect user identity, it can route network traffic through proxies or Tor to mask the hardware IP address. The software provides tools for distraction-free viewing, including the abil
Vue, Local-First Architectures, Private Content Consumption, Anti-Tracking Proxies

GrowingGit/GitHub-Chinese-Top-Charts
108,509
This project functions as a curated software directory and developer resource index, providing a centralized platform for discovering and evaluating high-quality open-source repositories. It serves as an aggregator that monitors trending software and educational resources, organizing them by technical domain and programming language to assist developers in identifying tools for their specific technical challenges. The directory distinguishes itself through a community-driven curation workflow, where repository lists are validated and updated based on collective developer consensus. This infor
Java, Curated Software Directories, Curated Resource Lists, Learning Directories

g1879/DrissionPage
12,102
DrissionPage is a Python library designed for web automation, data scraping, and testing. It functions as a browser automation framework that communicates directly with the browser engine via the Chrome DevTools Protocol, allowing for precise control over browser instances and page states. The library distinguishes itself by providing a unified interface that combines full browser automation with raw HTTP request capabilities. This hybrid approach allows users to switch between lightweight network requests and heavy browser-based interactions within a single workflow. By wrapping asynchronous
Python, Browser Automation Frameworks, Browser Automation, Chrome DevTools Protocols

responsively-org/responsively-app
24,991
This application is a specialized web browser designed to streamline responsive design testing by rendering multiple viewport configurations simultaneously. It functions as a cross-platform testing suite that allows developers to preview and interact with web content across diverse mobile, tablet, and desktop device profiles within a single workspace. The tool distinguishes itself by synchronizing user interactions and application state across all active browser instances. When a user navigates, scrolls, or clicks in one view, these events are broadcast to every other open viewport to ensure
TypeScript, Responsive Testing Tools, Testing Frameworks, JavaScript Runtimes

cantino/huginn
49,487
Huginn is an open-source automation platform that functions as an event-driven task automator and webhook integration engine. It enables the creation of agents that monitor web data and automate tasks across various web services, operating as a self-hosted web scraper and JavaScript workflow orchestrator. The system uses a directed graph of event flows to route and transform data between external APIs. It differentiates itself by allowing custom JavaScript execution within workflows to modify data payloads and by integrating human-in-the-loop automation to insert manual judgment or data entry
Ruby, Automation Platforms, Event-Driven Automation Engines, Agent-Based Modularization

microsoft/playwright
91,074
Playwright is a comprehensive browser automation framework designed for end-to-end testing and web workflow automation. It provides a unified API to drive web applications across multiple browser engines, enabling developers to simulate complex user interactions, perform web scraping, and validate application behavior in consistent, isolated environments. The framework distinguishes itself through a web-first testing paradigm that prioritizes stability and resilience. By utilizing an auto-waiting actionability engine and accessibility-tree-based locators, it eliminates common sources of test
TypeScript, Browser Automation Frameworks, Accessibility-Tree-Based Locators, Assertion Libraries

continuedev/continue
33,716
Continue is an automated code review platform that integrates AI agents directly into the software development lifecycle. By executing custom validation rules against pull request diffs, it provides immediate feedback through repository status checks, allowing teams to enforce quality, security, and documentation standards before manual review begins. The system distinguishes itself through a file-based configuration model where validation logic is defined in version-controlled markdown files. These files act as system prompts that guide autonomous agents in evaluating code changes. This appr
TypeScript, Agentic Workflows, Automated Code Review, AI Orchestration

FlareSolverr/FlareSolverr
12,656
FlareSolverr is a proxy server designed to provide programmatic access to websites protected by automated security challenges and firewall restrictions. It functions by orchestrating headless browser instances to render web pages, execute JavaScript, and retrieve the necessary cookies and content required to bypass common security hurdles. The service distinguishes itself by maintaining persistent browser sessions in memory, which allows for the reuse of authenticated states across multiple requests. It integrates with external captcha resolution services to handle interactive security challe
Python · Challenge Resolution · Protection Bypassers · Web Scraping and Automation
getsentry/sentry
44,108 stars
This project is a comprehensive software observability suite and application performance monitoring platform designed to track runtime errors, performance bottlenecks, and system health. It functions as a centralized diagnostic service that aggregates and categorizes exceptions, providing the infrastructure necessary to visualize complex execution paths across distributed systems and microservices. The platform distinguishes itself through a high-throughput distributed event ingestion pipeline and a columnar storage analytics engine that enables rapid aggregation of large-scale performance me…
Python · Application Performance Monitoring · Application Performance Monitoring Platforms · Incident Management Systems
kepano/defuddle
3,189 stars
Defuddle is a command-line web parser and content extractor designed to isolate the primary article body from web pages and convert the result into standardized markdown. It functions as a content cleaner that removes layout clutter, such as sidebars and headers, to retrieve the main text and associated metadata. The tool provides a terminal interface that processes content from remote URLs, local files, or piped HTML streams. It supports custom content targeting, allowing users to specify CSS selectors to manually define the main content area when automatic detection is insufficient. The sy…
TypeScript · Web Content Parsing CLI · Web Page Content Cleaning · Body Content Extractors
microsoft/playwright-mcp
33,988 stars
Playwright MCP is a browser automation server that provides a standardized interface for connecting large language models to web navigation and interaction capabilities. By operating as a Model Context Protocol server, it enables external AI agents to execute browser-based tasks, extract data, and perform complex web sequences through a unified communication protocol. The project distinguishes itself by acting as a remote controller that manages headless browser lifecycles and isolated automation contexts. It maintains session-based state isolation, allowing for distinct user profiles and per…
TypeScript · Browser Automation Tools · Model Context Protocol Servers · Agentic Browser Interfaces
gosom/google-maps-scraper
3,192 stars
This project is a distributed scraping engine designed to extract business details, customer reviews, and lead information from Google Maps. It functions as a business scraper and data extractor that can be deployed as a permanent system or as on-demand serverless functions. The system utilizes a proxy-routed web crawler to manage request origins via SOCKS5, HTTP, and HTTPS proxies. To locate contact information, it includes an email extraction tool that recursively crawls business websites linked within map listings. The software supports coordinate-based radius searches for efficient data…
Go · Maps Data Extraction · Business Scrapers · Distributed Task Coordination
SeleniumHQ/selenium
34,203 stars
Selenium is a comprehensive browser automation framework that provides a standardized interface for controlling web browsers to perform automated tasks, user interactions, and data extraction. It functions as a cross-browser testing tool, enabling developers to execute identical automation scripts across various browser engines and operating systems to ensure consistent application behavior. By implementing the WebDriver protocol, it maps high-level automation commands to browser-specific drivers using a standardized HTTP-based wire protocol. The project distinguishes itself through its distr…
Java · Browser Automation · Browser Capability Configuration · Distributed Testing Grids
Kr1s77/awesome-python-login-model
16,225 stars
This project is a Python-based automation toolkit designed to manage programmatic authentication and session persistence across web services. It provides a framework for executing automated login sequences, including the handling of interactive security challenges such as QR code verification and captcha resolution. The toolkit distinguishes itself by simulating native mobile application environments, allowing for the execution of scripts that require specific device-level headers and behaviors. It also incorporates hook-based interception to monitor workflow states and manage exceptions duri…
Python · Awesome List · Automated Login Frameworks · Remote Service Authentication
eslint/eslint
27,349 stars
This project is a static analysis engine designed to identify patterns, enforce coding standards, and automate code quality improvements in software projects. By parsing source code into structured abstract syntax trees, it enables deep programmatic inspection and the automated remediation of identified programming issues. The engine functions as a pluggable linting framework, allowing developers to extend its core capabilities through a modular architecture. Users can inject custom rules, parsers, and processors to support non-standard file formats or domain-specific logic. This extensibilit…
JavaScript · Automated Code Quality Tools · Static Analysis · Analysis Plugin Frameworks
ChromeDevTools/chrome-devtools-mcp
43,761 stars
This project serves as an agentic browser controller, providing a programmatic bridge that enables autonomous software agents to navigate web pages and interact with document elements. It functions as a browser automation protocol, facilitating headless browser operations and automated web interactions to perform repetitive tasks and end-to-end testing without manual human input. The system distinguishes itself by utilizing the Chrome DevTools Protocol to establish a bidirectional communication channel with the browser engine. This allows for protocol-based remote control, where external appl…
TypeScript · Headless Browsers · Browser Automation · Web Automation Frameworks

Product analytics, scraping and quality

firecrawl/firecrawl

public-apis/public-apis

gleitz/howdoi

camel-ai/camel

scrapy/scrapy

GoogleChrome/lighthouse

mixmark-io/turndown

dgtlmoon/changedetection.io

ImranR98/Obtainium

unclecode/crawl4ai

jhy/jsoup

gocolly/colly

nickscamara/open-deep-research

plausible/analytics

omnivore-app/omnivore

NanmiCoder/MediaCrawler

SimplifyJobs/New-Grad-Positions

umami-software/umami

simstudioai/sim

NaiboWang/EasySpider

dontriskit/awesome-ai-system-prompts

jivoi/awesome-osint

apurvsinghgautam/robin

puppeteer/puppeteer

microsoft/playwright-python

browser-use/browser-use

firecrawl/firecrawl-mcp-server

anuraghazra/github-readme-stats

Doriandarko/claude-engineer

meilisearch/meilisearch

lorien/web-scraping

PostHog/posthog

FreeTubeApp/FreeTube

GrowingGit/GitHub-Chinese-Top-Charts

g1879/DrissionPage

responsively-org/responsively-app

cantino/huginn

microsoft/playwright

continuedev/continue

FlareSolverr/FlareSolverr

getsentry/sentry

kepano/defuddle

microsoft/playwright-mcp

gosom/google-maps-scraper

SeleniumHQ/selenium

Kr1s77/awesome-python-login-model

eslint/eslint

ChromeDevTools/chrome-devtools-mcp