30 open-source projects similar to dataabc/weibo-crawler, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Weibo Crawler alternative.
weiboSpider is a Python web scraper and social media crawler designed to extract user profiles, posts, and engagement metrics from Sina Weibo. It functions as an automated data pipeline for academic research and trend analysis, collecting long-form text and multimedia content. The tool distinguishes itself through the use of browser session cookies to authenticate requests and access protected profiles. It implements randomized request pacing and global pauses to manage traffic and avoid platform rate limits, while supporting incremental crawling to capture only new content based on timestamp
Weibospider is a distributed web crawler designed to extract posts, profiles, and interaction data from the Weibo social network. It functions as a social media data extractor that utilizes a distributed task queue to scale scraping operations across multiple worker nodes. The system includes a graphical administrative interface for configuring crawler settings, target user identifiers, and search keywords. It employs a distributed architecture to increase data throughput and manage large-scale collection of social media content. The tool covers a wide range of data collection capabilities,
This project is an unauthenticated web scraper designed to extract public data from the Twitter frontend API. It functions as a social media data extractor that simulates browser requests to gather information without the need for official API keys or user account authentication. The tool provides capabilities for gathering public posts, harvesting user profile metadata such as biographies and locations, and retrieving trending topics categorized by geographical region. It can perform targeted content scraping based on specific usernames, hashtags, or search queries. The system manages data
WeiboSpider is a social media scraper designed to extract user profiles, posts, and interaction data from the Sina Weibo platform. It functions as a web-based data crawler that retrieves information via external interfaces rather than parsing the visual frontend. The tool includes a content lineage tracer to follow shared posts back to their original sources. It also features a social engagement analyzer to collect view counts and nested comment threads to measure user interaction metrics. The system provides capabilities for keyword-based social monitoring and search result filtering to tra
Lets-chat is a self-hosted team communication platform and XMPP chat server designed for private messaging. It provides a containerized communication environment for small teams to exchange messages and files, featuring a programmable REST API for automating conversations and managing messages from external tools. The platform functions as an XMPP gateway and server, ensuring interoperability with other compliant messaging clients. It distinguishes itself by supporting enterprise identity management, allowing administrators to verify user identities through local accounts or external director
EspoCRM is an open-source customer relationship management platform and SQL-based business application. It serves as a centralized web interface for tracking leads, opportunities, and contacts, providing a sales pipeline manager and a customizable business logic engine. The platform is distinguished by its ability to function as a custom business application builder, allowing for the creation of tailored entities and automated workflows. It integrates marketing automation tools for campaign coordination and a structured customer support ticketing system for case management. The system covers
Vendure is a Node.js e-commerce engine and headless commerce framework built with NestJS and TypeScript. It serves as a multi-channel commerce platform that manages product catalogs, orders, and customers via a strongly typed GraphQL API. The platform is distinguished by its highly extensible architecture, featuring a customizable administrative dashboard where developers can inject custom React components and entity views. It supports multi-channel commerce, allowing the isolation of products, currencies, and regional catalogs from a single unified backend. The engine covers a broad range o
NATS Server is a high-performance, lightweight messaging system designed for cloud-native applications, edge computing, and distributed microservices. It functions as a distributed publish-subscribe broker that routes messages using hierarchical, dot-separated subject strings, enabling decoupled communication between services without requiring centralized broker lookups. The system supports core messaging patterns including asynchronous publish-subscribe, request-reply, and load-balanced queue processing. The platform distinguishes itself through a decentralized architecture that eliminates t
This project is a Telegram API client and media archiving system designed to programmatically retrieve chat histories and export media. It functions as a download manager and message forwarder, allowing users to back up photos, videos, and documents from Telegram chats into structured local archives. The system distinguishes itself through advanced content filtering and forwarding capabilities. It can monitor chats for new messages, apply custom regular expressions to filter media by size or date, and automatically forward content between chats. This includes the ability to export protected c
trump2cash is a sentiment-based stock trading bot and social media market monitor. It uses a natural language processing sentiment analysis tool to scan real-time social media feeds for mentions of publicly traded companies and translates the emotional tone of that text into automated buy or short stock market orders. The system utilizes a ticker mapping utility to resolve company names, subsidiaries, and brands into valid public stock market ticker symbols. To verify the efficacy of these sentiment-driven signals, it includes an algorithmic trading backtester that evaluates trading strategie
AzuraCast is a self-hosted web radio management suite and dashboard designed for internet radio broadcasting. It functions as an automated playback manager and broadcasting system, integrating an audio orchestrator for transcoding and cross-fading with a streaming server to distribute audio via mount points. The platform enables multi-tenant station management, allowing a single installation to host multiple independent radio entities. It distinguishes itself by combining an Auto-DJ broadcast system with the ability to manage live DJ accounts and coordinate real-time broadcasts. The system c
This project serves as a comprehensive reference architecture and a guide to best practices for developing scalable applications with the Spring Boot framework. It provides a structural blueprint for Java backend development, focusing on the implementation of decoupled APIs and the establishment of architectural standards. The project specifically details the creation of custom starters and auto-configuration modules to simplify the integration of third-party libraries. It also provides a deployment blueprint for packaging applications as executable jars and optimizing layered builds for cont
This project is a self-hosted, web-based interface designed for interacting with large language models. It provides a centralized dashboard that enables users to manage model communications, maintain persistent conversation histories, and organize prompt libraries within a private, containerized environment. The platform distinguishes itself through robust administrative and security controls, including support for enterprise identity providers, password-based access, and request rate limiting. It facilitates flexible connectivity by allowing users to configure custom network proxies, manage
This project is a containerized IPTV service that aggregates channel playlists from multiple public sources and serves them alongside Electronic Program Guide (EPG) XML data. It functions as an automated playlist aggregator and EPG provider, designed to run as a self-maintaining system that keeps channel listings current without manual intervention. The service distinguishes itself by combining playlist aggregation, EPG serving, and automated updates into a single containerized deployment. It refreshes aggregated channel data on a recurring schedule, pulling from multiple public repositories
This is a reference implementation library providing a collection of code samples, Transact-SQL scripts, and schemas for SQL Server, Azure SQL, and Azure Synapse. It focuses on providing standardized implementation patterns and reference code for building relational databases and cloud data warehouses. The library distinguishes itself by offering specialized guides and examples for deploying database instances within containerized environments and Azure cloud services. It includes specific reference databases and language extensions for integrating machine learning services and advanced analy
Stringer is a self-hosted RSS feed aggregator and reader. It functions as a multi-user feed manager that collects and organizes content from various web feeds into a single unified interface. The project operates as an RSS API server, exposing data feeds to third-party mobile applications for synchronization and reading. It includes automated background tasks to fetch new content entries and provides a private environment for content curation. The system covers user account management with password-hashed authentication, subscription billing via third-party payment processors, and user inter
MeTube is a self-hosted, containerized media downloader that provides a web-based interface for archiving online video content. It functions as a front end for the command-line tool yt-dlp, automating the retrieval, organization, and post-processing of media files and saving them to local storage. The application distinguishes itself by supporting authenticated downloads, allowing users to inject browser-derived session cookies to access private or restricted content. It also features advanced post-processing capabilities, including the automatic embedding of metadata, chapter markers, and subti
Nitter is a privacy-focused, alternative web interface for viewing public social media content. It functions as a server-side proxy that fetches and renders external posts, allowing users to browse content without requiring a personal account or executing third-party tracking scripts. By stripping away user identifiers and tracking mechanisms, the application provides a lightweight, anonymous viewing experience. The project distinguishes itself through its emphasis on network-level privacy and self-sovereignty. It supports routing traffic through Tor and I2P networks to bypass censorship and
Horizon is a realtime API server and RethinkDB backend designed to push database changes instantly to front-end clients. It utilizes a WebSocket data streaming API to synchronize data between the database and user interfaces without requiring manual polling. The project integrates an OAuth identity manager for verifying user identities through third-party providers and a role-based access control system to define granular permissions for viewing or modifying database documents. It is delivered as a containerized backend framework, allowing the server and its dependencies to be deployed as a p
dalle-mini is a text-to-image model and generative AI system that transforms natural language descriptions into synthetic images. It also serves as a training toolkit for image generation. The project provides a containerized deployment for consistent execution across different computing environments and includes the scripts and configuration files needed to train custom generative models from datasets. The system utilizes an autoregressive transformer architecture that treats visual data as discrete toke
Securo is a self-hosted personal finance management platform designed to provide users with complete control over their private financial data. By deploying the application within their own infrastructure, users can aggregate bank accounts, track income and expenses, and monitor investment portfolios while maintaining data privacy. The system supports multi-user access, allowing for collaborative expense tracking and shared financial management within a single environment. The platform distinguishes itself through the integration of local artificial intelligence, which enables users to query
BabyBuddy is a self-hosted infant care tracking application designed for logging feedings, diaper changes, and growth metrics to monitor child development. It functions as a private data store for sensitive health and activity records, providing a containerized environment for managing childcare data across different hardware architectures. The system integrates with home automation hubs and provides a RESTful API to enable programmatic recording and querying of care data. It supports collaborative caregiver management, allowing multiple family members or professional caregivers to share acce
Relax is a headless content management system and visual website builder. It supports creating and organizing digital content that is delivered via a GraphQL API to a React frontend, and its component-based visual editor lets users construct web pages and layouts without writing code by hand. The platform integrates a dockerized content manager for simplified deployment and uses a schema-driven approach to structure data architecture and entry types. It features a dynamic administrative experience built with React and Redux to manage site states and user interfaces. The system provides broad c
PROXY-List is a public proxy aggregator that provides data structures for storing and aggregating publicly available HTTP and SOCKS proxy server addresses. It serves as a source for retrieving network traffic routing lists used to mask origin IP addresses during web requests. The project utilizes a data pipeline to automatically scrape, poll, and serialize proxy lists from multiple public websites. This infrastructure ensures the availability of active servers through scheduled periodic polling and automated content refreshes, delivering the resulting lists as plain text files. These capabil
Blackcandy is a self-hosted music streaming server and digital music library manager. It allows users to host personal collections of audio files on a private server for streaming to web browsers and dedicated mobile applications. The system includes a media library sync engine that monitors file system changes in the background and uses parallel processing to keep the music database synchronized. It manages library organization by extracting audio metadata and fetching artist and album imagery from external databases. The platform provides capabilities for remote audio playback, music libra
Astron-agent is an orchestration platform for designing and executing complex agentic workflows that combine language models with external tools and business systems. It provides a production-ready environment for deploying AI services within private intranets using container orchestration for scalable management. The platform distinguishes itself by linking large language model decision-making with robotic process automation to execute tasks across enterprise applications. It further supports enterprise requirements through a multi-tenant infrastructure that utilizes isolated memory and iden
Datasette is a tool for publishing and sharing SQLite databases as public websites. It functions as a data publishing system that provides searchable interfaces and JSON APIs to expose the contents of SQLite files. The project enables both server-side and client-side execution. It can operate as an API server or as a database browser that runs entirely within a web browser using WebAssembly, allowing for serverless database access. The system supports a variety of deployment strategies, including containerized images for cloud hosting and a local development server for testing. It includes c
This project is a GraphQL web application with a React frontend that utilizes server-side rendering to generate HTML on the server for improved initial load times and search engine indexing. The application supports both static site generation for fast delivery via pre-rendered HTML files and containerized deployment to ensure consistent runtime behavior across different environments. The project includes capabilities for GraphQL data integration, frontend asset optimization through code-splitting, and component UI verification using snapshot testing. It also provides a mechanism for managin
Mamba is a package manager for scientific and data science workflows that implements a high-performance dependency solver in C++. It uses a SAT-based resolution model and a specialized library for metadata processing to calculate compatible package versions across different operating systems. The project provides a standalone executable runtime, allowing the creation of isolated package environments without requiring a pre-existing system installation. It ensures reproducible environment setup by utilizing lock files to pin exact package versions and channels. The system supports containeriz