hauritbaskezdkz/scrape-alibaba-suppliers

Scrape Alibaba Suppliers

Scrape Alibaba suppliers using a keyword or a Search URL to quickly build a structured supplier list for sourcing and market research. This project turns messy supplier directory pages into clean, analysis-ready data so you can compare vendors, capabilities, and credibility signals in one place.


Created by Bitbash, built to showcase our approach to Scraping and Automation!

Introduction

This project collects supplier profiles from Alibaba based on a keyword search or a provided search URL, returning consistent supplier records with key business details. It solves the problem of manually browsing supplier pages and copying information into spreadsheets or CRMs. It’s built for sourcing teams, e-commerce operators, analysts, and developers who need scalable supplier discovery.

Supplier Discovery & Sourcing Intelligence

  • Supports keyword-based supplier discovery with a configurable supplier count limit
  • Accepts a custom search URL for advanced filtering (e.g., country/capabilities) when needed
  • Normalizes supplier fields into a consistent schema for downstream workflows
  • Captures credibility and responsiveness signals (e.g., gold years, reviews, reply time)
  • Produces structured records suitable for enrichment, scoring, and shortlisting

Features

| Feature | Description |
| --- | --- |
| Keyword supplier search | Find suppliers by product keyword and return structured supplier records. |
| Search URL mode | Scrape from a custom Alibaba search URL for more controlled targeting. |
| Configurable result size | Set how many suppliers to collect (with safe minimum limits). |
| Structured supplier profiles | Returns consistent fields for easy storage, filtering, and scoring. |
| Credibility signals | Includes reviews, scores, transactions, and assessed supplier flag when available. |
| Product context | Captures main product and product lists for quick supplier-product relevance checks. |
| Capability tags | Extracts supplier capability tags (e.g., OEM/ODM) to accelerate qualification. |

What Data This Scraper Extracts

| Field Name | Field Description |
| --- | --- |
| area | Supplier location/region (often country or province). |
| companyId | Unique supplier company identifier. |
| companyName | Supplier company name. |
| companyIcon | Company icon/logo image URL. |
| companyImage | List of company images (factory/product/company visuals when present). |
| goldYears | Number of years listed as a Gold Supplier (if available). |
| mainProduct | Primary highlighted product block (id, subject/title, price, image, and URL). |
| productList | List of products shown in the supplier listing/profile context. |
| provideProducts | Free-text description of products the supplier provides. |
| replyAvgTime | Average reply/response time (e.g., “24h”). |
| reviewCount | Total number of reviews (when shown). |
| reviewScore | Review rating/score (e.g., 4.8). |
| staff | Staff size information (range or label). |
| capabilities | Capability tags such as OEM, ODM, custom manufacturing, etc. |
| transactions | Transaction count/volume indicator (when available). |
| vrUrl | VR showroom URL (if provided). |
| isAssessedSupplier | Boolean flag indicating assessed supplier status. |
| newAd | Additional advertisement metadata block (if present). |

Example Output

[
  {
    "area": "China",
    "companyId": "123456",
    "companyName": "ABC Solar Co., Ltd.",
    "companyIcon": "https://example.com/icon.png",
    "companyImage": [
      "https://example.com/image1.png"
    ],
    "goldYears": 5,
    "mainProduct": {
      "id": "7890",
      "subject": "Solar Panel",
      "price": "$100",
      "imageUrl": "https://example.com/product.png",
      "url": "https://example.com/product"
    },
    "productList": [],
    "provideProducts": "Solar panels, inverters",
    "replyAvgTime": "24h",
    "reviewCount": 100,
    "reviewScore": 4.8,
    "staff": "50-100",
    "capabilities": [
      "OEM",
      "ODM"
    ],
    "transactions": 500,
    "vrUrl": "https://example.com/vr",
    "isAssessedSupplier": true,
    "newAd": null
  }
]
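
Records shaped like the example above are nested, which is awkward for spreadsheets. The following is a minimal sketch of one way to flatten them into CSV rows using only the Python standard library. The helper names (`flatten_supplier`, `suppliers_to_csv`) and the choice of columns are illustrative assumptions and are not part of this project's code.

```python
import csv
import io


def flatten_supplier(record: dict) -> dict:
    """Project one nested supplier record onto a flat row (hypothetical helper)."""
    main = record.get("mainProduct") or {}
    return {
        "companyId": record.get("companyId"),
        "companyName": record.get("companyName"),
        "area": record.get("area"),
        "goldYears": record.get("goldYears"),
        "reviewScore": record.get("reviewScore"),
        "reviewCount": record.get("reviewCount"),
        # Join list-valued tags so they fit into a single CSV cell.
        "capabilities": ";".join(record.get("capabilities") or []),
        "mainProductSubject": main.get("subject"),
        "mainProductUrl": main.get("url"),
    }


def suppliers_to_csv(records: list) -> str:
    """Serialize supplier records to CSV text; returns "" for an empty list."""
    rows = [flatten_supplier(r) for r in records]
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling `suppliers_to_csv(json.load(f))` on a file containing the example output would produce one header line and one data row.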

Directory Structure Tree

scrape-alibaba-suppliers/
├── src/
│   ├── main.py
│   ├── runner.py
│   ├── http/
│   │   ├── client.py
│   │   └── headers.py
│   ├── parsing/
│   │   ├── supplier_list_parser.py
│   │   ├── supplier_profile_parser.py
│   │   └── normalize.py
│   ├── extractors/
│   │   ├── extract_company.py
│   │   ├── extract_products.py
│   │   ├── extract_reviews.py
│   │   └── extract_capabilities.py
│   ├── output/
│   │   ├── schema.py
│   │   ├── validate.py
│   │   └── exporter.py
│   ├── config/
│   │   ├── defaults.py
│   │   └── logging.py
│   └── utils/
│       ├── urls.py
│       ├── retry.py
│       └── text.py
├── data/
│   ├── input.example.json
│   └── output.sample.json
├── tests/
│   ├── test_normalize.py
│   ├── test_supplier_list_parser.py
│   └── test_schema_validation.py
├── .gitignore
├── requirements.txt
├── pyproject.toml
├── LICENSE
└── README.md

Use Cases

  • Sourcing managers use it to build shortlists of Alibaba suppliers by keyword, so they can compare vendors faster and reduce procurement time.
  • E-commerce operators use it to discover manufacturers with OEM/ODM capabilities, so they can launch private-label products with qualified partners.
  • Market researchers use it to collect supplier credibility signals (reviews, transactions, gold years), so they can score supplier competitiveness across niches.
  • Sales & partnerships teams use it to generate supplier lead datasets, so they can run outreach and partnership pipelines at scale.
  • Data teams use it to feed structured supplier records into enrichment/scoring models, so they can automate qualification and routing.

FAQs

Q1: Should I use keyword or searchUrl? Use keyword for the simplest workflow: enter a product term and set size. Use searchUrl when you already have a targeted Alibaba supplier search page with filters applied (e.g., a specific country, certifications, or capabilities). If both are provided, keyword mode takes precedence to keep results deterministic.
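
The precedence rule in Q1 can be sketched as below. This is an illustration only: the function name `resolve_search_mode` and the error raised when neither input is given are assumptions, and the project's actual input handling may differ.

```python
def resolve_search_mode(run_input: dict) -> tuple:
    """Return (mode, value), preferring `keyword` over `searchUrl`.

    Hypothetical helper: it mirrors the documented rule that keyword mode
    wins when both inputs are provided.
    """
    keyword = (run_input.get("keyword") or "").strip()
    search_url = (run_input.get("searchUrl") or "").strip()
    if keyword:
        return ("keyword", keyword)
    if search_url:
        return ("searchUrl", search_url)
    # Assumption: a run with neither input is treated as a configuration error.
    raise ValueError("Provide either 'keyword' or 'searchUrl'.")
```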

Q2: What does size control, and what’s a safe range? size controls how many supplier results you want returned. For stable runs, keep size aligned with your expected paging depth (e.g., 10–200 for quick sourcing scans). Very large sizes increase paging requests and may require stricter retries and backoff to remain stable.
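
To make the link between size, paging depth, and retries concrete, here is a rough sketch. The results-per-page value is not documented here, so `page_size` is left as a caller-supplied assumption. The exponential backoff policy and the helper names are likewise illustrative and are not taken from this project's `retry.py`.

```python
import math
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def pages_needed(size: int, page_size: int) -> int:
    """Number of result pages required to collect `size` suppliers."""
    if size <= 0 or page_size <= 0:
        raise ValueError("size and page_size must be positive")
    return math.ceil(size / page_size)


def backoff_delays(max_retries: int, base_delay: float = 1.0, cap: float = 30.0) -> list:
    """Exponential delays (base, 2x, 4x, ...) capped at `cap` seconds."""
    return [min(cap, base_delay * 2 ** attempt) for attempt in range(max_retries)]


def call_with_retries(
    fetch: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch`, retrying on ConnectionError with exponential backoff."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    delays = backoff_delays(max_retries, base_delay)
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except ConnectionError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(delays[attempt])
    raise RuntimeError("unreachable")
```

Under an assumed page size of 36, `pages_needed(200, 36)` gives 6 page requests, which is why larger sizes benefit from a retry policy like the one above.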

Q3: Why are some fields sometimes empty (like productList or vrUrl)? Supplier listings vary by category and profile completeness. Some suppliers don’t expose VR showrooms, detailed product lists, or certain credibility indicators on the listing surface. The output keeps a consistent schema, but optional fields may be null/empty when not available.
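
One way to keep a consistent schema when optional fields are missing is to fill in explicit defaults, as in the sketch below. Which fields count as optional and which default each one receives are assumptions made for illustration, and `normalize_record` is a hypothetical name rather than the function in `normalize.py`.

```python
# Assumed defaults for optional fields: list-valued fields fall back to an
# empty list, everything else to None.
OPTIONAL_DEFAULTS = {
    "companyImage": [],
    "productList": [],
    "capabilities": [],
    "vrUrl": None,
    "newAd": None,
    "goldYears": None,
    "transactions": None,
}


def normalize_record(raw: dict) -> dict:
    """Return a copy of `raw` in which every optional field is present."""
    normalized = dict(raw)
    for field, default in OPTIONAL_DEFAULTS.items():
        value = normalized.get(field)
        if value is None or value == "" or value == []:
            # Copy list defaults so records never share one mutable list.
            normalized[field] = list(default) if isinstance(default, list) else default
    return normalized
```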

Q4: Can I use this data for scoring or building a supplier shortlist automatically? Yes. A common approach is to combine reviewScore, reviewCount, transactions, goldYears, replyAvgTime, and isAssessedSupplier into a weighted score, then filter by required capabilities (e.g., OEM/ODM). The consistent fields are designed to support this workflow.
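
The weighted-score approach from Q4 might look like the sketch below. The weights, the normalization caps (5-point reviews, 1,000 reviews or transactions, 10 gold years, 48 hours reply time), and all function names are hypothetical choices for illustration. It also assumes replyAvgTime follows the "24h" pattern shown in the example output.

```python
import math
import re
from typing import Optional

# Hypothetical weights; they sum to 1 so the score stays in [0, 1].
WEIGHTS = {"review": 0.30, "reviews": 0.15, "transactions": 0.20,
           "gold": 0.15, "reply": 0.10, "assessed": 0.10}


def _num(value) -> float:
    """Coerce to float, treating missing or malformed values as 0."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return 0.0


def parse_reply_hours(value) -> Optional[float]:
    """Parse strings like "24h" into hours; None if the format is unknown."""
    if not isinstance(value, str):
        return None
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*h\s*", value)
    return float(match.group(1)) if match else None


def supplier_score(record: dict) -> float:
    """Combine credibility signals into one weighted score in [0, 1]."""
    hours = parse_reply_hours(record.get("replyAvgTime"))
    parts = {
        "review": min(_num(record.get("reviewScore")) / 5.0, 1.0),
        "reviews": min(math.log1p(_num(record.get("reviewCount"))) / math.log1p(1000), 1.0),
        "transactions": min(math.log1p(_num(record.get("transactions"))) / math.log1p(1000), 1.0),
        "gold": min(_num(record.get("goldYears")) / 10.0, 1.0),
        "reply": 0.0 if hours is None else 1.0 - min(hours / 48.0, 1.0),
        "assessed": 1.0 if record.get("isAssessedSupplier") is True else 0.0,
    }
    return round(sum(WEIGHTS[k] * parts[k] for k in WEIGHTS), 4)


def shortlist(records: list, required_capabilities: list) -> list:
    """Keep suppliers that have every required capability, best score first."""
    required = set(required_capabilities)
    eligible = [r for r in records if required <= set(r.get("capabilities") or [])]
    return sorted(eligible, key=supplier_score, reverse=True)
```

Log scaling is used for review and transaction counts so that a supplier with 10,000 transactions does not dominate one with 500. A linear scale would be an equally valid choice depending on the sourcing goal.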


Performance Benchmarks and Results

Primary Metric: ~45–90 supplier records/min on a typical broadband connection when collecting list-level data with lightweight parsing.

Reliability Metric: ~96–99% successful supplier record assembly across runs when retries/backoff are enabled for transient network hiccups.

Efficiency Metric: ~120–220 MB peak memory usage during moderate runs (100–300 suppliers), with most time spent on HTML fetching and parsing.

Quality Metric: ~92–98% field completeness on core identifiers (companyId, companyName, area) and ~75–90% completeness on optional enrichment fields (VR URLs, product lists, ad metadata), depending on category and listing richness.
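
A field-completeness figure like the one quoted above can be computed from any output file. The sketch below shows one possible definition, in which a field counts as filled when it is neither null nor an empty string, list, or object. It is an assumption about the metric and does not describe how the figures above were measured.

```python
def field_completeness(records: list, fields: list) -> dict:
    """Fraction of records in which each field holds a non-empty value."""
    if not records:
        return {field: 0.0 for field in fields}
    result = {}
    for field in fields:
        # False and 0 are real values; only None and empty containers
        # or strings are treated as missing.
        filled = sum(1 for r in records if r.get(field) not in (None, "", [], {}))
        result[field] = filled / len(records)
    return result
```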
