Scrape Alibaba suppliers using a keyword or a Search URL to quickly build a structured supplier list for sourcing and market research. This project turns messy supplier directory pages into clean, analysis-ready data so you can compare vendors, capabilities, and credibility signals in one place.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
This project collects supplier profiles from Alibaba based on a keyword search or a provided search URL, returning consistent supplier records with key business details. It solves the problem of manually browsing supplier pages and copying information into spreadsheets or CRMs. It’s built for sourcing teams, e-commerce operators, analysts, and developers who need scalable supplier discovery.
- Supports keyword-based supplier discovery with a configurable supplier count limit
- Accepts a custom search URL for advanced filtering (e.g., country/capabilities) when needed
- Normalizes supplier fields into a consistent schema for downstream workflows
- Captures credibility and responsiveness signals (e.g., gold years, reviews, reply time)
- Produces structured records suitable for enrichment, scoring, and shortlisting
| Feature | Description |
|---|---|
| Keyword supplier search | Find suppliers by product keyword and return structured supplier records. |
| Search URL mode | Scrape from a custom Alibaba search URL for more controlled targeting. |
| Configurable result size | Set how many suppliers to collect (with safe minimum limits). |
| Structured supplier profiles | Returns consistent fields for easy storage, filtering, and scoring. |
| Credibility signals | Includes reviews, scores, transactions, and assessed supplier flag when available. |
| Product context | Captures main product and product lists for quick supplier-product relevance checks. |
| Capability tags | Extracts supplier capability tags (e.g., OEM/ODM) to accelerate qualification. |
| Field Name | Field Description |
|---|---|
| area | Supplier location/region (often country or province). |
| companyId | Unique supplier company identifier. |
| companyName | Supplier company name. |
| companyIcon | Company icon/logo image URL. |
| companyImage | List of company images (factory/product/company visuals when present). |
| goldYears | Number of years listed as a Gold Supplier (if available). |
| mainProduct | Primary highlighted product block (id, subject/title, price, image, and URL). |
| productList | List of products shown in the supplier listing/profile context. |
| provideProducts | Free-text description of products the supplier provides. |
| replyAvgTime | Average reply/response time (e.g., “24h”). |
| reviewCount | Total number of reviews (when shown). |
| reviewScore | Review rating/score (e.g., 4.8). |
| staff | Staff size information (range or label). |
| capabilities | Capability tags such as OEM, ODM, custom manufacturing, etc. |
| transactions | Transaction count/volume indicator (when available). |
| vrUrl | VR showroom URL (if provided). |
| isAssessedSupplier | Boolean flag indicating assessed supplier status. |
| newAd | Additional advertisement metadata block (if present). |
[
  {
    "area": "China",
    "companyId": "123456",
    "companyName": "ABC Solar Co., Ltd.",
    "companyIcon": "https://example.com/icon.png",
    "companyImage": [
      "https://example.com/image1.png"
    ],
    "goldYears": 5,
    "mainProduct": {
      "id": "7890",
      "subject": "Solar Panel",
      "price": "$100",
      "imageUrl": "https://example.com/product.png",
      "url": "https://example.com/product"
    },
    "productList": [],
    "provideProducts": "Solar panels, inverters",
    "replyAvgTime": "24h",
    "reviewCount": 100,
    "reviewScore": 4.8,
    "staff": "50-100",
    "capabilities": [
      "OEM",
      "ODM"
    ],
    "transactions": 500,
    "vrUrl": "https://example.com/vr",
    "isAssessedSupplier": true,
    "newAd": null
  }
]
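Since the stated goal is to replace manual copying into spreadsheets or CRMs, here is a minimal sketch of flattening records shaped like the sample above into CSV. The column selection, the `;` separator for capability tags, and the helper names `flatten` and `to_csv` are illustrative assumptions, not part of the project.

```python
import csv
import io

# Illustrative column choice; any subset of the documented fields would work.
COLUMNS = [
    "companyId", "companyName", "area", "goldYears", "reviewScore",
    "reviewCount", "transactions", "replyAvgTime", "capabilities",
    "mainProductSubject",
]


def flatten(record: dict) -> dict:
    """Turn one nested supplier record into a flat row (hypothetical helper)."""
    main_product = record.get("mainProduct") or {}
    return {
        "companyId": record.get("companyId"),
        "companyName": record.get("companyName"),
        "area": record.get("area"),
        "goldYears": record.get("goldYears"),
        "reviewScore": record.get("reviewScore"),
        "reviewCount": record.get("reviewCount"),
        "transactions": record.get("transactions"),
        "replyAvgTime": record.get("replyAvgTime"),
        # Join tags into one cell; ";" is an arbitrary separator.
        "capabilities": ";".join(record.get("capabilities") or []),
        "mainProductSubject": main_product.get("subject"),
    }


def to_csv(records: list) -> str:
    """Serialize records to CSV text; missing values become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for record in records:
        writer.writerow(flatten(record))
    return buffer.getvalue()
```

Nested blocks such as `productList` are left out here on purpose; they usually fit better in a separate table keyed by `companyId`.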
scrape-alibaba-suppliers/
├── src/
│   ├── main.py
│   ├── runner.py
│   ├── http/
│   │   ├── client.py
│   │   └── headers.py
│   ├── parsing/
│   │   ├── supplier_list_parser.py
│   │   ├── supplier_profile_parser.py
│   │   └── normalize.py
│   ├── extractors/
│   │   ├── extract_company.py
│   │   ├── extract_products.py
│   │   ├── extract_reviews.py
│   │   └── extract_capabilities.py
│   ├── output/
│   │   ├── schema.py
│   │   ├── validate.py
│   │   └── exporter.py
│   ├── config/
│   │   ├── defaults.py
│   │   └── logging.py
│   └── utils/
│       ├── urls.py
│       ├── retry.py
│       └── text.py
├── data/
│   ├── input.example.json
│   └── output.sample.json
├── tests/
│   ├── test_normalize.py
│   ├── test_supplier_list_parser.py
│   └── test_schema_validation.py
├── .gitignore
├── requirements.txt
├── pyproject.toml
├── LICENSE
└── README.md
- Sourcing managers use it to build shortlists of Alibaba suppliers by keyword, so they can compare vendors faster and reduce procurement time.
- E-commerce operators use it to discover manufacturers with OEM/ODM capabilities, so they can launch private-label products with qualified partners.
- Market researchers use it to collect supplier credibility signals (reviews, transactions, gold years), so they can score supplier competitiveness across niches.
- Sales & partnerships teams use it to generate supplier lead datasets, so they can run outreach and partnership pipelines at scale.
- Data teams use it to feed structured supplier records into enrichment/scoring models, so they can automate qualification and routing.
Q1: Should I use keyword or searchUrl?
Use keyword for the simplest workflow: enter a product term and set size. Use searchUrl when you already have a targeted Alibaba supplier search page with filters applied (e.g., a specific country, certifications, or capabilities). If both are provided, keyword takes precedence and searchUrl is ignored, which keeps results deterministic.
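The precedence rule can be sketched as a small input-resolution step. This is an illustration of the behavior described in the answer, not the project's actual code; the function name `resolve_mode` and the error raised for empty input are assumptions.

```python
from typing import Optional, Tuple


def resolve_mode(keyword: Optional[str], search_url: Optional[str]) -> Tuple[str, str]:
    """Pick the search mode; keyword wins when both inputs are given."""
    keyword = (keyword or "").strip()
    search_url = (search_url or "").strip()
    if keyword:
        return ("keyword", keyword)
    if search_url:
        return ("searchUrl", search_url)
    # Assumption: rejecting empty input is one reasonable choice here.
    raise ValueError("Provide either keyword or searchUrl")
```

Whitespace-only values are treated as missing, so a blank keyword field falls through to searchUrl.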
Q2: What does size control, and what’s a safe range?
size controls how many supplier results you want returned. For stable runs, keep size aligned with your expected paging depth (e.g., 10–200 for quick sourcing scans). Very large sizes increase paging requests and may require stricter retries and backoff to remain stable.
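To make the relationship between size, paging, and retries concrete, here is a rough sketch. The number of suppliers per result page is not documented here, so `page_size` is a hypothetical parameter; `pages_needed`, `with_backoff`, and the delay values are likewise illustrative rather than the project's own retry logic.

```python
import math
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def pages_needed(size: int, page_size: int) -> int:
    """Number of result pages required to collect `size` suppliers."""
    return math.ceil(size / page_size)


def with_backoff(
    fetch: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch`, retrying transient network errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            delay = base_delay * (2 ** attempt)
            # Jitter spreads retries out so requests do not fire in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("unreachable")
```

Larger sizes mean more pages, and each page is another request that can fail transiently, which is why retry settings matter more as size grows.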
Q3: Why are some fields sometimes empty (like productList or vrUrl)?
Supplier listings vary by category and profile completeness. Some suppliers don’t expose VR showrooms, detailed product lists, or certain credibility indicators on the listing surface. The output keeps a consistent schema, but optional fields may be null/empty when not available.
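One way to keep the schema consistent is to fill every documented field with a default when it is missing. The sketch below assumes list-valued fields default to an empty list and everything else to `None`; these defaults and the `normalize` helper are illustrative and may differ from what the project actually emits.

```python
# Assumed defaults: empty list for list fields, None for everything else.
DEFAULTS = {
    "area": None,
    "companyId": None,
    "companyName": None,
    "companyIcon": None,
    "companyImage": [],
    "goldYears": None,
    "mainProduct": None,
    "productList": [],
    "provideProducts": None,
    "replyAvgTime": None,
    "reviewCount": None,
    "reviewScore": None,
    "staff": None,
    "capabilities": [],
    "transactions": None,
    "vrUrl": None,
    "isAssessedSupplier": None,
    "newAd": None,
}


def normalize(raw: dict) -> dict:
    """Return a record containing every documented field, in a fixed order."""
    record = {}
    for field, default in DEFAULTS.items():
        value = raw.get(field)
        if value is None or value == "":
            # Copy list defaults so records never share one mutable list.
            value = list(default) if isinstance(default, list) else default
        record[field] = value
    return record
```

Downstream code can then rely on every key being present and only needs to check for `None` or an empty list.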
Q4: Can I use this data for scoring or building a supplier shortlist automatically?
Yes. A common approach is to combine reviewScore, reviewCount, transactions, goldYears, replyAvgTime, and isAssessedSupplier into a weighted score, then filter by required capabilities (e.g., OEM/ODM). The consistent fields are designed to support this workflow.
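As a rough sketch of that approach, the code below maps each signal to a value between 0 and 1, combines them with weights, and filters by required capabilities. The weights, the scaling caps, the assumed 5-point review scale, and the assumption that replyAvgTime looks like "24h" are all illustrative choices to tune for your own sourcing criteria.

```python
import math
import re
from typing import Iterable, List, Optional

# Illustrative weights (sum to 1.0); not prescribed by the project.
WEIGHTS = {
    "reviewScore": 0.30,
    "reviewCount": 0.15,
    "transactions": 0.20,
    "goldYears": 0.15,
    "replyAvgTime": 0.10,
    "isAssessedSupplier": 0.10,
}


def reply_hours(value: Optional[str]) -> Optional[float]:
    """Parse strings such as '24h' into hours; None if the format is unknown."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s*h", value or "", re.IGNORECASE)
    return float(match.group(1)) if match else None


def _log_scaled(count, cap_exponent: float) -> float:
    """Log-scale a count into [0, 1], saturating at 10**cap_exponent."""
    return min(math.log10(1 + max(count or 0, 0)) / cap_exponent, 1.0)


def score(record: dict) -> float:
    """Weighted credibility score in [0, 1]; missing signals contribute 0."""
    hours = reply_hours(record.get("replyAvgTime"))
    parts = {
        "reviewScore": min((record.get("reviewScore") or 0) / 5.0, 1.0),
        "reviewCount": _log_scaled(record.get("reviewCount"), 3),
        "transactions": _log_scaled(record.get("transactions"), 4),
        "goldYears": min((record.get("goldYears") or 0) / 10.0, 1.0),
        # Faster replies score higher; 48h or slower scores 0.
        "replyAvgTime": 0.0 if hours is None else max(0.0, 1.0 - hours / 48.0),
        "isAssessedSupplier": 1.0 if record.get("isAssessedSupplier") else 0.0,
    }
    return sum(WEIGHTS[name] * value for name, value in parts.items())


def shortlist(records: Iterable[dict], required: Iterable[str],
              min_score: float = 0.0) -> List[dict]:
    """Keep suppliers that have all required capabilities, best score first."""
    needed = set(required)
    matches = [r for r in records if needed <= set(r.get("capabilities") or [])]
    ranked = sorted(matches, key=score, reverse=True)
    return [r for r in ranked if score(r) >= min_score]
```

For example, `shortlist(records, ["OEM", "ODM"], min_score=0.6)` would return only suppliers tagged with both capabilities whose score clears the threshold.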
Primary Metric: ~45–90 supplier records/min on a typical broadband connection when collecting list-level data with lightweight parsing.
Reliability Metric: ~96–99% successful supplier record assembly across runs when retries/backoff are enabled for transient network hiccups.
Efficiency Metric: ~120–220 MB peak memory usage during moderate runs (100–300 suppliers), with most time spent on HTML fetching and parsing.
Quality Metric: ~92–98% field completeness on core identifiers (companyId, companyName, area) and ~75–90% completeness on optional enrichment fields (VR URLs, product lists, ad metadata), depending on category and listing richness.
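Field completeness of this kind can be measured on your own runs. The sketch below assumes a field counts as filled when it is not null, not an empty string, and not an empty list or object; the `completeness` helper is hypothetical and the figures above were not necessarily computed this way.

```python
from typing import Dict, Iterable, List


def completeness(records: List[dict], fields: Iterable[str]) -> Dict[str, float]:
    """Share of records (0.0 to 1.0) in which each field holds a non-empty value."""
    fields = list(fields)
    if not records:
        return {field: 0.0 for field in fields}
    result = {}
    for field in fields:
        # Values such as 0 or False still count as filled.
        filled = sum(
            1 for record in records
            if record.get(field) not in (None, "", [], {})
        )
        result[field] = filled / len(records)
    return result
```

Running it separately on core identifiers (companyId, companyName, area) and on optional fields (vrUrl, productList, newAd) gives numbers comparable to the two ranges quoted above.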
