Commit 7490ad3

Update README.md
1 parent 322b61f commit 7490ad3

1 file changed: +33 -15 lines changed

README.md

@@ -24,7 +24,33 @@

 ## Description

-[Nest](https://github.com/nestjs/nest) framework TypeScript starter repository.
+This repository contains the implementation of a web scraping API designed to retrieve product information from a specified URL. The API is built using NestJS and employs asynchronous processing to handle requests efficiently.
+
+Features
+- Receives requests containing a product ID and initiates asynchronous processing.
+- Responds with an HTTP 200 status code and a unique process identifier upon request reception.
+- Initiates the scraping process of the target URL and transforms the website data into a unified JSON format.
+- Includes a 10-second timeout to simulate data processing.
+- Responds with a "not ready" status if queried with the process identifier during the timeout period.
+- Provides the final result via the same endpoint after the processing is complete.
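
Review note: the Features list above describes an accept-then-poll flow. The following is a minimal TypeScript sketch of that flow, not the repository's actual code. It assumes a plain in-memory map in place of the cache; `JobStore`, `startScrape`, the `failed` state, and the counter-based identifiers are hypothetical choices made for illustration.

```typescript
// Hypothetical result shape; the README does not show the unified JSON format.
type ProductResult = Record<string, unknown>;

type JobState =
  | { status: "not ready" }
  | { status: "done"; result: ProductResult }
  | { status: "failed"; error: string }; // assumption: failures are recorded

// In-memory stand-in for the cache mentioned in the README.
class JobStore {
  private readonly jobs = new Map<string, JobState>();
  private nextId = 0;

  // Registers a new job and returns its process identifier.
  // A counter keeps the sketch deterministic; a UUID would be more typical.
  create(): string {
    this.nextId += 1;
    const id = `job-${this.nextId}`;
    this.jobs.set(id, { status: "not ready" });
    return id;
  }

  // Overwrites the state of an existing job.
  update(id: string, state: JobState): void {
    if (!this.jobs.has(id)) {
      throw new Error(`unknown process id: ${id}`);
    }
    this.jobs.set(id, state);
  }

  // Returns the current state, or undefined for an unknown identifier.
  get(id: string): JobState | undefined {
    return this.jobs.get(id);
  }
}

// Returns the process identifier immediately and runs the scrape after a delay
// (10 seconds by default, mirroring the simulated processing time).
function startScrape(
  store: JobStore,
  productId: string,
  scrape: (productId: string) => Promise<ProductResult>,
  delayMs = 10000,
): string {
  const id = store.create();
  setTimeout(() => {
    scrape(productId)
      .then((result) => store.update(id, { status: "done", result }))
      .catch((err) => store.update(id, { status: "failed", error: String(err) }));
  }, delayMs);
  return id;
}
```

A controller could return the identifier from `startScrape` in the initial response and serve `store.get(id)` on later queries, which yields "not ready" until the delayed scrape has finished.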
+
+Note
+
+- This project uses NestJS and cache for processing purposes. In a real-world scenario, Redis would be used for processing, and PostgreSQL for storing results.
+
+Data Retrieval Methods
+
+To retrieve product information as per the requirements outlined in the task, the following methods were considered:
+
+1. Open Graph in Meta Tags: Parsing meta tags with Open Graph protocol to extract product information.
+
+2. Schema Parsing: Extracting product details from structured data using schema markup.
+
+3. HTML Markup Parsing: Parsing HTML markup to identify and extract product information.
+
+4. Script Tag Parsing: Extracting data from JavaScript scripts embedded within the HTML.
+
+For the given task, the preferred method of data retrieval was Script Tag Parsing. This method was chosen because it provided the necessary information required by the task. Specifically, it allowed for the extraction of product identifiers and specifications required for further processing.
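
Review note: the README names Script Tag Parsing as the chosen method but does not show it. One way it could look, as a rough sketch only, is to pull a JSON payload out of a `<script>` tag by its `id`. The function name `extractScriptJson`, the `product-data` id, and the sample HTML with its field values are all made up for illustration; the target site's real markup is not given in this commit. A regular expression keeps the sketch dependency-free, though a proper HTML parser would likely be more robust.

```typescript
// Extracts and parses the JSON payload of <script id="..."> from raw HTML.
// Returns undefined if the tag is missing or its content is not valid JSON.
function extractScriptJson(html: string, scriptId: string): unknown {
  // Escape regex metacharacters so the id is matched literally.
  const escapedId = scriptId.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(
    `<script\\b[^>]*\\bid=["']${escapedId}["'][^>]*>([\\s\\S]*?)</script>`,
    "i",
  );
  const match = pattern.exec(html);
  if (!match) {
    return undefined;
  }
  try {
    return JSON.parse(match[1]);
  } catch {
    return undefined;
  }
}

// Made-up sample page; only the product identifier comes from the README.
const sampleHtml = `<html><body>
<script id="product-data" type="application/json">
{"id": "air-presto-mens-shoes-JlLlWz", "specs": {"color": "black"}}
</script>
</body></html>`;

const productData = extractScriptJson(sampleHtml, "product-data");
```

The parsed value is typed as `unknown` on purpose, so callers have to validate the shape before mapping it into the unified JSON format.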

 ## Installation

@@ -35,27 +61,19 @@ $ npm install
 ## Running the app

 ```bash
-# development
-$ npm run start

 # watch mode
 $ npm run start:dev
-
-# production mode
-$ npm run start:prod
 ```

-## Test
+## Using documentation

-```bash
-# unit tests
-$ npm run test
-
-# e2e tests
-$ npm run test:e2e
+Open Swagger at http://localhost:3000/swagger/#/scraper/ScraperController_scrapeProduct and send a POST request with the following body:

-# test coverage
-$ npm run test:cov
+```json
+{
+  "productId": "air-presto-mens-shoes-JlLlWz"
+}
 ```

 ## Support
