design – I pull data from Shopify via a C# program, but its performance does not convince me

As the title says: I extract data from Shopify via a C# program (calling the Shopify API). First, the Shopify API has the following limitation: it uses the leaky bucket algorithm to handle incoming requests, where the bucket size is 40 requests and the leak rate is 2 requests per second. So, if the bucket is full and a new request arrives at the API, the API responds with an HTTP 429 (Too Many Requests) error. You can read more about this in the Shopify documentation: https://help.shopify.com/api/getting-started/api-call-limit.
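
To make that limit concrete, here is a minimal sketch of a client-side throttle that models the documented bucket (the class and member names are made up for illustration; only the numbers 40 and 2 come from the documentation above):

using System;
using System.Threading;

// Illustrative client-side model of Shopify's leaky bucket:
// a bucket of 40 slots that drains at 2 requests per second.
public class LeakyBucketThrottle
{
    private readonly int _capacity;          // 40 for Shopify
    private readonly double _leakPerSecond;  // 2 for Shopify
    private double _used;
    private DateTime _lastLeak = DateTime.UtcNow;
    private readonly object _lock = new object();

    public LeakyBucketThrottle(int capacity, double leakPerSecond)
    {
        _capacity = capacity;
        _leakPerSecond = leakPerSecond;
    }

    // Blocks until the bucket has room for one more request.
    public void WaitForSlot()
    {
        while (true)
        {
            lock (_lock)
            {
                // Drain the bucket according to the elapsed time.
                var now = DateTime.UtcNow;
                _used = Math.Max(0.0, _used - (now - _lastLeak).TotalSeconds * _leakPerSecond);
                _lastLeak = now;

                if (_used < _capacity)
                {
                    _used++; // take a slot
                    return;
                }
            }

            Thread.Sleep(100); // bucket full: wait for it to leak
        }
    }
}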

I pull some data from Shopify by hitting their API, and in reality the amount of data is quite small (around 80 to 90 thousand transactions), but because of this API limitation it is difficult to do it in as little time as possible. So, basically, what my program does is fire a burst of 40 calls to the Shopify API, wait for them to complete, then sleep for a few seconds (10 seconds) and fire the next burst of 40 calls. Because the next burst can start without having waited long enough to avoid HTTP 429 responses, I implemented the retry pattern for HTTP calls that fail due to a transient error (such as HTTP 429, 503, etc.). This way I make sure I don't end up with partial results.
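
The retry part looks roughly like this (a simplified sketch, not my exact code: the attempt count and backoff delays are placeholders, and I'm assuming a Retry-After header may accompany the 429, falling back to exponential backoff when it's absent):

using System;
using System.Net;
using System.Net.Http;
using System.Threading;

// Sketch of the retry-on-transient-error pattern.
public static class HttpRetry
{
    private static readonly HttpClient Client = new HttpClient();

    public static string GetWithRetry(string url, int maxAttempts = 5)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            var response = Client.GetAsync(url).Result;

            if (response.IsSuccessStatusCode)
            {
                return response.Content.ReadAsStringAsync().Result;
            }

            bool transient = (int)response.StatusCode == 429
                          || response.StatusCode == HttpStatusCode.ServiceUnavailable;

            if (!transient || attempt == maxAttempts)
            {
                throw new HttpRequestException(
                    $"Request to {url} failed with {(int)response.StatusCode}");
            }

            // Honor Retry-After when the server sends it; otherwise back off exponentially.
            var delay = response.Headers.RetryAfter?.Delta
                        ?? TimeSpan.FromSeconds(Math.Pow(2, attempt));
            Thread.Sleep(delay);
        }

        throw new InvalidOperationException("Unreachable");
    }
}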

So that's what my program does; I pulled 85k transactions in 11 hours (which seems pretty bad to me), and I'm trying to see where I can still improve to reduce the processing time. I know the bottleneck is the Shopify API itself, and that's out of my control; as a rough sanity check, if each transaction costs at least one API call, then 85,000 calls at the 2 requests/second leak rate already take about 11.8 hours, so I may be close to the ceiling. Still, do you think there is a technique / approach to improve the extraction of data from an API? I would like to hear your opinions / thoughts on this subject. I am totally open to any suggestion, and I would be very grateful!

Check the code snippet below to see one of the methods of my program that implements the logic explained above. I would also appreciate any comments on the code; for example, is the parallelism offered by the AsParallel method of the ParallelEnumerable class adequate for the situation I'm handling?

public void BulkInsertOrdersEvents(List<long> orders, IPersistence persistence)
{
    if (orders == null || !orders.Any())
    {
        return;
    }

    short ordersPerBurst = 40;
    int totalOrders = orders.Count;
    int ordersProcessed = 0;

    while (true)
    {
        if (ordersProcessed >= totalOrders)
        {
            break;
        }

        var ordersForProcess = orders.Skip(ordersProcessed).Take(ordersPerBurst);

        ordersForProcess.AsParallel().ForAll((orderId) =>
        {
            var httpCallParameters = new Dictionary<string, object>();
            httpCallParameters.Add("orderId", orderId);

            Console.WriteLine("Processing order {0}", orderId);

            // Calculate the number of pages of data (events) for the current order
            int pages = CalculatePages(ShopifyEntity.OrderEvent, httpCallParameters);

            if (pages == 0)
            {
                return;
            }

            string getOrderEventsEndpoint = string.Format(ShopifyApiEndpoints.GET_ORDER_EVENTS_BY_ORDER_ID, orderId) + $"?limit={ShopifyApiConstants.MAX_LIMIT_ORDER_EVENTS}";
            var orderEventsBag = new ConcurrentBag<string>();

            Parallel.For(1, pages + 1, (index) =>
            {
                // Create an HTTP client for the Shopify API call
                var httpClient = GetHttpClient();
                var httpHeaders = GetHttpHeaders();

                // Call the Shopify API to retrieve the order's events
                string orderEvents = httpClient.Get(getOrderEventsEndpoint + "&page=" + index, httpHeaders);

                Console.WriteLine("Got page {0} of events for order {1}", index, orderId);

                // Put the order events for the current page into the concurrent bag
                orderEventsBag.Add(orderEvents);
            });

            // Merge all event pages into a single JSON
            var orderEventsJson = JsonHelper.MergeJsons(orderEventsBag.ToArray());

            persistence.Save(orderEventsJson);

            Console.WriteLine("Completed processing of order {0}", orderId);
        });

        Thread.Sleep(TimeSpan.FromSeconds(5));

        ordersProcessed += ordersPerBurst;
    }
}
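
One detail I'm unsure about (and part of why I ask about AsParallel): as far as I know, PLINQ defaults to roughly one worker per processor core, so my "burst of 40" probably runs far narrower than 40 on a typical machine. If that turns out to matter, forcing the width would look something like this sketch (same per-order body as in the method above):

// Sketch: asking PLINQ to run the whole burst concurrently, since by
// default it uses about as many workers as there are processor cores.
ordersForProcess
    .AsParallel()
    .WithDegreeOfParallelism(ordersPerBurst) // up to 40 orders in flight
    .ForAll(orderId =>
    {
        // ... same per-order logic as in the method above ...
    });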

I forgot to mention that I am also storing these results in Azure Blob Storage, but that part is not the problem at all! Almost all of my program's time goes into pulling the data from Shopify.

Thank you guys!