Implement the Rest API pagination rule with Twitter API

Question


MikaelB

Old Hand

Points: 330
March 2, 2022 at 11:05 am

#3991590
Hi everyone,
I'm trying to get tweet data into a JSON file with Microsoft Azure Data Factory. I've created a Copy data activity whose source is the URL of the GET tweets API with the needed parameters (those parameters are included in the dataset definition, with a parameter that represents the keyword I'm looking for), and whose sink is a JSON file dataset.
This works pretty well: the file is created with the REST API response.
Now I want to handle Twitter's pagination rule, since I will get more tweets than fit in a single response.
Twitter handles this with a "meta" element in the API response that contains various information, in particular a "next_token" element. Its value must be passed as a parameter in the next GET request, and this repeats until the next_token element is absent from the response.
So, a first GET example:
```
tweets/search/recent?query=textToSearch&max_results=100
```
In the response you receive:
```
"meta": {
"next_token": "MyNextToken"
}
```
So the second GET should be:
```
tweets/search/recent?query=textToSearch&max_results=100&pagination_token=MyNextToken
```
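To make the two requests above concrete, here is a minimal Python sketch of how the query string could be built, adding `pagination_token` only when a token from the previous response is available. The function name `build_search_url` is hypothetical, and the path is kept relative exactly as in the examples; a real call would also need the full base URL and authentication.

```python
from typing import Optional
from urllib.parse import urlencode


def build_search_url(query: str, max_results: int = 100,
                     pagination_token: Optional[str] = None) -> str:
    """Build the relative recent-search URL used in the examples above.

    pagination_token is left out of the first request and set to the
    previous response's meta.next_token on every later request.
    """
    params = {"query": query, "max_results": max_results}
    if pagination_token:
        params["pagination_token"] = pagination_token
    return "tweets/search/recent?" + urlencode(params)


print(build_search_url("textToSearch"))
# tweets/search/recent?query=textToSearch&max_results=100
print(build_search_url("textToSearch", pagination_token="MyNextToken"))
# tweets/search/recent?query=textToSearch&max_results=100&pagination_token=MyNextToken
```

The key point is that the first request carries no `pagination_token` at all; the parameter only appears once a token has been returned.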
I've checked the help for the pagination rules but can't quite figure out what to do with them. I've seen examples with the complete URL, but not with a new parameter that isn't included in the first GET query. I'm guessing you have to create a variable that will contain the new parameter, but I don't know how to add the variable to the URL.
I've tried this, but it did not work; I only got the first response in my JSON file and not the other pages:
Thanks in advance for any tips on how to implement that!

Steve Collins SSCrazy Points: 2173 · Answer 1

Wading in without any knowledge of Azure Data Factory at all... it seems like maybe the system is expecting a "pagination_token" along with the initial request? You're telling it how many results to return but not which page, i.e. the 1st set of 100, 2nd set of 100, etc. What happens if you try:

tweets/search/recent?query=textToSearch&max_results=100&pagination_token=1

No one shall drive us from the paradise that Cantor has created for us.

j500sut SSC Veteran Points: 254 · Answer 2

I struggled with this for some time. My case was similar, although my JSON body had "metadata": {"nextPageToken": ""}

This worked for me:

(screenshot: API example)

Hopefully that should work for you.

MikaelB Old Hand Points: 330 · Answer 3

I've tried something like that, but it did not work. What is "pageToken" in your case? A parameter in your dataset, a parameter in your linked service, or the name of the API parameter? What does your URL look like in that case? Thanks!

MikaelB Old Hand Points: 330 · Answer 4

Steve,

Thanks for your answer. The next_token parameter does not represent the page; it's a token, for example "next_token":"b26v89c19zqg8o3fpyqmxh9d31o05cu777x6eddzihful".

It represents the value of the pagination_token that needs to be sent in the second query (and the ones after it). So if you put this parameter in the first query, it does not work, whether you set it to 1, an empty string, or anything else.

MikaelB Old Hand Points: 330 · Answer 5

So, after a few tries and some research, it doesn't seem possible for the moment in Data Factory to use the pagination rules in that way.

The solution I found is to implement the pagination manually with an Until activity, where the URL is a string variable containing the next_token value, and the loop finishes when there is no next_token value in the result.
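The Until-loop approach can be sketched in plain Python as follows. It assumes a hypothetical `fetch_page(url)` callable that performs the GET and returns the parsed JSON body (in Data Factory that role would be played by the activity inside the loop, with the URL held in a pipeline variable). It also assumes the tweets sit under a `data` key, which is not stated in this thread. The loop keeps requesting pages while the response contains `meta.next_token` and stops as soon as it is absent.

```python
from typing import Any, Callable, Dict, List, Optional
from urllib.parse import urlencode

JsonBody = Dict[str, Any]


def fetch_all_pages(query: str,
                    fetch_page: Callable[[str], JsonBody],
                    max_results: int = 100) -> List[JsonBody]:
    """Manual pagination, mirroring an Until loop.

    fetch_page is a hypothetical callable: it takes a URL and returns the
    decoded JSON body of the response.
    """
    results: List[JsonBody] = []
    next_token: Optional[str] = None  # plays the role of the pipeline variable
    while True:
        params = {"query": query, "max_results": max_results}
        if next_token:  # only add the token after the first request
            params["pagination_token"] = next_token
        body = fetch_page("tweets/search/recent?" + urlencode(params))
        results.extend(body.get("data", []))  # assumed location of the tweets
        # Until condition: stop when meta.next_token is absent (or empty).
        next_token = body.get("meta", {}).get("next_token")
        if not next_token:
            return results


# Usage with two fake pages instead of real HTTP calls:
fake_pages = {
    "tweets/search/recent?query=textToSearch&max_results=100":
        {"data": [{"id": "1"}, {"id": "2"}], "meta": {"next_token": "MyNextToken"}},
    "tweets/search/recent?query=textToSearch&max_results=100&pagination_token=MyNextToken":
        {"data": [{"id": "3"}], "meta": {}},
}
tweets = fetch_all_pages("textToSearch", fake_pages.__getitem__)
print([t["id"] for t in tweets])  # ['1', '2', '3']
```

Passing the fetch function in as a parameter keeps the loop logic separate from authentication and HTTP details, which is also roughly how the pipeline splits the Until condition from the activity that makes the call.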

j500sut SSC Veteran Points: 254 · Answer 6

Sorry for the slow response.

In my case, pageToken is a query parameter, e.g. ?pageToken=XXXXX. Its value comes from the body of the previous response, where it is called "metadata.nextPageToken". This field is present when the pageSize parameter is used.

When the last page is reached, the nextPageToken value is empty. That's why it's used as the EndCondition.

So in your case it should look something like this:

Name: Query Parameter = pagination_token; Value: Body = meta.next_token

Name: EndCondition = $['meta']['next_token']; Value: Empty
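As a rough illustration of what those two rules ask Data Factory to do (not its actual implementation), the sketch below takes the previous request's query parameters and response body, copies `meta.next_token` into the `pagination_token` query parameter, and returns `None` once the end condition is met. The helper names are hypothetical. Because the question describes Twitter omitting `next_token` on the last page rather than sending it empty, the sketch treats both a missing and an empty value as the end.

```python
from typing import Any, Dict, Optional


def read_body_path(body: Dict[str, Any], path: str) -> Optional[Any]:
    """Follow a dotted path such as 'meta.next_token'; None if any part is missing."""
    node: Any = body
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def next_request_params(params: Dict[str, str], body: Dict[str, Any],
                        param_name: str = "pagination_token",
                        body_path: str = "meta.next_token") -> Optional[Dict[str, str]]:
    """Apply a 'query parameter <- body value' rule with an empty/absent end condition."""
    value = read_body_path(body, body_path)
    if value in (None, ""):  # end condition: value missing or empty
        return None
    updated = dict(params)  # leave the previous request's parameters untouched
    updated[param_name] = value
    return updated


first = {"query": "textToSearch", "max_results": "100"}
print(next_request_params(first, {"meta": {"next_token": "MyNextToken"}}))
# {'query': 'textToSearch', 'max_results': '100', 'pagination_token': 'MyNextToken'}
print(next_request_params(first, {"meta": {}}))
# None
```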

This doesn't seem to be very well documented, and different APIs may behave slightly differently.

I hope that helps.

johny Grasshopper Points: 11 · Answer 7

It sounds like you've got the initial setup for fetching tweets down pat, which is awesome. Dealing with pagination in the Twitter API can be a bit tricky, but you're on the right track!

To add the pagination token to your subsequent requests, you'll indeed need to create a variable to hold that token value. Then, you can include that variable in your URL for the next request.

Here's a simple breakdown:

Create a variable (let's call it nextToken) to store the pagination token value.

Update your URL for the subsequent requests by appending &pagination_token=@{variables('nextToken')} to it.

This way, each time you make a request, it'll include the pagination token from the previous response, fetching the next page of tweets.

Remember to handle the case where next_token isn't present in the response (meaning there are no more pages), so your pipeline knows when to stop.

Hope this helps! Let me know if you need further assistance.
