March 2, 2022 at 11:05 am
Hi everyone,
I'm trying to get tweets data in a json file with Microsoft Azure Data Factory. I've created a copy data activity which takes the url for the get tweets api with the needed parameters (those parameters are included in the dataset definition with a parameter that represents the keyword I'm looking for) as the source, and a json file dataset as a sink.
This works pretty well, the file is created with the rest api response.
Now I want to handle the pagination rule for Twitter, as I will get more than the number of tweets contained in one response.
The twitter way to handle this is to have a "meta" element in the API response containing various information and particularly a "next_token" element containing the value to add to a parameter in the next get query that has to be submitted, up until the next_token element is absent from the response.
So, an example of the first GET:
tweets/search/recent?query=textToSearch&max_results=100
You receive in the response
"meta": {
"next_token": "MyNextToken"
}
So the 2nd GET should be:
tweets/search/recent?query=textToSearch&max_results=100&pagination_token=MyNextToken
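To make the two requests above concrete, here is a minimal Python sketch of how the second URL is derived from the first response. It is only an illustration of the token handling, not Data Factory configuration; the helper names `build_url` and `next_url` are made up for this example, and the relative path is the one quoted above.

```python
from typing import Optional
from urllib.parse import urlencode

# Relative path taken from the example requests above.
BASE_PATH = "tweets/search/recent"


def build_url(query: str, max_results: int = 100,
              pagination_token: Optional[str] = None) -> str:
    """Build the relative URL; pagination_token is appended only when given."""
    params = {"query": query, "max_results": max_results}
    if pagination_token is not None:
        params["pagination_token"] = pagination_token
    return f"{BASE_PATH}?{urlencode(params)}"


def next_url(query: str, response_body: dict) -> Optional[str]:
    """Return the URL for the next page, or None if meta.next_token is absent."""
    token = response_body.get("meta", {}).get("next_token")
    if not token:
        return None
    return build_url(query, pagination_token=token)


first = build_url("textToSearch")
second = next_url("textToSearch", {"meta": {"next_token": "MyNextToken"}})
last = next_url("textToSearch", {"meta": {}})
```

The first request never carries `pagination_token`; it only appears once a response has supplied a `next_token`.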
I've checked the help for pagination rules but can't quite figure out what to do with them. I've seen examples with the complete URL, but not with a new parameter which is not included in the first GET query. I'm guessing you have to create a variable which will contain the new parameter, but I don't know how to add the variable to the URL.
I've tried this but it did not work, I've only gotten the first response in my json file and not the other pages :
Thanks in advance for any tips on how to implement that !
March 2, 2022 at 2:07 pm
Wading in without any knowledge of Azure Data Factory at all... it seems like maybe the system is expecting a "pagination_token" along with the initial request? You're telling it how many results to return but not which page, i.e. the 1st set of 100, 2nd set of 100, etc. What happens if you try:
tweets/search/recent?query=textToSearch&max_results=100&pagination_token=1
No one shall be able to expel us from the paradise that Cantor has created for us
March 14, 2022 at 11:38 am
I struggled with this for some time. My case was similar although my json body had "metadata": {"nextPageToken": ""}
This worked for me:
Hopefully that should work for you.
March 18, 2022 at 3:58 pm
I've tried some stuff like that but it did not work. What's the "pageToken" in your case? A parameter in your dataset or in your linked service? Or the name of the API parameter? What does your URL look like in that case? Thanks!
March 18, 2022 at 4:03 pm
Steve,
Thanks for your answer. The next_token parameter does not represent the page; it's a token, for example "next_token":"b26v89c19zqg8o3fpyqmxh9d31o05cu777x6eddzihful", that represents the value of the pagination_token that needs to be sent in the 2nd query (and the ones after). So if you put this parameter in the first query it does not work, even if you put 1 or a null string or whatever.
April 1, 2022 at 2:07 pm
So, after a few tries and some research, it doesn't seem possible for the moment in Data Factory to use the pagination rules in that way.
The solution I found is to implement the pagination manually using an "Until" activity, where the URL is a variable string containing the next_token value and the loop finishes when there is no next_token value in the result.
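For anyone who wants to see the loop logic outside Data Factory, here is a rough Python sketch of the same idea: keep a variable holding the last next_token, call the API, and stop once the response has no next_token. `fetch_page` is a hypothetical stand-in for the actual request (a fake in-memory version is used here so the sketch runs on its own), and the `max_pages` guard is an addition of mine, not something from the pipeline.

```python
from typing import Callable, Dict, List, Optional


def fetch_all_pages(fetch_page: Callable[[Optional[str]], Dict],
                    max_pages: int = 1000) -> List[Dict]:
    """Call fetch_page repeatedly until a response carries no meta.next_token."""
    pages: List[Dict] = []
    next_token: Optional[str] = None  # plays the role of the pipeline variable
    for _ in range(max_pages):  # safety cap so a bad response cannot loop forever
        body = fetch_page(next_token)
        pages.append(body)
        next_token = body.get("meta", {}).get("next_token")
        if not next_token:  # absent or empty token: last page reached
            break
    return pages


# Fake API for illustration only: three pages keyed by the token that requests them.
FAKE_PAGES = {
    None: {"data": [{"id": "1"}], "meta": {"next_token": "tokA"}},
    "tokA": {"data": [{"id": "2"}], "meta": {"next_token": "tokB"}},
    "tokB": {"data": [{"id": "3"}], "meta": {}},
}


def fake_fetch(token: Optional[str]) -> Dict:
    return FAKE_PAGES[token]


all_pages = fetch_all_pages(fake_fetch)
tweet_ids = [tweet["id"] for page in all_pages for tweet in page["data"]]
```

The loop body runs at least once before the token is checked, which is the same shape as a do-until loop.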
April 11, 2022 at 10:23 am
Sorry for the slow response.
In my case, pageToken is a query parameter, e.g. ?pageToken=XXXXX. The value for it is present in the body of the response, where it is called "metadata.nextPageToken". It is present when the pageSize parameter is used.
When the last page is reached, the nextPageToken value is empty, which is why it's added as the EndCondition.
So in your case it should look something like this:
Name: Query Parameter = pagination_token Value: Body = meta.next_token
Name: EndCondition = $['meta'].['next_token'] Value = Empty.
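As a rough illustration of what such a rule pair is meant to do (this is not how Data Factory implements it), the Python sketch below reads a token from the response body by a dotted path and either returns the query parameter for the next request or signals the end. The helper names `resolve_path` and `apply_rule` are hypothetical, and the path uses the Twitter field name from the first post. Treating a missing token the same as an empty one is my assumption, since the two APIs in this thread differ on that point.

```python
from typing import Any, Dict, Optional


def resolve_path(body: Any, dotted_path: str) -> Any:
    """Walk a dotted path such as 'meta.next_token'; return None if any key is missing."""
    current = body
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def apply_rule(body: Dict, token_path: str, param_name: str) -> Optional[Dict[str, str]]:
    """Return the query parameter for the next request, or None to stop paginating."""
    token = resolve_path(body, token_path)
    if token is None or token == "":  # missing (Twitter) or empty (my API): stop
        return None
    return {param_name: token}


twitter_next = apply_rule({"meta": {"next_token": "abc"}}, "meta.next_token", "pagination_token")
twitter_end = apply_rule({"meta": {}}, "meta.next_token", "pagination_token")
other_end = apply_rule({"metadata": {"nextPageToken": ""}}, "metadata.nextPageToken", "pageToken")
```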
This does not seem to be very well documented, and different APIs could behave slightly differently.
I hope that helps.
February 6, 2024 at 12:04 pm
It sounds like you've got the initial setup for fetching tweets down pat, which is awesome. Dealing with pagination in the Twitter API can be a bit tricky, but you're on the right track!
To add the pagination token to your subsequent requests, you'll indeed need to create a variable to hold that token value. Then, you can include that variable in your URL for the next request.
Here's a simple breakdown:
Create a variable (let's call it nextToken) to store the pagination token value.
Update your URL for the subsequent requests by appending &pagination_token=@{variables('nextToken')} to it.
This way, each time you make a request, it'll include the pagination token from the previous response, fetching the next page of tweets.
Keep in mind to handle cases where the next_token isn't present in the response (indicating no more pages left), so your pipeline knows when to stop.
Hope this helps! Let me know if you need further assistance.