All #twitter API calls require some sort of authentication.
Interestingly, the authorization header seems to be constant. I’m guessing this identifies the request as coming from Twitter’s own web UI:
authorization: Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA
The actual authentication happens elsewhere. For guest sessions, you need an `x-guest-token: 1649859312251027458` header, where the token is obtained by a separate call. Only a subset of calls are available in this mode. For the rest, you need cookies from a logged-in user:
cookie: auth_token=1234567890abcdef58dc6829393d4604b9e37c8a; ct0=1234567890abcdef0b09e38a20dcdd5cb6ec4cf8f2ba357187cda008b0f39273308a6b7ef6d318f609bc83563709c247e51daad090a116d775ef1fa55074cf5c235893a45f99d1cc49ac4fe61fec238d
x-csrf-token: 1234567890abcdef0b09e38a20dcdd5cb6ec4cf8f2ba357187cda008b0f39273308a6b7ef6d318f609bc83563709c247e51daad090a116d775ef1fa55074cf5c235893a45f99d1cc49ac4fe61fec238d
Note that the `x-csrf-token` is the same as the `ct0` cookie.
Both are obtained through a somewhat involved login workflow shown in https://github.com/trevorhobenshield/twitter-api-client/blob/main/twitter/login.py
As for the API calls themselves, https://github.com/fa0311/TwitterInternalAPIDocument and https://github.com/fa0311/twitter-openapi seem to be the closest there is to documentation. I think this should be fine for my purposes.
The two calls I need are `/sLVLhk0bGj3MVFEKTdax1w/UserByScreenName` and `/IWP6Zt14sARO29lJT35bBw/Following`. Unfortunately, the second one is not available with a guest token, so I'll have to deal with the login workflow.
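Since the query IDs are baked into the URL paths, it helps to keep them in one table. This thread itself shows two different IDs for `Following` (`IWP6Zt14sARO29lJT35bBw` here and `q4cKckK0lNxWkHfAXXXzJQ` further down), so they appear to change over time. A minimal Go sketch; the type and method names are hypothetical, and the `https://twitter.com/i/api/graphql/` prefix is taken from the full URLs quoted later in the thread.

```go
package main

import (
	"fmt"
	"net/url"
)

// Endpoint describes one internal GraphQL operation. Query IDs are copied
// from the browser's network tab and may change over time.
type Endpoint struct {
	QueryID   string
	Operation string
	GuestOK   bool // whether the call works with just an x-guest-token
}

var (
	UserByScreenName = Endpoint{"sLVLhk0bGj3MVFEKTdax1w", "UserByScreenName", true}
	Following        = Endpoint{"IWP6Zt14sARO29lJT35bBw", "Following", false}
)

const graphqlBase = "https://twitter.com/i/api/graphql/"

// URL returns the endpoint's base URL, without query parameters.
func (e Endpoint) URL() string {
	return graphqlBase + url.PathEscape(e.QueryID) + "/" + url.PathEscape(e.Operation)
}

// CheckSession returns an error if the endpoint needs a logged-in session
// and we only have a guest token.
func (e Endpoint) CheckSession(loggedIn bool) error {
	if !e.GuestOK && !loggedIn {
		return fmt.Errorf("%s requires a logged-in session", e.Operation)
	}
	return nil
}

func main() {
	fmt.Println(UserByScreenName.URL())
	fmt.Println(Following.CheckSession(false)) // Following requires a logged-in session
}
```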
Back at it again...
Twitter's login flow is weird. For one, rather than just sending a login and password and getting a cookie back, it is actually a series of API calls. Also, the purpose of some of the steps is not apparent, like `LoginJsInstrumentationSubtask` or `AccountDuplicationCheck`. No idea what those mean, but I can cargo-cult them.
I think I am beginning to understand the logic behind this API. Apparently, the flow can be a lot more varied and complicated than you would think. At the beginning of the process, the client sends all the different subtasks it can handle:
{
"input_flow_data": {
"flow_context": {
"debug_overrides": {},
"start_location": {
"location": "unknown"
}
}
},
"subtask_versions": {
"action_list": 2,
"alert_dialog": 1,
"app_download_cta": 1,
"check_logged_in_account": 1,
"choice_selection": 3,
"contacts_live_sync_permission_prompt": 0,
"cta": 7,
"email_verification": 2,
"end_flow": 1,
"enter_date": 1,
"enter_email": 2,
"enter_password": 5,
"enter_phone": 2,
"enter_recaptcha": 1,
"enter_text": 5,
"enter_username": 2,
"generic_urt": 3,
"in_app_notification": 1,
"interest_picker": 3,
"js_instrumentation": 1,
"menu_dialog": 1,
"notifications_permission_prompt": 2,
"open_account": 2,
"open_home_timeline": 1,
"open_link": 1,
"phone_verification": 4,
"privacy_options": 1,
"security_key": 3,
"select_avatar": 4,
"select_banner": 2,
"settings_list": 7,
"show_code": 1,
"sign_up": 2,
"sign_up_review": 4,
"tweet_selection_urt": 1,
"update_users": 1,
"upload_media": 1,
"user_recommendations_list": 4,
"user_recommendations_urt": 1,
"wait_spinner": 3,
"web_modal": 1
}
}
And the server more or less tells the client which actions to offer to the user:
{
"flow_token": "g;168461104176909845:-1684611164737:Mh2XA15kcSPOvXshdM51j6Ea:1",
"status": "success",
"subtasks": [
{
"subtask_id": "LoginEnterUserIdentifierSSO",
"settings_list": {
"settings": [
{
"value_type": "button",
"value_identifier": "google_sso_button",
"value_data": {
"button": {
"navigation_link": {
"link_type": "subtask",
"link_id": "google_sso",
"label": "Continue with Google",
"subtask_id": "EnterIdGoogleSSOSubtask"
},
"style": "brand",
"icon": {
"icon": "logo_google_g_color"
},
"preferred_size": "normal"
}
}
},
{
"value_type": "button",
"value_identifier": "apple_sso_button",
"value_data": {
"button": {
"navigation_link": {
"link_type": "subtask",
"link_id": "apple_id",
"label": "Continue with Apple",
"subtask_id": "EnterIdAppleSSOSubtask"
},
"style": "brand",
"icon": {
"icon": "logo_apple"
},
"preferred_size": "normal"
}
}
},
{
"value_type": "separator",
"value_identifier": "separator",
"value_data": {
"separator": {
"label": {
"text": "or",
"entities": []
}
}
}
},
{
"value_type": "text_field",
"value_identifier": "user_identifier",
"value_data": {
"text_field": {
"content_type": "text",
"hint_text": "Phone, email, or username"
}
}
},
{
"value_type": "button",
"value_identifier": "next_button",
"value_data": {
"button": {
"navigation_link": {
"link_type": "task",
"link_id": "next_link",
"label": "Next"
},
"style": "primary",
"preferred_size": "normal"
}
}
},
{
"value_type": "button",
"value_identifier": "forgot_password",
"value_data": {
"button": {
"navigation_link": {
"link_type": "subtask",
"link_id": "forget_password",
"label": "Forgot password?",
"subtask_id": "RedirectToPasswordReset"
},
"style": "secondary",
"preferred_size": "normal"
}
}
}
],
"detail_text": {
"text": "Don't have an account? Sign up",
"entities": [
{
"from_index": 23,
"to_index": 30,
"navigation_link": {
"link_type": "deep_link_and_abort",
"link_id": "signup_deep_link",
"url": "https://twitter.com/i/flow/signup"
}
}
]
},
"style": "step",
"header": {
"primary_text": {
"text": "Sign in to Twitter",
"entities": []
}
},
"navigation_style": "hide",
"horizontal_style": "compact"
},
"subtask_back_navigation": "cancel_flow"
},
{
"subtask_id": "EnterIdGoogleSSOSubtask",
"single_sign_on": {
"provider": "google",
"scopes": [
"openid",
"email",
"profile"
],
"state": "j28nFz5x2qeOetxXP7RpW4hldQFpYIKWoEkFqBPDJqh",
"next_link": {
"link_type": "task",
"link_id": "next_link"
},
"fail_link": {
"link_type": "subtask",
"link_id": "fail_link",
"subtask_id": "LoginEnterUserIdentifierSSO"
},
"cancel_link": {
"link_type": "subtask",
"link_id": "cancel_link",
"subtask_id": "LoginEnterUserIdentifierSSO"
}
},
"subtask_back_navigation": "cancel_flow"
},
{
"subtask_id": "EnterIdAppleSSOSubtask",
"single_sign_on": {
"provider": "apple",
"scopes": [
"email",
"name"
],
"state": "TPt3CJRXQfJaN3tjB3QPEi_FS_WtsOlj68qfoeTGmx4",
"next_link": {
"link_type": "task",
"link_id": "next_link"
},
"fail_link": {
"link_type": "subtask",
"link_id": "fail_link",
"subtask_id": "LoginEnterUserIdentifierSSO"
},
"cancel_link": {
"link_type": "subtask",
"link_id": "cancel_link",
"subtask_id": "LoginEnterUserIdentifierSSO"
}
},
"subtask_back_navigation": "cancel_flow"
},
{
"subtask_id": "RedirectToPasswordReset",
"open_link": {
"link": {
"link_type": "deep_link_and_abort",
"link_id": "password_reset_deep_link",
"url": "https://twitter.com/i/flow/password_reset?input_flow_data=%7B%22requested_variant%22%3A%22eyJwbGF0Zm9ybSI6IlJ3ZWIifQ%3D%3D%22%7D"
}
}
}
]
}
One trick that always helps when reverse engineering something popular: searching for unique strings in Google and other search engines turns up other people's notes.
Same thing this time: searching for `AccountDuplicationCheck_false` pointed me at https://github.com/fa0311/TwitterFrontendFlow and https://github.com/tsukumijima/tweepy-authlib, which look like very detailed implementations of the login flow that I can use as a reference.
Part of the challenge is that Twitter sometimes shows stuff like "confirm your email" when it considers the activity suspicious, but I can't reliably reproduce and test such behavior. Looking at other people's code helps find such cases before they randomly break at some future point.
Ran into an interesting gotcha. Some API endpoints expect GET requests rather than POST, for example https://twitter.com/i/api/graphql/zC51NksbixfctE9X0ITB-Q/Viewer. But if you send a POST request by mistake, they don't just return a 405 Method Not Allowed; instead they act as if they didn't receive any of the GET query parameters: `The following features cannot be null: responsive_web_graphql_exclude_directive_enabled, verified_phone_label_enabled, responsive_web_graphql_skip_user_profile_image_extensions_enabled, responsive_web_graphql_timeline_navigation_enabled, blue_business_profile_image_shape_enabled`. Which makes me wonder if the server is interpreting the request as `application/x-www-form-urlencoded`?..
Not sure if anything interesting can be done with this.
I think the goal for today is to get a functional OPML generator for the following list, at least as a CLI tool. It's pretty late, but how hard can it be?
Another weirdness of Twitter's API: it claims to be GraphQL (it's right there in the URL: https://twitter.com/i/api/graphql/q4cKckK0lNxWkHfAXXXzJQ/Following), but I don't think it actually is?
Until now I haven't had to use GraphQL for anything, so I could be wrong, but it's nothing like what http://graphql.org/learn describes. Maybe they use GraphQL on the backend to generate responses for those APIs? But I thought the whole point was to let the client make its own queries.
TBH, Twitter's API is one of the odder ones I've seen. It seems to be built around the paradigm of the backend telling the frontend what to do. For example, fetching a timeline isn't just "give me a list of tweets after X"; instead, the backend sends you instructions on what to add to and remove from the timeline.
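To make the instruction-driven idea concrete, here is a heavily simplified Go sketch of a client applying timeline instructions to local state. The instruction names `TimelineAddEntries` and `TimelineClearCache` are ones that show up in these responses as far as I can tell, but the structs are hypothetical stand-ins for the real, much deeper nesting, and every other instruction type is ignored.

```go
package main

import "fmt"

// Instruction is a simplified, hypothetical model of one timeline instruction.
type Instruction struct {
	Type    string  // e.g. "TimelineAddEntries", "TimelineClearCache"
	Entries []Entry // only used by TimelineAddEntries
}

// Entry stands in for a timeline entry; Content replaces the real nested data.
type Entry struct {
	EntryID string
	Content string
}

// Timeline is client-side state that the server's instructions mutate.
type Timeline struct {
	order   []string
	entries map[string]Entry
}

func NewTimeline() *Timeline { return &Timeline{entries: map[string]Entry{}} }

// Apply executes instructions in order, the way the server expects.
func (t *Timeline) Apply(ins []Instruction) {
	for _, in := range ins {
		switch in.Type {
		case "TimelineClearCache":
			t.order, t.entries = nil, map[string]Entry{}
		case "TimelineAddEntries":
			for _, e := range in.Entries {
				if _, seen := t.entries[e.EntryID]; !seen {
					t.order = append(t.order, e.EntryID)
				}
				t.entries[e.EntryID] = e // an existing ID is replaced in place
			}
		default:
			// Other instruction types are ignored in this sketch.
		}
	}
}

// Entries returns the current entries in the order they were first added.
func (t *Timeline) Entries() []Entry {
	out := make([]Entry, 0, len(t.order))
	for _, id := range t.order {
		out = append(out, t.entries[id])
	}
	return out
}

func main() {
	tl := NewTimeline()
	tl.Apply([]Instruction{
		{Type: "TimelineClearCache"},
		{Type: "TimelineAddEntries", Entries: []Entry{{EntryID: "user-1", Content: "alice"}, {EntryID: "user-2", Content: "bob"}}},
	})
	for _, e := range tl.Entries() {
		fmt.Println(e.EntryID, e.Content)
	}
}
```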
Just to illustrate how bizarre the Twitter API is, here is an example of the https://twitter.com/i/api/graphql/q4cKckK0lNxWkHfAXXXzJQ/Following response. Just look at how complex this thing is, all for the purpose of showing a list of users.
I understand stuff like "historical reasons" and for a system as big as Twitter they definitely must be at play, but I'd really be curious to learn those reasons...
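Since the Following list comes back as one of these timelines, getting all of it means paging. A minimal Go sketch of the paging loop, under two assumptions: each page's bottom cursor entry (an `entryId` starting with `cursor-bottom-`, as far as I can tell) is passed back as the `cursor` variable to get the next page, and the last page still returns a cursor but no new users. `Page`, `FetchFunc` and `FetchAllFollowing` are hypothetical; the actual HTTP call and JSON digging are injected as a function so the loop can be tested offline.

```go
package main

import (
	"errors"
	"fmt"
)

// Page is a hypothetical, already-parsed page of the Following timeline.
type Page struct {
	Users      []string // screen names from the user entries
	NextCursor string   // value of the bottom cursor entry
}

// FetchFunc performs one request with the given cursor ("" for the first page).
type FetchFunc func(cursor string) (Page, error)

// FetchAllFollowing requests pages until one adds no new users or the cursor
// stops changing. maxPages guards against looping forever.
func FetchAllFollowing(fetch FetchFunc, maxPages int) ([]string, error) {
	var all []string
	seen := map[string]bool{}
	cursor := ""
	for i := 0; i < maxPages; i++ {
		p, err := fetch(cursor)
		if err != nil {
			return all, err
		}
		added := 0
		for _, u := range p.Users {
			if !seen[u] {
				seen[u] = true
				all = append(all, u)
				added++
			}
		}
		if added == 0 || p.NextCursor == "" || p.NextCursor == cursor {
			return all, nil
		}
		cursor = p.NextCursor
	}
	return all, errors.New("stopped after maxPages; list may be incomplete")
}

func main() {
	// Fake responses standing in for real HTTP calls.
	pages := map[string]Page{
		"":   {Users: []string{"alice", "bob"}, NextCursor: "c1"},
		"c1": {Users: []string{"carol"}, NextCursor: "c2"},
		"c2": {NextCursor: "c3"}, // last page: only a cursor, no users
	}
	users, err := FetchAllFollowing(func(c string) (Page, error) { return pages[c], nil }, 50)
	fmt.Println(users, err) // [alice bob carol] <nil>
}
```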
Proof of concept works: I successfully exported my Twitter following list as RSS feeds, using nitter.net as the RSS source.
Did it take more time than if I'd done it by hand? Yes.
Did I have more fun than if I'd done it by hand? Also yes.
Are folks interested in a web/opensource version? If there is demand, I may take some time to clean up and release it.
In the meantime, I think I worked out why the Twitter API is the way it is:
- Twitter is about displaying lists of stuff. A timeline is a list of stuff. Hence the timeline component can be reused to display just about anything, including users, tweets, notifications, etc.
- The strange instruction-driven style of the API is probably because they wanted to offload as much behavior logic as possible to the backend, where it can be reused across platforms. If only there were a way to render web pages on a server no matter what platform the client is on...
I found https://blog.twitter.com/engineering/en_us/topics/infrastructure/2020/rebuild_twitter_public_api_2020, which pretty much confirms my guess. Twitter’s API is only GraphQL under the hood, so that they can easily make up new endpoints for their use cases, but for third parties it’s pretty much the same deal as before, except with much messier data structures. Which is probably why they never really made it public.
The latest round of AI turf wars reminded me that I haven't finished this project, so that's gonna be my weekend then. My OPML exporter seems to be working fine, since it actually authenticates, so all I really need is to build minimal feed-to-RSS logic and a frontend for it.
https://github.com/zedeus/nitter/issues/919 has a fair few interesting pointers (and a whole lot of moaning and groaning).
Here's a collection of links I might need later:
- https://gist.github.com/KohnoseLami/580d0f2d7f1784e9352649260d921df9 — Twitter's official app API keys. They seem old-ish, but are probably still valid. (As usual, searching for the key strings in Google turns up a lot of fun stuff.)
- https://github.com/4cq2/mech/blob/main/twitter/oauth.go — an example of how those are used to authenticate.
- https://pkg.go.dev/github.com/sasarinomari/twitter-auth#section-readme — another library for authenticating with leaked keys.
- https://github.com/zedeus/nitter/compare/master...PrivacyDevel:nitter:master — nitter fork that supports authentication to some degree
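App consumer keys like these are traditionally used with OAuth 1.0a request signing (I'm assuming that's what the linked `oauth.go` does, without having checked). The signing step is the fiddly part, so here is a minimal Go sketch of the standard HMAC-SHA1 signature from RFC 5849, not a port of the linked code. The function names are hypothetical, repeated parameter names aren't supported, and assembling the final `Authorization: OAuth ...` header is left out.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha1"
	"encoding/base64"
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// percentEncode applies RFC 3986 encoding as OAuth 1.0a requires: only
// letters, digits, '-', '.', '_' and '~' stay as-is. url.QueryEscape turns a
// literal '+' into %2B and spaces into '+', so the '+' left afterwards are spaces.
func percentEncode(s string) string {
	return strings.ReplaceAll(url.QueryEscape(s), "+", "%20")
}

// signatureBase builds METHOD&url&params, where params must include the
// oauth_* values (consumer key, nonce, signature method, timestamp, token,
// version) as well as the request's own parameters.
func signatureBase(method, baseURL string, params map[string]string) string {
	type pair struct{ k, v string }
	pairs := make([]pair, 0, len(params))
	for k, v := range params {
		pairs = append(pairs, pair{percentEncode(k), percentEncode(v)})
	}
	// Sort by encoded key; keys are unique because params is a map.
	sort.Slice(pairs, func(i, j int) bool { return pairs[i].k < pairs[j].k })
	joined := make([]string, len(pairs))
	for i, p := range pairs {
		joined[i] = p.k + "=" + p.v
	}
	return strings.ToUpper(method) + "&" + percentEncode(baseURL) + "&" + percentEncode(strings.Join(joined, "&"))
}

// sign returns the base64 HMAC-SHA1 value for the oauth_signature parameter.
func sign(base, consumerSecret, tokenSecret string) string {
	mac := hmac.New(sha1.New, []byte(percentEncode(consumerSecret)+"&"+percentEncode(tokenSecret)))
	mac.Write([]byte(base))
	return base64.StdEncoding.EncodeToString(mac.Sum(nil))
}

func main() {
	// Hypothetical endpoint and parameters, just to show the shape.
	base := signatureBase("post", "https://api.twitter.com/1.1/example.json", map[string]string{"b": "2 3", "a": "1"})
	fmt.Println(base) // POST&https%3A%2F%2Fapi.twitter.com%2F1.1%2Fexample.json&a%3D1%26b%3D2%25203
	fmt.Println(sign(base, "consumer-secret", "token-secret"))
}
```

Note the double encoding in the base string: each key and value is encoded once, then the joined parameter string is encoded again, which is why the space in `2 3` ends up as `%2520`.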
Trying to implement fetching and parsing tweets is a bit of a headache due to their convoluted data schema. One thing that's handy is http://github.com/ChimeraCoder/gojson, which can generate Go struct definitions from a json example. I tried several tools like that and this one has the advantage of being able to "merge" schemas of different elements in an array, instead of just using the first entry.
For example, for something like:
[
  {"id": "123", "text": "hi"},
  {"id": "123", "video": "http://blah"}
]
Many tools would generate a Go type like:
type Foo []struct {
	Id   string `json:"id"`
	Text string `json:"text"`
}
... which isn't able to adequately represent the second item. gojson generates the following instead:
type Foo []struct {
	Id    string `json:"id"`
	Text  string `json:"text"`
	Video string `json:"video"` // Union of fields from all entries!
}
For the convoluted mess that Twitter API is, this makes a world of difference.
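On the Go side, the merged type works directly with `encoding/json`: each element fills in the fields it has, and the rest keep their zero values. A small sketch using the example data above; `decodeFoo` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Foo is the merged ("union") type: every field seen in any element.
type Foo []struct {
	Id    string `json:"id"`
	Text  string `json:"text"`
	Video string `json:"video"`
}

// decodeFoo unmarshals the array; fields missing from an element keep their
// zero value, so each element only fills in the fields it actually has.
func decodeFoo(data []byte) (Foo, error) {
	var foo Foo
	err := json.Unmarshal(data, &foo)
	return foo, err
}

func main() {
	foo, err := decodeFoo([]byte(`[{"id": "123", "text": "hi"}, {"id": "123", "video": "http://blah"}]`))
	if err != nil {
		panic(err)
	}
	for _, item := range foo {
		fmt.Printf("id=%q text=%q video=%q\n", item.Id, item.Text, item.Video)
	}
}
```

If "absent" and "empty" need to be told apart, the fields can be declared as `*string` instead, which stay `nil` when a key is missing.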
I’ve spent the better part of today playing with #Twitter’s internal API and trawling through GitHub for examples of apps that use it. I think I’ve learned enough to try and code something up. This thread will be my notebook and journal.
The minimal objective will be to write an exporter from my following list to OPML, so that I can move most of my feed into an RSS reader. And if I don’t lose interest by then, I may even try to write a no-frills web app for reading the feed.