Nevkontakte: "ran into an interesting gotcha. Some API..."

@me@m.nevkontakte.com

I’ve spent the better part of today playing with #Twitter’s internal API and trawling through GitHub for examples of apps that use it. I think I’ve learned enough to try and code something up. This thread will be my notebook and journal.

The minimal objective is to write an exporter from my following list to OPML, so that I can move most of my feed into an RSS reader. And if I don’t lose interest by then, I may even try to write a no-frills web app for reading the feed.

Nevkontakte

@me@m.nevkontakte.com

All #twitter API calls require some sort of authentication.

Interestingly, the authorization header seems to be constant. I’m guessing this identifies the request as coming from Twitter’s own web UI:


authorization: Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA

The actual authentication happens elsewhere. For guest sessions, you need an `x-guest-token: 1649859312251027458` header, where the token is obtained by a separate call. Only a subset of calls is available in this mode. For the rest, you need a cookie from a logged-in user:


cookie: auth_token=1234567890abcdef58dc6829393d4604b9e37c8a; ct0=1234567890abcdef0b09e38a20dcdd5cb6ec4cf8f2ba357187cda008b0f39273308a6b7ef6d318f609bc83563709c247e51daad090a116d775ef1fa55074cf5c235893a45f99d1cc49ac4fe61fec238d;
x-csrf-token: 1234567890abcdef0b09e38a20dcdd5cb6ec4cf8f2ba357187cda008b0f39273308a6b7ef6d318f609bc83563709c247e51daad090a116d775ef1fa55074cf5c235893a45f99d1cc49ac4fe61fec238d

Note that the `x-csrf-token` is the same as the `ct0` cookie.

Both are obtained through a somewhat involved login workflow shown in https://github.com/trevorhobenshield/twitter-api-client/blob/main/twitter/login.py
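
A minimal Go sketch of how these headers fit together. The `Credentials` type, its field names, and the `newRequest` helper are made up for illustration; only the header and cookie names come from the notes above, and the URL in `main` is a placeholder.

```go
package main

import (
	"fmt"
	"net/http"
)

// Credentials is a hypothetical container for the values described above.
type Credentials struct {
	Bearer     string // constant "authorization" token used by the web UI
	GuestToken string // used for guest sessions only
	AuthToken  string // "auth_token" cookie of a logged-in user
	CSRFToken  string // "ct0" cookie, repeated in "x-csrf-token"
}

// newRequest builds a GET request with either guest or logged-in headers.
func newRequest(url string, c Credentials) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("authorization", "Bearer "+c.Bearer)
	if c.AuthToken == "" {
		// Guest mode: only a subset of calls will work.
		req.Header.Set("x-guest-token", c.GuestToken)
		return req, nil
	}
	// Logged-in mode: the CSRF header repeats the ct0 cookie value.
	req.Header.Set("cookie", fmt.Sprintf("auth_token=%s; ct0=%s", c.AuthToken, c.CSRFToken))
	req.Header.Set("x-csrf-token", c.CSRFToken)
	return req, nil
}

func main() {
	// Placeholder URL and credentials, for demonstration only.
	req, err := newRequest("https://twitter.com/i/api/graphql/example", Credentials{
		Bearer: "AAAA", AuthToken: "abc", CSRFToken: "def",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("x-csrf-token"))
}
```

Choosing the mode by whether `AuthToken` is empty is just a convenience of this sketch; a real client would probably keep guest and logged-in sessions as separate types.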

Nevkontakte

@me@m.nevkontakte.com

As for the API calls themselves, https://github.com/fa0311/TwitterInternalAPIDocument and https://github.com/fa0311/twitter-openapi seem to be the closest there is to documentation. I think this should be fine for my purposes.

The two calls I need are /sLVLhk0bGj3MVFEKTdax1w/UserByScreenName and /IWP6Zt14sARO29lJT35bBw/Following. Unfortunately, the second one is not available with a guest token, so I’ll have to deal with the login workflow.

Nevkontakte

@me@m.nevkontakte.com

Back at it again...

Twitter's login flow is weird. For one, rather than just sending login/password and getting the cookie, it is actually a series of API calls. Also, the purpose of some of the steps is not apparent, like LoginJsInstrumentationSubtask or AccountDuplicationCheck. No idea what those mean, but I can cargo-cult them.

Nevkontakte

@me@m.nevkontakte.com

I think I am beginning to understand the logic behind this API. Apparently, the login flow can be a lot more varied and complicated than you would think. Here, at the beginning of the process, the client sends all the different subtasks it can handle:

{
    "input_flow_data": {
        "flow_context": {
            "debug_overrides": {},
            "start_location": {
                "location": "unknown"
            }
        }
    },
    "subtask_versions": {
        "action_list": 2,
        "alert_dialog": 1,
        "app_download_cta": 1,
        "check_logged_in_account": 1,
        "choice_selection": 3,
        "contacts_live_sync_permission_prompt": 0,
        "cta": 7,
        "email_verification": 2,
        "end_flow": 1,
        "enter_date": 1,
        "enter_email": 2,
        "enter_password": 5,
        "enter_phone": 2,
        "enter_recaptcha": 1,
        "enter_text": 5,
        "enter_username": 2,
        "generic_urt": 3,
        "in_app_notification": 1,
        "interest_picker": 3,
        "js_instrumentation": 1,
        "menu_dialog": 1,
        "notifications_permission_prompt": 2,
        "open_account": 2,
        "open_home_timeline": 1,
        "open_link": 1,
        "phone_verification": 4,
        "privacy_options": 1,
        "security_key": 3,
        "select_avatar": 4,
        "select_banner": 2,
        "settings_list": 7,
        "show_code": 1,
        "sign_up": 2,
        "sign_up_review": 4,
        "tweet_selection_urt": 1,
        "update_users": 1,
        "upload_media": 1,
        "user_recommendations_list": 4,
        "user_recommendations_urt": 1,
        "wait_spinner": 3,
        "web_modal": 1
    }
}

And the server more or less tells the client which actions to offer to the user:

{
    "flow_token": "g;168461104176909845:-1684611164737:Mh2XA15kcSPOvXshdM51j6Ea:1",
    "status": "success",
    "subtasks": [
        {
            "subtask_id": "LoginEnterUserIdentifierSSO",
            "settings_list": {
                "settings": [
                    {
                        "value_type": "button",
                        "value_identifier": "google_sso_button",
                        "value_data": {
                            "button": {
                                "navigation_link": {
                                    "link_type": "subtask",
                                    "link_id": "google_sso",
                                    "label": "Continue with Google",
                                    "subtask_id": "EnterIdGoogleSSOSubtask"
                                },
                                "style": "brand",
                                "icon": {
                                    "icon": "logo_google_g_color"
                                },
                                "preferred_size": "normal"
                            }
                        }
                    },
                    {
                        "value_type": "button",
                        "value_identifier": "apple_sso_button",
                        "value_data": {
                            "button": {
                                "navigation_link": {
                                    "link_type": "subtask",
                                    "link_id": "apple_id",
                                    "label": "Continue with Apple",
                                    "subtask_id": "EnterIdAppleSSOSubtask"
                                },
                                "style": "brand",
                                "icon": {
                                    "icon": "logo_apple"
                                },
                                "preferred_size": "normal"
                            }
                        }
                    },
                    {
                        "value_type": "separator",
                        "value_identifier": "separator",
                        "value_data": {
                            "separator": {
                                "label": {
                                    "text": "or",
                                    "entities": []
                                }
                            }
                        }
                    },
                    {
                        "value_type": "text_field",
                        "value_identifier": "user_identifier",
                        "value_data": {
                            "text_field": {
                                "content_type": "text",
                                "hint_text": "Phone, email, or username"
                            }
                        }
                    },
                    {
                        "value_type": "button",
                        "value_identifier": "next_button",
                        "value_data": {
                            "button": {
                                "navigation_link": {
                                    "link_type": "task",
                                    "link_id": "next_link",
                                    "label": "Next"
                                },
                                "style": "primary",
                                "preferred_size": "normal"
                            }
                        }
                    },
                    {
                        "value_type": "button",
                        "value_identifier": "forgot_password",
                        "value_data": {
                            "button": {
                                "navigation_link": {
                                    "link_type": "subtask",
                                    "link_id": "forget_password",
                                    "label": "Forgot password?",
                                    "subtask_id": "RedirectToPasswordReset"
                                },
                                "style": "secondary",
                                "preferred_size": "normal"
                            }
                        }
                    }
                ],
                "detail_text": {
                    "text": "Don't have an account? Sign up",
                    "entities": [
                        {
                            "from_index": 23,
                            "to_index": 30,
                            "navigation_link": {
                                "link_type": "deep_link_and_abort",
                                "link_id": "signup_deep_link",
                                "url": "https://twitter.com/i/flow/signup"
                            }
                        }
                    ]
                },
                "style": "step",
                "header": {
                    "primary_text": {
                        "text": "Sign in to Twitter",
                        "entities": []
                    }
                },
                "navigation_style": "hide",
                "horizontal_style": "compact"
            },
            "subtask_back_navigation": "cancel_flow"
        },
        {
            "subtask_id": "EnterIdGoogleSSOSubtask",
            "single_sign_on": {
                "provider": "google",
                "scopes": [
                    "openid",
                    "email",
                    "profile"
                ],
                "state": "j28nFz5x2qeOetxXP7RpW4hldQFpYIKWoEkFqBPDJqh",
                "next_link": {
                    "link_type": "task",
                    "link_id": "next_link"
                },
                "fail_link": {
                    "link_type": "subtask",
                    "link_id": "fail_link",
                    "subtask_id": "LoginEnterUserIdentifierSSO"
                },
                "cancel_link": {
                    "link_type": "subtask",
                    "link_id": "cancel_link",
                    "subtask_id": "LoginEnterUserIdentifierSSO"
                }
            },
            "subtask_back_navigation": "cancel_flow"
        },
        {
            "subtask_id": "EnterIdAppleSSOSubtask",
            "single_sign_on": {
                "provider": "apple",
                "scopes": [
                    "email",
                    "name"
                ],
                "state": "TPt3CJRXQfJaN3tjB3QPEi_FS_WtsOlj68qfoeTGmx4",
                "next_link": {
                    "link_type": "task",
                    "link_id": "next_link"
                },
                "fail_link": {
                    "link_type": "subtask",
                    "link_id": "fail_link",
                    "subtask_id": "LoginEnterUserIdentifierSSO"
                },
                "cancel_link": {
                    "link_type": "subtask",
                    "link_id": "cancel_link",
                    "subtask_id": "LoginEnterUserIdentifierSSO"
                }
            },
            "subtask_back_navigation": "cancel_flow"
        },
        {
            "subtask_id": "RedirectToPasswordReset",
            "open_link": {
                "link": {
                    "link_type": "deep_link_and_abort",
                    "link_id": "password_reset_deep_link",
                    "url": "https://twitter.com/i/flow/password_reset?input_flow_data=%7B%22requested_variant%22%3A%22eyJwbGF0Zm9ybSI6IlJ3ZWIifQ%3D%3D%22%7D"
                }
            }
        }
    ]
}
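
For cargo-culting purposes, most of that response can be ignored. Here is a minimal Go sketch that pulls out only the flow token and the offered subtask IDs. The `flowResponse` type and `subtaskIDs` helper are my own names, the struct models only the top-level fields visible above, and the sample body in `main` is a trimmed-down stand-in.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flowResponse models only the top-level fields visible in the response above.
type flowResponse struct {
	FlowToken string `json:"flow_token"`
	Status    string `json:"status"`
	Subtasks  []struct {
		SubtaskID string `json:"subtask_id"`
	} `json:"subtasks"`
}

// subtaskIDs returns the flow token and the IDs of the subtasks the server offered.
func subtaskIDs(body []byte) (string, []string, error) {
	var resp flowResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", nil, err
	}
	if resp.Status != "success" {
		return "", nil, fmt.Errorf("unexpected flow status %q", resp.Status)
	}
	ids := make([]string, 0, len(resp.Subtasks))
	for _, st := range resp.Subtasks {
		ids = append(ids, st.SubtaskID)
	}
	return resp.FlowToken, ids, nil
}

func main() {
	// Shortened sample with a placeholder token.
	body := []byte(`{
		"flow_token": "token-1",
		"status": "success",
		"subtasks": [
			{"subtask_id": "LoginEnterUserIdentifierSSO"},
			{"subtask_id": "RedirectToPasswordReset"}
		]
	}`)
	token, ids, err := subtaskIDs(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(token, ids)
}
```

Presumably the flow token has to be echoed back in the next request, along with input for one of the offered subtasks, but that part is not shown here.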

Nevkontakte

@me@m.nevkontakte.com

One trick that always helps when reverse engineering something popular: searching for unique strings in Google and other search engines turns up other people's notes.

Same thing this time: AccountDuplicationCheck_false pointed me at https://github.com/fa0311/TwitterFrontendFlow and https://github.com/tsukumijima/tweepy-authlib, which look like very detailed implementations of the login flow I could reference.

Part of the challenge is that Twitter sometimes shows stuff like "confirm your email" when it detects suspicious activity, but I can't reliably reproduce and test such behavior. Looking at other people's code helps find such cases before they randomly break at some future point.

Nevkontakte

@me@m.nevkontakte.com

I think the goal for today is to get a functional OPML generator for the following list, at least as a CLI tool. It's pretty late, but how hard can it be?

Nevkontakte

@me@m.nevkontakte.com

Another weirdness of Twitter's API: it claims to be GraphQL (like, in the URL: https://twitter.com/i/api/graphql/q4cKckK0lNxWkHfAXXXzJQ/Following), but I don't think it actually is?

Until now I haven't had to use GraphQL for anything, so I could be wrong, but it's nothing like what http://graphql.org/learn describes. Maybe they use GraphQL on the backend to generate responses for those APIs? But then I thought the whole point was to let the client make its own queries.

TBH, Twitter's API is one of the odder ones I've seen. It seems to be built around the paradigm of the backend telling the frontend what to do. Like, fetching the timeline isn't just "give me a list of tweets after X", but the backend sending you instructions on what to add to and what to remove from the timeline.
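
To make the instruction-driven idea concrete, here is a toy Go sketch. The `instruction` type and the "add"/"remove" operation names are invented for illustration; the real payloads are far more involved and use their own vocabulary.

```go
package main

import "fmt"

// instruction is a made-up, simplified stand-in for what the backend sends.
type instruction struct {
	Op      string // "add" or "remove" (hypothetical names)
	EntryID string
}

// apply replays the instructions over a copy of the timeline and returns the result.
func apply(timeline []string, instrs []instruction) []string {
	out := append([]string(nil), timeline...)
	for _, in := range instrs {
		switch in.Op {
		case "add":
			out = append(out, in.EntryID)
		case "remove":
			// Filter in place; safe because out is our own copy.
			kept := out[:0]
			for _, id := range out {
				if id != in.EntryID {
					kept = append(kept, id)
				}
			}
			out = kept
		}
	}
	return out
}

func main() {
	timeline := []string{"tweet-1", "tweet-2"}
	timeline = apply(timeline, []instruction{
		{Op: "add", EntryID: "tweet-3"},
		{Op: "remove", EntryID: "tweet-1"},
	})
	fmt.Println(timeline) // prints [tweet-2 tweet-3]
}
```

The client here holds no logic of its own beyond replaying what it is told, which is roughly how the real API feels to work with.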

Nevkontakte

@me@m.nevkontakte.com

Just to illustrate my point about how bizarre Twitter's API is, here is an example of the https://twitter.com/i/api/graphql/q4cKckK0lNxWkHfAXXXzJQ/Following response. Just look at how complex this thing is for the purpose of showing a list of users.

I understand stuff like "historical reasons" and for a system as big as Twitter they definitely must be at play, but I'd really be curious to learn those reasons...

A deeply nested JSON object of bizarreness.

Nevkontakte

@me@m.nevkontakte.com

The proof of concept works: I successfully exported my Twitter following list to RSS, using nitter.net as the RSS source.

Did it take more time than if I'd done it by hand? Yes.

Did I have more fun than if I'd done it by hand? Also yes.

Are folks interested in a web/opensource version? If there is demand, I may take some time to clean up and release it.

A screenshot of RSS reader full of my twitter feed content.

Nevkontakte

@me@m.nevkontakte.com

In the meantime, I think I worked out why Twitter's API is the way it is:

Twitter is about displaying lists of stuff. A timeline is a list of stuff. Hence they can reuse the timeline component to display just about anything, including users, tweets, notifications, etc.
The strange instruction-driven style of the API is probably because they wanted to offload as much behavior logic as possible to the backend, where it can be reused between platforms. If only there was a way to render web pages on a server no matter what platform the client is on...

Nevkontakte

@me@m.nevkontakte.com

I found https://blog.twitter.com/engineering/en_us/topics/infrastructure/2020/rebuild_twitter_public_api_2020, which pretty much confirms my guess. Twitter’s API is only GraphQL under the hood, so that they can easily make up new endpoints for their use cases, but for third parties it’s pretty much the same deal as before, except with much messier data structures. Which is probably why they never really made it public.

Nevkontakte

@me@m.nevkontakte.com

The latest round of AI turf wars reminded me that I haven't finished this project, so that's gonna be my weekend then. My OPML exporter seems to be working fine, since it actually authenticates, so all I really need is to build minimal feed-to-RSS logic and a frontend for it.

https://github.com/zedeus/nitter/issues/919 has a fair few interesting pointers (and a whole lot of moaning and groaning).

Here's a collection of links I might need later:

https://gist.github.com/KohnoseLami/580d0f2d7f1784e9352649260d921df9 - Twitter's official app API keys. They seem old-ish, but are probably still valid. (As usual, searching for the key strings in Google turns up a lot of fun stuff.)
https://github.com/4cq2/mech/blob/main/twitter/oauth.go - an example of how those are used to authenticate.
https://pkg.go.dev/github.com/sasarinomari/twitter-auth#section-readme - another library for authenticating with leaked keys.
https://github.com/zedeus/nitter/compare/master...PrivacyDevel:nitter:master - a nitter fork that supports authentication to some degree.

Nevkontakte

@me@m.nevkontakte.com

Trying to implement fetching and parsing tweets is a bit of a headache due to their convoluted data schema. One thing that's handy is http://github.com/ChimeraCoder/gojson, which can generate Go struct definitions from a json example. I tried several tools like that and this one has the advantage of being able to "merge" schemas of different elements in an array, instead of just using the first entry.

For example, for something like:

[
  {"id": "123", "text": "hi"},
  {"id": "123", "video": "http://blah"}
]

Many tools would generate a Go type like:

type Foo []struct{
  Id string `json:"id"`
  Text string `json:"text"`
}

... which isn't able to adequately represent the second item. gojson generates the following instead:

type Foo []struct{
  Id string `json:"id"`
  Text string `json:"text"`
  Video string `json:"video"`  // Union of fields from all entries!
}

For the convoluted mess that Twitter API is, this makes a world of difference.
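
A quick Go sketch of why the merged type matters in practice: decoding the example array into it keeps the fields of both items. The `decodeFoo` helper is only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Foo is the merged type from above: the union of fields from all entries.
type Foo []struct {
	Id    string `json:"id"`
	Text  string `json:"text"`
	Video string `json:"video"`
}

// decodeFoo parses a JSON array into Foo; absent fields stay empty strings.
func decodeFoo(data []byte) (Foo, error) {
	var foo Foo
	if err := json.Unmarshal(data, &foo); err != nil {
		return nil, err
	}
	return foo, nil
}

func main() {
	data := []byte(`[
		{"id": "123", "text": "hi"},
		{"id": "123", "video": "http://blah"}
	]`)
	foo, err := decodeFoo(data)
	if err != nil {
		panic(err)
	}
	for _, item := range foo {
		fmt.Printf("id=%s text=%q video=%q\n", item.Id, item.Text, item.Video)
	}
	// Prints:
	// id=123 text="hi" video=""
	// id=123 text="" video="http://blah"
}
```

With the first-entry-only type, the `video` value of the second item would be silently dropped, since encoding/json ignores unknown fields by default.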

Nevkontakte

@me@m.nevkontakte.com

All that said, after a couple of hours of carefully untangling and documenting the auto-generated schema by hand, I figured that's not the best way of spending time.

Luckily for me, it seems like somebody has already done a lot of heavy lifting: https://github.com/fa0311/twitter-openapi. I don't want the auto-generated openapi client, but I could use the data model.

Annoyingly, it seems like the official openapi Go generator is a bit fiddly and I couldn't get it to generate only the parts I needed. https://github.com/deepmap/oapi-codegen appears to be a fairly popular alternative, but it apparently didn't handle cross-file references very well and was missing a bunch of important type definitions.

In the end, I cobbled together this recipe:

$ go install github.com/deepmap/oapi-codegen/cmd/oapi-codegen@latest
$ npx @redocly/cli bundle https://raw.githubusercontent.com/fa0311/twitter-openapi/main/dist/docs/openapi-3.0.yaml > twitter-openapi.yaml
$ npx @redocly/cli build-docs twitter-openapi.yaml -o twitter-openapi.html
$ oapi-codegen -generate=types,skip-prune -o twitter.gen.go -package=protocol twitter-openapi.yaml

The output seems to be good enough for my purposes and, what's nice, is just a single file.
