Streaming
Stream allows filtering and sampling of realtime Tweets using Twitter’s API.
Streams use the streaming HTTP protocol to deliver data through an open, streaming API connection. Rather than delivering data in batches through repeated requests by your client app, as might be expected from a REST API, a single connection is opened between your app and the API, and new results are sent through that connection whenever new matches occur. This results in a low-latency delivery mechanism that can support very high throughput. For further information, see https://developer.twitter.com/en/docs/tutorials/consuming-streaming-data
Using Stream
To use Stream, an instance of it needs to be initialized with Twitter API credentials (Consumer Key, Consumer Secret, Access Token, Access Token Secret):
import tweepy
stream = tweepy.Stream(
"Consumer Key here", "Consumer Secret here",
"Access Token here", "Access Token Secret here"
)
Then, Stream.filter() or Stream.sample() can be used to connect to and run a stream:
stream.filter(track=["Tweepy"])
Data received from the stream is passed to Stream.on_data(). This method handles sending the data to other methods based on the message type. For example, if a Tweet is received from the stream, Stream.on_data() constructs a Status object from the raw data and passes it to Stream.on_status(). By default, the other methods that receive data from the stream, besides Stream.on_data(), simply log the data received, with the logging level dependent on the type of the data.
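The routing described above can be pictured with a small standard-library model. This is a simplified sketch and not Tweepy's implementation: the class names DispatchSketch and IDCollector are hypothetical, the key used to recognise a Tweet is an assumption, and the handlers receive plain dictionaries rather than Status objects.

```python
import json
import logging

log = logging.getLogger("stream_sketch")


class DispatchSketch:
    """Toy model of routing raw stream messages by type (not Tweepy's code)."""

    def on_data(self, raw_data):
        data = json.loads(raw_data)
        # Assumption: a Tweet payload is recognised by a Tweet-only key.
        if "in_reply_to_status_id" in data:
            return self.on_status(data)
        if "delete" in data:
            status = data["delete"]["status"]
            return self.on_delete(status["id"], status["user_id"])
        if "limit" in data:
            return self.on_limit(data["limit"]["track"])
        log.error("Received unknown message type: %s", raw_data)

    # The default handlers only log what they receive.
    def on_status(self, status):
        log.debug("Received status: %s", status["id"])

    def on_delete(self, status_id, user_id):
        log.debug("Received status deletion notice: %s", status_id)

    def on_limit(self, track):
        log.warning("Received limit notice: %s", track)


class IDCollector(DispatchSketch):
    """Overrides one handler; every other message type keeps the default."""

    def __init__(self):
        self.ids = []

    def on_status(self, status):
        self.ids.append(status["id"])
```

Overriding a single handler, as IDCollector does, leaves the routing in on_data() untouched, which is the same reason subclassing is the customization point for Stream.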
To customize the processing of the stream data, Stream needs to be subclassed. For example, to print the ID of every Tweet received:
class IDPrinter(tweepy.Stream):

    # Called with a Status object for each Tweet received
    def on_status(self, status):
        print(status.id)


printer = IDPrinter(
    "Consumer Key here", "Consumer Secret here",
    "Access Token here", "Access Token Secret here"
)
printer.sample()
Threading
Both Stream.filter() and Stream.sample() have a threaded parameter. When set to True, the stream will run in a separate thread, which is returned by the call to either method. For example:
thread = stream.filter(follow=[1072250532645998596], threaded=True)
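To show what getting the thread back makes possible, here is a standard-library stand-in for the pattern. FakeStream is a hypothetical class that performs no network I/O and only mimics the shape of the call; it is a sketch of the idea, not of how Tweepy implements threading.

```python
import threading
import time


class FakeStream:
    """Hypothetical stand-in for a stream; it performs no network I/O."""

    def __init__(self):
        self.running = False
        self.received = 0

    def _run(self):
        # Stand-in for the blocking read loop of a real stream.
        while self.running:
            self.received += 1
            time.sleep(0.01)

    def sample(self, threaded=False):
        self.running = True
        if threaded:
            thread = threading.Thread(target=self._run, daemon=True)
            thread.start()
            return thread  # the caller regains control immediately
        self._run()  # blocks until disconnect() is called from elsewhere

    def disconnect(self):
        self.running = False


fake_stream = FakeStream()
worker = fake_stream.sample(threaded=True)
time.sleep(0.05)         # the main thread is free to do other work here
fake_stream.disconnect()  # ask the read loop to stop
worker.join(timeout=1)    # wait for the worker thread to finish
```

Keeping the returned thread lets the caller wait for the stream to finish after disconnecting, instead of blocking on the call itself.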
Handling Errors
Stream has multiple methods to handle errors during streaming. Stream.on_closed() is called when the stream is closed by Twitter. Stream.on_connection_error() is called when the stream encounters a connection error. Stream.on_request_error() is called when an error is encountered while trying to connect to the stream. When these errors are encountered and max_retries, which defaults to infinite, hasn’t been exceeded yet, the Stream instance will attempt to reconnect the stream after an appropriate amount of time. By default, all three of these methods log an error. To customize that handling, they can be overridden in a subclass:
class ConnectionTester(tweepy.Stream):

    # Stop the stream instead of letting it reconnect
    def on_connection_error(self):
        self.disconnect()
Stream.on_request_error() is also passed the HTTP status code that was encountered. The HTTP status codes reference for the Twitter API can be found at https://developer.twitter.com/en/support/twitter-api/error-troubleshooting.
Stream.on_exception() is called when an unhandled exception occurs. This is fatal to the stream, and by default, an exception is logged.
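The reconnection behavior described in this section, retrying until max_retries is exceeded, can be modelled with a short standard-library loop. This is a sketch under stated assumptions: run_with_retries and its connect and sleep parameters are hypothetical names, and the starting delay, doubling, and cap are illustrative values rather than figures taken from this page.

```python
import math
import time


def run_with_retries(connect, max_retries=math.inf, sleep=time.sleep):
    """Call connect() until it succeeds or max_retries is exceeded.

    Returns the number of failed attempts.
    """
    failures = 0
    delay = 0.25  # assumed starting delay, in seconds
    while failures <= max_retries:
        try:
            connect()
            return failures
        except ConnectionError:
            failures += 1
            sleep(delay)  # wait before the next attempt
            delay = min(delay * 2, 16)  # assumed doubling, capped at 16 seconds
    return failures
```

Passing the sleep function as a parameter keeps the loop easy to exercise without real waiting. With an infinite max_retries, the loop only ends when a connection attempt succeeds.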