Streaming
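Streams use the Streaming HTTP protocol to deliver data over an open, streaming API connection. With a REST API, the client app makes repeated requests and receives data in batches. With a stream, the app opens a single connection to the API, and new results are sent through that connection whenever new matches occur. This provides low-latency delivery that can support very high throughput. For more information, see https://developer.twitter.com/en/docs/tutorials/consuming-streaming-data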
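The single long-lived connection can be illustrated without Tweepy. This sketch assumes that the stream delivers messages as newline-delimited JSON and sends blank lines as keep-alive signals. It reads such a body from an in-memory byte buffer, which stands in for a real HTTP response so the example runs offline, and yields each parsed message as it arrives. The function name `iter_stream_messages` is hypothetical.

```python
import io
import json
from typing import BinaryIO, Iterator


def iter_stream_messages(body: BinaryIO) -> Iterator[dict]:
    """Yield one parsed JSON message per non-empty line of a streaming body.

    `body` stands in for the file-like body of a long-lived HTTP response;
    here it is an in-memory buffer so the sketch runs without a network.
    """
    for raw_line in body:  # iterating a binary file yields lines ending in b"\n"
        line = raw_line.strip()
        if not line:
            # Blank lines are treated as keep-alive signals: the connection
            # is still open, but no new data has arrived.
            continue
        yield json.loads(line)


# Simulated stream: two messages separated by a keep-alive newline.
fake_body = io.BytesIO(b'{"id": 1, "text": "hello"}\r\n\r\n{"id": 2, "text": "world"}\r\n')
ids = [message["id"] for message in iter_stream_messages(fake_body)]
print(ids)  # [1, 2]
```

Because this is a generator, each message is processed as soon as its line is read. It does not wait for the whole response to finish, and that behavior is what a streaming connection relies on.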
Stream allows filtering and sampling of realtime Tweets using Twitter API v1.1.
StreamingClient allows filtering and sampling of realtime Tweets using Twitter API v2.
Using Stream
To use Stream, an instance of it needs to be initialized with Twitter API credentials (Consumer Key, Consumer Secret, Access Token, Access Token Secret):
import tweepy
stream = tweepy.Stream(
"Consumer Key here", "Consumer Secret here",
"Access Token here", "Access Token Secret here"
)
Then, Stream.filter() or Stream.sample() can be used to connect to and run a stream:
stream.filter(track=["Tweepy"])
Data received from the stream is passed to Stream.on_data(). This method sends the data to other methods based on the message type. For example, if a Tweet is received from the stream, the raw data is sent to Stream.on_data(), which constructs a Status object and passes it to Stream.on_status(). By default, every other method that receives data from the stream simply logs the data it receives. The logging level depends on the type of the data.
To customize the processing of the stream data, Stream needs to be subclassed. For example, to print the IDs of every Tweet received:
class IDPrinter(tweepy.Stream):

    def on_status(self, status):
        print(status.id)


printer = IDPrinter(
    "Consumer Key here", "Consumer Secret here",
    "Access Token here", "Access Token Secret here"
)
printer.sample()
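The routing and subclassing described above can be modeled in a few lines. The class below is a hypothetical, heavily simplified stand-in for Stream and is not Tweepy's actual code. It parses the raw JSON and treats any object with an `in_reply_to_status_id` key as a Tweet, since that key is present (possibly null) on every v1.1 Tweet object. It handles deletion notices separately and logs anything else as unknown. Plain dicts replace Tweepy's Status objects so the sketch runs without Tweepy installed.

```python
import json
import logging

log = logging.getLogger("toy_stream")


class ToyStream:
    """Hypothetical, simplified model of how Stream routes raw stream data."""

    def on_data(self, raw_data):
        data = json.loads(raw_data)
        # Every v1.1 Tweet object carries this key (possibly null), so it is
        # used here to tell Tweets apart from other message types.
        if "in_reply_to_status_id" in data:
            return self.on_status(data)
        if "delete" in data:
            return self.on_delete(data["delete"])
        log.error("Received unknown message type: %s", raw_data)

    def on_status(self, status):
        # Default behaviour: just log the Tweet.
        log.info("Received status: %s", status.get("id"))

    def on_delete(self, notice):
        log.info("Received deletion notice: %s", notice)


class Collector(ToyStream):
    """Subclass that overrides only on_status, as in the IDPrinter example."""

    def __init__(self):
        self.ids = []

    def on_status(self, status):
        self.ids.append(status["id"])


collector = Collector()
collector.on_data('{"id": 42, "in_reply_to_status_id": null, "text": "hi"}')
collector.on_data('{"delete": {"status": {"id": 7}}}')  # routed to on_delete
print(collector.ids)  # [42]
```

The subclass overrides only the hook it cares about. Every other message type keeps the default logging behavior, and the real Stream follows the same design.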
Using StreamingClient
To use StreamingClient, an instance of it needs to be initialized with a Twitter API Bearer Token:
import tweepy
streaming_client = tweepy.StreamingClient("Bearer Token here")
Then, StreamingClient.sample() can be used to connect to and run a sampling stream:
streaming_client.sample()
Or StreamingClient.add_rules() can be used to add rules before using StreamingClient.filter() to connect to and run a filtered stream:
streaming_client.add_rules(tweepy.StreamRule("Tweepy"))
streaming_client.filter()
StreamingClient.get_rules() can be used to retrieve existing rules, and StreamingClient.delete_rules() can be used to delete rules.
To learn how to build rules, refer to the Twitter API Building rules for filtered stream documentation.
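Rule values are plain strings written in the filtered stream rule syntax. The helper below is hypothetical and not part of Tweepy. It composes a rule value by OR-ing keywords together and AND-ing optional operators, where a space means AND in the rule syntax. The `lang:` and `-is:retweet` operators are used for illustration. Check the rules documentation for the operators and length limits that apply to your access level.

```python
def build_rule_value(keywords, lang=None, exclude_retweets=True):
    """Hypothetical helper: compose a filtered-stream rule value string.

    Keywords are OR-ed together inside parentheses; the remaining clauses
    are AND-ed by separating them with spaces.
    """
    if not keywords:
        raise ValueError("at least one keyword is required")
    if len(keywords) == 1:
        clauses = [keywords[0]]
    else:
        clauses = ["(" + " OR ".join(keywords) + ")"]
    if lang:
        clauses.append(f"lang:{lang}")  # restrict to one language
    if exclude_retweets:
        clauses.append("-is:retweet")  # negated operator: drop Retweets
    return " ".join(clauses)


value = build_rule_value(["Tweepy", "Python"], lang="en")
print(value)  # (Tweepy OR Python) lang:en -is:retweet
```

With Tweepy installed, the resulting string could then be added as the value of a rule, for example `streaming_client.add_rules(tweepy.StreamRule(value, tag="tweepy-en"))`. The tag is optional and only labels the rule.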
Data received from the stream is passed to StreamingClient.on_data(). This method sends the data to other methods. Tweets received are sent to StreamingClient.on_tweet(), includes data is sent to StreamingClient.on_includes(), errors are sent to StreamingClient.on_errors(), and matching rules are sent to StreamingClient.on_matching_rules(). A StreamResponse instance containing all four fields is sent to StreamingClient.on_response(). By default, only StreamingClient.on_response() logs the data received, at the DEBUG logging level.
To customize the processing of the stream data, StreamingClient needs to be subclassed. For example, to print the IDs of every Tweet received:
class IDPrinter(tweepy.StreamingClient):

    def on_tweet(self, tweet):
        print(tweet.id)


printer = IDPrinter("Bearer Token here")
printer.sample()
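The v2 routing can be modeled the same way. This sketch is a hypothetical, simplified stand-in and not Tweepy's code. It assumes each payload is a JSON object whose top-level `data`, `includes`, `errors` and `matching_rules` fields are forwarded to their handlers, and then bundled together for `on_response`. Plain dicts and a namedtuple replace Tweepy's Tweet and StreamResponse types so the example runs without Tweepy.

```python
import json
import logging
from collections import namedtuple

log = logging.getLogger("toy_streaming_client")

# Stand-in for StreamResponse, holding the four fields described above.
ToyResponse = namedtuple("ToyResponse", ["data", "includes", "errors", "matching_rules"])


class ToyStreamingClient:
    """Hypothetical, simplified model of StreamingClient's data routing."""

    def on_data(self, raw_data):
        payload = json.loads(raw_data)
        data = payload.get("data")
        includes = payload.get("includes", {})
        errors = payload.get("errors", [])
        matching_rules = payload.get("matching_rules", [])

        # Forward each field that is present to its dedicated handler.
        if data is not None:
            self.on_tweet(data)
        if includes:
            self.on_includes(includes)
        if errors:
            self.on_errors(errors)
        if matching_rules:
            self.on_matching_rules(matching_rules)
        # Finally, hand all four fields over together.
        self.on_response(ToyResponse(data, includes, errors, matching_rules))

    # Default handlers: only on_response does anything (logs at DEBUG).
    def on_tweet(self, tweet):
        pass

    def on_includes(self, includes):
        pass

    def on_errors(self, errors):
        pass

    def on_matching_rules(self, matching_rules):
        pass

    def on_response(self, response):
        log.debug("Received response: %s", response)


class ToyIDPrinter(ToyStreamingClient):
    """Overrides only the handlers it needs, like the IDPrinter example."""

    def __init__(self):
        self.seen = []
        self.rule_tags = []

    def on_tweet(self, tweet):
        print(tweet["id"])
        self.seen.append(tweet["id"])

    def on_matching_rules(self, matching_rules):
        self.rule_tags.extend(rule["tag"] for rule in matching_rules)


printer = ToyIDPrinter()
printer.on_data(
    '{"data": {"id": "1", "text": "hi"},'
    ' "matching_rules": [{"id": "9", "tag": "tweepy"}]}'
)  # prints 1
```

Overriding `on_matching_rules` is useful with filtered streams: the rule tags reveal which of several rules caused a Tweet to be delivered.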
Threading
Stream.filter(), Stream.sample(), StreamingClient.filter(), and StreamingClient.sample() all have a threaded parameter. When it is set to True, the stream runs in a separate thread, which is returned by the call to the method. For example:
thread = stream.filter(follow=[1072250532645998596], threaded=True)
or:
thread = streaming_client.sample(threaded=True)
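Conceptually, threaded=True moves the blocking read loop onto a background thread and returns the Thread object to the caller. The sketch below models that idea with the standard threading module. `run_stream` and its arguments are hypothetical and are not Tweepy's API. In the sketch, a plain list stands in for data arriving over the connection.

```python
import threading


def run_stream(messages, handler, threaded=False):
    """Hypothetical model of a stream method's `threaded` parameter.

    `messages` stands in for data arriving over the connection.
    """
    def loop():
        for message in messages:  # blocks until the stream is exhausted
            handler(message)

    if threaded:
        thread = threading.Thread(target=loop, name="stream")
        thread.start()
        return thread  # the caller keeps control and can join() later
    loop()  # blocking: returns only after the stream ends
    return None


received = []
thread = run_stream(["a", "b", "c"], received.append, threaded=True)
# ... the main thread is free to do other work here ...
thread.join()
print(received)  # ['a', 'b', 'c']
```

Since the returned value is an ordinary threading.Thread, the usual tools apply. For example, join() waits for the stream to finish.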
Handling Errors
Both Stream and StreamingClient have multiple methods to handle errors during streaming.
Stream.on_closed() / StreamingClient.on_closed() is called when the stream is closed by Twitter.
Stream.on_connection_error() / StreamingClient.on_connection_error() is called when the stream encounters a connection error.
Stream.on_request_error() / StreamingClient.on_request_error() is called when an error is encountered while trying to connect to the stream.
When one of these errors occurs and max_retries (infinite by default) has not yet been exceeded, the Stream / StreamingClient instance will attempt to reconnect the stream after an appropriate amount of time. By default, both versions of all three of these methods log an error. To customize that handling, they can be overridden in a subclass:
import tweepy

# Twitter API v1.1
class ConnectionTester(tweepy.Stream):
    def on_connection_error(self):
        self.disconnect()

# Twitter API v2
class ConnectionTester(tweepy.StreamingClient):
    def on_connection_error(self):
        self.disconnect()
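The "appropriate amount of time" between reconnect attempts follows a back-off strategy. The function below is a hypothetical sketch modeled on Twitter's published reconnection guidance as we understand it. After network errors it backs off linearly in 0.25 s steps, capped at 16 s. After HTTP errors it backs off exponentially from 5 s, capped at 320 s. These constants are assumptions for illustration, and Tweepy's actual timing may differ between versions.

```python
def reconnect_delays(attempts, kind, max_retries=None):
    """Hypothetical back-off schedule (in seconds) for successive reconnects.

    kind: "network" for connection errors (linear back-off) or
          "http" for HTTP errors (exponential back-off).
    max_retries: stop after this many attempts; None means no limit,
                 mirroring the infinite default of max_retries.
    """
    if max_retries is not None:
        attempts = min(attempts, max_retries)
    delays = []
    for n in range(attempts):
        if kind == "network":
            delay = min(0.25 * (n + 1), 16.0)  # assumed: +0.25 s per attempt
        elif kind == "http":
            delay = min(5.0 * 2 ** n, 320.0)  # assumed: doubles from 5 s
        else:
            raise ValueError(f"unknown error kind: {kind!r}")
        delays.append(delay)
    return delays


print(reconnect_delays(4, "network"))  # [0.25, 0.5, 0.75, 1.0]
print(reconnect_delays(8, "http"))  # [5.0, 10.0, 20.0, 40.0, 80.0, 160.0, 320.0, 320.0]
print(reconnect_delays(8, "http", max_retries=3))  # [5.0, 10.0, 20.0]
```

The idea behind the two schedules is that network errors are often brief, so the client retries quickly. HTTP errors often indicate a server-side problem or rate limiting, so the client backs off faster to avoid adding load.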
Stream.on_request_error() / StreamingClient.on_request_error() is also passed the HTTP status code that was encountered. The HTTP status codes reference for the Twitter API can be found at https://developer.twitter.com/en/support/twitter-api/error-troubleshooting.
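Because on_request_error() receives the status code, an override can decide whether retrying makes sense at all. The policy below is a hypothetical example and not Tweepy's default behavior. It treats 401 (bad credentials) and 403 (forbidden) as permanent, and every other code as transient. A toy class with its own `disconnect()` stands in for a real subclass. In a real subclass, the same branch would call the stream's own disconnect().

```python
import logging

log = logging.getLogger("toy_stream")

# Hypothetical policy: status codes that retrying is unlikely to fix.
PERMANENT_ERRORS = {401, 403}


class ToyErrorHandlingStream:
    """Hypothetical stand-in for a Stream/StreamingClient subclass."""

    def __init__(self):
        self.running = True

    def disconnect(self):
        # A real stream closes its connection here; the toy just flips a flag.
        self.running = False

    def on_request_error(self, status_code):
        if status_code in PERMANENT_ERRORS:
            log.error("Stopping stream: HTTP %d is not retryable", status_code)
            self.disconnect()
        else:
            log.warning("HTTP %d: leaving reconnection to the stream", status_code)


stream = ToyErrorHandlingStream()
stream.on_request_error(429)  # transient: keep going
print(stream.running)  # True
stream.on_request_error(401)  # permanent: stop
print(stream.running)  # False
```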
Stream.on_exception() / StreamingClient.on_exception() is called when an unhandled exception occurs. This is fatal to the stream, and by default, an exception is logged.
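An unhandled exception ends the stream, so a common defensive pattern is to catch errors inside your own handler. That way, one malformed message does not stop processing. The toy loop below is hypothetical and does not reproduce Tweepy's internals. It models both paths. The unguarded subclass stops at the first bad message, because the exception reaches the outer `on_exception` hook. The guarded subclass logs and skips bad messages.

```python
import logging

log = logging.getLogger("toy_client")


class ToyClient:
    """Hypothetical model: an exception escaping a handler ends the stream."""

    def run(self, messages):
        try:
            for message in messages:
                self.on_tweet(message)
        except Exception as exc:
            self.on_exception(exc)  # fatal: the loop is not resumed

    def on_tweet(self, tweet):
        pass

    def on_exception(self, exception):
        log.exception("Stream encountered an exception")


class FragileCollector(ToyClient):
    """No guard: the first malformed message stops processing."""

    def __init__(self):
        self.ids = []
        self.stopped_by = None

    def on_tweet(self, tweet):
        self.ids.append(int(tweet["id"]))

    def on_exception(self, exception):
        self.stopped_by = exception


class SafeCollector(ToyClient):
    """Guarded: malformed messages are logged and skipped."""

    def __init__(self):
        self.ids = []

    def on_tweet(self, tweet):
        try:
            self.ids.append(int(tweet["id"]))
        except (KeyError, ValueError):
            log.warning("Skipping malformed tweet: %r", tweet)


messages = [{"id": "1"}, {"text": "no id"}, {"id": "oops"}, {"id": "4"}]

fragile = FragileCollector()
fragile.run(messages)
print(fragile.ids)  # [1]

safe = SafeCollector()
safe.run(messages)
print(safe.ids)  # [1, 4]
```

Catching only the specific exceptions you expect, such as KeyError and ValueError here, lets genuinely unexpected errors still reach on_exception, where they are logged.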