Social Media Analytics (SMA)
This topic has 3 replies, 2 voices, and was last updated 2 months, 4 weeks ago by Taufik Sutanto.
25/10/2020 at 20:48 #6802
Nurul Jannah (Participant)
I'm Nurul Jannah, and I'd like to ask a question.
I'm trying to stream Twitter data for a few specific keywords and insert it directly into an Elasticsearch database. However, after it has run for a while, the error `ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read, 1 more expected)', IncompleteRead(0 bytes read, 1 more expected))` appears. How can I solve this? Also, can (and may) the data that was already collected and stored in the database before the error occurred be used for research?
The code I'm using is the code from module sma_01 (with a few modifications), shown below:
```python
import warnings; warnings.simplefilter('ignore')
from elasticsearch import Elasticsearch as Es
from twython import Twython, TwythonStreamer

# Connect to the local Elasticsearch server
server, port, timeout = 'localhost', 9200, 30  # local host = 172.27.0.1
try:
    conEs = Es([{'host': server, 'port': port, 'timeout': timeout}], verify_certs=True)
    if conEs.ping():
        print('Connected to ElasticSearch, connection = "conEs"')
    else:
        raise ValueError("Error 01, cannot connect to ElasticSearch. Make sure the ES server is running and the server port and IP are correct")
except:
    print('Error 02, cannot connect to ElasticSearch. Make sure the ES server is running and the server port and IP are correct')

def loadKeys(file='twitter_API.txt'):
    # One key per line: consumer key, consumer secret, access token, access secret
    with open(file, 'r', encoding="utf-8", errors='replace') as f:
        keys = [k.strip() for k in f.readlines()]
    return keys

Ck, Cs, At, As = loadKeys()

# Check the API credentials
try:
    twitter = Twython(Ck, Cs, At, As)
    user = twitter.verify_credentials()
    print('Welcome "%s" you are now connected to twitter server' % user['name'])
except:
    print("Connection failed, please check your API keys or connection")

def streamToElastic(topicS, lang):
    class MyStreamer(TwythonStreamer):
        def on_success(self, data):
            # Called for every message received from the stream
            global count
            count += 1
            D = {"created_at": data['created_at'],
                 "username": data['user']['screen_name'],
                 "tweet": data['text'],
                 "id": data['id'],
                 "hashtags": data['entities']['hashtags'],
                 "user_mentions": data['entities']['user_mentions'],
                 "fav_count": data['user']['favourites_count'],
                 "statuses_count": data['user']['statuses_count'],
                 "followers_count": data['user']['followers_count'],
                 "friends_count": data['user']['friends_count'],
                 "favourites_count": data['user']['favourites_count'],
                 "verified": data['user']['verified'],
                 "retweet_count": data['retweet_count'],
                 "favorite_count": data['favorite_count']}
            conEs.index(index="tokped_tweet", body=D)  # one document per tweet
            if count == maxTweet:
                print('\nFinished streaming %.0f tweets' % (maxTweet))
                self.disconnect()

        def on_error(self, status_code, data, headers=None):
            # headers=None also accepts Twython versions that pass response headers
            print('Error Status = %s' % status_code)
            self.disconnect()

    # Note: lang is accepted but not passed to filter()
    while count < maxTweet:
        stream = MyStreamer(Ck, Cs, At, As)
        stream.statuses.filter(track=topicS)

maxTweet, count = 5000, 0
lang = set(['id', 'en'])
topicS = ['XXXX', 'XXXXXX', '@XXXXXX', '#XXXXXX']
streamToElastic(topicS, lang)
```
I'd appreciate any solution. Thank you.
31/10/2020 at 10:46 #6814
Taufik Sutanto (Keymaster)
It looks like the error does not come from Elasticsearch but from the streamer settings: the request cannot release all of its connection resources when the response is closed. Please try the solution here: https://stackoverflow.com/questions/26638329/incompleteread-error-when-retrieving-twitter-data-using-python
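A minimal sketch of one common workaround, assuming the dropped connection is transient: catch the exception raised when the stream breaks and reconnect, waiting a little longer after each failure. This is not necessarily the exact fix given in the linked answer, and the helper name `stream_with_reconnect` and its parameters are hypothetical. To keep the sketch self-contained (standard library only), the exception types to retry on are passed in rather than imported.

```python
import time
from typing import Callable, Tuple, Type


# Hypothetical helper; not part of Twython or the sma_01 module.
def stream_with_reconnect(
    start_stream: Callable[[], None],
    is_done: Callable[[], bool],
    retry_on: Tuple[Type[BaseException], ...],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call start_stream() until is_done() returns True, reconnecting after
    any exception listed in retry_on. Returns the number of reconnects."""
    reconnects = 0
    while not is_done():
        try:
            start_stream()  # blocks until the stream ends or raises
        except retry_on as exc:
            if reconnects >= max_retries:
                raise  # give up and surface the original error
            delay = base_delay * (2 ** reconnects)  # exponential backoff
            print(f"Stream dropped ({type(exc).__name__}); reconnecting in {delay:.0f}s")
            sleep(delay)
            reconnects += 1
    return reconnects
```

In the question's `streamToElastic`, the `while count<maxTweet:` loop could be replaced by `stream_with_reconnect(lambda: MyStreamer(Ck, Cs, At, As).statuses.filter(track=topicS), lambda: count >= maxTweet, (requests.exceptions.ChunkedEncodingError,))`, with `import requests` added at the top. The retry counter here is cumulative over the whole run rather than reset after a successful reconnect, which is a simplifying choice.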
01/12/2020 at 21:40 #6967
Nurul Jannah (Participant)
Thank you for the answer, sir. I have another, related question.
It concerns loading the streamed data, which I saved in JSON format, and then building a DataFrame from it. When I try to build the DataFrame, this error appears:
```
KeyError                                  Traceback (most recent call last)
<ipython-input-5-ffdca58b7d32> in <module>
      1 tweets = pd.DataFrame()
      2 
----> 3 tweets['text'] = list(map(lambda tweet: tweet['text'], tweets_data))
      4 tweets['lang'] = list(map(lambda tweet: tweet['lang'], tweets_data))
      5 tweets['created_at'] = list(map(lambda tweet: tweet['created_at'], tweets_data))

<ipython-input-5-ffdca58b7d32> in <lambda>(tweet)
      1 tweets = pd.DataFrame()
      2 
----> 3 tweets['text'] = list(map(lambda tweet: tweet['text'], tweets_data))
      4 tweets['lang'] = list(map(lambda tweet: tweet['lang'], tweets_data))
      5 tweets['created_at'] = list(map(lambda tweet: tweet['created_at'], tweets_data))

KeyError: 'text'
```
Yet I had already made sure that the keys I used exist in my data. Here is the code I'm using:
```python
import json
import pandas as pd

tweets_data = []

def choose(x):
    path = "C:/Users/.txt"
    # Read the stream file line by line; each line should hold one JSON object
    with open(path, "r") as tweets_file:
        for line in tweets_file:
            try:
                tweet = json.loads(line)
                tweets_data.append(tweet)
            except:
                continue  # silently skip lines that are not valid JSON

data = 'Full'
choose(data)
print('Loading dataset...')

tweet = tweets_data[0]
print(tweet.keys())

# Build one column at a time from the list of tweet dictionaries
tweets = pd.DataFrame()
tweets['text'] = list(map(lambda tweet: tweet['text'], tweets_data))
tweets['lang'] = list(map(lambda tweet: tweet['lang'], tweets_data))
tweets['created_at'] = list(map(lambda tweet: tweet['created_at'], tweets_data))
tweets['favorite_count'] = list(map(lambda tweet: tweet['favorite_count'], tweets_data))
tweets['retweet_count'] = list(map(lambda tweet: tweet['retweet_count'], tweets_data))
tweets['mentioned_user'] = list(map(lambda tweet: tweet['entities']['user_mentions'] if tweet['entities'] is not None else None, tweets_data))
tweets['hashtags'] = list(map(lambda tweet: tweet['entities']['hashtags'] if tweet['entities'] is not None else None, tweets_data))
tweets['screen_name'] = list(map(lambda tweet: tweet['user']['screen_name'] if tweet['user'] is not None else None, tweets_data))
tweets['name'] = list(map(lambda tweet: tweet['user']['name'] if tweet['user'] is not None else None, tweets_data))
tweets['followers_count'] = list(map(lambda tweet: tweet['user']['followers_count'] if tweet['user'] is not None else None, tweets_data))
tweets['friends_count'] = list(map(lambda tweet: tweet['user']['friends_count'] if tweet['user'] is not None else None, tweets_data))
tweets['statuses_count'] = list(map(lambda tweet: tweet['user']['statuses_count'] if tweet['user'] is not None else None, tweets_data))
print('Number of tweets: ', len(tweets))
tweets.head()
```
06/12/2020 at 10:34 #6975
Taufik Sutanto (Keymaster)
Please run the following command:
```python
print(tweets.columns)
```
Most likely the column “text” does not exist in that DataFrame.
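The traceback also shows that the `KeyError` is raised inside the `lambda` passed to `map`, that is, while reading `tweet['text']` from one of the dictionaries in `tweets_data`. The earlier check printed only the keys of `tweets_data[0]`, so a single record lacking `text` anywhere else in the file is enough to trigger it. Files written from the v1.1 streaming API can contain non-tweet messages such as limit notices (`{"limit": {...}}`) or delete notices. The sketch below is one way to check for this; the helper names `load_json_lines`, `split_tweets`, and `to_rows` are hypothetical, and the columns in `to_rows` are a subset chosen for illustration.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def load_json_lines(lines: Iterable[str]) -> Tuple[List[object], int]:
    """Parse one JSON value per line; return the parsed values and the
    number of lines that failed to parse (instead of skipping silently)."""
    records, bad = [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            bad += 1
    return records, bad


def split_tweets(records: Iterable[object]) -> Tuple[List[dict], Counter]:
    """Keep dict records that have a 'text' key; count the top-level keys
    of everything else so you can see what kind of messages were skipped."""
    tweets, skipped = [], Counter()
    for rec in records:
        if isinstance(rec, dict) and 'text' in rec:
            tweets.append(rec)
        elif isinstance(rec, dict):
            skipped.update(rec.keys())  # e.g. 'limit' for a limit notice
        else:
            skipped['<non-dict>'] += 1
    return tweets, skipped


def to_rows(tweets: Iterable[dict]) -> List[dict]:
    """One flat dict per tweet; .get() turns a missing field into None."""
    rows = []
    for t in tweets:
        user = t.get('user') or {}
        entities = t.get('entities') or {}
        rows.append({
            'text': t.get('text'),
            'lang': t.get('lang'),
            'created_at': t.get('created_at'),
            'screen_name': user.get('screen_name'),
            'followers_count': user.get('followers_count'),
            'hashtags': entities.get('hashtags'),
        })
    return rows
```

Usage would be along the lines of `with open(path, encoding="utf-8") as f: records, bad = load_json_lines(f)`, then `tweets_only, skipped = split_tweets(records)` and `print(bad, skipped)` to see what was dropped, and finally `pd.DataFrame(to_rows(tweets_only))` to build the frame in one step instead of one `map` per column.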