So I'm writing a script that parses sites and writes the retrieved data to a CSV file.
here is the code:

from bs4 import BeautifulSoup
import requests
import csv

r = requests.get('http://www.mediadata.it/en/aziende-comunicatori/elenco/{}/')
data = r.text
soup = BeautifulSoup(data, "html.parser")

with open('mbsmediadata.csv', 'w') as csvfile:
    fieldnames = ['nome', 'responsabili', 'email', 'posizione']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    for i, j, z, y in zip(soup.find_all('h5', attrs={'class': 'ng-binding'})):
        writer.writeheader()
        writer.writerow({'nome': i.text, 'responsabili': j.text, 'email': z.text, 'posizione': y.text})
but the output format is a mess. I've read a lot of documentation and previous questions, and even though .format() doesn't raise a syntax error, it doesn't actually substitute anything into the URL.
The second issue is that the fieldnames end up written before every row, and Google Sheets imports only those fieldnames.
Any idea how to fix this?
pic related, it's the badly formatted output
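For reference, both problems come down to two lines: str.format() returns a new string (it does not modify the template in place, so its result has to be used), and writeheader() belongs outside the loop so the header is written once. A minimal sketch with made-up row data (the dicts below are not real scraped values):

```python
import csv
import io

# str.format returns a new string; the original template is unchanged
url_template = 'http://www.mediadata.it/en/aziende-comunicatori/elenco/{}/'
page_url = url_template.format(1)

# stand-in rows just to demonstrate the CSV layout
rows = [
    {'nome': 'Acme', 'responsabili': 'M. Rossi', 'email': 'm@acme.it', 'posizione': 'Milano'},
    {'nome': 'Beta', 'responsabili': 'L. Bianchi', 'email': 'l@beta.it', 'posizione': 'Roma'},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=['nome', 'responsabili', 'email', 'posizione'])
writer.writeheader()        # called once, before the loop
for row in rows:
    writer.writerow(row)    # data rows only, no repeated header

print(page_url)
print(buf.getvalue())
```

With writeheader() outside the loop, Google Sheets sees a single header row followed by plain data rows.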
>>62279848
this is the wrong way of doing it
Read the page source and find the actual data source. The page is built with Angular, so there is almost certainly a REST endpoint providing the data. Find that endpoint and scrape it instead.
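A minimal sketch of that approach. The endpoint URL and response shape here are entirely made up; the real ones have to be read out of the browser's network tab, after which you'd just call requests.get(endpoint).json(). The payload below simulates what such an endpoint might return, to show the parsing step:

```python
import json

# Hypothetical JSON payload standing in for the endpoint's response
payload = '{"results": [{"nome": "Acme", "email": "info@acme.it"}]}'

# With the real endpoint this would be: records = requests.get(endpoint).json()['results']
records = json.loads(payload)['results']
for r in records:
    print(r['nome'], r['email'])
```

Scraping the JSON endpoint skips the HTML parsing entirely and is far less fragile than matching CSS classes.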
>>62279893
I pasted the wrong code, bro:

from bs4 import BeautifulSoup
import requests
import csv

r = requests.get('https://www.paginegialle.it/ricerca/pizzerie/Milano?mr=50')
data = r.text
soup = BeautifulSoup(data, "html.parser")

with open('mbsprprova.csv', 'w') as csvfile:
    fieldnames = ['nome', 'indirizzo', 'telefono']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    for i, j, z in zip(soup.find_all('span', attrs={'itemprop': 'name'}),
                       soup.find_all('span', attrs={'class': 'street-address'}),
                       soup.find_all('div', attrs={'class': 'tel elementPhone'})):
        writer.writeheader()
        writer.writerow({'nome': i.text, 'indirizzo': j.text, 'telefono': z.text})
I like your chaining solution here, but I'm not sure how you'd fix the address that way.
Pbin ZRfd5Kch
here, make something of yourself, kiddo:

from bs4 import BeautifulSoup
import requests

data = requests.get('https://www.paginegialle.it/ricerca/pizzerie/Milano?mr=50')
soup = BeautifulSoup(data.text, "lxml")

businesses = []
# map the page's CSS class names onto our own field names
mapping = {
    'street-address': 'address',
    'postal-code': 'postcode',
    'locality': 'city',
    'region': 'state',
}

for i, j, z in zip(soup.find_all('span', attrs={'itemprop': 'name'}),
                   soup.find_all('div', attrs={'itemprop': 'address'}),
                   soup.find_all('div', attrs={'class': 'tel elementPhone'})):
    data = {}
    data['name'] = i.text.strip()
    # the address div holds one span per component (street, postcode, city, region)
    for addressfield in j.find_all('span'):
        tomap = str(addressfield.attrs['class'][0])
        data[mapping[tomap]] = addressfield.text.strip()
    # a business can list several comma-separated phone numbers;
    # strip each one (a bare map() here would be a no-op in Python 3)
    data['telephones'] = [x.strip() for x in z.text.strip().split(',')]
    print(data)
    businesses.append(data)
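To get back to the OP's goal, the collected businesses list can then be dumped to CSV in one pass. A hedged sketch with a made-up sample record (the real list comes from the scraping loop above); restval fills in any address component a listing is missing, and the phone list is joined so it fits in one cell:

```python
import csv

# stand-in for the scraped `businesses` list
sample = [
    {'name': 'Pizzeria Uno', 'address': 'Via Roma 1', 'postcode': '20100',
     'city': 'Milano', 'state': 'MI', 'telephones': ['02 1234567']},
]

fieldnames = ['name', 'address', 'postcode', 'city', 'state', 'telephones']
with open('businesses.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames, restval='')
    writer.writeheader()                     # single header row
    for b in sample:
        # join multiple phone numbers into one cell
        writer.writerow(dict(b, telephones='; '.join(b['telephones'])))
```

One header row, one data row per business, and no repeated fieldnames for Google Sheets to choke on.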