[python] A Baby Chick's Guide to Python Web Scraping - requests/beautifulsoup

Web scraping....
It's been a really long time...

This is my first time doing web scraping in Python.

Let's go - !

Basic usage

from bs4 import BeautifulSoup

# Pass the name of the file you want to read
with open("website.html", encoding="utf-8") as file:
    contents = file.read()

soup = BeautifulSoup(contents, 'html.parser')

# Pretty-print with indentation for easier reading
# print(soup.prettify())

# Get every anchor (<a>) tag
all_a = soup.find_all(name="a")

for tag in all_a:
    # Show only the text, without the tag
    print(tag.get_text())

    # Show only the hyperlink
    print(tag.get("href"))

# Get a specific tag with a specific id
heading = soup.find(name="h1", id="name")
print(heading)

# Get a specific tag with a specific class
# (class is a reserved word in Python, so the keyword is class_)
section_heading = soup.find(name="h3", class_="heading")
print(section_heading.get_text())

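# select_one returns the first match: here, an <a> inside a <p>
company_url = soup.select_one("p a")
print(company_url)

# Use a CSS selector to pick an element by id (#name)
name = soup.select_one(selector="#name")
print(name)

# Use a CSS selector to pick every element with the class (.heading); returns a list
names = soup.select(".heading")
print(names)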
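Since website.html itself isn't shown here, below is a small sketch that runs the same calls against an inline HTML string. The markup in SAMPLE_HTML (the names, links, and headings) is made up just for this example, so treat it as a stand-in and not the real file. It also shows one thing to watch out for: find and select_one return None when nothing matches, so calling get_text() on the result without checking would raise an AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for website.html (made up for this example)
SAMPLE_HTML = """
<html><body>
  <h1 id="name">Chick</h1>
  <p>Works at <a href="https://www.example.com/company">Example Co</a></p>
  <h3 class="heading">Books</h3>
  <h3 class="heading">Hobbies</h3>
  <a href="https://www.example.com/contact">Contact</a>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all returns every match in document order
link_texts = [a.get_text() for a in soup.find_all("a")]
link_urls = [a.get("href") for a in soup.find_all("a")]

# find returns only the first match
heading_text = soup.find("h1", id="name").get_text()

# CSS selectors: .heading is a class, "p a" is an <a> inside a <p>
section_titles = [h.get_text() for h in soup.select(".heading")]
first_p_link = soup.select_one("p a").get("href")

# Nothing matches here, so the result is None; check before using it
missing = soup.find("h2")
missing_text = missing.get_text() if missing is not None else ""

print(link_texts)
print(link_urls)
print(heading_text, section_titles, first_p_link)
```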

Getting article titles, links, and upvote counts with web scraping

from bs4 import BeautifulSoup
import requests

response = requests.get("https://news.ycombinator.com/news")
yc_web_page = response.text

soup = BeautifulSoup(yc_web_page, "html.parser")
body = soup.find(name="body")
titleline = body.find_all(name="span", class_="titleline")
article_texts = []
article_links = []

upvote = body.find_all(name="span", class_="score")

for tag in titleline:
    article_texts.append(tag.find(name="a").get_text())
    article_links.append(tag.find(name="a").get("href"))

# "123 points" -> 123
article_upvotes = [int(tag.get_text().split()[0]) for tag in upvote]


# Find the index of the article with the most upvotes
lar_num = max(article_upvotes)
lar_index = article_upvotes.index(lar_num)


# print(article_texts)
# print(article_links)
# print(article_upvotes)
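One thing that can go wrong with the three separate lists: if some entry on the page has a title but no score (a job posting, for example), the upvote list ends up shorter than the title list and the index from max no longer points at the right article. Here is a rough sketch of one way around that, reading each title together with its own score. The SAMPLE_HTML markup (tr.athing rows followed by a detail row) is a simplified guess at the page layout, and parse_articles is a helper name made up for this example, so the real page may need different selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup modelled on the listing page; the real page may differ
SAMPLE_HTML = """
<table>
  <tr class="athing"><td><span class="titleline"><a href="https://example.com/a">First story</a></span></td></tr>
  <tr><td class="subtext"><span class="score">120 points</span></td></tr>
  <tr class="athing"><td><span class="titleline"><a href="https://example.com/job">Hiring post</a></span></td></tr>
  <tr><td class="subtext">2 hours ago</td></tr>
  <tr class="athing"><td><span class="titleline"><a href="https://example.com/b">Second story</a></span></td></tr>
  <tr><td class="subtext"><span class="score">305 points</span></td></tr>
</table>
"""


def parse_articles(html):
    """Return one dict per article so title, link, and upvotes stay together."""
    soup = BeautifulSoup(html, "html.parser")
    articles = []
    for row in soup.select("tr.athing"):
        link = row.select_one("span.titleline a")
        if link is None:
            continue  # skip rows without a title link
        # Assumption: the score lives in the row right after the title row
        detail_row = row.find_next_sibling("tr")
        score = detail_row.select_one("span.score") if detail_row else None
        # "120 points" -> 120; entries without a score count as 0
        upvotes = int(score.get_text().split()[0]) if score else 0
        articles.append(
            {"title": link.get_text(), "link": link.get("href"), "upvotes": upvotes}
        )
    return articles


articles = parse_articles(SAMPLE_HTML)

# Pick the article with the most upvotes directly, no index bookkeeping needed
top = max(articles, key=lambda article: article["upvotes"])
print(top["title"], top["link"], top["upvotes"])
```

To try it on the live page, pass response.text to parse_articles in place of SAMPLE_HTML, and adjust the selectors if the markup turns out to be different.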

