In Austria, the main portals are willhaben.at, immowelt.at and immobilienscout24.at, and none of them offers a publicly available API.
[DE] Eine Kindheit wie im Bilderbuch. Am Wochenende aufs Radl schwingen und direct von zu Hause die Natur entdecken, alte Donau, Lobau, Donauinsel, alles ums Eck. Blumig auch die Straßennamen: Zinienweg, Fuchsienweg, Palargonienweg, Oleanderweg, Azaleengasse, Ginsterweg und AGAVENWEG … Duftiger geht's wohl nicht.
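[EN] Experience a picture-book childhood! Imagine hopping on your bike at the weekend and exploring the wonders of nature right from your front door. The charming Old Danube is just a stone's throw away, so adventure is always within reach. Even the street names are full of floral fragrance: Zinienweg, Fuchsienweg, Oleanderweg, Azaleengasse, Ginsterweg and the delightful AGAVENWEG… Could you imagine a more fragrant, idyllic setting?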
Calculate the distance to POIs
- Kindergarten <500 m
- School <1,500 m
- Supermarket <1,000 m
- Bakery <2,500 m
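The thresholds above can be kept as a small lookup table and checked per plot. A minimal sketch, assuming the distances (in metres) have already been calculated elsewhere; `POI_THRESHOLDS_M` and `poi_checks` are hypothetical names, not code from this project.

```ruby
# Maximum acceptable distance (in metres) to each type of POI, per the list above
POI_THRESHOLDS_M = {
  kindergarten: 500,
  school: 1_500,
  supermarket: 1_000,
  bakery: 2_500
}.freeze

# Returns a hash of POI type => true/false (closer than the threshold or not).
# A POI with no known distance (nil) counts as failing the check.
def poi_checks(distances_m)
  POI_THRESHOLDS_M.to_h do |poi, max_m|
    distance = distances_m[poi]
    [poi, !distance.nil? && distance < max_m]
  end
end

checks = poi_checks({ kindergarten: 420, school: 1_800, supermarket: 950, bakery: 2_100 })
puts checks
```

Treating a missing distance as a failure keeps plots with unknown data from silently passing the filter.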
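99% of listings in Austria don't contain any usable address information, not even a street name. As you can imagine, within a single village there is a huge difference in noise and traffic between a plot on the main road and one on a small side street. That information simply can't be found in the listing itself.
Living in Vienna while looking for plots about 45 minutes away made scheduling viewings a challenge. Even when we bundled appointments, the process was still time-consuming and stressful. In many cases, just seeing the village was enough to settle a lot of questions: highway noise, visible power lines or a steep slope could rule a plot out immediately.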
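- Support for different views: Excel view, Kanban view, map view
- Structured data for filtering and sorting, plus basic calculations
- Ability to attach images and PDFs
- Ability to add notes to each listing
- Ability to manage the status of each listing (seen, interested, visited, etc.)
- Shareable with my fiancée
- Easy to use on the go (from the passenger seat)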
A simple Telegram bot
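Whenever a new listing arrived by email, we manually went through it and did a first check of the overall vibe, the price and the location of the village. Only if we were genuinely interested did we add it to our Airtable.
So I wrote a simple Telegram bot that we can send a listing link to, and it processes the listing for us.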
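The bot's own code isn't shown in the post. As a hedged sketch of its first step, here is one way to pick the listing link out of an incoming chat message before handing it to the headless browser. The helper name and the host check are assumptions; receiving messages from Telegram (e.g. via the Bot API's `getUpdates` or a webhook) is left out.

```ruby
require "uri"

# The three portals mentioned above; links to other sites are ignored
SUPPORTED_HOSTS = %w[willhaben.at immowelt.at immobilienscout24.at].freeze

# Returns the first http(s) link in a chat message that points to a supported
# listing portal (www. and other subdomains included), or nil if there is none.
def extract_listing_url(message_text)
  message_text.to_s.scan(%r{https?://\S+}).each do |candidate|
    candidate = candidate.sub(/[.,;:!?)\]]+\z/, "") # strip trailing punctuation
    begin
      host = URI.parse(candidate).host.to_s
    rescue URI::InvalidURIError
      next
    end
    return candidate if SUPPORTED_HOSTS.any? { |d| host == d || host.end_with?(".#{d}") }
  end
  nil
end
```

For example, `extract_listing_url("Schau mal: https://www.willhaben.at/iad/immobilien/d/123456.")` returns the link without the trailing period, while a link to any other site yields `nil`.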
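The simplest and most direct way to keep a copy of a listing is to use a headless browser to access the listing's description and its images. I used the ferrum Ruby gem for this, but any similar technology would work. First, we open the page and prepare it for a screenshot: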
```ruby
require "ferrum"

browser = Ferrum::Browser.new
browser.goto("https://immowelt.at/expose/123456789") # Open the listing

# Prepare the website: depending on the page, you might want to remove some elements to see the full content
if browser.current_url.include?("immowelt.at")
  # Remove the cookie consent overlay (ignore the error if it isn't there)
  browser.execute("document.getElementById('usercentrics-root').remove()") rescue nil
  # Press all links with the class ".link--read-more"; trigger via JS as it doesn't work with the driver
  browser.execute("document.querySelectorAll('.link--read-more').forEach(function(el) { el.click() })")
elsif browser.current_url.include?("immobilienscout24.at")
  # Expand the full description via the "Beschreibung lesen" ("read description") button
  browser.execute("document.querySelectorAll('button').forEach(function(el) { if (el.innerText.includes('Beschreibung lesen')) { el.click() } })")
end
```
Once the website is prepared, we simply take a screenshot of the entire page and save the HTML so we can access it later:
```ruby
screenshot_path = "listing.png"

# Take a screenshot of the full page
browser.screenshot(path: screenshot_path, full: true)

# Save the HTML to have access later
File.write("listing.html", browser.body)

# Find all images referenced on the page
all_images = browser.css("img").map do |img|
  { name: img.attribute("alt"), src: img.attribute("src") }
end

# `all_images` will contain a lot of non-relevant images, such as logos, etc.
# Below some messy code to get rid of the majority (images without a src are skipped too)
image_links = all_images.select do |node|
  src = node[:src].to_s
  src.start_with?("http") &&
    !src.include?(".svg") &&
    !src.include?("facebook.com")
end
```
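Important: all data processing was done manually, case by case, and only for listings we were genuinely interested in. Over several months we processed a total of 55 listings from 3 different websites, and we never engaged in automated scraping or violated any platform's terms of service.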
Next, we define the questions the AI should answer for each listing:

```ruby
generic_context = "You are helping a customer search for a property. The customer has shown you a listing for a property they want to buy. You want to help them find the most important information about this property. For each bullet point, please use the specified JSON key. Please answer the following questions:"

prompts = [
  "title: The title of the listing",
  "price: How much does this property cost? Please only provide the number, without any currency or other symbols.",
  "size: The total plot area (Gesamtgrundfläche) of the property in m². If multiple areas are provided, please specify '-1'.",
  "building_size: The buildable area or developable area (or the building site) in m². If percentages for buildability are mentioned, please provide those. If no information is available, please provide '-1'.",
  "address: The address, or the street + locality. Please format it in the customary Austrian way. If no exact street or street number is available, please only provide the locality.",
  "other_fees: Any additional fees or costs (excluding broker's fees) that arise either upon purchase or afterward. Please answer in text form. If no information is available, please respond with an empty string ''.",
  "connected: Is the property already connected (for example, electricity, water, road)? If no information is available, please respond with an empty string ''.",
  "noise: Please describe how quiet or how loud the property is. Additionally, please mention if the property is located on a cul-de-sac. If no details are provided, please use an empty string ''. Please use the exact wording from the advertisement.",
  "accessible: Please reproduce, word-for-word, how the listing describes the accessibility of the property. Include information on how well public facilities can be reached, whether by public transport, by car, or on foot. If available, please include the distance to the nearest bus or train station.",
  "nature: Please describe whether the property is near nature, whether there is a forest or green space nearby, or if it is located in a development, etc. If no information is available, respond with an empty string ''.",
  "orientation: Please describe the orientation of the property. Is it facing south, north, east, west, or a combination? If no information is available, respond with an empty string ''.",
  "slope: Please describe whether the property is situated on a slope or is flat. If it is on a slope, please include details on how steep it is. If no information is available, respond with an empty string ''.",
  "existingBuilding: Please describe whether there is an existing old building on the property. If there is, please include details. If no information is available, respond with an empty string ''.",
  "summary: A summary of this property's advertisement in bullet points. Please include all important and relevant information that would help a buyer make a decision, specifically regarding price, other costs, zoning, building restrictions, any old building, a location description, public transport accessibility, proximity to Vienna, neighborhood information, advantages or special features, and other standout aspects. Do not mention any brokerage commission or broker's fee. Provide the information as a bullet-point list. If there is no information about a specific topic, please omit that bullet point entirely. Never say 'not specified' or 'not mentioned' or anything similar. Please do not use Markdown."
]
```
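Now we need the full text of the listing. The ferrum gem does a lot of the magic here, giving easy access to the text without having to parse the HTML yourself: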
```ruby
full_text = browser.at_css("body").text
```
All that's left is to actually call the OpenAI API (or a similar one) to get answers to the questions:
```ruby
# `ai` is a small wrapper around the OpenAI (or similar) API
ai_responses = ai.ask(prompts: prompts, context: full_text)
```
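The internals of `ai.ask` aren't shown. One plausible way to build such a request is sketched below, assuming OpenAI's Chat Completions endpoint with JSON mode (`response_format: { type: "json_object" }`). The model name and the helper names `expected_keys`, `build_chat_request` and `parse_ai_response` are placeholders; the HTTP call itself is left out so the sketch stays self-contained.

```ruby
require "json"

# Each prompt starts with "key: ...", so the expected JSON keys can be read off the prompt list
def expected_keys(prompts)
  prompts.map { |p| p.split(":", 2).first.strip }
end

# Builds a request body for OpenAI's Chat Completions API (POST /v1/chat/completions).
# "gpt-4o-mini" is a placeholder model name.
def build_chat_request(generic_context, prompts, full_text, model: "gpt-4o-mini")
  questions = prompts.map { |p| "- #{p}" }.join("\n")
  {
    model: model,
    response_format: { type: "json_object" }, # ask the model to reply with a single JSON object
    messages: [
      { role: "system", content: "#{generic_context}\n\n#{questions}\n\nReply with a JSON object." },
      { role: "user", content: full_text }
    ]
  }
end

# Parses the model's JSON reply and keeps only the keys we asked for
def parse_ai_response(content, prompts)
  JSON.parse(content).slice(*expected_keys(prompts))
end
```

To send it, POST `build_chat_request(...).to_json` to `https://api.openai.com/v1/chat/completions` with an `Authorization: Bearer <API key>` header; the reply text for `parse_ai_response` is at `choices[0].message.content`. JSON mode requires the word "JSON" to appear in the messages, which the system message covers.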
To upload the resulting listing to Airtable, I used the airrecord gem.
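```ruby
require "airrecord"

Airrecord.api_key = ENV["AIRTABLE_API_KEY"]

# Airtable table the listings are stored in (base key and table name are placeholders)
class MyEntry < Airrecord::Table
  self.base_key = "appXXXXXXXXXXXXXX"
  self.table_name = "Listings"
end

create_hash = {
  "Title" => ai_responses["title"],
  "Price" => ai_responses["price"].to_i,
  "Noise" => ai_responses["noise"],
  "URL" => browser.current_url,
  "Summary" => "- " + Array(ai_responses["summary"]).join("\n- ")
}
new_entry = MyEntry.create(create_hash)
```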
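One caveat with `.to_i` on the price: if the model ever answers in Austrian notation (e.g. `350.000` with a dot as the thousands separator), `"350.000".to_i` returns `350`. A small defensive parser is sketched below, assuming prices are whole euros written in German/Austrian notation; `parse_price` is a hypothetical helper.

```ruby
# Parses a price such as "350000", "350.000", "€ 350.000,-" or "350 000 EUR"
# into an Integer number of euros. Assumes German/Austrian notation:
# "." (or a space) groups thousands and "," starts the cents part, which is dropped.
# Returns nil if no digits are found.
def parse_price(raw)
  integer_part = raw.to_s.split(",", 2).first.to_s # drop ",-" or ",00"
  digits = integer_part.gsub(/\D/, "")             # drop currency symbols, dots and spaces
  digits.empty? ? nil : digits.to_i
end
```

It would slot in as `"Price" => parse_price(ai_responses["price"])`. Note that an English-style decimal such as `350000.50` would be misread, which is the trade-off of assuming Austrian notation.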
The screenshots need a bit of extra boilerplate: first upload the image to a temporary S3 bucket, then attach it to the record via the Airtable API.
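The S3 detour is needed because Airtable attachment fields are written as an array of objects with a publicly reachable `url` (plus an optional `filename`); Airtable then downloads and stores the file itself. A minimal sketch of building that value from already-uploaded URLs; `attachment_field` and the field name `"Screenshots"` below are hypothetical.

```ruby
require "uri"

# Builds the value for an Airtable attachment field from public (e.g. temporary S3) URLs.
# The filename is taken from the URL path, ignoring any query string.
def attachment_field(urls)
  urls.map do |url|
    { "url" => url, "filename" => File.basename(URI.parse(url).path) }
  end
end

# Possible usage with airrecord (not run here):
#   new_entry["Screenshots"] = attachment_field([presigned_screenshot_url])
#   new_entry.save
```

The S3 object only needs to stay reachable until Airtable has fetched it, which is why a temporary bucket or short-lived presigned URL is enough.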
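Below you can see the nicely structured data in Airtable (in German), already including public transport times:
If a listing includes a map, real estate agents usually deliberately blur any street names or other identifying markers. There's probably no good way to automate finding the address. Since the goal of this project was only to process listings I was already interested in, I had just 55 listings in total for which to find the address manually.
It turned out that for about 80% of the listings, I was able to find the exact address using one of the following methods: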
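Variant A: Using geoland.at
Variant B: Analyzing the angles of roads and rivers
My only point of orientation was the rivers. The village has several, but only two of them run roughly the way shown on the map. So I manually followed those rivers, looking for the spot where the river's shape matched the map, along with the light green background in the center and the gray around the outside. After about 30 minutes I found the exact spot (left: the listing, right: my map).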
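Variant B was done by eye. If you wanted to compare the direction of a road or river segment numerically instead, the standard initial-bearing formula gives the compass angle between two coordinates. This is a sketch of a technique the post did not use; `bearing_deg` and `angle_difference` are hypothetical helpers.

```ruby
# Initial compass bearing (0° = north, 90° = east) from point 1 to point 2,
# using the standard great-circle formula. Inputs are latitude/longitude in degrees.
def bearing_deg(lat1, lng1, lat2, lng2)
  to_rad = Math::PI / 180
  phi1 = lat1 * to_rad
  phi2 = lat2 * to_rad
  dlng = (lng2 - lng1) * to_rad
  y = Math.sin(dlng) * Math.cos(phi2)
  x = Math.cos(phi1) * Math.sin(phi2) - Math.sin(phi1) * Math.cos(phi2) * Math.cos(dlng)
  (Math.atan2(y, x) / to_rad + 360) % 360
end

# Smallest difference between two bearings, in degrees (0..180)
def angle_difference(a, b)
  d = (a - b).abs % 360
  d > 180 ? 360 - d : d
end
```

Measuring the bearing of a road segment on the blurred listing map and comparing it with candidate segments via `angle_difference` would narrow the search, though the shape match by eye is still what confirms the spot.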
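Variant C: Asking the real estate agent for the address
Calculating distances to POIs
Once I had entered the address manually, a Ruby script picked it up and used the Google Maps API to calculate commute times to a predefined list of places. This part of the code is mostly boilerplate for talking to the API and parsing its responses.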
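As a hedged sketch of that boilerplate, assuming the (classic) Google Maps Directions API, whose JSON response has the `routes` → `legs` → `steps` shape used in this post: `directions_url` and `total_duration_seconds` are hypothetical names, and the HTTP request and JSON parsing are left out.

```ruby
require "uri"

DIRECTIONS_ENDPOINT = "https://maps.googleapis.com/maps/api/directions/json"

# Builds a request URL for the Google Maps Directions API.
# `mode` is one of "driving", "walking", "bicycling" or "transit";
# `from`/`to` can be addresses or "lat,lng" strings.
def directions_url(from:, to:, mode:, api_key:, departure_time: nil)
  params = { origin: from, destination: to, mode: mode, key: api_key }
  # Optional Unix timestamp; transit results depend on the departure time
  params[:departure_time] = departure_time if departure_time
  "#{DIRECTIONS_ENDPOINT}?#{URI.encode_www_form(params)}"
end

# Sums the leg durations (in seconds) of the first route in a parsed JSON response
def total_duration_seconds(data)
  raise "Directions API error: #{data["status"]}" unless data["status"] == "OK"

  data["routes"][0]["legs"].sum { |leg| leg["duration"]["value"] }
end
```

Calling `directions_url` once per POI and mode, then feeding the parsed response to `total_duration_seconds`, yields one commute time per predefined place.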
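One key problem I was able to solve was the "getting to the train station" part. In most cases we would want to use public transport, but for Google Maps this is all-or-nothing: you either use public transport for the entire route, or you don't.
The code below shows the simple approach I used. I'm well aware it might not work in every case, but it worked well for all 55 places I used it on.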
```ruby
if mode == "transit"
  # For all routes calculated for public transit, first extract the "walking to the train station" part
  # (for example a 30 min, 2.3 km walk)
  first_step = data["routes"][0]["legs"][0]["steps"][0]
  res[:walking_to_closest_station_time_seconds] = first_step["duration"]["value"]
  res[:walking_to_closest_station_distance_meters] = first_step["distance"]["value"]

  # Get the start and end location of the walking part
  start_location = first_step["start_location"]
  end_location = first_step["end_location"]

  # Now calculate the driving duration to the nearest station
  res[:drive_to_nearest_station_duration_seconds] = self.calculate_commute_duration(
    from: "#{start_location["lat"]},#{start_location["lng"]}",
    to: "#{end_location["lat"]},#{end_location["lng"]}",
    mode: "driving"
  )[:total_duration_seconds]
end
```
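The snippet stores the walk and the drive to the station separately. One way to combine them, assuming you want a rough estimate for "drive to the station, then take the train", is to swap the walking leg for the drive. A sketch with hypothetical names and made-up numbers; it ignores parking and assumes the first step of the transit route is the walk to the station.

```ruby
# Estimates a park-and-ride commute: the transit trip with its initial walking
# leg replaced by the driving time to the same station. All values in seconds.
def park_and_ride_seconds(transit_total_seconds:, walking_to_station_seconds:, drive_to_station_seconds:)
  transit_total_seconds - walking_to_station_seconds + drive_to_station_seconds
end

# Made-up example values in the shape of the `res` hash above
res = {
  total_duration_seconds: 4_800,
  walking_to_closest_station_time_seconds: 1_800,
  drive_to_nearest_station_duration_seconds: 240
}
estimate = park_and_ride_seconds(
  transit_total_seconds: res[:total_duration_seconds],
  walking_to_station_seconds: res[:walking_to_closest_station_time_seconds],
  drive_to_station_seconds: res[:drive_to_nearest_station_duration_seconds]
)
puts "#{estimate / 60} min" # → 54 min
```

Because the train still leaves at its scheduled time, arriving at the station earlier doesn't always shorten the trip, so this is best read as a lower bound.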
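Once we had a list of about 15 plots we were interested in, we planned a day to visit all of them. Since we had the exact addresses, no appointments were needed.
To find the most efficient route, I used RouteXL. You upload the list of addresses you need to visit, define precise rules, and it calculates the most fuel- and time-efficient route, which you can import directly into Google Maps for navigation.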
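RouteXL does the optimisation as a hosted service, and its algorithm isn't described here. To illustrate the underlying problem (a travelling-salesman-style ordering of stops), here is a simple nearest-neighbour heuristic over a travel-time matrix. It is a swapped-in technique and much cruder than a real route optimiser; `nearest_neighbor_route` and the sample matrix are hypothetical.

```ruby
# Greedy nearest-neighbour ordering of stops.
# durations[i][j] = travel time from stop i to stop j (e.g. in minutes).
# Starts at index `start` and always drives to the closest unvisited stop next.
# Returns the visiting order as an array of indices.
def nearest_neighbor_route(durations, start: 0)
  unvisited = (0...durations.size).to_a - [start]
  route = [start]
  until unvisited.empty?
    current = route.last
    nxt = unvisited.min_by { |j| durations[current][j] }
    route << nxt
    unvisited.delete(nxt)
  end
  route
end

durations = [
  [0, 10, 25, 30],
  [10, 0, 12, 28],
  [25, 12, 0, 9],
  [30, 28, 9, 0]
]
p nearest_neighbor_route(durations) # → [0, 1, 2, 3]
```

Greedy ordering is easy to reason about but can produce noticeably longer tours than a proper optimiser, which is where a tool like RouteXL earns its keep.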
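While we drove to the next stop, my fiancée read the summary notes in the Airtable app, so by the time we arrived we already knew the plot's price, description, size and other features.
This approach saved us a huge amount of time. We could rule out about 75% of the plots right after arriving. Sometimes there was a noisy road nearby, a steep slope, a power line, a loud factory or, most importantly, it just didn't feel right. When you're standing in front of so many plots, the vibe makes a huge difference.
We always respected property boundaries. Standing in front of the plot and walking around the area a bit was enough to get a very clear picture.
After seeing 42 plots in person over 3 days of driving, we found the one that was right for us and contacted the real estate agent to arrange a proper viewing. We knew immediately that it was the one, met the owners, and signed the contract a few weeks later.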